Hello everybody,
I have the following macro to replace missing values:
DEFINE variable_perdida (variable = !CMDEND).
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK= /mediana=MEDIAN(!variable).
IF (SYSMIS(!variable)) !variable = mediana.
EXECUTE.
DELETE VARIABLE mediana.
!ENDDEFINE.
variable_perdida variable = it001.
variable_perdida variable = it002.
variable_perdida variable = it003.
...
variable_perdida variable = it186.
The problem is that it has to be applied to items it001 to it186, so I need to repeat the “variable_perdida variable = X” call 186 times. Is there a way to reduce the number of times I repeat this?
Kindly,
Andrés |
Hi,
Maybe like this:
DEFINE variable_perdida (variable = !CMDEND).
!DO !i !IN (!variable)
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK= /mediana=MEDIAN(!i).
IF (SYSMIS(!i)) !i = mediana.
EXECUTE.
DELETE VARIABLE mediana.
!DOEND
!ENDDEFINE.
And then you run
variable_perdida variable = [the full list of variables].
I can't test it right here, but I suppose it works.
Regards,
Carlos
(Replying to ANDRES ALBERTO BURGA LEON [via SPSSX Discussion], 2014-06-11 17:42 GMT-03:00.)
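A caveat on calling the looping macro: as far as I understand the macro facility, !DO !IN steps through the literal tokens it receives, so a range written as it001 TO it186 would presumably be read as three tokens (it001, TO, it186) rather than expanded into 186 names. A minimal sketch of a call, assuming the item names from the original post and showing only three of them:

```
* Sketch only: every variable is named explicitly, one token per loop pass.
* The three names below stand in for the full list of items.
variable_perdida variable = it001 it002 it003.
```

Each pass runs AGGREGATE, IF, EXECUTE and DELETE VARIABLE once, so the full list would trigger 186 data passes.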
|
In reply to this post by ANDRES ALBERTO BURGA LEON
Macros can be useful, but if I understand what you want to do they are unnecessary here.
Something like this untested ordinary syntax should work.
compute nobreak=1.
aggregate outfile= * mode=addvariables /break= nobreak /MedianItem001 to MedianItem186 = median(item001 to item186).
do repeat item= item001 to item186 /patched = patched001 to patched186 /ItemMedian = MedianItem001 to MedianItem186.
do if missing(item) or sysmis(item).
compute patched = ItemMedian.
else.
compute patched = item.
Why do you have values as sysmis? It is usually useful to have user-missing values with labels when you know why items are missing, and to reserve sysmis for the situation where the values are missing because the system could not follow your instructions for creating the variable.
Art Kendall
Social Research Consultants |
Administrator
|
You'll need an "end if" line at the end of that, Art. ;-)
But I think the bigger issue here is WHY does the OP want to impute the median where there is missing data? It is pretty well established nowadays that old-school methods for dealing with missing data (such as listwise & pairwise deletion, mean substitution, and last-observation-carried-forward) don't work very well. I doubt that median substitution would fare much better than mean substitution. HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Yes, it needs an END IF.
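For reference, the earlier DO REPEAT sketch with the missing END IF (and the END REPEAT that closes the loop) filled in might read as follows. It is still untested, and it assumes the items are named item001 to item186 and sit next to each other in the file; the patched and MedianItem names are the illustrative ones used earlier in the thread.

```
* Untested sketch; item001 to item186 are assumed to be adjacent in the file.
compute nobreak = 1.
aggregate outfile=* mode=addvariables
  /break=nobreak
  /MedianItem001 to MedianItem186 = median(item001 to item186).
do repeat item = item001 to item186
  /patched = patched001 to patched186
  /ItemMedian = MedianItem001 to MedianItem186.
do if missing(item).
compute patched = ItemMedian.
else.
compute patched = item.
end if.
end repeat.
execute.
```

MISSING() is true for both user-missing and system-missing values, so the separate SYSMIS() test is left out here.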
I did not want to clobber the OP with too many things, so I picked on the existence of sysmis values as input to transformations rather than cleaning them up early in the processing. Median plugging does seem an unusual thing to do, but I do not have the context in which it was being done. It is definitely true that in many situations more modern imputation is preferable.
Since the OP used the term "item" and has so many items, I would guess that these are items intended to be used in summative scales, i.e., are a form of repeated measure. Also, since the OP wanted to use the median, I guessed that the items are not dichotomous right/wrong items. I have not seen anything that would support using anything other than other items in the same scale to impute missing values. I wonder if there would be meaningful differences between a more complex treatment of missing item scores and simply using the mean of some minimum number of valid items as the actual score. What would happen to the meaning of a score if items measuring other constructs were used to compute the score?
Over the years I have come to use the mean of items as the score because it is strongly analogous to the response scale. E.g., for a Likert scale where the item response scale was 1 to 5 (SD, D, mid, A, SA),
compute myscore = mean.7(item01 to item10).
would give a score from 1 to 5, which is reminiscent of a tendency to agree with items measuring a construct.
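To spell out what the .7 suffix does, here is a small sketch. The names item01 to item10 and myscore come from the example above; nanswered is a hypothetical helper variable added only for illustration.

```
* Sketch: myscore is the mean of the valid items, but only when at least 7 of the 10 are valid.
compute myscore = mean.7(item01 to item10).
* Hypothetical helper: count the valid items per case to see how many cases fall short.
compute nanswered = nvalid(item01 to item10).
frequencies variables = nanswered.
```

Cases with fewer than 7 valid items get a system-missing myscore, so the frequency table shows how many scores the threshold would drop.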
Art Kendall
Social Research Consultants |
Thank you very much to everybody for the answers.
I'm using the median because the items are on a Likert scale. As you see, there are 180 items, and the database that was given to me has only 305 cases. The problem is that there are 200 cases that have one or two unanswered items (163 items have at least one blank).
Andres |
Administrator
|
Aside from concerns about the use of the median as an imputed value, a general preference is to work in LONG rather than WIDE format. The syntax Art provided should work; however, I find it rather messy to create 189 new variables and deal with all of that unnecessary DO REPEAT/COMPUTE and cleanup overhead.
Consider the following: Drum roll...
DATASET COPY copydata.
DATASET ACTIVATE copydata.
VARSTOCASES /MAKE x FROM x001 TO x189 /INDEX=varname(x) /NULL=KEEP.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=varname /medianx=MEDIAN(x).
IF MISSING(x) x=medianx.
EXECUTE /* required to enable DELETE VARIABLES command*/.
DELETE VARIABLES medianx.
CASESTOVARS /ID=ID /INDEX=varname.
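The restructuring above relies on a case identifier named ID, which CASESTOVARS uses to put each case back together. If the file has no such variable, one way to create it before running VARSTOCASES might be the following sketch (ID is simply the name the syntax above expects):

```
* Sketch: build a case identifier from the system variable $CASENUM.
COMPUTE ID = $CASENUM.
FORMATS ID (F8.0).
EXECUTE.
```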
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by ANDRES ALBERTO BURGA LEON
Is there a reason you are not using Replace Missing Values?
RMV new variables={LINT (varlist)             }
                  {MEAN (varlist [{,2 }])     }
                  {               {,n }       }
                  {               {ALL}       }
                  {MEDIAN (varlist [{,2 }])   }
                  {                 {,n }     }
                  {                 {ALL}     }
                  {SMEAN (varlist)            }
                  {TREND (varlist)            }
    [/new variables=function (varlist [,span])]
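A hedged sketch of how that might look for two of the items. The new names it001_m and it002_m are hypothetical. As I read the chart, the default span of 2 takes the median of neighboring cases, so ALL is specified here on the assumption that it uses every valid value in the variable, which is what median substitution needs.

```
* Sketch only: it001_m and it002_m are hypothetical names for the new variables.
* RMV writes the filled-in values to new variables and leaves the originals unchanged.
RMV /it001_m = MEDIAN(it001, ALL)
  /it002_m = MEDIAN(it002, ALL).
```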
|
Administrator
|
DOH re ME...
In my best Homer Simpson voice... https://www.youtube.com/watch?v=_83RSSqydEk I'd forgotten that RMV had a MEDIAN function ;-)
|
In reply to this post by ANDRES ALBERTO BURGA LEON
How was the data gathered? If it was a paper and pencil instrument, the test administrator should have caught that.
That is a lot of missing data. If you have a set of Likert scales I would question imputing items. Try something like
compute OkayItems= nvalid(item001 to item180).
compute OkayScale1 = nvalid(...
compute OkayScale2 = nvalid(...
frequencies variables = OkayItems to OkayScale____.
Why is the data coded sysmis? Are the data missing because people stopped answering completely? Do you have access to the original respondents to debrief them on why they did not answer all questions? Where did the items come from? Are these scales developed in previous studies? How many scales are there? How many items are in each scale? Are there items which are very frequently skipped? Is there something about the substantive questions that would make people reluctant to answer?
Art Kendall
Social Research Consultants |