Is there a command (or a syntax, or Python program) that counts the different (numerical) values in a row?

Example:

variables: V1 V2 V3 V4
values row1: 1 2 3 4
values row2: 1 1 1 1

row1 has 4 different values: 1-4
row2 has only one different value: 1

I tried to develop a Python program but failed. I also couldn't find a solution on Raynald's.

Thanks
Dr. Frank Gaeth
|
Try something like this untested syntax.
count n1 = v1 to v4 (1).
count n2 = v1 to v4 (2).
count n3 = v1 to v4 (3).
count n4 = v1 to v4 (4).
missing values n1 to n4 (0).
compute nvalues = nvalid(n1 to n4).
Art Kendall
Social Research Consultants
In reply to drfg2008's message of 3/6/2011, 4:11 AM.
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by drfg2008
I don't have SPSS on this machine, so the following is untested, but I think it might work. You may need an EXECUTE after the loop.
numeric i1 to i4 (f1.0). /* 4 indicator variables.
recode i1 to i4 (else=0). /* initialize to 0.
vector v = v1 to v4 / i = i1 to i4.
loop # = 1 to 4.
- compute i(v(#)) = 1. /* value stored in v(#) flagged as present.
end loop.
compute unique_values = sum(i1 to i4).
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
I tried your syntax, but couldn't get it to run:
*---------------- first build a file ----------------.
input program.
loop a =1 to 100 by 1.
end case.
end loop.
end file.
end input program.
exe.
comp v1 =RV.BINOM(5,0.5).
comp v2 =RV.BINOM(5,0.5).
comp v3 =RV.BINOM(5,0.5).
comp v4 =RV.BINOM(5,0.5).
EXECUTE .
*-----------your syntax with exe.------------------------.
numeric i1 to i4 (f1.0). /* 4 indicator variables.
recode i1 to i4 (else=0). /* initialize to 0.
vector v = v1 to v4 / i = i1 to i4.
loop # = 1 to 4.
- compute i(v(#)) = 1. /* value stored in v(#) flagged as present.
end loop.
EXECUTE .
compute unique_values = sum(i1 to i4).
------ this is an excerpt of the error messages:
>Warning # 525
>An attempt was made to store a value into an element of a vector the subscript
>of which was missing or otherwise invalid. The subscript must be a positive
>integer and must not be greater than the length of the vector. No store can
>occur.
>Command line: 207 Current case: 6 Current splitfile group: 1
Dr. Frank Gaeth
|
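A minimal Python sketch of why the warning above appears, assuming the indicator idea is translated literally. The function name `unique_by_indicator` is hypothetical and only mimics the VECTOR logic. RV.BINOM(5,0.5) can return any whole number from 0 to 5, so the values 0 and 5 are not valid subscripts for a vector of length 4.

```python
def unique_by_indicator(row, n_flags=4):
    """Mimic the indicator-vector idea: the value v sets flag number v."""
    flags = [0] * n_flags  # plays the role of i1 to i4, initialised to 0
    for value in row:
        # SPSS vector subscripts are 1-based whole numbers within the vector length.
        if value != int(value) or not 1 <= value <= n_flags:
            raise IndexError(
                f"invalid subscript {value!r} for a vector of length {n_flags}"
            )
        flags[int(value) - 1] = 1
    return sum(flags)


print(unique_by_indicator([1, 2, 3, 4]))  # 4
print(unique_by_indicator([1, 1, 1, 1]))  # 1

try:
    unique_by_indicator([0, 3, 5, 2])  # 0 and 5 are possible binomial draws
except IndexError as err:
    print("fails:", err)
```

In SPSS the store is skipped with a warning rather than stopping the job, but the cause is the same: the data value is being used as a subscript.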
In reply to this post by drfg2008
Thanks, Art, for your message. However, COUNT seems to be a problem if you have numbers like 1,23456 or 1,234567 (real numbers, written here with a decimal comma) or a wide range, say from 1 to a few billion.
Frank
Dr. Frank Gaeth
|
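The limitation Frank describes can be sketched in a few lines of Python. This is only an illustration, assuming the COUNT approach is read as "check each value from a fixed list of candidates"; the helper name `count_listed_values` is hypothetical.

```python
def count_listed_values(row, candidates):
    """Count how many of the listed candidate values occur in the row."""
    return sum(1 for c in candidates if c in row)


candidates = [1, 2, 3, 4]  # one COUNT command per candidate value

# Works when every value in the row is one of the candidates.
print(count_listed_values([1, 2, 3, 4], candidates))  # 4

# Real numbers and very large values are never listed, so they are missed.
row = [1.23456, 1.234567, 2_000_000_000, 1]
print(count_listed_values(row, candidates))  # 1
print(len(set(row)))                         # 4
```

Listing every possible value is not practical for real numbers or a range of billions, which is why the later replies compare the values in the row with each other instead.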
In reply to this post by Bruce Weaver
Clever, but it only works with values in the 1-4 range. Try, for example, 1,2,99,12.
Here's a Python solution. I used the SPSSINC TRANS extension command, but you could pass the data explicitly instead.
* define a counting function.
begin program.
import spss
def countthem(*args):
    return len(set(args))
end program.
* Run it over the data, storing the result in variable uniquecount.
spssinc trans result=uniquecount
  /formula "countthem(v1,v2,v3,v4)".
Note that any sysmis values will contribute to the count. The countthem function could be modified to ignore those.
HTH,
Jon Peck
Senior Software Engineer, IBM
312-651-3435
In reply to Bruce Weaver's message of 03/06/2011, 07:21 AM (subject: Re: [SPSSX-L] different values in a row).
|
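One way the modification Jon mentions might look is sketched below. It assumes that system-missing values reach the function as Python `None`; that is an assumption worth checking against your own installation. The name `countthem_valid` is hypothetical.

```python
def countthem_valid(*args):
    """Count distinct values, skipping None (assumed to stand for SYSMIS)."""
    return len({a for a in args if a is not None})


print(countthem_valid(1, 2, 99, 12))      # 4
print(countthem_valid(1, None, 1, None))  # 1
print(countthem_valid(None, None))        # 0
```

If it behaves as assumed, it could be used in place of countthem in the /formula string.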
Administrator
|
Something about that solution was troubling me, and Jon put his finger on it. Thanks Jon. ;-)
--
Bruce Weaver |
Administrator
|
In reply to this post by drfg2008
As Jon pointed out, it works only for whole number values 1-4 (the values shown in the original example).
--
Bruce Weaver |
Administrator
|
Here is a non-Python (untested) solution [1] that should work quite well, provided the number of variables is not too large. It assumes that any missing values are SYSMIS.
* UVC = unique value count .
recode v1 to v4 (sysmis=9999). /* or some other user-defined value .
compute uvc = 1.
compute uvc = uvc + (v2 NE v1) .
compute uvc = uvc + ((v3 NE v1) and (v3 NE v2)).
compute uvc = uvc + ((v4 NE v1) and (v4 NE v2) and (v4 NE v3)).
missing values v1 to v4 (9999).
This will count SYSMIS as one of the possible unique values. If you don't want to include SYSMIS, then correct UVC by subtraction. E.g.,
compute uvc = uvc - (nmiss(v1 to v4) GT 0).
One could possibly stick this basic idea into a macro that would make it more feasible for a large number of variables.
[1] As I often point out, many of the academic SPSS users I know will probably *never* install and use Python; therefore, I always try to offer solutions that will run in native SPSS code. This should not be taken as any slight against Python. It's just being realistic, IMO.
--
Bruce Weaver |
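The comparison idea in the post above generalises to any number of variables: a value adds to the count only if it differs from every value to its left. A minimal Python sketch of that logic follows, with the hypothetical name `unique_value_count`. It is not a translation into an SPSS macro.

```python
def unique_value_count(row):
    """Count a value only if it differs from all earlier values in the row."""
    uvc = 0
    for j, value in enumerate(row):
        # For the first value there is nothing to compare, so it always counts.
        if all(value != earlier for earlier in row[:j]):
            uvc += 1
    return uvc


print(unique_value_count([1, 2, 3, 4]))    # 4
print(unique_value_count([1, 1, 1, 1]))    # 1
print(unique_value_count([1, 2, 99, 12]))  # 4
print(unique_value_count([1, 2, 3, 3]))    # 3
```

The number of comparisons grows roughly with the square of the number of variables, which matches the remark that the approach suits a small number of variables.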
Administrator
|
In reply to this post by drfg2008
Simple!
Restructure the data from wide to long, retaining an identifier for the row (RowID) and pushing all relevant columns into a single column (Var). AGGREGATE breaking on RowID and Var, using the N function. Re-AGGREGATE breaking on RowID and use N. MATCH to the original file. Done.
Say the file has variables V1 TO V100.
COMPUTE RowID=$CASENUM.
SAVE OUTFILE "origData.sav".
VECTOR V=V1 TO V100.
LOOP #=1 TO 100.
COMPUTE Var=V(#).
XSAVE OUTFILE "temp.sav" / KEEP RowID Var.
END LOOP.
EXE.
GET FILE "temp.sav".
AGGREGATE OUTFILE * / BREAK RowID Var /N=N.
AGGREGATE OUTFILE * / BREAK RowID /Unique=N.
MATCH FILES / FILE "origData.sav" / FILE * / BY RowID.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
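For readers who find the restructure-and-aggregate steps easier to follow outside SPSS, here is a rough Python sketch of the same logic using only the standard library. The function name `unique_per_row` is hypothetical and the data are held in plain lists, so this illustrates the idea and is not a replacement for the syntax.

```python
from collections import Counter


def unique_per_row(rows):
    """Wide-to-long, aggregate on (RowID, Var), then aggregate on RowID."""
    # Step 1: wide to long, one (RowID, Var) record per cell.
    long_records = [
        (row_id, value)
        for row_id, row in enumerate(rows, start=1)
        for value in row
    ]
    # Step 2: first aggregate, one entry per distinct (RowID, Var) pair.
    first = Counter(long_records)
    # Step 3: second aggregate, number of distinct pairs per RowID.
    second = Counter(row_id for row_id, _ in first)
    return dict(second)


print(unique_per_row([[1, 2, 3, 4], [1, 1, 1, 1]]))  # {1: 4, 2: 1}
```

The counts from the first aggregate are not used again; only the number of records that survive it matters, which is what the second aggregate counts.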
Administrator
|
In reply to this post by Bruce Weaver
Looks unwieldy with even a few variables.
See my post using Restructure and AGGREGATE. "I always try to offer solutions that will run in native SPSS code. " I concur. I have an OLD (11.5) version of SPSS and can't afford to upgrade, so you will find my solutions will work in ANY version of SPSS (except PC+) back to version 4 on a mainframe/Unix box/Ancient Mac -any version which supports VECTOR/LOOP XSAVE etc. I even tend to use old skool restructure rather than CASESTOVARS or VARSTOCASES. I am typing this on my Mac and my SPSS is on the Windows partition. I can't remember the VARSTOCASES syntax off the top of my head but the VECTOR/LOOP/XSAVE is a permanent part of my neural wiring at this point ;-)
|
In reply to this post by drfg2008
I only have a crude concept, not fully worked out:
For one case you can use the FLIP command to get the values into one column, named variable. Then you sort and write the command
if (lag(variable) = variable) indicator = 1.
You delete all cases with indicator = 1; the number of remaining cases equals the number of different values. This solution works for an arbitrary number of values per row, but only for one case in the original file. Maybe it's possible to extend it to an arbitrary number of cases in the original file by including a loop on the variables in the flipped file.
Peter |
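The sort-and-compare-with-lag concept above can be tried out for a single row in Python. This is a minimal sketch under that reading of the post; the name `count_distinct_sorted` is hypothetical.

```python
def count_distinct_sorted(row):
    """Sort the values, drop each one equal to its predecessor, count the rest."""
    ordered = sorted(row)
    kept = [
        value
        for i, value in enumerate(ordered)
        if i == 0 or value != ordered[i - 1]  # same test as lag(variable) = variable
    ]
    return len(kept)


print(count_distinct_sorted([3, 1, 3, 2]))  # 3
print(count_distinct_sorted([1, 1, 1, 1]))  # 1
```

Sorting first is what makes the comparison with the previous value sufficient, since equal values end up next to each other.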
Administrator
|
See my previously posted solution. The First piece is essentially a FLIP BY ID.
AGGREGATE takes care of the rest. ---------
|
In reply to this post by David Marso
Thanks everybody!
I understood the syntax that David provided best: AGGREGATE. I should have known. Thank you. Here is the complete syntax (with 4 variables computed randomly) in case someone needs it.
input program.
loop a =1 to 100 by 1.
end case.
end loop.
end file.
end input program.
exe.
comp v1 =RV.BINOM(5,0.5).
comp v2 =RV.BINOM(5,0.5).
comp v3 =RV.BINOM(5,0.5).
comp v4 =RV.BINOM(5,0.5).
EXECUTE .
*say the file has variables V1 TO V4.
COMPUTE RowID=$CASENUM.
SAVE OUTFILE "C:\<path>\differentValues.sav".
VECTOR V=V1 TO V4.
LOOP #=1 TO 4.
COMPUTE Var=V(#).
XSAVE OUTFILE "temp.sav" / KEEP RowID Var.
END LOOP.
EXE.
GET FILE "temp.sav".
AGGREGATE OUTFILE * / BREAK RowID Var /N=N.
AGGREGATE OUTFILE * / BREAK RowID /Unique=N.
MATCH FILES / FILE "C:\<path>\differentValues.sav" / FILE * / BY RowID.
EXECUTE .
Dr. Frank Gaeth
|
By the way, here is my Python version of David Marso's fantastic (simple and effective) solution. The advantage of Python is that it determines the number and names of the variables automatically.
*testsample: --------.
input program.
loop a =1 to 100 by 1.
end case.
end loop.
end file.
end input program.
exe.
comp v1 =RV.BINOM(5,0.5).
comp v2 =RV.BINOM(5,0.5).
comp v3 =RV.BINOM(5,0.5).
comp v4 =RV.BINOM(5,0.5).
comp v5 =RV.BINOM(5,0.5).
comp v6 =RV.BINOM(5,0.5).
comp v7 =RV.BINOM(5,0.5).
comp v8 =RV.BINOM(5,0.5).
EXECUTE .
DELETE VARIABLES a.
* so here comes the python version: ------------.
COMPUTE RowID=$CASENUM.
SAVE OUTFILE "C:\differentValues.sav".
begin program.
import spss
# RowID is the last variable in the file, so leave it out of the count.
FileN=spss.GetVariableCount()-1
# First and last of the variables to be compared.
varName_1 = spss.GetVariableName(0)
varName_n = spss.GetVariableName(spss.GetVariableCount()-2)
spss.Submit("VECTOR V=" + varName_1 +" to " + varName_n + ".")
spss.Submit("LOOP #=1 TO "+ str(FileN) + ".")
spss.Submit(r"""COMPUTE Var=V(#).
XSAVE OUTFILE "temp.sav" / KEEP RowID Var.
END LOOP.
EXE.""")
end program.
GET FILE "temp.sav".
AGGREGATE OUTFILE * / BREAK RowID Var /N=N.
AGGREGATE OUTFILE * / BREAK RowID /Unique=N.
MATCH FILES / FILE "C:\differentValues.sav" / FILE * / BY RowID.
EXECUTE .
Dr. Frank Gaeth
|
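The string-building part of the Python version above can be checked without SPSS. The sketch below only assembles the command text from a list of variable names and does not call the spss module at all. The function name `build_restructure_syntax` and its default arguments are hypothetical.

```python
def build_restructure_syntax(var_names, id_name="RowID", temp_file="temp.sav"):
    """Return the VECTOR/LOOP/XSAVE block for a contiguous run of variables."""
    first, last = var_names[0], var_names[-1]
    lines = [
        f"VECTOR V={first} TO {last}.",
        f"LOOP #=1 TO {len(var_names)}.",
        "COMPUTE Var=V(#).",
        f'XSAVE OUTFILE "{temp_file}" / KEEP {id_name} Var.',
        "END LOOP.",
        "EXE.",
    ]
    return "\n".join(lines)


names = [f"v{i}" for i in range(1, 9)]  # v1 to v8, as in the test sample
print(build_restructure_syntax(names))
```

Printing the generated text before submitting it makes it easier to spot a wrong variable range. The sketch assumes the variables are adjacent in the file, as the TO keyword requires.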
Administrator
|
In reply to this post by David Marso
Yes, that's more like it. (If I post something unwieldy enough, it always provokes David to post a much better solution.) ;-)
--
Bruce Weaver |
Administrator
|
ROFL !!! I sometimes provoke myself as well ;-)
Case in point: Use VARSTOCASES to go wide to long (eliminating the external file -for which I omitted the ERASE command in my previous solution- ;-) * I should maybe call all my temp files for posted code "C:\dmmtemp" and every once in awhile post random code which has ERASE "C:\dmmtemp" somewhere in the mix ;-)))....* ie, substitute the VECTOR/LOOP/XSAVE biz for an appropriate V2C. Use MODE=ADDVARIABLES in the AGGREGATE(s) to remove the need for the MATCH. Leaving this as an exercise for those that wish to pursue it. ---
|