different values in a row


different values in a row

drfg2008
Is there a command (or syntax, or a Python program) that counts the distinct (numerical) values in a row?

Example:


variables: V1 V2 V3 V4
values row1: 1 2 3 4
values row2: 1 1 1 1

row1 has 4 different values: 1-4
row 2 has only one different value: 1

I tried to develop a Python program but failed. I also couldn't find a solution on Raynald's site.

Thanks
Dr. Frank Gaeth


Re: different values in a row

Art Kendall-2
Try something like this untested syntax.

count n1 = v1 to v4(1).
count n2 = v1 to v4(2).
count n3 = v1 to v4(3).
count n4 = v1 to v4(4).
missing values n1 to n4 (0).
compute nvalues = nvalid(n1 to n4).

Art Kendall
Social Research Consultants

On 3/6/2011 4:11 AM, drfg2008 wrote:

> Is there a command (or a syntax, or python program) that counts different
> (numerical) values in a row.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: different values in a row

Bruce Weaver
Administrator
In reply to this post by drfg2008
I don't have SPSS on this machine, so the following is untested, but I think it might work.  You may need an EXECUTE after the loop.

numeric i1 to i4 (f1.0). /* 4 indicator variables.
recode i1 to i4 (else=0). /* initialize to 0.
vector v = v1 to v4 / i = i1 to i4.
loop # = 1 to 4.
- compute i(v(#)) = 1. /* value stored in v(#) flagged as present.
end loop.
compute unique_values = sum(i1 to i4).


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: different values in a row

drfg2008
I tried your syntax but couldn't get it to run:

*---------------- first build a file ----------------.

input program.
loop a =1 to 100 by 1.
end case.
end loop.
end file.
end input program.
exe.

comp v1 =RV.BINOM(5,0.5).
comp v2 =RV.BINOM(5,0.5).
comp v3 =RV.BINOM(5,0.5).
comp v4 =RV.BINOM(5,0.5).

EXECUTE .

*-----------your syntax with exe.------------------------.

numeric i1 to i4 (f1.0). /* 4 indicator variables.
recode i1 to i4 (else=0). /* initialize to 0.
vector v = v1 to v4 / i = i1 to i4.
loop # = 1 to 4.
- compute i(v(#)) = 1. /* value stored in v(#) flagged as present.
end loop.
EXECUTE .
compute unique_values = sum(i1 to i4).

------ This is an excerpt of the error messages:

>Warning No. 525
>An attempt was made to store a value into an element of a vector the subscript
>of which was missing or otherwise invalid.  The subscript must be a positive
>integer and must not be greater than the length of the vector.  No store can
>occur.
>Command line: 207  Current case: 6  Current splitfile group: 1

Dr. Frank Gaeth


Re: different values in a row

drfg2008
In reply to this post by drfg2008
Thanks, Art, for your message. However, COUNT seems to become a problem if you have real numbers like 1,23456 and 1,234567, or a wide range of values, say from 1 to a few billion.

Frank


Dr. Frank Gaeth


Re: different values in a row

Jon K Peck
In reply to this post by Bruce Weaver
Clever, but it only works with values in the 1-4 range.  Try, for example,
1,2,99,12

Here's a Python solution.  I used the SPSSINC TRANS extension command, but you could pass the data explicitly instead.

* define a counting function.
begin program.
import spss
def countthem(*args):
  return len(set(args))
end program.

* Run it over the data, storing the result in variable uniquecount.
spssinc trans result=uniquecount
/formula "countthem(v1,v2,v3,v4)".

Note that any system-missing (SYSMIS) values will contribute to the count.  The countthem function could be modified to ignore those.
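
A minimal sketch of such a modification follows. It assumes that system-missing values reach the function as None, and the name countthem_valid is made up for this illustration.

```python
def countthem_valid(*args):
    """Count the distinct non-missing values among the arguments.

    Assumption: system-missing values are passed in as None.
    """
    return len(set(value for value in args if value is not None))


print(countthem_valid(1, 2, 3, 4))     # 4
print(countthem_valid(1, 1, 1, 1))     # 1
print(countthem_valid(1, 2, 99, 12))   # 4
print(countthem_valid(1, None, 1, 3))  # 2
```

Under the same assumption it would be used the same way as above, with the formula "countthem_valid(v1,v2,v3,v4)".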

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435





Re: different values in a row

Bruce Weaver
Administrator
Something about that solution was troubling me, and Jon put his finger on it.  Thanks Jon.  ;-)



Re: different values in a row

Bruce Weaver
Administrator
In reply to this post by drfg2008
As Jon pointed out, it works only for whole number values 1-4 (the values shown in the original example).
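
To make the failure concrete, here is a rough Python imitation of the indicator-vector logic (not SPSS itself; the function name is made up, and rejecting fractional subscripts is an assumption of this sketch). RV.BINOM(5,0.5) produces whole numbers from 0 to 5, so a 0 or a 5 is not a valid subscript for a four-element vector, and that is presumably what triggers warning 525.

```python
def indicator_count(row, n_indicators=4):
    """Mimic 'compute i(v(#)) = 1' with a 1-based indicator vector.

    Returns the sum of the indicators together with the values that could
    not be used as subscripts. Treating fractional values as invalid is an
    assumption made for this sketch.
    """
    indicators = [0] * n_indicators
    rejected = []
    for value in row:
        is_whole = float(value).is_integer()
        if is_whole and 1 <= value <= n_indicators:
            indicators[int(value) - 1] = 1   # value flagged as present
        else:
            rejected.append(value)           # invalid subscript: nothing stored
    return sum(indicators), rejected


print(indicator_count([1, 2, 3, 4]))    # (4, [])
print(indicator_count([1, 1, 1, 1]))    # (1, [])
print(indicator_count([0, 5, 2, 2]))    # (1, [0, 5])
print(indicator_count([1, 2, 99, 12]))  # (2, [99, 12])
```

The last two rows show both problems at once: the out-of-range values are skipped, so the resulting count is too low.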



Re: different values in a row

Bruce Weaver
Administrator
Here is a non-Python (untested) solution [1] that should work quite well, provided the number of variables is not too large.  It assumes that any missing values are SYSMIS.

* UVC = unique value count .
recode v1 to v4 (sysmis=9999). /* or some other user-defined value .
compute uvc = 1.
compute uvc = uvc +  (v2 NE v1) .
compute uvc = uvc + ((v3 NE v1) and (v3 NE v2)).
compute uvc = uvc + ((v4 NE v1) and (v4 NE v2) and (v4 NE v3)).
missing values v1 to v4 (9999).

This will count SYSMIS as one of the possible unique values.  If you don't want to include SYSMIS, then correct UVC by subtraction, e.g.:

compute uvc = uvc - (nmiss(v1 to v4) GT 0).

One could possibly stick this basic idea into a macro that would make it more feasible for a large number of variables.
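
For what it is worth, the general pattern such a macro would have to generate can be sketched in Python: a value adds 1 to the count only if it differs from every value to its left. This is just an illustration of the logic, with None standing in for a missing value and a made-up function name.

```python
def unique_value_count(row, count_missing=True):
    """Count a value only the first time it appears in the row.

    Missing values are represented by None (an assumption of this sketch);
    they form one extra category unless count_missing is False.
    """
    uvc = 0
    for j, value in enumerate(row):
        # Same test as (v_j NE v_1) and ... and (v_j NE v_(j-1)).
        # For the first variable there is nothing to compare, so it counts.
        if all(value != earlier for earlier in row[:j]):
            uvc += 1
    if not count_missing and any(value is None for value in row):
        uvc -= 1   # same idea as subtracting (nmiss(...) GT 0)
    return uvc


print(unique_value_count([1, 2, 3, 4]))                            # 4
print(unique_value_count([1, 1, 1, 1]))                            # 1
print(unique_value_count([1, None, None, 3]))                      # 3
print(unique_value_count([1, None, None, 3], count_missing=False)) # 2
```

With n variables this needs n(n-1)/2 comparisons per row, which is why the hand-written version gets unwieldy quickly.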

[1] As I often point out, many of the academic SPSS users I know will probably *never* install and use Python; therefore, I always try to offer solutions that will run in native SPSS code.  This should not be taken as any slight against Python.  It's just being realistic, IMO.




Re: different values in a row

David Marso
Administrator
In reply to this post by drfg2008
Simple!
Restructure the data from wide to long, retaining an identifier for the row (RowID) and pushing all relevant columns into a single column (Var).
AGGREGATE breaking on RowID and Var, using the N function.
AGGREGATE again, breaking on RowID only, and use N.
MATCH to the original file.
Done.
Say the file has variables V1 TO V100.
COMPUTE RowID=$CASENUM.
SAVE OUTFILE "origData.sav".

VECTOR V=V1 TO V100.
LOOP #=1 TO 100.
COMPUTE Var=V(#).
XSAVE OUTFILE "temp.sav" / KEEP RowID Var.
END LOOP.
EXE.

GET FILE "temp.sav".
AGGREGATE OUTFILE * / BREAK RowID Var /N=N.
AGGREGATE OUTFILE * / BREAK RowID  /Unique=N.
MATCH FILES / FILE "origData.sav" / FILE * / BY RowID.
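
For readers who think more easily in Python, the same restructure-then-aggregate-twice logic can be sketched as below. It is only an illustration of the idea on an in-memory list of rows; the function name and the sample data are made up.

```python
from collections import Counter


def unique_counts_long(rows):
    """Return {row_id: number of distinct values} via two aggregations."""
    # Wide to long: one (row_id, value) pair per cell, like the XSAVE loop.
    long_pairs = [(row_id, value)
                  for row_id, row in enumerate(rows, start=1)
                  for value in row]
    # First AGGREGATE: one record per (RowID, Var) combination.
    per_pair = Counter(long_pairs)
    # Second AGGREGATE: number of such records per RowID.
    per_row = Counter(row_id for row_id, _ in per_pair)
    return dict(per_row)


data = [[1, 2, 3, 4], [1, 1, 1, 1], [1.23456, 1.234567, 7, 7]]
print(unique_counts_long(data))  # {1: 4, 2: 1, 3: 3}
```

Because nothing here depends on the values being small whole numbers, real numbers and very wide ranges are handled the same way as 1 to 4.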


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: different values in a row

David Marso
Administrator
In reply to this post by Bruce Weaver
Looks unwieldy with even a few variables.
See my post using Restructure and AGGREGATE.
"I always try to offer solutions that will run in native SPSS code. "
I concur.  I have an OLD (11.5) version of SPSS and can't afford to upgrade, so you will find my solutions will work in ANY version of SPSS (except PC+) back to version 4 on a mainframe/Unix box/Ancient Mac -any version which supports VECTOR/LOOP XSAVE etc.
I even tend to use old skool restructure rather than CASESTOVARS or VARSTOCASES.
I am typing this on my Mac and my SPSS is on the Windows partition.  I can't remember the VARSTOCASES syntax off the top of my head but the VECTOR/LOOP/XSAVE is a permanent part of my neural wiring at this point ;-)


Re: different values in a row

martus
In reply to this post by drfg2008
I only have a crude concept, not fully worked out:
For one case you may use the FLIP command to get the values
in one column, named variable. Then you sort and write the command
if (lag(variable) = variable) indicator = 1.
Then you delete all cases with indicator = 1; the number of remaining cases equals
the number of different values.
This solution works for an arbitrary number of values per row, but only for
one case in the original file. Maybe it's possible to extend it to an arbitrary number of
cases in the original file by including a loop on the variables in
the flipped file.
Peter
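
The sort-and-compare-with-the-previous-value idea can be sketched in Python as follows. The function names are made up, and the sketch assumes the row contains no missing values (they would not sort together with numbers).

```python
def count_by_sort_and_lag(row):
    """Sort the values, then count those that differ from their predecessor."""
    ordered = sorted(row)
    distinct = 0
    previous = None
    for position, value in enumerate(ordered):
        # Keep the first value and every value unequal to the one before it,
        # i.e. drop the cases where lag(variable) = variable.
        if position == 0 or value != previous:
            distinct += 1
        previous = value
    return distinct


def count_all_rows(rows):
    """Extend the single-case idea by repeating it for every row."""
    return [count_by_sort_and_lag(row) for row in rows]


print(count_all_rows([[1, 2, 3, 4], [1, 1, 1, 1], [3, 1, 3, 2]]))  # [4, 1, 3]
```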

Re: different values in a row

David Marso
Administrator
See my previously posted solution.  The First piece is essentially a FLIP BY ID.
AGGREGATE takes care of the rest.

Re: different values in a row

drfg2008
In reply to this post by David Marso
Thanks everybody!

The syntax David provided is the one I understood best.

-> AGGREGATE. I should have known.

Thank you.

Here is the complete syntax (with 4 randomly generated variables) in case someone needs it.

input program.
loop a =1 to 100 by 1.
end case.
end loop.
end file.
end input program.
exe.

comp v1 =RV.BINOM(5,0.5).
comp v2 =RV.BINOM(5,0.5).
comp v3 =RV.BINOM(5,0.5).
comp v4 =RV.BINOM(5,0.5).

EXECUTE .

*say the file has variables V1 TO V4.
COMPUTE RowID=$CASENUM.
SAVE OUTFILE "C:\<path>\differentValues.sav".

VECTOR V=V1 TO V4.
LOOP #=1 TO 4.
COMPUTE Var=V(#).
XSAVE OUTFILE "temp.sav" / KEEP RowID Var.
END LOOP.
EXE.

GET FILE "temp.sav".
AGGREGATE OUTFILE * / BREAK RowID Var /N=N.
AGGREGATE OUTFILE * / BREAK RowID  /Unique=N.
MATCH FILES / FILE "C:\<path>\differentValues.sav" / FILE * / BY RowID.
EXECUTE .


Dr. Frank Gaeth


Re: different values in a row

drfg2008
By the way, here is my Python version of David Marso's fantastic (simple and effective) solution. The advantage of Python is that it automatically determines the number and names of the variables.

*testsample: --------.

input program.
loop a =1 to 100 by 1.
end case.
end loop.
end file.
end input program.
exe.

comp v1 =RV.BINOM(5,0.5).
comp v2 =RV.BINOM(5,0.5).
comp v3 =RV.BINOM(5,0.5).
comp v4 =RV.BINOM(5,0.5).
comp v5 =RV.BINOM(5,0.5).
comp v6 =RV.BINOM(5,0.5).
comp v7 =RV.BINOM(5,0.5).
comp v8 =RV.BINOM(5,0.5).

EXECUTE .
DELETE VARIABLES a.

* so here comes the python version: ------------.


COMPUTE RowID=$CASENUM.
SAVE OUTFILE "C:\differentValues.sav".

begin program.
import spss
FileN=spss.GetVariableCount()-1
varName_1 = spss.GetVariableName(0)
varName_n = spss.GetVariableName(spss.GetVariableCount()-2)

spss.Submit("VECTOR V=" +  varName_1 +" to " + varName_n + ".")
spss.Submit("LOOP #=1 TO "+ str(FileN) + ".")
spss.Submit(r"""COMPUTE Var=V(#).
                        XSAVE OUTFILE "temp.sav" / KEEP RowID Var.
                        END LOOP.
                        EXE.""")
end program.

GET FILE "temp.sav".
AGGREGATE OUTFILE * / BREAK RowID Var /N=N.
AGGREGATE OUTFILE * / BREAK RowID  /Unique=N.
MATCH FILES / FILE "C:\differentValues.sav" / FILE * / BY RowID.
EXECUTE .

Dr. Frank Gaeth


Re: different values in a row

Bruce Weaver
Administrator
In reply to this post by David Marso
Yes, that's more like it.  (If I post something unwieldy enough, it always provokes David to post a much better solution.)  ;-)



Re: different values in a row

David Marso
Administrator
ROFL!!! I sometimes provoke myself as well ;-)
Case in point:
Use VARSTOCASES to go from wide to long, which eliminates the external file (for which I omitted the ERASE command in my previous solution ;-). I should maybe call all my temp files for posted code "C:\dmmtemp" and every once in a while post random code which has ERASE "C:\dmmtemp" somewhere in the mix ;-)))
I.e., replace the VECTOR/LOOP/XSAVE business with an appropriate VARSTOCASES.
Use MODE=ADDVARIABLES in the AGGREGATE(s) to remove the need for the MATCH.
I leave this as an exercise for those who wish to pursue it.