Identifying values across records


pkcdust
This is my first question here, so I apologize for any simplicity within it.

I'm looking for a way to mark cases that share a value on one variable when at least one of those cases has a certain value on another variable. (I probably can't find my answer because I can't construct my question properly.)

For example:

X           Y            Z
123       Yes         1
123                     1
123                     1
111          
145       Yes         1
145                     1

I want to create a variable Z that equals 1 for every case whose X value matches that of a case with 'Yes' in Y.

Re: Identifying values across records

David Marso
Administrator
Why do I get the feeling that this is weird and there will be other wrinkles popping out of the edges of my Friday?
--

SORT CASES BY X (A) Y (D).
IF Y='Yes' Z=1.
IF X=LAG(X) Y=LAG(Y).
IF X=LAG(X) Z=LAG(Z).


For the peanut gallery.
I had considered the following but it is 2 lines more '-)
You all know how I enjoy concise.

SORT CASES BY X (A) Y (D).
IF Y='Yes' Z=1.
DO IF ( X=LAG(X)).
COMPUTE Y=LAG(Y).
COMPUTE Z=LAG(Z).
END IF.
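
For anyone who wants to try the sort-and-LAG idea above on toy data, here is a minimal sketch. The data values are assumptions modelled on the example in the question, with 'No' standing in for the blank Y entries, and it only carries Z forward (Y is left untouched, which is a simplification of the posted syntax). It relies on 'Yes' sorting ahead of every other Y value within each X when Y is sorted in descending order.

```spss
* Hypothetical demo data, not the poster's real file.
DATA LIST FREE / X (F3.0) Y (A3).
BEGIN DATA
123 No
123 Yes
123 No
111 No
145 No
145 Yes
END DATA.

* Put any 'Yes' case first within each X group.
SORT CASES BY X (A) Y (D).
* Flag the 'Yes' cases themselves.
IF (Y EQ 'Yes') Z = 1.
* Carry the flag down to the remaining cases of the same X group.
IF (X EQ LAG(X)) Z = LAG(Z).
LIST.
```

With these six cases, Z should come out as 1 for every 123 and 145 case and stay system-missing for 111.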

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Identifying values across records

Jignesh Sutar
In reply to this post by pkcdust
You should review the AGGREGATE command (http://www-01.ibm.com/support/knowledgecenter/SSLVMB_21.0.0/com.ibm.spss.statistics.help/syn_aggregate_overview.htm) if you are not familiar with it, to understand how this works. Below is a demonstration that you can run as syntax at the start of a new SPSS session. If you have any questions, feel free to ask.
 
DATA LIST FREE / X (F3.0) Y(A3).
BEGIN DATA 
123 Yes
123 No
123 No
111 No
145 Yes
145 No
END DATA.

COMPUTE @Y=Y="Yes".
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=X /Z=MAX(@Y).
ADD FILES FILE=* /DROP=@Y.
EXE.

(David, I've taken up your prefixing of temporary variable names with @ symbol convention, :-D!)


On 17 April 2015 at 19:51, pkcdust <[hidden email]> wrote:
This is my first question here, so I apologize for any simplicity within it.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Identifying-values-across-records-tp5729249.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Identifying values across records

David Marso
Administrator
I'd say just go for the jugular and assume Z does not already exist.
OVERWRITE=YES allows you to omit the ADD FILES.
In any case, I believe it good practice to use variable names which are unlikely to exist in the data file.
My convention is prefix with @.  In this case it is simpler to just go for it ;-) and eliminate the mop up.

IF Y='Yes' Z=1.
AGGREGATE OUTFILE * MODE=ADDVARIABLES OVERWRITE=YES / BREAK X /Z=MAX(Z).

Re: Identifying values across records

Jignesh Sutar
Any reason why you'd favour IF over COMPUTE?
On Fri, 17 Apr 2015 at 20:59, David Marso <[hidden email]> wrote:
I'd say just go for the jugular and assume Z does not already exist.

Re: Identifying values across records

David Marso
Administrator
No particular reason aside from it being a bit more transparent to less experienced users.
OTOH:  I would likely in retrospect concede to Art's 'soapboxing' and use EQ for the comparison as distinct from assignment.
IF (Y EQ 'Yes') Z=1.
alternatively
COMPUTE Z=(Y EQ 'Yes').

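OTOH, the Boolean assignment can bite you (and sometimes the bug is elusive). In the first version below, every pass through DO REPEAT overwrites flag, so only the comparison with var100 survives; the IF version sets flag to 1 on a match and never resets it.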
DO REPEAT var=var1 TO var100.
COMPUTE flag = var EQ value.
END REPEAT.

vs
DO REPEAT var=var1 TO var100.
IF ( var EQ value) flag = 1.
END REPEAT.

I prefer to do this in a loop with an escape anyhow (so the Boolean can't bite in this case, and you bail once you have located the condition of interest).
VECTOR vars=var1 TO var100.
LOOP #=1 TO 100.
COMPUTE flag = (vars(#) EQ value).
END LOOP IF flag.
vs
VECTOR vars=var1 TO var100.
LOOP #=1 TO 100.
IF  (vars(#) EQ value) flag =1.
END LOOP IF flag.
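
To see the Boolean-assignment pitfall on something small, here is a hedged sketch with three variables instead of one hundred. The data, the target value 5 and the names flag_bool and flag_if are all made up for illustration.

```spss
* Hypothetical three-variable data set.
DATA LIST FREE / var1 var2 var3.
BEGIN DATA
5 0 0
0 0 5
0 0 0
END DATA.

* Boolean assignment is recomputed for each variable, so only var3 decides the result.
DO REPEAT v = var1 TO var3.
COMPUTE flag_bool = (v EQ 5).
END REPEAT.

* Conditional assignment starts at 0 and is switched on by any match.
COMPUTE flag_if = 0.
DO REPEAT v = var1 TO var3.
IF (v EQ 5) flag_if = 1.
END REPEAT.

LIST.
```

The first case is the telling one: its match sits in var1, so flag_bool ends up 0 while flag_if is 1. The second case gets 1 on both flags and the third gets 0 on both.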


Re: Identifying values across records

John F Hall
What's wrong with using COUNT

COUNT <newvar> = <var1> (<value1>) <var2> (<value2>).
FREQ  <newvar>.

Should yield 0, 1 or 2.  The 2s are your marker.
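
As a concrete but hypothetical instance of the template above: the variable names yflag and hits, and the numeric 1/0 coding of the Yes/No answer, are assumptions made purely for illustration. Keep in mind that COUNT tallies matches within each case, so hits describes each record on its own rather than carrying a value across records that share X.

```spss
* Hypothetical data with the Yes or No answer coded as 1 or 0.
DATA LIST FREE / X (F3.0) yflag (F1.0).
BEGIN DATA
123 1
123 0
123 0
111 0
145 1
145 0
END DATA.

* One point for X equal to 123 and one point for yflag equal to 1.
COUNT hits = X (123) yflag (1).
FREQUENCIES VARIABLES = hits.
```

Here only the first case gets hits = 2; the other two 123 cases get 1, as does the 145 case with yflag = 1, and the rest get 0.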

John F Hall (Mr)
[Retired academic survey researcher]

Email:   [hidden email]  
Website: www.surveyresearch.weebly.com
SPSS start page:  www.surveyresearch.weebly.com/1-survey-analysis-workshop





-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
David Marso
Sent: 17 April 2015 23:02
To: [hidden email]
Subject: Re: Identifying values across records

No particular reason aside from it being a bit more transparent to less
experienced users.

Re: Identifying values across records

Kirill Orlov
In reply to this post by David Marso
David, why are you so fond of double controls? Is there any hidden secret esoterism in the guru head that you might agree to reveal?
VECTOR vars=var1 TO var100.
LOOP #=1 TO 100.
IF  (vars(#) EQ value) flag =1.
END LOOP IF flag.
Why not use BREAK under DO IF here?
VECTOR vars=var1 TO var100.
LOOP #=1 TO 100.
DO IF  (vars(#) EQ value).
COMP flag =1.
BREAK.
END IF.
END LOOP.
Sometimes (it depends on the data structure) a version without IF-interruption will be faster:

COMPUTE flag= 0.
DO REPEAT var=var1 TO var100.
COMPUTE flag = flag or (var EQ value).
END REPEAT.

Re: Identifying values across records

David Marso
Administrator
Aesthetics (4 lines vs 7) and Intuitiveness?
Note the DO REPEAT will search all 100 variables even if the flag trigger is already located in the 1st variable.  How can that possibly be faster than the broken loop?

Re: Identifying values across records

Kirill Orlov
I'm for shorter ("aesthetic") code too. Unfortunately, shorter code is not always faster code.

>Note the DO REPEAT will search all 100 variables
On the other hand, checking the IF condition on every cycle takes time.
That is why I said that sometimes blunt looping to the end may be faster than performing that additional check at every step.
I don't think you expect the search value to always be located in the first few variables.

On 18.04.2015 23:34, David Marso wrote:
Aesthetics (4 lines vs 7) and Intuitiveness?

Re: Identifying values across records

David Marso
Administrator
I guess this is an empirical question.
Please demonstrate any instance in which the exhaustive search is faster than the immediate abort.
I'm certain that this conditional test takes a tiny fraction of the time spent needlessly continuing after the answer has already been located.

Re: Identifying values across records

Kirill Orlov
OK. Here is an example, David.

Generate data.

SET RNG=MC SEED=574352.
matrix.
*comp divers= 100. /*With low diversity of values
*comp divers= 100000. /*Or with high diversity of values
comp vars= rnd(uniform(100000,1000)*divers).
save vars /out= * /vari= var1 to var1000.
end matrix.

*******************.
*"Do repeat with OR" syntax.

cache.
VECTOR vars= var1 TO var1000.
COMPUTE flag= 0.
DO REPEAT var= var1 TO var1000.
COMPUTE flag= flag or (var EQ 10).
END REPEAT.
freq flag.
delete var flag.

******************.
*"Loop with IF" syntax.

cache.
VECTOR vars= var1 TO var1000.
COMPUTE flag= 0.
LOOP #= 1 TO 1000.
DO IF (vars(#) EQ 10).
COMP flag= 1.
BREAK.
END IF.
END LOOP.
freq flag.
delete var flag.


I included the VECTOR command in both pieces to keep all other things equal.

With the divers=100 data, the LOOP syntax is faster.
With the divers=100000 data, the DO REPEAT syntax becomes faster.


On 19.04.2015 17:35, David Marso wrote:
I guess this is an empirical question.

Re: Identifying values across records

Art Kendall
In reply to this post by David Marso
It would be interesting to see whether it takes more CPU time to do the process with (a) fewer machine instructions per case but an exhaustive search or (b) a loop with an escape command.

Perhaps modify the syntax below to answer:
How many cases does it take to make a detectable difference between the times to use each of (1) a loop with an escape specification, (2) a loop that searches exhaustively, and (3) a DO REPEAT?

vs.
How many cases does the data set have to have to make a meaningful difference, e.g., 10 seconds?
* cases and items.
new file.
input program.
   vector x (100,f3).
   loop id = 1 to 10000.
      loop #p = 1 to 100.
         compute x(#p) = rnd(rv.normal(50,10)).
      end loop.
      end case.
   end loop.
   end file.
end input program.
DO IF $CASENUM=1.
   PRINT /"'start generation'"   $time (time20.3).
END IF.
execute.
DO IF $CASENUM=1.
   PRINT /"'start descriptives 1'"   $time (time20.3).
END IF.
descriptives variables = x1 to x10 /statistics=all.
DO IF $CASENUM=1.
   PRINT /"'start descriptives 2'"   $time (time20.3).
END IF.
descriptives variables = x1 to x10 /statistics=all.
Art Kendall
Social Research Consultants

Re: Identifying values across records

Jon K Peck
In reply to this post by Kirill Orlov
Benchmarking is tricky.  I used the STATS BENCHMRK extension command to test the alternative syntax approaches.  This extension command runs each job multiple times and interleaves execution of the two versions in order to minimize environmental effects.  The command can record various measures of times, memory usage, including paging, and i/o for each Statistics process in the session.  In this case I measured only the total time for the spssengine process, ignoring the stats process, since spssengine time is the only interesting measure here.

I factored out the data generation and followed it with an execute in order to isolate the differences to the transformation alternatives.  I  would not expect the results to be affected by generating a different dataset for each comparison round.  I ran 5 repetitions of each version.

In the results below, group 1 is the do repeat syntax, and group 2 is the loop.  The t test results are below.

For Kirill's first case with divers = 100, loop is much faster, and the difference is highly significant.

For Kirill's second case, with divers=100,000, do repeat is significantly faster, but the difference is smaller.

As you can see, the do repeat time does not vary much between the two tests, but the loop time goes up a lot in the second scenario.  That makes intuitive sense, since the breakout from the loop will generally occur later in the second scenario.


divers=100

Group Statistics
        Cmdset0   N    Mean      Std. Deviation   Std. Error Mean
time    1         5    7.0148    .04042           .01808
        2         5    2.9438    .03176           .01420


Independent Samples Test (time)
                                              Equal variances   Equal variances
                                              assumed           not assumed
Levene's Test for Equality of Variances
  F                                           .903
  Sig.                                        .370
t-test for Equality of Means
  t                                           177.087           177.087
  df                                          8                 7.576
  Sig. (2-tailed)                             .000              .000
  Mean Difference                             4.07100           4.07100
  Std. Error Difference                       .02299            .02299
  95% Confidence Interval of the Difference
    Lower                                     4.01799           4.01747
    Upper                                     4.12401           4.12453


divers=100,000

Group Statistics
        Cmdset0   N    Mean      Std. Deviation   Std. Error Mean
time    1         5    7.4096    .11962           .05350
        2         5    9.9328    .01743           .00779


Independent Samples Test (time)
                                              Equal variances   Equal variances
                                              assumed           not assumed
Levene's Test for Equality of Variances
  F                                           4.471
  Sig.                                        .067
t-test for Equality of Means
  t                                           -46.673           -46.673
  df                                          8                 4.170
  Sig. (2-tailed)                             .000              .000
  Mean Difference                             -2.52320          -2.52320
  Std. Error Difference                       .05406            .05406
  95% Confidence Interval of the Difference
    Lower                                     -2.64787          -2.67092
    Upper                                     -2.39853          -2.37548


The STATS BENCHMRK extension command is available from the SPSS Community website or, with V22 or 23, it can be installed from the Utilities menu.  However, it requires the free Python Extensions for Windows, which you can find by Googling.  That may require a registered Python in order to install.   The command only works on Windows, since it uses system measures that are Windows specific.




Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Identifying values across records

Art Kendall
Am I correct in reading the means as being in seconds:
7 vs. 3 seconds and
7.4 vs. 9.9 seconds?
Art Kendall
Social Research Consultants

Re: Identifying values across records

Jon K Peck
Yes, but the ratios are more interesting, since the absolute times will obviously depend on the dataset size.  For modest dataset sizes as here, of course, the differences are trivial.
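
As a quick illustration of those ratios, here is a small Python sketch that computes them from the group means reported in the t test output. The dictionary layout and the helper name are just for this sketch.

```python
# Mean spssengine times in seconds, copied from the Group Statistics tables.
means = {
    "divers=100": {"do_repeat": 7.0148, "loop": 2.9438},
    "divers=100,000": {"do_repeat": 7.4096, "loop": 9.9328},
}


def ratio(times):
    """DO REPEAT time divided by LOOP time for one scenario."""
    return times["do_repeat"] / times["loop"]


for scenario, times in means.items():
    print(f"{scenario}: DO REPEAT / LOOP = {ratio(times):.2f}")
```

A ratio above 1 means the loop was faster in that scenario; a ratio below 1 means do repeat was faster.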


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Identifying values across records

Richard Ristow
In reply to this post by Jon K Peck
At 02:25 PM 4/19/2015, Jon K Peck wrote:

>I used the STATS BENCHMRK extension command to test the alternative
>syntax approaches.

>For Kirill's first case with divers = 100, loop is much faster, and
>the difference is highly significant.

That makes sense. In both cases, the loop terminates when value 10 is
found. With divers=100, values are 10 with probability .01, and the
mean number of loop passes to reach one is 100, much less than the
1,000 passes needed for a full search.

>for Kirill's second case, with divers=100,000, do repeat is
>significantly faster, but the difference is smaller.

In this case, values are 10 with probability 1E-5, and in about 99% of
the cases, the LOOP will check all 1,000 values.
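
The arithmetic behind these figures can be sketched in a few lines of Python. This assumes, as above, that each of the 1,000 values independently equals 10 with probability 1/divers; the function names are made up for this sketch.

```python
def expected_passes(p, n=1000):
    """Mean number of values an early-exit loop inspects when each of the
    n values independently matches with probability p.
    This is the sum of (1 - p)**k for k = 0 .. n-1."""
    return (1 - (1 - p) ** n) / p


def prob_full_scan(p, n=1000):
    """Probability that none of the n values matches, so the loop
    runs through all n values without breaking out."""
    return (1 - p) ** n


for divers, p in [(100, 0.01), (100000, 1e-5)]:
    print(divers, round(expected_passes(p), 1), round(prob_full_scan(p), 4))
```

Under that assumption the early-exit loop inspects about 100 values per case when divers=100 and about 995 when divers=100,000, so in the second scenario the BREAK test is almost pure overhead.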


Re: Identifying values across records

pkcdust
In reply to this post by David Marso
Thank you so much for all of the great code/help with this! It's worked great so far.

I've tried to introduce multiple variables into this matching equation now but without success.

If I want to match on more than 1 variable

W         X           Y            
12        123       Yes      
12        123                      
13        123                      
12        111          
14        145       Yes          
14        145                      

And create Z that identifies W & X having the same values as those in a row that has Y='yes'

row     W         X           Y            Z
1      12        123       Yes         1
2      12        123                     1
3      13        123                      
4      12        111          
5      14        145       Yes         1
6      14        145                     1


Re: Identifying values across records

David Marso
Administrator
"I've tried to introduce multiple variables into this matching equation now but without success."
OK, what exactly have you tried?
Nudge towards cliff--
Have you bothered to read up on LAG and AGGREGATE?
That is what the original code involved.
Feel free to post back with your 'try' and people will help you sort it.
--

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Identifying values across records

MLIves
In reply to this post by pkcdust
It is possible to match on more than one variable.
(e.g. Match file file=File1/File=File2/by Var1 Var2.  This requires that both files HAVE Var1 and Var2 and are sorted by Var1 Var2. I do this quite frequently and have not had a problem.)

HOWEVER, you aren't really trying to match files since you have only one file.
Version 22 or higher can do something like this. (UNTESTED).
* CIN counts cases between a lower and an upper value, so "Yes" is given as both bounds.
Aggregate outfile=* mode=addvariables overwrite=yes
  /break=w x
  /z=cin(y,"Yes","Yes").
Z will be the number of Yes values for cases with the same W and X.

Or (also UNTESTED) something like this.
Compute z=(y="Yes").
Sort cases by w x (a) y (d).
If ( y="" and lag(w)=w and lag(x)=x) z=lag(z).
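
To check what the result should look like on the example data, here is a minimal Python sketch of the same grouping idea. It is an illustration only, not a translation of either SPSS snippet; the function name is made up, and the rows are the six example rows from the question.

```python
rows = [
    {"w": 12, "x": 123, "y": "Yes"},
    {"w": 12, "x": 123, "y": ""},
    {"w": 13, "x": 123, "y": ""},
    {"w": 12, "x": 111, "y": ""},
    {"w": 14, "x": 145, "y": "Yes"},
    {"w": 14, "x": 145, "y": ""},
]


def flag_groups(rows):
    """Return Z for each row: 1 if any row with the same (w, x) has y == 'Yes',
    otherwise None (left missing, as in the desired output)."""
    yes_keys = {(r["w"], r["x"]) for r in rows if r["y"] == "Yes"}
    return [1 if (r["w"], r["x"]) in yes_keys else None for r in rows]


print(flag_groups(rows))
```

Rows 1, 2, 5, and 6 get Z=1 and rows 3 and 4 stay missing. The SPSS snippets would be expected to give 0 rather than missing for the non-matching rows.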

Melissa
