SPSSX Discussion

Counting Strings

Matthew Pirritano

Counting Strings

Good evening list,
I have string data collected at multiple time points. Some participants have data at only one or a few of these time points, and many are missing all data. I am trying to select a random sample of people who have ANY data. Is there some way to create a new variable that returns 1 when there is data in any of the string variables and 0 when there is none? I know that you cannot use count for strings.
Any help is appreciated, thanks,
Matt
Matthew Pirritano, Ph.D.
Email: [hidden email]

----- Original Message ----
From: Richard Ristow <[hidden email]>
To: [hidden email]
Sent: Monday, May 12, 2008 6:35:12 AM
Subject: Re: seeking help with matching two files

At 10:27 PM 5/11/2008, Thara Vardhan wrote:

>I have two files -
>
>1. Person victims - 3818 records
>2. Organization victims - 1063
>
>Both of them have duplicate records on the IncidentRefNum because
>some incidents can have more than one person victim; the person
>victims are identified by PersonCNI variable. Likewise the
>organization victims file also has incident ref number with two
>victims identified by PartyCNI variable.
>
>I tried to merge them using the data/merge files/add variables function.
>The key variable is the Increfnum:
>
>DATASET ACTIVATE DataSet14.
>MATCH FILES /FILE=*
> /FILE='DataSet13'
> /RENAME (EventRefNum PrimaryFirst UniqueParties = d0 d1 d2)
> /BY IncidentRefNum
> /DROP= d0 d1 d2.
>
>I get the following warnings
>File #1
> KEY: 26669895
>
>>Warning # 5132
>>Duplicate key in a file. The BY variables do not uniquely identify each
>>case on the indicated file. Please check the results carefully.

In this logic, you're trying to associate each person victim with an
organization victim, and vice versa, matching by Increfnum. Is that
what you want? Might you be wanting to use the "add cases" function
instead, to get all records you currently have combined into a single file?

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Albert-Jan Roskam

Re: Counting Strings

Hi Matthew,

This could perhaps be done in a more compact way, but
it works. T1 to T4 are the time point vars (string).

vector #l (4).
do repeat #in = t1 to t4 / #out = #l1 to #l4.
+ compute #out = length(rtrim(#in)).
end repeat print.
compute filter = sum (#l1 to #l4) > 0.
exe.

Cheers!!
Albert-Jan
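
A more compact variant of the same idea might look like the sketch below. It assumes the four string variables are named t1 to t4 as in the reply above; the name has_data and the 10% sampling fraction are hypothetical, chosen only for illustration. It concatenates the strings, trims trailing blanks, and flags the case when anything is left.

```spss
* Sketch only: flag cases where any of t1..t4 is non-blank.
* has_data is a hypothetical variable name.
compute has_data = (length(rtrim(concat(t1, t2, t3, t4))) > 0).

* Keep only flagged cases, then draw a random sample.
* The .10 fraction is an arbitrary placeholder.
select if (has_data = 1).
sample .10.
execute.
```

Note that SELECT IF and SAMPLE as written drop cases from the active dataset, so this is best tried on a copy of the data.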

Matthew Pirritano

Re: Counting Strings

Thanks a ton Albert-Jan,
Couple questions about the syntax. It works. I tested it out, but call me silly -- I want to understand what's going on.
I understand everything up to the print command. What does the print command do? Here's what it printed for me with a simple test data set that looks like this:

id string1 string2 string3 filter
1 abc abc abc 1
5 abc 1
6 abc 1
3 abc 1
7 abc 1
2 0
1 0
2 abc abc 1
3 0
4 abc abc 1
5 0
The syntax was:
vector #l(3).
do repeat #in = string1 to string3 / #out = #l1 to #l3.
+ compute #out = length(rtrim(#in)).
end repeat print.
compute filter = sum (#l1 to #l3) > 0.
exe.
And my output was:
vector #l(3).
do repeat #in = string1 to string3 / #out = #l1 to #l3.
+ compute #out = length(rtrim(#in)).
end repeat print.
19 0 +compute #l1 = length(rtrim(string1))
20 0 +compute #l2 = length(rtrim(string2))
21 0 +compute #l3 = length(rtrim(string3))
compute filter = sum (#l1 to #l3) > 0.
exe.
So what exactly is the information after the print statement telling me? What do the 19, 20, and 21 mean?
And lastly, the compute filter statement seems to be doing something I didn't expect. It behaves as if I had told it to code a sum of lengths greater than 0 as 1, yet I don't see that in the syntax. By including > 0 in the compute statement, are you combining it with some sort of 'if' statement? I've never seen this before and I'm interested.
Hope that isn't too much info. I just want to learn. Maybe this is in a text that you could refer me to.
Again. All of my thanks!
Matt
Matthew Pirritano, Ph.D.
Email: [hidden email]

Albert-Jan Roskam

Re: Counting Strings

Hi Matthew,

You're welcome! The print keyword can be omitted; all
it does is show the commands that the DO REPEAT
statement generates in the background. I think
19, 20, 21 indicate the line numbers (not sure).

Another way of saying:
compute filter = sum (#l1 to #l3) > 0.
Is:
compute filter = 0.
if (sum (#l1 to #l3) > 0) filter = 1.

You're right, the latter syntax is more intuitive, but
also a lil' longer. =)

Cheers!!
Albert-Jan
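
The equivalence described above can be checked on a tiny made-up dataset. This is a minimal sketch: the variables x and y, the two data rows, and the names flag_a and flag_b are all hypothetical. It relies on the fact that a logical expression in COMPUTE evaluates to 1 when true and 0 when false.

```spss
* Sketch only: two ways of building the same 0/1 flag.
data list free / x y.
begin data
1 2
0 0
end data.

* Form 1: assign the result of the logical expression directly.
compute flag_a = (sum(x, y) > 0).

* Form 2: initialise to 0, then set to 1 where the condition holds.
compute flag_b = 0.
if (sum(x, y) > 0) flag_b = 1.

list.
```

Both flags should be 1 for the first case and 0 for the second. One difference worth keeping in mind: if the logical expression itself evaluates to missing, the first form should give a system-missing flag while the second leaves it at 0. That does not arise in the thread's syntax, since the summed lengths are never missing.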
