SPSSX Discussion

SUBSTR into new variable

Classic

List

Threaded

9 messages Options

Lacay, Phebe

SUBSTR into new variable

Hi All,

I am looking to do some type of SUBSTR to get the following results for
the data in the variable 'Race'. Race has multiple items separated by a
comma. I would like to see each item in race separated into the
appropriate variables (r1 to r5).

Race --> r1 r2 r3 r4 r5
1,2,3 1 1 1 0 0
1,5, 1 0 0 0 1
2,4 0 1 0 1 0

Any help would be much appreciated.

Thank you,

Phebe Lacay
Assistant Director of Institutional Research
Raritan Valley Community College
P.O. Box 3300
Somerville, NJ 08876
908.526.1200 x 8355
908.429.0034 (fax)
[hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bruce Weaver

Re: SUBSTR into new variable

Administrator

Lacay, Phebe wrote

Hi All,

I am looking to do some type of SUBSTR to get the following results for
the data in the variable 'Race'. Race has multiple items separated by a
comma. I would like to see each item in race separated into the
appropriate variables (r1 to r5).

Race --> r1 r2 r3 r4 r5
1,2,3 1 1 1 0 0
1,5, 1 0 0 0 1
2,4 0 1 0 1 0

Any help would be much appreciated.

Thank you,

compute r1 = char.index(race,"1") GT 0.
compute r2 = char.index(race,"2") GT 0.
compute r3 = char.index(race,"3") GT 0.
compute r4 = char.index(race,"4") GT 0.
compute r5 = char.index(race,"5") GT 0.
format r1 to r5 (f1.0).
exe.

If there is a large number of possible race numbers, you could stick this into a macro and loop through the values. I expect someone will also propose a Python solution.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Jon K Peck

Re: SUBSTR into new variable

Well, Bruce, since you asked, here is a general and robust programmability solution. It has the advantage that it will work for any number of values, and the further advantage that it is easy to test. With a long
batch of separate compute statements, it would be easy for one to be messed up.

--- First, some data:
data list fixed/s(a10).
begin data
1,2,3
1,3,5
1,2,3,4,5
end data.
list.
exec.

--- Now define a tiny Python function named splitit that evaluates a variable and returns a list of true/false values. The logic is the same for every value, so it is easy to test: if one is wrong, they are all wrong.
The function will fail loudly if the string after comma-based splitting has any nonnumeric values

begin program.
def splitit(x, limit):
items = [int(i) for i in x.split(",")] # break out the comma-separated fields into integers
return [i in items for i in range(1, limit+1)] # test for presence of each integer value
end program.

--- Now run the extension command that calls the formula for each case and creates new numeric variables, d1, d2, d3, d4, d5. The new variables are numeric. It takes the input variable, s, and the upper end of the range of values, 5.

spssinc trans result=d1 to d5
/formula "splitit(s,5)".

Unlike the typical use of TO in Statistics, variables d1 to d5 can already exist or be created new by this command, and, if they exist, they do not need to be in sequential order in the dataset.

Regards,
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435

From:	Bruce Weaver <[hidden email]>
To:	[hidden email]
Date:	07/15/2010 09:52 AM
Subject:	Re: [SPSSX-L] SUBSTR into new variable
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

Lacay, Phebe wrote: > > Hi All, > > I am looking to do some type of SUBSTR to get the following results for > the data in the variable 'Race'. Race has multiple items separated by a > comma. I would like to see each item in race separated into the > appropriate variables (r1 to r5). > > Race --> r1 r2 r3 r4 r5 > 1,2,3 1 1 1 0 0 > 1,5, 1 0 0 0 1 > 2,4 0 1 0 1 0 > > Any help would be much appreciated. > > Thank you, > > compute r1 = char.index(race,"1") GT 0. compute r2 = char.index(race,"2") GT 0. compute r3 = char.index(race,"3") GT 0. compute r4 = char.index(race,"4") GT 0. compute r5 = char.index(race,"5") GT 0. format r1 to r5 (f1.0). exe. If there is a large number of possible race numbers, you could stick this into a macro and loop through the values. I expect someone will also propose a Python solution. ----- -- Bruce Weaver [hidden email]http://sites.google.com/a/lakeheadu.ca/bweaver/"When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- View this message in context:http://spssx-discussion.1045642.n5.nabble.com/SUBSTR-into-new-variable-tp1223571p1223579.htmlSent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Bruce Weaver

Re: SUBSTR into new variable

Administrator

Jon K Peck wrote

Well, Bruce, since you asked, here is a general and robust programmability
solution. It has the advantage that it will work for any number of
values, and the further advantage that it is easy to test. With a long
batch of separate compute statements, it would be easy for one to be
messed up.

--- First, some data:
data list fixed/s(a10).
begin data
1,2,3
1,3,5
1,2,3,4,5
end data.
list.
exec.

--- Now define a tiny Python function named splitit that evaluates a
variable and returns a list of true/false values. The logic is the same
for every value, so it is easy to test: if one is wrong, they are all
wrong.
The function will fail loudly if the string after comma-based splitting
has any nonnumeric values

begin program.
def splitit(x, limit):
items = [int(i) for i in x.split(",")] # break out the comma-separated
fields into integers
return [i in items for i in range(1, limit+1)] # test for presence of
each integer value
end program.

--- Now run the extension command that calls the formula for each case and
creates new numeric variables, d1, d2, d3, d4, d5. The new variables are
numeric. It takes the input variable, s, and the upper end of the range
of values, 5.

spssinc trans result=d1 to d5
/formula "splitit(s,5)".

Unlike the typical use of TO in Statistics, variables d1 to d5 can already
exist or be created new by this command, and, if they exist, they do not
need to be in sequential order in the dataset.

Jon's solution highlights a problem with my list of COMPUTE statements: It would run into trouble with two-digit (or more) numbers, treating the "1" in "10" as if it was simply a "1", for example. So here is a more general BP (Before Python) solution. One slight drawback (compared to Jon's method) is that it does require one to know the maximum race code at the outset.

data list list / race (a10).
begin data
"1,2,3"
"1,5,11"
"2,4,10"
"3,8,10,11"
end data.

define !splitit ( max !tokens(1))
string race2 (a10).
compute race2 = race. /* A copy of RACE .
numeric r1 to !concat("r",!max) (f1.0).
!do !i = !max !to 1 !by -1
!let !r = !concat("r",!i)
compute !r = char.index(race2,!quote(!i)) GT 0.
* Once a value has been processed, remove it from RACE2.
* This prevents the "1" in "10" from being counted as a "1", for example.
compute race2 = replace(race2,!quote(!i),"").
format !r (f1.0).
!doend
exe.
delete var race2. /* Delete the copy of RACE .
!enddefine.

!splitit max=11.
list.

OUTPUT:

race r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11

1,2,3 1 1 1 0 0 0 0 0 0 0 0
1,5,11 1 0 0 0 1 0 0 0 0 0 1
2,4,10 0 1 0 1 0 0 0 0 0 1 0
3,8,10,11 0 0 1 0 0 0 0 1 0 1 1

Lacay, Phebe

Re: SUBSTR into new variable

In reply to this post by Lacay, Phebe

Thank you Sonia for your response. I was hoping for something
simpler...maybe using an IfAny( or using the comma as an INDEX..??
anyone else have any suggestions?
Thanks,
Phebe

-----Original Message-----
From: Lacay, Phebe
Sent: Thursday, July 15, 2010 10:42 AM
To: [hidden email]
Subject: SUBSTR into new variable

Hi All,

I am looking to do some type of SUBSTR to get the following results for
the data in the variable 'Race'. Race has multiple items separated by a
comma. I would like to see each item in race separated into the
appropriate variables (r1 to r5).

Race --> r1 r2 r3 r4 r5
1,2,3 1 1 1 0 0
1,5, 1 0 0 0 1
2,4 0 1 0 1 0

Any help would be much appreciated.

Thank you,

Phebe Lacay
Assistant Director of Institutional Research Raritan Valley Community
College P.O. Box 3300 Somerville, NJ 08876
908.526.1200 x 8355
908.429.0034 (fax)
[hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Albert-Jan Roskam

Re: SUBSTR into new variable

In reply to this post by Lacay, Phebe

Hi,

In case you have an old Spss version, such as yours truly:

data list free / Race (a8)    r1      r2      r3      r4      r5.
begin data.
"1,2,3"     1       1       1       0       0
"1,5"       1       0       0       0       1
"2,4"        0       1       0       1       0
end data.

begin program.
import spss, csv
cursor = spss.Cursor([0]) # 'race' is located in the first (zeroeth) column.
data = cursor.fetchall()
cursor.close()
positions = [case[0].strip().split(",") for case in data]
maxPos = int(max([max(pos) for pos in positions]))
computes = ["compute rr%s = index(race, '%s') > 0." % (i + 1, i + 1) for i in range(maxPos)]
spss.Submit(computes)
end program.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- On Mon, 7/19/10, Lacay, Phebe <[hidden email]> wrote:

From: Lacay, Phebe <[hidden email]>
Subject: Re: [SPSSX-L] SUBSTR into new variable
To: [hidden email]
Date: Monday, July 19, 2010, 9:00 PM

Thank you Sonia for your response. I was hoping for something
simpler...maybe using an IfAny( or using the comma as an INDEX..??
anyone else have any suggestions?
Thanks,
Phebe

-----Original Message-----
From: Lacay, Phebe
Sent: Thursday, July 15, 2010 10:42 AM
To: SPSSX-L@...
Subject: SUBSTR into new variable

Hi All,

I am looking to do some type of SUBSTR to get the following results for
the data in the variable 'Race'. Race has multiple items separated by a
comma. I would like to see each item in race separated into the
appropriate variables (r1 to r5).

Race -->    r1 r2 r3 r4 r5
1,2,3    1    1    1    0    0
1,5, 1    0    0    0    1
2,4    0    1    0    1    0

Any help would be much appreciated.

Thank you,

Phebe Lacay
Assistant Director of Institutional Research Raritan Valley Community
College P.O. Box 3300 Somerville, NJ 08876
908.526.1200 x 8355
908.429.0034 (fax)
placay@...

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@... (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Rick Oliver-3

Re: SUBSTR into new variable

If all you have is values 1 through 5, separated by commas, all you need is five COMPUTE commands:

compute r1=char.index(race, "1")>0.
compute r2=char.index(race, "2")>0.
etc.

If you have an older version of SPSS, use index instead of char.index.

You can't use DO REPEAT in this case to shorten the syntax because of the quoted values.

You could use DO REPEAT in a Python program block, but it ends up being more lines of code than 5 COMPUTE commands.

From:	Albert-Jan Roskam <[hidden email]>
To:	[hidden email]
Date:	07/20/2010 01:06 PM
Subject:	Re: SUBSTR into new variable
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

Hi,

In case you have an old Spss version, such as yours truly:

data list free / Race (a8) r1 r2 r3 r4 r5.
begin data.
"1,2,3" 1 1 1 0 0
"1,5" 1 0 0 0 1
"2,4" 0 1 0 1 0
end data.

begin program.
import spss, csv
cursor = spss.Cursor([0]) # 'race' is located in the first (zeroeth) column.
data = cursor.fetchall()
cursor.close()
positions = [case[0].strip().split(",") for case in data]
maxPos = int(max([max(pos) for pos in positions]))
computes = ["compute rr%s = index(race, '%s') > 0." % (i + 1, i + 1) for i in range(maxPos)]
spss.Submit(computes)
end program.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- On Mon, 7/19/10, Lacay, Phebe <[hidden email]> wrote:

From: Lacay, Phebe <[hidden email]>
Subject: Re: [SPSSX-L] SUBSTR into new variable
To: [hidden email]
Date: Monday, July 19, 2010, 9:00 PM

Thank you Sonia for your response. I was hoping for something
simpler...maybe using an IfAny( or using the comma as an INDEX..??
anyone else have any suggestions?
Thanks,
Phebe

-----Original Message-----
From: Lacay, Phebe
Sent: Thursday, July 15, 2010 10:42 AM
To: SPSSX-L@...
Subject: SUBSTR into new variable

Hi All,

I am looking to do some type of SUBSTR to get the following results for
the data in the variable 'Race'. Race has multiple items separated by a
comma. I would like to see each item in race separated into the
appropriate variables (r1 to r5).

Race --> r1 r2 r3 r4 r5
1,2,3 1 1 1 0 0
1,5, 1 0 0 0 1
2,4 0 1 0 1 0

Any help would be much appreciated.

Thank you,

Phebe Lacay
Assistant Director of Institutional Research Raritan Valley Community
College P.O. Box 3300 Somerville, NJ 08876
908.526.1200 x 8355
908.429.0034 (fax)
placay@...

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@... (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Ruben Geert van den Berg

Re: SUBSTR into new variable

Hm, do repeat seems to work just fine here:

data list free/race(a10).

begin data

'1,2,3'

'1,5'

'2,4'

end data.

do repeat r=r_1 to r_5/v=1 to 5.

compute r=index(race,str(v,f1))>0.

end repeat.

execute.

Best,

Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: [hidden email]
Mobiel: +31 6 24641435
Telefoon: +31 20 522 5738
Internet: www.tns-nipo.com

Date: Tue, 20 Jul 2010 13:50:11 -0500
From: [hidden email]
Subject: Re: SUBSTR into new variable
To: [hidden email]

If all you have is values 1 through 5, separated by commas, all you need is five COMPUTE commands:

compute r1=char.index(race, "1")>0.
compute r2=char.index(race, "2")>0.
etc.

If you have an older version of SPSS, use index instead of char.index.

You can't use DO REPEAT in this case to shorten the syntax because of the quoted values.

You could use DO REPEAT in a Python program block, but it ends up being more lines of code than 5 COMPUTE commands.

From:	Albert-Jan Roskam <[hidden email]>
To:	[hidden email]
Date:	07/20/2010 01:06 PM
Subject:	Re: SUBSTR into new variable
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

New Windows 7: Simplify what you do everyday. Find the right PC for you.

Rick Oliver-3

Re: SUBSTR into new variable

Cool. It didn't occur to me to use the string function in place of a quoted value.

From:	Ruben van den Berg <[hidden email]>
To:	[hidden email]
Date:	07/20/2010 03:33 PM
Subject:	Re: SUBSTR into new variable
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

Hm, do repeat seems to work just fine here:

data list free/race(a10).
begin data
'1,2,3'
'1,5'
'2,4'
end data.

do repeat r=r_1 to r_5/v=1 to 5.
compute r=index(race,str(v,f1))>0.
end repeat.
execute.

Best,

Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: [hidden email]
Mobiel: +31 6 24641435
Telefoon: +31 20 522 5738
Internet: www.tns-nipo.com

From:	Albert-Jan Roskam <[hidden email]>
To:	[hidden email]
Date:	07/20/2010 01:06 PM
Subject:	Re: SUBSTR into new variable
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

New Windows 7: Simplify what you do everyday. Find the right PC for you.