|
Hi All,
I am looking to do some type of SUBSTR to get the following results for the data in the variable 'Race'. Race has multiple items separated by a comma. I would like to see each item in race separated into the appropriate variables (r1 to r5). Race --> r1 r2 r3 r4 r5 1,2,3 1 1 1 0 0 1,5, 1 0 0 0 1 2,4 0 1 0 1 0 Any help would be much appreciated. Thank you, Phebe Lacay Assistant Director of Institutional Research Raritan Valley Community College P.O. Box 3300 Somerville, NJ 08876 908.526.1200 x 8355 908.429.0034 (fax) [hidden email] ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Administrator
|
compute r1 = char.index(race,"1") GT 0. compute r2 = char.index(race,"2") GT 0. compute r3 = char.index(race,"3") GT 0. compute r4 = char.index(race,"4") GT 0. compute r5 = char.index(race,"5") GT 0. format r1 to r5 (f1.0). exe. If there is a large number of possible race numbers, you could stick this into a macro and loop through the values. I expect someone will also propose a Python solution.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Well, Bruce, since you asked, here is a general and robust programmability solution. It has the advantage that it will work for any number of values, and the further advantage that it is easy to test. With a long batch of separate compute statements, it would be easy for one to be messed up. --- First, some data: data list fixed/s(a10). begin data 1,2,3 1,3,5 1,2,3,4,5 end data. list. exec. --- Now define a tiny Python function named splitit that evaluates a variable and returns a list of true/false values. The logic is the same for every value, so it is easy to test: if one is wrong, they are all wrong. The function will fail loudly if the string after comma-based splitting has any nonnumeric values begin program. def splitit(x, limit): items = [int(i) for i in x.split(",")] # break out the comma-separated fields into integers return [i in items for i in range(1, limit+1)] # test for presence of each integer value end program. --- Now run the extension command that calls the formula for each case and creates new numeric variables, d1, d2, d3, d4, d5. The new variables are numeric. It takes the input variable, s, and the upper end of the range of values, 5. spssinc trans result=d1 to d5 /formula "splitit(s,5)". Unlike the typical use of TO in Statistics, variables d1 to d5 can already exist or be created new by this command, and, if they exist, they do not need to be in sequential order in the dataset. Regards, Jon Peck SPSS, an IBM Company [hidden email] 312-651-3435
Lacay, Phebe wrote: > > Hi All, > > I am looking to do some type of SUBSTR to get the following results for > the data in the variable 'Race'. Race has multiple items separated by a > comma. I would like to see each item in race separated into the > appropriate variables (r1 to r5). > > Race --> r1 r2 r3 r4 r5 > 1,2,3 1 1 1 0 0 > 1,5, 1 0 0 0 1 > 2,4 0 1 0 1 0 > > Any help would be much appreciated. > > Thank you, > > compute r1 = char.index(race,"1") GT 0. compute r2 = char.index(race,"2") GT 0. compute r3 = char.index(race,"3") GT 0. compute r4 = char.index(race,"4") GT 0. compute r5 = char.index(race,"5") GT 0. format r1 to r5 (f1.0). exe. If there is a large number of possible race numbers, you could stick this into a macro and loop through the values. I expect someone will also propose a Python solution. ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/SUBSTR-into-new-variable-tp1223571p1223579.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Administrator
|
Jon's solution highlights a problem with my list of COMPUTE statements: It would run into trouble with two-digit (or more) numbers, treating the "1" in "10" as if it was simply a "1", for example. So here is a more general BP (Before Python) solution. One slight drawback (compared to Jon's method) is that it does require one to know the maximum race code at the outset. data list list / race (a10). begin data "1,2,3" "1,5,11" "2,4,10" "3,8,10,11" end data. define !splitit ( max !tokens(1)) string race2 (a10). compute race2 = race. /* A copy of RACE . numeric r1 to !concat("r",!max) (f1.0). !do !i = !max !to 1 !by -1 !let !r = !concat("r",!i) compute !r = char.index(race2,!quote(!i)) GT 0. * Once a value has been processed, remove it from RACE2. * This prevents the "1" in "10" from being counted as a "1", for example. compute race2 = replace(race2,!quote(!i),""). format !r (f1.0). !doend exe. delete var race2. /* Delete the copy of RACE . !enddefine. !splitit max=11. list. OUTPUT: race r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 1,2,3 1 1 1 0 0 0 0 0 0 0 0 1,5,11 1 0 0 0 1 0 0 0 0 0 1 2,4,10 0 1 0 1 0 0 0 0 0 1 0 3,8,10,11 0 0 1 0 0 0 0 1 0 1 1
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
In reply to this post by Lacay, Phebe
Thank you Sonia for your response. I was hoping for something
simpler...maybe using an IfAny( or using the comma as an INDEX..?? anyone else have any suggestions? Thanks, Phebe -----Original Message----- From: Lacay, Phebe Sent: Thursday, July 15, 2010 10:42 AM To: [hidden email] Subject: SUBSTR into new variable Hi All, I am looking to do some type of SUBSTR to get the following results for the data in the variable 'Race'. Race has multiple items separated by a comma. I would like to see each item in race separated into the appropriate variables (r1 to r5). Race --> r1 r2 r3 r4 r5 1,2,3 1 1 1 0 0 1,5, 1 0 0 0 1 2,4 0 1 0 1 0 Any help would be much appreciated. Thank you, Phebe Lacay Assistant Director of Institutional Research Raritan Valley Community College P.O. Box 3300 Somerville, NJ 08876 908.526.1200 x 8355 908.429.0034 (fax) [hidden email] ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Lacay, Phebe
|
|
If all you have is values 1 through 5, separated by commas, all you need is five COMPUTE commands: compute r1=char.index(race, "1")>0. compute r2=char.index(race, "2")>0. etc. If you have an older version of SPSS, use index instead of char.index. You can't use DO REPEAT in this case to shorten the syntax because of the quoted values. You could use DO REPEAT in a Python program block, but it ends up being more lines of code than 5 COMPUTE commands.
|
|
Hm, do repeat seems to work just fine here:
data list free/race(a10). begin data '1,2,3' '1,5' '2,4' end data. do repeat r=r_1 to r_5/v=1 to 5. compute r=index(race,str(v,f1))>0. end repeat. execute. Best, Ruben van den Berg Consultant Models & Methods TNS NIPO Email: [hidden email] Mobiel: +31 6 24641435 Telefoon: +31 20 522 5738 Internet: www.tns-nipo.com Date: Tue, 20 Jul 2010 13:50:11 -0500 From: [hidden email] Subject: Re: SUBSTR into new variable To: [hidden email] If all you have is values 1 through 5, separated by commas, all you need is five COMPUTE commands: compute r1=char.index(race, "1")>0. compute r2=char.index(race, "2")>0. etc. If you have an older version of SPSS, use index instead of char.index. You can't use DO REPEAT in this case to shorten the syntax because of the quoted values. You could use DO REPEAT in a Python program block, but it ends up being more lines of code than 5 COMPUTE commands.
New Windows 7: Simplify what you do everyday. Find the right PC for you. |
|
Cool. It didn't occur to me to use the string function in place of a quoted value.
Hm, do repeat seems to work just fine here: data list free/race(a10). begin data '1,2,3' '1,5' '2,4' end data. do repeat r=r_1 to r_5/v=1 to 5. compute r=index(race,str(v,f1))>0. end repeat. execute. Best, Ruben van den Berg Consultant Models & Methods TNS NIPO Email: [hidden email] Mobiel: +31 6 24641435 Telefoon: +31 20 522 5738 Internet: www.tns-nipo.com Date: Tue, 20 Jul 2010 13:50:11 -0500 From: [hidden email] Subject: Re: SUBSTR into new variable To: [hidden email] If all you have is values 1 through 5, separated by commas, all you need is five COMPUTE commands: compute r1=char.index(race, "1")>0. compute r2=char.index(race, "2")>0. etc. If you have an older version of SPSS, use index instead of char.index. You can't use DO REPEAT in this case to shorten the syntax because of the quoted values. You could use DO REPEAT in a Python program block, but it ends up being more lines of code than 5 COMPUTE commands.
New Windows 7: Simplify what you do everyday. Find the right PC for you. |
| Free forum by Nabble | Edit this page |
