Dear list!
I have a very long string variable consisting of up to 8 groups of string portions separated by blanks (1 or sometimes 2). I want to put each of these string values into separate string variables. The data can be something like this:

814F 915E F18
645
F18 G564 754T 814.6 65 G42 F4567

I looked in Ray's pantry and found something similar (but the separator was a slash and the string portions were of equal length), as follows:

*(Q) My string variable has a variable number of 4 character items separated by '/'
* for instance C206/E210/F210 contains C206 E210 and F210.
* How can I assign each of these elements to a different variable?

(A) by Raynald Levesque 2002/04/30.

DATA LIST LIST /a(A70).
BEGIN DATA
C206/E210/F210
E206/P206/F210/G210/X210
END DATA.
LIST.

VECTOR B(7A4).
DO REPEAT v=b1 TO b5 /b=1 6 11 16 21 /e=4 9 14 19 24.
COMPUTE v=SUBSTR(a,b,e).
END REPEAT PRINT.
EXECUTE.

However, I cannot figure out how to modify this to suit my needs. Can anyone help?

best
Staffan Lindberg
National Institute of Public Health
Sweden
Staffan,
It seems to me that a possibly easy way to do this is to read the data as a free-format file with a single variable, using space as the delimiter. I should acknowledge that I rarely need to read from what I assume to be an ASCII (text) file, so I don't know whether double or triple spaces would pose a problem; you could, however, easily strip them out with a text editor or word processor.

I think the problem with using Ray's syntax for your task is that his incoming dataset is more highly structured than yours. His syntax assumes, for instance, that every record has a four-character item in columns 1-4. That doesn't look to be true for your data.

Gene Maguin
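To illustrate Gene's point, here is a minimal Python sketch (plain Python, not SPSS syntax). The sample records and the helper name slice_fixed are made up for illustration; the second record is one of Staffan's lines with a double blank inserted. It suggests why cutting at fixed offsets suits Ray's data but not items of uneven length, and that Python's whitespace split copes with repeated blanks without any pre-cleaning.

```python
# Hypothetical sample records: the first follows Ray's fixed layout,
# the second is one of Staffan's lines with a double blank added.
fixed = "C206/E210/F210"
ragged = "F18 G564  754T 814.6"


def slice_fixed(record, width=4, step=5):
    """Cut width-character items at fixed offsets, much as Ray's DO REPEAT does."""
    return [record[i:i + width] for i in range(0, len(record), step)]


print(slice_fixed(fixed))   # items line up with the fixed offsets
print(slice_fixed(ragged))  # items of uneven length are cut in the wrong places
print(ragged.split())       # split() with no argument treats a run of blanks as one separator
```

With no argument, split() separates on any run of whitespace, so the double blank needs no special handling.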
In reply to this post by Staffan Lindberg
I would just put the data into MS Excel and use Data / Text to Columns to parse it out there. Then read or copy it into SPSS. If you do not have too much data, this is an easy and useful way to parse it, especially since your file has a nice common separator. Perhaps not a fancy solution, but I do it all the time.

Good luck
meljr
In reply to this post by Staffan Lindberg
At 10:03 AM 3/26/2007, Staffan Lindberg wrote:
>I have a very long string variable consisting of up to 8 groups of
>string portions separated by blanks (1 or sometimes 2). I want to put
>each of these string values into separate string variables.

Pretty straightforward. Doesn't even need Python. This ran on the first try, though it took a couple of adjustments to make the lengths of the printed lines come out right. ("Three things you should be wary of: A new kid in his prime...")

Does this do it for you? SPSS 15 draft output. <WRR-not saved separately.>

STRING_IN

814F 915E F18
645
F18 G564 754T 814.6 65 G42 F4567

Number of cases read: 3 Number of cases listed: 3

* "a very long string variable consisting of up to 8 groups of .
* string portions separated by blanks (1 or sometimes 2)" .

STRING Group1 TO Group8 (A6).
VECTOR Groups = Group1 TO Group8.

STRING #Parsing (A70).
COMPUTE #Parsing = LTRIM(STRING_IN).

LOOP #GrpNum = 1 TO 8 IF #Parsing NE ' '.
. COMPUTE #BlnkSpc=INDEX(#Parsing,' ').
. COMPUTE Groups(#GrpNum) = SUBSTR(#Parsing,1,#BlnkSpc).
. COMPUTE #Parsing = LTRIM(SUBSTR(#Parsing,#BlnkSpc)).
END LOOP.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |26-MAR-2007 12:51:32       |
|-----------------------------|---------------------------|
The variables are listed in the following order:

LINE 1: STRING_IN
LINE 2: Group1 Group2 Group3 Group4 Group5 Group6 Group7 Group8

STRING_IN: 814F 915E F18
   Group1: 814F 915E F18

STRING_IN: 645
   Group1: 645

STRING_IN: F18 G564 754T 814.6 65 G42 F4567
   Group1: F18 G564 754T 814.6 65 G42 F4567

Number of cases read: 3 Number of cases listed: 3

===================
APPENDIX: Test data
===================
DATA LIST FIXED
 /STRING_IN(A70).
BEGIN DATA
814F 915E F18
645
F18 G564 754T 814.6 65 G42 F4567
END DATA.
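For readers who think more easily in Python, the LOOP above can be mirrored roughly as follows. This is only a sketch of the same peel-off-the-first-group idea, not something SPSS would run; parse_groups and its max_groups parameter are hypothetical names chosen for illustration.

```python
def parse_groups(string_in, max_groups=8):
    """Peel blank-separated groups off the front of string_in, one per pass."""
    groups = []
    parsing = string_in.lstrip(" ")            # LTRIM(STRING_IN)
    while parsing and len(groups) < max_groups:
        parsing += " "                         # SPSS strings are blank-padded, so a blank is always found
        blank = parsing.index(" ")             # INDEX(#Parsing,' '), 0-based here
        groups.append(parsing[:blank])         # SUBSTR(#Parsing,1,#BlnkSpc), without the trailing blank
        parsing = parsing[blank:].lstrip(" ")  # LTRIM(SUBSTR(#Parsing,#BlnkSpc))
    return groups


for record in ["814F 915E F18", "645", "F18 G564 754T 814.6 65 G42 F4567"]:
    print(parse_groups(record))
```

Because the remainder is left-trimmed after every pass, one blank or two between groups makes no difference, which is the same reason the SPSS loop tolerates double blanks.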
But here is the Python solution anyway, using SPSS 15.
begin program.
import spss, spssdata
curs=spssdata.Spssdata(indexes='longstr', accessType='w', maxaddbuffer=800)
for v in range(8):
    curs.append(spssdata.vdef('v'+str(v), vtype=100))
curs.commitdict()
for case in curs:
    curs.casevalues(case[0].split())
curs.CClose()
end program.

The program gets a cursor and iterates over the cases. It creates eight new variables (all strings of length 100) named v0 to v7. The key part is

for case in curs:
    curs.casevalues(case[0].split())

That loops over the cases and applies the split function to the variable retrieved. The values created are returned as the values of the new variables for each case.

-Jon Peck
SPSS

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Monday, March 26, 2007 11:58 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Parsing a long string variable into component parts
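The split step in Jon's program can also be tried outside SPSS. The sketch below is plain Python that does not use the spss or spssdata modules; split_to_fields is a hypothetical helper, and padding short records out to eight empty strings is an assumption made here for illustration, not a statement about how casevalues treats a short list.

```python
def split_to_fields(value, n_fields=8, width=100):
    """Split on runs of whitespace and return exactly n_fields strings."""
    parts = value.split()[:n_fields]         # ignore anything beyond the last allowed group
    parts += [""] * (n_fields - len(parts))  # pad short records with empty strings
    return [p[:width] for p in parts]        # keep each value within the declared width


rows = ["814F 915E F18", "645", "F18 G564 754T 814.6 65 G42 F4567"]
for row in rows:
    print(split_to_fields(row))
```

Each record yields a list of eight values, one per target variable, whatever the number of blanks between groups.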