SPSSX Discussion

Parsing a long string variable into component parts of different lengths

Staffan Lindberg

Parsing a long string variable into component parts of different lengths

Dear list!

I have a very long string variable consisting of up to 8 groups of string
portions separated by blanks (1 or sometimes 2). I want to put each of these
string values into separate string variables. The data can be something like
this:

814F 915E F18
645
F18 G564 754T 814.6 65 G42 F4567

I looked in Ray's pantry and found something similar (but the separator was a
slash and the string portions were of equal length), as follows:

*(Q) My string variable has a variable number of 4 character items separated
by '/'
* for instance C206/E210/F210 contains C206 E210 and F210.
* How can I assign each of these elements to a different variable?

(A) by Raynald Levesque 2002/04/30.

DATA LIST LIST /a(A70).
BEGIN DATA
C206/E210/F210
E206/P206/F210/G210/X210
END DATA.
LIST.

VECTOR B(7A4).
DO REPEAT v=b1 TO b5 /b=1 6 11 16 21 /e=4 9 14 19 24.
COMPUTE v=SUBSTR(a,b,e).
END REPEAT PRINT.
EXECUTE.

However, I cannot figure out how to modify this in order to suit my needs.
Can anyone help?
best
Staffan Lindberg
National Institute of Public Health
Sweden

Maguin, Eugene

Re: Parsing a long string variable into component parts of different lengths

Staffan,

It seems to me that a possibly easy way to do this is to read the data as a
free format file with a single variable with space as the delimiter. In
making this suggestion, I'd like to acknowledge that I rarely need to read
from what I assume to be an ASCII (text) file, so I don't know if double or
triple spaces would pose a problem. If they do, you could easily strip out the
double or triple spaces with a text or word processing program.

I think the problem with using Ray's syntax for your task is that the
incoming dataset is more highly structured than yours is. His syntax makes
the assumption, for instance, that every record has a four character
variable in columns 1-4. That doesn't look to be true for you.

Gene Maguin
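
Gene's point about fixed columns can be seen outside SPSS as well. The following is only a rough Python sketch, not anything posted in the thread: the function names `fixed_slices` and `split_on_blanks` are hypothetical, and the double blank in the last call is added here to mimic the "1 or sometimes 2" separators.

```python
# Ray's example: every item is 4 characters wide and starts every 5th column.
def fixed_slices(text, width=4, step=5, count=5):
    """Cut `count` pieces of `width` characters at fixed offsets."""
    return [text[i * step : i * step + width] for i in range(count)]

# Staffan's data: items of varying width separated by one or two blanks.
def split_on_blanks(text):
    """Split on runs of whitespace, so single and double blanks behave alike."""
    return text.split()

# Fixed offsets work for the slash-separated, equal-width example ...
print(fixed_slices("C206/E210/F210"))
# ... but cut through the middle of items when the widths vary.
print(fixed_slices("F18 G564 754T 814.6 65 G42 F4567"))
# Splitting on the separator does not depend on column positions.
print(split_on_blanks("F18 G564  754T 814.6 65 G42 F4567"))
```

With the second call, the piece taken from columns 6-9 comes out as "564 " rather than "G564", which is the misalignment Gene describes.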

meljr

Re: Parsing a long string variable into component parts of different lengths

In reply to this post by Staffan Lindberg

I would just put the data into MS Excel and use Data/Text to Columns to parse it out there.
Then read or copy it into SPSS.
If you do not have too much data, this is an easy and useful way to parse the data, especially since you have a very nice, consistently separated file.
Perhaps not a fancy solution, but I do it all the time.
Good luck
meljr
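
If the detour through an external tool needs to be repeated, the same "split outside, then import" idea could be scripted. This is a hedged Python sketch, not part of meljr's post: the function name `rows_to_csv` and the column names Group1 to Group8 are assumptions chosen for illustration, and any groups beyond the eighth are silently dropped here.

```python
import csv
import io

def rows_to_csv(lines, n_fields=8):
    """Return CSV text with one column per blank-separated group."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: Group1 ... Group8 (hypothetical names).
    writer.writerow([f"Group{i}" for i in range(1, n_fields + 1)])
    for line in lines:
        parts = line.split()[:n_fields]          # split on runs of blanks
        writer.writerow(parts + [""] * (n_fields - len(parts)))  # pad short rows
    return buffer.getvalue()

sample = ["814F 915E F18", "645", "F18 G564 754T 814.6 65 G42 F4567"]
print(rows_to_csv(sample))
```

The resulting text could be saved to a file and read into SPSS as an ordinary comma-delimited file.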

Richard Ristow

Re: Parsing a long string variable into component parts

In reply to this post by Staffan Lindberg

At 10:03 AM 3/26/2007, Staffan Lindberg wrote:

>I have a very long string variable consisting of up to 8 groups of
>string portions separated by blanks (1 or sometimes 2). I want to put
>each of these string values into separate string variables.

Pretty straightforward. Doesn't even need Python. This ran on the first
try, though it took a couple of adjustments to make the lengths of the
printed lines come out right. ("Three things you should be wary of: A
new kid in his prime...") Does this do it for you? SPSS 15 draft
output. <WRR-not saved separately.>

STRING_IN

814F 915E F18
645
F18 G564 754T 814.6 65 G42 F4567

Number of cases read: 3 Number of cases listed: 3

* "a very long string variable consisting of up to 8 groups of .
* string portions separated by blanks (1 or sometimes 2)" .

STRING Group1 TO Group8 (A6).
VECTOR Groups = Group1 TO Group8.

STRING #Parsing (A70).
COMPUTE #Parsing = LTRIM(STRING_IN).
LOOP #GrpNum = 1 TO 8 IF #Parsing NE ' '.
. COMPUTE #BlnkSpc=INDEX(#Parsing,' ').
. COMPUTE Groups(#GrpNum) = SUBSTR(#Parsing,1,#BlnkSpc).
. COMPUTE #Parsing = LTRIM(SUBSTR(#Parsing,#BlnkSpc)).
END LOOP.
LIST.

List
|-----------------------------|---------------------------|
|Output Created |26-MAR-2007 12:51:32 |
|-----------------------------|---------------------------|
The variables are listed in the following order:

LINE 1: STRING_IN
LINE 2: Group1 Group2 Group3 Group4 Group5 Group6 Group7 Group8

STRING_IN: 814F 915E F18
Group1: 814F 915E F18

STRING_IN: 645
Group1: 645

STRING_IN: F18 G564 754T 814.6 65 G42 F4567
Group1: F18 G564 754T 814.6 65 G42 F4567

Number of cases read: 3 Number of cases listed: 3

===================
APPENDIX: Test data
===================
DATA LIST FIXED
/STRING_IN(A70).
BEGIN DATA
814F 915E F18
645
F18 G564 754T 814.6 65 G42 F4567
END DATA.
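
For readers who want to trace the LOOP step by step, here is a rough Python mirror of the same idea: peel off everything up to the first blank, then left-trim the remainder and repeat. It is a sketch only. The function name `parse_groups` is hypothetical, and two details differ from the SPSS version: Python strings are not blank-padded to 70 characters, so the "no blank left" case is handled explicitly, and the trailing blank is not copied into the group.

```python
def parse_groups(string_in, max_groups=8, width=6):
    """Peel off one blank-delimited group per pass, as the SPSS LOOP does."""
    groups = [""] * max_groups
    parsing = string_in.lstrip(" ")            # COMPUTE #Parsing = LTRIM(STRING_IN)
    for grp_num in range(max_groups):          # LOOP #GrpNum = 1 TO 8 ...
        if parsing.strip(" ") == "":           # ... IF #Parsing NE ' '
            break
        blank = parsing.find(" ")              # INDEX(#Parsing,' '), 0-based here
        if blank == -1:                        # unpadded string: take the rest
            blank = len(parsing)
        # Text before the blank, cut to the A6 width of Group1 TO Group8.
        groups[grp_num] = parsing[:blank][:width]
        parsing = parsing[blank:].lstrip(" ")  # LTRIM(SUBSTR(#Parsing,#BlnkSpc))
    return groups

for row in ["814F 915E F18", "645", "F18 G564 754T 814.6 65 G42 F4567"]:
    print(parse_groups(row))
```

Because the remainder is left-trimmed on every pass, one blank or two between groups makes no difference.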

Peck, Jon

Re: Parsing a long string variable into component parts

But here is the Python solution anyway, using SPSS 15.

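begin program.
import spss, spssdata

# Open the active dataset in write mode, retrieving only the variable 'longstr'.
curs=spssdata.Spssdata(indexes='longstr', accessType='w', maxaddbuffer=800)
# Define eight new string variables, v0 to v7, each of length 100.
for v in range(8):
    curs.append(spssdata.vdef('v'+str(v), vtype=100))
curs.commitdict()
# Split each case's string on whitespace and store the pieces in the new variables.
for case in curs:
    curs.casevalues(case[0].split())
curs.CClose()
end program.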

The program gets a cursor and iterates over the cases. It creates eight new variables (all strings of length 100) named v0 to v7.

The key part is
for case in curs:
    curs.casevalues(case[0].split())

That loops over the cases and applies the split function to the variable retrieved. The values created are returned as the values of the new variables for each case.

-Jon Peck
SPSS
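
The split step can be tried in plain Python, without SPSS or the spssdata module. This is a minimal sketch under stated assumptions: the helper name `split_to_fields` is hypothetical, and padding every case to exactly eight values (and rejecting longer ones) is a choice made here for illustration, not something the post says spssdata requires.

```python
def split_to_fields(value, n_fields=8):
    """Split on runs of whitespace and pad with empty strings to n_fields."""
    parts = value.split()   # no argument: any run of blanks is one separator
    if len(parts) > n_fields:
        raise ValueError(f"expected at most {n_fields} groups, got {len(parts)}")
    return parts + [""] * (n_fields - len(parts))

for row in ["814F 915E F18", "645", "F18 G564  754T 814.6 65 G42 F4567"]:
    print(split_to_fields(row))
```

Calling `split()` with no argument is what makes single and double blanks equivalent; `split(' ')` would instead produce empty strings between consecutive blanks.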
