parse a string

Brian Moore-7
Hi all-

I'm using SPSS v15 and trying to parse a string that can contain 1 to n words, putting each word, in order, into its own variable.

Case  Phrase
1     Two words
2     Five words or so
3     Solo

To

Case  Phrase            Parse1  Parse2  Parse3  Parse4  Parse…n
1     Two words         Two     words
2     Five words or so  Five    words   or      so
3     Solo              Solo

I think I could achieve a solution that makes new "phrase" fields word by word, working to the right [using index and substring] and pointing at what's left each time to make the next word, but making n more phrase variables is probably not a great solution in this case [the file already has more than 1 million cases].

Thanks in advance for any other solutions.

Regards,

Brian

Re: parse a string

Oliver, Richard
The solution below requires that you specify a maximum number of possible words since you have to know how many variables to create, but you can pick an arbitrarily high number and then delete the extraneous variables. In this example, I used 50, which is of course way more than the example needs.

 

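data list list (";") /phrase (a50).
begin data
one
one two
one two three
one two
one
one two three four
end data.
* Work on a scratch copy so that phrase itself is left unchanged.
string #temp (a50).
compute #temp=phrase.
* Create var1 to var50 to hold the individual words.
vector var(50,a50).
loop #i=1 to 50.
* Position of the first space, which marks the end of the next word.
compute #index=index(#temp, " ").
compute var(#i)=substr(#temp,1, #index-1).
* Keep only the text after that space for the next pass.
compute #temp=substr(#temp, #index+1).
* Stop once nothing is left; #temp is blank-padded, so testing index(#temp, " ")=0 would rarely end the loop early.
end loop if #temp=" ".
execute.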

 

*This of course assumes that there is only one space between words and a space is the only delimiter you care about. Some fine-tuning would be required for other conditions. For example you could use the Replace function to remove multiple contiguous spaces prior to the loop.
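For readers who want to check the logic outside SPSS, here is a rough Python sketch (not SPSS syntax) of the same index/substring loop, with repeated spaces collapsed first as suggested above. The names `collapse_spaces`, `parse_words` and `max_words` are made up for this illustration and do not come from the syntax in this post.

```python
from typing import List


def collapse_spaces(phrase: str) -> str:
    """Reduce runs of spaces to one space and drop leading/trailing spaces."""
    while "  " in phrase:
        phrase = phrase.replace("  ", " ")
    return phrase.strip(" ")


def parse_words(phrase: str, max_words: int = 50) -> List[str]:
    """Peel one word off the front per pass, up to max_words words."""
    words: List[str] = []
    temp = collapse_spaces(phrase)
    for _ in range(max_words):
        if temp == "":
            break                      # nothing left to parse
        index = temp.find(" ")         # -1 when no space remains
        if index == -1:
            words.append(temp)         # last word
            break
        words.append(temp[:index])     # text before the first space
        temp = temp[index + 1:]        # keep what is left for the next pass
    return words


if __name__ == "__main__":
    for example in ["Two words", "Five  words   or so", "Solo"]:
        print(parse_words(example))
```

As in the syntax above, words beyond the chosen maximum are simply not extracted, so the cap has to be at least as large as the longest phrase.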

 


From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Brian Moore
Sent: Wednesday, October 14, 2009 10:29 AM
To: [hidden email]
Subject: parse a string

 

Re: parse a string

Richard Ristow
At 01:06 PM 10/14/2009, Oliver, Richard wrote:

*[The code I posted] assumes that there is only one space between words and a space is the only delimiter you care about. Some fine-tuning would be required for other conditions. For example you could use the Replace function to remove multiple contiguous spaces prior to the loop.

The easiest way, I think: replace

compute #temp=substr(#temp, #index+1).

with

compute #temp=LTRIM(substr(#temp, #index+1)).
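To illustrate why trimming the remainder matters when words are separated by more than one space, here is a small Python sketch (not SPSS syntax). It assumes the trim strips the blanks at the front of the remainder; the function name `split_remainder` and the `trim_left` flag are hypothetical and exist only for this comparison.

```python
from typing import List


def split_remainder(phrase: str, trim_left: bool, max_words: int = 50) -> List[str]:
    """Index/substring loop; optionally strip leading blanks from the remainder."""
    words: List[str] = []
    temp = phrase
    for _ in range(max_words):
        if temp == "":
            break
        index = temp.find(" ")          # -1 when no space remains
        if index == -1:
            words.append(temp)
            break
        words.append(temp[:index])
        temp = temp[index + 1:]
        if trim_left:
            temp = temp.lstrip(" ")     # skip any extra spaces before the next word
    return words


if __name__ == "__main__":
    print(split_remainder("one  two", trim_left=False))  # an empty word appears
    print(split_remainder("one  two", trim_left=True))   # only the real words
```

Without the trim, each extra space yields an empty "word" that would occupy one of the output variables; with it, the next pass always starts on a real word.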
