I have got this problem which must be fairly easy to solve. However, my SPSS skills are limited, so input would be most appreciated.
What I have got is a variable like the following:

p12
p12p23
p13
p23p34p35p56

Sequences of three characters, and no determined limit for the number of sequences.

The goal is to pick out the numbers in each sequence and put them in new variables:

var0          var1  var2  var3
p12        -> 12
p12p23     -> 12    23
p13        -> 13
p23p34p35  -> 23    34    35

There is no limit in advance for the length of the original strings in var0, they can be as short as here (2 characters) or much longer.

My knowledge of loops and dealing with strings in SPSS is apparently too limited. Any suggestions as to how I could proceed?

Robert

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Robert Lundqvist
This command takes some setup, but I claim
that it is the simplest solution to this problem.

Assuming that the strings are in variable s, this command allows for up to ten extracted sequences of digits, stored as numeric variables n1 to n10. Empties are set to system-missing.

    spssinc trans result = n1 to n10
      /formula "re.findall(r'\d+', s)".

This matches each sequence of one or more digits, and each matched sequence becomes the value of a variable.

To use this, you need to install the Python plugin or Essentials and get the SPSSINC TRANS extension command from SPSS Developer Central, www.spss.com/devcentral.

HTH,
Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435

From: Robert Lundqvist <[hidden email]>
To: [hidden email]
Date: 10/20/2010 05:46 AM
Subject: [SPSSX-L] Splitting character strings?
Sent by: "SPSSX(r) Discussion" <[hidden email]>
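To see what the formula does outside SPSS, the same regular expression can be tried in plain Python. This is a minimal sketch, not the SPSSINC TRANS internals: the helper name `split_digits`, the use of `None` for system-missing, and dropping digit runs beyond the tenth are all assumptions made for illustration.

```python
import re


def split_digits(s, max_vars=10):
    """Return max_vars slots, one integer per run of digits in s.

    Hypothetical helper: None stands in for SPSS system-missing, and any
    runs beyond max_vars are dropped (an assumption, not documented
    SPSSINC TRANS behaviour).
    """
    # re.findall returns every non-overlapping match of one or more digits.
    values = [int(run) for run in re.findall(r"\d+", s)][:max_vars]
    # Pad the unused result variables so every case has max_vars slots.
    return values + [None] * (max_vars - len(values))


for s in ["p12", "p12p23", "p13", "p23p34p35p56"]:
    print(s, split_digits(s, max_vars=4))
```

Because `\d+` matches digit runs of any length, the same pattern copes with strings of any length and with numbers that are not exactly two digits, which is why no fixed layout is needed.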
In reply to this post by Robert L
The data in your email was pasted into input.txt.
If you already have an existing string that has the sequences, adapt the lower portion of the syntax below. Open a clean instance of SPSS, copy the whole set of syntax below into the syntax window, and run it.

*Make up a variable with up to 10 sequences, simulating the situation with an existing big string.
GET DATA /TYPE=TXT
  /FILE="C:\Users\Art\Desktop\input.txt"
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 bigstring 0-29 A30.
execute.
dataset name seq.
LIST.

*Adapt this portion to a reasonable maximum number of sequence variables.
string sequence1 to sequence10(a3).
compute #start = 1.
do repeat target = sequence1 to sequence10.
compute target = substr(bigstring,#start,3).
compute #start= #start + 3.
end repeat.
list.

If you are reading in text data as FIXED and there are blanks for non-existing sequences, just read those columns as a set of variables, with something like this:

DATA LIST fixed FILE="C:\Users\Art\Desktop\input.txt"
  /sequence1 to sequence10(10a3).
list.

On 10/20/2010 7:41 AM, Robert Lundqvist wrote:
Art Kendall
Social Research Consultants
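The DO REPEAT loop above simply cuts `bigstring` into consecutive 3-character pieces. A rough Python equivalent follows, assuming the hypothetical helper name `split_fixed` and that positions past the end of the string yield an empty piece (SPSS would leave the string variable blank).

```python
def split_fixed(bigstring, width=3, max_vars=10):
    """Cut bigstring into max_vars consecutive pieces of `width` characters.

    Mirrors SUBSTR(bigstring, #start, 3) with #start = 1, 4, 7, ...
    Python slicing is 0-based and returns "" past the end of the string.
    """
    pieces = []
    start = 0  # SPSS position 1 corresponds to Python index 0
    for _ in range(max_vars):
        pieces.append(bigstring[start:start + width])
        start += width
    return pieces


print(split_fixed("p12p23", max_vars=4))
```

The loop counter plays the role of the scratch variable `#start`, which is reset for every case and advanced by the piece width.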
In reply to this post by Robert L
Missed the part about retaining just the numbers.
This is the part to adapt:

*Adapt this portion to a reasonable maximum number of sequence variables.
numeric sequence1 to sequence10(f3).
compute #start = 2.
do repeat target = sequence1 to sequence10.
compute target = number(substr(bigstring,#start,2),f2).
compute #start= #start + 3.
end repeat.
list.

Art

On 10/20/2010 7:41 AM, Robert Lundqvist wrote:
Art Kendall
Social Research Consultants
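The numeric version starts at position 2, reads two digits, and then steps 3 characters to skip the next `p`. Here is a minimal Python sketch of that arithmetic. It assumes every piece is one letter followed by exactly two digits. The name `split_numbers` is hypothetical, and `None` stands in for system-missing when a piece is blank or not a number.

```python
def split_numbers(bigstring, max_vars=10):
    """Read two digits at SPSS positions 2, 5, 8, ... as numbers.

    Assumes each piece is one letter followed by exactly two digits.
    Blank or non-numeric pieces become None (standing in for
    system-missing).
    """
    values = []
    start = 1  # SPSS position 2 is Python index 1
    for _ in range(max_vars):
        digits = bigstring[start:start + 2].strip()
        values.append(int(digits) if digits.isdigit() else None)
        start += 3  # skip the two digits and the next letter
    return values


print(split_numbers("p23p34p35", max_vars=4))
```

Unlike the regular-expression approach, this depends on the fixed 3-character layout. A piece such as `p123` would be misread.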
