Splitting character strings?

Splitting character strings?

Robert L
I have got this problem which must be fairly easy to solve. However, my SPSS skills are limited, so input would be most appreciated.

What I have got is a variable like the following:

p12
p12p23
p13
p23p34p35p56

Sequences of three characters, and no determined limit for the number of sequences.

The goal is to pick out the numbers in each sequence and put them in new variables:

var0              var1    var2    var3
p12        ->     12
p12p23     ->     12      23
p13        ->     13
p23p34p35  ->     23      34      35

There is no limit set in advance for the length of the original strings in var0; they can be as short as here (3 characters) or much longer.

My knowledge of loops and dealing with strings in SPSS is apparently too limited. Any suggestions as to how I could proceed?

Robert

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Robert Lundqvist

Re: Splitting character strings?

Jon K Peck
This command takes some setup, but I claim that it is the simplest solution to this problem.
Assuming that the strings are in variable s, this command allows for up to ten extracted sequences
of digits, stored as numeric variables n1 to n10.  Empties are set to system-missing.

spssinc trans result = n1 to n10
/formula  "re.findall(r'\d+', s)".

This matches each sequence of one or more digits and each matched sequence becomes the value of a variable.
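
For readers who want to see what the formula does to a single case, here is a minimal sketch in plain Python, outside SPSS. The function name `extract_numbers` and the `max_vars` parameter are made up for illustration. Padding with `None` stands in for system-missing, and dropping matches beyond `max_vars` is an assumption of this sketch, not a statement about how SPSSINC TRANS handles surplus values.

```python
import re

MAX_VARS = 10  # mirrors n1 to n10 in the command above


def extract_numbers(s, max_vars=MAX_VARS):
    """Return the digit runs in s as integers, padded with None up to max_vars."""
    matches = re.findall(r'\d+', s)  # e.g. 'p12p23' -> ['12', '23']
    # Assumption: surplus matches are simply dropped in this sketch.
    values = [int(m) for m in matches[:max_vars]]
    return values + [None] * (max_vars - len(values))


# The four example strings from the original question.
for s in ["p12", "p12p23", "p13", "p23p34p35p56"]:
    print(s, extract_numbers(s, max_vars=4))
```

Because the pattern matches runs of digits of any length, this approach does not depend on every sequence being exactly three characters wide.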

To use this, you need to install the Python plugin or Essentials and get the SPSSINC TRANS extension command from SPSS Developer Central, www.spss.com/devcentral.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Robert Lundqvist <[hidden email]>
To:        [hidden email]
Date:        10/20/2010 05:46 AM
Subject:        [SPSSX-L] Splitting character strings?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Re: Splitting character strings?

Art Kendall
In reply to this post by Robert L
The data in your email was pasted into input.txt.
If you already have an existing string variable that holds the sequences, adapt the
lower portion of the syntax below.
Open a clean instance of SPSS, copy the whole set of syntax below into
the syntax window, and run it.

* Make up a variable with up to 10 sequences to simulate the situation with an existing big string.
GET DATA
   /TYPE=TXT
   /FILE="C:\Users\Art\Desktop\input.txt"
   /FIXCASE=1
   /ARRANGEMENT=FIXED
   /FIRSTCASE=1
   /IMPORTCASE=ALL
   /VARIABLES=
   /1 bigstring 0-29 A30.
execute.
dataset name seq.
LIST.

*adapt this portion to a reasonable max of sequence variables.
string  sequence1 to sequence10(a3).
compute #start = 1.
do repeat target = sequence1 to sequence10.
compute target = substr(bigstring,#start,3).
compute #start= #start + 3.
end repeat.
list.

If you are reading in text data as FIXED and there are blanks for
non-existing sequences,
just read those columns as a set of variables, something like this:
DATA LIST  fixed FILE="C:\Users\Art\Desktop\input.txt" /sequence1 to sequence10(10a3).
list.

Art Kendall
Social Research Consultants.



Re: Splitting character strings?

Art Kendall
In reply to this post by Robert L
Missed the part about retaining just the numbers.

This is the part to adapt.
*adapt this portion to a reasonable max of sequence variables.
numeric  sequence1 to sequence10(f3).
compute #start = 2.
do repeat target = sequence1 to sequence10.
compute target = number(substr(bigstring,#start,2),f2).
compute #start= #start + 3.
end repeat.
list.
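
The same fixed-position logic can be sketched in Python for comparison. This is only an illustration under the assumptions the syntax above already makes: every sequence is one letter followed by exactly two digits, and unused positions are blank. The function name `split_fixed` and its parameters are hypothetical.

```python
def split_fixed(bigstring, max_vars=10, width=3, digits=2):
    """Read `digits` characters after the one-letter prefix of each `width`-wide slot."""
    out = []
    for i in range(max_vars):
        # 0-based index of the first digit in slot i; SPSS's 1-based #start = 2, 5, 8, ...
        start = i * width + 1
        chunk = bigstring[start:start + digits].strip()
        # Blank or non-numeric slots become None (standing in for system-missing).
        out.append(int(chunk) if chunk.isdigit() else None)
    return out


# A30-style padded string, as GET DATA would produce for a short line.
print(split_fixed("p12p23".ljust(30), max_vars=4))
```

Unlike a pattern match on digit runs, this version breaks if any sequence is not exactly three characters wide, so it only fits data that really is laid out in fixed columns.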


Art
Art Kendall
Social Research Consultants