Syntax to reverse concatenate in SPSS?

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Syntax to reverse concatenate in SPSS?

Theodora B. Consolacion
I know this is possible in Excel, but I was wondering if SPSS had a similar
feature.

The problem:
Multiple data entries (String) in a single field separated by semi-colons.

Is there syntax code to separate out the multiple data entries in a single
field into separate fields for each datum?

Thanks in advance for any help.
Teddy

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Syntax to reverse concatenate in SPSS?

Oliver, Richard
Well, my first recommendation would be to specify a semi-colon as a value delimiter when reading the data, but that only works for delimited text files (using either GET DATA or DATA LIST).

But assuming your stuck with a string value that contains multiple values delimited by a semi-colon:

*sample data.
data list free /oldvar (a22).
begin data
ab;abc;abcd;abcde;f
end data.
*real code starts here.
string #temp(a22).
compute #temp=oldvar.
vector newvar(5, a5).
loop if index(#temp, ";")>0.
compute #index=index(#temp,";").
compute #i=#i+1.
compute newvar(#i)=substr(#temp,1, #index-1).
compute #temp=substr(#temp, #index+1).
end loop.
compute newvar(#i+1)=#temp.
execute.

The one catch here is that you either need to know the maximum number of values and the maximum width of a single value (for the Vector command that creates the new variables) or you need to overestimate to make sure you get all of them, in which case you may end up with a bunch of superfluous empty new variables. For example,  "vector newvar(100,a100)" would also work in my example, but the defined string widths would be much wider than necessary and you'd have 95 unnecessary extra (and empty) variables.

There may be more elegant solutions.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Theodora B. Consolacion
Sent: Thursday, November 29, 2007 1:09 PM
To: [hidden email]
Subject: Syntax to reverse concatenate in SPSS?

I know this is possible in Excel, but I was wondering if SPSS had a similar
feature.

The problem:
Multiple data entries (String) in a single field separated by semi-colons.

Is there syntax code to separate out the multiple data entries in a single
field into separate fields for each datum?

Thanks in advance for any help.
Teddy

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Syntax to reverse concatenate in SPSS?

Oliver, Richard
In reply to this post by Theodora B. Consolacion
Oops. My bad. I forgot that scratch variables don't automatically get reinitialized...

*sample data.
data list list /oldvar (a22).
begin data
ab;abc;abcd
abcde;f
end data.
*real code starts here.
string #temp(a22).
compute #temp=oldvar.
vector newvar(5, a5).
compute #i=0. /*reinitialize the counter for each case.
loop if index(#temp, ";")>0.
compute #index=index(#temp,";").
compute #i=#i+1.
compute newvar(#i)=substr(#temp,1, #index-1).
compute #temp=substr(#temp, #index+1).
end loop.
compute newvar(#i+1)=#temp.
execute.

-----Original Message-----
From: Oliver, Richard
Sent: Thursday, November 29, 2007 3:33 PM
To: 'Theodora B. Consolacion'; [hidden email]
Subject: RE: Syntax to reverse concatenate in SPSS?

Well, my first recommendation would be to specify a semi-colon as a value delimiter when reading the data, but that only works for delimited text files (using either GET DATA or DATA LIST).

But assuming your stuck with a string value that contains multiple values delimited by a semi-colon:

*sample data.
data list free /oldvar (a22).
begin data
ab;abc;abcd;abcde;f
end data.
*real code starts here.
string #temp(a22).
compute #temp=oldvar.
vector newvar(5, a5).
loop if index(#temp, ";")>0.
compute #index=index(#temp,";").
compute #i=#i+1.
compute newvar(#i)=substr(#temp,1, #index-1).
compute #temp=substr(#temp, #index+1).
end loop.
compute newvar(#i+1)=#temp.
execute.

The one catch here is that you either need to know the maximum number of values and the maximum width of a single value (for the Vector command that creates the new variables) or you need to overestimate to make sure you get all of them, in which case you may end up with a bunch of superfluous empty new variables. For example,  "vector newvar(100,a100)" would also work in my example, but the defined string widths would be much wider than necessary and you'd have 95 unnecessary extra (and empty) variables.

There may be more elegant solutions.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Theodora B. Consolacion
Sent: Thursday, November 29, 2007 1:09 PM
To: [hidden email]
Subject: Syntax to reverse concatenate in SPSS?

I know this is possible in Excel, but I was wondering if SPSS had a similar
feature.

The problem:
Multiple data entries (String) in a single field separated by semi-colons.

Is there syntax code to separate out the multiple data entries in a single
field into separate fields for each datum?

Thanks in advance for any help.
Teddy

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Syntax to reverse concatenate in SPSS?

Richard Ristow
At 08:59 PM 11/29/2007, Oliver, Richard wrote:

>Oops. My bad. I forgot that scratch variables don't automatically
>get reinitialized...

Sure. But, why not make #i the loop counter instead, and save
initializing and incrementing it? The loop has a determinate upper
limit: you're allowing for parsing out up to 5 tokens. (Your code
would blow on an indexing error, if there were more than 5 in the
string.) The LTRIM strips leading blanks from tokens; omit, if that's
not desired.

|-----------------------------|---------------------------|
|Output Created               |29-NOV-2007 22:20:49       |
|-----------------------------|---------------------------|
oldvar

ab;abc;abcd
abcde;f
abcde;hijk;lmn;op;qr;s
zzzz;yyy;xwvu;aeiou;

Number of cases read:  4    Number of cases listed:  4


*real code starts here.
string  #temp(a22).
compute #temp=oldvar.
vector  newvar(5, a5).
loop #i = 1 TO 5.
.  compute #index=index(#temp,";").
.  DO IF   #index GT 0.
.    compute newvar(#i)=LTRIM(substr(#temp,1, #index-1)).
.    compute #temp=substr(#temp, #index+1).
.  ELSE.
.    compute newvar(#i)=LTRIM(#temp).
.    compute #temp=''.
.  END IF.
end loop IF #index EQ 0.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |29-NOV-2007 22:20:50       |
|-----------------------------|---------------------------|
oldvar                 newvar1 newvar2 newvar3 newvar4 newvar5

ab;abc;abcd            ab      abc     abcd
abcde;f                abcde   f
abcde;hijk;lmn;op;qr;s abcde   hijk    lmn     op      qr
zzzz;yyy;xwvu;aeiou;   zzzz    yyy     xwvu    aeiou

Number of cases read:  4    Number of cases listed:  4
===================
APPENDIX: Test data
===================
*sample data.
data list list /oldvar (a22).
begin data
ab;abc;abcd
abcde;f
abcde;hijk;lmn;op;qr;s
zzzz;yyy;xwvu;aeiou;
end data.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Syntax to reverse concatenate in SPSS?

Richard Ristow
In reply to this post by Oliver, Richard
At 04:32 PM 11/29/2007, Oliver, Richard wrote:

>Assuming you're stuck with a string value that contains multiple
>values delimited by a semi-colon:

[...parsing logic omitted...]

>The one catch here is that you either need to know the maximum
>number of values and the maximum width of a single value (for the
>Vector command that creates the new variables) or you need to
>overestimate to make sure you get all of them, in which case you may
>end up with a bunch of superfluous empty new variables. For
>example,  "vector newvar(100,a100)" would also work in my example,
>but the defined string widths would be much wider than necessary and
>you'd have 95 unnecessary extra (and empty) variables.
>
>There may be more elegant solutions.

Yours is pretty much what I'd do myself.

As for the limitations, I don't know any way around setting a maximum
token length. But if you don't know the maximum number of tokens, you
can write the tokens to separate records, using XSAVE; read back the
resulting file, and use CASESTOVARS put the tokens into a set of
variables instead of separate cases. (Is it obvious how to write this code?)

This works because CASESTOVARS scans the data to determine the
largest number of tokens in any case, and builds the variable list
accordingly. I believe that it is the only command in native SPSS
that can generate a data dictionary whose content depends on the data
in the file (can anybody think of another?); that can also be done
with Python, or by generating SPSS code from an SPSS program, writing
it out, and INCLUDE/INSERTing it.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Syntax to reverse concatenate in SPSS?

Oliver, Richard
In reply to this post by Richard Ristow
I was trying to provide a solution that would work (albeit not in an optimal fashion) even if you didn't know the maximum number of tokens.

-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Thursday, November 29, 2007 9:24 PM
To: Oliver, Richard; [hidden email]
Cc: Theodora B. Consolacion
Subject: Re: Syntax to reverse concatenate in SPSS?

At 08:59 PM 11/29/2007, Oliver, Richard wrote:

>Oops. My bad. I forgot that scratch variables don't automatically
>get reinitialized...

Sure. But, why not make #i the loop counter instead, and save
initializing and incrementing it? The loop has a determinate upper
limit: you're allowing for parsing out up to 5 tokens. (Your code
would blow on an indexing error, if there were more than 5 in the
string.) The LTRIM strips leading blanks from tokens; omit, if that's
not desired.

|-----------------------------|---------------------------|
|Output Created               |29-NOV-2007 22:20:49       |
|-----------------------------|---------------------------|
oldvar

ab;abc;abcd
abcde;f
abcde;hijk;lmn;op;qr;s
zzzz;yyy;xwvu;aeiou;

Number of cases read:  4    Number of cases listed:  4


*real code starts here.
string  #temp(a22).
compute #temp=oldvar.
vector  newvar(5, a5).
loop #i = 1 TO 5.
.  compute #index=index(#temp,";").
.  DO IF   #index GT 0.
.    compute newvar(#i)=LTRIM(substr(#temp,1, #index-1)).
.    compute #temp=substr(#temp, #index+1).
.  ELSE.
.    compute newvar(#i)=LTRIM(#temp).
.    compute #temp=''.
.  END IF.
end loop IF #index EQ 0.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |29-NOV-2007 22:20:50       |
|-----------------------------|---------------------------|
oldvar                 newvar1 newvar2 newvar3 newvar4 newvar5

ab;abc;abcd            ab      abc     abcd
abcde;f                abcde   f
abcde;hijk;lmn;op;qr;s abcde   hijk    lmn     op      qr
zzzz;yyy;xwvu;aeiou;   zzzz    yyy     xwvu    aeiou

Number of cases read:  4    Number of cases listed:  4
===================
APPENDIX: Test data
===================
*sample data.
data list list /oldvar (a22).
begin data
ab;abc;abcd
abcde;f
abcde;hijk;lmn;op;qr;s
zzzz;yyy;xwvu;aeiou;
end data.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Syntax to reverse concatenate in SPSS?

Theodora B. Consolacion
In reply to this post by Oliver, Richard
Thank you! This works fantastically for the data that I have.

cheers,
teddy

On Nov 29, 2007 5:59 PM, Oliver, Richard <[hidden email]> wrote:

> Oops. My bad. I forgot that scratch variables don't automatically get
> reinitialized...
>
> *sample data.
> data list list /oldvar (a22).
> begin data
> ab;abc;abcd
> abcde;f
> end data.
> *real code starts here.
> string #temp(a22).
> compute #temp=oldvar.
> vector newvar(5, a5).
> compute #i=0. /*reinitialize the counter for each case.
> loop if index(#temp, ";")>0.
> compute #index=index(#temp,";").
> compute #i=#i+1.
> compute newvar(#i)=substr(#temp,1, #index-1).
> compute #temp=substr(#temp, #index+1).
> end loop.
> compute newvar(#i+1)=#temp.
> execute.
>
> -----Original Message-----
> From: Oliver, Richard
> Sent: Thursday, November 29, 2007 3:33 PM
> To: 'Theodora B. Consolacion'; [hidden email]
> Subject: RE: Syntax to reverse concatenate in SPSS?
>
> Well, my first recommendation would be to specify a semi-colon as a value
> delimiter when reading the data, but that only works for delimited text
> files (using either GET DATA or DATA LIST).
>
> But assuming your stuck with a string value that contains multiple values
> delimited by a semi-colon:
>
> *sample data.
> data list free /oldvar (a22).
> begin data
> ab;abc;abcd;abcde;f
> end data.
> *real code starts here.
> string #temp(a22).
> compute #temp=oldvar.
> vector newvar(5, a5).
> loop if index(#temp, ";")>0.
> compute #index=index(#temp,";").
> compute #i=#i+1.
> compute newvar(#i)=substr(#temp,1, #index-1).
> compute #temp=substr(#temp, #index+1).
> end loop.
> compute newvar(#i+1)=#temp.
> execute.
>
> The one catch here is that you either need to know the maximum number of
> values and the maximum width of a single value (for the Vector command that
> creates the new variables) or you need to overestimate to make sure you get
> all of them, in which case you may end up with a bunch of superfluous empty
> new variables. For example,  "vector newvar(100,a100)" would also work in my
> example, but the defined string widths would be much wider than necessary
> and you'd have 95 unnecessary extra (and empty) variables.
>
> There may be more elegant solutions.
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Theodora B. Consolacion
> Sent: Thursday, November 29, 2007 1:09 PM
> To: [hidden email]
> Subject: Syntax to reverse concatenate in SPSS?
>
> I know this is possible in Excel, but I was wondering if SPSS had a
> similar
> feature.
>
> The problem:
> Multiple data entries (String) in a single field separated by semi-colons.
>
> Is there syntax code to separate out the multiple data entries in a single
> field into separate fields for each datum?
>
> Thanks in advance for any help.
> Teddy
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD