One String Variable into Many Binary Variables

Joseph Schneider
Dear List,

I have a string variable that I need to decompose into multiple binary
variables.  The string variable may contain the following text:

Name A(comments) Name B (comments) Name C (comments)

I need to create a data set that decomposes this into binary data, i.e.,
create a variable called nameA that is 1 for every case that contains
Name A and 0 if Name A is not present, and likewise for the other names,
so the new dataset will look like:


                                                      nameA  nameB  nameC
Name A(comments) Name B (comments) Name C (comments)      1      1      1
Name A(comments) Name B (comments)                        1      1      0
Name A(comments)  Name C (comments)                       1      0      1
Name A Name B                                             1      1      0


So there is a variable created for each potential name.
Now, here is the fun part: the names are not always in that order, and
they are not always followed by comments.  Any ideas?  Doing it by hand is
out; there are over 5000 cases.

Joe

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: One String Variable into Many Binary Variables

Bruce Weaver
Administrator
Hi Joe.  Take a look at using the CHAR.INDEX function to search for instances of the various names in your string variable.  If the name does not appear in the string, CHAR.INDEX will return a 0.  If it does exist, CHAR.INDEX returns the position of the first character in the sub-string you are searching for.  So the logic is pretty simple:

compute NameA = CHAR.INDEX(OriginalString,"Name A") GT 0.
compute NameB = CHAR.INDEX(OriginalString,"Name B") GT 0.
etc.

The other thing to watch for is lower vs upper case.  I think you can probably just embed an UPCASE (or LOWER) function in the lines shown above, with the search string written in the matching case.  E.g.,

compute NameA = CHAR.INDEX(UPCASE(OriginalString),"NAME A") GT 0.
compute NameB = CHAR.INDEX(UPCASE(OriginalString),"NAME B") GT 0.
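
With more than a handful of names, the same test can be looped with DO REPEAT instead of being written out one line per name. What follows is only a minimal sketch: it assumes the string variable is called OriginalString, as in the lines above, and that the three placeholder names stand in for the real list. Note that the upper-casing function in SPSS syntax is UPCASE.

```spss
* Sketch only: OriginalString and the three names are placeholders.
DO REPEAT flag = NameA NameB NameC
        / target = "NAME A" "NAME B" "NAME C".
COMPUTE flag = CHAR.INDEX(UPCASE(OriginalString), target) GT 0.
END REPEAT.
EXECUTE.
```

Each flag is 1 when its target text occurs anywhere in the string and 0 otherwise, so the order of the names and the presence of comments should not matter. One caveat: a short target such as "NAME A" would also match inside a longer name that starts with the same letters.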

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: One String Variable into Many Binary Variables

Marta Garcia-Granero
In reply to this post by Joseph Schneider
Hi Joe:
This should work:

* Sample dataset *.
DATA LIST LIST/StringVar(A55).
BEGIN DATA
"Name A(comments) Name B (comments) Name C (comments)"
"Name A(comments) Name B (comments)"
"Name A(comments)  Name C (comments)"
"Name A Name B"
END DATA.

NUMERIC NameA NameB NameC (F8).
COMPUTE NameA=0.
COMPUTE NameB=0.
COMPUTE NameC=0.
IF (INDEX(StringVar,"Name A")>0) NameA = 1 .
IF (INDEX(StringVar,"Name B")>0) NameB = 1 .
IF (INDEX(StringVar,"Name C")>0) NameC = 1 .
LIST.
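
With 5000 cases it may be worth checking the flags before trusting them. One possible check, assuming the NameA, NameB and NameC variables created above, is to count how many names were found in each case and then list the strings where none matched, since those are often typos or spelling variants. The variable nFound is a hypothetical helper introduced only for this sketch.

```spss
* Sketch: nFound is a hypothetical helper variable for checking the flags.
COMPUTE nFound = SUM(NameA, NameB, NameC).
FREQUENCIES VARIABLES = nFound.

* Show the strings in which no name was found.
TEMPORARY.
SELECT IF (nFound EQ 0).
LIST VARIABLES = StringVar.
```

Because of TEMPORARY, the SELECT IF applies only to the LIST that follows, so no cases are dropped from the working file.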

HTH,
Marta GG

Re: One String Variable into Many Binary Variables

Mike
In reply to this post by Joseph Schneider
You need to be more specific about certain things.  For example,
are all of the Name variables left-justified, that is, does every
name start in the first position?  If so, consider the following:

STRING test_A, test_B, test_C (a1).

compute test_A=nameA.
compute test_B=nameB.
compute test_C=nameC.

*Note: The test variables are one character long while the name
*variables are some arbitrary length.  Setting test = name means
*that only the first position of name is inserted into test -- the
*rest of the name variable is right trimmed;  someone correct
*me if this has changed.

compute Dichot_1=0.
compute Dichot_2=0.
compute Dichot_3=0.

if(test_A ne " ")Dichot_1=1.
if(test_B ne " ")Dichot_2=1.
if(test_C ne " ")Dichot_3=1.

Desc var=Dichot_1, Dichot_2, Dichot_3.

The means from the Descriptives procedure will tell you what proportion
of cases have each name (or, more precisely, each name field without an
initial blank); multiply by the number of cases to get the counts.
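
If relying on truncation during assignment feels uncertain, the first character can be tested explicitly instead. This is a small sketch under the same assumption as the code above, namely that nameA, nameB and nameC already exist as separate string variables (the original question has a single string variable, so this applies only once the names have been split out). It replaces the one-character test variables and the IF statements rather than adding to them.

```spss
* Sketch: each flag is 1 when the first character is non-blank, 0 otherwise.
COMPUTE Dichot_1 = (CHAR.SUBSTR(nameA, 1, 1) NE " ").
COMPUTE Dichot_2 = (CHAR.SUBSTR(nameB, 1, 1) NE " ").
COMPUTE Dichot_3 = (CHAR.SUBSTR(nameC, 1, 1) NE " ").
```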

There may be simpler or more elegant ways of doing this.  Also,
this is based on knowledge of how some old SPSS operations should
proceed.

-Mike Palij
New York University
[hidden email]


----- Original Message -----
From: "Joseph Schneider" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, March 09, 2010 10:47 AM
Subject: One String Variable into Many Binary Variables


Re: One String Variable into Many Binary Variables

Maguin, Eugene
In reply to this post by Joseph Schneider
Joseph,

Bruce and Marta have already provided suggestions and solutions but I want
to just check on something that was not clear to me from your posting. You
said:

> I have a string variable that I need to decompose into multiple binary
> variables.  The String Variable potentially has the following characters:
>
> Name A(comments) Name B (comments) Name C (comments)

I want to know what Name A, Name B, and Name C are. Does the variable on
each record contain at least one of the strings 'Name A', 'Name B', and
'Name C'?
OR, is Name A in your example data actually replaced by a person's name in
the real data? AND, if so, does  the name recorded in Name A remain the same
from record to record?

I may be imagining something that is not there but sometimes Car A is a
'Ford' and sometimes it is a 'Toyota' and not just 'Car A'.

Gene Maguin
