Parsing String Name Variable

Maggie Greiner
Dear List.

I know this is a simple question, but I've messed around with substr and
rindex and can't get the syntax to work. Truth be told, I'm not sure what
I'm doing anyway.

I have a file with 2300 records. The first, middle and last names are in one
variable (name). I want to parse out the first, middle and last names into three
separate variables. A single space separates each of the name
components. The name components are of varying lengths.

This is what I have:

Name
Brenda Jones
Chuck Fred Smith
Alyssa Gwen Mulder
Steven Patrick Leesman

This is what I want

First   Middle     Last
Brenda              Jones
Steven  Patrick     Leesman

Thanks for any help.

Maggie

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Parsing String Name Variable

Bruce Weaver
Administrator
How about this?

data list / name(a25).
begin data
Brenda Jones
Chuck Fred Smith
Alyssa Gwen Mulder
Steven Patrick Leesman
end data.

string first middle last (a15).
compute #sp1 = index(name," ").
compute #sp2 = rindex(rtrim(name)," ").
compute first = substr(name,1,#sp1-1).
compute last = substr(name,#sp2+1).
if #sp2 NE #sp1 middle = substr(name,#sp1+1,#sp2-#sp1-1).
list.

OUTPUT:
name                      first           middle          last
 
Brenda Jones              Brenda                          Jones
Chuck Fred Smith          Chuck           Fred            Smith
Alyssa Gwen Mulder        Alyssa          Gwen            Mulder
Steven Patrick Leesman    Steven          Patrick         Leesman
 
Number of cases read:  4    Number of cases listed:  4

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Parsing String Name Variable

Rick Oliver-3
In reply to this post by Maggie Greiner
There are numerous ways to accomplish this. Here's one:

*sample data.
data list list (",") /name (a100).
begin data
Brenda Jones
Chuck Fred Smith
Alyssa Gwen Mulder
Steven Patrick Leesman
end data.
*end sample data, start code.
string fname mname lname #name (a100).
compute fname=substr(name, 1, index(name, " ")-1).
compute lname=substr(name, rindex(name, " ")+1).
compute #name=replace(name, fname, "").
compute #name=replace(#name, lname, "").
compute mname=ltrim(#name).
list.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]




From:        Maggie <[hidden email]>
To:        [hidden email]
Date:        07/05/2012 04:13 PM
Subject:        Parsing String Name Variable
Sent by:        "SPSSX(r) Discussion" <[hidden email]>







Re: Parsing String Name Variable

David Marso
Administrator
In reply to this post by Maggie Greiner
For simple two- or three-element parsing, use Bruce's or Rick's solutions. OTOH, if you run into the more general problem of splitting multiple elements out of arbitrarily long sequences, the following ideas are useful.

data list / str 1-40 (A).
begin data
john stuart mills
captain crunch
end data.

string fname mname lname (A20) #str (a40).
VECTOR names=fname TO lname.
* Work on a left-trimmed scratch copy and peel off one element per pass.
COMPUTE #str=LTRIM(str).
COMPUTE #I=1.
LOOP .
+  COMPUTE #Found=INDEX(#str," ").
+  COMPUTE names(#I)=SUBSTR(#str,1,#Found).
+  COMPUTE #str=LTRIM(SUBSTR(#str,#Found+1)).
+  COMPUTE #I=#I+1.
END LOOP IF #str=" ".
** Fix middle and last names: with only two elements, move the second to lname **.
DO IF #I=3.
+  COMPUTE names(3)=names(2).
+  COMPUTE names(2)=" ".
END IF.

list.
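
For anyone who finds the loop easier to follow in a general-purpose language, here is a rough Python sketch of the same steps: find the first blank, copy the piece before it, left-trim the remainder, repeat. It is an illustration only, not SPSS code. The names split_on_blanks and to_first_middle_last are made up for this sketch, and keeping only the first three pieces is an assumption of the sketch (the VECTOR above has just three slots).

```python
def split_on_blanks(text):
    """Peel off blank-delimited pieces one at a time, like the LOOP above."""
    pieces = []
    rest = text.lstrip(" ")                   # COMPUTE #str = LTRIM(str)
    while rest.strip(" "):                    # stop once only blanks remain
        found = rest.find(" ")                # like INDEX, but 0-based; -1 if absent
        if found == -1:                       # no blank left: take the whole remainder
            found = len(rest)
        pieces.append(rest[:found])           # the piece before the blank
        rest = rest[found + 1:].lstrip(" ")   # drop the piece, then LTRIM
    return pieces


def to_first_middle_last(text):
    """Place the pieces in three slots; a two-part name has no middle."""
    pieces = split_on_blanks(text)
    if len(pieces) == 2:                      # same fix-up as DO IF #I=3
        return pieces[0], "", pieces[1]
    # Assumption of this sketch: pad short lists, ignore pieces beyond the third.
    first, middle, last = (pieces + ["", "", ""])[:3]
    return first, middle, last


print(to_first_middle_last("john stuart mills"))
print(to_first_middle_last("captain crunch".ljust(40)))  # padded like an A40 value
```

The second call pads the value to 40 characters to mimic a fixed-width string variable; the trailing blanks do not produce extra pieces.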
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Parsing String Name Variable

Jon K Peck
Or this, using a Python extension command.
data list / name 1-40 (A).
begin data
john stuart mills
captain crunch
end data.
dataset name names.

spssinc trans result = first middle last  type=15
/formula "string.split(name)".
do if last = "".
compute last = middle.
compute middle = "".
end if.
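
To see why the DO IF block is there, it may help to look at what a whitespace split returns in plain Python. The sketch below is an illustration only and does not call SPSSINC TRANS; three_slots is a hypothetical helper name. It assumes, as the DO IF above implies, that a two-part name fills the first two result variables and leaves the third empty, so the second piece has to be moved into the last slot.

```python
def three_slots(name):
    """Split on whitespace and pad to three slots, then apply the two-name fix-up."""
    pieces = name.split()                 # runs of blanks count as one separator
    first, middle, last = (pieces + ["", "", ""])[:3]
    if last == "":                        # same idea as: do if last = "".
        last, middle = middle, ""
    return first, middle, last


print(three_slots("john stuart mills"))   # ('john', 'stuart', 'mills')
print(three_slots("captain crunch"))      # ('captain', '', 'crunch')
```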

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        David Marso <[hidden email]>
To:        [hidden email]
Date:        07/05/2012 05:55 PM
Subject:        Re: [SPSSX-L] Parsing String Name Variable
Sent by:        "SPSSX(r) Discussion" <[hidden email]>







Re: Parsing String Name Variable

Albert-Jan Roskam
Cool, but it won't work with double names (double middle names, noble names). Here are a bunch of names I plucked from Wikipedia; the last few are noble names.

data list / naam 1-50 (A).
begin data
Henk Baars
Arjen de Baat
W.G. del Baere
Maarten den Bakker
Cees Bal
Frank van Bakel
Maas van Beek
Daniëlle Bekkering
Dirk Bellemakers
Chantal Beltman
Henk Benjamins
Camiel van den Bergh
Jan van Benthem van den Bergh
Johan van den Berch van Heemstede
Jan-Willem van Beresteyn
Hans-Willem van Bevervoorden van Oldemeule
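end data.
dataset name namen.
begin program python.
import re
def giveNames(name):
    # first name: everything before the first run of whitespace
    first = re.split(r"\s+", name)[0]
    # last name: everything from the first capitalized word after the first name
    last = " ".join(re.split(r"\s+(?=[A-Z])", name)[1:])
    # middle: whatever remains once first and last are removed
    # (re.escape keeps dots in initials such as "W.G." literal)
    middle = re.sub("(%s|%s)" % (re.escape(first), re.escape(last)), "", name).strip()
    return first, middle, last
end program.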
spssinc trans result = first middle last  type=30
  /formula "giveNames(naam)".
execute.
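
The function can also be tried outside SPSS before running it on the whole file. Below is a minimal standalone sketch in plain Python 3 using a handful of the names from the list above. give_names is a standalone variant of the function in the program block; it strips the input first and uses re.escape so that dots in initials such as "W.G." are matched literally. The rule it relies on (the last name starts at the first capitalized word after the first name) is a heuristic and will not suit every naming convention.

```python
import re


def give_names(name):
    """Split a 'First [particles] Last' name using capitalization as the cue."""
    name = name.strip()
    # First name: everything before the first run of whitespace.
    first = re.split(r"\s+", name)[0]
    # Last name: everything from the first capitalized word after the first name.
    last = " ".join(re.split(r"\s+(?=[A-Z])", name)[1:])
    # Middle: what is left once first and last are removed.
    pattern = "(%s|%s)" % (re.escape(first), re.escape(last))
    middle = re.sub(pattern, "", name).strip()
    return first, middle, last


samples = [
    "Henk Baars",
    "W.G. del Baere",
    "Camiel van den Bergh",
    "Jan van Benthem van den Bergh",
]
for sample in samples:
    print(give_names(sample))
```

For "Jan van Benthem van den Bergh" this gives "Jan", "van" and "Benthem van den Bergh", i.e. the lower-case particle before the first capitalized surname word ends up in the middle slot.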
 
Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: Jon K Peck <[hidden email]>
To: [hidden email]
Sent: Friday, July 6, 2012 3:00 AM
Subject: Re: [SPSSX-L] Parsing String Name Variable






Automatic reply: Parsing String Name Variable

Kylie
I am currently out of the office at a conference and will return on Friday July 13th. I will be monitoring my email for urgent issues, but otherwise I will get back to you as soon as possible after my return.

Thank you.

Re: Parsing String Name Variable

Maggie Greiner
In reply to this post by Rick Oliver-3
Wow! This has been helpful. Thanks to all of you for your syntax help. This has saved me hours and probably days because I sometimes have a short attention span. Thanks Albert, I didn't think about double names, and even where I live names are not as straightforward as they used to be (e.g. Fred Jones); the names of people coming into our systems are more and more complicated. I'll check my list and take that into account.
 
Bruce, I'm using your syntax, but there are a couple of things I don't understand, and I want to learn. I understand everything up to "name,1,#sp1-1" and "#sp2+1". What does that syntax do?
 
Again, thanks to everyone for your help.
 
Maggie.

 
 
 
From:        Bruce Weaver <[hidden email]>
To:        [hidden email]
Date:        07/05/2012 05:03 PM
Subject:        Re: Parsing String Name Variable
Sent by:        "SPSSX(r) Discussion" <[hidden email]>



 
 

Re: Parsing String Name Variable

Rick Oliver-3
They are index values that identify the position (character count) of the first space encountered working from the left (INDEX) and the first space encountered working from the right (RINDEX).

The general form of the SUBSTR function is SUBSTR(varname, start location, number of characters). The third argument is optional. If it's omitted, all characters from start location to the end will be included.

Everything between the first and last space will be included in the middle name.
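
If it helps to see the arithmetic worked through, here is a small Python sketch that imitates the 1-based positions described above. It is an illustration only: index1, rindex1, substr1 and parse_name are hypothetical helpers written for this post, not SPSS functions, and they cover only the simple cases used in this thread.

```python
def index1(s, sub):
    """1-based position of the first occurrence of sub, 0 if absent (like INDEX)."""
    return s.find(sub) + 1


def rindex1(s, sub):
    """1-based position of the last occurrence of sub, 0 if absent (like RINDEX)."""
    return s.rfind(sub) + 1


def substr1(s, start, length=None):
    """Substring from a 1-based start; runs to the end when length is omitted."""
    begin = start - 1
    return s[begin:] if length is None else s[begin:begin + length]


def parse_name(name):
    """Apply the first-space / last-space logic from Bruce's syntax."""
    name = name.rstrip()                      # plays the role of RTRIM(name)
    sp1 = index1(name, " ")                   # position of the first space
    sp2 = rindex1(name, " ")                  # position of the last space
    first = substr1(name, 1, sp1 - 1)         # characters 1 .. sp1-1
    last = substr1(name, sp2 + 1)             # from sp2+1 to the end
    middle = substr1(name, sp1 + 1, sp2 - sp1 - 1) if sp2 != sp1 else ""
    return sp1, sp2, first, middle, last


print(parse_name("Chuck Fred Smith"))   # (6, 11, 'Chuck', 'Fred', 'Smith')
print(parse_name("Brenda Jones"))       # (7, 7, 'Brenda', '', 'Jones')
```

For "Chuck Fred Smith" the spaces are at positions 6 and 11, so the middle name starts at 7 and is 11-6-1 = 4 characters long. For "Brenda Jones" both positions are 7, which is why no middle name is assigned.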

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]
Phone: 312.893.4922 | T/L: 206-4922




From:        Maggie Greiner <[hidden email]>
To:        [hidden email]
Date:        07/06/2012 08:57 AM
Subject:        Re: Parsing String Name Variable
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




 
 

Re: Parsing String Name Variable

Bruce Weaver
Administrator
In reply to this post by Maggie Greiner
Hi Maggie.  As Rick explained, my #sp1 and #sp2 scratch variables were indices giving the positions of the first and last spaces.  To see this explicitly, you could run a version of my syntax that removes the hash characters (#) from those variables.  That will make them regular variables that you can then inspect in the data file afterwards to confirm that they give the positions of the first and last spaces -- and when there is only one space (i.e., only first and last names), they will be equal.

string first middle last (a15).
compute sp1 = index(name," "). /* position of 1st space.
compute sp2 = rindex(rtrim(name)," "). /* position of last space.
compute first = substr(name,1,sp1-1). /* Sub-string from 1 to SP1-1.
compute last = substr(name,sp2+1). /* Sub-string from SP2+1 to the end.
* If sp2 EQ sp1, there are only two names, so no middle name.
* If sp2 NE sp1, grab everything between the first and last spaces and assign to MIDDLE.
if sp2 NE sp1 middle = substr(name,sp1+1,sp2-sp1-1).
list.


HTH.



Re: Parsing String Name Variable

David Marso
Administrator
Alternatively, keep them as #scratch variables and use PRINT to inspect them in the output.
Note a few mods to Bruce's code (the CHAR. versions of the string functions) and a fix in the middle computation.
I like/tend to use CAPITAL letters for SPSS commands and lower case for vars.
I find it easier to quickly scan code for basic operations and logic.

STRING first middle last (A15).
COMPUTE #sp1 = CHAR.INDEX(name," "). /* position of 1st space.
COMPUTE #sp2 = CHAR.RINDEX(RTRIM(name)," "). /* position of *last* space.
COMPUTE first = CHAR.SUBSTR(name,1,#sp1-1). /* Sub-string from 1 to #SP1-1.
COMPUTE last = CHAR.SUBSTR(name,#sp2+1). /* Sub-string from #SP2+1 to the end.
IF #sp2 NE #sp1 middle = CHAR.SUBSTR(name,#sp1+1,#sp2-#sp1-1).

PRINT / name first last #sp1 #sp2.
EXE.

Bruce Weaver wrote
Hi Maggie.  As Rick explained, my #sp1 and #sp2 scratch variables were indices giving the positions of the first and second spaces.  To see this explicitly, you could run a version of my syntax that removes the hash characters (#) from those variables.  That will make them regular variables that you can then inspect in the data file afterwards to confirm that they give the positions of the 1st and 2nd spaces -- and when there is only one space (i.e., only first and last names), they will be equal.

string first middle last (a15).
compute sp1 = index(name," "). /* position of 1st space.
compute sp2 = rindex(rtrim(name)," "). /* position of 2nd space.
compute first = substr(name,1,sp1-1). /* Sub-string from 1 to SP1-1.
compute last = substr(name,sp2+1). /* Sub-string from SP2+1 to the end.
* If sp2 EQ sp1, there are only two names, so no middle name.
* If sp2 NE sp1, grab everything between the 2 spaces and assign to MIDDLE.
if sp2 NE sp1 middle = substr(name,#sp1+1,#sp2-#sp1-1).
list.


HTH.


Maggie wrote
Wow! This has been helpful. Thanks to all of you for your syntax help. This has saved me hours and probably days because I sometimes have a short attention span. Thanks Albert, I didn't think about double names, and even where I live names are not as straight forward as they used to be (e.g. Fred Jones); the names of  people coming into our systems are more and more complicated. I'll check my list and take that into account.
 
Bruce, I'm using your syntax, but there are a couple of things I don't understand, and I want to learn. I understand everything up to "name,1,#sp1-1" and "#sp2+1". What does that syntax do?
 
Again, thanks to everyone for your help.
 
Maggie.


________________________________

 
 
 
From:        Bruce Weaver <[hidden email]>
To:        [hidden email] 
Date:        07/05/2012 05:03 PM
Subject:        Re: Parsing String Name Variable
Sent by:        "SPSSX(r) Discussion" <[hidden email]>


________________________________




How about this?

data list / name(a25).
begin data
Brenda Jones
Chuck Fred Smith
Alyssa Gwen Mulder
Steven Patrick Leesman
end data.

string first middle last (a15).
compute #sp1 = index(name," ").
compute #sp2 = rindex(rtrim(name)," ").
compute first = substr(name,1,#sp1-1).
compute last = substr(name,#sp2+1).
if #sp2 NE #sp1 middle = substr(name,#sp1+1,#sp2-#sp1-1).
list.

OUTPUT:
name                      first           middle          last

Brenda Jones              Brenda                          Jones
Chuck Fred Smith          Chuck           Fred            Smith
Alyssa Gwen Mulder        Alyssa          Gwen            Mulder
Steven Patrick Leesman    Steven          Patrick         Leesman

Number of cases read:  4    Number of cases listed:  4


Maggie wrote
>
> Dear List.
>
> I know this is a simple question, but I've messed around with substr and
> rindex and can't get the syntax to work. Truth be told, I'm not sure what
> I'm doing anyway.
>
> I have a file with 2300 records. The First Middle Last names are in one
> variable (name). I want to parse out the first, middle and last into three
> separate variables. Spaces separate (one space) each of the name
> components. The name components are of varying lengths.
>
> This is what I have:
>
> Name
> Brenda Jones
> Chuck Fred Smith
> Alyssa Gwen Mulder
> Steven Patrick Leesman
>
> This is what I want
>
> First   Middle     Last
> Brenda              Jones
> Steven  Patrick     Leesman
>
> Thanks for any help.
>
> Maggie
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>


-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Parsing-String-Name-Variable-tp5714037p5714038.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Parsing String Name Variable

Bruce Weaver
Administrator
Thanks David.  I didn't think of PRINT.  

By the way, I LIKE the uppercase letters for commands too, but am usually too lazy to hold down the shift key while I type them.  ;-)

And before someone jumps in suggesting that I use auto-complete (or whatever it's called), I've turned it off because I find it gets in the way.



David Marso wrote
Alternatively keep them as #scratch variables and use PRINT to inspect them in the output.
Note a few mods (CHAR.) to Bruce's code and a fix in the middle computation.
I like/tend to use CAPITAL letters for SPSS commands and lower case for vars.  
I find it easier to quickly scan code for basic operations and logic.

STRING first middle last (A15).
COMPUTE #sp1 = CHAR.INDEX(name," "). /* position of 1st space.
COMPUTE #sp2 = CHAR.RINDEX(RTRIM(name)," "). /* position of *last* space.
COMPUTE first = CHAR.SUBSTR(name,1,#sp1-1). /* Sub-string from 1 to #SP1-1.
COMPUTE last = CHAR.SUBSTR(name,#sp2+1). /* Sub-string from #SP2+1 to the end.
IF #sp2 NE #sp1 middle = CHAR.SUBSTR(name,#sp1+1,#sp2-#sp1-1).

PRINT / name first last #sp1 #sp2.
EXE.

Bruce Weaver wrote
Hi Maggie.  As Rick explained, my #sp1 and #sp2 scratch variables were indices giving the positions of the first and second spaces.  To see this explicitly, you could run a version of my syntax that removes the hash characters (#) from those variables.  That will make them regular variables that you can then inspect in the data file afterwards to confirm that they give the positions of the 1st and 2nd spaces -- and when there is only one space (i.e., only first and last names), they will be equal.

string first middle last (a15).
compute sp1 = index(name," "). /* position of 1st space.
compute sp2 = rindex(rtrim(name)," "). /* position of 2nd space.
compute first = substr(name,1,sp1-1). /* Sub-string from 1 to SP1-1.
compute last = substr(name,sp2+1). /* Sub-string from SP2+1 to the end.
* If sp2 EQ sp1, there are only two names, so no middle name.
* If sp2 NE sp1, grab everything between the 2 spaces and assign to MIDDLE.
if sp2 NE sp1 middle = substr(name,sp1+1,sp2-sp1-1).
list.
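
To make the positions concrete, here is a rough Python sketch (not SPSS syntax) that mirrors the same arithmetic with 1-based positions. The function name split_name is made up for illustration, and the sketch assumes every name contains at least one space, as in the sample data.

```python
def split_name(name: str):
    """Mirror the SPSS logic above using 1-based positions.

    Assumption: the name contains at least one space (first + last).
    """
    trimmed = name.rstrip()            # like RTRIM(name): drop trailing padding
    sp1 = trimmed.find(" ") + 1        # like INDEX(name," "): position of 1st space
    sp2 = trimmed.rfind(" ") + 1       # like RINDEX(RTRIM(name)," "): position of last space
    first = trimmed[:sp1 - 1]          # SUBSTR(name,1,sp1-1): characters 1 to sp1-1
    last = trimmed[sp2:]               # SUBSTR(name,sp2+1): from sp2+1 to the end
    # SUBSTR(name,sp1+1,sp2-sp1-1): everything between the two spaces,
    # taken only when the two positions differ (i.e. there is a middle name).
    middle = trimmed[sp1:sp2 - 1] if sp2 != sp1 else ""
    return sp1, sp2, first, middle, last


for n in ["Brenda Jones", "Chuck Fred Smith"]:
    print(n, "->", split_name(n))
```

For "Chuck Fred Smith" the spaces sit at positions 6 and 11, so first is characters 1 to 5, middle is the 4 characters starting at position 7, and last starts at position 12. For "Brenda Jones" both positions are 7, so middle stays blank.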


HTH.


Maggie wrote
Wow! This has been helpful. Thanks to all of you for your syntax help. This has saved me hours and probably days because I sometimes have a short attention span. Thanks Albert, I didn't think about double names, and even where I live names are not as straightforward as they used to be (e.g. Fred Jones); the names of people coming into our systems are more and more complicated. I'll check my list and take that into account.
 
Bruce, I'm using your syntax, but there are a couple of things I don't understand, and I want to learn. I understand everything up to "name,1,#sp1-1" and "#sp2+1". What does that syntax do?
 
Again, thanks to everyone for your help.
 
Maggie.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Parsing String Name Variable

Maggie Greiner
In reply to this post by Maggie Greiner
This helped. I did remove the # and I could follow the logic of the code. Thanks!
 
Maggie




Re: Parsing String Name Variable

Art Kendall
In reply to this post by Bruce Weaver
The use of color in the syntax editor and the auto indent tool do a lot to help readability.

However, since long before the age of PCs, I have often asked SPSS to implement these things, but they have never been implemented.

Perhaps someone up on Python could write something to do what PRETTY did for FORTRAN in the 1972 era:
1) have all variables be lower case
2) have all functions have initial caps
3) have all reserved words be all caps
4) leave comments, labels, etc. alone
5) be sure there is a blank space on each side of any symbolic operator.



If my vague understanding of Python is correct, someone who knew what (s)he was doing could also
1) expand abbreviated syntax, e.g., if somebody posts syntax that says "freq", it would be expanded to "FREQUENCIES".
2) convert symbolic operators to the conventional operators: "&" to "AND", ">=" to "GE", etc.
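
As a rough illustration of the idea, a line-by-line formatter along these lines might look like the Python sketch below. Everything in it is an assumption made for the example: the name prettify, the tiny keyword, function, and abbreviation tables (a real tool would need the full SPSS vocabulary), and the choice to handle only comparison and logical symbols plus "=", not arithmetic operators.

```python
import re

# Hypothetical, deliberately tiny vocabularies for illustration only.
RESERVED = {"compute", "if", "string", "list", "execute", "frequencies",
            "and", "or", "not", "eq", "ne", "ge", "le", "gt", "lt"}
ABBREVIATIONS = {"freq": "frequencies", "exe": "execute"}
FUNCTIONS = {"index", "rindex", "rtrim", "substr"}
OPERATORS = {"&": "AND", "|": "OR", ">=": "GE", "<=": "LE",
             "~=": "NE", ">": "GT", "<": "LT"}   # "=" is kept, only spaced

TOKEN = re.compile(r"""
    (?P<comment>/\*.*$)                       # trailing /* comment: left alone
  | (?P<string>"[^"]*"|'[^']*')               # quoted literal: left alone
  | (?P<op>>=|<=|~=|[&|<>=])                  # symbolic operators
  | (?P<word>[#@$]?[A-Za-z]\w*(?:\.\w+)*)     # names and keywords
  | (?P<other>.)                              # anything else passes through
""", re.VERBOSE)


def prettify(line: str) -> str:
    """Apply the casing and spacing rules to one line of syntax."""
    if line.lstrip().startswith("*"):         # whole-line comment: untouched
        return line
    out, after_op = "", False
    for m in TOKEN.finditer(line):
        kind, text = m.lastgroup, m.group()
        if kind == "op":
            # One blank on each side, converting symbols to words where known.
            out = out.rstrip() + " " + OPERATORS.get(text, text) + " "
            after_op = True
            continue
        if after_op and text.isspace():       # avoid doubled blanks after an operator
            continue
        after_op = False
        if kind == "word":
            low = ABBREVIATIONS.get(text.lower(), text.lower())
            if low in RESERVED:
                text = low.upper()            # reserved words: all caps
            elif low in FUNCTIONS:
                text = low.capitalize()       # functions: initial cap
            else:
                text = low                    # everything else: treated as a variable
        out += text
    return out


print(prettify('if SP2~=SP1 Middle=SUBSTR(Name,SP1+1,SP2-SP1-1).'))
print(prettify("freq x."))
```

Anything the tables do not recognise is treated as a variable name and lower-cased, which is the main weakness of such a small sketch; a serious version would need a complete list of commands, subcommands, and functions.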


Art Kendall
Social Research Consultants