advice sought

advice sought

Rod Turner
Colleagues
I hope this forum can help.
I am looking to investigate job requirements by carrying out a content analysis of job ads on popular job sites in Australia (e.g. SEEK, MYCAREER, CAREERONE). Can anyone recommend suitable software that has been used to do this? Also, can anyone suggest a convenient way of automatically reading and storing the advertisements' content in a form suitable for later analysis?
 
Thanks
Rod

Re: advice sought

Björn Türoque
Rod,

What you are describing is quite a large undertaking. It will take a lot
of time to set everything up and to carry out all of that content analysis.
I think your search for a quick way to do the analysis will lead you down
one of two paths: either creating a custom program, or putting a great many
hours of manual work into the project.

If you don't mind investing the time, there are some programs out there that
can be helpful in storing web pages for analysis and coding at a later date.
The one program I would recommend is called Website Extractor. It visits
web pages and archives each entire page onto your hard drive for offline
viewing. This way you can go onto the web at regular intervals, download the
content of the page, and spend time coding your findings at a later date.
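
For anyone who would rather script the archiving step than use a packaged tool, a minimal Python sketch follows. It assumes the pages can be retrieved with a plain HTTP request and that the site's terms allow automated retrieval. The function names, the file-naming scheme, and the UTF-8 decoding are all choices made for this illustration, not part of any program mentioned in this thread.

```python
from datetime import datetime
from pathlib import Path
from urllib.request import urlopen


def fetch_page(url: str) -> str:
    """Download one page and return its text (UTF-8 is an assumption)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def save_snapshot(html: str, label: str, folder: Path, when: datetime) -> Path:
    """Write the page text to <folder>/<label>_<timestamp>.html."""
    folder.mkdir(parents=True, exist_ok=True)
    stamp = when.strftime("%Y%m%dT%H%M%S")  # timestamp keeps repeated visits apart
    target = folder / f"{label}_{stamp}.html"
    target.write_text(html, encoding="utf-8")
    return target
```

Calling save_snapshot(fetch_page(url), "seek", Path("ads"), datetime.now()) at each visit would leave one dated file per download, ready for coding later.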

In terms of analyzing the text of the advertisements, you can copy the text
off of the web pages and store it in a program such as "SPSS - Text
Analysis" or NVIVO. These programs do not entirely do all of the work for
you, but they can speed along the process. Even using these programs you
have to read all of your entries to ensure they are classified correctly and
to categorize them into categories not detected by the text analyzer. Do not
forget that in content analysis a lot of the information you take away not
only comes from the number of times individual words are used, but also the
tone of the piece and how it is positioned.
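
The word-counting part of this, at least, is easy to automate. The sketch below is a rough Python illustration rather than a substitute for the packages named above: the two sample ads are invented, and the tokenising rule (lower-case runs of letters and apostrophes) is an assumption that real ad text may need refined.

```python
import re
from collections import Counter


def word_counts(ads: list[str]) -> Counter:
    """Count how often each word appears across a collection of ad texts."""
    counts = Counter()
    for text in ads:
        # Lower-case the text and keep runs of letters/apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Invented sample ads, used only to exercise the function.
sample_ads = [
    "Experienced analyst required. SPSS skills essential.",
    "Graduate analyst wanted; SPSS or SAS experience preferred.",
]
counts = word_counts(sample_ads)
print(counts.most_common(5))
```

Counts like these only cover the frequency side; tone and positioning still need a human reader, as noted above.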

One final question: are you sure content analysis is the best way to
go? Perhaps if you let us know what your research goals are, we can come up
with alternative methodologies for you and your research partners to
employ. Notice how I assume you have research partners working on this
project with you... Make sure that in your research design you find a way to
measure inter-rater reliability; that is something the list can
definitely help you with.
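
One common measure of inter-rater reliability for two coders is Cohen's kappa: observed agreement corrected for the agreement expected by chance. The Python sketch below is one way to compute it, assuming exactly two raters who each assign one category per ad. The category labels in the example are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented codes for four ads.
coder_1 = ["tech", "tech", "soft", "soft"]
coder_2 = ["tech", "soft", "soft", "soft"]
kappa = cohens_kappa(coder_1, coder_2)
```

Here the coders agree on 3 of 4 ads (0.75), chance agreement is 0.5, and kappa comes out at 0.5.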


Don


Breaking up string variables

Matthew Reeder
Hi all,

  Easy question. I have a string variable, we'll call "location", in the form of "Ithica-NY-US." I need to extract the city component of this variable to create a string "city" variable, so, for present purposes, all I would really be concerned with is Ithica.

  I tried the following (below), which uses RTRIM, figuring that I could just conveniently "trim away" from the right. The first step was to compute citystat (e.g., Ithica-NY), on the assumption that RTRIM would trim off everything from the first hyphen on the right. Since I wanted only the city, I would then compute the city variable by applying RTRIM again, this time to citystat. But it doesn't quite do the job (actually, it doesn't do much of anything), so I'm likely either using RTRIM incorrectly or I need a different function. This seems like a pretty routine procedure, so I took a quick browse through the user's guide, but couldn't find much. What is the easiest way to do this?

  STRING citystat (A20) .
  COMPUTE citystat=RTRIM(location, "-").
  execute .

  STRING city (A20) .
  COMPUTE city=RTRIM(citystat, "-").
  execute .


  Thanks,

  - Matt



Re: Breaking up string variables

Marta Garcia-Granero
Hi Matthew:

Try this:

* Sample dataset*.
DATA LIST LIST/location(A20).
BEGIN DATA
"Ithica-NY-US."
END DATA.

STRING city (A20) .
COMPUTE city=SUBSTR(location,1,INDEX(location,"-")-1).
LIST.

HTH,
Marta
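
For readers more at home in Python, the sketch below mirrors both halves of this exchange: why the RTRIM attempt leaves the value unchanged, and what the SUBSTR/INDEX line does instead. It is an illustration outside SPSS, and the function name is made up for the example.

```python
location = "Ithica-NY-US."

# rstrip("-") removes hyphens only at the very end of the string, much as
# RTRIM(location, "-") strips only trailing instances of that character.
# This value does not end in a hyphen, so nothing is removed.
unchanged = location.rstrip("-")


def city_before_first_hyphen(value: str) -> str:
    """Return the text before the first hyphen (the whole value if none)."""
    position = value.find("-")  # 0-based index, or -1 when no hyphen exists
    return value.strip() if position == -1 else value[:position]


city = city_before_first_hyphen(location)
print(repr(unchanged), repr(city))
```

Like the INDEX version, this cuts at the first hyphen, so it assumes the city name itself contains none.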


Re: Breaking up string variables

Marks, Jim
Watch out for "Wilkes-Barre-PA-US"

:~)

--jim


Re: Breaking up string variables

Marta Garcia-Granero
Marks, Jim wrote:
> Watch out for "Wilkes-Barre-PA-US"
>
>
Thanks for pointing it out. How about this then?:

* Sample dataset*.
DATA LIST LIST/location(A20).
BEGIN DATA
"Ithica-NY-US."
"Wilkes-Barre-PA-US"
END DATA.

STRING #step city (A20).
* In two steps *.
COMPUTE #step = SUBSTR(location,1,RINDEX(location,"-")-1).
COMPUTE city = SUBSTR(#step,1,RINDEX(#step,"-")-1).
LIST.
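
The same two-step idea (find the last hyphen, cut, repeat) can be sketched in Python for comparison. This is an illustration only: the function name is invented, and it assumes every value carries at least the two hyphens that separate state and country.

```python
def city_two_step(location: str) -> str:
    """Drop the country, then the state, by cutting at the last hyphen twice."""
    value = location.strip()
    if value.count("-") < 2:
        raise ValueError("expected at least two hyphens: city-state-country")
    step = value[: value.rfind("-")]  # remove "-country"
    return step[: step.rfind("-")]    # remove "-state"


for sample in ["Ithica-NY-US.", "Wilkes-Barre-PA-US"]:
    print(sample, "->", city_two_step(sample))
```

Because the cuts are made from the right, any number of hyphens inside the city name is left alone; value.rsplit("-", 2)[0] does the same thing in one expression.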



Re: Breaking up string variables

Peck, Jon
In reply to this post by Marks, Jim
Here is a little Python code to do this job - and it allows for Wilkes-Barre.  Explanation follows the code.  This approach does not use regular expressions, since the problem is so simple, but a regular expression could be used to do a much more complex extraction.

This code assumes that the values always have the specified form.  It would need a little modification to handle blank or more irregular values.


data list free/cityStateCountry(a30).
begin data.
ithaca-NY-US
chicago-IL-US
wilkes-barre-pa-ca
end data.

begin program.
import spss
from spssdata import *

cursor = Spssdata(indexes='cityStateCountry', accessType='w')
cursor.append(vdef("city", vtype=20))
cursor.commitdict()

for case in cursor:
        city = "-".join(case.cityStateCountry.split("-")[:-2])
        cursor.casevalues([city])
cursor.CClose()
end program.


- The cursor = line gets access to the cityStateCountry variable in the active dataset and specifies write mode.
- cursor.append defines a new string variable named city with a width of 20  (A20).
- cursor.commitdict gets ready to pass the data.
- the for case.. line loops over the data.
- city = gets the city value out of the cityStateCountry variable.  It first splits up the variable at each "-" into a list of values.  Then it takes all but the last two items (the state and country) and joins the items back together with the "-" again.  That's how it accommodates Wilkes-Barre.
- cursor.casevalues passes the city value to SPSS
- cursor.CClose ends the access to the data.
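
As a sketch of the regular-expression route mentioned above, the plain-Python function below also copes with blank or irregular values by returning an empty string. It uses only the standard re module and is not part of the spssdata module; the pattern, the function name, and the rule that state and country are exactly two letters are assumptions for this example. Something like it could presumably stand in for the split/join expression inside the loop.

```python
import re

# city, then "-" + two-letter state, then "-" + two-letter country.
LOCATION_PATTERN = re.compile(
    r"^(?P<city>.+)-(?P<state>[A-Za-z]{2})-(?P<country>[A-Za-z]{2})$"
)


def extract_city(value):
    """Return the city part, or "" when the value is blank or malformed."""
    text = (value or "").strip()  # drop the padding of fixed-width strings
    match = LOCATION_PATTERN.match(text)
    return match.group("city") if match else ""


for sample in ["ithaca-NY-US", "wilkes-barre-pa-ca      ", "", "unknown"]:
    print(repr(sample), "->", repr(extract_city(sample)))
```

The greedy city group backtracks just far enough to leave the final two codes, so hyphenated city names survive intact.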

This code requires SPSS 14.0.1 or later, Python, the Python programmability plug-in, and a few modules from SPSS Developer Central (www.spss.com/devcentral).

HTH.
Jon Peck




Re: Breaking up string variables

Peck, Jon
In reply to this post by Marta Garcia-Granero
Well, what about
aix-en-provence-pa-us
?


Re: Breaking up string variables

Melissa Ives
In reply to this post by Matthew Reeder
It may be better to work from the right, rather than from the left to find the characters prior to the 2nd to last hyphen.

Melissa



PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.

Re: Breaking up string variables

Melissa Ives
In reply to this post by Matthew Reeder
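Or, if the last 6 characters are always the ones to drop (a 2-letter state code, a 2-letter country code, and 2 hyphens),
you could do something like the following (the RTRIM is there so that LENGTH does not count the trailing blanks of the fixed-width string):
        compute city=substr(location,1,length(rtrim(location))-6).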

Then it never matters how many hyphens there are within the city name.
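
A Python sketch of the same fixed-suffix idea, for comparison. The function name and the default of six characters are assumptions taken from the description above.

```python
def city_fixed_suffix(location: str, suffix_length: int = 6) -> str:
    """Drop a fixed-length "-ST-CC" suffix after removing trailing blanks."""
    value = location.rstrip()  # padding must go first, or the cut lands too early
    if len(value) <= suffix_length:
        return ""              # too short to hold a city plus the suffix
    return value[:-suffix_length]


for sample in ["Wilkes-Barre-PA-US", "aix-en-provence-pa-us    ", "Ithica-NY-US."]:
    print(repr(sample), "->", repr(city_fixed_suffix(sample)))
```

The approach depends entirely on the suffix really being six characters: a stray trailing character, such as the period in the sample value "Ithica-NY-US.", shifts the cut by one and leaves a hyphen on the city.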

Melissa


Re: Breaking up string variables

Hal 9000
In reply to this post by Peck, Jon
Haha, the obvious solution is to work backwards from the two hyphens
separating state and country. There are no hyphenated states, right?

Re: Breaking up string variables

Marta Garcia-Granero
In reply to this post by Melissa Ives
Melissa Ives wrote:
> It may be better to work from the right, rather than from the left to find the characters prior to the 2nd to last hyphen.
>

That's exactly what my solution does (uses RINDEX instead of INDEX).

Marta


Re: Breaking up string variables

Marta Garcia-Granero
In reply to this post by Peck, Jon
Peck, Jon wrote:
> Well, what about
> aix-en-provence-pa-us
> ?
>

No problem with the solution I provided (using RINDEX) since it works in
two steps, eliminating country first, and then state, leaving the city
untouched (even if it has a bunch of hyphens)

Regards,
Marta
