Extracting characters from variable name


Fiveja
Every year, Census releases new population estimates for the most recent year. The dataset they provide hard-codes the year in the variable name (e.g., POPESTIMATE2013).

How can I extract the year element (e.g., 2013) from the variable name and put that value in a new variable, CensusYear? I can count on the year portion always being 4 characters (e.g., "2013") and always starting at the 12th character (i.e., right after "POPESTIMATE").

My understanding is that the char.substr function only extracts characters from data values, not from variable names.
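Inside a BEGIN PROGRAM block, a variable name is an ordinary Python string, so the fixed-position rule above becomes a slice. The 12th character (counting from 1, as CHAR.SUBSTR does) is index 11 in Python's 0-based indexing. A minimal sketch; year_from_name is a hypothetical helper, and it assumes the name is exactly POPESTIMATE followed by four digits:

```python
PREFIX = "POPESTIMATE"  # 11 characters, so the year starts at character 12


def year_from_name(name):
    """Return the year embedded in a name such as POPESTIMATE2013."""
    # Character 12 (counting from 1) is index 11 in Python, so the
    # four year characters 12-15 are the slice name[11:15].
    year = name[11:15]
    if len(name) != 15 or not name.upper().startswith(PREFIX) or not year.isdigit():
        raise ValueError("expected POPESTIMATE plus 4 digits, got %r" % name)
    return int(year)


print(year_from_name("POPESTIMATE2013"))  # prints 2013
```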

Re: Extracting characters from variable name

Jon K Peck
This would only make sense if the year governs the entire dataset, but then what would be the point of putting it in a new variable, where it would be a constant?

If you want to do it, though, it would require a few lines of Python code like this.  It finds the POPESTIMATE<year> variable name, extracts the last four characters, and runs a compute on that value.

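begin program.
import spss, spssaux

# Find the variable named POPESTIMATE followed by four digits and keep
# the last four characters of its name (the year).
yearpart = spssaux.VariableDict(pattern=r"POPESTIMATE\d\d\d\d").variables[0][-4:]
# Create the constant CensusYear. Like any transformation, it stays pending
# until the data are next read (e.g., by EXECUTE or a procedure).
spss.Submit("""Compute CensusYear = %s.""" % yearpart)
end program.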
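For comparison, the same lookup written as plain functions over a list of names can be tested outside SPSS, and it stops with a clear message when the file has no POPESTIMATE#### variable or more than one (the one-liner above simply takes the first match and fails if there is none). A sketch: find_census_year and census_year_commands are hypothetical names, and the list of names is assumed to come from spssaux.VariableDict().variables.

```python
import re

# "POPESTIMATE" followed by exactly four digits and nothing else.
YEAR_VAR = re.compile(r"POPESTIMATE(\d{4})$", re.IGNORECASE)


def find_census_year(names):
    """Return the year from the single POPESTIMATE#### name in `names`."""
    years = [m.group(1) for m in (YEAR_VAR.match(n) for n in names) if m]
    if len(years) != 1:
        raise ValueError("expected one POPESTIMATE#### variable, found %d" % len(years))
    return int(years[0])


def census_year_commands(names):
    """Build the SPSS commands that store the year as a constant variable."""
    year = find_census_year(names)
    # EXECUTE makes the pending COMPUTE run immediately.
    return ["COMPUTE CensusYear = %d." % year, "EXECUTE."]


print(census_year_commands(["State", "POPESTIMATE2013"]))
# prints ['COMPUTE CensusYear = 2013.', 'EXECUTE.']
```

Inside BEGIN PROGRAM, the returned list could be handed to spss.Submit, which accepts a list of command strings as well as a single string.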

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Fiveja <[hidden email]>
To:        [hidden email]
Date:        05/04/2015 08:18 AM
Subject:        [SPSSX-L] Extracting characters from variable name
Sent by:        "SPSSX(r) Discussion" <[hidden email]>







--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Extracting-characters-from-variable-name-tp5729482.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: Extracting characters from variable name

Maguin, Eugene
In reply to this post by Fiveja
Perhaps you've already thought of my suggestion and rejected it, but if not, here it is.
If you have a single-year file with many variables, it seems to me that adding a year variable is fairly trivial: just a COMPUTE statement. The big problem is the rename operation. Somebody who knows Python (and I see that Jon has already done so) will post code to strip the year digits out of the variable names, and if you use Python, that is the way to go: a documented, reusable piece of code. The labor-intensive alternative is to use DISPLAY to list the variable names, use a text editor to find and replace the year, and then run a RENAME VARIABLES command to do the actual renaming.
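The rename step can also be generated rather than typed by hand. A minimal sketch, assuming every year-stamped variable simply ends in the four-digit year; rename_syntax is a hypothetical helper name. It strips the year from each such name and emits one RENAME VARIABLES command with an (old = new) pair per variable. It does not check whether a shortened name already exists in the file.

```python
def rename_syntax(names, year):
    """Build one RENAME VARIABLES command that strips `year` from the end of names."""
    suffix = str(year)
    pairs = []
    for name in names:
        stem = name[:-len(suffix)]
        # Only rename names that end in the year and still have a stem left.
        if name.endswith(suffix) and stem:
            pairs.append("(%s = %s)" % (name, stem))
    if not pairs:
        return ""
    return "RENAME VARIABLES " + " ".join(pairs) + "."


print(rename_syntax(["State", "POPESTIMATE2013"], 2013))
# prints RENAME VARIABLES (POPESTIMATE2013 = POPESTIMATE).
```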

Gene Maguin



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Fiveja
Sent: Monday, May 04, 2015 10:18 AM
To: [hidden email]
Subject: Extracting characters from variable name


Re: Extracting characters from variable name

Fiveja
In reply to this post by Jon K Peck
Thank you, Jon. That worked. I had to add an execute statement for the transformation to run.

The year does govern the entire dataset. I store the year as a data value because the cases for this year will later be appended to a file containing data for previous years (with the variables State, Population, and CensusYear). That builds a historical data file covering all years, in which each year's cases can be distinguished by CensusYear.
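The yearly workflow described above could be scripted along these lines. This is a sketch, not tested against the Census file: append_year_syntax and history_path are hypothetical names, the rename target Population is taken from the history file's variables listed above, and the path is assumed to contain no single quotes. It only builds the command strings; inside BEGIN PROGRAM they would be passed to spss.Submit.

```python
def append_year_syntax(year, history_path, value_var="POPESTIMATE"):
    """Build SPSS syntax that tags this year's cases and appends them to a history file."""
    old_name = "%s%d" % (value_var, year)  # e.g. POPESTIMATE2013
    return [
        # Give the year-stamped variable the name used in the history file.
        "RENAME VARIABLES (%s = Population)." % old_name,
        # Store the year as data so appended years stay distinguishable.
        "COMPUTE CensusYear = %d." % year,
        # Combine cases: the history file's cases first, then the active dataset's.
        "ADD FILES /FILE='%s' /FILE=*." % history_path,
        "EXECUTE.",
    ]


for command in append_year_syntax(2013, "history.sav"):
    print(command)
```

ADD FILES replaces the active dataset with the combined cases, so a SAVE OUTFILE command would still be needed to write the updated history file back to disk.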