Rename variable if it has a certain substring

Fiveja
I have several SPSS datasets. One has a numeric variable called "SalesFigures_Car," the other "SalesFigures_House," etc. The names of these variables all start with the same substring, "SalesFigures_".

I'd like to rename any variable that starts with "SalesFigures_" to simply "SalesFigures."

I suspect the code would take the form of something like:

do if 'variable starts with the string, 'SalesFigures_'
rename variables (SalesFigures_XXX=SalesFigures).
end if.

But maybe I'm way off. Any suggestions?

Re: Rename variable if it has a certain substring

Jignesh Sutar
Yes, way off! What you are asking for is impossible: you cannot have two different variables with the same name. Variable names need to be unique; it is a basic principle of database management.



Is it that "SalesFigures_Car" and "SalesFigures_House" are mutually exclusive, i.e., for each row where there is data for Car there is none for House, and vice versa? If so, you may be thinking of combining them into a single variable.

For that you could do:

compute SalesFigure=sum(SalesFigures_Car,SalesFigures_House).


Re: Rename variable if it has a certain substring

Fiveja
Jignesh Sutar wrote
Yes, way off! What you are asking for is impossible: you cannot have two different variables with the same name. Variable names need to be unique; it is a basic principle of database management.
To clarify, these variables are in separate datasets:

"I have several SPSS datasets. One has a numeric variable called "SalesFigures_Car," the other "SalesFigures_House," etc."

I will be opening these separately and performing the rename.



Re: Rename variable if it has a certain substring

Fiveja
I found some Python code that gets me close (from http://spssx-discussion.1045642.n5.nabble.com/renaming-variables-deleting-the-first-part-of-the-name-td1071057.html).

The code below finds any variable prefixed with "SalesFigures_" (it will only find one).
It then creates a copy of the SPSS variable dictionary including only variables whose name fits this pattern (There will only be one variable in each dataset that fits this pattern).
It then loops over the variable (loop not really needed) and creates a new name, deleting the pattern and submitting the rename commands.

It will turn SalesFigures_Cars into SalesFiguresCars. Obviously not what I want but I think I'm on the right track. Just need to have it replace whatever is *after* the "_" with a blank ("").

begin program.
import spss, spssaux, re

pat="^[SalesFigures]+_"
vard = spssaux.VariableDict(pattern=pat)
for v in vard:
        newname = re.sub(pat, "SalesFigures", str(v))
        spss.Submit("RENAME VARIABLES (%s=%s)" % (v, newname))
end program.
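
A note on why the pattern above gives SalesFiguresCars: in Python's re syntax, square brackets define a character class, so [SalesFigures]+ matches any run of the letters S, a, l, e, s, F, i, g, u, r rather than the literal word, and the pattern stops at the underscore. The sketch below is plain Python (no SPSS needed); the helper name rename and the sample variable names are made up for illustration.

```python
import re

CHAR_CLASS = "^[SalesFigures]+_"   # pattern from the post above (a character class)
LITERAL = "^SalesFigures_.*"       # literal prefix plus everything after the underscore

def rename(pattern, name):
    """Apply the same substitution the posted loop performs on one name."""
    return re.sub(pattern, "SalesFigures", name)

# Only the prefix is replaced, so the suffix survives.
print(rename(CHAR_CLASS, "SalesFigures_Cars"))

# The whole name is matched, so only the replacement text remains.
print(rename(LITERAL, "SalesFigures_Cars"))

# The character class also accepts unrelated names built from the same letters.
print(bool(re.match(CHAR_CLASS, "gases_2015")))
```

The second pattern is only one possible fix; it assumes the suffix always follows a single literal "SalesFigures_" prefix.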

Re: Rename variable if it has a certain substring

Bruce Weaver
Administrator
In reply to this post by Fiveja

* Old school approach using a macro with SET ERRORS = NONE.

DEFINE !Rename (
 DataSetList = !CHAREND("/") /
 SuffixList = !CMDEND)
SET ERRORS=NONE.
!DO !D !in(!DataSetList)
 DATASET ACTIVATE !D.
 !DO !S !in(!SuffixList)
  RENAME VARIABLES (!CONCAT("SalesFigures_",!S) = SalesFigures).
 !DOEND
!DOEND
SET ERRORS=LISTING.
!ENDDEFINE.

* Generate 3 datasets with different SalesFigures variables.
NEW FILE.
DATASET CLOSE all.

DATA LIST free / SalesFigures_Car (F1).
BEGIN DATA
1 2 3 4 5
END DATA.
DATASET NAME d1.

DATA LIST free / SalesFigures_House (F1).
BEGIN DATA
1 2 3 4 5
END DATA.
DATASET NAME d2.

DATA LIST free / SalesFigures_Boat (F1).
BEGIN DATA
1 2 3 4 5
END DATA.
DATASET NAME d3.

***************************************.
* Call the macro to rename all of the variables.
SET MPRINT ON.
!Rename
 DataSetList = d1 d2 d3 /
 SuffixList = Car House Boat.
SET MPRINT OFF.
***************************************.

* Check that it worked.
* Stack the 3 datasets and LIST.
ADD FILES
 FILE = d1 / IN = d1 /
 FILE = d2 / IN = d2 /  
 FILE = d3 / IN = d3 .
EXECUTE.
DATASET NAME AllData.
DATASET ACTIVATE AllData.
* Close the original 3 datasets.
DATASET CLOSE all.
LIST.
* Variable name was changed to SalesFigures in all datasets.
* No error messages were displayed.

OUTPUT from LIST:

SalesFigures d1 d2 d3
 
      1       1  0  0
      2       1  0  0
      3       1  0  0
      4       1  0  0
      5       1  0  0
      1       0  1  0
      2       0  1  0
      3       0  1  0
      4       0  1  0
      5       0  1  0
      1       0  0  1
      2       0  0  1
      3       0  0  1
      4       0  0  1
      5       0  0  1

Number of cases read:  15    Number of cases listed:  15


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Rename variable if it has a certain substring

David Marso
Administrator
Me thinks this be a task best left to a python program (or some flavor of the 'horrible hack').
I'm not writing any code for you but search this group for SPSSAux and dig around for the dirty bits in the docs.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Rename variable if it has a certain substring

Bruce Weaver
Administrator
Yes, the 'horrible hack' did cross my mind while I was cobbling together that macro.  ;-)

Anyway, it seemed to me if the number of variables to be renamed is not enormous, if all the variable names are known, and if this is not something that needs to be generalized and scaled for future use, the approach used in my macro would be fairly reasonable (or at least not horribly unreasonable).  But as Art K says, YMMV.  :-)



Re: Rename variable if it has a certain substring

Fiveja
I agree Python is best. The Python code I posted uses the spssaux module.

If I can just get the correct regular expression to find the pattern (i.e., any variable that starts with "SalesFigures" OR any variable with a "_" followed by anything), the Python code should work. I've studied the options at https://docs.python.org/2/library/re.html but can't get it to work.

I've tried:

* strategy - detect the "_" and anything after it, with the goal of replacing it with ""

pat="_*"
pat="_*?"
pat="_+([a-zA-Z])"

* strategy - detect any variable starting with "SalesFigures" and simply replace the whole variable with "SalesFigures"

pat="SalesFigures+([a-zA-Z])"
pat="^[SalesFigures]+([a-zA-Z])"
pat="^[SalesFigures]+_*"

None of these rename the variable to simply "SalesFigures"
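
To see why none of the patterns above does the job, it can help to print what each one actually matches. The sketch below is plain Python run outside SPSS; first_match is a hypothetical helper, and SalesFigures_Car stands in for the real variable name. None of the tried patterns covers the underscore together with everything after it, which is what has to be removed or replaced.

```python
import re

name = "SalesFigures_Car"

# The six patterns listed in the post, unchanged.
tried = [
    "_*",
    "_*?",
    "_+([a-zA-Z])",
    "SalesFigures+([a-zA-Z])",
    "^[SalesFigures]+([a-zA-Z])",
    "^[SalesFigures]+_*",
]

def first_match(pattern, text):
    """Return the text of the first match, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

for pat in tried:
    # An empty string means the pattern matched nothing useful at position 0.
    print("%-30r -> %r" % (pat, first_match(pat, name)))

# One pattern that does cover the underscore and the whole suffix.
print(re.sub("_.*", "", name))
```

The "*" quantifier allows zero repetitions, so "_*" happily matches the empty string at the start of the name, and "s+" in the fourth pattern only repeats the final letter s.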


Re: Rename variable if it has a certain substring

Albert-Jan Roskam-3

>>> import re
>>> s = "SalesFigures_foo"
>>> re.sub(r"(\w+)_.+", r"\1", s)
'SalesFigures'

Or easier:

s.split("_")[0]
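
The two one-liners above agree when the name contains a single underscore, but they differ when there are several: \w also matches the underscore, so the greedy (\w+) backs off only as far as the last underscore, while split("_")[0] keeps only what precedes the first one. A small sketch, with made-up names and hypothetical helper functions:

```python
import re

def strip_with_regex(name):
    # Greedy \w+ (which includes "_") keeps everything before the LAST underscore.
    return re.sub(r"(\w+)_.+", r"\1", name)

def strip_with_split(name):
    # Keeps everything before the FIRST underscore.
    return name.split("_")[0]

for name in ("SalesFigures_foo", "Sales_Figures_foo"):
    print("%s -> regex: %s | split: %s"
          % (name, strip_with_regex(name), strip_with_split(name)))
```

For names like SalesFigures_Car either approach gives SalesFigures; the difference only matters if the prefix itself may contain an underscore.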


Re: Rename variable if it has a certain substring

Jon K Peck
In reply to this post by Fiveja
Try this:

begin program.
import spss, spssaux, re

varname = spssaux.VariableDict(pattern="^SalesFigures.*").variables[0]
if varname:
    spss.Submit("""rename variables (%s=SalesFigures)""" % varname)
end program.

It renames a variable whose name starts with SalesFigures (case sensitive) to be just SalesFigures.
Of course, this will not work if there is more than one.
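
The "more than one" caveat, and the case where no variable matches at all, can be made explicit before any command is submitted. The sketch below is plain Python and only builds the command text; build_rename is a hypothetical helper, and it assumes the variable names are available as an ordinary list of strings (the code above indexes .variables with [0], which suggests such a list, and indexing an empty list would raise IndexError before the "if varname" check is reached).

```python
def build_rename(varnames, prefix="SalesFigures", target="SalesFigures"):
    """Return a RENAME VARIABLES command for the single match, or None."""
    # Skip a variable that already carries the target name.
    matches = [v for v in varnames if v.startswith(prefix) and v != target]
    if not matches:
        return None  # nothing to rename in this dataset
    if len(matches) > 1:
        # Renaming several variables to one name would fail, so stop early.
        raise ValueError("more than one variable starts with %r: %s"
                         % (prefix, ", ".join(matches)))
    return "RENAME VARIABLES (%s=%s)." % (matches[0], target)

print(build_rename(["id", "SalesFigures_Car"]))
print(build_rename(["id", "region"]))
```

Inside a begin program block, the returned string (when it is not None) could then be passed to spss.Submit.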

HTH


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Rename variable if it has a certain substring

Fiveja
In reply to this post by Albert-Jan Roskam-3
Albert-Jan Roskam-3 wrote
>>> import re
>>> s = "SalesFigures_foo"
>>> re.sub(r"(\w+)_.+", r"\1", s)
'SalesFigures'

Or easier:

s.split("_")[0]
Great! Being new to Python, I wasn't sure how to put this in context. For example, where would I insert s.split("_")[0] into my code, or does it call for completely different code? If I used the "s" strategy, would I replace vard = spssaux.VariableDict(pattern=pat) with s = spssaux.VariableDict(pattern=pat)?

With some tinkering I discovered this worked:

begin program.
import spss, spssaux, re
pat=r"(\w+)_.+"
vard = spssaux.VariableDict(pattern=pat)
for v in vard:
        newname = re.sub(pat, "SalesFigures", str(v))
        spss.Submit("RENAME VARIABLES (%s=%s)" % (v, newname))
end program.

I find Python's regular expressions very difficult to master, but want to learn. This is what I gather after some googling:

\w matches any alphanumeric character and the underscore
\1 performs some type of match

What does the "r" do?

Much thanks for this!
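
On the two open questions: the r prefix makes a raw string, so Python leaves backslashes alone and passes them through to re; and \1 in a replacement is a backreference that stands for whatever the first parenthesised group captured. A minimal sketch in plain Python, using a made-up name:

```python
import re

# Raw string: backslash and "1" stay as two separate characters.
raw_len = len(r"\1")
# Ordinary string: Python turns "\1" into a single control character.
plain_len = len("\1")
print(raw_len, plain_len)

pattern = r"(\w+)_.+"
m = re.match(pattern, "SalesFigures_foo")

# group(1) is the text captured by the first pair of parentheses.
captured = m.group(1)
print(captured)

# In the replacement, \1 inserts that captured text.
result = re.sub(pattern, r"\1", "SalesFigures_foo")
print(result)
```

The working code above does not use \1 at all: it replaces the whole match with the literal text SalesFigures. As a regular expression, (\w+)_.+ also matches any other name with an underscore in the middle (for example a hypothetical ID_num), so anchoring on the literal prefix is the safer choice if other variables may contain underscores.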

Re: Rename variable if it has a certain substring

Fiveja
In reply to this post by Jon K Peck
Jon K Peck wrote
begin program.
import spss, spssaux, re

varname = spssaux.VariableDict(pattern="^SalesFigures.*").variables[0]
if varname:
    spss.Submit("""rename variables (%s=SalesFigures)""" % varname)
end program.
This works, too, and I like that it's intuitive.

Thanks to you both.