SPSSX Discussion

SPSS21, python, regex, spssaux.VariableDict ?

Lance Hoffmeyer

SPSS21, python, regex, spssaux.VariableDict ?

Hello all,

I am trying to use Python to create a subset of the variables in a dataset. I want to select them with regular-expression groups, but I am having difficulty and hope someone can help. I understand I can do this using ADD FILES, which is what I am currently doing. I am using Python partly as an exercise to understand how SPSS and Python work together, and partly to streamline my syntax so that it works with multiple datasets without cutting and pasting variable names.

Using regex I am trying to find variables that match this pattern
@1_Original_Count_1
@2_Original_Count_2
@3_Original_Count_3
@4_Original_Count_4
@5_Original_Count_5

and add them to a variable list that I can then submit to SPSS using the "ADD FILES" command.
Below is the syntax I have been playing around with.

TIA,

Lance

begin program.
import spss, spssaux, re
#help(spssaux.VariableDict)
varlist = spssaux.VariableDict(pattern=r'@\d+.*Count')
print varlist.Variables

# Find out how many groups there are in the data. The groups are named @1*, @2* ... so find largest numeric value after @
Maxval = 1
for i in varlist.Variables:
    m = re.match(r"@(\d+)_ Original_Count_*",i)
    if m.group(1) > Maxval
        Maxval = m.group(1);

# Create a subset of vars containing only variables where group number (@\d+) and index (Count_\d+) are the same
for i in range(len(varlist.Variables)):
    m = re.match(r"@(\d+)_ Original_Count_(\d+)",i)
    if m.group(1) = m.group(2)
        new.varlist = varlist.Variables[i]
        print new.varlist

end program.

Varlist
I want to keep
@1_Original_Count_1
@2_Original_Count_2
@3_Original_Count_3
@4_Original_Count_4
@5_Original_Count_5

Original List
@1_Original_Count_5
@1_Original_Count_4
@1_Original_Count_3
@1_Original_Count_2
@1_Original_Count_1
@4_Original_Count_4
@4_Original_Count_5
@3_Original_Count_1
@3_Original_Count_3
@3_Original_Count_2
@3_Original_Count_5
@3_Original_Count_4
@2_Original_Count_2
@2_Original_Count_3
@2_Original_Count_1
@4_Original_Count_1
@4_Original_Count_2
@4_Original_Count_3
@2_Original_Count_4
@2_Original_Count_5
@5_Original_Count_3
@5_Original_Count_2
@5_Original_Count_1
@5_Original_Count_5
@5_Original_Count_4

Albert-Jan Roskam

Re: SPSS21, python, regex, spssaux.VariableDict ?

Hi Lance,

I think you could use a backreference for this purpose:

>>> variables = """@1_Original_Count_5
... @1_Original_Count_4
... @1_Original_Count_3
... @1_Original_Count_2
... @1_Original_Count_1
... @4_Original_Count_4
... @4_Original_Count_5
... @3_Original_Count_1
... @3_Original_Count_3
... @3_Original_Count_2
... @3_Original_Count_5
... @3_Original_Count_4
... @2_Original_Count_2
... @2_Original_Count_3
... @2_Original_Count_1
... @4_Original_Count_1
... @4_Original_Count_2
... @4_Original_Count_3
... @2_Original_Count_4
... @2_Original_Count_5
... @5_Original_Count_3
... @5_Original_Count_2
... @5_Original_Count_1
... @5_Original_Count_5
... @5_Original_Count_4
... """
>>> import re
>>> filter(lambda s: re.match(r"@(\d+).+_\1", s), variables.split())
['@1_Original_Count_1', '@4_Original_Count_4', '@3_Original_Count_3', '@2_Original_Count_2', '@5_Original_Count_5']
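
To get from a list like the one above to the ADD FILES step mentioned in the question, the names can be joined into a /KEEP subcommand. This is only a minimal sketch: build_keep_syntax is a hypothetical helper name (not part of spss or spssaux), and it assumes the active dataset is the one being subset. Inside a BEGIN PROGRAM block the resulting string would be handed to spss.Submit; that call is left as a comment so the snippet also runs outside SPSS.

```python
import re

def build_keep_syntax(names):
    """Return ADD FILES syntax that keeps only the given variables.

    Hypothetical helper for illustration; not part of spss or spssaux.
    """
    if not names:
        raise ValueError("no variables matched; refusing to build an empty KEEP list")
    return "ADD FILES FILE=* /KEEP=%s.\nEXECUTE." % " ".join(names)

# Example input; in SPSS the names would come from the variable dictionary.
names = ["@1_Original_Count_5", "@1_Original_Count_1", "@2_Original_Count_2"]

# Same backreference filter as above: group number must equal the index.
keep = [s for s in names if re.match(r"@(\d+).+_\1", s)]

syntax = build_keep_syntax(keep)
print(syntax)

# Inside BEGIN PROGRAM / END PROGRAM one would then run:
# spss.Submit(syntax)
```

Raising on an empty list is a deliberate choice here, since a KEEP subcommand with no names is not useful syntax to submit.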

Instead of "filter" you could also use

>>> [s for s in variables.split() if re.match(r"@(\d+).+_\1", s)]
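
One caveat with that pattern: nothing anchors the end of the name, and \d+ is allowed to backtrack, so a name such as @12_Original_Count_1 or @1_Original_Count_10 would also match. That cannot happen with the 25 names in this thread, but it could once there are ten or more groups. A sketch of a stricter variant, assuming the names always look like @<n>_Original_Count_<n> (the two-digit names below are made up for illustration):

```python
import re

# Loose pattern from the reply: backreference, but no end anchor.
LOOSE = re.compile(r"@(\d+).+_\1")
# Stricter sketch: spell out the fixed middle part and anchor the end.
STRICT = re.compile(r"@(\d+)_Original_Count_\1$")

# Hypothetical names, chosen to show where the two patterns differ.
names = ["@1_Original_Count_1", "@12_Original_Count_1",
         "@1_Original_Count_10", "@12_Original_Count_12"]

loose_hits = [s for s in names if LOOSE.match(s)]
strict_hits = [s for s in names if STRICT.match(s)]

print(loose_hits)   # all four names match the loose pattern
print(strict_hits)  # only the names whose two numbers are identical
```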

Instead of this:

> m = re.match(r"@(\d+)_ Original_Count_*",i)
> if m.group(1) > Maxval
> Maxval = m.group(1);

I'd do:

m = re.match(r"@(\d+)_Original_Count_", i)  # no space after the first underscore
if m:  # or else you might get: AttributeError: 'NoneType' object has no attribute 'group'
    val = int(m.group(1))  # convert it into int; group(1) is the number after "@"
    if val > Maxval:
        Maxval = val
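
For completeness, finding the highest group number over a whole list of names might look like the sketch below. It assumes the names follow the @<n>_Original_Count_<m> layout from the question; max_group is a hypothetical helper name, and names that do not match are simply skipped. The int() conversion matters because strings compare character by character, so "10" would sort below "9".

```python
import re

# Assumed layout: @<group>_Original_Count_<index>
GROUP_RE = re.compile(r"@(\d+)_Original_Count_\d+$")

def max_group(names):
    """Return the largest group number after '@', or None if nothing matches."""
    best = None
    for name in names:
        m = GROUP_RE.match(name)
        if m is None:
            continue  # skip names that do not follow the assumed layout
        val = int(m.group(1))  # numeric comparison, not string comparison
        if best is None or val > best:
            best = val
    return best

print(max_group(["@1_Original_Count_5", "@4_Original_Count_4",
                 "@10_Original_Count_2", "other"]))
```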

Regards,

Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

----- Original Message -----

> From: Lance Hoffmeyer <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Friday, January 24, 2014 4:27 PM
> Subject: [SPSSX-L] SPSS21, python, regex, spssaux.VariableDict ?
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/SPSS21-python-regex-spssaux-VariableDict-tp5724129.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
