SPSS21, python, regex, spssaux.VariableDict ?

Lance Hoffmeyer
Hello all,

I am trying to use Python to create a subset of the variables in a dataset. I am trying to do this with regular-expression grouping, but I am having difficulty and hope someone can help. I understand I can do this using ADD FILES, which is what I am currently doing. I am using Python more as an exercise to understand how SPSS and Python work together, and also to streamline my syntax so that it works with multiple datasets without cutting and pasting variable names.

Using regex, I am trying to find variables that match this pattern:
@1_Original_Count_1
@2_Original_Count_2
@3_Original_Count_3
@4_Original_Count_4
@5_Original_Count_5

and add them to a variable list that I can then submit to SPSS using the "ADD FILES" command.
Below is the syntax I have been playing around with.

TIA,

Lance



begin program.
import spss, spssaux, re
#help(spssaux.VariableDict)
varlist = spssaux.VariableDict(pattern=r'@\d+.*Count')
print varlist.Variables

#Find out how many groups there are in the data. The groups are named @1*, @2* ... so find largest numeric value after @
Maxval = 1
for i in varlist.Variables:
   m = re.match(r"@(\d+)_ Original_Count_*",i)
    if m.group(1) > Maxval
        Maxval = m.group(1);

 

# Create a subset of vars containing only variables where group number (@\d+) and index (Count_\d+) are the same
for i in range(len(varlist.Variables)):
   m = re.match(r"@(\d+)_ Original_Count_(\d+)",i)
   if m.group(1) = m.group(2)
      new.varlist = varlist.Variables[i]
print new.varlist


end program.

 

Varlist I want to keep:
@1_Original_Count_1
@2_Original_Count_2
@3_Original_Count_3
@4_Original_Count_4
@5_Original_Count_5
 
Original List
@1_Original_Count_5
@1_Original_Count_4
@1_Original_Count_3
@1_Original_Count_2
@1_Original_Count_1
@4_Original_Count_4
@4_Original_Count_5
@3_Original_Count_1
@3_Original_Count_3
@3_Original_Count_2
@3_Original_Count_5
@3_Original_Count_4
@2_Original_Count_2
@2_Original_Count_3
@2_Original_Count_1
@4_Original_Count_1
@4_Original_Count_2
@4_Original_Count_3
@2_Original_Count_4
@2_Original_Count_5
@5_Original_Count_3
@5_Original_Count_2
@5_Original_Count_1
@5_Original_Count_5
@5_Original_Count_4

 

Re: SPSS21, python, regex, spssaux.VariableDict ?

Albert-Jan Roskam
Hi Lance,

I think you could use a backreference for this purpose:

>>> variables = """@1_Original_Count_5
... @1_Original_Count_4
... @1_Original_Count_3
... @1_Original_Count_2
... @1_Original_Count_1
... @4_Original_Count_4
... @4_Original_Count_5
... @3_Original_Count_1
... @3_Original_Count_3
... @3_Original_Count_2
... @3_Original_Count_5
... @3_Original_Count_4
... @2_Original_Count_2
... @2_Original_Count_3
... @2_Original_Count_1
... @4_Original_Count_1
... @4_Original_Count_2
... @4_Original_Count_3
... @2_Original_Count_4
... @2_Original_Count_5
... @5_Original_Count_3
... @5_Original_Count_2
... @5_Original_Count_1
... @5_Original_Count_5
... @5_Original_Count_4
... """
>>> import re
>>> filter(lambda s: re.match(r"@(\d+).+_\1",  s), variables.split())
['@1_Original_Count_1', '@4_Original_Count_4', '@3_Original_Count_3', '@2_Original_Count_2', '@5_Original_Count_5']

Instead of "filter" you could also use a list comprehension:

>>> [s for s in variables.split() if re.match(r"@(\d+).+_\1",  s)]
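
One caveat, offered as a sketch rather than a tested SPSS solution: re.match only anchors at the start of the string, so r"@(\d+).+_\1" would also accept names such as @1_Original_Count_11 or @11_Original_Count_1 if a dataset ever had ten or more groups. The version below assumes the names always look like @<n>_Original_Count_<m>. SAME_INDEX, keep_matching and build_add_files are hypothetical names, and the spss.Submit call is left as a comment because it only runs inside BEGIN PROGRAM.

```python
import re

# Assumed naming scheme: @<n>_Original_Count_<m>.
# The "_" right after the group and the "$" at the end stop partial
# matches such as "@1..._11" or "@11..._1".
SAME_INDEX = re.compile(r"@(\d+)_Original_Count_\1$")


def keep_matching(names):
    """Return the names whose group number equals their trailing index."""
    return [name for name in names if SAME_INDEX.match(name)]


def build_add_files(names):
    """Build an ADD FILES command that keeps only the given variables."""
    return "ADD FILES FILE=* /KEEP=%s." % " ".join(names)


names = ["@1_Original_Count_1", "@1_Original_Count_11",
         "@11_Original_Count_1", "@2_Original_Count_2",
         "@2_Original_Count_3"]

# The looser pattern also lets the two unwanted two-digit names through.
loose = [n for n in names if re.match(r"@(\d+).+_\1", n)]
kept = keep_matching(names)
command = build_add_files(kept)
print(command)

# Inside BEGIN PROGRAM the string could then be handed to SPSS:
# spss.Submit(command)
```

With the five sample names above, the loose pattern keeps four of them, while the stricter one keeps only @1_Original_Count_1 and @2_Original_Count_2.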


Instead of this:

>    m = re.match(r"@(\d+)_ Original_Count_*",i)
>     if m.group(1) > Maxval
>         Maxval = m.group(1);

I'd do:

m = re.match(r"@(\d+)_Original_Count_", i)  # no space after the first underscore
if m:  # or else you might get: AttributeError: 'NoneType' object has no attribute 'group'
    val = int(m.group(1))  # convert it into int; m.group(1) is a string
    if val > Maxval:
        Maxval = val
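
For completeness, here is how the whole loop might look as a small function. This is only a sketch: max_group is a hypothetical name, and the sample list (including the made-up @10 group and the unrelated name "age") stands in for the variable names you would pull from the variable dictionary. The int() conversion matters because, compared as strings, "10" sorts before "4".

```python
import re


def max_group(names):
    """Return the largest number following "@" among the names, or None."""
    largest = None
    for name in names:
        m = re.match(r"@(\d+)_Original_Count_", name)
        if m is None:
            continue  # skip names that do not follow the pattern
        val = int(m.group(1))  # compare as integers, not as strings
        if largest is None or val > largest:
            largest = val
    return largest


# Illustrative names only; "@10_..." and "age" are not in the original list.
names = ["@1_Original_Count_5", "@4_Original_Count_4",
         "@10_Original_Count_2", "age", "@3_Original_Count_1"]
print(max_group(names))  # prints 10
```

Starting from None instead of 1 means a dataset with no matching variables is reported as such, rather than silently claiming one group.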


Regards,

Albert-Jan




~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




----- Original Message -----

> From: Lance Hoffmeyer <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Friday, January 24, 2014 4:27 PM
> Subject: [SPSSX-L] SPSS21, python, regex, spssaux.VariableDict ?
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/SPSS21-python-regex-spssaux-VariableDict-tp5724129.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>