Hello all,
I am trying to use python to create a subset of variables in a dataset. I am trying to do this by using regular expression grouping but am having difficulty and am hoping someone can help out. I understand I can do this using ADD FILES, which is what I am currently doing. I am using python more as an exercise to understand how SPSS and python work together and to also streamline my syntax so that the syntax could work with multiple datasets without cutting and pasting variable names.

Using regex I am trying to find variables that match this pattern

@1_Original_Count_1
@2_Original_Count_2
@3_Original_Count_3
@4_Original_Count_4
@5_Original_Count_5

and add them to a variable list that I can then submit to SPSS using the "ADD FILES" command. Below is the syntax I have been playing around with.

TIA,

Lance

begin program.
import spss, spssaux, re
#help(spssaux.VariableDict)
varlist = spssaux.VariableDict(pattern=r'@\d+.*Count')
print varlist.Variables

#Find out how many groups there are in the data. The groups are named @1*, @2* ... so find largest numeric value after @
Maxval = 1
for i in varlist.Variables:
    m = re.match(r"@(\d+)_ Original_Count_*",i)
    if m.group(1) > Maxval
        Maxval = m.group(1);

# Create a subset of vars containing only variables where group number (@\d+) and index (Count_\d+) are the same
for i in range(len(varlist.Variables)):
    m = re.match(r"@(\d+)_ Original_Count_(\d+)",i)
    if m.group(1) = m.group(2)
        new.varlist = varlist.Variables[i]
print new.varlist
end program.
Varlist I want to keep

@1_Original_Count_1
@2_Original_Count_2
@3_Original_Count_3
@4_Original_Count_4
@5_Original_Count_5

Original List

@1_Original_Count_5
@1_Original_Count_4
@1_Original_Count_3
@1_Original_Count_2
@1_Original_Count_1
@4_Original_Count_4
@4_Original_Count_5
@3_Original_Count_1
@3_Original_Count_3
@3_Original_Count_2
@3_Original_Count_5
@3_Original_Count_4
@2_Original_Count_2
@2_Original_Count_3
@2_Original_Count_1
@4_Original_Count_1
@4_Original_Count_2
@4_Original_Count_3
@2_Original_Count_4
@2_Original_Count_5
@5_Original_Count_3
@5_Original_Count_2
@5_Original_Count_1
@5_Original_Count_5
@5_Original_Count_4
Hi Lance,
I think you could use a backreference for this purpose:

>>> variables = """@1_Original_Count_5
... @1_Original_Count_4
... @1_Original_Count_3
... @1_Original_Count_2
... @1_Original_Count_1
... @4_Original_Count_4
... @4_Original_Count_5
... @3_Original_Count_1
... @3_Original_Count_3
... @3_Original_Count_2
... @3_Original_Count_5
... @3_Original_Count_4
... @2_Original_Count_2
... @2_Original_Count_3
... @2_Original_Count_1
... @4_Original_Count_1
... @4_Original_Count_2
... @4_Original_Count_3
... @2_Original_Count_4
... @2_Original_Count_5
... @5_Original_Count_3
... @5_Original_Count_2
... @5_Original_Count_1
... @5_Original_Count_5
... @5_Original_Count_4
... """
>>> import re
>>> filter(lambda s: re.match(r"@(\d+).+_\1", s), variables.split())
['@1_Original_Count_1', '@4_Original_Count_4', '@3_Original_Count_3', '@2_Original_Count_2', '@5_Original_Count_5']

Instead of "filter" you could also use a list comprehension:

>>> [s for s in variables.split() if re.match(r"@(\d+).+_\1", s)]

Instead of this:

> m = re.match(r"@(\d+)_ Original_Count_*",i)
> if m.group(1) > Maxval
> Maxval = m.group(1);

I'd do:

m = re.match(r"@(\d+)_Original_Count_", i)  # no space after the first underscore, or nothing matches
if m:  # or else you might get: AttributeError: 'NoneType' object has no attribute 'group'
    val = int(m.group(1))  # convert it into int; group() returns a string
    if val > maxVal:
        maxVal = val  # keep the largest group number seen so far

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

----- Original Message -----
> From: Lance Hoffmeyer <[hidden email]>
> To: [hidden email]
> Sent: Friday, January 24, 2014 4:27 PM
> Subject: [SPSSX-L] SPSS21, python, regex, spssaux.VariableDict ?
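For reference, the backreference idea can be put together with the ADD FILES step roughly as follows. This is only a sketch under a few assumptions: the helper names (matching_vars, max_group, keep_command) are made up for illustration, the sample names are generated rather than read from a dataset, and the pattern is anchored with "$" so that a name such as @1_Original_Count_10 is not kept by accident. Inside SPSS the names would come from spssaux.VariableDict(...).Variables, and the resulting string would be passed to spss.Submit; neither call is made here so the snippet runs outside SPSS.

```python
import re

# The same 25 names as in the thread, generated here instead of typed out.
# Inside SPSS they would come from spssaux.VariableDict(...).Variables.
names = ["@%d_Original_Count_%d" % (g, c)
         for g in range(1, 6) for c in range(1, 6)]

# \1 must repeat the digits captured after "@"; "$" rejects e.g. "@1_..._10".
SAME_INDEX = re.compile(r"@(\d+)_Original_Count_\1$")
GROUP = re.compile(r"@(\d+)_")


def matching_vars(varnames):
    """Keep names whose group number equals their trailing index."""
    return [v for v in varnames if SAME_INDEX.match(v)]


def max_group(varnames):
    """Largest number following '@', or None if no name matches."""
    nums = [int(m.group(1)) for m in map(GROUP.match, varnames) if m]
    return max(nums) if nums else None


def keep_command(varnames):
    """Build (but do not run) an ADD FILES command keeping only varnames."""
    return "ADD FILES FILE=* /KEEP=%s." % " ".join(varnames)


keep = matching_vars(names)
cmd = keep_command(keep)
print(cmd)
```

Within a BEGIN PROGRAM block, the string in cmd is what one would hand to spss.Submit(cmd). Comparing the groups as integers (as max_group does) avoids the string-versus-number comparison in the original attempt.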