SPSSX Discussion

Jon Peck's Python code

John F Hall

Jon Peck's Python code

I just ran Jon’s previous Python code, written to move question numbers to the beginning of var labels in the 2011 British Social Attitudes survey, on a subset from the 2004 survey. This was on a teaching file with only 49 variables as used in Marsh and Elliott “Exploring Data” (2nd edition, Polity, 2008).

title 'Python code to modify BSA variable labels (Jon Peck, IBM/SPSS, 2013)'.
begin program.
import spss,re
from spssaux import _smartquote

for v in range(spss.GetVariableCount()):
    vname = spss.GetVariableName(v)
    vlabel = spss.GetVariableLabel(v)
    vl = []
    # Find the question number and move to front
    mo = re.match(r"(.*)(:Q)(\d+).*", vlabel)
    if not mo is None:
        vl.append("Q." + mo.group(3) + ": ")
        vl.append(mo.group(1))
        hasq = True
    else: # no Q-style question number. Check for multiple questions
        hasq = False
        mo = re.match(r"(.*)(a2\..*)", vlabel, flags=re.I)
        if not mo is None: # multiple q's
            vl.append(mo.group(2) + ": ")
            vl.append(mo.group(1))
        mo = re.match(r"(.*)(b2\..*)", vlabel, flags=re.I)
        if not mo is None: # multiple q's
            vl.append(mo.group(2) + ": ")
            vl.append(mo.group(1))
    if len(vl) == 0:
        vl.append("")
        vl.append(vlabel)
    # capitalize first letter of label excluding the Q number
    vl[-1] = vl[-1][0].upper() + vl[-1][1:]
    # find freestanding "dv"
    mo = re.search(r"(.*)(\bdv\b)(.*)", vl[1], flags=re.I)
    if not mo is None:
        if hasq:
            vlabel = vl[0] + "(dv) " + mo.group(1)
        else:
            if vl[0] != "":
                vl[0] = "(dv) " + vl[0]
                vlabel = vl[0] + mo.group(1) + mo.group(3)
            else:
                vlabel = "(dv) " + mo.group(1) + mo.group(3)
    else:
        vlabel = vl[0] + vl[1]
    spss.Submit("""variable label %s %s.""" % (vname, _smartquote(vlabel)))
end program.

The initial file has:

Country: England, Scotland or Wales? Q28

Sex of Respondent Q39

Respondent's age in years Q40

People can be trusted/can't be too careful?A2.13

NS-SEC - long version Q519

Respondent's main economic activity last week? Q539

Terminal education age<categorised> Q766

The Python worked on some:

Q.28: Country: England, Scotland or Wales?

Q.39: Sex of Respondent

Q.40: Respondent's age in years

A2.13: People can be trusted/can't be too careful?

but not others, eg:

NS-SEC - long version Q519

Respondent's main economic activity last week? Q539

Terminal education age<categorised> Q766

Respondent give money to charity how often? B619

Respondent gives how much to charity per year B620

Party political identification (compressed) dv Q211

I can modify the subset by hand, but the main file has over 800 variables. I’ve tried some clumsy modifications to the Python, but none of them seem to work.

Help!

John F Hall (Mr)

[Retired academic survey researcher]

Email: [hidden email]

Website: www.surveyresearch.weebly.com

SPSS start page: www.surveyresearch.weebly.com/1-survey-analysis-workshop

David Marso

Re: Jon Peck's Python code

Administrator

John,
Maybe time for you to do a crash course on regular expressions so you at least understand the code before doing clumsy modifications?
D


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Jon K Peck

Re: Jon Peck's Python code

In reply to this post by John F Hall

John,
The specifications for that code said that the question number was preceded by a colon. So the matching expression includes
(:Q)
These labels don't have the colon, so remove it from the search expression here:

mo = re.match(r"(.*)(:Q)(\d+).*", vlabel)
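
As a rough illustration of that change, the sketch below runs the posted pattern and a colon-free variant against a few of the labels quoted in this thread, outside SPSS. The helper name move_q_number and the strip() call are assumptions added for the example; they are not part of the posted program.

```python
import re

# Pattern from the posted program: requires ":Q" before the digits.
WITH_COLON = re.compile(r"(.*)(:Q)(\d+).*")
# Same pattern with the colon removed, as suggested above.
WITHOUT_COLON = re.compile(r"(.*)(Q)(\d+).*")


def move_q_number(label, pattern=WITHOUT_COLON):
    """Return the label with its Q number moved to the front,
    or None if the pattern does not match (hypothetical helper)."""
    mo = pattern.match(label)
    if mo is None:
        return None
    # strip() is an addition here: it drops the space left before the Q number.
    return "Q." + mo.group(3) + ": " + mo.group(1).strip()


for label in ("NS-SEC - long version Q519",
              "Respondent's main economic activity last week? Q539",
              "Respondent give money to charity how often? B619"):
    print("%s | %s" % (move_q_number(label, WITH_COLON), move_q_number(label)))
```

The B619 label comes back as None under both patterns, since neither looks for a "B" prefix; labels of that kind would need a pattern of their own.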

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: John F Hall <[hidden email]>
To: [hidden email],
Date: 02/11/2014 09:52 AM
Subject: [SPSSX-L] Jon Peck's Python code
Sent by: "SPSSX(r) Discussion" <[hidden email]>


John F Hall

Re: Jon Peck's Python code

I did that, but it still didn’t work. I’ll leave it for tonight and try again in the morning.

From: Jon K Peck [mailto:[hidden email]]
Sent: 11 February 2014 19:13
To: John F Hall
Cc: [hidden email]
Subject: Re: [SPSSX-L] Jon Peck's Python code


Albert-Jan Roskam

Re: Jon Peck's Python code

In reply to this post by Jon K Peck

John,

You might want to consider replacing the last line:

spss.Submit("""variable label %s %s.""" % (vname, _smartquote(vlabel)))

with this:

outfile = os.path.join(tempfile.gettempdir(), "syntax_" + time.strftime("%Y-%m-%d_%Hh%Mm%Ss") + ".sps")
with open(outfile, "wb") as f:
    f.write("variable label %s %s%s." % (vname, _smartquote(vlabel), os.linesep))
print " ---> Done! Syntax read: '%s'" % f.name

but put this at the beginning of the BEGIN PROGRAM block first:

import os, tempfile, time

That way, the generated syntax is written to the computer's temporary directory, e.g. as 'syntax_2014-02-15_14h18m20s.sps'. You can open it and, if necessary, fine-tune it manually.
You can/should of course keep a copy of the syntax for future reference. That way, the Python program will do most of the work, and the rare exceptions that would make the program overly complicated are handled by you.
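
One thing to watch with this idea: if the file is opened in "wb" mode inside the per-variable loop, each pass truncates what the previous pass wrote. A minimal sketch of one way around that, assuming the new labels are first collected as (name, label) pairs and the file is opened once, is shown below. The function names, the variable names rsex and rage, and the quote_label stand-in for _smartquote are all hypothetical and only serve to make the example runnable outside SPSS.

```python
import os
import tempfile
import time


def quote_label(text):
    # Hypothetical stand-in for spssaux._smartquote: wrap the label in double
    # quotes and double any embedded double quotes.
    return '"' + text.replace('"', '""') + '"'


def write_label_syntax(pairs, directory=None):
    """Write one VARIABLE LABEL command per (name, label) pair to a single
    timestamped .sps file and return its path."""
    directory = directory or tempfile.gettempdir()
    stamp = time.strftime("%Y-%m-%d_%Hh%Mm%Ss")
    outfile = os.path.join(directory, "syntax_" + stamp + ".sps")
    # Open the file once, outside any per-variable loop, so earlier lines are kept.
    with open(outfile, "w") as f:
        for vname, vlabel in pairs:
            f.write("variable label %s %s.\n" % (vname, quote_label(vlabel)))
    return outfile


# rsex and rage are made-up variable names used only for this demonstration.
path = write_label_syntax([("rsex", "Q.39: Sex of Respondent"),
                           ("rage", "Q.40: Respondent's age in years")])
print(path)
```

Inside the BEGIN PROGRAM block, the pairs could be built up in the loop in place of the spss.Submit call and written out after the loop ends.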

Regards,

Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a

fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Albert-Jan Roskam

Re: Jon Peck's Python code

In reply to this post by David Marso

I concur. Especially in regular expressions, a single character can make a big difference.

These two resources are very good, even though the first one is about Python 3. It is a sample chapter from a book by Mark Summerfield:
http://ptgmedia.pearsoncmg.com/images/9780321680563/samplepages/0321680561_Sample.pdf
http://docs.python.org/2/howto/regex.html
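
To illustrate how much a single character can matter, the sketch below compares the a2 pattern from the posted program with a variant that drops the backslash before the dot. The label "Example A213" is made up for the demonstration.

```python
import re

label = "People can be trusted/can't be too careful?A2.13"

# With the backslash, "\." matches only a literal dot after "a2".
escaped = re.compile(r"(.*)(a2\..*)", flags=re.I)
# Without it, "." matches any single character after "a2".
unescaped = re.compile(r"(.*)(a2..*)", flags=re.I)

print(escaped.match(label).group(2))
# "Example A213" is a hypothetical label with no dot in the question number.
print(escaped.match("Example A213"))
print(unescaped.match("Example A213").group(2))
```

The escaped pattern rejects the made-up label because it has no literal dot after "A2", while the unescaped one accepts it.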

Regards,

Albert-Jan

From: David Marso <[hidden email]>
To: [hidden email]
Sent: Tuesday, February 11, 2014 6:40 PM
Subject: Re: [SPSSX-L] Jon Peck's Python code
