Re: Modifying British Social Attitudes variable labels with syntax or Python

John F Hall

There has been a lot of activity off-list since I first posted this problem.  Jon Peck is an absolute star, patiently bearing with my stream of label modification requests.

 

The variable labels in the SPSS file for BSA 2011 distributed by UKDA have the question number at the end.  Many of the labels are inordinately long, so users cannot see the question number without vastly widening the Label column.

 

BEFORE:

 

Serial Number :Q1

Sample point :Q9

Stratification ID

Person 2 relationship to Respondent <8 categories> :Q57

Person 2 relationship to Respondent <7 categories> DV :Q58

Consider your life in general these days how happy or unhappy you are A2.1.

How much confidence in the Educational system in Britain A2.2a.

How much confidence in the Health care system in Britain A2.2b.

height in centimeters A2.27a.

weight in kilograms. dv A2.27b.

How comfortable having close relative in a relationship with someone who grew up in a Muslim country C2.8.

Censorship of films and magazines is necessary to uphold moral standards A2.49fB2.26fC2.25f.

 

It’s much easier to navigate SPSS files from questionnaire surveys when the question number is at the beginning of a label so that it can clearly be seen in the default Variable View.

 

Jon’s eventual version works a treat and is worth sharing:

 

begin program.
import re
import spss
from spssaux import _smartquote

for v in range(spss.GetVariableCount()):
    vname = spss.GetVariableName(v)
    vlabel = spss.GetVariableLabel(v)
    vl = []    # vl[0] = question number prefix, vl[1] = label text
    # Find a trailing ":Qnn" question number and move it to the front
    mo = re.match(r"(.*)(:Q)(\d+).*", vlabel)
    if mo is not None:
        vl.append("Q." + mo.group(3) + ":  ")
        vl.append(mo.group(1))
        hasq = True
    else:
        # No Q-style question number.  Check for A2/B2/C2 numbers,
        # including combined ones such as A2.49fB2.26fC2.25f.
        hasq = False
        for prefix in ("a2", "b2", "c2"):
            mo = re.match(r"(.*)(%s\..*)" % prefix, vlabel, flags=re.I)
            if mo is not None:
                vl.append(mo.group(2) + ":  ")
                vl.append(mo.group(1))
                break    # the first match already holds the whole number
        if len(vl) == 0:    # no question number at all
            vl.append("")
            vl.append(vlabel)
    # Capitalize first letter of label excluding the question number
    # (guarded so that an empty label does not raise an IndexError)
    if vl[1]:
        vl[1] = vl[1][0].upper() + vl[1][1:]
    # Find freestanding "dv" and move it to the front as "(dv) "
    mo = re.search(r"(.*)(\bdv\b)(.*)", vl[1], flags=re.I)
    if mo is not None:
        rest = mo.group(1) + mo.group(3)    # label text without the "dv"
        if hasq:
            vlabel = vl[0] + "(dv) " + rest    # after a Q-style number
        else:
            vlabel = "(dv) " + vl[0] + rest    # before any other number
    else:
        vlabel = vl[0] + vl[1]
    spss.Submit("""variable label %s %s.""" % (vname, _smartquote(vlabel)))
end program.

 

AFTER:

 

Q.1:  Serial Number

Q.9:  Sample point

Stratification ID

Q.14:  (dv) Population Density Quartiles

Q.57:  Person 2 relationship to Respondent <8 categories>

Q.58:  (dv) Person 2 relationship to Respondent <7 categories>

A2.1.:  Consider your life in general these days how happy or unhappy you are

A2.2a.:  How much confidence in the Educational system in Britain

A2.2b.:  How much confidence in the Health care system in Britain

A2.27a.:  Height in centimeters

(dv) A2.27b.:  Weight in kilograms.

A2.49fB2.26fC2.25f.:  Censorship of films and magazines is necessary to uphold moral standards

C2.8.:  How comfortable having close relative in a relationship with someone who grew up in a Muslim country

 

All question numbers (where they exist) have been moved to the beginning of the labels, with a stop inserted after Q and a colon and two spaces after the number.  All original upper case letters are retained, and a lower case letter at the beginning of the label text (after the question number) is converted to upper case.  Any free-standing “dv” or “DV” is removed from the text, enclosed in brackets, and placed just after the question number in Q-format labels or at the very beginning of labels in other formats.  Truly a silk purse out of a pig’s ear!
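For anyone who wants to try these rules on a handful of labels without opening SPSS, here is a minimal sketch in plain Python.  The function name relabel and the trimming of trailing spaces are my own assumptions for illustration and are not part of Jon's program; the regular expressions are the ones used in the program, and only the A2, B2 and C2 prefixes are assumed.

```python
import re

# Assumed prefixes for the non-Q question numbers (A2, B2, C2 only).
VERSION_PREFIXES = ("a2", "b2", "c2")


def relabel(label):
    """Return label with its question number (and any 'dv') moved to the front."""
    mo = re.match(r"(.*)(:Q)(\d+).*", label)
    if mo is not None:
        number, text, hasq = "Q." + mo.group(3) + ":  ", mo.group(1), True
    else:
        number, text, hasq = "", label, False
        for prefix in VERSION_PREFIXES:
            mo = re.match(r"(.*)(%s\..*)" % prefix, label, flags=re.I)
            if mo is not None:
                number, text = mo.group(2) + ":  ", mo.group(1)
                break
    # Capitalise the first letter of the text, if there is any text.
    if text:
        text = text[0].upper() + text[1:]
    # Pull out a free-standing "dv" (any case) and re-insert it as "(dv) ".
    mo = re.search(r"(.*)(\bdv\b)(.*)", text, flags=re.I)
    if mo is not None:
        text = mo.group(1) + mo.group(3)
        result = number + "(dv) " + text if hasq else "(dv) " + number + text
    else:
        result = number + text
    return result.rstrip()  # assumption: trailing spaces are not wanted


if __name__ == "__main__":
    for old in ("Serial Number :Q1",
                "weight in kilograms. dv A2.27b.",
                "Stratification ID"):
        print(relabel(old))
```

Running it on a few of the BEFORE labels is a quick way to check a change to the patterns before letting it loose on a whole file.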

 

There are 30 annual surveys in the series and they all have the same file structure.  The Python code can now hopefully be applied to all of them as well.
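One possible way to run the job over several years is to generate a short block of syntax per file.  This is only a sketch under assumptions: the file names (bsa2010.sav, relabel.sps) and the "_relabelled" suffix are hypothetical, the relabelling program is assumed to be saved in its own syntax file, and paths containing apostrophes are not handled.  GET FILE, INSERT FILE and SAVE OUTFILE are standard SPSS commands.

```python
import os


def batch_syntax(sav_paths, relabel_syntax_path):
    """Build SPSS syntax that opens each file, runs the relabelling job, and saves a copy."""
    commands = []
    for path in sav_paths:
        root, ext = os.path.splitext(path)
        commands.append("GET FILE='%s'." % path)
        # INSERT runs the syntax file holding the begin program block.
        commands.append("INSERT FILE='%s'." % relabel_syntax_path)
        # Save under a new (hypothetical) name so the original is untouched.
        commands.append("SAVE OUTFILE='%s_relabelled%s'." % (root, ext))
    return "\n".join(commands)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(batch_syntax(["bsa2010.sav", "bsa2011.sav"], "relabel.sps"))
```

The generated text can be pasted into a syntax window, which keeps each year's original file intact in case the labels in an earlier survey do not follow quite the same pattern.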

 

Thanks a million, Jon

 

John F Hall (Mr)

[Retired academic survey researcher]

 

Email:   [hidden email] 

Website: www.surveyresearch.weebly.com

SPSS start page:  www.surveyresearch.weebly.com/spss-without-tears.html

 

PS  I’ve finished correcting the measurement levels and am now modifying the missing values in good ol’ syntax, with a bit of help from Data > Define Variable Properties for a quick check on value labels (so far I’ve found 58 unique combinations of values, many with 4 or more per variable).  At least I can do something by myself.