Modifying British Social Attitudes variable labels with syntax or Python

Modifying British Social Attitudes variable labels with syntax or Python

John F Hall

Natcen have just released an impressive site to mark the 30th year (http://www.bsa-30.natcen.ac.uk/ )

I am using data from these surveys for tutorial purposes, but I need to make some modifications to make the data easier to use in conjunction with the questionnaire(s). Variable names are of the mnemonic type and are incredibly tortuous as they follow the old SPSS restriction of 8 characters. This convention is followed so that the same names can be used in all waves of the survey.

Variable labels for all variables in all surveys in the series can be quite long as they tend to repeat all or most of the question text (so that tables can be used in publications).  The problem is that all labels have the question number at the end.  This means you can’t see the question numbers in the labels on the default Data Editor in Variable View, sometimes even if you drag the column edge far to the right.   

For example in the SPSS file for the 2011 survey

Name        Label

RSex        Person 1 SEX :Q49
RAge        Person 1 age last birthday :Q50
~ ~ ~
WWWHrsWk    How many hours a week on average do you spend using the Internet or World Wide Web :Q240

For a few variables I can manually edit the labels to move the question number to the beginning, get rid of the colon and insert a period:

Q.49 Person 1 SEX
Q.50 Person 1 age last birthday
Q.240 How many hours a week on average do you spend using the Internet or World Wide Web

However there are 854 variables in the full set! All the labels end in :Qddd. Is there a way of stripping these off, deleting the colon, moving the question number to the beginning, inserting a period after Q and a couple of spaces after the number?

Also there are a large number of variable labels which start with a lower case letter:

what should happen to disabled peoples benefits if they do not take active measures to find appropriate work :Q282
children are in poverty in Britain because Social benefits for families with children are not high enough dv :Q316

These need the first letter changing to upper case.

DBwork      Q.282 What should happen to disabled peoples benefits if they do not take active measures to find appropriate work
CPWSocBn    Q.316 children are in poverty in Britain because Social benefits for families with children are not high enough dv

The dv at the end indicates a derived variable. This is invariably just before the question number, but would be better moved and enclosed in square brackets immediately following the question number:

Q.316 [dv] Children are in poverty in Britain because: Social benefits for families with children are not high enough

This would make the file far easier to navigate and use (the upper case S is because it is one of a list of responses on a showcard: it’s easier to insert a colon by hand).

I’ve just been looking at the Python for SPSS site and it looks as if this would be a fairly straightforward job in Python.  I’ve never used Python and the documentation looks awesome.

 

I leave aside some of the labeling for questions asked of three different overlapping sub-samples!

 

BnMove            The current benefit system effectively encourages recipients to move off benefits A2.30eB2.1eC2.1e.

 

Any volunteers for a few lines (wholesomely acknowledged)  in Python or in SPSS syntax?

 

 

John F Hall (Mr)

[Retired academic survey researcher]

 

Email:   [hidden email] 

Website: www.surveyresearch.weebly.com

SPSS start page:  www.surveyresearch.weebly.com/spss-without-tears.html

  

  

 

 

 

 


Re: Modifying British Social Attitudes variable labels with syntax or Python

Jon K Peck
Try the lightly tested code below. In order to avoid confusion in dialog box displays, I used (dv) rather than [dv], since when showing variable labels in dialogs, the variable name appears at the end in square brackets.

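begin program.
import re
import spss
from spssaux import _smartquote

# Loop over every variable in the active dataset by index.
for v in range(spss.GetVariableCount()):
    vname = spss.GetVariableName(v)
    vlabel = spss.GetVariableLabel(v)
    vl = []
    # Split "text :Qddd" into the question number and the preceding text.
    mo = re.match(r"(.*)(:Q)(\d+).*", vlabel)
    if mo is not None:
        vl.append("Q." + mo.group(3) + " ")
        vl.append(mo.group(1))
    else:
        # No question number found: keep the label text as it is.
        vl.append("")
        vl.append(vlabel)
    # Look for a literal "[dv]" marker and move it forward as "(dv)".
    mo = re.search(r"(.*)(\[dv\] *)", vl[1])
    if mo is not None:
        vlabel = vl[0] + "(dv) " + mo.group(1)
    else:
        vlabel = vl[0] + vl[1]
    # capitalize() upper-cases the first character and lower-cases the rest.
    spss.Submit("""variable label %s %s.""" % (vname, _smartquote(vlabel.capitalize())))
end program.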
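
For anyone who wants to see what the first pattern does before running it against a data file, it can be tried in plain Python without SPSS. This is only a sketch: the helper name split_question_number is made up for illustration, and the rstrip() that drops the space left in front of the colon is an addition that the program above does not make.

```python
import re

# Same pattern as in the program above: any text, then ":Q", then digits.
PATTERN = re.compile(r"(.*)(:Q)(\d+).*")


def split_question_number(label):
    """Return (question_number, text); question_number is None without :Qddd."""
    mo = PATTERN.match(label)
    if mo is None:
        return None, label
    # group(3) holds the digits, group(1) the text in front of ":Q".
    return mo.group(3), mo.group(1).rstrip()


# Labels quoted in this thread: one with a question number, one without.
for label in ["Person 1 SEX :Q49", "weight in kilograms. dv A2.27b."]:
    print(split_question_number(label))
```

Labels that carry the three-version numbering (A2.27b. and the like) do not match the pattern, so they come back unchanged.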

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:    John F Hall <[hidden email]>
To:      [hidden email]
Date:    09/11/2013 11:58 AM
Subject: [SPSSX-L] Modifying British Social Attitudes variable labels with syntax or Python
Sent by: "SPSSX(r) Discussion" <[hidden email]>





Re: Modifying British Social Attitudes variable labels with syntax or Python

John F Hall

Jon

 

I really appreciate this.  Having a lifelong fear of computing fubars I proceeded with extreme caution by saving a subset of variables with the simplest labels in a separate file, even keeping the original names with their bizarre mix of lower and upper case letters.

 

Step 1:  Create subset with simple labels

 

SAVE OUTFILE='E:\Weebly downloads\bsa2011\labeltest.sav'
  /keep ABCVer to RAge
  /COMPRESSED.

 

Original names:

ABCVer
Country
XYVer
OddEven
Househld
RSex
RAge

Original labels:

questionnaire version :Q30
England, Scotland or Wales? :Q31
Version X/Y for filtering :Q32
1=Odd 2=Even :Q33
Number living in household, including respondent :Q47
Person 1 SEX :Q49
Person 1 age last birthday :Q50

 

Step 2: Run Jon’s syntax

 

With bated breath, fingers crossed etc., I tested your syntax, and it worked first time, but all upper case was converted to lower case:

 

New labels:

Q.30 questionnaire version
Q.31 england, scotland or wales?
Q.32 version x/y for filtering
Q.33 1=odd 2=even
Q.47 number living in household, including respondent
Q.49 person 1 sex
Q.50 person 1 age last birthday

 

Presumably a small tweak can rectify this?
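
The lower-casing appears to come from Python's str.capitalize(), which upper-cases the first character of a string and lower-cases everything after it. A minimal sketch of a replacement that touches only the first character is below; the name upper_first is hypothetical and not part of the spss or spssaux modules.

```python
def upper_first(text):
    """Upper-case the first character and leave the rest untouched."""
    # Slicing (text[:1]) rather than indexing keeps this safe for empty labels.
    return text[:1].upper() + text[1:]


for label in ["questionnaire version", "England, Scotland or Wales?"]:
    print(label.capitalize())  # everything after the first character is lower-cased
    print(upper_first(label))  # existing capitals are kept
```

In the program as posted, capitalize() is applied to the assembled label, whose first character is already the Q of the question number. To capitalise the question text itself, upper_first would need to be applied to vl[1] before the "Q.nn " prefix is added.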

 

Step 3:  Test on more complex labels

 

Original names:

P2Rel2
P2Rel
RPrivEd
CPR2Gov
CPR2LGov
WTkg
BnComp

Original labels:

Person 2 relationship to Respondent <7 categories> DV :Q58
Person 2 relationship to Respondent? <5 categories> DV :Q59
R attends/ed fee-paying, private primary or secondary school in UK : dv :Q985
Central government should be responsible for reducing child poverty in Britain dv :Q291
Local government  should be responsible for reducing child poverty in Britain dv :Q292
weight in kilograms. dv A2.27b.
The current benefit system is far too complicated A2.30cB2.1cC2.1c.

 

New labels:

Q.58 person 2 relationship to respondent <7 categories> dv
Q.59 person 2 relationship to respondent? <5 categories> dv
Q.985 r attends/ed fee-paying, private primary or secondary school in uk : dv
Q.291 central government should be responsible for reducing child poverty in britain dv
Q.292 local government  should be responsible for reducing child poverty in britain dv
Weight in kilograms. dv a2.27b.
The current benefit system is far too complicated a2.30cb2.1cc2.1c.

 

It hasn’t picked up the dv, but that is not a serious problem, unless it’s fairly straightforward to insert an asterisk or hash before the question number for any label with dv, and then delete dv?

 

#Q.58 Person 2 relationship to respondent <7 categories>
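
One possible reason the dv was missed is that the posted pattern looks for a literal "[dv]", whereas the labels carry a bare dv or DV. A rough sketch of the flag-and-delete idea follows. It assumes the marker is the last word of the text once the :Qddd part has been removed, in either case and optionally preceded by a colon; the name flag_derived and the default "#" flag are illustrative choices only.

```python
import re

# Assumption: the marker is a trailing "dv" or "DV", optionally after a colon.
DV_AT_END = re.compile(r"\s*:?\s*\bdv\s*$", re.IGNORECASE)


def flag_derived(text, flag="#"):
    """Return (prefix, text); prefix is the flag if a trailing dv was removed."""
    stripped, count = DV_AT_END.subn("", text)
    if count:
        return flag, stripped
    return "", text


prefix, text = flag_derived("Person 2 relationship to Respondent <7 categories> DV")
print(prefix + "Q.58 " + text)
```

Labels where dv is followed by something else, such as "weight in kilograms. dv A2.27b.", are deliberately left alone by this sketch.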

 

The complex question numbers for the three different questionnaire versions would look really ugly at the beginning of the labels, so they can stay where they are.

 

So, if you could manage a tweak to retain all upper case letters in the original labels and to convert any lower case first characters to upper case, e.g.:

 

Q.30 Questionnaire version
Q.31 England, Scotland or Wales?
Q.985 R attends/ed fee-paying, private primary or secondary school in UK : dv
Q.291 Central government should be responsible for reducing child poverty in Britain dv
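
A sketch of a transformation that would produce labels in this form is given below, written as a plain Python function so it can be checked outside SPSS first. The function name relabel is hypothetical. Unlike the posted pattern, this one requires :Qddd to be the very end of the label, which is an assumption based on the examples in this thread.

```python
import re

# Assumption: the question number, when present, is the last thing in the label.
QNUM_AT_END = re.compile(r"^(.*?)\s*:Q(\d+)\s*$")


def relabel(label):
    """Move a trailing :Qddd to the front as 'Q.ddd ' and upper-case
    only the first character of the remaining text."""
    mo = QNUM_AT_END.match(label)
    if mo is None:
        prefix, text = "", label
    else:
        prefix, text = "Q.%s " % mo.group(2), mo.group(1)
    # Slicing leaves every other character, including capitals, as it was.
    return prefix + text[:1].upper() + text[1:]


for old in ["questionnaire version :Q30",
            "England, Scotland or Wales? :Q31",
            "weight in kilograms. dv A2.27b."]:
    print(relabel(old))
```

Inside the BEGIN PROGRAM block, the label passed to _smartquote in the spss.Submit call would then be relabel(spss.GetVariableLabel(v)) instead of vlabel.capitalize(). It would be prudent to try this on a copy of the file first.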

 

Eternal thanks in advance and handsome acknowledgment well deserved.

 

John

 

John F Hall (Mr)

[Retired academic survey researcher]

 

Email:   [hidden email] 

Website: www.surveyresearch.weebly.com

SPSS start page:  www.surveyresearch.weebly.com/spss-without-tears.html

  

  

 

 

 

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck
Sent: 11 September 2013 21:46
To: [hidden email]
Subject: Re: Modifying British Social Attitudes variable labels with syntax or Python

 
