NatCen have just released an impressive site to mark the 30th year of the British Social Attitudes survey (http://www.bsa-30.natcen.ac.uk/ ). I am using data from these surveys for tutorial purposes, but I need to make some modifications to make the data easier to use in conjunction with the questionnaire(s).

Variable names are of the mnemonic type and are incredibly tortuous, as they follow the old SPSS restriction of 8 characters. This convention is followed so that the same names can be used in all waves of the survey. Variable labels in all surveys in the series can be quite long, as they tend to repeat all or most of the question text (so that tables can be used in publications).

The problem is that all labels have the question number at the end. This means you can’t see the question numbers in the labels in the default Data Editor in Variable View, sometimes even if you drag the column edge far to the right. For example, in the SPSS file for the 2011 survey:

Name    Label
RSex    Person 1 SEX :Q49
RAge    Person 1 age last birthday :Q50

For a few variables I can manually edit the labels to move the question number to the beginning, get rid of the colon and insert a period:

Q.49  Person 1 SEX

However, there are 854 variables in the full set! All the labels end in :Qddd. Is there a way of stripping the question numbers off, deleting the colon, moving the question number to the beginning, inserting a period after Q and a couple of spaces after the number?

Also, there are a large number of variable labels which start with a lower case letter:

what should happen to disabled peoples benefits if they do not take active measures to find appropriate work :Q282

These need the first letter changing to upper case:

DBwork  Q.282  What should happen to disabled peoples benefits if they do not take active measures to find appropriate work

A dv at the end of a label indicates a derived variable. It invariably comes just before the question number, but would be better moved and enclosed in square brackets immediately following the question number:

Q.316  [dv] Children are in poverty in Britain because: Social benefits for families with children are not high enough

This would make the file far easier to navigate and use (the upper case S is because it is one of a list of responses on a showcard: it’s easier to insert a colon by hand).

I’ve just been looking at the Python for SPSS site and it looks as if this would be a fairly straightforward job in Python. I’ve never used Python and the documentation looks awesome.

I leave aside some of the labeling for questions asked of three different overlapping sub-samples!

BnMove  The current benefit system effectively encourages recipients to move off benefits A2.30eB2.1eC2.1e.

Any volunteers for a few lines (wholesomely acknowledged) in Python or in SPSS syntax?

John F Hall (Mr)
[Retired academic survey researcher]
Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/spss-without-tears.html
Try the lightly tested code below. In order
to avoid confusion in dialog box displays, I used (dv) rather than [dv],
since when showing variable labels in dialogs, the variable name appears
at the end in square brackets.

begin program.
import spss, re
from spssaux import _smartquote

for v in range(spss.GetVariableCount()):
    vname = spss.GetVariableName(v)
    vlabel = spss.GetVariableLabel(v)
    vl = []
    # Split off a :Q<digits> question number and move it to the front.
    mo = re.match(r"(.*)(:Q)(\d+).*", vlabel)
    if mo is not None:
        vl.append("Q." + mo.group(3) + " ")
        vl.append(mo.group(1))
    else:
        vl.append("")
        vl.append(vlabel)
    # Look for a literal [dv] marker and move it after the question number.
    mo = re.search(r"(.*)(\[dv\] *)", vl[1])
    if mo is not None:
        vlabel = vl[0] + "(dv) " + mo.group(1)
    else:
        vlabel = vl[0] + vl[1]
    # Quote the new label and apply it with VARIABLE LABEL syntax.
    spss.Submit("""variable label %s %s.""" % (vname, _smartquote(vlabel.capitalize())))
end program.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: John F Hall <[hidden email]>
To: [hidden email]
Date: 09/11/2013 11:58 AM
Subject: [SPSSX-L] Modifying British Social Attitudes variable labels with syntax or Python
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Jon

I really appreciate this. Having a lifelong fear of computing fubars, I proceeded with extreme caution by saving a subset of variables with the simplest labels in a separate file, even keeping the original names with their bizarre mix of lower and upper case letters.

Step 1: Create subset with simple labels

SAVE OUTFILE='E:\Weebly downloads\bsa2011\labeltest.sav'
  /keep ABCVer to RAge
  /COMPRESSED.

Original names:
ABCVer Country XYVer OddEven Househld RSex RAge

Original labels:
questionnaire version :Q30
England, Scotland or Wales? :Q31
Version X/Y for filtering :Q32
1=Odd 2=Even :Q33
Number living in household, including respondent :Q47
Person 1 SEX :Q49
Person 1 age last birthday :Q50

Step 2: Run Jon’s syntax

With bated breath, fingers crossed etc., I tested your syntax, and it worked first time, but all upper case was converted to lower case:

New labels:
Q.30 questionnaire version
Q.31 england, scotland or wales?
Q.32 version x/y for filtering
Q.33 1=odd 2=even
Q.47 number living in household, including respondent
Q.49 person 1 sex
Q.50 person 1 age last birthday

Presumably a small tweak can rectify this?

Step 3: Test on more complex labels

Original names:
P2Rel2 P2Rel RPrivEd CPR2Gov CPR2LGov WTkg BnComp

Original labels:
Person 2 relationship to Respondent <7 categories> DV :Q58
Person 2 relationship to Respondent? <5 categories> DV :Q59
R attends/ed fee-paying, private primary or secondary school in UK : dv :Q985
Central government should be responsible for reducing child poverty in Britain dv :Q291
Local government should be responsible for reducing child poverty in Britain dv :Q292
weight in kilograms. dv A2.27b.
The current benefit system is far too complicated A2.30cB2.1cC2.1c.

New labels:
Q.58 person 2 relationship to respondent <7 categories> dv
Q.59 person 2 relationship to respondent? <5 categories> dv
Q.985 r attends/ed fee-paying, private primary or secondary school in uk : dv
Q.291 central government should be responsible for reducing child poverty in britain dv
Q.292 local government should be responsible for reducing child poverty in britain dv
Weight in kilograms. dv a2.27b.
The current benefit system is far too complicated a2.30cb2.1cc2.1c.

It hasn’t picked up the dv, but that is not a serious problem, unless it’s fairly straightforward to insert an asterisk or hash before the question number for any label with dv, and then delete the dv?

#Q.58 Person 2 relationship to respondent <7 categories>

The complex question numbers for the three different questionnaire versions would look really ugly at the beginning of the labels, so they can stay where they are.

So, if you could manage a tweak to retain all upper case letters in the original labels and to convert any lower case first characters to upper case, e.g.:

Q.30 Questionnaire version
Q.31 England, Scotland or Wales?
Q.985 R attends/ed fee-paying, private primary or secondary school in UK : dv
Q.291 Central government should be responsible for reducing child poverty in Britain dv

Eternal thanks in advance and handsome acknowledgment well deserved.

John

John F Hall (Mr)
[Retired academic survey researcher]
Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/spss-without-tears.html
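
The tweak asked for in the follow-up (keep the original capitals, upper-case only the first character, flag dv labels with a hash and drop the dv) might look roughly like the sketch below. It is plain Python with no SPSS calls so it can be tried on its own. The function name relabel, the dv_flag argument and the two patterns are assumptions made for illustration, based only on the label shapes quoted in this thread; they have not been run against the full BSA file. The two effects seen in the test run come from str.capitalize(), which lower-cases everything after the first character, and from the pattern searching for a literal [dv], which the original labels never contain.

```python
import re

# Trailing ":Q<digits>" question number, as in the labels quoted in this thread.
QNUM = re.compile(r"\s*:\s*Q(\d+)\s*$")
# Trailing "dv"/"DV" marker, optionally preceded by a colon (assumed shapes).
DV = re.compile(r"\s*:?\s*\bdv\s*$", re.IGNORECASE)


def upper_first(text):
    """Upper-case the first character only; the rest keeps its case."""
    return text[:1].upper() + text[1:]


def relabel(label, dv_flag="#"):
    """Hypothetical helper: move the question number to the front."""
    m = QNUM.search(label)
    if m is None:
        # No plain :Qddd suffix (e.g. "A2.30cB2.1cC2.1c."): only fix the case.
        return upper_first(label)
    text = label[:m.start()]
    prefix = "Q." + m.group(1) + "  "  # two spaces after the number
    dv = DV.search(text)
    if dv is not None:
        # Derived variable: drop the marker and flag the question number.
        text = text[:dv.start()]
        prefix = dv_flag + prefix
    return prefix + upper_first(text)


if __name__ == "__main__":
    for old in ("questionnaire version :Q30",
                "Person 2 relationship to Respondent <7 categories> DV :Q58",
                "weight in kilograms. dv A2.27b."):
        print(relabel(old))
```

Inside a begin program block, the idea would be to pass relabel(spss.GetVariableLabel(v)) through _smartquote in the loop from the earlier reply instead of building the label there and calling capitalize(). That wiring is untested here, so try it on a small subset file first.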