Re: Modifying syntax or Variable Labels with Python (getting close)


Re: Modifying syntax or Variable Labels with Python (getting close)

Jeff6610

 

I’m getting closer to something that works for my application, after taking pieces from several suggestions and modifying them.

The first program does the trick for the first part.

The second comes close to printing what I need: I can print the original variable label, but I can find no documentation that tells me how to print a custom attribute from a variable dictionary.

I’ve marked the problematic line. Can anyone tell me how to modify it?

 

Thanks

 

Jeff

 

 

 

**** This works fine *****.

********** Truncate the label where the #showMoreInfo HTML code starts, then place the result into the custom attribute "Question" **********.

Begin program.
import spss, spssaux

vardict = spssaux.VariableDict()
for v in vardict:
    EndLocation = v.VariableLabel.find("#showMoreInfo")
    # find() returns -1 when the marker is absent, so those labels are left unchanged
    if EndLocation > 0:
        v.VariableLabel = v.VariableLabel[:EndLocation]
    # outside the if: every variable gets a Question attribute
    spssaux.CreateAttribute(v.VariableName, 'Question', v.VariableLabel)  # watch the indentation here
End program.
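For anyone who wants to try the truncation step outside Statistics, here is a minimal sketch in plain Python (written to run under Python 2 or 3). The helper name question_text and the sample label are made up for illustration; only the "#showMoreInfo" marker comes from the labels in this thread. One difference from the program above: str.find returns 0 when the marker sits at the very start of the label, which the "> 0" test leaves untouched, whereas this sketch returns an empty string in that case.

```python
MARKER = "#showMoreInfo"  # marker text taken from the labels in this thread


def question_text(label, marker=MARKER):
    """Return the part of label before marker, or the whole label if the marker is absent."""
    pos = label.find(marker)  # -1 when the marker is not present
    if pos == -1:
        return label
    return label[:pos].rstrip()


# Hypothetical sample label, shortened from the kind of text shown in the thread
sample = "Are you currently enrolled?#showMoreInfo {display: none;}"
print(question_text(sample))
print(question_text("Age in years"))
```

Inside a BEGIN PROGRAM block the same function could be applied to v.VariableLabel before the label and the attribute are assigned.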

 

 

************** This prints the Variable Name, Question, and Answer Choices, but takes the Question from the label and not from the custom attribute ***************.

begin program.
import spss, spssaux

vardict = spssaux.VariableDict()
for v in vardict:
    print "\n", "Variable Name: ", v.VariableName, "\n", "Question: ", v.VariableLabel     # <<<<<< How can I alter this to print the custom attribute “Question”?
    valueLabels = v.ValueLabels
    if valueLabels:
        print "Answer Choices:"
        for lbl in sorted(valueLabels):
            print " ", lbl, " ", valueLabels[lbl]
end program.

 

 

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Mario Giesel
Sent: Friday, 1 June 2018 6:15 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

I've tested this with your example string using Notepad++ as regex machine:

1. Remove string concatenation via regular expressions
Replace
" *\+ *\r\n"
 by
""

2. Transform VARIABLE LABELS command to VARIABLE ATTRIBUTE command

a) Replace
VARIABLE LABELS
by
VARIABLE ATTRIBUTE VARIABLES=

b) REPLACE
([^"]+)(.+)
by
\1 ATTRIBUTE = varinfo\(\2"\).

3. Read n characters from attribute varinfo into the variable label

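BEGIN PROGRAM PYTHON.
import spss
syntax = ""
n = 100  # number of characters to keep in each label
for i in xrange(spss.GetVariableCount()):
    # first value of the varinfo attribute, cut to n characters
    label = spss.GetVarAttributes(i,"varinfo")[0][:n]
    syntax += 'VARIABLE LABELS {} "{}".\n'.format(spss.GetVariableName(i), label)
# This only prints the generated commands; spss.Submit(syntax) would run them.
print syntax
END PROGRAM.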

 

---

Mario Giesel

Munich, Germany

 

Jeff <[hidden email]> schrieb am 8:31 Freitag, 1.Juni 2018:

 

 

I agree about the desirability of having the survey software itself be able to produce a codebook version that could contain only the relevant question info. Unfortunately, it does not and instead will only produce a very complex questionnaire replica that has skip patterns imbedded and produces something that’s essentially full size for the purposes of giving it to a respondent.

 

A few years ago I actually wrote code in Visual BASIC to produce what I need, but when the survey software was updated, the format of the database changed and that code no longer works.

 

Doing what I want in Python seems the best option. The concepts are relatively easy for me, but I don’t know the equivalent python code to what I used before in Visual BASIC.

 

I’m very slowly figuring it out, but it’s taking many hours since I can’t find the documentation I need to make the learning curve less steep. There are only a few things that I need to make it work. …hoping that Jon or someone else can get me moving.

 

Best,

 

Jeff

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of John F Hall
Sent: Friday, 1 June 2018 3:42 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

Once you've sorted the var lab syntax out, my preference for the Data Editor would be to put the truncated labels in the Labels column and the question routeing info into the new Custom Attribute column, then hide the column.  That would make the Data Editor easier to navigate.

 

What would make life much easier for secondary analysis and/or teaching would be software that can take the complex CAPI/BLAISE and extract a version of the questionnaire containing only the question number (if present) and text of the actual question (possibly with the variable names added).  Users could then work with both questionnaire and Data Editor open.

 

John F Hall  MA (Cantab) Dip Ed (Dunelm)

[Retired academic survey researcher]

 

Email:          [hidden email]

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff
Sent: 31 May 2018 22:14
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

 

You have the general idea Jon,

 

I’ve got a few things to work, but still struggling with learning a few things in python that I can’t find documentation for on-line.

 

What seems like a good strategy:

 

1)      Permit the 3rd-party-generated syntax to run that places overly-long labels into the dataset with about 200 variables (most of which have the long labels).

2)      Truncate the labels using a python program – Yes, please send me the one you have.

3)      Place the newly-truncated labels into a custom attribute called “Question” for each variable in the dataset.

 

1 is no problem

2 – I’ve got started below, but it will only work for the first variable label for some reason.

3 – haven’t figured this one out yet, but am coming close

 

Thanks in advance.

 

Jeff

 

 

********** Trial Code – not working yet *********.

begin program.
import spss, spssaux

vardict = spssaux.VariableDict()
for v in vardict:
    EndLocation = v.VariableLabel.find("#showMoreInfo")
    if EndLocation > 0:
        v.VariableLabel = v.VariableLabel[:EndLocation]
        print EndLocation
end program.

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Friday, 1 June 2018 6:00 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

As several have suggested, there are several ways to edit these labels down, including editing with Statistics, using a smart editor that recognizes regular expressions such as Notepad++, and a simple Python program, which I can provide if needed.

 

But I  have not seen anyone address the issue of using a variable label vs using a custom attribute.  Custom attributes are great for recording such things as question text in a survey, and these can be displayed by the CODEBOOK procedure and in the Data Editor.  They can also be accessed by Python code, and creation of variable and file custom attributes is supported in syntax.  They don't have the length limitation of variable labels either.

 

So I suppose the goal is to move the question text to such an attribute but perhaps to do something else (besides truncation) for variable labels as the question text may be unwieldy due to length.  So, there may be other metadata such as interviewer instructions that also should get this treatment.

 

 

 

On Wed, May 30, 2018 at 9:44 PM, Jeff <[hidden email]> wrote:

 

I have an SPSS syntax file that was generated automatically from 3rd party survey software. It’s placing a few hundred variable labels into an spss syntax file that when run will create an spss dataset, but has a few problems that I hope to fix with python code (but I’m new to python) or whatever will work.

 

The faulty syntax file has labels that exceed a 256 character maximum because they contain unnecessary information that comes after important information (a survey question). I’m hoping to extract the good info and place it into a custom attribute called “Question”. The original syntax looks like what I’ve listed below:

 

VARIABLE LABELS AttendSchool "Are you currently ENROLLED IN A UNIVERSITY OR ANY OTHER FORM OF HIGHER EDUCATION or were you enrolled within the past year?#showMoreInfo {display: none;}.buttonTest {    background-color: #4CAF50;    border: none;    color: whit"+
"e;    padding: 4px 8px;    text-align: center;    text-decoration: none;    display: inline-block;    font-size: 10pxpx;    margin: 4px 2px;    cursor: pointer;           }If you are enrolled part-time, you should answer, ""Yes."+
""".

 

I’m unsure whether it’s best (or even possible) to use Python to access and alter a syntax file, or just run the faulty Variable Labels commands, get a few hundred warnings that the labels will be truncated when the datafile is created, and then use python to alter the variable labels within the dataset.

 

 

I’m trying to do something like this:

 

For all instances of the phrase “VARIABLE LABELS” (or for all Variable labels in the dataset) :

  CustomAttribute “Question” = Substring(First-Instance-of-“  to   First-Instance-of-#)
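The pseudocode above could be sketched in plain Python roughly as follows, working on the text of the syntax file rather than on the dataset. This is an illustration under several assumptions: each VARIABLE LABELS command names exactly one variable (as in the sample), the "..."+ continuation lines have already been joined into one quoted string, and the helper name to_attribute_syntax is made up. It turns one command into the matching VARIABLE ATTRIBUTE command and keeps only the text before "#showMoreInfo".

```python
import re

# One variable name followed by one quoted label, ending with the command terminator.
LABEL_CMD = re.compile(r'VARIABLE LABELS\s+(\S+)\s+"(.*)"\s*\.\s*$', re.S)


def to_attribute_syntax(command, marker="#showMoreInfo"):
    """Build a VARIABLE ATTRIBUTE command from one VARIABLE LABELS command.

    Returns None when the text is not a single-variable VARIABLE LABELS command.
    """
    m = LABEL_CMD.match(command.strip())
    if m is None:
        return None
    name, body = m.group(1), m.group(2)
    # Keep the text before the marker; doubled quotes stay doubled,
    # so the result is still correctly quoted for SPSS syntax.
    question = body.split(marker, 1)[0].rstrip()
    return 'VARIABLE ATTRIBUTE VARIABLES=%s ATTRIBUTE=Question("%s").' % (name, question)


# Hypothetical shortened command in the shape of the generated syntax
cmd = 'VARIABLE LABELS AttendSchool "Are you enrolled?#showMoreInfo {display: none;}".'
print(to_attribute_syntax(cmd))
```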

 

 

Can someone point me in the right direction?

 

Thanks in advance,


Jeff

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD



 

--

Jon K Peck
[hidden email]


Re: Modifying syntax or Variable Labels with Python (getting close)

Jon Peck
Custom attributes are available for a variable in a VariableDict object as a dictionary named Attributes.  For example, you might write
vardict = spssaux.VariableDict()
print vardict['jobcat'].Attributes['question']

This assumes that the question attribute exists.  You could print the whole set of attributes for a variable like this.
print vardict['jobcat'].Attributes
That would just show {} if the variable has no attributes.

Note that in regular Statistics syntax, the CODEBOOK procedure can print all the metadata, including custom attributes.
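Since indexing Attributes with a missing name would fail, a lookup with a fallback may be safer when not every variable has the attribute. The sketch below uses an ordinary Python dict as a stand-in for v.Attributes, on the assumption that Attributes supports the usual dict .get method; the function name question_or_label and the sample values are invented for illustration. With a plain dict the key is case-sensitive, so it should be spelled the way the attribute was created.

```python
def question_or_label(attributes, label, name="Question"):
    """Prefer the custom attribute; fall back to the variable label."""
    value = attributes.get(name)  # None when the attribute does not exist
    return value if value else label


# Plain dicts standing in for v.Attributes (hypothetical values)
print(question_or_label({"Question": "Are you enrolled?"}, "AttendSchool label"))
print(question_or_label({}, "AttendSchool label"))
```

In a BEGIN PROGRAM block the call would presumably look like question_or_label(v.Attributes, v.VariableLabel).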




--
Jon K Peck
[hidden email]


Re: Modifying syntax or Variable Labels with Python (getting close)

Jeff6610

 

Thanks Jon,

 

I just about have this. The code below is working for me to print Questions (in a custom attribute) and corresponding value labels from a dataset.

 

The only small issue now is that the sorted function in the code below returns the value labels in the order shown here. Is there an easy way to correct this so that 10 comes after 9, and so on?

 

1

10

11

2

3

4

5

Etc.

 

Best,

 

Jeff

 

 

begin program.
import spss, spssaux

vardict = spssaux.VariableDict()
for v in vardict:
    print "\n", "Variable Name: ", v.VariableName, "\n", "Question: ", v.Attributes['Question']
    valueLabels = v.ValueLabels
    if valueLabels:
        print "Answer Choices:"
        for lbl in sorted(valueLabels):     #<<<<<<<<<<<<<<  sorted here is slightly problematic
            print " ", lbl, " ", valueLabels[lbl]
end program.
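The 1, 10, 11, 2 order is what Python produces when it sorts strings, so one possible fix is to give sorted a key that compares the codes as numbers. This is a sketch only, assuming the keys of v.ValueLabels arrive as strings (which the order above suggests); the helper name numeric_first and the sample labels are made up. Codes that cannot be read as numbers are placed after the numeric ones and sorted as text.

```python
def numeric_first(key):
    """Sort key: numeric codes in numeric order, anything else afterwards as text."""
    try:
        return (0, float(key), "")
    except (TypeError, ValueError):
        return (1, 0.0, str(key))


# A plain dict standing in for v.ValueLabels (hypothetical codes and labels)
value_labels = {"1": "Never", "10": "Often", "11": "Always", "2": "Rarely", "9": "Sometimes"}
ordered = sorted(value_labels, key=numeric_first)
for code in ordered:
    print("  %s  %s" % (code, value_labels[code]))
```

In the program above, the corresponding change would be to write sorted(valueLabels, key=numeric_first) in the inner loop.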

 

 

 

From: Jon Peck <[hidden email]>
Sent: Friday, 1 June 2018 11:47 PM
To: Jeff <[hidden email]>
Cc: SPSS List <[hidden email]>
Subject: Re: [SPSSX-L] Modifying syntax or Variable Labels with Python (getting close)

 

Custom attributes are available for a variable in a VariableDict object as a dictionary named Attributes.  For example, you might write

vardict = spssaux.VariableDict()

print vardict['jobcat'].Attributes['question']

 

This assumes that the question attribute exists.  You could print the whole set of attributes for a variable like this.

print vardict['jobcat'].Attributes

That would just show {} if the variable has no attributes.

 

Note that in regular Statistics syntax, the CODEBOOK procedure can print all the metadata, including custom attributes.

 

On Fri, Jun 1, 2018 at 5:09 AM, Jeff <[hidden email]> wrote:

 

I’m getting closer to something that is starting to work for my application. (taking a few things from different suggestions and modifying)

 

The first program does the trick for the first part.

 

The second is close to do some printing – I can print the original variable label, but can find no documentation to tell me how to print a custom attribute from a variable dictionary.

 

I’ve marked the problematic line – can anyone tell me how to modify?

 

Thanks

 

Jeff

 

 

 

**** This works fine *****.

********** Truncate the Label where the #showMoreInfo HTLM code starts  and then place this into Custom Attribute "Question" **********.

Begin program.

import spss, spssaux

vardict=spssaux.VariableDict()

for v in vardict:

  EndLocation = v.VariableLabel.find("#showMoreInfo")

  if EndLocation > 0:

     v.VariableLabel = v.VariableLabel[:EndLocation]

  spssaux.CreateAttribute(v.VariableName, 'Question',v.VariableLabel) # watch the indentation here

End program.

 

 

************** This works to print the Variable Name, Question, and Answer Choices, but does so from the Label and not Custom Attribute ***************.

begin program.

import spss, spssaux

vardict = spssaux.VariableDict()

for v in vardict:

    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.VariableLabel     # <<<<<< How can I alter to print the custom attribute “Question” ?

    valueLabels = v.ValueLabels

    if valueLabels:

       print "Answer Choices:"

       for lbl in sorted(valueLabels): print " ", lbl," ", valueLabels[lbl]

end program.

 

 

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Mario Giesel
Sent: Friday, 1 June 2018 6:15 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

I've tested this with your example string using Notepad++ as regex machine:

1. Remove string concatenation via regular expressions
Replace
" *\+ *\r\n"
 by
""

2. Transform VARIABLE LABELS command to VARIABLE ATTRIBUTE command

a) Replace
VARIABLE LABELS
by
VARIABLE ATTRIBUTE VARIABLES=

b) REPLACE
([^"]+)(.+)
by
\1 ATTRIBUTE = varinfo\(\2"\).

3. Read n characters from attribute varinfo into the variable label

BEGIN PROGRAM PYTHON.
import spss
syntax = ""
n = 100
for i in xrange(spss.GetVariableCount()):
    label = spss.GetVarAttributes(i,"varinfo")[0][:n]
    syntax += 'VARIABLE LABELS {} "{}".\n'.format(spss.GetVariableName(i), label)
print syntax
END PROGRAM.

 

---

Mario Giesel

Munich, Germany

 

Jeff <[hidden email]> schrieb am 8:31 Freitag, 1.Juni 2018:

 

 

I agree about the desirability of having the survey software itself be able to produce a codebook version that could contain only the relevant question info. Unfortunately, it does not and instead will only produce a very complex questionnaire replica that has skip patterns imbedded and produces something that’s essentially full size for the purposes of giving it to a respondent.

 

I’ve actually written code to produce what I need a few years ago that I wrote in Visual BASIC, but when the survey software was updated, they changed the format of the database and it will no longer work.

 

Doing what I want in Python seems the best option. The concepts are relatively easy for me, but I don’t know the equivalent python code to what I used before in Visual BASIC.

 

I’m very slowly figuring it out, but it’s taking many hours since I can’t find the documentation I need to make the learning curve less steep. There are only a few things that I need to make it work. …hoping that Jon or someone else can get me moving.

 

Best,

 

Jeff

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of John F Hall
Sent: Friday, 1 June 2018 3:42 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

Once you've sorted the var lab syntax out, my preference for the Data Editor would be to put the truncated labels in the Labels column and the question routeing info into the new Custom Attribute column, then hide the column.  That would make the Data Editor easier to navigate.

 

What would make life much easier for secondary analysis and/or teaching would be software that can take the complex CAPI/BLAISE and extract a version of the questionnaire containing only the question number (if present) and text of the actual question (possibly with the variable names added).  Users could then work with both questionnaire and Data Editor open.

 

John F Hall  MA (Cantab) Dip Ed (Dunelm)

[Retired academic survey researcher]

 

Email:          [hidden email]

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff
Sent: 31 May 2018 22:14
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

 

You have the general idea Jon,

 

I’ve got a few things to work, but still struggling with learning a few things in python that I can’t find documentation for on-line.

 

What seems like a good strategy:

 

1)      Permit the 3rd-party-generated syntax to run that places overly-long labels into the dataset with about 200 variables (most of which have the long labels).

2)      Truncate the labels using a python program – Yes, please send me the one you have.

3)      Place the newly-truncated labels into a custom attribute called “Question” for each variable in the dataset.

 

1 is no problem

2 – I’ve got started below, but it will only work for the first variable label for some reason.

3 – haven’t figured this one out yet, but am coming close

 

Thanks in advance.

 

Jeff

 

 

********** Trial Code – not working yet *********.

begin program.

import spss, spssaux

vardict=spssaux.VariableDict()

for v in vardict:

  EndLocation = v.VariableLabel.find("#showMoreInfo")

  if EndLocation > 0:

     v.VariableLabel = v.VariableLabel[:EndLocation]

     print EndLocation

end program.

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Friday, 1 June 2018 6:00 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

As several have suggested, there are several ways to edit these labels down, including editing with Statistics, using a smart editor that recognizes regular expressions such as Notepad++, and a simple Python program, which I can provide if needed.

 

But I  have not seen anyone address the issue of using a variable label vs using a custom attribute.  Custom attributes are great for recording such things as question text in a survey, and these can be displayed by the CODEBOOK procedure and in the Data Editor.  They can also be accessed by Python code, and creation of variable and file custom attributes is supported in syntax.  They don't have the length limitation of variable labels either.

 

So I suppose the goal is to move the question text to such an attribute but perhaps to do something else (besides truncation) for variable labels as the question text may be unwieldy due to length.  So, there may be other metadata such as interviewer instructions that also should get this treatment.

 

 

 

On Wed, May 30, 2018 at 9:44 PM, Jeff <[hidden email]> wrote:

 

I have an SPSS syntax file that was generated automatically by third-party survey software. It contains a few hundred variable labels and, when run, creates an SPSS dataset, but it has a few problems that I hope to fix with Python code (though I’m new to Python) or whatever else will work.

 

The faulty syntax file has labels that exceed the 256-character maximum because they contain unnecessary information that comes after the important information (a survey question). I’m hoping to extract the good info and place it into a custom attribute called “Question”. The original syntax looks like what I’ve listed below:

 

VARIABLE LABELS AttendSchool "Are you currently ENROLLED IN A UNIVERSITY OR ANY OTHER FORM OF HIGHER EDUCATION or were you enrolled within the past year?#showMoreInfo {display: none;}.buttonTest {    background-color: #4CAF50;    border: none;    color: whit"+

"e;    padding: 4px 8px;    text-align: center;    text-decoration: none;    display: inline-block;    font-size: 10pxpx;    margin: 4px 2px;    cursor: pointer;           }If you are enrolled part-time, you should answer, ""Yes."+

""".

 

I’m unsure whether it’s best (or even possible) to use Python to read and alter the syntax file itself, or to just run the faulty VARIABLE LABELS commands, accept a few hundred warnings that the labels will be truncated when the data file is created, and then use Python to alter the variable labels within the dataset.

 

 

I’m trying to do something like this:

 

For all instances of the phrase “VARIABLE LABELS” (or for all Variable labels in the dataset) :

  CustomAttribute “Question” = Substring(First-Instance-of-“  to   First-Instance-of-#)
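
The pseudocode above could be sketched in Python roughly as follows, working on the text of the syntax file rather than on the dataset. This is only an illustration under several assumptions: the sample commands are shortened and made up, each command starts with `VARIABLE LABELS` followed by a single variable name made of letters, digits, and underscores, and the question text contains neither `#` nor a double quote and fits on the first line of the command.

```python
import re

# Hypothetical sample modelled on the VARIABLE LABELS command quoted above (shortened).
syntax = (
    'VARIABLE LABELS AttendSchool "Are you currently ENROLLED?'
    '#showMoreInfo {display: none;}".\n'
    'VARIABLE LABELS Age "How old are you?".\n'
)

# Capture the variable name, then everything after the opening quote up to
# the first '#' or the closing quote, whichever comes first.
pattern = re.compile(r'VARIABLE LABELS\s+(\w+)\s+"([^"#]*)')

# findall returns (name, question) pairs, which dict() turns into a mapping.
questions = dict(pattern.findall(syntax))
for name in sorted(questions):
    print("%s -> %s" % (name, questions[name]))
```

The resulting name-to-question mapping is what would then be written into the “Question” attribute.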

 

 

Can someone point me in the right direction?

 

Thanks in advance,


Jeff

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD



 

--

Jon K Peck
[hidden email]

Re: Modifying syntax or Variable Labels with Python (getting close)

Jon Peck
You get these results because the value labels are sorted as strings.  The Python sorted function, however, allows you to supply a custom comparison function that can deal with this.

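def cmp(x, y):
    """Sort comparison function that compares strings numerically
    when possible.
    """
    try:
        # if either conversion fails, both values stay as strings
        x, y = float(x), float(y)
    except ValueError:
        pass
    # equality is deliberately ignored, so this never returns 0
    return -1 if x < y else 1

data = ['1', '10', '5', '0', 'zabc', 'xyz']
print sorted(data, cmp=cmp)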

This function first tries to convert both arguments for the comparison to numbers.  If that works, then the return expression is based on a numeric comparison.  If one or both arguments cannot be converted to numbers, the result is based on a string comparison.

The cmp function is supposed to return -1, 0, or 1 according to whether x<y, x=y, or x>y.  I ignored the equality case, since the values in a list of value labels should all be different.
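
As an alternative sketch of the same idea: the `cmp=` argument to `sorted` exists only in Python 2, while a `key` function works in both Python 2 and Python 3. The helper name `numeric_first` is made up for this example. It places values that parse as numbers first, in numeric order, followed by the remaining strings in alphabetical order.

```python
def numeric_first(value):
    """Sort key: numeric strings order by value and come before other strings."""
    try:
        return (0, float(value), "")
    except ValueError:
        # Not a number: sort after all numbers, alphabetically by the string itself.
        return (1, 0.0, value)


data = ['1', '10', '5', '0', 'zabc', 'xyz']
print(sorted(data, key=numeric_first))  # ['0', '1', '5', '10', 'xyz', 'zabc']
```

Because the key is a tuple, numbers and strings are never compared with each other directly, which also avoids the mixed-type comparison error Python 3 would raise.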

On Fri, Jun 1, 2018 at 9:50 PM, Jeff <[hidden email]> wrote:

 

Thanks Jon,

 

I just about have this. The code below is working for me to print Questions (in a custom attribute) and corresponding value labels from a dataset.

 

The only small issue now is that the sorted function in the code below produces value labels in the order shown below. Is there an easy way to correct this so that 10 comes after 9, etc.?

 

1

10

11

2

3

4

5

Etc.

 

Best,

 

Jeff

 

 

begin program.

import spss, spssaux

vardict = spssaux.VariableDict()

for v in vardict:

    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.Attributes['Question']

    valueLabels = v.ValueLabels

    if valueLabels:

       print "Answer Choices:"

       for lbl in sorted(valueLabels): print " ", lbl," ", valueLabels[lbl]                     #<<<<<<<<<<<<<<  sorted here is slightly problematic

end program.

 

 

 

From: Jon Peck <[hidden email]>
Sent: Friday, 1 June 2018 11:47 PM
To: Jeff <[hidden email]>
Cc: SPSS List <[hidden email]>
Subject: Re: [SPSSX-L] Modifying syntax or Variable Labels with Python (getting close)

 

Custom attributes are available for a variable in a VariableDict object as a dictionary named Attributes.  For example, you might write

vardict = spssaux.VariableDict()

print vardict['jobcat'].Attributes['question']

 

This assumes that the question attribute exists.  You could print the whole set of attributes for a variable like this.

print vardict['jobcat'].Attributes

That would just show {} if the variable has no attributes.
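
A small sketch of reading the attribute without failing when it is missing. It uses ordinary dicts as stand-ins for the Attributes property, on the assumption (taken from the description above) that Attributes behaves like a regular Python dictionary. The data and the helper name `question_text` are hypothetical. Note that keys in a plain dict are case-sensitive, so 'question' and 'Question' are different keys here.

```python
# Stand-in for vardict[name].Attributes: one ordinary dict per variable (hypothetical data).
attributes_by_variable = {
    "jobcat": {"Question": "What is your employment category?"},
    "salary": {},  # a variable with no custom attributes
}


def question_text(attributes, default="(no Question attribute)"):
    """Look up 'Question' without raising KeyError when it is missing."""
    return attributes.get("Question", default)


for name in sorted(attributes_by_variable):
    print("%s: %s" % (name, question_text(attributes_by_variable[name])))
```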

 

Note that in regular Statistics syntax, the CODEBOOK procedure can print all the metadata, including custom attributes.

 

--
Jon K Peck
[hidden email]

Re: Modifying syntax or Variable Labels with Python (getting close)

Jon Peck
I should have added that if these labels are retrieved from a VariableDict object, there are two properties available.  ValueLabels returns the labels indexed by the value as a string.  ValueLabelsTyped returns the indexes according to the variable type.  See the difference between these two outputs.

begin program.
import spssaux

vardict=spssaux.VariableDict("jobcat")
print vardict['jobcat'].ValueLabels
print vardict['jobcat'].ValueLabelsTyped
end program.
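
To illustrate why the two properties sort differently, here is a sketch with plain dicts standing in for what ValueLabels and ValueLabelsTyped might return for a numeric variable. The values and labels are made up, and the float keys are an assumption about how numeric values are typed.

```python
# Hypothetical stand-ins for the two properties described above.
value_labels = {"1": "Answer One", "2": "Answer Two", "10": "Answer Ten"}        # keys as strings
value_labels_typed = {1.0: "Answer One", 2.0: "Answer Two", 10.0: "Answer Ten"}  # keys as numbers

print(sorted(value_labels))        # string order: ['1', '10', '2']
print(sorted(value_labels_typed))  # numeric order: [1.0, 2.0, 10.0]
```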



--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Modifying syntax or Variable Labels with Python (getting close)

Jeff6610

 

It’s very close. I should have mentioned this complexity.

 

The values and their labels actually look like this:

 

1 Answer One

2 Answer Two

3 Answer Three

10 Answer Ten

11 Answer Eleven

 

This means that the first suggestion of yours with the cmp function won’t work, but…

 

The code below, using your example with ValueLabelsTyped, sorts properly; however, it adds an unwanted decimal, as shown below. Is there an easy way to remove it? I tried a substring function in the second code block, but that returned an error.

 

 

1.0 Answer One

2.0 Answer Two

3.0 Answer Three

10.0 Answer Ten

11.0 Answer Eleven

 

 

begin program.
import spss, spssaux
vardict = spssaux.VariableDict()
for v in vardict:
    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.Attributes['Question']
    valueLabels = v.ValueLabelsTyped    # <<<<<<<<<<<<<<< Using "Typed" here worked, but adds a decimal.
    if valueLabels:
        print "Answer Choices:"
        for lbl in sorted(valueLabels): print " ", lbl," ", valueLabels[lbl]
end program.

 

 

********************* This returns an error ***********************.
begin program.
import spss, spssaux
vardict = spssaux.VariableDict()
for v in vardict:
    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.Attributes['Question']
    valueLabels = v.ValueLabelsTyped
    if valueLabels:
        print "Answer Choices:"
        for lbl in sorted(valueLabels):
            DotLocation = valueLabels[lbl].find(".0")    # <<<<<<<<<<<<<<<<<<< This or the line below doesn't work
            print " ",lbl[DotLocation:]," ", valueLabels[lbl]
end program.

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Sunday, 3 June 2018 5:07 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python (getting close)

 

I should have added that if these labels are retrieved from a VariableDict object, there are two properties available.  ValueLabels returns the labels indexed by the value as a string.  ValueLabelsTyped returns the indexes according to the variable type.  See the difference between these two outputs.

 

begin program.
import spssaux

vardict=spssaux.VariableDict("jobcat")
print vardict['jobcat'].ValueLabels
print vardict['jobcat'].ValueLabelsTyped

end program.
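
A rough illustration of why the two properties sort differently. It uses plain dictionaries as stand-ins for what ValueLabels and ValueLabelsTyped return for a numeric variable; the sample values are made up, and the sketch is written so it runs outside SPSS without the spssaux module.

```python
# Hypothetical stand-ins for v.ValueLabels and v.ValueLabelsTyped of a
# numeric variable; the real dictionaries would come from spssaux.
value_labels = {"1": "Answer One", "2": "Answer Two", "10": "Answer Ten"}
value_labels_typed = {1.0: "Answer One", 2.0: "Answer Two", 10.0: "Answer Ten"}

# String keys sort character by character, so "10" lands before "2".
string_order = sorted(value_labels)

# Float keys sort numerically.
typed_order = sorted(value_labels_typed)

print(string_order)
print(typed_order)
```

The float keys are also why the printed values pick up a trailing ".0".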

 

On Sat, Jun 2, 2018 at 12:04 PM, Jon Peck <[hidden email]> wrote:

You get these results because the value labels are sorted as strings.  The Python sorted function, however, allows you to supply a custom comparison function that can deal with this.

 

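def cmp(x, y):
    """sort collation function that compares strings numerically
    when possible
    """
    try:
        # compare as numbers when both values convert cleanly
        x, y = float(x), float(y)
    except (TypeError, ValueError):
        # otherwise fall back to comparing the original values
        pass
    return x < y and -1 or 1

data = ['1', '10', '5', '0', 'zabc', 'xyz']
print sorted(data, cmp=cmp)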

 

This function first tries to convert both arguments for the comparison to numbers.  If that works, then the return expression is based on a numeric comparison.  If one or both arguments cannot be converted to numbers, the result is based on a string comparison.

 

The cmp function is supposed to return -1, 0, or 1 according to whether x<y, x=y, or x>y.  I ignored the equality case, since the values in a list of value labels should all  be different.
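
For readers on Python 3, where sorted no longer accepts a cmp argument, the same idea can be sketched with a key function instead. This is a swapped-in technique, not the function posted above, and the helper name numeric_first_key is made up for the illustration. It places values that convert to numbers first, in numeric order, followed by the remaining strings in alphabetical order.

```python
def numeric_first_key(value):
    """Sort key: numeric strings by numeric value, then other strings."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)

data = ['1', '10', '5', '0', 'zabc', 'xyz']
result = sorted(data, key=numeric_first_key)
print(result)
```

Because the key is a tuple, equal values need no special handling, which sidesteps the equality case mentioned above.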

 

On Fri, Jun 1, 2018 at 9:50 PM, Jeff <[hidden email]> wrote:

 

Thanks Jon,

 

I just about have this. The code below is working for me to print Questions (in a custom attribute) and corresponding value labels from a dataset.

 

The only small issue now is that the sorted function in the code below produces value labels sorted in the fashion below. Is there an easy way to correct this so that 10 comes after 9, etc.?

 

1

10

11

2

3

4

5

Etc.

 

Best,

 

Jeff

 

 

begin program.
import spss, spssaux
vardict = spssaux.VariableDict()
for v in vardict:
    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.Attributes['Question']
    valueLabels = v.ValueLabels
    if valueLabels:
        print "Answer Choices:"
        for lbl in sorted(valueLabels): print " ", lbl," ", valueLabels[lbl]    #<<<<<<<<<<<<<<  sorted here is slightly problematic
end program.

 

 

 

From: Jon Peck <[hidden email]>
Sent: Friday, 1 June 2018 11:47 PM
To: Jeff <[hidden email]>
Cc: SPSS List <[hidden email]>
Subject: Re: [SPSSX-L] Modifying syntax or Variable Labels with Python (getting close)

 

Custom attributes are available for a variable in a VariableDict object as a dictionary named Attributes.  For example, you might write

vardict = spssaux.VariableDict()

print vardict['jobcat'].Attributes['question']

 

This assumes that the question attribute exists.  You could print the whole set of attributes for a variable like this.

print vardict['jobcat'].Attributes

That would just show {} if the variable has no attributes.
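
If some variables might lack the attribute, a lookup with a fallback avoids the error. The sketch below assumes Attributes behaves like an ordinary Python dictionary, as described above; the sample dictionary and the fallback text are made up so that it runs without SPSS.

```python
# Hypothetical stand-in for vardict['jobcat'].Attributes; in SPSS the
# dictionary would come from spssaux.VariableDict().
attributes = {"Question": "Are you currently enrolled?"}

# dict.get returns the fallback instead of raising KeyError when the
# attribute has not been defined for the variable.
question = attributes.get("Question", "(no Question attribute)")
missing = attributes.get("Instructions", "(no Instructions attribute)")

print(question)
print(missing)
```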

 

Note that in regular Statistics syntax, the CODEBOOK procedure can print all the metadata, including custom attributes.

 

On Fri, Jun 1, 2018 at 5:09 AM, Jeff <[hidden email]> wrote:

 

I’m getting closer to something that is starting to work for my application. (taking a few things from different suggestions and modifying)

 

The first program does the trick for the first part.

 

The second is close to do some printing – I can print the original variable label, but can find no documentation to tell me how to print a custom attribute from a variable dictionary.

 

I’ve marked the problematic line – can anyone tell me how to modify?

 

Thanks

 

Jeff

 

 

 

**** This works fine *****.
********** Truncate the Label where the #showMoreInfo HTML code starts and then place this into Custom Attribute "Question" **********.
Begin program.
import spss, spssaux
vardict=spssaux.VariableDict()
for v in vardict:
  EndLocation = v.VariableLabel.find("#showMoreInfo")
  if EndLocation > 0:
     v.VariableLabel = v.VariableLabel[:EndLocation]
  spssaux.CreateAttribute(v.VariableName, 'Question',v.VariableLabel) # watch the indentation here

End program.
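
The truncation step on its own can be sketched as a small function that runs outside SPSS. The name truncate_label and the sample labels are made up for the illustration. Note that str.find returns -1 when the marker is absent and 0 when the label starts with it, so the sketch tests for -1 explicitly rather than for a value greater than 0.

```python
def truncate_label(label, marker="#showMoreInfo"):
    """Return the part of label that comes before marker, if present."""
    end = label.find(marker)  # -1 when the marker is absent
    if end == -1:
        return label
    return label[:end]

long_label = "Are you enrolled?#showMoreInfo {display: none;}"
print(truncate_label(long_label))
print(truncate_label("Plain label"))
```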

 

 

************** This works to print the Variable Name, Question, and Answer Choices, but does so from the Label and not Custom Attribute ***************.
begin program.
import spss, spssaux
vardict = spssaux.VariableDict()
for v in vardict:
    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.VariableLabel     # <<<<<< How can I alter to print the custom attribute “Question” ?
    valueLabels = v.ValueLabels
    if valueLabels:
        print "Answer Choices:"
        for lbl in sorted(valueLabels): print " ", lbl," ", valueLabels[lbl]
end program.

 

 

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Mario Giesel
Sent: Friday, 1 June 2018 6:15 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

I've tested this with your example string using Notepad++ as regex machine:

1. Remove string concatenation via regular expressions
Replace
" *\+ *\r\n"
 by
""

2. Transform VARIABLE LABELS command to VARIABLE ATTRIBUTE command

a) Replace
VARIABLE LABELS
by
VARIABLE ATTRIBUTE VARIABLES=

b) REPLACE
([^"]+)(.+)
by
\1 ATTRIBUTE = varinfo\(\2"\).

3. Read n characters from attribute varinfo into the variable label

BEGIN PROGRAM PYTHON.
import spss
syntax = ""
n = 100
for i in xrange(spss.GetVariableCount()):
    label = spss.GetVarAttributes(i,"varinfo")[0][:n]
    syntax += 'VARIABLE LABELS {} "{}".\n'.format(spss.GetVariableName(i), label)
print syntax
END PROGRAM.
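
Step 1 can also be done in Python rather than Notepad++. The sketch below assumes the pattern in step 1 includes the two quote characters and that the replacement is an empty string; the sample syntax line is made up, and \r is made optional so that Unix line endings match too.

```python
import re

# Made-up sample in the shape of the generated syntax: one label split
# across two lines, with "+ at the end of the first line.
syntax = 'VARIABLE LABELS Q1 "First part of the la"+\r\n"bel text".'

# Remove the closing quote, the plus sign, the line break and the
# reopening quote so the two string pieces become one literal.
joined = re.sub(r'" *\+ *\r?\n"', '', syntax)
print(joined)
```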

 

---

Mario Giesel

Munich, Germany

 

Jeff <[hidden email]> schrieb am 8:31 Freitag, 1.Juni 2018:

 

 

I agree about the desirability of having the survey software itself be able to produce a codebook version that could contain only the relevant question info. Unfortunately, it does not; instead it will only produce a very complex questionnaire replica that has skip patterns embedded and is essentially full size, for the purpose of giving it to a respondent.

 

I’ve actually written code to produce what I need a few years ago that I wrote in Visual BASIC, but when the survey software was updated, they changed the format of the database and it will no longer work.

 

Doing what I want in Python seems the best option. The concepts are relatively easy for me, but I don’t know the equivalent python code to what I used before in Visual BASIC.

 

I’m very slowly figuring it out, but it’s taking many hours since I can’t find the documentation I need to make the learning curve less steep. There are only a few things that I need to make it work. …hoping that Jon or someone else can get me moving.

 

Best,

 

Jeff

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of John F Hall
Sent: Friday, 1 June 2018 3:42 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

Once you've sorted the var lab syntax out, my preference for the Data Editor would be to put the truncated labels in the Labels column and the question routeing info into the new Custom Attribute column, then hide the column.  That would make the Data Editor easier to navigate.

 

What would make life much easier for secondary analysis and/or teaching would be software that can take the complex CAPI/BLAISE and extract a version of the questionnaire containing only the question number (if present) and text of the actual question (possibly with the variable names added).  Users could then work with both questionnaire and Data Editor open.

 

John F Hall  MA (Cantab) Dip Ed (Dunelm)

[Retired academic survey researcher]

 

Email:          [hidden email]

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff
Sent: 31 May 2018 22:14
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

 

You have the general idea Jon,

 

I’ve got a few things to work, but still struggling with learning a few things in python that I can’t find documentation for on-line.

 

What seems like a good strategy:

 

1)      Permit the 3rd-party-generated syntax to run that places overly-long labels into the dataset with about 200 variables (most of which have the long labels).

2)      Truncate the labels using a python program – Yes, please send me the one you have.

3)      Place the newly-truncated labels into a custom attribute called “Question” for each variable in the dataset.

 

1 is no problem

2 – I’ve got started below, but it will only work for the first variable label for some reason.

3 – haven’t figured this one out yet, but am coming close

 

Thanks in advance.

 

Jeff

 

 

********** Trial Code – not working yet *********.

begin program.
import spss, spssaux
vardict=spssaux.VariableDict()
for v in vardict:
  EndLocation = v.VariableLabel.find("#showMoreInfo")
  if EndLocation > 0:
     v.VariableLabel = v.VariableLabel[:EndLocation]
     print EndLocation
end program.

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Friday, 1 June 2018 6:00 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

As several have suggested, there are several ways to edit these labels down, including editing with Statistics, using a smart editor that recognizes regular expressions such as Notepad++, and a simple Python program, which I can provide if needed.

 

But I  have not seen anyone address the issue of using a variable label vs using a custom attribute.  Custom attributes are great for recording such things as question text in a survey, and these can be displayed by the CODEBOOK procedure and in the Data Editor.  They can also be accessed by Python code, and creation of variable and file custom attributes is supported in syntax.  They don't have the length limitation of variable labels either.

 

So I suppose the goal is to move the question text to such an attribute but perhaps to do something else (besides truncation) for variable labels as the question text may be unwieldy due to length.  So, there may be other metadata such as interviewer instructions that also should get this treatment.

 

 

 

On Wed, May 30, 2018 at 9:44 PM, Jeff <[hidden email]> wrote:

 

I have an SPSS syntax file that was generated automatically from 3rd party survey software. It’s placing a few hundred variable labels into an spss syntax file that when run will create an spss dataset, but has a few problems that I hope to fix with python code (but I’m new to python) or whatever will work.

 

The faulty syntax file has labels that exceed a 256 character maximum because they contain unnecessary information that comes after important information (a survey question). I’m hoping to extract the good info and place it into a custom attribute called “Question”. The original syntax looks like what I’ve listed below:

 

VARIABLE LABELS AttendSchool "Are you currently ENROLLED IN A UNIVERSITY OR ANY OTHER FORM OF HIGHER EDUCATION or were you enrolled within the past year?#showMoreInfo {display: none;}.buttonTest {    background-color: #4CAF50;    border: none;    color: whit"+

"e;    padding: 4px 8px;    text-align: center;    text-decoration: none;    display: inline-block;    font-size: 10pxpx;    margin: 4px 2px;    cursor: pointer;           }If you are enrolled part-time, you should answer, ""Yes."+

""".

 

I’m unsure whether it’s best (or even possible) to use Python to access and alter a syntax file, or just run the faulty Variable Labels commands, get a few hundred warnings that the labels will be truncated when the datafile is created, and then use python to alter the variable labels within the dataset.

 

 

I’m trying to do something like this:

 

For all instances of the phrase “VARIABLE LABELS” (or for all Variable labels in the dataset) :

  CustomAttribute “Question” = Substring(First-Instance-of-“  to   First-Instance-of-#)

 

 

Can someone point me in the right direction?

 

Thanks in advance,


Jeff

 


Re: Modifying syntax or Variable Labels with Python (getting close)

Jon Peck
The ValueLabels and ValueLabelsTyped properties are dictionaries of the labels.  The cmp function I posted expects just the keys.  If you pass the dictionary's items instead (for example, valueLabels.items()), the function will receive (key, value) pairs.  Here is a version of cmp that works with pairs but sorts by the keys.

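def cmp(x, y):
    """sort collation function that compares strings numerically
    when possible
    """
    try:
        # x and y are (key, label) pairs; compare the keys as numbers
        xx, yy = float(x[0]), float(y[0])
    except (TypeError, ValueError):
        # keys that are not numeric are compared as they are
        xx, yy = x[0], y[0]
    return xx < yy and -1 or 1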

If you use ValueLabelsTyped and the variable is numeric, the keys are returned as float values (with a decimal), because the values do not have to be integers.  If you are processing a numeric, integer-valued variable, you can convert the keys to integers and sort with code like this.  It converts the keys to integers as the values are read, so the sort will produce the value labels sorted numerically by the key (the value being labelled).

begin program.
import spssaux

vardict = spssaux.VariableDict()
vl =  sorted([(int(v[0]), v[1]) for v in vardict['jobcat'].ValueLabelsTyped.items()])
print vl
end program.
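
The same conversion can be applied inside a printing loop. In the loop that returned an error, each lbl is a float key rather than a string, so string operations such as slicing cannot be applied to it. The sketch below is one way to drop the ".0" only for whole-number keys; the helper name format_value and the sample dictionary are made up, and it runs outside SPSS.

```python
def format_value(key):
    """Show whole-number floats without the trailing .0."""
    if isinstance(key, float) and key.is_integer():
        return str(int(key))
    return str(key)

# Hypothetical stand-in for v.ValueLabelsTyped of a numeric variable.
value_labels = {1.0: "Answer One", 2.0: "Answer Two",
                10.0: "Answer Ten", 11.0: "Answer Eleven"}

lines = ["  %s  %s" % (format_value(key), value_labels[key])
         for key in sorted(value_labels)]
for line in lines:
    print(line)
```

Keys with a fractional part, and string keys from string variables, are left unchanged by format_value.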



On Sun, Jun 3, 2018 at 12:43 AM, Jeff <[hidden email]> wrote:

 

It’s very close. I should have mentioned this complexity.

 

The values and their labels actually look like this:

 

1 Answer One

2 Answer Two

3 Answer Three

10 Answer Ten

11 Answer Eleven

 

This means that the first suggestion of yours with the cmp function won’t work, but…

 

The code below using your example with ValueLabelsTyped sorts properly, however, it adds an unwanted decimal like below. Is there an easy way to remove this ?  I tried the second code block with a substring function, but that returned an error.

 

 

1.0 Answer One

2.0 Answer Two

3.0 Answer Three

10.0 Answer Ten

11.0 Answer Eleven

 

 

begin program.

import spss, spssaux

vardict = spssaux.VariableDict()

for v in vardict:

    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.Attributes['Question']

    valueLabels = v.ValueLabelsTyped                                          # <<<<<<<<<<<<<<< Using “Typed” here worked, but adds a decimal .

    if valueLabels:

       print "Answer Choices:"

       for lbl in sorted(valueLabels): print " ", lbl," ", valueLabels[lbl]

end program.

 

 

********************* This returns an error ***********************.

begin program.

import spss, spssaux

vardict = spssaux.VariableDict()

for v in vardict:

    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.Attributes['Question']

    valueLabels = v.ValueLabelsTyped

    if valueLabels:

       print "Answer Choices:"

       for lbl in sorted(valueLabels):

         DotLocation = valueLabels[lbl].find(".0")                     # <<<<<<<<<<<<<<<<<<< This or the line below doesn’t work

         print " ",lbl[DotLocation:]," ", valueLabels[lbl]

end program.

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Sunday, 3 June 2018 5:07 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python (getting close)

 

I should have added that if these labels are retrieved from a VariableDict object, there are two properties available.  ValueLabels returns the labels indexed by the value as a string.  ValueLabelsTyped returns the indexes according to the variable type.  See the difference between these two outputs.

 

begin program.

import spssaux

 

vardict=spssaux.VariableDict("jobcat")

print vardict['jobcat'].ValueLabels

print vardict['jobcat'].ValueLabelsTyped

end program.

 

On Sat, Jun 2, 2018 at 12:04 PM, Jon Peck <[hidden email]> wrote:

You get these results because the value labels are sorted as strings.  The Python sorted function, however, allows you to supply a custom comparison function that can deal with this.

 

def cmp(x, y):

    """sort collation function that compares strings numerically

    when possible

    """

    try:

        x, y = float(x), float(y)

    except:

        pass

    return x < y and -1 or 1

    

    

 

data = ['1', '10', '5', '0', 'zabc', 'xyz']

print sorted(data, cmp=cmp)

 

This function first tries to convert both arguments for the comparison to numbers.  If that works, then the return expression is based on a numeric comparison.  If one or both arguments cannot be converted to numbers, the result is based on a string comparison.

 

The cmp function is supposed to return -1, 0, or 1 according to whether x<y, x=y, or x>y.  I ignored the equality case, since the values in a list of value labels should all  be different.

 

On Fri, Jun 1, 2018 at 9:50 PM, Jeff <[hidden email]> wrote:

 

Thanks Jon,

 

I just about have this. The code below is working for me to print Questions (in a custom attribute) and corresponding value labels from a dataset.

 

The only small issue now is that the sorted function in the code below produces value labels sorted in the fashion below. Is there an easy way to correct this so that 10 comes after 9 etc.

 

1

10

11

2

3

4

5

Etc.

 

Best,

 

Jeff

 

 

begin program.

import spss, spssaux

vardict = spssaux.VariableDict()

for v in vardict:

    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.Attributes['Question']

    valueLabels = v.ValueLabels

    if valueLabels:

       print "Answer Choices:"

       for lbl in sorted(valueLabels): print " ", lbl," ", valueLabels[lbl]                     #<<<<<<<<<<<<<<  sorted here is slightly problematic

end program.

 

 

 

From: Jon Peck <[hidden email]>
Sent: Friday, 1 June 2018 11:47 PM
To: Jeff <[hidden email]>
Cc: SPSS List <[hidden email]>
Subject: Re: [SPSSX-L] Modifying syntax or Variable Labels with Python (getting close)

 

Custom attributes are available for a variable in a VariableDict object as a dictionary named Attributes.  For example, you might write

vardict = spssaux.VariableDict()

print vardict['jobcat'].Attributes['question']

 

This assumes that the question attribute exists.  You could print the whole set of attributes for a variable like this.

print vardict['jobcat'].Attributes

That would just show {} if the variable has no attributes.

 

Note that in regular Statistics syntax, the CODEBOOK procedure can print all the metadata, including custom attributes.

 

On Fri, Jun 1, 2018 at 5:09 AM, Jeff <[hidden email]> wrote:

 

I’m getting closer to something that is starting to work for my application. (taking a few things from different suggestions and modifying)

 

The first program does the trick for the first part.

 

The second is close to do some printing – I can print the original variable label, but can find no documentation to tell me how to print a custom attribute from a variable dictionary.

 

I’ve marked the problematic line – can anyone tell me how to modify?

 

Thanks

 

Jeff

 

 

 

**** This works fine *****.

********** Truncate the Label where the #showMoreInfo HTLM code starts  and then place this into Custom Attribute "Question" **********.

Begin program.

import spss, spssaux

vardict=spssaux.VariableDict()

for v in vardict:

  EndLocation = v.VariableLabel.find("#showMoreInfo")

  if EndLocation > 0:

     v.VariableLabel = v.VariableLabel[:EndLocation]

  spssaux.CreateAttribute(v.VariableName, 'Question',v.VariableLabel) # watch the indentation here

End program.

 

 

************** This works to print the Variable Name, Question, and Answer Choices, but does so from the Label and not Custom Attribute ***************.

begin program.

import spss, spssaux

vardict = spssaux.VariableDict()

for v in vardict:

    print "\n", "Variable Name: ", v.VariableName,"\n","Question: ",v.VariableLabel     # <<<<<< How can I alter to print the custom attribute “Question” ?

    valueLabels = v.ValueLabels

    if valueLabels:

       print "Answer Choices:"

       for lbl in sorted(valueLabels): print " ", lbl," ", valueLabels[lbl]

end program.

 

 

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Mario Giesel
Sent: Friday, 1 June 2018 6:15 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

I've tested this with your example string using Notepad++ as regex machine:

1. Remove string concatenation via regular expressions
Replace
" *\+ *\r\n"
 by
""

2. Transform VARIABLE LABELS command to VARIABLE ATTRIBUTE command

a) Replace
VARIABLE LABELS
by
VARIABLE ATTRIBUTE VARIABLES=

b) REPLACE
([^"]+)(.+)
by
\1 ATTRIBUTE = varinfo\(\2"\).

3. Read n characters from attribute varinfo into the variable label

BEGIN PROGRAM PYTHON.
import spss
syntax = ""
n = 100
for i in xrange(spss.GetVariableCount()):
    label = spss.GetVarAttributes(i,"varinfo")[0][:n]
    syntax += 'VARIABLE LABELS {} "{}".\n'.format(spss.GetVariableName(i), label)
print syntax
END PROGRAM.

 

---

Mario Giesel

Munich, Germany

 

Jeff <[hidden email]> schrieb am 8:31 Freitag, 1.Juni 2018:

 

 

I agree about the desirability of having the survey software itself be able to produce a codebook version that could contain only the relevant question info. Unfortunately, it does not and instead will only produce a very complex questionnaire replica that has skip patterns imbedded and produces something that’s essentially full size for the purposes of giving it to a respondent.

 

I’ve actually written code to produce what I need a few years ago that I wrote in Visual BASIC, but when the survey software was updated, they changed the format of the database and it will no longer work.

 

Doing what I want in Python seems the best option. The concepts are relatively easy for me, but I don’t know the equivalent python code to what I used before in Visual BASIC.

 

I’m very slowly figuring it out, but it’s taking many hours since I can’t find the documentation I need to make the learning curve less steep. There are only a few things that I need to make it work. …hoping that Jon or someone else can get me moving.

 

Best,

 

Jeff

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of John F Hall
Sent: Friday, 1 June 2018 3:42 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

Once you've sorted the var lab syntax out, my preference for the Data Editor would be to put the truncated labels in the Labels column and the question routeing info into the new Custom Attribute column, then hide the column.  That would make the Data Editor easier to navigate.

 

What would make life much easier for secondary analysis and/or teaching would be software that can take the complex CAPI/BLAISE and extract a version of the questionnaire containing only the question number (if present) and text of the actual question (possibly with the variable names added).  Users could then work with both questionnaire and Data Editor open.

 

John F Hall  MA (Cantab) Dip Ed (Dunelm)

[Retired academic survey researcher]

 

Email:          [hidden email]

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff
Sent: 31 May 2018 22:14
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

 

You have the general idea Jon,

 

I’ve got a few things to work, but still struggling with learning a few things in python that I can’t find documentation for on-line.

 

What seems like a good strategy:

 

1)      Permit the 3rd-party-generated syntax to run that places overly-long labels into the dataset with about 200 variables (most of which have the long labels).

2)      Truncate the labels using a python program – Yes, please send me the one you have.

3)      Place the newly-truncated labels into a custom attribute called “Question” for each variable in the dataset.

 

1 is no problem

2 – I’ve got started below, but it will only work for the first variable label for some reason.

3 – haven’t figured this one out yet, but am coming close

 

Thanks in advance.

 

Jeff

 

 

********** Trial Code – not working yet *********.

begin program.

import spss, spssaux

vardict=spssaux.VariableDict()

for v in vardict:

  EndLocation = v.VariableLabel.find("#showMoreInfo")

  if EndLocation > 0:

     v.VariableLabel = v.VariableLabel[:EndLocation]

     print EndLocation

end program.

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Friday, 1 June 2018 6:00 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

As several have suggested, there are several ways to edit these labels down, including editing with Statistics, using a smart editor that recognizes regular expressions such as Notepad++, and a simple Python program, which I can provide if needed.

 

But I  have not seen anyone address the issue of using a variable label vs using a custom attribute.  Custom attributes are great for recording such things as question text in a survey, and these can be displayed by the CODEBOOK procedure and in the Data Editor.  They can also be accessed by Python code, and creation of variable and file custom attributes is supported in syntax.  They don't have the length limitation of variable labels either.

 

So I suppose the goal is to move the question text to such an attribute but perhaps to do something else (besides truncation) for variable labels as the question text may be unwieldy due to length.  So, there may be other metadata such as interviewer instructions that also should get this treatment.

 

 

 

On Wed, May 30, 2018 at 9:44 PM, Jeff <[hidden email]> wrote:

 

I have an SPSS syntax file that was generated automatically from 3rd party survey software. It’s placing a few hundred variable labels into an spss syntax file that when run will create an spss dataset, but has a few problems that I hope to fix with python code (but I’m new to python) or whatever will work.

 

The faulty syntax file has labels that exceed a 256 character maximum because they contain unnecessary information that comes after important information (a survey question). I’m hoping to extract the good info and place it into a custom attribute called “Question”. The original syntax looks like what I’ve listed below:

 

VARIABLE LABELS AttendSchool "Are you currently ENROLLED IN A UNIVERSITY OR ANY OTHER FORM OF HIGHER EDUCATION or were you enrolled within the past year?#showMoreInfo {display: none;}.buttonTest {    background-color: #4CAF50;    border: none;    color: whit"+

"e;    padding: 4px 8px;    text-align: center;    text-decoration: none;    display: inline-block;    font-size: 10pxpx;    margin: 4px 2px;    cursor: pointer;           }If you are enrolled part-time, you should answer, ""Yes."+

""".

 

I’m unsure whether it’s best (or even possible) to use Python to read and alter the syntax file, or to just run the faulty VARIABLE LABELS commands, accept a few hundred warnings that the labels will be truncated when the data file is created, and then use Python to alter the variable labels within the dataset.
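For the second route (fixing the labels once they are in the dataset), the string handling might look like the sketch below. The function name split_label is hypothetical, and the marker is taken from the sample label in this message. One caveat with this route: a label that was already cut at the length limit when the data file was created may have lost the marker, or part of the question, so there is nothing left to recover from it.

```python
MARKER = "#showMoreInfo"  # marker seen in the sample label


def split_label(label, marker=MARKER):
    """Return (question, leftover) for one variable label.

    Hypothetical helper for illustration only.
    """
    pos = label.find(marker)
    if pos == -1:
        # find() returns -1 when the marker is absent: keep the whole label.
        return label.strip(), ""
    return label[:pos].strip(), label[pos:]


question, leftover = split_label(
    "Are you enrolled?#showMoreInfo {display: none;}")
```

Testing for -1 rather than for a positive position also covers the case where the marker sits at the very start of a label.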

 

 

I’m trying to do something like this:

 

For all instances of the phrase “VARIABLE LABELS” (or for all Variable labels in the dataset) :

  CustomAttribute “Question” = Substring(First-Instance-of-“  to   First-Instance-of-#)
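The idea above (take the text from the first quote up to the first #) can be sketched against the text of the syntax file itself, before anything is truncated. This is a rough sketch under several assumptions, not a full syntax parser: each command labels one variable, as in the sample; a command ends at the first line that finishes with a period; the generator splits long literals with a quote, a plus sign and a new quote; and the question text itself contains no #. The name extract_questions is made up for illustration.

```python
import re

# Start of a command: VARIABLE LABELS, the variable name, the opening quote.
CMD_START = re.compile(r'^\s*VARIABLE LABELS\s+(\S+)\s+"',
                       re.IGNORECASE | re.MULTILINE)
# Seam where a long literal was split: closing quote, plus sign, opening quote.
SEAM = re.compile(r'"\s*\+\s*"')
# Command terminator: a period at the end of a line.
CMD_END = re.compile(r'\.[ \t]*$', re.MULTILINE)


def extract_questions(syntax_text):
    """Map variable name -> label text found before the first '#'."""
    questions = {}
    for match in CMD_START.finditer(syntax_text):
        rest = syntax_text[match.end():]
        end = CMD_END.search(rest)
        body = rest[:end.start()] if end else rest
        hash_pos = body.find("#")
        if hash_pos == -1:
            continue  # no '#' in this label: leave it for manual review
        text = SEAM.sub("", body[:hash_pos])  # rejoin split literal pieces
        text = text.replace('""', '"')        # undo doubled quotes
        questions[match.group(1)] = text.strip()
    return questions
```

The result is a plain dictionary, so the same loop that reads it could go on to create the "Question" attribute for each variable name.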

 

 

Can someone point me in the right direction?

 

Thanks in advance,


Jeff

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD



 

--

Jon K Peck
[hidden email]
