Modifying syntax or Variable Labels with Python


Modifying syntax or Variable Labels with Python

Jeff6610

 

I have an SPSS syntax file that was generated automatically by third-party survey software. It writes a few hundred variable labels into an SPSS syntax file that, when run, creates an SPSS dataset, but the file has a few problems that I hope to fix with Python code (though I’m new to Python) or whatever will work.

The faulty syntax file has labels that exceed the 256-character maximum because they contain unnecessary information that comes after the important information (a survey question). I’m hoping to extract the good information and place it into a custom attribute called “Question”. The original syntax looks like this:

 

VARIABLE LABELS AttendSchool "Are you currently ENROLLED IN A UNIVERSITY OR ANY OTHER FORM OF HIGHER EDUCATION or were you enrolled within the past year?#showMoreInfo {display: none;}.buttonTest {    background-color: #4CAF50;    border: none;    color: whit"+
"e;    padding: 4px 8px;    text-align: center;    text-decoration: none;    display: inline-block;    font-size: 10pxpx;    margin: 4px 2px;    cursor: pointer;           }If you are enrolled part-time, you should answer, ""Yes."+
""".

 

I’m unsure whether it’s best (or even possible) to use Python to access and alter a syntax file, or to just run the faulty VARIABLE LABELS commands, get a few hundred warnings that the labels will be truncated when the data file is created, and then use Python to alter the variable labels within the dataset.

I’m trying to do something like this:

For all instances of the phrase “VARIABLE LABELS” (or for all variable labels in the dataset):

  CustomAttribute “Question” = Substring(First-Instance-of-“  to   First-Instance-of-#)
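A minimal Python sketch of that substring rule, assuming each label arrives as a single string (for example, one VARIABLE LABELS line, or a label read back from the dataset). The helper name `extract_question` is hypothetical. It returns the text after the first double quote up to the first `#` that follows it, and falls back to the closing quote (or the end of the string) when there is no `#`:

```python
def extract_question(text):
    """Return the text between the first '"' and the first '#' after it.

    Hypothetical helper. With no opening quote it starts at the beginning;
    with no '#' it stops at the last quote, or at the end of the string.
    """
    start = text.find('"')
    start = 0 if start == -1 else start + 1   # skip past the opening quote
    end = text.find('#', start)               # first '#' after the quote
    if end == -1:
        end = text.rfind('"')                 # no '#': stop at the closing quote
        if end < start:
            end = len(text)                   # no closing quote either
    return text[start:end].strip()


line = ('VARIABLE LABELS AttendSchool "Are you currently ENROLLED IN A UNIVERSITY '
        'OR ANY OTHER FORM OF HIGHER EDUCATION or were you enrolled within the '
        'past year?#showMoreInfo {display: none;}.buttonTest {')
question = extract_question(line)
```

A `#` inside the question text itself would end the match early; searching for a more specific marker such as `#showMoreInfo` avoids that.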

 

 

Can someone point me in the right direction?

 

Thanks in advance,


Jeff

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Modifying syntax or Variable Labels with Python

John F Hall

I'm sure JKP or one of the coding gurus will help with specifics. If the syntax results in variable labels of up to 256 characters, I would tend to go with that. If you click on Labels in the Data Editor (DE) to highlight the whole column, you can use CTRL+C and CTRL+V to copy the labels into a Word or Notepad file, where they are easier to edit. CAPI/BLAISE may make life easier for questionnaire completion (by personal interview or online), but it creates clutter for the (secondary) survey analyst. I find it helps to get hold of a facsimile questionnaire, but even these are complex these days, and ingenuity is often needed to generate meaningful (and reasonably short) labels.

 

John F Hall  MA (Cantab) Dip Ed (Dunelm)

[Retired academic survey researcher]

 

Email:          [hidden email]

Website:     Journeys in Survey Research

Course:       Survey Analysis Workshop (SPSS)

Research:   Subjective Social Indicators (Quality of Life)

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff
Sent: 31 May 2018 05:45
To: [hidden email]
Subject: Modifying syntax or Variable Labels with Python

 

 


Re: Modifying syntax or Variable Labels with Python

Jeff6610

 

Thanks John,

I had considered the option of editing in a word processor, but the survey is complex and will go through multiple stages of refinement. I’m hoping to spend time putting together some code so that I can simply regenerate these labels, and ultimately a codebook for printing, each time I change the survey questions and/or answer categories.

…and besides, it’s time I learned a bit about Python. I’ve started to read and experiment, but I’m finding next to nothing online about what I’m after.

 

Jeff

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of John F Hall
Sent: Thursday, 31 May 2018 3:19 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 


Re: Modifying syntax or Variable Labels with Python

Andy W
In reply to this post by Jeff6610
The syntax file (sps) is plain text, so importing that into python and trying
to clean up the variable label strings using regex is definitely an option.
It will be a bit of work though given your original example. It wouldn't be
too bad if junk like

#showMoreInfo {display: none;}.buttonTest {    background-color: #4CAF50;  
border: none;    color: white;    padding: 4px 8px;    text-align: center;  
text-decoration: none;    display: inline-block;    font-size: 10pxpx;  
margin: 4px 2px;    cursor: pointer;           }

were all on one line, but splitting it across multiple lines makes it a bit
more painful to deal with.
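Once each command is on one line, a regular expression can pull out the variable name and the label. A minimal sketch, assuming one variable per VARIABLE LABELS command (the command also accepts several name/label pairs separated by slashes, which this pattern does not handle) and SPSS's rule that a double quote inside a double-quoted string is written twice. `LABEL_RE` and `parse_label_command` are illustrative names:

```python
import re

# One-line command: VARIABLE LABEL(S) <name> "<label>" [.]
# Inside the label, "" stands for a literal double quote (SPSS quoting rule).
LABEL_RE = re.compile(
    r'^\s*VARIABLE\s+LABELS?\s+(\S+)\s+"((?:[^"]|"")*)"\s*\.?\s*$',
    re.IGNORECASE,
)


def parse_label_command(line):
    """Return (variable name, label text) for a one-line command, else None."""
    match = LABEL_RE.match(line)
    if match is None:
        return None
    name, raw = match.groups()
    return name, raw.replace('""', '"')   # undo the quote doubling


cmd = 'VARIABLE LABELS AttendSchool "Enrolled? If part-time, answer ""Yes.""".'
parsed = parse_label_command(cmd)
```

The two alternatives in the label group start with different characters, so the pattern scans long labels without heavy backtracking.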

The example blog post I linked to earlier took metadata from an Excel file
and applied it to SPSS variable labels. If you have external data defining
the metadata in a spreadsheet or another standardized form, that may be
easier to deal with than cleaning up the syntax file.
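If the question text can be exported to a spreadsheet, Python 3's standard `csv` module can read it without touching the generated syntax at all. A sketch assuming a hypothetical CSV export with `name` and `question` columns (the column names and the file name in the comment are assumptions):

```python
import csv
import io


def load_questions(csv_file):
    """Map variable name -> question text from a CSV with 'name' and 'question' columns."""
    reader = csv.DictReader(csv_file)
    return {row["name"].strip(): row["question"].strip() for row in reader}


# In-memory example; for a real export use something like
# open("questions.csv", newline="", encoding="utf-8-sig") instead of io.StringIO.
sample = io.StringIO(
    'name,question\n'
    'AttendSchool,"Are you currently enrolled, or were you enrolled within the past year?"\n'
)
questions = load_questions(sample)
```

The resulting dictionary could then drive whatever VARIABLE LABELS or VARIABLE ATTRIBUTE syntax is generated.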



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Modifying syntax or Variable Labels with Python

Art Kendall
In reply to this post by Jeff6610
Does the string "VARIABLE LABELS" occur only once or is it on every line?

You can think of the syntax file as data and read it in as FREE or FIXED,
etc.
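If your data is in fact one line, something like this untested syntax
should work.
...
string MyNewVar(a256).
compute FirstQuote = char.index(MyInVar,'"').
compute FirstPound = char.index(MyInVar,'#').
* CHAR.SUBSTR takes a length, not an end position, as its third argument.
compute MyNewVar = char.substr(MyInVar,FirstQuote,FirstPound - FirstQuote).
* RTRIM drops the blank padding so the closing quote follows the text.
compute MyNewVar = concat(rtrim(MyNewVar),'"').
...
alter type MyNewVar (a=amin).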





-----
Art Kendall
Social Research Consultants

Re: Modifying syntax or Variable Labels with Python

Rich Ulrich
In reply to this post by Andy W

I agree that the multi-line format adds a lot of complexity to editing.

However, for the example shown, all the lines after the first line (with
VARIABLE LABEL) can be discarded because they contain only that HTML data.

If every line that is actually useful starts with VARIABLE LABEL, you can
adapt David's approach: rewrite those lines using SPSS, while simply
ignoring every line that does not start with those words.

Instead of editing my main syntax in a long-term project, I would put the
VARIABLE LABELS into a syntax file that gets INCLUDEd, or whatever, so
that revisions of labels never touch the main processing syntax at all.
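The filter-and-rewrite idea can also be sketched in Python 3. This assumes every useful command starts on a line beginning with VARIABLE LABEL and that the question ends at the first `#` after the opening quote; continuation lines are simply ignored, as suggested. The function and file names are hypothetical. The output file could then be pulled into the main job with INCLUDE or INSERT, so the main syntax never changes:

```python
def rewrite_label_lines(lines):
    """Keep only VARIABLE LABEL(S) lines, each cut at the first '#' and re-closed."""
    out = []
    for line in lines:
        text = line.rstrip()
        if not text.lstrip().upper().startswith("VARIABLE LABEL"):
            continue  # continuation fragments, blank lines, other commands
        cut = text.find("#", text.find('"') + 1)    # first '#' after the opening quote
        if cut != -1:
            text = text[:cut].rstrip() + '".'       # drop the HTML/CSS tail, close the label
        elif text.endswith("+"):
            text = text[:-1].rstrip() + "."         # continuation dropped: end the command
        out.append(text)
    return out


def write_label_file(syntax_path, out_path):
    """Write the cleaned commands to a separate syntax file for INCLUDE/INSERT."""
    with open(syntax_path, encoding="utf-8-sig") as src:
        cleaned = rewrite_label_lines(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(cleaned) + "\n")
```

For example, `write_label_file("survey.sps", "labels.sps")` followed by `INSERT FILE='labels.sps'.` in the main job; the encodings are assumptions about how the survey software saves the file.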


 --
Rich Ulrich



From: SPSSX(r) Discussion <[hidden email]> on behalf of Andy W <[hidden email]>
Sent: Thursday, May 31, 2018 7:01 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python
 

Re: Modifying syntax or Variable Labels with Python

PRogman
In reply to this post by Andy W
The syntax file could be loaded into any editor (Notepad++/MS Word), and a
Search and Replace could then change every sequence of the characters
double-quote, plus-sign, new-line, double-quote (in MS Word: "+^p") to nothing
at all, creating a single line for each VARIABLE LABELS command. The resulting
text file could then be read into SPSS as a long string (e.g. A2000) and
manipulated with ALTER TYPE ... AMIN, CHAR.SUBSTR, VARIABLE ATTRIBUTE, etc.
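The same search and replace can be done in Python with one regular expression, which also tolerates blank lines between the fragments. A sketch; `join_continuations` is a hypothetical name, and the pattern assumes each fragment ends with a closing quote immediately followed by `+`, as in the original example:

```python
import re

# Closing quote, plus sign, line break (plus any blank lines), opening quote:
# the sequence that "+^p" matches in Word, removed so each label is one string.
CONTINUATION = re.compile(r'"\+[ \t]*\r?\n\s*"')


def join_continuations(syntax_text):
    """Collapse SPSS string continuations ("..."+ newline "...") onto one line."""
    return CONTINUATION.sub("", syntax_text)


sample = 'VARIABLE LABELS v1 "abc"+\n\n"def"+\n"ghi".\n'
joined = join_continuations(sample)
```

Because only the closing quote, the `+`, the line break, and the next opening quote are removed, doubled quotes such as `""Yes.""` inside a label survive the join unchanged.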

/PRogman



Re: Modifying syntax or Variable Labels with Python

Jon Peck
In reply to this post by Jeff6610
As others have suggested, there are several ways to edit these labels down, including editing with Statistics, using a smart editor that recognizes regular expressions such as Notepad++, and a simple Python program, which I can provide if needed.

But I have not seen anyone address the choice between using a variable label and using a custom attribute. Custom attributes are great for recording such things as question text in a survey, and they can be displayed by the CODEBOOK procedure and in the Data Editor. They can also be accessed by Python code, and creating variable and file custom attributes is supported in syntax. They don't have the length limitation of variable labels either.

So I suppose the goal is to move the question text to such an attribute, and perhaps to do something other than truncation for the variable labels, since the full question text may be unwieldy. There may also be other metadata, such as interviewer instructions, that should get the same treatment.
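Since custom attributes can be created in syntax, one option is to have Python write VARIABLE ATTRIBUTE commands from whatever question text has been extracted. A minimal sketch: `attribute_syntax` is a hypothetical helper, the attribute name `Question` comes from the original post, and the 200-character chunk size is arbitrary. It doubles embedded double quotes, as SPSS requires inside a double-quoted string, and splits long text into `+`-joined pieces the way the survey software does, on the assumption that attribute values accept the same string continuation as other quoted strings:

```python
def attribute_syntax(questions, attr="Question", chunk=200):
    """Build VARIABLE ATTRIBUTE commands that store question text in a custom attribute.

    questions: dict mapping variable name -> full question text.
    """
    commands = []
    for name, text in questions.items():
        # Split the raw text first, then escape each piece,
        # so a doubled quote is never cut in half.
        pieces = [text[i:i + chunk] for i in range(0, len(text), chunk)] or [""]
        value = "+\n  ".join('"%s"' % p.replace('"', '""') for p in pieces)
        commands.append("VARIABLE ATTRIBUTE VARIABLES=%s ATTRIBUTE=%s(%s)." % (name, attr, value))
    return "\n".join(commands)
```

Inside BEGIN PROGRAM the result could be passed to `spss.Submit`, or it could be written to a syntax file and run like any other.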






--
Jon K Peck
[hidden email]


Re: Modifying syntax or Variable Labels with Python

Jeff6610

 

You have the general idea, Jon.

I’ve got a few things to work, but I’m still struggling to learn some things in Python that I can’t find documentation for online.

What seems like a good strategy:

  1. Let the third-party-generated syntax run, placing the overly long labels into the dataset of about 200 variables (most of which have the long labels).
  2. Truncate the labels using a Python program – yes, please send me the one you have.
  3. Place the newly truncated labels into a custom attribute called “Question” for each variable in the dataset.

1 is no problem.

2 – I’ve made a start below, but for some reason it only works for the first variable label.

3 – I haven’t figured this one out yet, but I’m getting close.

 

Thanks in advance.

 

Jeff

 

 

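********** Trial Code – not working yet *********.

begin program.
import spss, spssaux

vardict = spssaux.VariableDict()   # all variables in the active dataset
for v in vardict:
    # Position of the HTML/CSS junk, or -1 if it is not in the (possibly truncated) label
    EndLocation = v.VariableLabel.find("#showMoreInfo")
    if EndLocation > 0:
        v.VariableLabel = v.VariableLabel[:EndLocation]   # keep only the question text
        print(EndLocation)
end program.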
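A sketch of steps 2 and 3 together, using the `spss.Dataset` class from the Python integration plug-in, which exposes each variable's `label` and `attributes` inside a data step. It is untested here and assumes the `attributes` property accepts item assignment; `split_label` and `apply_to_active_dataset` are hypothetical names, and the cutting rule is kept in a plain function so it can be checked outside SPSS. One possible reason the trial code stops early, if it applies here: the labels were already cut to 256 characters when the generated syntax ran, so on long questions `#showMoreInfo` may be gone and `find` returns -1. In that case the full question text would have to come from the syntax file rather than from the dataset.

```python
def split_label(label, marker="#showMoreInfo"):
    """Return (text before marker, whether the marker was found)."""
    pos = label.find(marker)
    if pos == -1:
        return label.strip(), False
    return label[:pos].strip(), True


def apply_to_active_dataset(marker="#showMoreInfo", attr="Question"):
    """Call inside BEGIN PROGRAM: trim labels and copy them to a custom attribute."""
    import spss  # available only inside IBM SPSS Statistics

    spss.StartDataStep()  # may fail if transformations are pending; run EXECUTE first
    try:
        ds = spss.Dataset()  # the active dataset
        for var in ds.varlist:
            question, found = split_label(var.label, marker)
            if found:
                var.label = question
            if question:
                var.attributes[attr] = question
    finally:
        spss.EndDataStep()
```

Both functions would be pasted between BEGIN PROGRAM and END PROGRAM, followed by a call to `apply_to_active_dataset()`.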

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Friday, 1 June 2018 6:00 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 


Re: Modifying syntax or Variable Labels with Python

John F Hall

Once you've sorted out the variable label syntax, my preference for the Data Editor would be to put the truncated labels in the Labels column and the question routeing info into the new custom attribute column, then hide that column. That would make the Data Editor easier to navigate.

What would make life much easier for secondary analysis and/or teaching would be software that can take the complex CAPI/BLAISE questionnaire and extract a version containing only the question number (if present) and the text of the actual question (possibly with the variable names added). Users could then work with both the questionnaire and the Data Editor open.

 


 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff
Sent: 31 May 2018 22:14
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

 

You have the general idea Jon,

 

I’ve got a few things to work, but still struggling with learning a few things in python that I can’t find documentation for on-line.

 

What seems like a good strategy:

 

  1. Permit the 3rd-party-generated syntax to run that places overly-long labels into the dataset with about 200 variables (most of which have the long labels).
  2. Truncate the labels using a python program – Yes, please send me the one you have.
  3. Place the newly-truncated labels into a custom attribute called “Question” for each variable in the dataset.

 

1 is no problem

2 – I’ve got started below, but it will only work for the first variable label for some reason.

3 – haven’t figured this one out yet, but am coming close

 

Thanks in advance.

 

Jeff

 

 

********** Trial Code – not working yet *********.

begin program.

import spss, spssaux

vardict=spssaux.VariableDict()

for v in vardict:

  EndLocation = v.VariableLabel.find("#showMoreInfo")

  if EndLocation > 0:

     v.VariableLabel = v.VariableLabel[:EndLocation]

     print EndLocation

end program.

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Friday, 1 June 2018 6:00 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

As several have suggested, there are several ways to edit these labels down, including editing with Statistics, using a smart editor that recognizes regular expressions such as Notepad++, and a simple Python program, which I can provide if needed.

 

But I  have not seen anyone address the issue of using a variable label vs using a custom attribute.  Custom attributes are great for recording such things as question text in a survey, and these can be displayed by the CODEBOOK procedure and in the Data Editor.  They can also be accessed by Python code, and creation of variable and file custom attributes is supported in syntax.  They don't have the length limitation of variable labels either.

 

So I suppose the goal is to move the question text to such an attribute but perhaps to do something else (besides truncation) for variable labels as the question text may be unwieldy due to length.  So, there may be other metadata such as interviewer instructions that also should get this treatment.

 

 

 

--

Jon K Peck
[hidden email]


Re: Modifying syntax or Variable Labels with Python

Jeff6610

 

I agree that it would be desirable for the survey software itself to produce a codebook version containing only the relevant question information. Unfortunately, it does not; it will only produce a very complex replica of the questionnaire, with the skip patterns embedded, that is essentially the full version you would give to a respondent.

A few years ago I wrote code in Visual Basic that produced what I need, but when the survey software was updated, the vendor changed the database format and that code no longer works.

Doing this in Python seems the best option. The concepts are relatively easy for me, but I don’t know the Python equivalents of what I used in Visual Basic.

I’m figuring it out very slowly, and it’s taking many hours because I can’t find the documentation that would make the learning curve less steep. I need only a few things to make it work. …hoping that Jon or someone else can get me moving.

Best,

Jeff
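For the route where the generated syntax is run first and the labels are cleaned inside the dataset afterwards, a minimal sketch using the spss.Dataset class might look like this. It assumes, as in the earlier trial code, that the unwanted text always starts at "#showMoreInfo", and that Variable objects expose label and a dict-like attributes property as described in the Python integration reference; question_part and move_questions_to_attribute are hypothetical helper names. On this route, any label text past the 256-character limit has already been truncated when the syntax ran, so a question longer than that cannot be recovered here.

```python
# Sketch only: assumes the spss.Dataset API (StartDataStep, Dataset, varlist,
# Variable.label, Variable.attributes) and the "#showMoreInfo" marker.
MARKER = "#showMoreInfo"


def question_part(label, marker=MARKER):
    """Return the label text before marker, or None if marker is absent."""
    end = label.find(marker)
    if end < 0:
        return None
    return label[:end].strip()


def move_questions_to_attribute(attr_name="Question", marker=MARKER):
    """For every variable whose label contains marker, store the question
    text in a custom attribute and replace the label with that text."""
    import spss  # only importable inside SPSS Statistics

    spss.StartDataStep()  # raises an error if transformations are pending
    try:
        ds = spss.Dataset()  # no name given: the active dataset
        for var in ds.varlist:
            question = question_part(var.label, marker)
            if question is not None:
                var.attributes[attr_name] = question
                var.label = question
    finally:
        spss.EndDataStep()
```

To use it, paste both functions into a BEGIN PROGRAM PYTHON. ... END PROGRAM. block and call move_questions_to_attribute(). If the data step refuses to start because of pending transformations, an EXECUTE before the program block may help. The code avoids Python 3-only syntax, so it should run under either Python version.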

 

 

 

 

 


Re: Modifying syntax or Variable Labels with Python

spss.giesel@yahoo.de
I've tested this with your example string, using Notepad++ as the regex engine:

1. Remove string concatenation via regular expressions
Replace
" *\+ *\r\n"
by
""

2. Transform VARIABLE LABELS command to VARIABLE ATTRIBUTE command

a) Replace
VARIABLE LABELS
by
VARIABLE ATTRIBUTE VARIABLES=

b) Replace
([^"]+)(.+)
by
\1 ATTRIBUTE = varinfo\(\2"\).

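3. Copy the first n characters of attribute varinfo into the variable label

BEGIN PROGRAM PYTHON.
import spss
syntax = ""
n = 100  # maximum length of the new variable labels
for i in range(spss.GetVariableCount()):
    # skip variables that did not get a varinfo attribute in step 2
    if "varinfo" not in spss.GetVarAttributeNames(i):
        continue
    label = spss.GetVarAttributes(i, "varinfo")[0][:n]
    # double embedded quotes so the generated command stays valid
    label = label.replace('"', '""')
    syntax += 'VARIABLE LABELS {} "{}".\n'.format(spss.GetVariableName(i), label)
print(syntax)  # check the output, then run it with spss.Submit(syntax.splitlines())
END PROGRAM.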
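The same clean-up can also be done in Python directly on the syntax file. This avoids losing text to the 256-character label truncation. The sketch below is a hypothetical helper, rewrite_labels, not Mario's code. It joins the "..."+ continuations, reads each single-variable VARIABLE LABELS command, stores the text before #showMoreInfo in a Question custom attribute, and writes a label cut to max_label characters. It assumes that every label is one well-formed double-quoted literal with embedded quotes doubled, that a + directly after a closing quote always means concatenation, and that the marker text is fixed. The regular expressions would need adjusting if the generated file differs.

```python
import re

# A long SPSS string literal may be split as "part one"+<line break>"part two".
# This matches the closing quote, the "+", any whitespace and the reopening quote.
CONTINUATION = re.compile(r'"\s*\+\s*"')

# One single-variable VARIABLE LABELS command: name, a double-quoted label in
# which embedded quotes are doubled (""), then the terminating period.
VARLABEL = re.compile(r'VARIABLE LABELS\s+([^\s"]+)\s+"((?:[^"]|"")*)"\s*\.',
                      re.IGNORECASE)


def spss_quote(text):
    """Wrap text in double quotes, doubling any embedded quotes for SPSS."""
    return '"' + text.replace('"', '""') + '"'


def rewrite_labels(syntax, marker="#showMoreInfo", max_label=256, attr="Question"):
    """Return new syntax: a custom attribute plus a shortened label per variable."""
    joined = CONTINUATION.sub("", syntax)
    commands = []
    for m in VARLABEL.finditer(joined):
        name = m.group(1)
        label = m.group(2).replace('""', '"')         # undo SPSS quote doubling
        question = label.split(marker, 1)[0].strip()  # text before the marker
        commands.append("VARIABLE ATTRIBUTE VARIABLES=%s ATTRIBUTE=%s(%s)."
                        % (name, attr, spss_quote(question)))
        commands.append("VARIABLE LABELS %s %s."
                        % (name, spss_quote(question[:max_label].rstrip())))
    return "\n".join(commands)
```

Inside SPSS Statistics, after the generated syntax has created the variables, the result could be applied with spss.Submit(rewrite_labels(text).splitlines()), where text is the syntax file read with open(). Alternatively, the result can be written to a new .sps file and run from there. If very long attribute values are rejected, they can be split back into "..."+ pieces, the same way the survey software does it.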
 
---
Mario Giesel
Munich, Germany



Re: Modifying syntax or Variable Labels with Python

John F Hall

Here's one of the questionnaires for the NORC GSS 2008 http://gss.norc.org/documents/quex/BALLOT1XSECEnglish.pdf as downloaded to work with:

 

Stephen A. Sweet and Karen Grace-Martin 
Data Analysis with SPSS: A First Course in Applied Statistics
(4th Edition, Pearson, 2010)

 

It's a bit easier to use than Jeff's, but software to extract a "facsimile" questionnaire would still make life easier.

 

John F Hall  MA (Cantab) Dip Ed (Dunelm)

[Retired academic survey researcher]

 

Email:          [hidden email]

Website:     Journeys in Survey Research

Course:       Survey Analysis Workshop (SPSS)

Research:   Subjective Social Indicators (Quality of Life)

 


Re: Modifying syntax or Variable Labels with Python

Jon Peck
In reply to this post by Jeff6610
Although the main Python APIs are documented in the help system, the extra modules such as spssaux, spssaux2, extendedTransforms, and spssdata unfortunately are not, although some are partially covered in the Programming and Data Management book. You are expected to look at the source code for these: every class and function begins with a docstring that explains its usage, and the files may contain additional comments and examples.
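As an illustration of reading those docstrings from Python itself, here is a small sketch using the standard inspect module. summarize is a hypothetical helper name. It lists the public classes and functions defined in a module, with the first line of each docstring, and skips names the module only imports from elsewhere.

```python
import inspect


def summarize(module):
    """List (name, first docstring line) for public classes and functions
    defined in module itself, skipping names it merely imports."""
    rows = []
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if not (inspect.isclass(obj) or inspect.isfunction(obj)):
            continue
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        doc = inspect.getdoc(obj)
        rows.append((name, doc.splitlines()[0] if doc else "(no docstring)"))
    return rows
```

Inside a program block, import spssaux followed by for name, line in summarize(spssaux): print("%s - %s" % (name, line)) gives a quick overview. help(spssaux.VariableDict) prints one full docstring, and inspect.getsourcefile(spssaux) shows which file to open in the IDE.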

It really helps to have a good Python IDE for working with Python code. The Python distribution includes IDLE, but it is pretty weak. I use Wing IDE, an inexpensive commercial product, but there are a number of free IDEs, and Wing itself has a free subset version. Although I have not used it, I have heard from users that the free version of PyCharm is pretty good.

On Fri, Jun 1, 2018 at 12:31 AM, Jeff <[hidden email]> wrote:

 

I agree about the desirability of having the survey software itself be able to produce a codebook version that could contain only the relevant question info. Unfortunately, it does not and instead will only produce a very complex questionnaire replica that has skip patterns imbedded and produces something that’s essentially full size for the purposes of giving it to a respondent.

 

I’ve actually written code to produce what I need a few years ago that I wrote in Visual BASIC, but when the survey software was updated, they changed the format of the database and it will no longer work.

 

Doing what I want in Python seems the best option. The concepts are relatively easy for me, but I don’t know the equivalent python code to what I used before in Visual BASIC.

 

I’m very slowly figuring it out, but it’s taking many hours since I can’t find the documentation I need to make the learning curve less steep. There are only a few things that I need to make it work. …hoping that Jon or someone else can get me moving.

 

Best,

 

Jeff

 

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of John F Hall
Sent: Friday, 1 June 2018 3:42 PM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

Once you've sorted the var lab syntax out, my preference for the Data Editor would be to put the truncated labels in the Labels column and the question routeing info into the new Custom Attribute column, then hide the column.  That would make the Data Editor easier to navigate.

 

What would make life much easier for secondary analysis and/or teaching would be software that can take the complex CAPI/BLAISE and extract a version of the questionnaire containing only the question number (if present) and text of the actual question (possibly with the variable names added).  Users could then work with both questionnaire and Data Editor open.

 

John F Hall  MA (Cantab) Dip Ed (Dunelm)

[Retired academic survey researcher]

 

Email:          [hidden email]

Website:     Journeys in Survey Research

Course:       Survey Analysis Workshop (SPSS)

Research:   Subjective Social Indicators (Quality of Life)

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff
Sent: 31 May 2018 22:14
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

 

You have the general idea Jon,

 

I’ve got a few things to work, but still struggling with learning a few things in python that I can’t find documentation for on-line.

 

What seems like a good strategy:

 

1) Let the third-party-generated syntax run, which puts the overlong labels into the dataset of about 200 variables (most of which have long labels).

2) Truncate the labels using a Python program. Yes, please send me the one you have.

3) Place the newly truncated labels into a custom attribute called “Question” for each variable in the dataset.

 

Step 1 is no problem.

Step 2: I’ve made a start below, but for some reason it works only for the first variable label.

Step 3: I haven’t figured this one out yet, but I’m getting close.

 

Thanks in advance.

 

Jeff

 

 

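********** Trial code (not working yet) *********.
begin program.
import spss, spssaux

# Collect dictionary information for every variable in the active dataset.
vardict = spssaux.VariableDict()
for v in vardict:
    # Position where the unwanted HTML/CSS text begins; find() returns -1 if absent.
    EndLocation = v.VariableLabel.find("#showMoreInfo")
    if EndLocation > 0:
        # Keep only the question text that precedes the marker.
        v.VariableLabel = v.VariableLabel[:EndLocation]
        print(EndLocation)
end program.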
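One way to avoid changing labels one property at a time is to collect every change first and submit it as syntax in a single batch. The sketch below is plain Python and only builds the command text for steps 2 and 3. It assumes the labels are available as a `{variable name: label}` dictionary and that `#showMoreInfo` marks the end of the question; `build_syntax` and `quote_spss` are hypothetical names. It does not explain why the trial code stops after the first variable.

```python
def quote_spss(text):
    """Wrap text in double quotes, doubling any embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


def build_syntax(labels, marker="#showMoreInfo", attr="Question"):
    """Build commands that trim each label at `marker` and copy it to a custom attribute.

    labels maps variable name -> current label. Labels without the marker
    (or with it at position 0) are skipped, mirroring the trial code's test.
    """
    commands = []
    for name in sorted(labels):
        label = labels[name]
        end = label.find(marker)
        if end <= 0:
            continue
        question = label[:end].strip()
        commands.append("VARIABLE LABELS %s %s." % (name, quote_spss(question)))
        commands.append("VARIABLE ATTRIBUTE VARIABLES=%s ATTRIBUTE=%s(%s)."
                        % (name, attr, quote_spss(question)))
    return commands


labels = {
    "AttendSchool": 'Were you enrolled within the past year?#showMoreInfo {display: none;}',
    "Q2": 'Would you answer "Yes"?#showMoreInfo {display: none;}',
    "Age": "What is your age?",
}
for cmd in build_syntax(labels):
    print(cmd)
```

Inside BEGIN PROGRAM, the dictionary might be built with something like `{v.VariableName: v.VariableLabel for v in vardict}` and the commands run in one call with `spss.Submit(build_syntax(labels))`, since `spss.Submit` accepts a list of command strings; treat that wiring as an untested assumption.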

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck
Sent: Friday, 1 June 2018 6:00 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python

 

As others have suggested, there are several ways to cut these labels down: editing them within Statistics, using an editor that understands regular expressions, such as Notepad++, or writing a simple Python program, which I can provide if needed.
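If you would rather fix the syntax file before running it, the regular-expression approach can be sketched in plain Python. It assumes that labels are double-quoted with embedded quotes doubled (`""`), that long labels are continued with `"+` at the end of a line as in the posted sample, and that `#showMoreInfo` marks the end of the useful text. Cutting at the full marker rather than at the first `#` is a little safer if a question itself contains `#`. The function name `clean_label_syntax` is hypothetical.

```python
import re

# End of one string piece, the + operator, and start of the next:
# whit"+<newline>"e  ->  white
CONTINUATION = re.compile(r'"\s*\+\s*"')
# One double-quoted string; "" inside it is an escaped quote.
QUOTED = re.compile(r'"((?:[^"]|"")*)"')


def clean_label_syntax(syntax, marker="#showMoreInfo"):
    """Join continued strings, then cut every quoted string at `marker`."""
    joined = CONTINUATION.sub("", syntax)

    def trim(match):
        text = match.group(1)
        pos = text.find(marker)
        if pos >= 0:
            text = text[:pos].rstrip()  # keep only the question text
        return '"' + text + '"'

    return QUOTED.sub(trim, joined)


# Abbreviated version of the VARIABLE LABELS command from the original post.
sample = ('VARIABLE LABELS AttendSchool "Were you enrolled within the past year?'
          '#showMoreInfo {display: none;}.buttonTest {    color: whit"+\n'
          '"e;    cursor: pointer;           }If you are enrolled part-time, '
          'you should answer, ""Yes.""".\n')
print(clean_label_syntax(sample))
```

To apply it, read the .sps file into a string, pass it through `clean_label_syntax`, and write the result to a new file (checking the file's encoding first). Strings quoted with apostrophes or comments containing stray `"` characters would throw off the simple quote matching.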

 

But I have not seen anyone address the choice between using a variable label and using a custom attribute. Custom attributes are great for recording such things as the question text in a survey, and they can be displayed by the CODEBOOK procedure and in the Data Editor. They can also be accessed from Python code, and creating variable and file custom attributes is supported in syntax. They don't have the length limitation of variable labels, either.

 

So I suppose the goal is to move the question text into such an attribute, and perhaps to do something other than truncation for the variable labels, since the full question text may be unwieldy as a label. There may also be other metadata, such as interviewer instructions, that should get the same treatment.

 

 

 

On Wed, May 30, 2018 at 9:44 PM, Jeff <[hidden email]> wrote:

 

I have an SPSS syntax file that was generated automatically from 3rd party survey software. It’s placing a few hundred variable labels into an spss syntax file that when run will create an spss dataset, but has a few problems that I hope to fix with python code (but I’m new to python) or whatever will work.

 

The faulty syntax file has labels that exceed a 256 character maximum because they contain unnecessary information that comes after important information (a survey question). I’m hoping to extract the good info and place it into a custom attribute called “Question”. The original syntax looks like what I’ve listed below:

 

VARIABLE LABELS AttendSchool "Are you currently ENROLLED IN A UNIVERSITY OR ANY OTHER FORM OF HIGHER EDUCATION or were you enrolled within the past year?#showMoreInfo {display: none;}.buttonTest {    background-color: #4CAF50;    border: none;    color: whit"+

"e;    padding: 4px 8px;    text-align: center;    text-decoration: none;    display: inline-block;    font-size: 10pxpx;    margin: 4px 2px;    cursor: pointer;           }If you are enrolled part-time, you should answer, ""Yes."+

""".

 

I’m unsure whether it’s best (or even possible) to use Python to access and alter a syntax file, or just run the faulty Variable Labels commands, get a few hundred warnings that the labels will be truncated when the datafile is created, and then use python to alter the variable labels within the dataset.

 

 

I’m trying to do something like this:

 

For all instances of the phrase “VARIABLE LABELS” (or for all Variable labels in the dataset) :

  CustomAttribute “Question” = Substring(First-Instance-of-“  to   First-Instance-of-#)

 

 

Can someone point me in the right direction?

 

Thanks in advance,


Jeff

 




 

--

Jon K Peck
[hidden email]




--
Jon K Peck
[hidden email]
