I have an SPSS syntax file that was generated automatically by 3rd-party survey software. It places a few hundred variable labels into an SPSS syntax file that, when run, will create an SPSS dataset, but it has a few problems that I hope to fix with Python code (I’m new to Python) or whatever will work.

The faulty syntax file has labels that exceed the 256-character maximum because they contain unnecessary information that comes after the important information (a survey question). I’m hoping to extract the good info and place it into a custom attribute called “Question”. The original syntax looks like what I’ve listed below:

VARIABLE LABELS AttendSchool "Are you currently ENROLLED IN A UNIVERSITY OR ANY OTHER FORM OF HIGHER EDUCATION or were you enrolled within the past year?#showMoreInfo {display: none;}.buttonTest { background-color: #4CAF50; border: none; color: whit"+
"e; padding: 4px 8px; text-align: center; text-decoration: none; display: inline-block; font-size: 10pxpx; margin: 4px 2px; cursor: pointer; }If you are enrolled part-time, you should answer, ""Yes."+
""".

I’m unsure whether it’s best (or even possible) to use Python to access and alter a syntax file, or to just run the faulty VARIABLE LABELS commands, get a few hundred warnings that the labels will be truncated when the data file is created, and then use Python to alter the variable labels within the dataset.

I’m trying to do something like this:

For all instances of the phrase “VARIABLE LABELS” (or for all variable labels in the dataset):
    CustomAttribute “Question” = Substring(First-Instance-of-“ to First-Instance-of-#)

Can someone point me in the right direction?

Thanks in advance,
|
I'm sure JKP or one of the coding gurus will help with specifics. If the syntax results in variable labels with up to 256 characters, I would tend to go with that. If you click on Labels in the DE to highlight the whole column you can use CTRL+C and CTRL+V to copy the labels into a Word or notepad file where they are easier to edit.

CAPI/BLAISE may make life easier for questionnaire completion (by personal interview or on-line) but it makes for clutter for the (secondary) survey analyst. I find it helps to get hold of a facsimile questionnaire, but even these are complex these days and ingenuity is often needed to generate meaningful (and reasonably short) labels.

John F Hall MA (Cantab) Dip Ed (Dunelm)
[Retired academic survey researcher]
Email: [hidden email]
Website: Journeys in Survey Research
Course: Survey Analysis Workshop (SPSS)
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Thanks John,

I had considered the option to edit in a word processor, but the survey is complex and will go through multiple stages of refinement. I’m hoping to spend time putting together some type of code so that I can simply re-generate these labels and ultimately a codebook for printing each time I change the survey questions and/or answer categories.

…and besides, it’s time I learn a bit about python. …have started to read and experiment, but I’m finding near zero on line about what I’m after.

Jeff
|
In reply to this post by Jeff6610
The syntax file (sps) is plain text, so importing that into Python and trying to clean up the variable label strings using regex is definitely an option. It will be a bit of work though, given your original example. It wouldn't be too bad if junk like

#showMoreInfo {display: none;}.buttonTest { background-color: #4CAF50; border: none; color: white; padding: 4px 8px; text-align: center; text-decoration: none; display: inline-block; font-size: 10pxpx; margin: 4px 2px; cursor: pointer; }

was all on one line, but the splitting across multiple lines makes it a bit more painful to deal with.

The example blog post I linked to earlier took metadata from an Excel file and applied it to SPSS variable labels. If you have external data defining the metadata in a spreadsheet or some other standardized form, that may be easier to deal with than cleaning up the syntax file.

-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/
|
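A minimal Python sketch of the regex clean-up Andy W describes, assuming each label has already been joined into a single string. The pattern and the function name `strip_css` are illustrative assumptions, not anything from the survey software: the pattern simply treats a `#` or `.` selector followed by a `{...}` block as junk.

```python
import re

# Matches one CSS rule such as "#showMoreInfo {display: none;}" or
# ".buttonTest { color: white; }": a '#' or '.' selector, then a brace block.
# Assumption: the junk always has this selector-plus-braces shape.
CSS_RULE = re.compile(r"[#.][A-Za-z_][\w-]*\s*\{[^{}]*\}")


def strip_css(label):
    """Remove embedded CSS rules from a label and tidy the whitespace."""
    without_css = CSS_RULE.sub(" ", label)
    return re.sub(r"\s+", " ", without_css).strip()


# Shortened version of the label from the original post.
raw = (
    "Are you currently enrolled?"
    "#showMoreInfo {display: none;}"
    ".buttonTest { background-color: #4CAF50; border: none; }"
    'If you are enrolled part-time, you should answer, "Yes."'
)
print(strip_css(raw))
```

For the shortened sample this prints the question followed by the part-time instruction, with both CSS rules gone. Colour codes such as `#4CAF50` sit inside the braces, so they are removed along with the rule rather than being mistaken for a selector.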
In reply to this post by Jeff6610
Does the string "VARIABLE LABELS" occur only once or is it on every line?

You can think of the syntax file as data and read it in as FREE or FIXED, etc. If each label is in fact on one line, something like this untested syntax should work.

...
string MyNewVar(a256).
compute FirstQuote = char.index(MyInVar,'"').
compute FirstPound = char.index(MyInVar,'#').
* CHAR.SUBSTR takes a start position and a length, so the length is FirstPound - FirstQuote.
compute MyNewVar = char.substr(MyInVar,FirstQuote,FirstPound - FirstQuote).
* Trim the padding before appending the closing quote.
compute MyNewVar = concat(rtrim(MyNewVar),'"').
...
alter type MyNewVar (a=amin).

-----
Art Kendall
Social Research Consultants
|
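For comparison, the same "first quote to first #" slice from Art Kendall's syntax can be written in plain Python. This is only a sketch: the function name `question_text` and the fallback behaviour for lines without a `#` are assumptions added for illustration.

```python
def question_text(line):
    """Return the text between the first double quote and the first '#'."""
    start = line.find('"')
    if start == -1:
        return None  # no quoted label on this line
    end = line.find("#", start + 1)
    if end == -1:
        # No junk marker: fall back to the closing quote, if there is one.
        end = line.rfind('"')
        if end <= start:
            end = len(line)
    return line[start + 1:end].strip()


line = 'VARIABLE LABELS AttendSchool "Are you enrolled?#showMoreInfo {display: none;}"+'
print(question_text(line))
```

A question that itself contains a `#` would be cut short by this rule, so searching for a longer marker such as `#showMoreInfo` may be the safer choice for real data.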
In reply to this post by Andy W
I agree that the multi-line format is a big complexity for editing. However, for the example shown, all the lines after the first line (with VARIABLE LABEL) can be discarded because they contain only that HTML data.

If every line which is actually useful starts with VARIABLE LABEL, you can adapt David's approach: re-write those lines using SPSS, while simply ignoring every line that does not start with those words.

Instead of editing my main syntax in a long-term project, I would make VARIABLE LABELS into a syntax file that gets INCLUDEd, or whatever, so that revisions of labels never touch the main processing syntax at all.
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Andy W <[hidden email]>
Sent: Thursday, May 31, 2018 7:01 AM
To: [hidden email]
Subject: Re: Modifying syntax or Variable Labels with Python
|
In reply to this post by Andy W
The syntax file could be loaded into any editor (Notepad++, MS Word). A Search and Replace could then change every sequence of the characters double-quote plus-sign new-line double-quote (in MS Word: "+^p") to nothing at all, to create a single line for each VARIABLE LABEL.

The resulting text file could then be read into SPSS as a long string (e.g. A2000) to be manipulated with ALTER TYPE ... AMIN, CHAR.SUBSTR, VARIABLE ATTRIBUTE, etc.

/PRogman
|
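The search-and-replace PRogman describes can also be done in a few lines of Python rather than in an editor. A sketch, assuming the file contents have already been read into a string; `join_continuations` and the sample text are made up for illustration.

```python
import re

# A closing quote, '+', a line break, and the reopening quote of the next piece.
# Optional spaces or tabs around the '+' and before the reopening quote are allowed.
CONTINUATION = re.compile(r'"[ \t]*\+[ \t]*\r?\n[ \t]*"')


def join_continuations(syntax_text):
    """Merge string literals that were split across lines with a trailing +."""
    return CONTINUATION.sub("", syntax_text)


sample = (
    'VARIABLE LABELS AttendSchool "Are you enrolled? color: whit"+\n'
    '"e; padding: 4px 8px; }If you are enrolled part-time, say yes.".\n'
)
print(join_continuations(sample))
```

After this step every VARIABLE LABELS command sits on one line, so "whit" and "e;" are rejoined as "white;" and any later clean-up can work line by line.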
In reply to this post by Jeff6610
As several have suggested, there are several ways to edit these labels down, including editing with Statistics, using a smart editor that recognizes regular expressions such as Notepad++, and a simple Python program, which I can provide if needed.

But I have not seen anyone address the issue of using a variable label vs using a custom attribute. Custom attributes are great for recording such things as question text in a survey, and these can be displayed by the CODEBOOK procedure and in the Data Editor. They can also be accessed by Python code, and creation of variable and file custom attributes is supported in syntax. They don't have the length limitation of variable labels either.

So I suppose the goal is to move the question text to such an attribute but perhaps to do something else (besides truncation) for variable labels, as the question text may be unwieldy due to length. There may also be other metadata, such as interviewer instructions, that should get this treatment.

On Wed, May 30, 2018 at 9:44 PM, Jeff <[hidden email]> wrote:

--
Jon K Peck
|
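To make Jon Peck's point about custom attributes concrete, here is a small Python sketch that builds the VARIABLE ATTRIBUTE syntax for one variable. The helper names are assumptions for illustration; the attribute name "Question" comes from the original post. The sketch only builds a string, which could then be pasted into a syntax window or, inside a BEGIN PROGRAM block, passed to `spss.Submit`.

```python
def quote_spss(text):
    """Wrap text in double quotes, doubling any embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


def question_attribute_syntax(var_name, question, attr_name="Question"):
    """Build one VARIABLE ATTRIBUTE command storing the question text."""
    return "VARIABLE ATTRIBUTE VARIABLES={} ATTRIBUTE={}({}).".format(
        var_name, attr_name, quote_spss(question)
    )


print(question_attribute_syntax("AttendSchool", 'Answer "Yes" if enrolled part-time'))
```

Doubling the embedded quotes matters here because the sample question contains a quoted "Yes", which would otherwise end the string literal early.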
You have the general idea Jon. I’ve got a few things to work, but I’m still struggling with learning a few things in Python that I can’t find documentation for online.

What seems like a good strategy:

1) Permit the 3rd-party-generated syntax to run that places overly-long labels into the dataset with about 200 variables (most of which have the long labels).
2) Truncate the labels using a Python program. Yes, please send me the one you have.
3) Place the newly-truncated labels into a custom attribute called “Question” for each variable in the dataset.

1: no problem.
2: I’ve got started below, but it will only work for the first variable label for some reason.
3: haven’t figured this one out yet, but am coming close.

Thanks in advance.
Jeff

********** Trial Code - not working yet *********.
begin program.
import spss, spssaux

vardict = spssaux.VariableDict()
for v in vardict:
    # find() returns -1 when the marker is not in the label
    EndLocation = v.VariableLabel.find("#showMoreInfo")
    if EndLocation > 0:
        # keep only the text before the marker
        v.VariableLabel = v.VariableLabel[:EndLocation]
    print(EndLocation)
end program.
|
Once you've sorted the var lab syntax out, my preference for the Data Editor would be to put the truncated labels in the Labels column and the question routeing info into the new Custom Attribute column, then hide the column. That would make the Data Editor easier to navigate.

What would make life much easier for secondary analysis and/or teaching would be software that can take the complex CAPI/BLAISE and extract a version of the questionnaire containing only the question number (if present) and text of the actual question (possibly with the variable names added). Users could then work with both questionnaire and Data Editor open.

John F Hall
|
I agree about the desirability of having the survey software itself be able to produce a codebook version that could contain only the relevant question info. Unfortunately, it does not; it will only produce a very complex questionnaire replica that has skip patterns embedded and is essentially full size, for the purpose of giving it to a respondent.

A few years ago I actually wrote code in Visual BASIC to produce what I need, but when the survey software was updated they changed the format of the database, and it will no longer work.

Doing what I want in Python seems the best option. The concepts are relatively easy for me, but I don’t know the equivalent Python code to what I used before in Visual BASIC. I’m very slowly figuring it out, but it’s taking many hours since I can’t find the documentation I need to make the learning curve less steep. There are only a few things that I need to make it work.

…hoping that Jon or someone else can get me moving.

Best,
Jeff
|
I've tested this with your example string using Notepad++ as regex machine:

1. Remove string concatenation via regular expressions
   Replace " *\+ *\r\n" by ""
2. Transform VARIABLE LABELS command to VARIABLE ATTRIBUTE command
   a) Replace VARIABLE LABELS by VARIABLE ATTRIBUTE VARIABLES=
   b) Replace ([^"]+)(.+) by \1 ATTRIBUTE = varinfo\(\2"\).
3. Read n characters from attribute varinfo into the variable label

BEGIN PROGRAM PYTHON.
import spss

syntax = ""
n = 100  # number of characters to keep in each label
for i in range(spss.GetVariableCount()):
    # GetVarAttributes returns a tuple of values; take the first one
    label = spss.GetVarAttributes(i, "varinfo")[0][:n]
    syntax += 'VARIABLE LABELS {} "{}".\n'.format(spss.GetVariableName(i), label)
# print the generated commands for review
print(syntax)
END PROGRAM.

---
Mario Giesel
Munich, Germany

Jeff <[hidden email]> wrote at 8:31 on Friday, 1 June 2018:

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of John F Hall Sent: Friday, 1 June 2018 3:42 PM To: [hidden email] Subject: Re: Modifying syntax or Variable Labels with Python
From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff Sent: 31 May 2018 22:14 To: [hidden email] Subject: Re: Modifying syntax or Variable Labels with Python
From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jon Peck Sent: Friday, 1 June 2018 6:00 AM To: [hidden email] Subject: Re: Modifying syntax or Variable Labels with Python
|
Here's one of the questionnaires for the NORC GSS 2008 http://gss.norc.org/documents/quex/BALLOT1XSECEnglish.pdf as downloaded to work with: Stephen A. Sweet and Karen Grace-Martin

Bit easier to use than Jeff's, but software to extract a "facsimile" questionnaire would still make life easier.

John F Hall MA (Cantab) Dip Ed (Dunelm)
[Retired academic survey researcher]
Email: [hidden email]
Website: Journeys in Survey Research
Course: Survey Analysis Workshop (SPSS)

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Mario Giesel

I've tested this with your example string using Notepad++ as regex machine:

---
Mario Giesel
Munich, Germany

Jeff <[hidden email]> wrote at 8:31 on Friday, 1 June 2018:

I agree that it would be desirable for the survey software itself to produce a codebook version containing only the relevant question info. Unfortunately, it does not. It will only produce a very complex questionnaire replica that has skip patterns embedded and is essentially full size, for the purpose of giving it to a respondent.

A few years ago I actually wrote code in Visual BASIC to produce what I need, but when the survey software was updated, they changed the format of the database and it will no longer work.

Doing what I want in Python seems the best option. The concepts are relatively easy for me, but I don’t know the Python equivalent of the code I used before in Visual BASIC. I’m very slowly figuring it out, but it’s taking many hours since I can’t find the documentation I need to make the learning curve less steep. There are only a few things that I need to make it work.

…hoping that Jon or someone else can get me moving.
Best,
Jeff

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of John F Hall

Once you've sorted the var lab syntax out, my preference for the Data Editor would be to put the truncated labels in the Labels column and the question routeing info into the new Custom Attribute column, then hide the column. That would make the Data Editor easier to navigate.

What would make life much easier for secondary analysis and/or teaching would be software that can take the complex CAPI/BLAISE and extract a version of the questionnaire containing only the question number (if present) and text of the actual question (possibly with the variable names added). Users could then work with both questionnaire and Data Editor open.

John F Hall MA (Cantab) Dip Ed (Dunelm)
[Retired academic survey researcher]
Email: [hidden email]
Website: Journeys in Survey Research
Course: Survey Analysis Workshop (SPSS)

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff

You have the general idea Jon,

I’ve got a few things to work, but still struggling with learning a few things in python that I can’t find documentation for on-line.

What seems like a good strategy:

1) Permit the 3rd-party-generated syntax to run that places overly-long labels into the dataset with about 200 variables (most of which have the long labels).
2) Truncate the labels using a python program – Yes, please send me the one you have.
3) Place the newly-truncated labels into a custom attribute called “Question” for each variable in the dataset.

1 is no problem
2 – I’ve got started below, but it will only work for the first variable label for some reason.
3 – haven’t figured this one out yet, but am coming close

Thanks in advance.
Jeff

********** Trial Code – not working yet *********.
begin program.
import spss, spssaux
vardict = spssaux.VariableDict()
for v in vardict:
    EndLocation = v.VariableLabel.find("#showMoreInfo")
    if EndLocation > 0:
        v.VariableLabel = v.VariableLabel[:EndLocation]
    print EndLocation
end program.
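Steps 2 and 3 of the strategy above could be sketched roughly as follows. This is only an illustration, not the program Jon offered: the helper names `question_text`, `quote` and `fix_commands` are made up for this example, and it assumes that every unwanted tail begins with the literal text "#showMoreInfo", as in the AttendSchool label. The helpers only build command strings, so they can be tried outside Statistics first.

```python
MARKER = "#showMoreInfo"  # assumption: the CSS debris always starts with this text


def question_text(label, marker=MARKER):
    """Return the part of the label before the marker (whole label if absent)."""
    end = label.find(marker)
    # find() returns -1 when the marker is missing
    return label if end < 0 else label[:end].rstrip()


def quote(text):
    """Wrap text in double quotes for SPSS syntax, doubling embedded quotes."""
    return '"' + text.replace('"', '""') + '"'


def fix_commands(name, label, max_label=256):
    """Build two commands: store the question as a custom attribute,
    then reset the variable label to the cleaned (and capped) text."""
    question = question_text(label)
    return [
        "VARIABLE ATTRIBUTE VARIABLES=%s ATTRIBUTE=Question(%s)."
        % (name, quote(question)),
        "VARIABLE LABELS %s %s." % (name, quote(question[:max_label])),
    ]


if __name__ == "__main__":
    sample = ("Are you currently ENROLLED IN A UNIVERSITY OR ANY OTHER FORM "
              "OF HIGHER EDUCATION or were you enrolled within the past year?"
              "#showMoreInfo {display: none;}")
    for command in fix_commands("AttendSchool", sample):
        print(command)

# Inside a BEGIN PROGRAM block, the same helpers might be driven like this
# (untested sketch; very long question text may need to be split across lines):
#   import spss
#   for i in range(spss.GetVariableCount()):
#       spss.Submit(fix_commands(spss.GetVariableName(i),
#                                spss.GetVariableLabel(i)))
```

Building the syntax as strings keeps the text handling separate from the Statistics session, so the output can be inspected, or pasted into a syntax window, before anything in the dataset is changed.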
|
In reply to this post by Jeff6610
Although the main Python APIs are documented in the help system, unfortunately the extra modules such as spssaux, spssaux2, extendedTransforms, and spssdata are not, although some are partially covered in the Programming and Data Management book. It is expected that you actually look at the source code for these, as every class and function begins with a docstring that explains the usage, and there may be additional comments and examples in those files.

It really helps to have a good Python IDE to work with Python code. The Python distribution includes IDLE, but that is pretty weak. I use Wing IDE, which is an inexpensive commercial product, but there are a number of free IDEs and even Wing has a free subset version. Although I have not used it, I have heard from users that the free version of PyCharm is pretty good.

On Fri, Jun 1, 2018 at 12:31 AM, Jeff <[hidden email]> wrote: