SPSSX Discussion

Python String Substitution Problem


boschw

Python String Substitution Problem

Hi all,

I’m attempting to learn how to use Python a bit and have hit a problem I can’t solve. I am working on this mostly to learn Python, not because doing it this way is integral and necessary to successfully completing the project. So, while I’m sure there are other/better ways to do it, I would appreciate a Python-based solution.

In the dataset I am working with, responses whose intent is not readily identifiable (messy writing, markings that hit more than one check box on the same question, etc.) are given the value -5 to indicate that the response intent is unclear. -5 is never a valid response on any question; it always indicates unclear response intent. I am trying to create a simple program that, for each question, outputs a list of the IDs whose response was coded as -5. The code I am using is included below.

I am using Python to loop through all of the variables in the dataset. I need to use slightly different code for numeric and string variables (i.e. select if var eq -5 vs. select if var eq “-5”). I have figured out how to get it to run the different sets of code based on whether the variable is string or numeric. It runs correctly for numeric variables, but for some reason I always get 2 error messages on the section of the code relating to strings. Also, when I run the code outside of the python block and just insert a string variable in place of %s it runs perfectly.

The SPSS error reads “A relational operator may have two numeric operands or two character string operands…”

The Python error reads:

  File "<string>", line 19, in <module>
  File "C:\Program Files\Python2.7.1\lib\site-packages\spss210\spss\spss.py", line 1525, in Submit
    raise SpssError,error
spss.errMsg.SpssError: [errLevel 3] Serious error.

What is happening?

TIA,

Walker

CODE USED:

begin program.
import spss, spssaux
vdict=spssaux.VariableDict()
varlist=vdict.range(start="LocationCode", end="Q130")
for i in range(len(varlist)):
    spss.StartDataStep()
    myvar=varlist[i]
    datasetObj = spss.Dataset()
    varObj = datasetObj.varlist[i]
    if varObj.type == 0:
        spss.EndDataStep()
        spss.Submit("""
temporary.
select if %s eq -5.
summarize Seq# %s
/format=validlist nocasenum nototal
/title="-5 Cases for %s"
/cells=none.
""" %(myvar, myvar, myvar))
    elif varObj.type >= 1:
        spss.EndDataStep()
        spss.Submit("""
temporary.
select if %s eq '-5'.
summarize Seq# %s
/format=validlist nocasenum nototal
/title='-5 Cases for %s'
/cells=none.
""" %(myvar, myvar, myvar))
end program.
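
One possible explanation, offered as an assumption since the thread does not confirm it: in the loop above, i counts positions within the LocationCode-to-Q130 range, while datasetObj.varlist[i] counts positions from the first variable in the whole dataset. If the range does not start at the first variable, the type being tested belongs to a different variable than the one named in the command. The plain-Python sketch below imitates that lookup with an invented four-variable dataset; the names, the type codes, and the layout are all hypothetical, and no SPSS calls are made.

```python
# Hypothetical dataset layout, invented for illustration.
# Type 0 means numeric; a positive number is a string width.
dataset_vars = [("Seq#", 0), ("LocationCode", 8), ("Q1", 0), ("Q2", 0)]

names = [name for name, _ in dataset_vars]
types = dict(dataset_vars)

# Stand-in for vdict.range(start="LocationCode", end="Q2").
start, end = names.index("LocationCode"), names.index("Q2")
varlist = names[start:end + 1]

mismatches = []
for i, myvar in enumerate(varlist):
    type_by_position = dataset_vars[i][1]  # what an index-based lookup would test
    type_by_name = types[myvar]            # the type of the variable actually named
    if type_by_position != type_by_name:
        mismatches.append(myvar)

# Variables whose command would be built for the wrong type.
print(mismatches)
```

In this made-up layout the string variable LocationCode is tested with the numeric type of Seq#, which would produce a numeric comparison against a string variable.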

Jon K Peck

Re: Python String Substitution Problem

Things are a bit mixed up here. The main problem is that a VariableDict object already has all the metadata properties, so you don't want to also use the Dataset APIs. Also, the error message indicates a malformed SPSS command, so it's best to construct the command and then print it so you can see what Statistics is seeing. I would also recommend using the named substitution parameter method of constructing the command rather than the positional form you are using.

Here is an example of doing this task directly. I've left out some of the command modifiers for brevity. It assumes that the data file is already open. Feel free to ask more questions about this approach.

begin program.
import spss, spssaux

vdict = spssaux.VariableDict()
varlist = vdict.range(start="LocationCode", end="Q130")
cmdset = r"""temporary.
select if %(v)s eq %(criterion)s.
summarize id %(v)s."""

for v in varlist:
    if vdict[v].VariableType == 0:
        criterion = -5
    else:
        criterion = "'-5'"
    spss.Submit(cmdset % locals())
end program.
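
As a rough sketch of the "construct the command and then print it" advice, the template above can be filled in and inspected in plain Python before anything is passed to spss.Submit. The variable names and type codes in var_types, and the helper build_command, are hypothetical and exist only for this illustration.

```python
# Same template style as above, with named %(...)s placeholders.
cmdset = r"""temporary.
select if %(v)s eq %(criterion)s.
summarize id %(v)s."""

# Hypothetical variable names mapped to type codes (0 = numeric, >0 = string width).
var_types = {"Q1": 0, "Comment": 20}


def build_command(v, var_type):
    """Return the syntax text for one variable without submitting it."""
    criterion = "-5" if var_type == 0 else "'-5'"
    return cmdset % {"v": v, "criterion": criterion}


commands = [build_command(v, var_types[v]) for v in sorted(var_types)]
for cmd in commands:
    print(cmd)  # inspect exactly what would be sent to Statistics
    print("")
```

Printing the text first makes a mismatch between a string variable and an unquoted -5 visible before SPSS reports it.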

HTH,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621


boschw

Re: Python String Substitution Problem

A couple of follow-up questions just for my own edification:

1) It looks like the “locals()” function allows you to substitute values of variables that have been defined within the program. Is that right?

2) What is the purpose of the “r” that appears on the “spss.Submit(r”””…” line?

3) This question is largely irrelevant given the improved code I now have, but I’m curious about the problem I was having. With the code I submitted, if the variable “Seq#” was the first variable in the dataset the program would crash. If it was the last variable in the dataset the program would run correctly all the way through. Ruben confirmed this as well. Jon, do you have any idea why this would be?

Thanks again for the help. Much appreciated.

Walker

Phone: (651) 280-2679

Fax: (651) 280-3679

IMPORTANT WARNING: This message is intended for the use of the person or entity to which it is addressed and may contain information that is privileged and confidential, the disclosure of which is governed by applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this information is strictly prohibited. If you have received this message in error, please notify the sender immediately and arrange for the return or destruction of these documents


Jon K Peck

Re: Python String Substitution Problem


From: "Walker J. Bosch" <[hidden email]>
To: [hidden email],
Date: 11/30/2012 08:26 AM
Subject: Re: [SPSSX-L] Python String Substitution Problem
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Thank you very much to Jon and also to Ruben, who helped me out off-list. I am just getting started and have no programming background (aside from SPSS) as was likely evident given the code I submitted. Both provided simple, working solutions.

A couple of follow-up questions just for my own edification:
1) It looks like the “locals()” function allows you to substitute values of variables that have been defined within the program. Is that right?
>>>Yes. locals() returns a dictionary of all the local Python variables, so these can be referred to in the string substitution expressions.
2) What is the purpose of the “r” that appears on the “spss.Submit(r”””…” line?
>>>By default, the backslash is used in Python strings to indicate an escape character. So, for example, \t would be converted to a tab character. Definitely not what you want when writing path specifications. The r prevents backslash from being interpreted that way. You can also write paths with forward slashes.
3) This question is largely irrelevant given the improved code I now have, but I’m curious about the problem I was having. With the code I submitted, if the variable “Seq#” was the first variable in the dataset the program would crash. If it was the last variable in the dataset the program would run correctly all the way through. Ruben confirmed this as well. Jon, do you have any idea why this would be?
>>>Order should be irrelevant. I would generate the actual command being submitted to see what was really being run.
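
The first two answers can be tried in plain Python without SPSS. This is only a small sketch: the function fill, the variable name Q1, and the file path are made up for the illustration.

```python
def fill(template):
    v = "Q1"            # hypothetical variable name
    criterion = "'-5'"  # quoted literal, as used for a string variable
    # locals() is a dictionary of the names defined in this function,
    # so %(v)s and %(criterion)s are looked up in it.
    return template % locals()


filled = fill("select if %(v)s eq %(criterion)s.")
print(filled)

# Without the r prefix, \t and \n are turned into a tab and a newline.
plain = "C:\temp\new.sav"
# With the r prefix, the backslashes are kept as written.
raw = r"C:\temp\new.sav"

print(repr(plain))
print(repr(raw))
```

The repr output shows the tab and newline characters in the first string and the two literal backslashes in the second.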

Thanks again for the help. Much appreciated.

Walker


boschw

Re: Python String Substitution Problem


While looping through all of the variables in the range, when the routine encounters a variable that does not have any cases that meet the criteria (var = -5, or ‘-5’ for strings) it raises an SPSS Warning and a Python Error and stops looping through the remainder of the variables in the range. How can I make it continue to loop through the rest of the variables? Code used and errors raised shown below:

SPSS Warning: “No cases were input to this procedure…Execution of this command stops”.

Python Error:

Traceback (most recent call last):
  File "<string>", line 16, in <module>
  File "C:\Program Files\Python2.7.1\lib\site-packages\spss210\spss\spss.py", line 1525, in Submit
    raise SpssError,error
spss.errMsg.SpssError: [errLevel 3] Serious error.

Code used:

begin program.
import spss, spssaux

vdict = spssaux.VariableDict()
varlist = vdict.range(start="LocationCode", end="Q130")
cmdset = r"""temporary.
select if %(v)s eq %(criterion)s.
summarize Seq# %(v)s
/format=validlist nocasenum nototal
/title="-5 Cases for %(v)s"
/cells=none."""

for v in varlist:
    if vdict[v].VariableType == 0:
        criterion = -5
    else:
        criterion = "'-5'"
    spss.Submit(cmdset % locals())
end program.

Walker


Jon K Peck

Re: Python String Substitution Problem

The trouble is that the command has no data to process in that case, so you get an SPSS error. You can keep the script running by using a try/except construct. It would look like this.

for v in varlist:
    if vdict[v].VariableType == 0:
        criterion = -5
    else:
        criterion = "'-5'"
    try:
        spss.Submit(cmdset % locals())
    except:
        # Submit failed (e.g. no matching cases); go on to the next variable
        pass

You will still get the error messages in the output, but they won't stop the program.
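
For the loop to carry on past a failing variable, the try/except has to wrap the individual Submit call rather than the whole loop; otherwise the first exception ends the loop. A minimal sketch of that pattern in plain Python follows. FakeSpssError and fake_submit are hypothetical stand-ins for the real SPSS error and spss.Submit, and the variable names are invented.

```python
class FakeSpssError(Exception):
    """Stand-in for the error raised when a submitted command fails."""


def fake_submit(cmd):
    # Hypothetical stub: pretend variable Q2 has no cases coded -5.
    if "Q2" in cmd:
        raise FakeSpssError("No cases were input to this procedure")


varlist = ["Q1", "Q2", "Q3"]
processed, failed = [], []
for v in varlist:
    try:
        fake_submit("select if %s eq -5." % v)
        processed.append(v)
    except FakeSpssError:
        failed.append(v)  # record the failure and move on to the next variable

print(processed)
print(failed)
```

Collecting the failed names, instead of silently passing, also gives a list of the variables that had no -5 cases.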

HTH,


From: "Walker J. Bosch" <[hidden email]>
To: Jon K Peck/Chicago/IBM@IBMUS,
Cc: "[hidden email]" <[hidden email]>
Date: 12/10/2012 02:12 PM
Subject: RE: [SPSSX-L] Python String Substitution Problem

I’ve been having more problems with this. It worked perfectly on the test data I was using, but I now have the real dataset and I am having trouble.   Just as a refresher, I am using Python to loop through several hundred variables and for each variable identify the cases with a value of -5. Values of -5 were used to identify places where the survey scanner could not establish the respondent’s intent so we will go back to the hard copy of the survey for these cases and look ourselves. Now on to the problem.
