SPSSX Discussion

(no subject)


Smith, Benton

(no subject)

Hello All,

I often deal with SPSS data sets that come from an HTML-based Research
Survey creation program. This program exports well to SPSS 14.0 and
automatically provides Variable Names, Variable Labels, and Value
Labels. The problem is that this Survey Program reads in the HTML text
as the Variable Labels and Value Labels, with the special HTML
characters included (such as, ).

Example Variable Name: q7xg_qs1_9_q7x

Example Variable Label: Product One(remote access) :

Example Value Label: 1 'Not aware of it'

I have been cleaning the HTML characters out of the Variable Labels by
copying them into another program and using the wildcard "<*>" to
Find/Replace the HTML tags with a space, but I do not know of a
simple way to clean up the Value Labels. Some of the files are fairly
large (17,000 or more variables), and this can be a very time-consuming
clean-up process. Any suggestions would be greatly appreciated.

Thanks,

Benton Smith


Smith, Benton

Editing Existing Value Labels Across Large Data Sets

I'm sorry; I left off the Subject line on my earlier request for help.

Benton


Peck, Jon

Re: Editing Existing Value Labels Across Large Data Sets

This would be easy to automate if you can use the Python programmability. Here is some untested code.

begin program.
import spss, spssaux, re

vardict = spssaux.VariableDict()
for v in vardict:
    vlset = vardict[v].ValueLabels
    for val in vlset:
        vlset[val] = re.sub("<.*?>", "", vlset[val])
    vardict[v].ValueLabels = vlset
end program.

It
- creates a Python variable dictionary
- loops over all the variables
- gets the value labels for each variable as a Python dictionary
- strips the HTML tags out of each label
- assigns the modified labels back to the variable
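
The steps listed above can be tried without SPSS on a plain nested dictionary. This is only a sketch: the function name clean_value_labels and the sample labels are made up for illustration, and nothing here touches the spssaux API.

```python
import re

# Non-greedy pattern from the thread: matches one tag at a time.
TAG = re.compile(r"<.*?>")

def clean_value_labels(labels_by_var):
    """Return a copy of {variable: {value: label}} with HTML tags removed."""
    cleaned = {}
    for var, vlset in labels_by_var.items():
        # Build a new dict per variable instead of editing in place.
        cleaned[var] = {val: TAG.sub("", label) for val, label in vlset.items()}
    return cleaned

# Hypothetical labels; the real ones come from the exported data set.
sample = {"q1": {1: "<b>Not aware</b> of it", 2: "Aware<br/>"}}
result = clean_value_labels(sample)
print(result)
```

Building a new dictionary keeps the original labels available for comparison, which may help when checking a large file before writing anything back.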

The regular expression that gets rid of the tags, "<.*?>", has one subtlety.
The usual behavior is "greedy" matching, so with tagged text such as <b>abc</b>, the pattern "<.*>" would match the entire string, not just the first tag. With the form .*?, Python instead uses the shortest matching string.
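
The difference between the greedy and non-greedy forms can be checked directly with the re module. The sample string below is an assumption chosen for illustration, not text from the actual survey export.

```python
import re

text = "<b>abc</b>"  # hypothetical tagged label

# Greedy: ".*" runs from the first "<" to the last ">", so one match
# covers the whole string and nothing is left.
greedy = re.sub("<.*>", "", text)

# Non-greedy: ".*?" stops at the nearest ">", so each tag is matched
# on its own and the text between the tags survives.
lazy = re.sub("<.*?>", "", text)

print(repr(greedy))
print(repr(lazy))
```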

You could also do the variable labels in this code.
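
For the variable labels, the same substitution applies to a flat name-to-label mapping. The sketch below works on an ordinary dictionary and assumes a made-up label with tags in it; hooking it up to the SPSS dictionary is not shown here. It replaces each tag with a space, as in the original Find/Replace approach, and then collapses the extra whitespace so words on either side of a tag do not run together.

```python
import re

def strip_tags(label, replacement=" "):
    """Replace each HTML tag with `replacement`, then tidy whitespace."""
    no_tags = re.sub(r"<.*?>", replacement, label)
    # split() with no arguments drops leading, trailing and repeated blanks.
    return " ".join(no_tags.split())

# Variable name taken from the thread; the tagged label is hypothetical.
variable_labels = {"q7xg_qs1_9_q7x": "<b>Product One</b><br>(remote access) :"}
cleaned = {name: strip_tags(label) for name, label in variable_labels.items()}
print(cleaned)
```

Passing replacement="" instead would reproduce the behavior of the code above, where tags are simply deleted.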

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Smith, Benton
Sent: Thursday, May 10, 2007 8:48 AM
To: [hidden email]
Subject: [SPSSX-L] Editing Existing Value Labels Across Large Data Sets


Smith, Benton

Re: Editing Existing Value Labels Across Large Data Sets

Thanks Jon,
I just downloaded Python today... so it may be a while before I get far
enough up the learning curve to execute your suggestion. I'm still just
trying to get SPSS and Python to sync with one another. I will let you
know if I'm able to run the code you created.

I have also received a suggestion to use an export of the data
dictionary in the same way that I clean my Variable Labels.

I appreciate your time and recommendations,
Benton

-----Original Message-----
From: Peck, Jon [mailto:[hidden email]]
Sent: Thursday, May 10, 2007 11:43 AM
To: Smith, Benton; [hidden email]
Subject: RE: [SPSSX-L] Editing Existing Value Labels Across Large Data
Sets


vlad simion

Re: Editing Existing Value Labels Across Large Data Sets

In reply to this post by Peck, Jon

Hi Jon,

I tried to run the code you provided, but it gives me the following error:

Traceback (most recent call last):
  File "<string>", line 8, in ?
  File "C:\Python24\lib\site-packages\spssaux.py", line 654, in _ValLabSet
    spss.Submit("VALUE LABELS " + spss.GetVariableName(self.index) + " " + vllist)
  File "C:\Python24\lib\site-packages\spss\spss150\spss.py", line 772, in GetVariableName
    raise SpssError,error
spss.spss150.errMsg.SpssError: [errLevel 1000] Expects an integer argument.

Can you please take a look and see what's wrong?

Many thanks,

Vlad.


--
Vlad Simion
Data Analyst
Tel: +40 720130611

Peck, Jon

Re: Editing Existing Value Labels Across Large Data Sets

Try this slightly modified version.

begin program.
import spss, spssaux, re

vardict = spssaux.VariableDict()
for v in vardict:
    vlset = vardict[v].ValueLabels
    for val in vlset:
        vlset[val] = re.sub("<.*?>", "", vlset[val])
    vardict[v.VariableName].ValueLabels = vlset
end program.

________________________________

From: vlad simion [mailto:[hidden email]]
Sent: Friday, May 11, 2007 3:29 AM
To: Peck, Jon
Cc: [hidden email]
Subject: Re: Editing Existing Value Labels Across Large Data Sets


vlad simion

Re: Editing Existing Value Labels Across Large Data Sets

Thank you very much Jon, it's working :)

All the best,

Vlad.

On 5/11/07, Peck, Jon <[hidden email]> wrote:
