(no subject)

(no subject)

Smith, Benton
Hello All,

I often deal with SPSS data sets that come from an HTML-based Research
Survey creation program.  This program exports well to SPSS 14.0 and
automatically provides Variable Names, Variable Labels, and Value
Labels.  The problem is that this Survey Program reads in the HTML text
as the Variable Labels and Value Labels, with the special HTML
characters included (such as </b>).

 

Example Variable Name:            q7xg_qs1_9_q7x

Example Variable Label:             <b>Product One</b>(remote access) :

Example Value Label:                 1 '<b>Not aware of it</b>'

 

I have been cleaning the HTML characters out of the Variable Labels by
copying them into another program and using the wildcard "<*>" to
Find/Replace the HTML characters with a space, but I do not know of a
simple way to clean up the Value Labels.  Some of the files are fairly
large (17,000 or more variables), and this can be a very time-consuming
clean-up process.  Any suggestions would be greatly appreciated.

 

 

Thanks,

Benton Smith


 

 

Editing Existing Value Labels Across Large Data Sets

Smith, Benton
I'm sorry; I left off the Subject line on my earlier request for help.

 

Benton

 


Re: Editing Existing Value Labels Across Large Data Sets

Peck, Jon
This would be easy to automate if you can use the Python programmability.  Here is some untested code.

begin program.
import spss, spssaux, re

vardict = spssaux.VariableDict()
for v in vardict:
  vlset = vardict[v].ValueLabels
  for val in vlset:
    vlset[val] = re.sub("<.*?>", "", vlset[val])
  vardict[v].ValueLabels = vlset
end program.

It
- creates a Python variable dictionary
- loops over all the variables
- gets the value labels for each variable as a Python dictionary
- strips out HTML tags
- assigns the modified labels back to the variable

The regular expression that gets rid of the tags, "<.*?>", has one subtlety.
The usual behavior is "greedy" matching, so with <b>abc</b>, a pattern such as "<.*>" would match the entire string, not just the tag.  By using the form .*?, Python uses the shortest matching string instead.
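
To make the greedy versus non-greedy difference concrete, here is a small stand-alone sketch in plain Python (no SPSS needed). The sample string is the value label from the original post; the variable names are only for illustration.

```python
import re

label = "<b>Not aware of it</b>"

# Greedy: .* takes as much as it can, so the match runs from the
# first "<" to the last ">" and the whole label is removed.
greedy = re.sub("<.*>", "", label)

# Non-greedy: .*? stops at the first ">", so each tag is matched
# separately and the text between the tags survives.
non_greedy = re.sub("<.*?>", "", label)

print(repr(greedy))
print(repr(non_greedy))
```

The first result is an empty string, while the second keeps the text "Not aware of it".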

You could also do the variable labels in this code.
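
As a rough sketch of how the same clean-up could cover both kinds of label, the helpers below work on ordinary Python strings and dictionaries. The names clean_label and clean_value_labels are made up for this example, and nothing here talks to SPSS; how a variable label is read from and written back to the dataset is left out, so check the spssaux documentation for that part. Unlike the code above, this version replaces each tag with a space and then collapses extra whitespace, which is closer to the Find/Replace approach described in the original post.

```python
import re

# Non-greedy pattern: matches one tag at a time.
TAG = re.compile(r"<.*?>")

def clean_label(text):
    # Replace each tag with a space, then collapse runs of whitespace
    # and trim the ends.
    return " ".join(TAG.sub(" ", text).split())

def clean_value_labels(labels):
    # labels maps a value to its label text; return a cleaned copy.
    return dict((value, clean_label(text)) for value, text in labels.items())

print(clean_label("<b>Product One</b>(remote access) :"))
print(clean_value_labels({1: "<b>Not aware of it</b>"}))
```

Replacing with a space keeps "Product One" and "(remote access)" from running together; use an empty string instead if you prefer the behavior of the code above.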

HTH,
Jon Peck



Re: Editing Existing Value Labels Across Large Data Sets

Smith, Benton
Thanks Jon,
I just downloaded Python today... so it may be a while before I get up
the learning curve enough to execute your suggestion.  I'm still just
trying to get SPSS and Python to sync with one another. I will let you
know if I'm able to run the code you created.

I have also received a suggestion to use an export of the data
dictionary in the same way that I clean my Variable Labels.

I appreciate your time and recommendations,
Benton


Re: Editing Existing Value Labels Across Large Data Sets

vlad simion
In reply to this post by Peck, Jon
Hi Jon,

I tried to run the code you provided, but it gives me the following error:

Traceback (most recent call last):
  File "<string>", line 8, in ?
  File "C:\Python24\lib\site-packages\spssaux.py", line 654, in _ValLabSet
    spss.Submit("VALUE LABELS " + spss.GetVariableName(self.index) + " " + vllist)
  File "C:\Python24\lib\site-packages\spss\spss150\spss.py", line 772, in GetVariableName
    raise SpssError,error
spss.spss150.errMsg.SpssError: [errLevel 1000] Expects an integer argument.

Can you please take a look and see what's wrong?

Many thanks,

Vlad.

--
Vlad Simion
Data Analyst
Tel:      +40 720130611

Re: Editing Existing Value Labels Across Large Data Sets

Peck, Jon
Try this slightly modified version.

 

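begin program.
import spss, spssaux, re

# Build a dictionary of the variables in the active dataset.
vardict = spssaux.VariableDict()
for v in vardict:
  # Get this variable's value labels as a Python dictionary.
  vlset = vardict[v].ValueLabels
  for val in vlset:
    # Strip HTML tags; the non-greedy .*? matches one tag at a time.
    vlset[val] = re.sub("<.*?>", "", vlset[val])
  # Assign the cleaned labels back, indexing the dictionary by variable name.
  vardict[v.VariableName].ValueLabels = vlset
end program.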

 

 

 


Re: Editing Existing Value Labels Across Large Data Sets

vlad simion
Thank you very much Jon, it's working :)

All the best,

Vlad.



