strip html tags from variable/value labels

strip html tags from variable/value labels

oggesjolin
Hi there! Anyone know of functions within SPSS or a script to remove any HTML tags (keeping only text) from all variable and value labels in a sav file? Like BeautifulSoup get_text or something?

Thanks
Re: strip html tags from variable/value labels

David Marso
Administrator
Provide an example of what you are referring to.
This does not parse on a Sat AM.
-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: strip html tags from variable/value labels

Jon Peck
This can easily be done with a short Python program.  I'll post one later this morning.

On Sat, Dec 17, 2016 at 7:05 AM, David Marso <[hidden email]> wrote:
Provide an example of what you are referring to.
This does not parse on a Sat AM.
--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Re: strip html tags from variable/value labels

Jon Peck
Run this from the syntax window (or an INSERT file). In case the listserv mangles the indentation, the body of each "for" loop needs to be indented. The code removes every HTML tag, assumed to be any string enclosed in <>, from all the variable and value labels in the active dataset.

begin program.
import spss, spssaux, re

spss.StartDataStep()
ds = spss.Dataset()
for v in ds.varlist:
    v.label = re.sub(r"<.*?>", "", v.label)
    for k, lbl in v.valueLabels.data.items():
        v.valueLabels[k] = re.sub(r"<.*?>", "", lbl)
spss.EndDataStep()
end program.
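
To see what the pattern does outside SPSS, here is a minimal sketch in plain Python. The `strip_tags` helper name and the sample labels are made up for illustration; only the regular expression comes from the program above. The last sample is the one to watch: because anything between `<` and `>` is treated as a tag, a label containing literal comparison signs loses text.

```python
import re

# Same non-greedy pattern as in the program above: "<", then as few
# characters as possible, then ">".
TAG_PATTERN = re.compile(r"<.*?>")


def strip_tags(text):
    """Remove every <...> span from text (hypothetical helper name)."""
    return TAG_PATTERN.sub("", text)


# Made-up sample labels, not taken from any real dataset.
samples = [
    "<b>Overall</b> satisfaction",
    'How likely are you to <span class="x">recommend</span> us?',
    "Age <18 or >65",  # literal comparison signs, not HTML
]

for label in samples:
    print(repr(strip_tags(label)))
```

The first two labels come back as plain text, while the third is reduced to "Age 65", so labels that use < and > as ordinary characters may be worth checking before running the cleanup on a whole file.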

On Sat, Dec 17, 2016 at 7:52 AM, Jon Peck <[hidden email]> wrote:
This can easily be done with a short Python program.  I'll post one later this morning.

Re: strip html tags from variable/value labels

oggesjolin
Brilliant, many thanks Jon!
Re: strip html tags from variable/value labels

oggesjolin
In reply to this post by Jon Peck
Hi again Jon!

When running the script I get this error:

Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

Sorry for bothering you, but I would greatly appreciate help with this.

Thanks
/o
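
The message "expected string or buffer" is what Python 2's `re.sub` raises when the value passed to it is not a string. One plausible cause, which is an assumption since the thread does not confirm it, is that some variable or value has no label and the value handed to `re.sub` is `None`. A hedged sketch of a guarded version follows. The names `clean` and `strip_label_tags` are hypothetical, and the function relies only on the `label`, `valueLabels` and `valueLabels.data` attributes already used in the program posted above.

```python
import re

TAG_PATTERN = re.compile(r"<.*?>")

# Python 2 (as shipped with Statistics 22) has basestring; Python 3 does not.
try:
    STRING_TYPES = basestring
except NameError:
    STRING_TYPES = str


def clean(text):
    """Strip <...> spans from strings; return anything else unchanged."""
    if isinstance(text, STRING_TYPES):
        return TAG_PATTERN.sub("", text)
    return text


def strip_label_tags(variables):
    """Clean the label and value labels of each variable-like object.

    Non-string labels (assumed here to be None for missing labels)
    are left untouched instead of being passed to the regex.
    """
    for v in variables:
        if isinstance(v.label, STRING_TYPES):
            v.label = clean(v.label)
        # Copy the items first so assigning while looping is safe.
        for key, lbl in list(v.valueLabels.data.items()):
            if isinstance(lbl, STRING_TYPES):
                v.valueLabels[key] = clean(lbl)


# Inside SPSS this would be called between spss.StartDataStep() and
# spss.EndDataStep(), for example:
#     strip_label_tags(spss.Dataset().varlist)
```

If the error persists with a guard like this, printing the variable name and the type of the offending label just before the substitution should show which value is not a string.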



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/strip-html-tags-from-variable-value-labels-tp5733626p5733631.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
