Hi there! Does anyone know of a function within SPSS, or a script, to remove HTML tags (keeping only the text) from all variable and value labels in a .sav file? Something like BeautifulSoup's get_text?
Thanks |
Administrator
|
Provide an example of what you are referring to.
This does not parse on a Sat AM. |
This can easily be done with a short Python program. I'll post one later this morning.
On Sat, Dec 17, 2016 at 7:05 AM, David Marso <[hidden email]> wrote:
> Provide an example of what you are referring to.
|
Run this from the syntax window (or an INSERT file). In case the listserv mangles the indentation, the body of each "for" loop needs to be indented. The code will remove each html directive - assumed to be any string enclosed in <> - from all the variable and value labels in the active dataset.

begin program.
import spss, spssaux, re
spss.StartDataStep()
ds = spss.Dataset()
for v in ds.varlist:
    v.label = re.sub(r"<.*?>", "", v.label)
    for k, lbl in v.valueLabels.data.items():
        v.valueLabels[k] = re.sub(r"<.*?>", "", lbl)
spss.EndDataStep()
end program.
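To see what the pattern in the program above does and does not remove, here is a small standalone sketch that runs outside SPSS. The sample labels are invented for illustration; only the regular expression comes from the post.

```python
import re

# Same pattern as the posted program: non-greedy match of anything between < and >.
TAG_PATTERN = re.compile(r"<.*?>")

# Hypothetical sample labels, made up for this illustration.
samples = [
    "<b>Age</b> of respondent",
    "How satisfied are you?<br/>Please pick one",
    "Income &amp; savings",  # entities are not tags, so they are left alone
    "Score < 5 or > 10",     # a bare < ... > pair is treated as a tag
]

cleaned = [TAG_PATTERN.sub("", s) for s in samples]
for before, after in zip(samples, cleaned):
    print("%r -> %r" % (before, after))
```

Note that, unlike BeautifulSoup's get_text, this approach does not decode entities such as &amp; and will also eat label text that happens to sit between a literal < and >.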
|
Brilliant, many thanks Jon!
|
In reply to this post by Jon Peck
Hi again Jon!
When running the script I get the following error:

Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

Sorry for bothering you, but I would greatly appreciate help on this.
Thanks /o
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/strip-html-tags-from-variable-value-labels-tp5733626p5733631.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
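The message "expected string or buffer" is what Python 2's re.sub raises when it is handed something that is not a string. The thread does not say which label triggered it; one possibility, which is an assumption and not confirmed here, is that some variable or value label comes back as None or another non-string. A minimal sketch of a guard follows. The helper name strip_tags is hypothetical and the code runs outside SPSS.

```python
import re

try:
    string_types = basestring  # Python 2, as used by SPSS Statistics 22
except NameError:
    string_types = str         # Python 3

TAG_PATTERN = re.compile(r"<.*?>")


def strip_tags(label):
    """Hypothetical helper: remove <...> runs; return non-strings unchanged."""
    if not isinstance(label, string_types):
        return label
    return TAG_PATTERN.sub("", label)


print(strip_tags("<i>Gender</i>"))
print(strip_tags(None))
```

In the posted loops, the two re.sub calls could be replaced by strip_tags, or the assignment could be skipped entirely when the label is not a string. Whether the spss Dataset API accepts a non-string assigned back to a label is not something this sketch verifies.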