Reading HTML or extracting content from HTML

Reading HTML or extracting content from HTML

Maguin, Eugene

Here’s my problem. I’m working with a BRFSS data file from CDC, and they have standardized on SAS. They wrote the codebook to an HTML file. I need to extract the values and their value descriptors/labels from the HTML file. I can copy and paste lines from each table, but that is slow. I can simply copy a bunch of text, but then a lot of editing is required. I don’t think there are, but are there any options for using SPSS to read the HTML file?

Or, can an HTML file be turned into a text file or a formatted file that is SPSS readable? I can copy/paste into Word and convert the table to text, but the structure is lost and the contents are turned into a column format, which is no gain.

The problem is that SAS does not support value labels in the sense of the SPSS VALUE LABELS command.

 

Thanks, Gene Maguin

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Reading HTML or extracting content from HTML

Jon Peck
It depends a lot on the structure of the HTML file and exactly what you want to extract. Can you post (or send to [hidden email]) the file in question? There are lots of tools for working with HTML, but SPSS doesn't read HTML files as data or data dictionaries directly.
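As one illustration of the kind of tool meant here, below is a minimal Python sketch (standard library only) that pulls the cell text out of HTML tables row by row. It is not something posted in the thread, and the sample fragment is made up: the real codebook's layout is unknown, so the two-column "Value / Value Label" structure is purely an assumption. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the current cell with runs of whitespace collapsed.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def extract_rows(html_text):
    """Return all table rows found in html_text as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Made-up fragment for illustration; the actual codebook may look different.
sample = """
<table>
<tr><th>Value</th><th>Value Label</th></tr>
<tr><td>1</td><td>Male</td></tr>
<tr><td>2</td><td>Female</td></tr>
</table>
"""

rows = extract_rows(sample)
for row in rows:
    print("\t".join(row))  # tab-delimited text, one table row per line
```

Tab-delimited output of this kind is plain text that can be edited by hand or read back in as data, which is roughly the "formatted file" asked about in the original post.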

In reply to the post by Maguin, Eugene of Fri, May 7, 2021 at 8:47 AM.

--
Jon K Peck
[hidden email]


Re: Reading HTML or extracting content from HTML

Art Kendall
In reply to this post by Maguin, Eugene
It has been some years since I used SAS datasets. IIRC, much of the data
definition is embedded in them, and you can read SAS files into SPSS.

WordPerfect can read many HTML files. It can write Word or plain text
files. If you send me the HTML file (or a link to it), I'll see if I can
convert it.





-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/


Re: Reading HTML or extracting content from HTML

David Marso-2
In reply to this post by Maguin, Eugene
Please post a snippet of the file to the list. In all likelihood you could create an INPUT PROGRAM to cull the metadata, then WRITE an SPSS syntax file for INSERTion. Just a first stab at spitballing.
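To make the "write a syntax file, then INSERT it" idea concrete, here is a rough sketch of the generating step. Note that it swaps in Python for the INPUT PROGRAM/WRITE approach suggested above, purely for illustration. The variable name `SEX`, the value/label pairs, and the file name `labels.sps` are all hypothetical, and the values are assumed to be numeric codes (string values would need quoting as well).

```python
def spss_quote(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def value_labels_syntax(variable, pairs):
    """Build one VALUE LABELS command for a variable.

    pairs is a sequence of (value, label) tuples; values are assumed
    to be numeric codes and are written unquoted.
    """
    lines = [f"VALUE LABELS {variable}"]
    lines += [f"  {value} {spss_quote(label)}" for value, label in pairs]
    return "\n".join(lines) + "."  # the command ends with a period


# Hypothetical variable and labels, not taken from the actual codebook.
syntax = value_labels_syntax(
    "SEX", [("1", "Male"), ("2", "Female"), ("9", "Don't know")]
)
print(syntax)

# Save the generated command so it can be run from SPSS later.
with open("labels.sps", "w", encoding="utf-8") as handle:
    handle.write(syntax + "\n")
```

The resulting file could then be pulled into a job with `INSERT FILE='labels.sps'.` after the data file is open.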
