Re: Reading HTML or extracting content from HTML

Posted by Jon Peck on
URL: http://spssx-discussion.165.s1.nabble.com/Reading-HTML-or-extracting-content-from-HTML-tp5740509p5740510.html

It depends a lot on the structure of the HTML file and exactly what you want to extract. Can you post the file in question (or send it to [hidden email])? There are lots of tools for working with HTML, but SPSS doesn't read HTML files directly, either as data or as data dictionaries.
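As one example of such a tool, here is a minimal sketch in Python (SPSS Statistics can run Python code in BEGIN PROGRAM blocks, and this also runs as a standalone script). It uses only the standard-library html.parser module to collect the text of every table row. It assumes the codebook's value/label pairs sit in ordinary <tr>/<td> table markup; the class and function names are hypothetical, and nested tables or omitted </td> tags are not handled.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.rows = []      # finished rows; each row is a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so each label ends up on one line.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.rows.append(self._row)
            self._row = None


def extract_rows(html_text):
    """Return all table rows in html_text as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

For a real codebook you would pass it the file's text, e.g. extract_rows(open(path, encoding="utf-8").read()). Which columns hold the value and the label depends on how the codebook tables are laid out, which is why the file's structure matters.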

On Fri, May 7, 2021 at 8:47 AM Maguin, Eugene <[hidden email]> wrote:

Here’s my problem. I’m working with a BRFSS data file from the CDC, and they have standardized on SAS. They wrote a codebook to an HTML file, and I need to extract the values and value labels from it. I can copy and paste lines from each table, but that is slow. I can copy a large block of text at once, but then a lot of editing is required. I don’t think there are any, but are there options for using SPSS to read the HTML file?

Or, can an HTML file be turned into a text file or some other format that SPSS can read? I can copy and paste into Word and convert the table to text, but the structure is lost and the contents end up in a column format, which is no gain.
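If the table rows can be pulled out of the HTML with one of the tools Jon mentions, writing them to a delimited text file gives SPSS something it can import as text data (for example with GET DATA /TYPE=TXT). A minimal sketch using Python's standard csv module; the function name and the two-column header are assumptions, not anything taken from the CDC file.

```python
import csv
import io


def rows_to_csv(rows, header=("value", "label")):
    """Return table rows as CSV text, one row per line (hypothetical helper).

    The csv module quotes fields as needed, so labels that contain commas
    or double quotes keep their structure instead of splitting columns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

To save it, open the output file with newline="" and encoding="utf-8" and write the returned string; each codebook row then stays on its own line with value and label in separate columns.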

The problem is that SAS does not support value labels in the sense of the SPSS VALUE LABELS command.
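Once the value/label pairs are in hand, the SPSS syntax can be generated rather than typed. A minimal sketch, assuming a numeric variable and (value, label) string pairs: it emits one VALUE LABELS command, quoting each label and doubling any embedded apostrophe as SPSS string literals require. The function name, the variable name MYVAR, and the sample pairs are hypothetical; entries that are not plain numbers (a range, for instance) are skipped because a single value label cannot attach to them.

```python
import re

# A plain numeric code such as 1, 77 or -9 (assumed to be the codebook's value format).
_NUMERIC = re.compile(r"-?\d+(?:\.\d+)?")


def _quote(text):
    """Quote text as an SPSS string literal, doubling embedded apostrophes."""
    return "'" + text.replace("'", "''") + "'"


def value_labels_syntax(varname, pairs):
    """Build a VALUE LABELS command for a numeric variable (hypothetical helper).

    Returns an empty string if no pair has a plain numeric value.
    """
    lines = [f"VALUE LABELS {varname}"]
    for value, label in pairs:
        value = value.strip()
        if _NUMERIC.fullmatch(value):
            lines.append(f"  {value} {_quote(label.strip())}")
    if len(lines) == 1:
        return ""
    return "\n".join(lines) + "."  # SPSS commands end with a period
```

For example, value_labels_syntax("MYVAR", [("1", "Yes"), ("2", "No")]) returns the three lines VALUE LABELS MYVAR / 1 'Yes' / 2 'No'. with a closing period; running that text in a syntax window (or passing it to spss.Submit inside a BEGIN PROGRAM block) would apply the labels.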

 

Thanks, Gene Maguin

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


--
Jon K Peck
[hidden email]
