Reading HTML or extracting content from HTML

Posted by Maguin, Eugene on
URL: http://spssx-discussion.165.s1.nabble.com/Reading-HTML-or-extracting-content-from-HTML-tp5740509.html

Here’s my problem. I’m working with a BRFSS data file from CDC, and CDC has standardized on SAS. They wrote the codebook to an HTML file. I need to extract the values and their value descriptors/labels from that HTML file. I can copy and paste lines from each table, but that is slow. I can simply copy a large block of text, but then a lot of editing is required. I don’t think there are, but are there any options for using SPSS to read the HTML file?

Or, can an HTML file be turned into a text file or a formatted file that SPSS can read? I can copy and paste into Word and convert the table to text, but the structure is lost and the contents end up in a column format, which is no gain.
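One possible route, offered only as a minimal sketch: Python's standard-library `html.parser` can pull every table row out of the HTML file and emit it as tab-delimited text, which SPSS can then read as a delimited text file. This assumes the codebook stores values and labels as ordinary `<tr>`/`<td>` table cells with explicit closing tags; nested tables are not handled, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Assumption (hypothetical helper): cells are closed explicitly with
    </td> or </th>; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()      # convert_charrefs=True turns &amp; into &
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace/newlines.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:       # skip rows that contained no cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_tables_to_tsv(html_text):
    """Return all table rows as tab-delimited lines, one line per row."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)
```

For example, passing the contents of the saved codebook page (read with `encoding="utf-8"`, itself an assumption about the file) returns one line per table row; writing that string to a `.txt` file gives something SPSS can import as tab-delimited data.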

The underlying problem is that SAS does not support value labels in the SPSS sense of the VALUE LABELS command.
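Once the codebook rows are out of the HTML, they still have to become SPSS syntax. A minimal sketch in Python, assuming the rows for one variable are available as lists of cell strings with the numeric code in the first cell and its label in the second. Rows whose first cell is not a single number (headers, ranges such as "1 - 76", blanks) are skipped. The function name and the `GENHLTH` example are hypothetical.

```python
import re

# A single numeric code such as 1, 77, -9 or 1.5 (assumption: codes are numeric).
_NUMERIC_CODE = re.compile(r"-?\d+(?:\.\d+)?")


def value_labels_syntax(varname, rows):
    """Build an SPSS VALUE LABELS command from (code, label, ...) rows.

    Labels are single-quoted with embedded apostrophes doubled, which is how
    SPSS syntax escapes a quote inside a quoted string. Returns "" when no
    row holds a usable numeric code.
    """
    lines = [f"VALUE LABELS {varname}"]
    for row in rows:
        if len(row) < 2:
            continue
        code, label = row[0].strip(), row[1].strip()
        if not _NUMERIC_CODE.fullmatch(code):
            continue  # header, range, or blank: not a single value
        quoted = "'" + label.replace("'", "''") + "'"
        lines.append(f"  {code} {quoted}")
    if len(lines) == 1:
        return ""
    return "\n".join(lines) + "."
```

Running this once per variable and concatenating the results would give a syntax file that can be pasted into a syntax window and run against the data file.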

Thanks, Gene Maguin