I am trying to avoid re-inventing the wheel. I could cobble together a one-off macro in WordPerfect, but I thought there might already be a generalized approach these days, since one was built into FORTRAN before 1972.
"Tagged" may not be the correct term. If members know of alternative vocabulary for this that would help me search, please let me know.
These are plain text files with repeated fields, something like this:
a field delimiter
a text string tag
content delimiter
a text string content
a field delimiter
a text string tag
content delimiter
a text string content
. . .
A case delimiter.
In the resulting set of variables, any case in which a tagged field was not encountered has a blank for that variable.
In 1972, FORTRAN had a format like this, mostly used for .INI filetypes. These were a way to do things like SPSS SET commands.
For example, the US Library of Congress makes its card catalog available in "tagged" format.
https://en.wikipedia.org/wiki/MARC_standards
https://www.loc.gov/marc/bibliographic/nlr/
DIALOG and Orbit were database systems with "tagged" abstracts of journal articles, etc. The downloaded abstracts had tags for journal name, volume, issue, pages, date, author, abstract, etc.
Art Kendall
Social Research Consultants |
How is this different from a normal csv-format file?
If you need to do something fancier, it seems like an INPUT PROGRAM would be able to handle this. I wrote zillions of lines of Fortran code mostly in the 60's and 70's, but I don't remember anything built into the language for this situation. |
CSV data as text looks like this, and this is what a converted result could look like so that it can be used in SPSS, Excel, etc.
Name Gender School Major
"John","male","MIT",""
"Mary Quite Contrary","female","U of MD","English"
"Joyce","","","Math"
The same data as "tagged" might look like this:
{Name:"John";Gender:"male";School:"MIT"}
{Name:"Mary Quite Contrary";Gender:"female";School:"U of MD";Major:"English"}
{Name:"Joyce";Major:"Math"}
In a worst-case scenario, where the tags can appear in any order, John's data might look like this:
{School:"MIT";Name:"John";Gender:"male";}
or
{Gender:"male";School:"MIT";Name:"John"}
An example of the first kind of tagged data is downloaded Library of Congress catalog "cards". Cards (cases) have a fixed order to tags, but empty fields are not mentioned. If a book has no editions, there is simply no tagged field for edition.
Art Kendall
Social Research Consultants |
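For what it is worth, the brace-and-semicolon layout in the example above can be turned into CSV with a few lines of standard-library Python. This is only a minimal sketch: the exact syntax (braces around a case, `Tag:"value"` pairs, semicolons between fields) is assumed from the illustration in the post, real files will likely use different delimiters, and the names `parse_tagged` and `to_csv` are made up for this example. The sample data is adapted from the post.

```python
import csv
import io
import re

# Assumed layout: one case per pair of braces, one field per Tag:"value" pair.
RECORD_RE = re.compile(r"\{(.*?)\}", re.DOTALL)
FIELD_RE = re.compile(r'(\w+)\s*:\s*"([^"]*)"')


def parse_tagged(text):
    """Return a list of dicts, one per {...} case, in order of appearance."""
    return [dict(FIELD_RE.findall(body)) for body in RECORD_RE.findall(text)]


def to_csv(records, columns=None):
    """Return CSV text; tags a case does not mention become empty cells."""
    if columns is None:
        # Default column order: first appearance of each tag in the data.
        columns = []
        for rec in records:
            for tag in rec:
                if tag not in columns:
                    columns.append(tag)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


sample = '''
{Name:"John";Gender:"male";School:"MIT"}
{Name:"Mary Quite Contrary";Gender:"female";School:"U of MD";Major:"English"}
{Name:"Joyce";Major:"Math"}
{School:"MIT";Name:"John";Gender:"male";}
'''

records = parse_tagged(sample)
print(to_csv(records))
```

Because each case is read into a dictionary keyed by tag, the order of the tags within a case does not matter, and a missing tag simply ends up as a blank in that column.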
There is a Python library named pymarc that reads MARC21 files and has methods for working with that.
It says (https://pypi.org/project/pymarc/):
pymarc is a python library for working with bibliographic data encoded in MARC21. It provides an API for reading, writing and modifying MARC records. It was mostly designed to be an emergency eject seat, for getting your data assets out of MARC and into some kind of saner representation.
You can install it in Statistics like this:
STATS PACKAGE INSTALL PYTHON=pymarc.
A simple usage example:
    from pymarc import MARCReader
    with open('test/marc.dat', 'rb') as fh:
        reader = MARCReader(fh)
        for record in reader:
            print(record.title())
Specifics would depend on the tag set. If you have a sample MARC file and can provide some information on what you want from it, I can write a little code that would turn it into a simple csv file that could then be read into Statistics (or used elsewhere). |
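As a rough idea of what such a conversion might look like, here is a sketch that writes one CSV row per MARC record. It assumes the file really is binary MARC21, and the choice of tags is only an illustration (100 is the main personal-name entry, 245 the title statement, 250 the edition statement). The function names `records_to_rows` and `marc_to_csv` and the output file name are made up for this example.

```python
import csv


def records_to_rows(records, tags):
    """Yield one row per record: the first value of each requested tag, or ''."""
    for record in records:
        if record is None:  # pymarc yields None for a record it cannot decode
            continue
        row = []
        for tag in tags:
            fields = record.get_fields(tag)
            row.append(fields[0].value() if fields else "")
        yield row


def marc_to_csv(marc_path, csv_path, tags):
    """Read a binary MARC21 file and write one CSV row per record."""
    from pymarc import MARCReader  # third-party package, installed separately

    with open(marc_path, "rb") as fh, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(tags)  # header row: the tag numbers themselves
        writer.writerows(records_to_rows(MARCReader(fh), tags))


# Example call (file names are placeholders):
# marc_to_csv("test/marc.dat", "marc.csv", ["100", "245", "250"])
```

A record with no 250 field gets an empty cell in that column, which matches the "missing tag means blank variable" behaviour asked about. Repeated fields are reduced to their first occurrence here; keeping all of them would need a decision about how to lay them out.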
Thanks, I'll try to find out if the data is actually MARC or just something similar.
I still cannot recall what that kind of data layout was called. When used for .INI files there was only 1 record, not a set of them. A modern description of an INI file is: "An INI file is a configuration file used by Windows programs to initialize program settings. It contains sections for settings and preferences (delimited by a string in square brackets) with each section containing one or more name and value parameters." So the core parts are [name-of-variables]value
Art Kendall
Social Research Consultants |
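For the single-record INI case described above, Python's standard library already includes a reader (`configparser`). The sketch below flattens an INI text into one record with a variable per section/name pair. The section and setting names are invented for illustration and are not taken from any real SPSS or Windows file.

```python
import configparser

# Hypothetical INI content: bracketed section names, then name = value lines.
ini_text = """
[display]
width = 132
length = 59

[output]
printback = on
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Flatten to a single record: one variable per "section.name" pair.
settings = {
    f"{section}.{name}": value
    for section in parser.sections()
    for name, value in parser[section].items()
}
print(settings)
```

All values come back as strings, so any numeric conversion would have to be done afterwards.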
MARC sounds like a special instance of some XML coding.
The Wiki article might offer some leads but I did not see a simple solution there. For data in XML, I thought that there might be a simple, free program somewhere that translates from XML to spreadsheet, using var names. Okay, searching for XML language translators shows me one commercial package that allows a free test, and several links that might be free programs. -- Rich Ulrich |
The FORTRAN and MARC formats are older than XML.
Libraries have ways to go back and forth between MARC and XML.
https://guides.library.illinois.edu/c.php?g=463460&p=3168159
https://www.loc.gov/standards/marcxml/
Art Kendall
Social Research Consultants |
Art,
Okay. MARC was a predecessor, similar in using tags. Those references you cited seem to be more concerned with international character sets than differences in structure. It occurs to me that it should be SO easy to go from XML to a so-called 'normal form' for a database, that any old, pure database package might be able to read XML files. It may be 20 years since I touched one of those. But that could be another place to look for conversion for tagged data. |
As I said above, there is a Python module that reads MARC files, so it is easy to write those out in CSV format and then read them anywhere.
|
In reply to this post by Rich Ulrich
Thanks for bringing up XML.
I have not tried this, but it appears that Excel can read XML!
https://trumpexcel.com/convert-xml-to-excel/
This list is a great resource. The pointer to XML and Jon's idea about reading MARC records are both great leads.
Art Kendall
Social Research Consultants |
Art, that website doesn't have anything to do with DJT45, does it? :-O
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Art Kendall
Excel - and SPSS with the DAP - can read XML, but there is still the issue of how it gets mapped into a useful form. Just because a program can parse XML text doesn't mean that it will necessarily map it the way you want it.
XML is a bullet, but it isn't always a magic bullet. |
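To make the mapping point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The XML layout (a `people` root with `person` elements) is invented for illustration. Note that the caller has to say which element is a case and which child elements become columns; the parser cannot decide that on its own.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML: each <person> is a case, each child element a tagged field.
xml_text = """
<people>
  <person><Name>John</Name><Gender>male</Gender><School>MIT</School></person>
  <person><Name>Joyce</Name><Major>Math</Major></person>
</people>
"""


def xml_to_csv(text, row_tag, columns):
    """Map each <row_tag> element to a CSV row; absent children become ''."""
    root = ET.fromstring(text.strip())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for elem in root.iter(row_tag):
        writer.writerow([elem.findtext(col, default="") for col in columns])
    return out.getvalue()


print(xml_to_csv(xml_text, "person", ["Name", "Gender", "School", "Major"]))
```

Nested or repeated child elements would need further decisions (for example, one row per repeat, or numbered columns), which is exactly where a generic XML import can diverge from what is wanted.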