SPSSX Discussion

Has anybody already created a way to read "tagged" data into SPSS?


Art Kendall

Has anybody already created a way to read "tagged" data into SPSS?

I am trying to avoid re-inventing the wheel. I could cobble together a one-off macro in WordPerfect, but I thought there might already be a generalized approach these days, since one was built into FORTRAN before 1972.

"Tagged" may not be the correct term. If members know of alternative vocabulary for this that would help me do searches, please let me know.

These are plain text files with repeated fields something like this.
a field delimiter
a text string tag
content delimiter
a text string content

a field delimiter
a text string tag
content delimiter
a text string content
. . .
A case delimiter.

In the resulting set of variables, a variable is left blank for any case in which its tagged field was not encountered.

In 1972 FORTRAN had a format like this, mostly used for .INI-type files. These were a way to do things like SPSS SET commands.

For example,
The US Library of Congress makes its card catalog available in "tagged" format.

https://en.wikipedia.org/wiki/MARC_standards
https://www.loc.gov/marc/bibliographic/nlr/

DIALOG and Orbit were database systems with "tagged" abstracts of journal articles, etc. The downloaded abstracts had 'tags' for journal name, volume, issue, pages, date, author, abstract, and so on.

Art Kendall
Social Research Consultants

jkpeck

Re: Has anybody already created a way to read "tagged" data into SPSS?

How is this different from a normal csv-format file?

If you need to do something fancier, it seems like an INPUT PROGRAM would be able to handle this.

I wrote zillions of lines of Fortran code mostly in the 60's and 70's, but I don't remember anything built into the language for this situation.

Art Kendall

Re: Has anybody already created a way to read "tagged" data into SPSS?

CSV data as text looks like this. This could also be what the result looks like, so that it can be used in SPSS, Excel, etc.
"Name","Gender","School","Major"
"John","male","MIT",""
"Mary Quite Contrary","female","U of MD","English"
"Joyce","","","Math"

The same data as "tagged" might look like this
{Name:"John";Gender:"male";School:"MIT"}
{Name:"Mary Quite Contrary";Gender:"female";School:"U of MD";Major:"English"}
{Name:"Joyce";Major:"Math"}

In a worst case scenario John's data might look like this.
{School:"MIT";Name:"John";Gender:"male";}
or
{Gender:"male";School:"MIT";Name:"John"}

An example of the first kind of tagged data is downloaded Library of Congress catalog "cards". Cards (cases) have a fixed order of tags, but empty fields are not mentioned. If there are no editions of a book, there is simply no tagged field for edition.
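
For what it is worth, here is a minimal Python sketch of reading the brace/semicolon layout shown above into rows with a fixed set of variables. The delimiters are only the ones from my made-up examples, and the function names (parse_tagged, to_csv) are hypothetical. It assumes content never contains a double quote or a closing brace. Tags may come in any order, and a tag that is absent comes out blank.

```python
import csv
import io
import re

# One tag:"content" pair. The tag is optional so a stray untagged value can be reported.
PAIR = re.compile(r'(?:(\w+)\s*:)?\s*"([^"]*)"')


def parse_tagged(text):
    """Return one dict per {...} case, mapping tag -> content."""
    cases = []
    for body in re.findall(r'\{([^}]*)\}', text):
        case = {}
        for tag, content in PAIR.findall(body):
            if not tag:
                raise ValueError(f'untagged value "{content}" in case: {body}')
            case[tag] = content
        cases.append(case)
    return cases


def to_csv(cases, columns):
    """Write the cases as quoted CSV text; missing tags become empty strings."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(cases)
    return out.getvalue()


sample = '''{Name:"John";Gender:"male";School:"MIT"}
{School:"U of MD";Name:"Mary Quite Contrary";Major:"English";Gender:"female"}
{Name:"Joyce";Major:"Math"}'''

columns = ["Name", "Gender", "School", "Major"]
print(to_csv(parse_tagged(sample), columns))
```

The printed text is ordinary CSV, so it could be read with GET DATA or opened in Excel. A real file would need its own delimiters substituted into the two regular expressions.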

Art Kendall
Social Research Consultants

jkpeck

Re: Has anybody already created a way to read "tagged" data into SPSS?

There is a Python library named pymarc that reads MARC21 files and has methods for working with that.

It says,
https://pypi.org/project/pymarc/

pymarc is a python library for working with bibliographic data encoded in MARC21. It provides an API for reading, writing and modifying MARC records. It was mostly designed to be an emergency eject seat, for getting your data assets out of MARC and into some kind of saner representation.

You can install it in Statistics like this
STATS PACKAGE INSTALL PYTHON=pymarc.

A simple usage example:
from pymarc import MARCReader

# MARC files are binary, so open in 'rb' mode
with open('test/marc.dat', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record.title())

Specifics would depend on the tag set. If you have a sample MARC file and can provide some information on what you want from it, I can write a little code that would turn it into a simple csv file that could then be read into Statistics (or used elsewhere).
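
Until the tag set is known, here is only a rough sketch of what such a conversion might look like. The choice of tags (100, 245, 250) and the column names are assumptions for illustration, as are the function names record_to_row and marc_to_csv. It relies on the pymarc calls get_fields() and value(). A tag that does not occur in a record produces an empty cell, which is the blank-variable behavior asked for.

```python
import csv

# Hypothetical column choice: MARC tag -> CSV column name.
COLUMNS = {"100": "author", "245": "title", "250": "edition"}


def record_to_row(record, columns=COLUMNS):
    """Flatten one record; a tag that is absent yields an empty string."""
    row = {}
    for tag, name in columns.items():
        fields = record.get_fields(tag)
        # Repeated fields are joined so that each case stays on one row.
        row[name] = " | ".join(field.value() for field in fields)
    return row


def marc_to_csv(marc_path, csv_path, columns=COLUMNS):
    """Read a binary MARC file and write one CSV row per record."""
    # Imported here so record_to_row can be tried without pymarc installed.
    from pymarc import MARCReader

    with open(marc_path, "rb") as fh, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(columns.values()))
        writer.writeheader()
        for record in MARCReader(fh):
            if record is not None:  # skip records pymarc could not parse
                writer.writerow(record_to_row(record, columns))


# Example call (file names are placeholders):
# marc_to_csv("test/marc.dat", "marc.csv")
```

Joining repeated fields with " | " is just one possible choice; splitting them into numbered columns would work too.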

Art Kendall

Re: Has anybody already created a way to read "tagged" data into SPSS?

Thanks, I'll try to find out if the data is actually MARC or just something similar.

I still cannot recall what that kind of data layout was called. When used for .INI files there was only 1 record, not a set of them.

A modern version of an INI file is:

An INI file is a configuration file used by Windows programs to initialize program settings. It contains sections for settings and preferences (delimited by a string in square brackets) with each section containing one or more name and value parameters.
So the core parts are [name-of-variables]value
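
To make the single-record idea concrete, here is a small sketch using Python's standard configparser module. In the definition quoted above the bracketed strings are section names and the name/value pairs sit beneath them. The section names, setting names, and values below are invented for illustration.

```python
import configparser

# Made-up INI content: two sections, each with name = value lines.
ini_text = """
[display]
width = 132
printback = on

[paths]
workdir = /tmp/spss
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# One "record": each section/name pair acts as a variable name,
# and each value is that variable's content (always read as text).
settings = {
    f"{section}.{name}": value
    for section in parser.sections()
    for name, value in parser.items(section)
}
print(settings)
```

A file of tagged cases is the same idea repeated once per case, which is why a single INI file amounts to one record.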

Art Kendall
Social Research Consultants

Rich Ulrich

Re: Has anybody already created a way to read "tagged" data into SPSS?

MARC sounds like a special instance of some XML coding.

The Wiki article might offer some leads but I did not see a
simple solution there. For data in XML, I thought that there
might be a simple, free program somewhere that translates
from XML to spreadsheet, using var names.

Okay, searching for XML language translators shows me one
commercial package that allows a free test, and several links
that might be free programs.

--
Rich Ulrich

Art Kendall

Re: Has anybody already created a way to read "tagged" data into SPSS?

The FORTRAN and MARC formats are older than XML.
Libraries have ways to go back and forth between MARC and XML.
https://guides.library.illinois.edu/c.php?g=463460&p=3168159

https://www.loc.gov/standards/marcxml/

Art Kendall
Social Research Consultants

Rich Ulrich

Re: Has anybody already created a way to read "tagged" data into SPSS?

Art,
Okay. MARC was a predecessor, similar in using tags.
Those references you cited seem to be more concerned
with international character sets than differences
in structure.

It occurs to me that it should be SO easy to go from
XML to a so-called 'normal form' for a database, that
any old, pure database package might be able to read
XML files. It may be 20 years since I touched one of those.
But that could be another place to look for conversion
for tagged data.

jkpeck

Re: Has anybody already created a way to read "tagged" data into SPSS?

As I said above, there is a Python module that reads MARC files, so it is easy to write those out in CSV format and then read them anywhere.

Art Kendall

Re: Has anybody already created a way to read "tagged" data into SPSS?

In reply to this post by Rich Ulrich

Thanks for bringing up XML.

I have not tried this, but it appears that Excel can read XML!

https://trumpexcel.com/convert-xml-to-excel/

This list is a great resource. The XML suggestion and Jon's idea about reading MARC records are both great leads.

Art Kendall
Social Research Consultants

Bruce Weaver

Re: Has anybody already created a way to read "tagged" data into SPSS?

Administrator

Art, that website doesn't have anything to do with DJT45, does it? :-O

Art Kendall wrote

Thanks for bringing up XML.

I have not tried this, but it appears that Excel can read XML!

https://trumpexcel.com/convert-xml-to-excel/

This list is a great resource. The XML suggestion and Jon's idea about reading MARC records are both great leads.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

jkpeck

Re: Has anybody already created a way to read "tagged" data into SPSS?

In reply to this post by Art Kendall

Excel - and SPSS with the DAP - can read XML, but there is still the issue of how it gets mapped into a useful form. Just because a program can parse XML text doesn't mean that it will necessarily map it the way you want it.

XML is a bullet, but it isn't always a magic bullet.
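
To illustrate the mapping issue, here is a minimal sketch with Python's standard xml.etree.ElementTree. The XML layout, the element names, and the flatten function are all assumptions made up for this example. The point is that the parser alone does not decide which element is a case or which children become variables; the caller has to say so.

```python
import xml.etree.ElementTree as ET

# Made-up XML: each <person> is one case, child elements are the tagged fields.
xml_text = """
<people>
  <person><Name>John</Name><Gender>male</Gender><School>MIT</School></person>
  <person><Name>Joyce</Name><Major>Math</Major></person>
</people>
"""


def flatten(xml_string, record_tag, columns):
    """Return one dict per record element; missing children become ''."""
    root = ET.fromstring(xml_string.strip())
    rows = []
    for record in root.iter(record_tag):
        rows.append({col: record.findtext(col, default="") for col in columns})
    return rows


rows = flatten(xml_text, "person", ["Name", "Gender", "School", "Major"])
for row in rows:
    print(row)
```

Nested or repeated elements would need extra decisions (join them, number them, or split them into a second table), and that is the part no generic reader can settle for you.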