Building a dataset from data stored in .XML format

Turner, John E. (VADOC)

Dear ListServ Members;

We have data that has been stored in .XML and would like to use SPSS (v19) to import this data into a common SPSS file (many .XML files imported into one dataset). Each .XML file will have the same format (variable names, etc.) as all of the other .XML files we would like to import.

Is there a way to import .XML formatted information into SPSS, preferably in batches (not importing each file separately and then doing some sort of add cases type thing – there are tens of thousands of these files), using SPSS v19?

Best Regards,


John Turner

Virginia Department of Corrections


Re: Building a dataset from data stored in .XML format

Garry Gelade

Dear John

One way to do this is to write a Python program to parse and import the data. Raynald Levesque has some examples on his site, http://www.spsstools.net.
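As one illustration of that approach, here is a minimal sketch in plain Python (no SPSS modules needed) that stacks the records of many XML files into a single CSV file, which Statistics can then read in one step with GET DATA /TYPE=TXT or File > Read Text Data. It assumes a hypothetical flat layout in which each case is a <record> element whose child elements hold the variable values; the function names, the record tag and the added source_file column are illustrative and are not taken from spsstools.net.

```python
import csv
import glob
import xml.etree.ElementTree as ET


def read_records(path, record_tag="record"):
    """Yield one dict per record element; child tags become variable names.

    Assumes a flat (hypothetical) layout such as
    <data><record><id>1</id><age>34</age></record>...</data>
    """
    root = ET.parse(path).getroot()
    for rec in root.iter(record_tag):
        yield {child.tag: (child.text or "").strip() for child in rec}


def combine_xml_to_csv(pattern, out_csv, record_tag="record"):
    """Append the records of every file matching `pattern` to one CSV file,
    adding a source_file column. Returns the number of rows written."""
    writer = None
    n_rows = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        for path in sorted(glob.glob(pattern)):
            for row in read_records(path, record_tag):
                if writer is None:
                    # Variable names come from the first record found; all
                    # files are assumed to share the same structure.
                    writer = csv.DictWriter(fh, fieldnames=["source_file"] + list(row))
                    writer.writeheader()
                writer.writerow({"source_file": path, **row})
                n_rows += 1
    return n_rows
```

Because the files are parsed one at a time, only one XML file is held in memory at once. If a later record contains a variable the first record did not have, csv.DictWriter raises ValueError, which acts as a rough check that the files really do share one structure.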

Garry Gelade

Business Analytic Ltd

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Turner, John E.
Sent: 16 February 2011 19:32
To: [hidden email]
Subject: Building a dataset from data stored in .XML format

Re: Building a dataset from data stored in .XML format

Jon K Peck
In reply to this post by Turner, John E. (VADOC)
There are two parts to this problem.  First, can you read a single xml data file into SPSS Statistics?  Second, how do you read a large set of identically structured xml files into one dataset?

On the first part, XML files come in infinite variety.  So try reading one with Statistics.  To do this, you need to install the SPSS Data Access Pack, which contains the XML driver.  The DAP can be downloaded from http://www.spss.com/drivers/ if you don't already have it.

Then use File > Open Database > New Query and define the data source using the XML driver. From there you can paste the GET DATA syntax that reads an XML file.

Once you are set up to read a single dataset, you are ready to set up a job to combine multiple datasets.  For this you need to use Python programmability directly or indirectly in order to avoid having to enumerate each dataset individually.

Using programmability, you can use the glob.glob function to find all the files matching your input set, say "c:/myxmldata/*.xml". Then iterate over these files, submitting commands to Statistics that open each file and append its cases (ADD FILES) to the combined dataset until all of the files have been added.
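The loop described above can be sketched as follows, assuming the GET DATA command pasted from the database wizard has had its XML file path replaced by a placeholder token, here the hypothetical @XMLPATH@. The function only builds the list of syntax commands: each XML file is read and saved as its own .sav part file, and the parts are then appended with ADD FILES. Since ADD FILES combines at most 50 files per command, the appends are done in chunks. build_syntax, the placeholder and the part file names are illustrative, not part of any SPSS API.

```python
import glob
import os

# ADD FILES combines at most 50 files per command.
MAX_ADD_FILES = 50
# Hypothetical placeholder put in place of the XML path in the pasted GET DATA.
PLACEHOLDER = "@XMLPATH@"


def quote(path):
    """Wrap a path in single quotes for SPSS syntax, doubling embedded quotes."""
    return "'" + path.replace("'", "''") + "'"


def file_list(paths):
    """Build the '/FILE=...' subcommands for one ADD FILES command."""
    return " ".join("/FILE=" + quote(p) for p in paths)


def build_syntax(pattern, get_data_template, sav_dir, final_sav):
    """Return SPSS commands that read every XML file matching `pattern`,
    save each as a .sav part file, then append all parts into `final_sav`."""
    xml_paths = sorted(glob.glob(pattern))
    cmds, sav_paths = [], []
    for i, xml in enumerate(xml_paths):
        sav = os.path.join(sav_dir, "part%05d.sav" % i)
        cmds.append(get_data_template.replace(PLACEHOLDER, xml))
        cmds.append("SAVE OUTFILE=%s." % quote(sav))
        sav_paths.append(sav)
    if not sav_paths:
        return cmds
    # The first ADD FILES reads up to 50 part files into the active dataset.
    cmds.append("ADD FILES " + file_list(sav_paths[:MAX_ADD_FILES]) + ".")
    # Each later ADD FILES takes the active dataset (*) plus up to 49 more parts.
    rest = sav_paths[MAX_ADD_FILES:]
    step = MAX_ADD_FILES - 1
    for start in range(0, len(rest), step):
        cmds.append("ADD FILES /FILE=* " + file_list(rest[start:start + step]) + ".")
    cmds.append("SAVE OUTFILE=%s." % quote(final_sav))
    return cmds
```

Inside a BEGIN PROGRAM PYTHON block, the resulting list can be passed to spss.Submit, which accepts a list of command strings. Writing each part to disk first keeps the conversion step separate from the appends, so a failure partway through does not lose the files already converted.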

Alternatively, if you are more comfortable just using SPSS Statistics, you can use the SPSSINC PROCESS FILES extension command from the SPSS Community (www.ibm.com/developerworks/spssdevcentral). It iterates through the specified files and runs a block of Statistics syntax for each one, so your Statistics code can use the file handles or macros it provides to open and append each file. See the help for the command or its dialog box help for more information.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435


Re: Building a dataset from data stored in .XML format

Frank Furter
Jon, I have to import data from an XML data source, too, but I haven't been able to locate an appropriate driver in the SPSS Data Access Pack. I use SPSS version 22 and DAP version 7.1. Which of the drivers should I use?

Moreover, there are XML data standards defined by the CDISC organization that are becoming increasingly important in clinical research (please see my previous post at http://spssx-discussion.1045642.n5.nabble.com/CDISC-standards-td5716047.html). Has SPSS ever considered providing a solution for this (SAS has ...)?

Best, Andreas

Re: Building a dataset from data stored in .XML format

Albert-Jan Roskam
----- Original Message -----

> From: Andreas Voelp <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Tuesday, April 1, 2014 10:54 AM
> Subject: Re: [SPSSX-L] Building a dataset from data stored in .XML format

Hi,

I do not know CDISC other than by name. There are a lot of XML formats out there whose goals at least partly overlap: HL7, OpenEHR, SDMX. Would you say that CDISC is somehow more important or popular? HL7 is very widely used (maybe CDISC is a subset of HL7?), so why not have importers for that format too?

Just curious.

regards,
Albert-Jan
