Format conversion outside stat software

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Format conversion outside stat software

Marshall Burns

Hello,

 

                This e-mail list came up when I was searching for information on converting data from statistical formats to MS Access.

 

                I am requesting data from a government archive and they are giving me four options for data format: SAS, SPSS, Stata, or tab-delimited file. I do not have the statistical software, so if I get one of the first three, I’ll need to convert it to Access. I don’t know if the data in those formats might have any better data structure that makes it better than the tab-delimited. So I’m looking for a way to convert from those formats to Access.

 

                I see that the two primary ways of doing this outside the statistical software are “Stat/Transfer” and “DBMS/Copy,” of which the second one is no longer sold. I’d like to find a way to do this without investing in commercial software. Can people here offer any suggestions?

 

                Thank you for your assistance.

 

Best regards,

Marshall Burns, PhD

www.MBurns.com

(805) 451-4507

 

Reply | Threaded
Open this post in threaded view
|

Re: Format conversion outside stat software

Maguin, Eugene

SAS, SPSS, and Stata all have program specific data file structures and while each of the programs, I assume, can read data file versions from the other programs, the most general format would always be preferable. Thus tab delimited. That said, I think you also need to evaluate the measurement level of the variables in the dataset. Two extremes: a) all variables are real numbers such as frequency, length, mass, etc. b) All variables are integer numbers and some or all of the variables are categorical-ordinal or nominal such as sex, ethnic group, highest grade/final degree, car/make model, etc where text descriptors are required to give meaning to the numeric values. If b), then supplementary file(s) are required to carry the so-called value labels information. Then a program specific file would be better because there probably would be less preliminary work to get the file ‘analysis-ready’.

Gene Maguin

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marshall Burns
Sent: Wednesday, February 05, 2014 10:02 AM
To: [hidden email]
Subject: Format conversion outside stat software

 

Hello,

 

                This e-mail list came up when I was searching for information on converting data from statistical formats to MS Access.

 

                I am requesting data from a government archive and they are giving me four options for data format: SAS, SPSS, Stata, or tab-delimited file. I do not have the statistical software, so if I get one of the first three, I’ll need to convert it to Access. I don’t know if the data in those formats might have any better data structure that makes it better than the tab-delimited. So I’m looking for a way to convert from those formats to Access.

 

                I see that the two primary ways of doing this outside the statistical software are “Stat/Transfer” and “DBMS/Copy,” of which the second one is no longer sold. I’d like to find a way to do this without investing in commercial software. Can people here offer any suggestions?

 

                Thank you for your assistance.

 

Best regards,

Marshall Burns, PhD

www.MBurns.com

(805) 451-4507

 

Reply | Threaded
Open this post in threaded view
|

Re: Format conversion outside stat software

Marshall Burns

Thank you Gene for your helpful information.

 

The rest of the discussion here is not really relevant for me, so I’m going to unsubscribe. If anyone else is going to respond to my question, please be sure to copy your response to me at [hidden email] so that I get it.

 

Thank you.

 

Marshall Burns

 

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Maguin, Eugene
Sent: Wednesday, February 5, 2014 08:37
To: [hidden email]
Subject: Re: Format conversion outside stat software

 

SAS, SPSS, and Stata all have program specific data file structures and while each of the programs, I assume, can read data file versions from the other programs, the most general format would always be preferable. Thus tab delimited. That said, I think you also need to evaluate the measurement level of the variables in the dataset. Two extremes: a) all variables are real numbers such as frequency, length, mass, etc. b) All variables are integer numbers and some or all of the variables are categorical-ordinal or nominal such as sex, ethnic group, highest grade/final degree, car/make model, etc where text descriptors are required to give meaning to the numeric values. If b), then supplementary file(s) are required to carry the so-called value labels information. Then a program specific file would be better because there probably would be less preliminary work to get the file ‘analysis-ready’.

Gene Maguin

 

 

From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Marshall Burns
Sent: Wednesday, February 05, 2014 10:02 AM
To: [hidden email]
Subject: Format conversion outside stat software

 

Hello,

 

                This e-mail list came up when I was searching for information on converting data from statistical formats to MS Access.

 

                I am requesting data from a government archive and they are giving me four options for data format: SAS, SPSS, Stata, or tab-delimited file. I do not have the statistical software, so if I get one of the first three, I’ll need to convert it to Access. I don’t know if the data in those formats might have any better data structure that makes it better than the tab-delimited. So I’m looking for a way to convert from those formats to Access.

 

                I see that the two primary ways of doing this outside the statistical software are “Stat/Transfer” and “DBMS/Copy,” of which the second one is no longer sold. I’d like to find a way to do this without investing in commercial software. Can people here offer any suggestions?

 

                Thank you for your assistance.

 

Best regards,

Marshall Burns, PhD

www.MBurns.com

(805) 451-4507

 

Reply | Threaded
Open this post in threaded view
|

Re: Format conversion outside stat software

Albert-Jan Roskam
In reply to this post by Maguin, Eugene
Or this:

# -*- coding: utf-8 -*-

import sys
import os
import random

import pypyodbc
import savReaderWriter

debug = False

"""sav2mdb.py: convert SPSS system files (codepage) to Microsoft Access files"""

def get_table_name(filename):
    tbl = os.path.splitext(os.path.basename(filename))[0]
    return tbl.capitalize().replace(" ", "_")

def get_metadata(savFilename):
    with savReaderWriter.SavHeaderReader(savFilename) as header:
        varNames, varTypes = header.varNames, header.varTypes
        formats = header.formats
    return varNames, varTypes, formats

def sql_create_table_from_sav(savFilename):
    """Generate SQL 'CREATE TABLE' statement on the basis of <savFilename>
    SPSS-to-SQL datatype translation:
    numeric, except date/time --> FLOAT
    date or time --> CHAR(26) (iso dates where applicable)
    string < 256 bytes --> TEXT of that length
    string >= 256 bytes --> TEXT
    $sysmis --> NULL
    """
    varNames, varTypes, formats = get_metadata(savFilename)
    tbl = get_table_name(savFilename)
    # if "id" happens to be an existing varname, then suffix the primary key
    suffix = "_%04d" % random.randint(1000, 9999) if "id" in varNames else ""
    sql = ["CREATE TABLE %(tbl)s (id%(suffix)s COUNTER PRIMARY KEY,\n  " % locals()]
    for varName in varNames:
        varType = varTypes[varName]
        format_ = formats[varName].lower()
        dataType = "FLOAT" if varType == 0 else "CHAR(%d)" % varType if varType < 256 else "TEXT"
        dataType = "CHAR(26)" if "time" in format_ or "date" in format_ else dataType
        sql.append("%(varName)s %(dataType)s, \n  " % locals())
    return "".join(sql).rstrip(", \n  ") + "\n);"

def sql_insert_template(savFilename):
    """Generate SQL 'INSERT INTO' template, suitable for sql quote escaping"""
    varNames, varTypes, formats = get_metadata(savFilename)
    tbl = get_table_name(savFilename)
    varNames_ = ", ".join(varNames)
    insert = "INSERT INTO %(tbl)s (%(varNames_)s) VALUES " % locals()
    template = [eval('"?" if varTypes[v] > 0 or "TIME" in formats[v] or '
                     '"DATE" in formats[v] else "?"') for v in varNames]
    template = insert + "(" + ", ".join(template) + ");\n"
    return template

def write_ms_access_file(savFilename, mdbFilename=None, overwrite=True):
    """Write the actual MS Access file"""
    if not sys.platform.startswith("win"):
        raise EnvironmentError("Sorry, Windows only")
    if not mdbFilename:
        mdbFilename = os.path.splitext(savFilename)[0].replace(" ", "_") + ".mdb"
    if os.path.exists(mdbFilename) and overwrite:
        os.remove(mdbFilename)
    create_table_sql = sql_create_table_from_sav(savFilename)
    insert_table_sql = sql_insert_template(savFilename)
    if debug:
        print create_table_sql

    pypyodbc.win_create_mdb(mdbFilename)
    try:
        connection_string = 'Driver={Microsoft Access Driver (*.mdb)};DBQ=%s'
        connection = pypyodbc.connect(connection_string % mdbFilename)
        cursor = connection.cursor()
        cursor.execute(create_table_sql)
        with savReaderWriter.SavReader(savFilename) as reader:
            for record in reader:
                cursor.execute(insert_table_sql, tuple(record))
        cursor.commit()
    finally:
        connection.close()

if __name__ == "__main__":
    run_as_script = len(sys.argv) > 1
    if run_as_script:
        if len(sys.argv) == 2:
            write_ms_access_file(sys.argv[1])
        elif len(sys.argv) == 3:
            write_ms_access_file(sys.argv[1], sys.argv[2])
        elif len(sys.argv) == 4:
            write_ms_access_file(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print ("Usage: sav2mdb savFilename[[, mdbFilename], overwrite]\n"
               "If overwrite (True/False) is specified, mdbFilename must\n"
               "also be specified")

 
Regards,

Albert-Jan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a

fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



From: "Maguin, Eugene" <[hidden email]>
To: [hidden email]
Sent: Wednesday, February 5, 2014 5:36 PM
Subject: Re: [SPSSX-L] Format conversion outside stat software

SAS, SPSS, and Stata all have program specific data file structures and while each of the programs, I assume, can read data file versions from the other programs, the most general format would always be preferable. Thus tab delimited. That said, I think you also need to evaluate the measurement level of the variables in the dataset. Two extremes: a) all variables are real numbers such as frequency, length, mass, etc. b) All variables are integer numbers and some or all of the variables are categorical-ordinal or nominal such as sex, ethnic group, highest grade/final degree, car/make model, etc where text descriptors are required to give meaning to the numeric values. If b), then supplementary file(s) are required to carry the so-called value labels information. Then a program specific file would be better because there probably would be less preliminary work to get the file ‘analysis-ready’.
Gene Maguin
 
 
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marshall Burns
Sent: Wednesday, February 05, 2014 10:02 AM
To: [hidden email]
Subject: Format conversion outside stat software
 
Hello,
 
                This e-mail list came up when I was searching for information on converting data from statistical formats to MS Access.
 
                I am requesting data from a government archive and they are giving me four options for data format: SAS, SPSS, Stata, or tab-delimited file. I do not have the statistical software, so if I get one of the first three, I’ll need to convert it to Access. I don’t know if the data in those formats might have any better data structure that makes it better than the tab-delimited. So I’m looking for a way to convert from those formats to Access.
 
                I see that the two primary ways of doing this outside the statistical software are “Stat/Transfer” and “DBMS/Copy,” of which the second one is no longer sold. I’d like to find a way to do this without investing in commercial software. Can people here offer any suggestions?
 
                Thank you for your assistance.
 
Best regards,
Marshall Burns, PhD
(805) 451-4507