Hello, This e-mail list came up when I was searching for information on converting data from statistical formats to MS Access. I am requesting data from a government archive and they are giving me four options for data format: SAS, SPSS, Stata, or tab-delimited file. I do not have the statistical software, so if I get one of the first three, I’ll need to convert it to Access. I don’t know if the data in those formats might have any better data structure that makes it better than the tab-delimited. So I’m looking for a way to convert from those formats to Access. I see that the two primary ways of doing this outside the statistical software are “Stat/Transfer” and “DBMS/Copy,” of which the second one is no longer sold. I’d like to find a way to do this without investing in commercial software. Can people here offer any suggestions? Thank you for your assistance. Best regards, Marshall Burns, PhD (805) 451-4507 |
SAS, SPSS, and Stata all have program specific data file structures and while each of the programs, I assume, can read data file versions from the other programs, the most general format would always be preferable. Thus tab delimited. That
said, I think you also need to evaluate the measurement level of the variables in the dataset. Two extremes: a) all variables are real numbers such as frequency, length, mass, etc. b) All variables are integer numbers and some or all of the variables are categorical-ordinal
or nominal such as sex, ethnic group, highest grade/final degree, car/make model, etc where text descriptors are required to give meaning to the numeric values. If b), then supplementary file(s) are required to carry the so-called value labels information.
Then a program specific file would be better because there probably would be less preliminary work to get the file ‘analysis-ready’. Gene Maguin From: SPSSX(r) Discussion [mailto:[hidden email]]
On Behalf Of Marshall Burns Hello, This e-mail list came up when I was searching for information on converting data from statistical formats to MS Access.
I am requesting data from a government archive and they are giving me four options for data format: SAS, SPSS, Stata, or tab-delimited file. I do not have the statistical software, so if I get one of the first three, I’ll
need to convert it to Access. I don’t know if the data in those formats might have any better data structure that makes it better than the tab-delimited. So I’m looking for a way to convert from those formats to Access.
I see that the two primary ways of doing this outside the statistical software are “Stat/Transfer” and “DBMS/Copy,” of which the second one is no longer sold. I’d like to find a way to do this without investing in commercial
software. Can people here offer any suggestions? Thank you for your assistance. Best regards, Marshall Burns, PhD (805) 451-4507 |
Thank you Gene for your helpful information. The rest of the discussion here is not really relevant for me, so I’m going to unsubscribe. If anyone else is going to respond to my question, please be sure to copy your response to me at [hidden email] so that I get it. Thank you. Marshall Burns From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Maguin, Eugene SAS, SPSS, and Stata all have program specific data file structures and while each of the programs, I assume, can read data file versions from the other programs, the most general format would always be preferable. Thus tab delimited. That said, I think you also need to evaluate the measurement level of the variables in the dataset. Two extremes: a) all variables are real numbers such as frequency, length, mass, etc. b) All variables are integer numbers and some or all of the variables are categorical-ordinal or nominal such as sex, ethnic group, highest grade/final degree, car/make model, etc where text descriptors are required to give meaning to the numeric values. If b), then supplementary file(s) are required to carry the so-called value labels information. Then a program specific file would be better because there probably would be less preliminary work to get the file ‘analysis-ready’. Gene Maguin From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Marshall Burns Hello, This e-mail list came up when I was searching for information on converting data from statistical formats to MS Access. I am requesting data from a government archive and they are giving me four options for data format: SAS, SPSS, Stata, or tab-delimited file. I do not have the statistical software, so if I get one of the first three, I’ll need to convert it to Access. I don’t know if the data in those formats might have any better data structure that makes it better than the tab-delimited. So I’m looking for a way to convert from those formats to Access. I see that the two primary ways of doing this outside the statistical software are “Stat/Transfer” and “DBMS/Copy,” of which the second one is no longer sold. I’d like to find a way to do this without investing in commercial software. Can people here offer any suggestions? Thank you for your assistance. Best regards, Marshall Burns, PhD (805) 451-4507 |
In reply to this post by Maguin, Eugene
Or this:
# -*- coding: utf-8 -*- import sys import os import random import pypyodbc import savReaderWriter debug = False """sav2mdb.py: convert SPSS system files (codepage) to Microsoft Access files""" def get_table_name(filename): tbl = os.path.splitext(os.path.basename(filename))[0] return tbl.capitalize().replace(" ", "_") def get_metadata(savFilename): with savReaderWriter.SavHeaderReader(savFilename) as header: varNames, varTypes = header.varNames, header.varTypes formats = header.formats return varNames, varTypes, formats def sql_create_table_from_sav(savFilename): """Generate SQL 'CREATE TABLE' statement on the basis of <savFilename> SPSS-to-SQL datatype translation: numeric, except date/time --> FLOAT date or time --> CHAR(26) (iso dates where applicable) string < 256 bytes --> TEXT of that length string >= 256 bytes --> TEXT $sysmis --> NULL """ varNames, varTypes, formats = get_metadata(savFilename) tbl = get_table_name(savFilename) # if "id" happens to be an existing varname, then suffix the primary key suffix = "_%04d" % random.randint(1000, 9999) if "id" in varNames else "" sql = ["CREATE TABLE %(tbl)s (id%(suffix)s COUNTER PRIMARY KEY,\n " % locals()] for varName in varNames: varType = varTypes[varName] format_ = formats[varName].lower() dataType = "FLOAT" if varType == 0 else "CHAR(%d)" % varType if varType < 256 else "TEXT" dataType = "CHAR(26)" if "time" in format_ or "date" in format_ else dataType sql.append("%(varName)s %(dataType)s, \n " % locals()) return "".join(sql).rstrip(", \n ") + "\n);" def sql_insert_template(savFilename): """Generate SQL 'INSERT INTO' template, suitable for sql quote escaping""" varNames, varTypes, formats = get_metadata(savFilename) tbl = get_table_name(savFilename) varNames_ = ", ".join(varNames) insert = "INSERT INTO %(tbl)s (%(varNames_)s) VALUES " % locals() template = [eval('"?" if varTypes[v] > 0 or "TIME" in formats[v] or ' '"DATE" in formats[v] else "?"') for v in varNames] template = insert + "(" + ", ".join(template) + ");\n" return template def write_ms_access_file(savFilename, mdbFilename=None, overwrite=True): """Write the actual MS Access file""" if not sys.platform.startswith("win"): raise EnvironmentError("Sorry, Windows only") if not mdbFilename: mdbFilename = os.path.splitext(savFilename)[0].replace(" ", "_") + ".mdb" if os.path.exists(mdbFilename) and overwrite: os.remove(mdbFilename) create_table_sql = sql_create_table_from_sav(savFilename) insert_table_sql = sql_insert_template(savFilename) if debug: print create_table_sql pypyodbc.win_create_mdb(mdbFilename) try: connection_string = 'Driver={Microsoft Access Driver (*.mdb)};DBQ=%s' connection = pypyodbc.connect(connection_string % mdbFilename) cursor = connection.cursor() cursor.execute(create_table_sql) with savReaderWriter.SavReader(savFilename) as reader: for record in reader: cursor.execute(insert_table_sql, tuple(record)) cursor.commit() finally: connection.close() if __name__ == "__main__": run_as_script = len(sys.argv) > 1 if run_as_script: if len(sys.argv) == 2: write_ms_access_file(sys.argv[1]) elif len(sys.argv) == 3: write_ms_access_file(sys.argv[1], sys.argv[2]) elif len(sys.argv) == 4: write_ms_access_file(sys.argv[1], sys.argv[2], sys.argv[3]) else: print ("Usage: sav2mdb savFilename[[, mdbFilename], overwrite]\n" "If overwrite (True/False) is specified, mdbFilename must\n" "also be specified") Regards, Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
Free forum by Nabble | Edit this page |