I've inherited several files containing thousands of lines of SPSS syntax for processing a dataset of responses to a complex survey. Most of the code derives new output variables on the basis of case-by-case data (mainly using DO IF and COMPUTE), and it doesn't use any complex SPSS procedures: mainly just MATCH FILES, FREQUENCIES and CROSSTABS. The syntax outputs various files along the way.
I've made some improvements by using SPSS macros (e.g. I wrote a macro to annualise amounts depending on the corresponding period code, rather than copying and pasting a chunk of syntax throughout the code). But I now have SPSS Python Essentials (v21) and I am wondering if I would be better off starting to reimplement the code in Python.

In particular, one issue is that for each year of the survey, the input variables change slightly as the questionnaire is updated. I'd like to be able to use Python to separate out functions to derive each output variable from the raw inputs, so that the code can easily be updated when the inputs change. At the moment, the process of updating the code is rather laborious.

Does anyone have advice on how I might approach this task, if SPSS Python programming is indeed a sensible way to go about it? Is the best way to process case-by-case data to use the spssdata class? I realise this is a rather general query, but thanks for any help.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
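Here is a minimal sketch of "one function per output variable". In this version Python only generates the syntax, and SPSS still processes the cases. Several parts are hypothetical: the period codes, the variable names (Q12A, Q14B, wage_annual, ...) and the helpers annualise_syntax, derive_wage_annual and build_syntax. With this layout, only the per-year mapping should need editing when the questionnaire changes. The code avoids f-strings so it should also run under Python 2.7, which I believe is what the v21 Python Essentials use.

```python
# Minimal sketch: Python *generates* SPSS syntax; SPSS still does the processing.
# All period codes, variable names and helper names below are hypothetical.

# Hypothetical period code -> annualising multiplier (replace with the survey's codes).
ANNUAL_FACTORS = {1: 52, 2: 26, 3: 13, 4: 12, 5: 1}


def annualise_syntax(amount_var, period_var, out_var, factors=ANNUAL_FACTORS):
    """Return DO IF/ELSE IF syntax that annualises amount_var according to period_var."""
    lines = []
    for i, (code, factor) in enumerate(sorted(factors.items())):
        keyword = "DO IF" if i == 0 else "ELSE IF"
        lines.append("%s (%s = %d)." % (keyword, period_var, code))
        lines.append("COMPUTE %s = %s * %d." % (out_var, amount_var, factor))
    lines.append("END IF.")
    return "\n".join(lines)


# Hypothetical per-year mapping from logical input names to that year's raw names.
# When the questionnaire changes, only this table should need editing.
INPUTS_BY_YEAR = {
    2014: {"wage": "Q12A", "wage_period": "Q12B"},
    2015: {"wage": "Q14A", "wage_period": "Q14B"},
}


def derive_wage_annual(inputs):
    """One function per output variable, written against logical names only."""
    return annualise_syntax(inputs["wage"], inputs["wage_period"], "wage_annual")


# Further derivation functions would be listed here, in the order they must run.
DERIVATIONS = [derive_wage_annual]


def build_syntax(year):
    """Concatenate the syntax for every derived variable for the given survey year."""
    inputs = INPUTS_BY_YEAR[year]
    return "\n".join(derive(inputs) for derive in DERIVATIONS)
```

Inside Statistics, the generated text could be run from a BEGIN PROGRAM / END PROGRAM block with spss.Submit(build_syntax(2015)). Everything else, such as MATCH FILES and FREQUENCIES, stays ordinary syntax.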
Python certainly has the potential to simplify and generalize jobs. Here are some resources that may help.

A blog post on the SPSS Community site, "Using SPSSINC PROGRAM and generalizing your code vs writing macros in SPSS Statistics":
https://www.ibm.com/developerworks/community/blogs/ab16c38e-2f7b-4912-a47e-85682d124d32/entry/using_spssinc_program_and_generalizing_your_code_vs_writing_macros_in_spss_statistics?lang=en

The Programming and Data Management book:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/We70df3195ec8_4f95_9773_42e448fa9029/page/Books%20and%20Articles

There are some extension commands that are implemented in Python but function like regular syntax; these can be a big help in generalization. In particular, the SPSSINC SELECT VARIABLES command generates macro definitions from variable metadata such as patterns in names, variable type, measurement level, and custom attributes.

I will send you offline a PowerPoint I wrote a few years ago entitled "Increasing productivity with SPSS Statistics: Generalization and Automation".

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Martin Griffiths <[hidden email]>
To: [hidden email]
Date: 02/12/2015 07:45 AM
Subject: [SPSSX-L] Using Python to process SPSS data
Sent by: "SPSSX(r) Discussion" <[hidden email]>
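The idea behind SPSSINC SELECT VARIABLES, which turns variable metadata into a macro definition, can be sketched in a few lines of plain Python. This is not the extension's implementation: it only matches names against a regular expression. The helpers select_variables and macro_definition, the !amounts macro and the sample names are hypothetical. Inside a BEGIN PROGRAM block the names could come from spss.GetVariableName and spss.GetVariableCount. Here they are passed in as a list so the sketch runs outside Statistics.

```python
import re

# Minimal sketch of metadata-driven variable selection; NOT the implementation of
# SPSSINC SELECT VARIABLES, which can also use type, measurement level and attributes.


def select_variables(names, pattern):
    """Return the names that match the regular expression in full, in file order."""
    rx = re.compile("(?:%s)\\Z" % pattern)
    return [name for name in names if rx.match(name)]


def macro_definition(macro_name, names):
    """Build a one-line DEFINE ... !ENDDEFINE macro whose body is the variable list."""
    return "DEFINE %s () %s !ENDDEFINE." % (macro_name, " ".join(names))


# Inside a BEGIN PROGRAM block the names could be read from the active dataset, e.g.
#   names = [spss.GetVariableName(i) for i in range(spss.GetVariableCount())]
# Here a hypothetical list is used instead.
example_names = ["id", "Q12A", "Q12B", "Q13A", "region"]
amounts_macro = macro_definition("!amounts", select_variables(example_names, r"Q\d+A"))
# amounts_macro == "DEFINE !amounts () Q12A Q13A !ENDDEFINE."
```

Once the definition has been submitted, later syntax such as FREQUENCIES VARIABLES=!amounts. picks up whatever names currently fit the pattern. Nothing needs hand-editing as long as new questions follow the same naming scheme.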
At 09:37 AM 2/12/2015, Martin Griffiths wrote:
>I've inherited several files containing thousands of lines of SPSS
>syntax for processing a dataset of responses to a complex survey.
>Most of the code derives new output variables on the basis of
>case-by-case data. ... One issue is that for each year of the
>survey, the input variables change slightly as the questionnaire is
>updated. I'd like to be able [to use Python] to separate out
>functions to derive each output variable from the raw inputs, so
>that the code can easily be updated when the inputs change.
>
>Does anyone have advice on how I might approach this task, if SPSS
>Python programming is indeed a sensible way to go about it? Is the
>best way to process case-by-case data to use the spssdata class?

There are at least two, quite different, ways to use Python. One is what (I think) you're talking about: using Python code *instead of* SPSS code, with the spssdata class. The other is using Python as a super-macro tool *to generate* SPSS code, doing the actual processing in SPSS.

From what you describe, your problem fits the latter approach better: you already have SPSS code to do what you need to do; the problem is, it needs to be changed for each new survey. If you use Python (or, possibly, macros) to generate the changes, then:

- You'll be able to use most of the code you already have, rather than re-implementing its functionality in Python.
- Your code will be mostly native SPSS code. It'll be readable by other SPSS users who haven't gone through the additional step of becoming fluent in Python.

Now, you write:

>For each year of the survey, the input variables change slightly as
>the questionnaire is updated. I'd like to be able ... to derive each
>output variable from the raw inputs, so that the code can easily be
>updated when the inputs change.

The question is, how extensive are the changes? Is it as simple as the questions being the same each year, with only the variable names changing? For example, Q1.2014 and Q2.2014 on last year's survey might become Q1.2015 and Q2.2015 for this year's survey. If so, the changed code could certainly be generated in Python. But since it can be made without accessing the data dictionary, it's well within the capability of a macro.

Could you post what kind of changes need to be made for each year? That would help us give more informed answers to your questions.
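For the simple case above, where only a year suffix changes, one minimal Python sketch of "generate the changes" is to retarget the existing syntax text with a regular expression. The helper retarget_year and its default prefix pattern Q\d+ are assumptions modelled on the Q1.2014 example. As noted, a macro could handle this case too, and the rewritten names should still be checked against the new file's dictionary.

```python
import re

# Minimal sketch: retarget existing syntax from one survey year's variable names
# to the next, for the case where only a ".YYYY" suffix changes. The default
# prefix pattern Q\d+ is an assumption based on names like Q1.2014.


def retarget_year(syntax, old_year, new_year, prefix=r"Q\d+"):
    """Replace names such as Q1.2014 with Q1.2015 throughout a block of syntax."""
    # \b on both sides stops partial matches such as AQ1.2014 or Q1.20145.
    pattern = re.compile(r"\b(%s)\.%d\b" % (prefix, old_year))
    return pattern.sub(lambda m: "%s.%d" % (m.group(1), new_year), syntax)


old_syntax = "COMPUTE total = Q1.2014 + Q2.2014."
new_syntax = retarget_year(old_syntax, 2014, 2015)
# new_syntax == "COMPUTE total = Q1.2015 + Q2.2015."
```

The pattern is case-sensitive as written. SPSS variable names are not case-sensitive, so passing re.IGNORECASE to re.compile may be wanted in practice.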