Hi all,
I have a general question about organization of syntax files.
Since many of you are programmers, I suspect there is some gold standard out there on how to organize your files so that, should you win the lottery, someone can come in and replicate what you have done and pick up where you left off.
I have this really bad habit of starting an analysis and then, six months later, ending up with an unwieldy syntax file with hundreds of lines. So my New Year's resolution was to better organize these messes. I'm decent about documenting the contents within a file, and I have started breaking up the files for an analysis into data importing and labeling; computing new variables; and analyses.
How do you break these files up, or do you? Is there a good reference for this?
Thanks
Carol
|
I use a lot of syntax files. My rule of thumb is: one syntax for one job. For example, if you have a project which needs data from 5 different sources:
- 1 master syntax with references to
- syntax which gets the data
- syntax which edits the data into the file
- syntax which gets you the results from the data
Try to keep your syntax files to about 200 - 300 lines. If they are longer, cut them into smaller pieces. Use the FILE HANDLE and INSERT FILE commands.
And, most important of all: make notes. The first lines of your syntax should state what it does, line by line:
* This syntax file edits the data:
1) it gets the data from 4 files: file A, file B, file C and file D
2) it matches files A and B by variable AGE
3) et cetera.
Next, make notes in your syntax:
* Match files A and B to get result Z.
MATCH FILES /file = a /table = b /by var1.
So:
- use a master syntax
- one syntax for one job
- limit your syntax size to 300 lines
- make notes in the syntax
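The master-syntax layout described above could be sketched as follows, here generated from Python so that the list of jobs lives in one place. This is only an illustration under stated assumptions: the folder `c:/projects/demo`, the three `.sps` file names, the handle name `projdir`, and the helper `build_master_syntax` are all hypothetical, and it assumes a directory handle defined with FILE HANDLE can prefix the path given to INSERT FILE. FILE HANDLE and INSERT are the SPSS commands named in the post.

```python
def build_master_syntax(project_dir, steps, handle="projdir"):
    """Return the text of a master syntax file.

    One FILE HANDLE points at the project folder (hypothetical location),
    then each job gets a comment line and one INSERT command.
    """
    lines = [
        "* Master syntax: runs each job file in order.",
        "FILE HANDLE {0} /NAME='{1}'.".format(handle, project_dir),
    ]
    for description, filename in steps:
        # One note per job, so the master file documents itself.
        lines.append("* {0}.".format(description))
        # Assumption: the directory handle may be used as a path prefix here.
        lines.append("INSERT FILE='{0}/{1}'.".format(handle, filename))
    return "\n".join(lines) + "\n"


# Hypothetical jobs: one syntax file for one job.
STEPS = [
    ("Get the data from the source files", "01_get_data.sps"),
    ("Edit the data into the working file", "02_edit_data.sps"),
    ("Produce the results", "03_results.sps"),
]

master_text = build_master_syntax("c:/projects/demo", STEPS)
print(master_text)
```

The printed text can be saved as the master syntax file. Only the FILE HANDLE line needs to change when the project moves to another folder.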
|
In reply to this post by parisec
Hi Carol,
You have touched on a cornerstone concept in all research. We all prefer running analyses to tweaking the finer points of a question, for example, but I have found that nothing is more important than creating a comprehensive code book and developing the discipline of keeping it current. Plus, doing so helps the next wave of researchers. Syntax, output files, etc. are all part of the code book, including any RECODE commands etc.
I have found it helpful to keep everything in a single Excel file, with tabs representing different phases or processes of the project. In the “notes” section I write out (type) a particular decision or issue, and then attach any syntax files that correspond to executing the decision. Actually, I copy to Excel and keep the original in SPSS.
I hope this helps.
Regards,
Joseph A. Youngblood
Director of Research
SSI
|
In reply to this post by kwame woei
"Try to keep your syntax files about 200 - 300 lines. If they are longer: cut them into smaller pieces. Use the FILE HANDLES and the INSERT FILE commands."
In general this is a great practice; however, having a lot of files around can be tiresome in some cases.
For example: I am currently writing a large matrix macro of approximately 1300 lines. When finished it will likely be closer to 2000 (about 200-300 of these are detailed comments). It is logically segmented into approximately 25 smaller macros, called from a central controlling macro. I keep my sanity and my hair intact by commenting ABOVE the DEFINE-!ENDDEFINE blocks so the comments appear in the outline window to the left. So far, easy peasy to keep track.
The reason I don't segment this into several files is that it will be distributed to an unknown number of users downstream, and I don't want to deal with knowing anything about their file system or force the end user to define FILE HANDLES etc. They simply INCLUDE my single set of macros and call a single macro name with appropriate parameters.
In general, when building production code for myself or a small group, I will typically segment into:
RawDataAccess --> IntermediateData
DataTransformations --> PristineCleanData
Analytics
PostProcessing/Scripting
FinalReportGeneration
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Thank you all for sharing your suggestions. In addition to giving me some ideas of ways that you handle these issues, you reminded me of the INCLUDE command. I used that prior to the PC days, when it wasn't as easy to just open up a file and push RUN, but I don't think I've used it since!
As Joseph stated, I now need to have the discipline to put these practices in place.
Carol
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
INSERT rather than INCLUDE seems to be the command these days.
It has a few more options which might be useful.
|
In reply to this post by parisec
INCLUDE is still available, but it is obsolete.
Use the INSERT command instead. It supports interactive format
syntax as well as batch format. INCLUDE only supports batch format.
INSERT also has some options that you might find useful.
I try to minimize hard coded location references everywhere. Besides file handles, if you embed your syntax files in a Python wrapper, you can use the Python search path to avoid location references. This is particularly useful if you have a library of functions that you might use in multiple projects.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
|
Dear Jon
Would you mind giving an example of how to use the Python search path in a wrapper?
Thanks
Garry
|
>Dear Jon
>
>Would you mind giving an example of how to use the Python search path in a wrapper?
>
I thought Jon was referring to the PYTHONPATH environment variable: http://docs.python.org/2/using/cmdline.html#envvar-PYTHONPATH
|
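As a rough illustration of the PYTHONPATH mechanism mentioned above, the sketch below starts a fresh Python interpreter with PYTHONPATH set and checks that the extra folder shows up in that interpreter's `sys.path`. The temporary folder stands in for a shared library location and the helper `child_sys_path` is hypothetical. The sketch assumes Python 3 run from an ordinary command line; whether the Python used by Statistics reads this variable in the same way is not confirmed in the thread.

```python
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a new interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ)
    env["PYTHONPATH"] = extra_dir
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
    )
    return output.decode().splitlines()


# Stand-in for a folder holding shared modules (hypothetical location).
library_dir = tempfile.mkdtemp()

# Compare resolved paths so symlinked temp folders still match;
# empty entries (the current directory) are skipped.
resolved = [os.path.realpath(p) for p in child_sys_path(library_dir) if p]
found = os.path.realpath(library_dir) in resolved
print(found)
```

Because the variable is read when the interpreter starts, it has to be set before the program that hosts Python is launched.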
In reply to this post by Garry Gelade
The Python import mechanism, which is used
to load Python modules, does not use explicit file paths. Rather,
it uses a search strategy something like the DOS path to find the modules
to import. By default, it will look in the site-packages subdirectory
of your Python installation, in the extensions subdirectory of your Statistics
installation and some other places. You can add additional locations
automatically by creating a sitecustomize.py file in the site-packages
directory. This is what mine looks like.

import sys
sys.path.append("c:/extcommon")
sys.path.append("c:/python2764/lib/site-packages/spssaux")
sys.path.append("c:/python2764/lib/site-packages/misc")
sys.path.append("c:/extcommon18")
sys.path.append("c:/python2764/lib/site-packages/temp")
sys.path.append("c:/python2764/lib/site-packages/examples")

Here is a trivial file named runthis.py saved anywhere on this search path.

import spss
spss.Submit("show all")

Then in Statistics I run

begin program.
import runthis
end program.

That executes the contents of this file. This mechanism is particularly useful when you have a library of macro definitions or utility code that you want to share across projects, but it will work for any set of Statistics code. The file could in turn have INSERT commands in it, which do require a location, but the users of that syntax would not need to specify where to find it.
You could go one step further and create a startup script that will be invoked automatically whenever you launch Statistics, and it could load your standard utility library using this mechanism without any explicit action. Details on startup scripts are in the main help under scripting.
HTH,
Jon
|
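The search-path idea in the post above can be tried outside Statistics with a minimal sketch like the one below. It is not the setup from the post: `runthis_demo` is a hypothetical stand-in for runthis.py that only stores the syntax text instead of calling `spss.Submit`, the folder is a temporary directory rather than a real library location, and the code assumes Python 3 (`importlib.invalidate_caches` is not available in Python 2).

```python
import importlib
import os
import sys
import tempfile

# Stand-in for a shared library folder such as the ones listed in sitecustomize.py.
library_dir = tempfile.mkdtemp()

# Write a tiny module there. Inside Statistics the module would instead
# import spss and pass this text to spss.Submit.
module_path = os.path.join(library_dir, "runthis_demo.py")
with open(module_path, "w") as handle:
    handle.write("SYNTAX = 'show all'\n")

# This is the step a sitecustomize.py line performs at interpreter startup.
sys.path.append(library_dir)
importlib.invalidate_caches()  # make sure the freshly written file is noticed

# The import names the module only; no folder appears at the point of use.
runthis_demo = importlib.import_module("runthis_demo")
print(runthis_demo.SYNTAX)
```

The point of the design is that the location is written once, where the path is extended, and every project that imports the module stays free of hard-coded folders.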
Hi Jon,
What are the advantages of using sitecustomize.py as compared with %PYTHONPATH%? I don't have write access to the site-packages dir, but I can define user env vars.
Regards,
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~