Syntax file organization

Syntax file organization

parisec
 
Hi all,
 
I have a general question about the organization of syntax files.
 
Since many of you are programmers, I suspect there is some gold standard out there on how to organize your files so that, should you win the lottery, someone can come in and replicate what you have done and pick up where you left off.
 
I have this really bad habit of starting an analysis and then, six months later, ending up with an unwieldy syntax file with hundreds of lines. So my New Year's resolution was to better organize these messes. I'm decent about documenting the contents within a file, and I have started breaking up the files for an analysis into data importing and labeling, computing new variables, and analyses.
 
How do you break these files up or do you? Is there a good reference for this?
 
Thanks
Carol
 
 
 
 
 
Re: Syntax file organization

kwame woei
I use a lot of syntax files. My rule of thumb is: one syntax for one job. 

For example, say you have a project which needs data from 5 different sources: 
- 1 master syntax with references to
   - syntax which gets the data
   - syntax which edits the data into the file
   - syntax which gets you the results from the data

Try to keep your syntax files about 200 - 300 lines. If they are longer: cut them into smaller pieces. Use the FILE HANDLES and the INSERT FILE commands. And, most important of all: make notes. The first lines of your syntax should state what it does, line by line. 
* This syntax file edits the data:
1) it gets the data from 4 files: file A, file B, file C and file D
2) it matches files A and B on variable AGE
3) et cetera.

Next, make notes in your syntax: 
* Match files A and B to get result Z. 
MATCH FILES /file = a /table = b /by var1.

So:
- use a master syntax
- one syntax for one job
- limit your syntax size to 300 lines
- make notes in the syntax
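
A minimal sketch of the "master syntax" idea, written in Python only so that it can be run anywhere. It builds the text of a master file that pulls in one syntax file per job with INSERT. The project folder and the three file names are hypothetical, and the sketch assumes forward-slash paths, which Statistics accepts on Windows.

```python
def build_master_syntax(project_dir, step_files):
    """Return the text of a master syntax file that INSERTs each step in order."""
    base = project_dir.rstrip("/")
    lines = ["* Master syntax: runs each step of the project in order."]
    for name in step_files:
        # One INSERT command per job-specific syntax file.
        lines.append("INSERT FILE='%s/%s'." % (base, name))
    return "\n".join(lines) + "\n"


# Hypothetical file names, one syntax file per job.
steps = ["01_get_data.sps", "02_edit_data.sps", "03_results.sps"]
master_text = build_master_syntax("C:/projects/demo", steps)
print(master_text)
```

Numbering the step files keeps the run order visible in the folder listing as well as in the master file.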






Re: Syntax file organization

Joseph A Youngblood
In reply to this post by parisec

Hi Carol,

 

You have touched on a cornerstone concept in all research. We all prefer running analyses to tweaking the finer points of a question, for example, but I have found that nothing is more important than creating a comprehensive code book and developing the discipline of keeping it current. Doing so also helps the next wave of researchers. Syntax, output files, etc. are all part of the code book, including any RECODE commands. I have found it helpful to keep everything in a single Excel file, with tabs representing different phases or processes of the project. In the “notes” section I will write out (type) a particular decision or issue, and then attach any syntax files that correspond to executing the decision. Actually, I copy to Excel and keep the original in SPSS. I hope this helps.

 

Regards,

 

Joseph A. Youngblood

Director of Research

SSI

 

Re: Syntax file organization

David Marso
Administrator
In reply to this post by kwame woei
"Try to keep your syntax files about 200 - 300 lines. If they are longer: cut them into smaller pieces. Use the FILE HANDLES and the INSERT FILE commands."

In general this is a great practice; however, having a lot of files around can be tiresome in some cases.
For example: I am currently writing a large matrix macro of approximately 1300 lines. When finished it will likely be closer to 2000 (about 200-300 of these are detailed comments).
It is logically segmented into approximately 25 smaller macros, called from a central controlling macro.
I keep my sanity and my hair intact by commenting ABOVE the DEFINE-!ENDDEFINE blocks so the comments appear in the outline window to the left.

So far, it has been easy peasy to keep track. The reason I don't segment this into several files is that it will be distributed to an unknown number of users downstream, and I don't want to deal with knowing anything about their file systems or force the end user to define FILE HANDLES etc.

They simply INCLUDE my single set of macros and call a single macro name with appropriate parameters.

In general, when building production code for myself or a small group, I will typically segment into:
RawDataAccess --> IntermediateData
DataTransformations --> PristineCleanData
Analytics
PostProcessing/Scripting
FinalReportGeneration
---



Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Syntax file organization

parisec
Thank you all for sharing your suggestions. In addition to giving me some ideas about how you handle these issues, you reminded me of the INCLUDE command. I used it before the PC days, when it wasn't as easy to just open up a file and push RUN, but I don't think I've used it since!


As Joseph stated, I now need to have the discipline to put these practices in place.

Carol

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Syntax file organization

David Marso
Administrator
INSERT rather than INCLUDE seems to be the command these days.  
It has a few more options which might be useful.

Re: Syntax file organization

Jon K Peck
In reply to this post by parisec
INCLUDE is still available, but it is obsolete.  Use the INSERT command instead.  It supports interactive format syntax as well as batch format.  INCLUDE only supports batch format.  INSERT also has some options that you might find useful.

I try to minimize hard-coded location references everywhere. Besides file handles, if you embed your syntax files in a Python wrapper, you can use the Python search path to avoid location references. This is particularly useful if you have a library of functions that you might use in multiple projects.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Syntax file organization

Garry Gelade

Dear Jon

 

Would you mind giving an example of how to use the Python search path in a wrapper?

 

Thanks

 

Garry

 


Re: Syntax file organization

Albert-Jan Roskam
>Dear Jon
>
>Would you mind giving an example of how to use the Python search path in a wrapper?
>

I thought Jon was referring to the PYTHONPATH environment variable:
http://docs.python.org/2/using/cmdline.html#envvar-PYTHONPATH
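
A small sketch of what PYTHONPATH does, assuming nothing about Statistics itself. It starts a fresh Python interpreter with PYTHONPATH pointing at a temporary folder (a stand-in for a shared library folder) and checks that the folder shows up in that interpreter's sys.path. The function name is made up for the illustration.

```python
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a fresh interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ)
    env["PYTHONPATH"] = extra_dir
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
    )
    return out.decode().splitlines()


def same_dir(a, b):
    """Compare two directory names, ignoring case and symlink differences."""
    return os.path.normcase(os.path.realpath(a)) == os.path.normcase(os.path.realpath(b))


library_dir = tempfile.mkdtemp()  # stand-in for a shared library folder
found = any(same_dir(p, library_dir) for p in child_sys_path(library_dir))
print(found)
```

Any module saved in a folder listed in PYTHONPATH can then be imported by name, without a path.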


Re: Syntax file organization

Jon K Peck
In reply to this post by Garry Gelade
The Python import mechanism, which is used to load Python modules, does not use explicit file paths.  Rather, it uses a search strategy something like the DOS path to find the modules to import.  By default, it will look in the site-packages subdirectory of your Python installation, in the extensions subdirectory of your Statistics installation and some other places.  You can add additional locations automatically by creating a sitecustomize.py file in the site-packages directory.  This is what mine looks like.

# sitecustomize.py: imported automatically when the interpreter starts.
# Each append adds one more directory to the module search path.
import sys
sys.path.append("c:/extcommon")
sys.path.append("c:/python2764/lib/site-packages/spssaux")
sys.path.append("c:/python2764/lib/site-packages/misc")
sys.path.append("c:/extcommon18")
sys.path.append("c:/python2764/lib/site-packages/temp")
sys.path.append("c:/python2764/lib/site-packages/examples")

Here is a trivial file named runthis.py saved anywhere on this search path.

import spss
spss.Submit("show all")

Then in Statistics I run
begin program.
import runthis
end program.

That executes the contents of this file.

This mechanism is particularly useful when you have a library of macro definitions or utility code that you want to share across projects, but it will work for any set of Statistics code.  The file could in turn have INSERT commands in it, which do require a location, but the users of that syntax would not need to specify where to find it.
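
One way the INSERT case could look, as a rough sketch rather than anything from the Statistics documentation: a module on the search path builds the INSERT command itself, so callers never type a location. The names insert_command, run_utility and macros.sps are hypothetical. The spss module is imported only inside run_utility, because it exists only where the Statistics Python plug-in is installed.

```python
import os


def insert_command(syntax_name, base_dir):
    """Build an INSERT command for a syntax file stored in base_dir."""
    path = os.path.join(base_dir, syntax_name).replace("\\", "/")
    return "INSERT FILE='%s'." % path


def run_utility(syntax_name, base_dir):
    """Submit the INSERT command; this only works inside a Statistics session."""
    import spss  # provided by the Statistics Python plug-in
    spss.Submit(insert_command(syntax_name, base_dir))


# In a real module, base_dir would typically be the module's own folder,
# e.g. os.path.dirname(os.path.abspath(__file__)).
print(insert_command("macros.sps", "c:/extcommon"))
```

The module is the only place that knows where the syntax file lives, so moving the library means changing one folder on the search path.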

You could go one step further and create a startup script that will be invoked automatically whenever you launch Statistics, and it could load your standard utility library using this mechanism without any explicit action.  Details on startup scripts are in the main help under scripting.

HTH,
Jon



Re: Syntax file organization

Albert-Jan Roskam
Hi Jon,
 
What are the advantages of using sitecustomize.py as compared with %PYTHONPATH%? I don't have write access to the site-packages dir, but I can define user env vars.
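
For anyone in the same situation, a quick sketch for checking which of the two routes is open on a given machine. It reports where site-packages is, whether the current user can write to it, and whether PYTHONPATH is set. It assumes it is run with the same Python that Statistics uses; a different interpreter may report different locations. The function name is made up.

```python
import os
import sysconfig


def search_path_options():
    """Report which ways of extending the module search path look usable here."""
    site_packages = sysconfig.get_paths()["purelib"]
    return {
        "site_packages": site_packages,
        # False means a sitecustomize.py cannot be created there by this user.
        "site_packages_writable": os.access(site_packages, os.W_OK),
        "pythonpath_set": bool(os.environ.get("PYTHONPATH")),
    }


for key, value in sorted(search_path_options().items()):
    print("%s: %s" % (key, value))
```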
 
Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 