For people who write code to achieve their tasks, I was wondering if any of you document your work, and if so how you do it. Have you found anything that works for you?
You could argue that SPSS syntax is relatively clear, and comments can be included to help out. That’s all true, but I’ll bet I’m not the only one who gets into trouble for lack of documentation. It’s not so much that I don’t document stuff. I do, and lots of it. But when I try to refresh my memory of how something works after 12 months or more, I find my comments are rather more cryptic than I’d like. They were probably crystal clear when I wrote them, but after a while they quite often lack certain essential details. English can be so ambiguous too.
When I was a student some lecturers advocated pseudocode, but I never liked it. Another academic pushed Nassi–Shneiderman diagrams, which I loved, and I think they were quite well suited to Pascal, which I was studying at the time. I think they may be a bit over the top for SPSS.
Some of the work I do involves running a sequence of command files that access different raw data sets. Each command file does some processing along the way, but it all becomes quite a complex arrangement. To make sense of it I created a one-page flowchart that shows input data, processes (i.e. the different command files I use), system files created, and links showing how the output from one process relates to the input of another. I kept things to a minimum so that the flowchart is mostly a visual map of the process.
The thing is, while this kind of works, I’m not super happy with it, and I wonder whether there might be something better that other people are using. How do you document your work?
Ron
Do you want to do this just for yourself, or for other people trying to make sense of your syntax? I try to keep syntax files in separate folders for each data set, and sometimes, with very old files, I also have problems remembering what I was trying to do. As well as comments, I tend to use TITLE and SUBTITLE. For final saved files I have occasionally added a DOCUMENT, but I lack the discipline to do this regularly.
I don't run to flow charts, but all my tutorials and commentaries use syntax (with some examples and exercises repeated using the GUI). Most have explanations of the syntax used, usually in the context of a specific data management or analysis task, albeit at a fairly elementary level (see http://surveyresearch.weebly.com/summary-guide-to-spss-tutorials.html).
http://surveyresearch.weebly.com/teaching-with-survey-data.html has links to various examples, including detailed work-throughs listed on http://surveyresearch.weebly.com/ons-national-well-being.html. More examples are listed on http://surveyresearch.weebly.com/british-social-attitudes-1983-to-2014.html.
http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/sn_68_commentary_relative_deprivation_and_social_justice.pdf is a blow-by-blow account of my restoration in 2014, from the raw data of a 1962 survey, of someone else's SPSS saved file, last used for teaching in 1975. Remember 80-column (multi-punched) cards and Fortran format statements?
These may not be what you are looking for. Would it help if documentation were produced in two columns, with blocks of syntax in column 1 accompanied by a detailed commentary in column 2?
John F Hall (Mr) [Retired academic survey researcher]
Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/1-survey-analysis-workshop
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
Administrator
In reply to this post by Ron0z
I document the hell out of everything.
Everything is run from a small central file, which typically has an enumerated outline of the basic process at the very top, followed by:
INSERT FILE <file containing ALL macros for project>.
Macro calls, each with a description of its input, output and basic operations.
Within the macro file, every macro is documented with a header describing required inputs and resulting outputs, side effects on the active file, required input files/datasets and resulting output files/datasets. Within each macro, the basic steps are documented in finer detail if the syntax is insufficient to convey the specifics.
Most of my work is confidential (client to client), so I will refrain from posting anything specific from those projects. However, I did extensive work with the late Professor Will Shadish in recent years, and that work is publicly available here. Note there is an additional user manual in the zip file to supplement the documentation in the actual code. Enjoy: http://faculty.ucmerced.edu/sites/default/files/wshadish/files/dhps_030715.zip
I hope this is helpful and will encourage others to be more conscientious about commenting/documenting their work. I have in the past (and likely will in the future) assisted in mop-up projects where there were several thousand lines of dysfunctional code with nary a line of comment. The client was not initially happy that I took a full day to review the existing code and comment where possible before completely butchering the beast and rewriting the whole damned thing. OTOH, the client was absolutely delighted with the final result and referred others to me after it was sliced/diced and refactored with documentation. Need to break eggs to craft a proper omelet. Better than breaking heads (luckily the hapless incompetent perpetrators are always gone before I get the call).
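A minimal sketch of that layout, not taken from any of the projects mentioned above. The folder, the file names, the variable region and the macro SaveSubset are all hypothetical, chosen only to show where the outline, the macro header and the documented calls would sit.

```spss
* ==== File 1 of 2: macros.sps (hypothetical macro library) ====.

* The leading ! is left off the macro name in comments, as a precaution
  against the name being expanded inside a comment.
* Macro:        SaveSubset.
* Purpose:      write one region to its own system file.
* Requires:     an active dataset with a numeric variable named region.
* Arguments:    reg  = region code to keep.
*               outf = quoted path of the file to write.
* Output:       the .sav file named in outf.
* Side effects: none on the active dataset, because TEMPORARY is used.
DEFINE !SaveSubset (reg = !TOKENS(1) / outf = !TOKENS(1))
TEMPORARY.
SELECT IF (region = !reg).
SAVE OUTFILE = !outf.
!ENDDEFINE.

* ==== File 2 of 2: master.sps (the small central file) ====.

* Outline of the process:
*   1. Load the macro library.
*   2. Open the cleaned survey file.
*   3. Write one system file per region.

* Step 1.
INSERT FILE="C:/projects/demo/syntax/macros.sps" ERROR=STOP.

* Step 2. Input: survey_clean.sav.
GET FILE="C:/projects/demo/data/survey_clean.sav".

* Step 3. Output: region1.sav and region2.sav.
!SaveSubset reg = 1 outf = "C:/projects/demo/data/region1.sav".
!SaveSubset reg = 2 outf = "C:/projects/demo/data/region2.sav".
```

Keeping the header outside the DEFINE block means the documentation is never part of the expanded macro body, and the master file stays short enough to read as a table of contents.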
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
In reply to this post by Ron0z
Hi Ron. I think this is a great question because I'm just starting to plan how to document some of my more intricate (and high-stakes) tasks. One thing I have done (to document what I am doing while I'm planning) is to use COMMENT in my syntax very, very liberally. It helps break up longer (>500 lines) syntax files so that I can find what I'm looking for as well as document, in plain language, the equations I'm programming/calculating.
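As an illustration of that habit, here is a small sketch with banner comments to mark a section and a plain-language statement of the formula next to the COMPUTE that implements it. The variables n_prof, n_tested and pct_prof are made up for the example and assume an open dataset that contains the first two.

```spss
COMMENT ==========================================================.
COMMENT  SECTION 3: PERCENT PROFICIENT (hypothetical variables).
COMMENT ==========================================================.

* In plain language: percent proficient is the number of students
  scoring proficient or above, divided by the number of students
  tested, times 100.
COMPUTE pct_prof = 100 * n_prof / n_tested.
VARIABLE LABELS pct_prof "100 * n_prof / n_tested".
EXECUTE.
```

Searching a long file for "SECTION" then jumps from banner to banner, and the formula in the variable label travels with the saved data file.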
Kirsten L. Rewey, Ph.D.
Quantitative Analyst
Minnesota Department of Education
1500 Highway 36 West
Roseville, MN 55113
651.582.8638 | [hidden email]
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Ron0z
Sent: Tuesday, January 10, 2017 6:46 PM
To: [hidden email]
Subject: How do you document your work?
For people who write code to achieve their tasks, I was wondering if any of you document your work, and if so how you do it. Have you found anything that works for you? ...
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/How-do-you-document-your-work-tp5733664.html
In reply to this post by Ron0z
As others have said, extensive commenting is the key to documentation. I would point out a number of other tools that can help. Also, think not just about documentation but about reproducible results.
For data files, document the source and other properties using Utilities > Data File Comments (ADD DOCUMENT).
For variables, use custom variable attributes to enrich the metadata (VARIABLE ATTRIBUTE). These can be displayed in the Data Editor Variable View.
For COMPUTE commands, note the dialog box option to use the formula as the variable label (which of course you can also do with syntax). Avoid circular overwriting of variables, which makes reconstructing a variable impossible.
For output, use the TEXT extension command to describe things of note with the output.
Start each session with a SHOW command that indicates any settings that might vary and might affect the results, e.g., Unicode vs code page mode, locale, and other SET choices.
Organize data and syntax into project folders, possibly with a master project holding your general tools. Use the STATS OPEN PROJECT extension command to manage and use these projects in a standard way.
Of course, break large syntax files into modular chunks.
Number or date your syntax versions and archive each main version permanently so that you can go back to earlier versions when needed. Maybe do this with output, too. And, of course, back all this up.
Make note of exactly what versions of Statistics and any extension commands were used. You might want to include the journal file in the archive.
HTH,
Jon
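A short sketch of three of these tools used together: SHOW at the start of a session, ADD DOCUMENT for the data file, and VARIABLE ATTRIBUTE for a variable. The variable income, the attribute names Source and Units, and the document text are invented for illustration, and the block assumes a dataset containing income is already open.

```spss
* Record the settings that could affect results.
SHOW VERSION UNICODE LOCALE.

* Hypothetical provenance notes stored inside the data file.
ADD DOCUMENT
  "Source: client extract received 2017-01-05."
  "Cleaned with clean_v03.sps.".
DISPLAY DOCUMENTS.

* Hypothetical custom attributes for one variable.
VARIABLE ATTRIBUTE VARIABLES=income
  ATTRIBUTE=Source("Questionnaire item Q12") Units("AUD per year").
DISPLAY ATTRIBUTES.
```

Because the document lines and attributes are saved with the .sav file, they stay with the data even when the syntax that produced it is filed somewhere else.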
In reply to this post by Ron0z
Describing what you've done is one thing. Sometimes the problem is What You Have Done.
I'm thinking of examples from computer programming: "spaghetti code" is probably hard to follow and
make sense of, no matter how many Comments. Writing with modules and subroutines gives code that is
easier to document and read; also, it is more likely to be correct when you only have to proof-read in one place
instead of 10 times in parallel. (And for "interpreted" languages like SPSS, code with subroutines runs faster.)
Work systematically. I typically had multiple forms for multiple periods. Editing was done for each form in its own file, organized "long" (not "wide"), so that an edit or special scoring was done in just one place. When there were a few forms that "worked together", they could be Match-Joined in long form for further scoring. The "wide" version thus had minimum content of distinct items that should later be ignored. This is a version of "working with modules" in order to avoid replicating code in separate places.
Successive editions of the syntax files and data files were numbered in their names. When someone wanted a new variable, I might need to back up and re-run several different syntax files, but it was no problem to know which ones.
- A datafile for further analysis might /not/ have a new name if it simply added a variable, since it could still be used to replicate previous analyses. It /would/ have a new name if a previous variable had a change that was not just a correction.
Document data corrections, too, when they are relevant to output that anyone might have seen. You do not want analyses with unexplained discrepancies.
- An implication of this documentation, for big projects, is that corrections to the files actually being analyzed (which might not include all data that has been collected) should be added in sets rather than ad hoc. Then you can say, for instance, "These are the data collected as of September 30 with the corrections as of October 14."
Hope this helps.
--
Rich Ulrich
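One way the numbered-edition habit could look in syntax, as a sketch only. The folder, the file names survey_v03.sav and survey_v04.sav, and the wording of the note are hypothetical; the point is that a changed edition gets a new number and carries a note saying which set of corrections it contains.

```spss
* Hypothetical project folder; FILE HANDLE keeps the path in one place.
FILE HANDLE projdir /NAME="C:/projects/demo".

* Start from the previous numbered edition.
GET FILE="projdir/data/survey_v03.sav".

* The edits or scoring for this edition would go here.

* Note which batch of data and corrections this edition holds.
ADD DOCUMENT
  "v04: data collected as of September 30, corrections as of October 14.".

* Save under the next number; v03 is left untouched for replication.
SAVE OUTFILE="projdir/data/survey_v04.sav".
```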
Administrator
"Successive editions of the syntax files and data files were numbered in their names -- "
YES! In a similar vein, I always append the current date to all non-final syntax and data files.
In reply to this post by Rich Ulrich
More things that help with readable syntax:
(1) Borrow from what in accounting is called referencing. As you go along, have a team member read what you have written in syntax and evaluate whether it is clear about what you say you intend to do, and whether the syntax does what you say you intend.
(2) Be sure the Variable View is complete and that other team members (and/or clients) find your names and labels readable.
(3) Reserve system missing as a debugging tool for the many instances where the system is unable to do what you say. Clean up system files to convert system-missing values to user-missing after you figure out why your commands need to be re-drafted.
(4) Use as many user-missing values as the situation calls for. LABEL them.
(5) Use auto-indent.
(6) Use APPLY DICTIONARY when possible.
(7) Develop habits such as ALL CAPS for procedures, lower case for options, Initial Caps for variables, and CamelCase for variables if needed.
(8) I have been after SPSS for some time to add features that enhance readability and documenting. For example, a process that warns when you try to process a variable that does not have labels, or a process that sees whether you have varied the casing for a variable, asks which version you want, and cleans up all instances of that variable in the syntax. These are tools that were available on DEC FORTRAN in the early 70's. See the archives of this list for "pretty" and "cref" or "Cross Reference".
(9) Use the options for output listings that include variable names, labels, and value labels.
(10) Remember that writing of all kinds is improved by drafting and re-drafting.
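Points (3), (4) and (9) can be sketched in a few lines. The variable satisf and the codes 97, 98 and 99 are hypothetical, and the block assumes an open dataset in which satisf is numeric and those codes are not used for real answers.

```spss
* After debugging, move leftover system-missing values to a user-missing code.
RECODE satisf (SYSMIS = 99).

* Declare and label every user-missing code.
MISSING VALUES satisf (97, 98, 99).
ADD VALUE LABELS satisf
  97 "Refused"
  98 "Don't know"
  99 "Unexplained blank, set after checking".

* Show names with labels and values with labels in pivot tables.
SET TVARS=BOTH TNUMBERS=BOTH.
FREQUENCIES VARIABLES=satisf.
```

ADD VALUE LABELS is used here rather than VALUE LABELS so that any labels already defined for the substantive codes are kept.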
Art Kendall
Social Research Consultants