Occasional failure in execution of production jobs run from batch (and workarounds)


Simon Phillip Freidin

Over the last 7 years I’ve found that in a sequence of 300+ production jobs run from bat files (Windows), occasionally and at random a production job will appear to run normally, yet the SPSS program it should have called has not actually run. The 300+ programs are the data management system for a 14+ year longitudinal population survey, and any program skipped in the sequence leads to faulty data.

 

IBM has not been able to replicate the symptom ("production job starts normally, ends normally, but *.sps hadn’t actually run") with the test suite I supplied, which leads them to believe this behaviour is peculiar to my computing environment. I suspect this problem only affects very complicated SPSS batch systems and will only be detected if the system is run repeatedly (e.g. in iterative production calls). I’ll document the symptoms and workarounds here in case they can help someone else.

 

What I’ve observed is that SPSS copies my source *.sps program to the temp directory (%tmp%\spssDDDD\prognameDDDDDDDDDDDDDD.sps, where D is a digit). When SPSS attempts to read this temp copy, very occasionally (1 in 1000 production jobs) I see one of the following in the output:

1. The output contains an explicit SPSS error (# 1216) which says the temporary *.sps file could not be opened; or

2. The output is (almost) empty indicating the temporary copy of the *.sps program was also empty.

 

1. Error # 1216:

>INSERT  FILE='C:\Users\username\AppData\Local\Temp\spss5760\03-6926761992762325371.sps' SYNTAX=BATCH  ERROR=CONTINUE.

>“Error # 1216.  Command name: INSERT. A file cannot be opened.  Probable causes are an attempt to open a read-only file for output,

>a directory which is too full, an invalid file specification, the specification of a non-existent disk, etc.  Use an asterisk (*) on the OUTFILE

>command to specify the active dataset or specify an external file name or file handle. Execution of this command stops. The error

>involves file C:\Users\username\AppData\Local\Temp\spss5760\03-6926761992762325371.sps”

 

My workaround for symptom 1 was to modify the batch programs to check the (utf-8, text) job output for error number 1216 (using find) and, if it is found, jump back to a preceding label so that the production job is resubmitted. For example:

 

:03-label

rem Run the production job
stats.exe  "C:\Users\username\My Documents\m\prog\03-.spj"   -production

rem If the job output reports error 1216, go back and resubmit the job
find "Error # 1216" "c:\temp\m03.txt" && ( goto 03-label )
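
The batch loop above resubmits indefinitely if error 1216 keeps recurring. As a minimal sketch of the same check with a cap on the number of attempts, here is a Python version. The function name `run_with_retry`, the `run_job` callable and the `max_attempts` limit are illustrative assumptions, not part of the original batch system; in practice `run_job` might wrap a `subprocess` call to the stats.exe command shown above.

```python
from pathlib import Path
from typing import Callable

ERROR_MARKER = "Error # 1216"  # the text the batch workaround searches for


def run_with_retry(run_job: Callable[[], None], output_path: Path,
                   max_attempts: int = 5) -> int:
    """Run the job until its output no longer contains ERROR_MARKER.

    Returns the number of attempts used. Raises RuntimeError if the
    marker is still present after max_attempts runs (hypothetical cap).
    """
    for attempt in range(1, max_attempts + 1):
        run_job()  # assumed to (re)write output_path on every run
        text = output_path.read_text(encoding="utf-8", errors="replace")
        if ERROR_MARKER not in text:
            return attempt
    raise RuntimeError(
        f"{ERROR_MARKER!r} still present after {max_attempts} attempts"
    )
```

Passing the job as a callable keeps the retry logic separate from how the production job is actually launched, so the check can be exercised without SPSS installed.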

 

2. The production job generates an almost empty output file.

 

The output contains the insert call of the temporary copy, but not the output the program would generate.

 

>INSERT FILE='C:\Users\username\AppData\Local\Temp\spss5592\03-1963052610352749677.sps' SYNTAX=BATCH ERROR=CONTINUE.

>  26  0 

>  27  0  * End of INSERT and INCLUDE nesting level 01. /* last line of log here, but normally generates 500+ lines */
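
Symptom 2 produces no error text to search for, so one possible way to detect it is to count the lines in the job output. The following is only a sketch: the function name `looks_truncated` and the threshold of 50 non-blank lines are assumptions chosen for illustration (the log above normally runs to 500+ lines), and a real threshold would need to be set per program.

```python
def looks_truncated(log_text: str, min_lines: int = 50) -> bool:
    """Return True if the job log has fewer non-blank lines than min_lines.

    min_lines is a hypothetical threshold; pick a value well below the
    normal log length of the program being checked.
    """
    non_blank = [line for line in log_text.splitlines() if line.strip()]
    return len(non_blank) < min_lines
```

A log consisting only of the INSERT line and the "End of INSERT" line would be flagged, while a normal 500+ line log would not.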

 

I’m guessing that the temporary file has been created but is empty. My workaround for symptom 2 was to change all the multi-line SPSS programs that the production jobs call to one-line INSERT programs where the INSERT command points to my actual program. These one-line INSERT programs don’t show the symptom 2 behaviour.
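
With 300+ programs, the one-line wrappers could be generated rather than written by hand. The sketch below is an assumption about how that might look: the function name `make_insert_wrapper` and the `_wrapper` file suffix are hypothetical, and the INSERT keywords are copied from the log lines shown earlier rather than from the actual wrapper programs, which are not shown in this post.

```python
from pathlib import Path


def make_insert_wrapper(program: Path, wrapper_dir: Path) -> Path:
    """Write a one-line .sps file whose only command INSERTs `program`.

    Assumes the path contains no apostrophes, since it is placed inside
    a single-quoted SPSS string.
    """
    wrapper = wrapper_dir / f"{program.stem}_wrapper.sps"
    line = f"INSERT FILE='{program}' SYNTAX=BATCH ERROR=CONTINUE.\n"
    wrapper.write_text(line, encoding="utf-8")
    return wrapper
```

The production job would then point at the generated wrapper instead of the multi-line program.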

 

Patching these 2 problems made the whole production job sequence return to running flawlessly.

 

Regards

Simon

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Occasional failure in execution of production jobs run from batch (and workarounds)

Art Kendall
Some SWAGs.
Are you sure that you empty the temp directory before using it?

For debugging, do you check disk space availability?

Is temp RAM or disk?

Do you close datasets that you are finished with?

Are other users using the temp device?
Art Kendall
Social Research Consultants

Re: Occasional failure in execution of production jobs run from batch (and workarounds)

Simon Phillip Freidin
In reply to this post by Simon Phillip Freidin

Hi Art

 

Interesting guesses.

 

I run the following bat script at the start of the run to clean the temp directory:

 

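rem Delete all files under the temp directory, including subdirectories
del /f /q /s "%temp%\*"

rem Remove working directories left over from earlier runs
rmdir /s /q "%temp%\.com_ibm_tools_attach"

for /d %%d in (%temp%\spss*.) do rmdir /s /q "%%d"

for /d %%d in (%temp%\jaws*.) do rmdir /s /q "%%d"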
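
For comparison, the directory-removal part of this cleanup can be sketched in Python. The function name `remove_temp_dirs` and its parameters are illustrative assumptions. Unlike the `spss*.` pattern in the batch script, which matches only names without an extension, this sketch removes any directory whose name starts with one of the prefixes, and it does not delete loose files.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def remove_temp_dirs(temp_root: Path,
                     prefixes: Tuple[str, ...] = ("spss", "jaws")) -> List[str]:
    """Remove directories directly under temp_root whose names start
    with one of `prefixes` (case-insensitive). Returns the removed names.
    """
    removed = []
    for entry in temp_root.iterdir():
        if entry.is_dir() and entry.name.lower().startswith(prefixes):
            # ignore_errors mirrors the quiet behaviour of rmdir /s /q
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return sorted(removed)
```

Returning the list of removed names gives a simple record of what was left behind by the previous run.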

 

There is plenty (300 GB) of free space on the HDD.

 

Temp is on disk. I might try adding a RAM disk sometime; is there any recommended software?

 

There is a "dataset close all" at the end of each program.

 

No other users of temp.

 

Regards

Simon

 

 

 

From: Simon Freidin
Sent: Thursday, 28 August 2014 4:14 PM
To: Simon Freidin
Subject: Re: Occasional failure in execution of production jobs run from batch (and workarounds)

 



Re: Occasional failure in execution of production jobs run from batch (and workarounds)

Art Kendall
You have a workaround, but if you feel a need to find out "why", it might help in debugging your process to use a HOST command to run a DOS dir before calling the file that has the problem.

If it becomes critical, you might hire someone like David Marso to help you debug your process.
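
The directory-listing idea could also be done outside SPSS. As a rough sketch, the hypothetical helper below records every file under a directory together with its size, which would make a zero-byte temporary *.sps copy easy to spot; the function name `snapshot` is an assumption for illustration.

```python
from pathlib import Path
from typing import List, Tuple


def snapshot(directory: Path) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under
    `directory`, sorted by relative path.
    """
    return sorted(
        (str(p.relative_to(directory)), p.stat().st_size)
        for p in directory.rglob("*")
        if p.is_file()
    )
```

Writing the snapshot to a log before and after each production job would show whether the temporary copy existed and whether it was empty.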
Art Kendall
Social Research Consultants