trouble reading large syntax file

Merlin Marshall
Hi all,

I have a data file with 30,000 variables and 12,000 observations.  I am using SPSS24 and Windows7.
Variables are a mix of character, numeric, date, time.

I have a very large .dat file and the syntax to turn this into a .sav file.  The syntax runs on a smaller data set.

When I try to run a syntax file to create a .sav file of all the variables, SPSS becomes very slow and/or stops working.

Is there a way to improve performance with this file?  Currently it appears to take more than 6 GB of RAM for the syntax to try to run.

Yes, I know it is stupid to want a data file with that many variables, but that is what some of our clients want.

Thank you,

Merlin Marshall
Center for Human Resource Research
The Ohio State University
Columbus Ohio

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: trouble reading large syntax file

Jon Peck
You will need a lot of available memory for that.  In fact, if you have string variables with length more than 8 bytes, they count as extra variables and take extra dictionary slots.

If you are running out of memory, you might try creating two sav files each with half the variables and then doing MATCH FILES on the two.

Another trick that might help a lot would be using external mode to run the syntax, because that eliminates all the memory and CPU overhead of the user interface and, in particular, the Data Editor, which needs a lot of updating when there are so many variables.  I have on occasion seen code run an order of magnitude faster or more in external mode for certain tasks.

To use external mode, you need to use a tiny amount of Python code.  You could do it like this.
Open a command prompt (DOS) window and cd to the Python directory, which is under the Statistics installation directory.
Start Python there and type this code

import spss
spss.Submit(r"""INSERT FILE="filespec".""")

where filespec is the path to a syntax file to execute.  Note the r before the quote and the use of three " surrounding the command.
Ctrl-Z followed by Enter will exit the session.

If it produces a lot of output that you don't want, you can write
spss.SetOutput("off")
before the Submit line.
On the other hand, if you want to capture the output, you would wrap the syntax in OMS and OMSEND commands to get a Viewer file or just plain text (less overhead).
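
The interactive steps above could also be kept in a small script. This is only a sketch: it assumes the script is run with the Python that is installed with Statistics (otherwise `import spss` will fail), and the helper names `build_insert_command` and `run_syntax_file` are made up for illustration.

```python
def build_insert_command(syntax_path):
    """Return the INSERT command that runs one syntax file."""
    # Passing the path with forward slashes avoids backslash-escape surprises.
    return 'INSERT FILE="{0}".'.format(syntax_path)


def run_syntax_file(syntax_path, quiet=True):
    """Run a syntax file in external mode, without the user interface."""
    # Imported here because the module only exists in the Statistics Python.
    import spss

    if quiet:
        spss.SetOutput("off")  # suppress output that is not wanted
    spss.Submit(build_insert_command(syntax_path))
```

A call such as `run_syntax_file("C:/jobs/read_all.sps")` (a hypothetical path) would then do the same thing as typing the two lines by hand.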


BTW, you might consider using zsav rather than sav with large datasets as zsav format compresses much more effectively than sav in most cases.

--
Jon K Peck
[hidden email]

Re: trouble reading large syntax file

Mike
In reply to this post by Merlin Marshall
Hi,

It is unclear what you mean by the phrase "smaller data set".
Do you mean:

(1) You use all observations/cases but a subset of the 30,000
variables,
or
(2) You use a subset of cases but all 30,000 variables.

Jon Peck or others may know if there is a more sophisticated
way of creating the master file of 30k vars & 12K cases
but if situation (1) above is true, why don't you group the
cases into manageable numbers (say 5k cases), and then
do an add files after all of the cases have been saved to
a system data file (about 3 data files).

If (2) is the case, create three system data files with 10k
variables each (assuming that this can be done without a problem)
but with some unique identifiers in each file (e.g., an ID
number/name, other variables, or create a unique identifier
for each case that appears in each system data file).
Then do a match files on the identifier(s) to create the
master file.
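
The two piecewise routes described above (stacking case chunks with ADD FILES, or joining variable chunks with MATCH FILES on an identifier) might be sketched as below. The Python only builds the syntax text; the function names, file names and the key name `id` are assumptions for illustration, and MATCH FILES with BY expects each part file to be sorted by the key already.

```python
def add_files_syntax(part_files, outfile):
    """Build ADD FILES syntax that stacks case chunks into one file."""
    lines = ["ADD FILES"]
    lines += ['  /FILE="{0}"'.format(p) for p in part_files]
    lines[-1] += "."  # terminate the ADD FILES command
    lines.append('SAVE OUTFILE="{0}".'.format(outfile))
    return "\n".join(lines)


def match_files_syntax(part_files, key, outfile):
    """Build MATCH FILES syntax that joins variable chunks on a key."""
    lines = ["MATCH FILES"]
    lines += ['  /FILE="{0}"'.format(p) for p in part_files]
    lines.append("  /BY {0}.".format(key))  # parts must be sorted by key
    lines.append('SAVE OUTFILE="{0}".'.format(outfile))
    return "\n".join(lines)


if __name__ == "__main__":
    print(match_files_syntax(["vars1.sav", "vars2.sav", "vars3.sav"],
                             "id", "master.sav"))
```

The generated text can be pasted into a syntax window or saved to a .sps file and run like any other syntax.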

It could be that you can do the whole analysis in one run as
a production job, perhaps with increasing ram (does SPSS
still allow this? I know the mainframe versions did), or making
sure that no other programs are running and unneeded
background programs turned off.  However, this may require
you to be the administrator on the Windows PC.

Again, more tech-savvy members may be able to suggest a more
sophisticated (powerful) method(s) of doing this.
Otherwise, a piecewise approach may get you to your
goal.

To Jon and other SPSS personnel (past & present):
I assume that one can still run SPSS as a production job
in a "Dos" or system window which should reduce the
amount of RAM SPSS uses for running itself.  I believe
this was true in earlier (i.e., DOS) versions but maybe
I was wrong.  If I am clear in what I am saying, is there
a way of doing this now (i.e., do a command line call
of SPSS with specifications for input file(s), output file(s),
and other "/" specifications)?

-Mike Palij
New York University
[hidden email]


Re: trouble reading large syntax file

Jon Peck
It is possible to run a production job from a DOS box via
stats -production ...  (Search Help > Topics for "command line" for details.)

In production mode, you would need to create an spj file with control information.  If you run from the command line but not in production mode, you get the UI and would not save the UI overhead.

Re: trouble reading large syntax file

Merlin Marshall
In reply to this post by Merlin Marshall
Hi Mike,

Sorry I wasn't clear.  By smaller I mean fewer variables.

The syntax code does not contain errors; the problem arises when one tries to read all the variables in one job.

Merlin Marshall

Re: trouble reading large syntax file

Merlin Marshall
In reply to this post by Merlin Marshall
Thank you Jon Peck and everyone else who responded to my question.

In regard to an earlier comment, the file is too large (too many variables) to open in Excel, Stata or R.  It did open in SAS.

Because it won't open in almost all the stats packages we support, we are probably just going to tell people that we cannot comply with their request for code to read the entire data set in one go.  Of course it can be read in pieces, and that is how it should be done, but that is not what these users want.

Jon, I don't know Python, but the support person who is working on this issue does.  I don't know whether he tried it; he is out of the office, so I can't tell you how it went.

Thank you everyone again,

Merlin Marshall
Center for Human Resource Research
The Ohio State University
Columbus Ohio

Re: trouble reading large syntax file

Mike
In reply to this post by Jon Peck

Jon,
 
As one of the members of the list who used to run SPSS jobs
on Vax and other mainframe systems (I started out with UNIVAC),
I am used to writing syntax in an editor, creating a raw text file
of data (if not included "inline"), and submitting a job at a
system command prompt.  What you say below is somewhat
at variance with my experience because of the use of Python
(not that there is anything wrong with that).  However, looking
at the PDF "IBM SPSS Statistics Batch Facility User's Guide"
shows that the old mainframe functionality can be used without
Python. 
 
After some fiddling with Windows' environment variables (i.e., identifying
where to put temporary files, etc.; see page 12 in the manual
for Ver 23), one can open a "DOS" window (Win 10 really doesn't
have DOS in it, right? Should this be called a "command line window"?)
and at the command prompt enter something like the following:
 
General format:
C:> statisticsb -f syntaxfile -type outputtype -out outputfile
 
Specific example:
C:> statisticsb -f "C:\syntaxjobs\bank.sps" -type text -out C:\output\bank.txt
 
where "statisticsb" invokes the SPSS production program (I
assume that it is a limited front end to the SPSS statistics
software), "-f" identifies the location and the name of the
SPSS syntax file (what happened to "in=" or am I confusing
that switch with BMDP or another program?), "-type"
specifies the output format (NOTE: SPSS's use of pivot
tables is a PITA but I believe one can specify such tables
to be "deconstructed" into component parts if old fashioned
text format is used -- might also work with HTML/XML format),
and "-out" identifies the location and name of the output file
(note that one has to provide the right file extension).
 
This would seem to me to put minimal demands on
system resources (no need to open SPSS; one can
just have Windows explorer open to access the
output file and make sure that the Data system file
was created).
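
For anyone who wants to script the call rather than type it, the command line above might be wrapped as follows. This is a sketch that uses only the three switches quoted above (`-f`, `-type`, `-out`); it assumes `statisticsb` is installed and on the PATH, which may not be true for every installation, and the function names are made up for illustration.

```python
import subprocess


def statisticsb_args(syntax_file, output_type, output_file):
    """Build the argument list from the switches quoted in the post."""
    return ["statisticsb",
            "-f", syntax_file,      # syntax file to run
            "-type", output_type,   # output format, e.g. "text"
            "-out", output_file]    # where the output is written


def run_batch_job(syntax_file, output_type, output_file):
    """Run the job and return the exit code of the process."""
    # Assumption: statisticsb can be found on PATH on this machine.
    return subprocess.call(
        statisticsb_args(syntax_file, output_type, output_file))
```

Passing the arguments as a list lets Python handle quoting of paths that contain spaces.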
 
So, Jon, this *should* work, right? And there is no
need for a *.spj file, right?
 
-Mike Palij
New York University

Re: trouble reading large syntax file

Jon Peck
Mike,

I think statisticsb is only available with Statistics Server, so most people do not have it.

As for Win 10 and the "DOS" window, Microsoft officially calls it the "Command Prompt", and Win 10 certainly does not have old DOS code in it, but it is still commonly referred to as the DOS window, since it works a lot like real DOS from days of yore.

But whether with statisticsb, a production job, or the Python external-mode approach, the point is the same: less stress on system resources, and in some cases much less.

Re: trouble reading large syntax file

Mike

Jon,
 
Thanks for the feedback but I need to get one point clarified.
By "Statistics Server" do you mean that SPSS programs are
located on a separate PC server or cloud platform?  That is,
Statistics Server is not part of single user versions of SPSS
which would mean that statisticsb is not available?  If so,
when was this change implemented?  I vaguely remember
being able to do this in older Windows versions of SPSS
(then again, I might be confusing this with BMDP which
used command line submission and didn't really have a
Windows interface until after SPSS bought and sold the
company; I also think such command line submission of
SPSS was possible on OS/2 though the "Windows"
interface was rather nice there).
 
-Mike Palij
New York University
 
----- Original Message -----
Sent: Wednesday, July 19, 2017 2:05 PM
Subject: Re: [SPSSX-L] trouble reading large syntax file

Mike,

I think statisticsb is only available with Statistics Server, so most people do not have it.

As for Win 10 and the "DOS" window, Microsoft officially calls it the "Command Prompt", and Win 10 certainly does not have old DOS code in it, but it is still commonly referred to as the DOS window, since it works a lot like real DOS from days of yore.

But either with statisticsb or a production job or the Python external-mode approach, the point is the same: less stress on system resources and in some cases much less.

On Wed, Jul 19, 2017 at 11:56 AM, Mike Palij <[hidden email]> wrote:
Jon,
 
As one of the members of the list who used to run SPSS jobs
on Vax and other mainframe systems (I started out with UNIVAC),
I am used to writing syntax in an editor, creating a raw text file
of data (if not included "inline"), and submitting a job at a
system command prompt.  What you say below is somewhat
at variance with my experience because of the use of Python
(not that there is anything wrong with that).  However, looking
at the PDF "IBM SPSS Statistics Batch Facility User's Guide"
shows that the old mainframe functionality can be used without
Python. 
 
Some fiddling with Windows' environment variables (i.e., identifying
where to put temporary files, etc.; see page 12 in the manual
for Ver 23), one can open a "DOS" window (Win 10 really doesn't
have DOS in it, right? Should this be called a "command line window"?).
and at the command prompt enter something like the following:
 
General format:
C:> statisticsb -f syntaxfile -type outputtype -out outputfile
 
Specific example:
C:> statisticsb -f "C:\syntaxjobs\bank.sps" -type text -out C:\output\bank.txt
 
where "statisticsb" invokes the SPSS production program (I
assume that it is a limited front end to the SPSS statistics
software), "-f" identifies the location and the name of the
SPSS syntax file (what happened to "in=" or am I confusing
that switch with BMDP or another program?), "-type"
specifies the output format (NOTE: SPSS's use of pivot
table are a PITA but I believe one can specify such tables
to be "deconstructed" into component parts if old fashioned
text format is used -- might also work with HTML/XML format),
and "-out" identifies where and the name of the output file
(note that one has to provide right file extension).
 
This would seem to me to put minimal demands on
system resources (no need to open SPSS; one can
just have Windows explorer open to access the
output file and make sure that the Data system file
was created).
 
So, Jon, this *should* work, right? And there is no
need for a *. spj file, right?
 
-Mike Palij
New York University
 
 
 
 
----- Original Message -----
Sent: Monday, July 17, 2017 12:58 PM
Subject: Re: trouble reading large syntax file

You will need a lot of available memory for that.  In fact, if you have string variables with length more than 8 bytes, they count as extra variables and take extra dictionary slots.

If you are running out of memory, you might try creating two sav files each with half the variables and then doing MATCH FILES on the two.

Another trick that might help a lot would  be using external mode to run the syntax, because that eliminates all the memory and cpu overhead of the user interface and, in particular, the Data Editor, which needs a lot of updating when there are so many variables.  I have on occasion seen code run an order of magnitude faster or more in external mode for certain tasks.

To use external mode, you need to use a tiny amount of Python code.  You could do it like this.
Open a command prompt (DOS) window and cd to the Python directory, which is under the Statistics installation directory.
Type this code

import spss
spss.Submit(r"""INSERT FILE="filespec".""")

where filespec is the path to a syntax file to execute.  Note the r before the quote and the use of three " surrounding the command.
ctrl-z will exit the session.

If it produces a lot of output that you don't want, you can write
spss.SetOutput("off")
before the Submit line.
On the other hand, if you want to capture the output, you would wrap the syntax in OMS and OMSEND commands to get a Viewer file or just plain text (less overhead).


BTW, you might consider using zsav rather than sav with large datasets as zsav format compresses much more effectively than sav in most cases.

--
Jon K Peck
[hidden email]


Re: trouble reading large syntax file

Jon Peck
SPSS Statistics Server is a different product from the SPSS Client that most people have.  The Server handles multiple users: clients connect to a central server program, which spawns a "slave" process for each connection.  The server would generally be running on a remote system, so long jobs do not tie up the desktop.  It has some features not available in the Client system.  One distinction that used to exist, a limit on the number of processors the Client could use, was removed in V24.

Here is an extract from the Batch Facility User's Guide.  statisticsb is part of the Batch Facility.

The IBM® SPSS® Statistics Batch Facility is a batch processing utility that is included with the IBM SPSS
Statistics Server product. This guide describes the Batch Facility and how to use it.
Introduction to the Batch Facility
IBM SPSS Statistics Server is client/server based. It distributes client requests for resource-intensive
operations to powerful server software. Typically, the client for IBM SPSS Statistics Server is a version of
IBM SPSS Statistics running on a desktop computer. The Batch Facility is an alternative way to use the
power of IBM SPSS Statistics Server, and it runs on the server computer.
----------
The Production Facility on the client machine can run jobs on Statistics Server and monitor them and fetch the output.

This has been the case for quite a few releases - maybe back to SPSS version 10, but I don't recall exactly.

On Wed, Jul 19, 2017 at 12:23 PM, Mike Palij <[hidden email]> wrote:
Jon,
 
Thanks for the feedback but I need to get one point clarified.
By "Statistics Servers" do you mean that SPSS programs are
located on a separate PC server or cloud platform?  That is,
Statistics server is not part of single user versions of SPSS
which would mean that statisticsb is not available?  If so,
when was this change implemented?  I vaguely remember
being able to do this in older Windows versions of SPSS
(then again, I might be confusing this with BMDP which
used command line submission and didn't really have a
Windows interface until after SPSS bought and sold the
company; I also think such command line submission of
SPSS was possible on OS/2 though the "Windows"
interface was rather nice there).
 
-Mike Palij
New York University
 
----- Original Message -----
Sent: Wednesday, July 19, 2017 2:05 PM
Subject: Re: [SPSSX-L] trouble reading large syntax file

Mike,

I think statisticsb is only available with Statistics Server, so most people do not have it.

As for Win 10 and the "DOS" window, Microsoft officially calls it the "Command Prompt", and Win 10 certainly does not have old DOS code in it, but it is still commonly referred to as the DOS window, since it works a lot like real DOS from days of yore.

But either with statisticsb or a production job or the Python external-mode approach, the point is the same: less stress on system resources and in some cases much less.

On Wed, Jul 19, 2017 at 11:56 AM, Mike Palij <[hidden email]> wrote:
Jon,
 
As one of the members of the list who used to run SPSS jobs
on Vax and other mainframe systems (I started out with UNIVAC),
I am used to writing syntax in an editor, creating a raw text file
of data (if not included "inline"), and submitting a job at a
system command prompt.  What you say below is somewhat
at variance with my experience because of the use of Python
(not that there is anything wrong with that).  However, looking
at the PDF "IBM SPSS Statistics Batch Facility User's Guide"
shows that the old mainframe functionality can be used without
Python. 
 
After some fiddling with Windows' environment variables (i.e., identifying
where to put temporary files, etc.; see page 12 in the manual
for Ver 23), one can open a "DOS" window (Win 10 really doesn't
have DOS in it, right? Should this be called a "command line window"?)
and at the command prompt enter something like the following:
 
General format:
C:> statisticsb -f syntaxfile -type outputtype -out outputfile
 
Specific example:
C:> statisticsb -f "C:\syntaxjobs\bank.sps" -type text -out C:\output\bank.txt
 
where "statisticsb" invokes the SPSS production program (I
assume that it is a limited front end to the SPSS statistics
software), "-f" identifies the location and the name of the
SPSS syntax file (what happened to "in=" or am I confusing
that switch with BMDP or another program?), "-type"
specifies the output format (NOTE: SPSS's pivot
tables are a PITA but I believe one can specify that such tables
be "deconstructed" into component parts if old fashioned
text format is used -- might also work with HTML/XML format),
and "-out" identifies the location and name of the output file
(note that one has to provide the right file extension).
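
For anyone who would rather launch such a batch job from a script, a minimal Python sketch follows. It assumes statisticsb is installed and on the PATH (elsewhere in this thread it is said to come only with Statistics Server) and uses only the three switches shown above. The helper names build_statisticsb_args and run_batch are made up for illustration.

```python
import subprocess


def build_statisticsb_args(syntax_file, output_type, output_file):
    """Return the argument list for a statisticsb run.

    Only the -f, -type and -out switches from the example above are used.
    """
    return ["statisticsb", "-f", syntax_file,
            "-type", output_type, "-out", output_file]


def run_batch(syntax_file, output_type, output_file):
    """Run statisticsb and return its exit code (hypothetical helper)."""
    return subprocess.call(
        build_statisticsb_args(syntax_file, output_type, output_file))


# Same paths as the specific example above.
args = build_statisticsb_args(r"C:\syntaxjobs\bank.sps", "text",
                              r"C:\output\bank.txt")
print(" ".join(args))
```

Passing the arguments as a list avoids having to quote paths that contain spaces.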
 
This would seem to me to put minimal demands on
system resources (no need to open SPSS; one can
just have Windows explorer open to access the
output file and make sure that the Data system file
was created).
 
So, Jon, this *should* work, right? And there is no
need for a *.spj file, right?
 
-Mike Palij
New York University
 
 
 
 
--
Jon K Peck
[hidden email]
