SPSSX Discussion

set cache reasonable values


Brian Moore-3

set cache reasonable values

Hi all-

I *think* I need to increase my cache, but wasn't able to find anything
in the documentation about the default value or what a good range might
be to increase to or toward.

My file is roughly 1 million cases & 180 variables, and the former will
be increasing over time.

Thanks,
Brian

PS- below syntax was run with Insert File (which generates the line
numbers to the left of the code in the output)

211 * Identify Duplicate Cases.
212 SORT CASES BY cust(A) Order_Date_Overall(A) ProductType(D).
213 MATCH FILES /FILE = * /BY cust
214 /FIRST = PFirstOrderDate /LAST = PrimaryLast.
215 DO IF (PFirstOrderDate).
216 COMPUTE MatchSequence = 1 - PrimaryLast.
217 ELSE.
218 COMPUTE MatchSequence = MatchSequence + 1.
219 END IF.
220 LEAVE MatchSequence.
221 FORMAT MatchSequence (f7).
222 COMPUTE InDupGrp = MatchSequence > 0.
223 SORT CASES InDupGrp(D).

>Error. Command name: SORT CASES
>File write error: file name
>C:\DOCUME~1\BMoore\LOCALS~1\Temp\spss2064\cache.33: No space left on
>device (DATA1002)
>This command not executed.

>Error # 5817. Command name: SORT CASES
>The SPSS file sort has failed. The file remains unsorted. The specific
>problem is printed below.

>Error # 5822. Command name: SORT CASES
>I/O error writing to the sort scratch file.

224 MATCH FILES /FILE = * /DROP = PrimaryLast InDupGrp MatchSequence.
225 VARIABLE LABELS PFirstOrderDate
 'Indicator of each first matching case as Primary' .
226 VALUE LABELS PFirstOrderDate 0 'Duplicate Case' 1 'Primary Case'.
227 VARIABLE LEVEL PFirstOrderDate (ORDINAL).
228 FREQUENCIES VARIABLES = PFirstOrderDate .

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Barnett, Adrian (DECS)

Re: set cache reasonable values

Hi Brian
My guess is you are out of room on the drive SPSS uses for its scratch
files. On a standalone installation it creates a bunch of directories
in C:\Documents and Settings\<user name>\Local Settings\Temp (the long
form of the path in your error message) for its scratch files. If your
sort falls over, odds are SPSS will not have cleaned up its old scratch
files. The directory above tends to fill up with all sorts of rubbish
that things like IE and the operating system itself leave around and
rarely clean up, so you may find you can get back lots more room by
fossicking around in there.

Clean up as much space as you can - you will need several times the size
of the file you are sorting.

In my experience with versions up to 15 (I haven't got 16, so I can't say),
SET CACHE makes little difference no matter how much physical memory you
have. Big sorts just thrash the disk and take forever.

Regards

Adrian Barnett


Richard Ristow

Re: set cache reasonable values

In reply to this post by Brian Moore-3

At 10:36 AM 11/26/2007, Brian Moore wrote:

>I *think* I need to increase my cache, but wasn't able to find
>anything in the documentation about the default value or what a good
>range might be to increase to or toward.
>
>My file is roughly 1 million cases & 180 variables, and the former
>will be increasing over time.

Here's the problem you cite. I'm quoting the error messages in full:

> 222 COMPUTE InDupGrp = MatchSequence > 0.
> 223 SORT CASES InDupGrp(D).
>
>>Error. Command name: SORT CASES
>>File write error: file name
>>C:\DOCUME~1\BMoore\LOCALS~1\Temp\spss2064\cache.33: No space left
>>on device (DATA1002)
>>This command not executed.
>
>>Error # 5817. Command name: SORT CASES
>>The SPSS file sort has failed. The file remains unsorted. The
>>specific problem is printed below.
>
>>Error # 5822. Command name: SORT CASES
>>I/O error writing to the sort scratch file.

As Adrian Barnett said, on the face of it this is simple: your
working disk filled during the sort. That means you need more free
disk space, not a change in any SPSS setting. The usual estimate is that
you should have free space 2-3 times the file size to run a sort well.
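
To put rough numbers on that estimate, here is a back-of-the-envelope sketch in Python. It assumes every variable takes 8 bytes per case in the uncompressed working file (which holds for numeric variables; string variables will differ), and the function name is purely illustrative:

```python
def sort_scratch_estimate_gb(cases, variables, bytes_per_value=8, factors=(2, 3)):
    """Return (data_gb, low_gb, high_gb) for an uncompressed working file.

    Assumption: each value occupies bytes_per_value bytes. The 2-3x factors
    are the rule of thumb quoted in the thread, not a documented SPSS figure.
    """
    data_gb = cases * variables * bytes_per_value / 1e9
    return data_gb, data_gb * factors[0], data_gb * factors[1]


# Roughly 1 million cases and 180 variables, as in the original post.
data_gb, low_gb, high_gb = sort_scratch_estimate_gb(1_000_000, 180)
print(f"data ~{data_gb:.2f} GB; free space wanted ~{low_gb:.2f} to {high_gb:.2f} GB")
```

Under those assumptions the working data alone is about 1.44 GB, so a few GB of free scratch space would be the target on the drive that holds the temporary directory.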

HOWEVER, in your case there are probably ways around this besides adding
more disk space.

In the first place, the code you posted following this sort command
doesn't rely on the data being sorted by 'InDupGrp':

> 224 MATCH FILES /FILE = *
> /DROP = PrimaryLast InDupGrp MatchSequence.
> 225 VARIABLE LABELS
> PFirstOrderDate
> 'Indicator of each first matching case as Primary' .
> 226 VALUE LABELS PFirstOrderDate
> 0 'Duplicate Case'
> 1 'Primary Case'.
> 227 VARIABLE LEVEL PFirstOrderDate (ORDINAL).
> 228 FREQUENCIES VARIABLES = PFirstOrderDate .

I suppose you do need the sort later for something, though.

Second, if you do want to sort by a binary variable, it's probably
faster to split the file and merge. It should take storage for ONE
copy of your file, and should be reasonably fast. Like this, assuming
that InDupYes and InDupNo are file names or file handles (NOT dataset
names) that can be used for the scratch files. Not tested:

DO IF InDupGrp EQ 1.
. XSAVE OUTFILE=InDupYes.
ELSE.
. XSAVE OUTFILE=InDupNo.
END IF.
EXECUTE.

ADD FILES
/FILE=InDupYes
/FILE=InDupNo.
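
Why this gives the same ordering as the sort can be seen in a small Python sketch. It is only an illustration of the idea on toy data, not SPSS code; the tuples and the function name are made up for the example:

```python
def split_and_merge(cases, key):
    """Mimic XSAVE into two files followed by ADD FILES: one pass, no sort."""
    yes = [c for c in cases if key(c) == 1]   # cases that would go to InDupYes
    no = [c for c in cases if key(c) != 1]    # cases that would go to InDupNo
    return yes + no                           # ADD FILES appends the second file


# Toy cases: (id, InDupGrp)
cases = [("a", 0), ("b", 1), ("c", 0), ("d", 1), ("e", 1)]
merged = split_and_merge(cases, key=lambda c: c[1])

# Identical to a stable descending sort on the binary key.
assert merged == sorted(cases, key=lambda c: c[1], reverse=True)
print(merged)
```

Each case is read once and written once, which is why the approach needs storage for only one extra copy of the data, and the original order within each group is preserved.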

By the way, I see that earlier in the job, command

> 212 SORT CASES BY cust(A) Order_Date_Overall(A) ProductType(D).

ran successfully, on the same file. It may have been happenstance,
with the scratch files just making it the first time; but it's more
likely that, if the file was already sorted by at least 'cust', SPSS
used an adaptive-sort algorithm that took advantage of the file's
already being nearly in order.


Gregory Hildebrandt

Re: set cache reasonable values

Richard,

I have talked to the SPSS Help Desk about this issue. Also, I deal with
datafiles approximately the size you are describing.

Here are some of the suggestions, which I've passed on before.

1. Set Workspace (RAM set aside for SPSS) to, say, 300M for Windows. As I have
2GB of RAM and a 93GB hard drive on a laptop, with about 22GB free, I Set
Workspace = 300000 (the units are kilobytes). However, there have been
times when I apparently Set Workspace too large. If one employs the command
Show Workspace., one can find out what I assume is the default value.
Another individual at the SPSS Help Desk suggested that 30M might be a
more appropriate size for Set Workspace.

2. Under "File," select "Cache Data" before running the procedure.

3. Go to Edit >> Options, and determine the location of the temporary
directory. Delete the SPSS files in this directory. In fact, it has been
suggested that I create a new directory for the temporary files, one that is
dedicated to SPSS. This step can be quite important if one hasn't cleared
the temporary files recently.

4. If you are running a Sort command, first do Transform >> Automatic
Recode for the String variables, and run the procedure using the Recoded
variables. Numeric variables are easier to sort.

After doing all this, if I still have problems I reboot the computer.
Frequently, this seems to work best. There must be some way of emptying out
the entire cache on the computer (the temporary storage I only dimly
understand), but no one has ever been able to tell me how to do this.
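
For step 3, a small Python sketch along these lines could show how much space leftover scratch directories are holding before anything is deleted. It assumes the directories start with "spss" (as in the spss2064 directory named in the error message) and that they sit in the system temporary directory; both are assumptions to adjust for your own setup, and the function names are illustrative:

```python
import os
import shutil
import tempfile


def dir_size_bytes(path):
    """Total size of all files under path, skipping anything unreadable."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass
    return total


def report_spss_scratch(temp_dir=None, prefix="spss"):
    """Return ({directory: bytes}, free_bytes) for the temporary directory.

    Assumption: scratch directories are named with the given prefix.
    Nothing is deleted; this only reports.
    """
    temp_dir = temp_dir or tempfile.gettempdir()
    leftovers = {}
    for entry in os.scandir(temp_dir):
        if entry.is_dir() and entry.name.lower().startswith(prefix):
            leftovers[entry.path] = dir_size_bytes(entry.path)
    free_bytes = shutil.disk_usage(temp_dir).free
    return leftovers, free_bytes
```

Calling report_spss_scratch() with no arguments inspects the default temporary directory; pass a path to check the one shown under Edit >> Options instead.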

Greg


Brian Moore-3

Re: set cache reasonable values

Thanks to everyone for insights on this issue.

I have succeeded in cleaning out the cache, but am still getting warnings.

Apparently 2097151 is the maximum workspace (and, from the message quoted
at the end of this post, it appears to be a software-determined maximum, not
anything to do with computing power), and I'm still getting the error.

Other specs that may matter:
-using version 15 (but are looking at upgrading to 16)
-File size is 420 MB
-& I have ~20 gigs total free.

Any other ideas?
Thanks,
Brian

extracts from the output file appear below.

>The parameter of the WORKSPACE subcommand is in terms of kilobytes (KB).
>It must be at least 6148 and not greater than 2097151.

set workspace 2097151.
show workspace.

System Settings
Keyword    Description                                  Setting
WORKSPACE  Special workspace memory limit in kilobytes  2097151

>Warning # 44
>The operating system could not allocate a memory segment of the size
>requested by SPSS. Therefore, SPSS can only use the largest block
>available in memory already allocated. Use SHOW WORKSPACE to determine the
>size of the request and SET WORKSPACE to change it.

Richard Ristow

Re: set cache reasonable values

At 02:28 PM 11/28/2007, Brian Moore wrote:

>Thanks to everyone for insights on this issue.
>
>I have succeeded in cleaning out the cache, but still getting
>warnings. Apparently [2097151 bytes] is a software determined
>maximum [and cannot always be reached]:
>
>>Warning # 44
>>The operating system could not allocate a memory segment of the
>>size requested by SPSS.
>
>Other specs that may matter:
>-using version 15 (but are looking at upgrading to 16)
>-File size is 420 MB
>-& I have ~20 gigs total free.

Well, that appears to rule out an *intrinsic* problem with disk
space, although you previously got a message that SPSS couldn't get
the disk space it *thought* it needed:

>>Error. Command name: SORT CASES
>>File write error: file name
>>C:\DOCUME~1\BMoore\LOCALS~1\Temp\spss2064\cache.33: No space left
>>on device (DATA1002) This command not executed.

Goodness knows how SPSS got there, but I doubt we can diagnose that,
certainly not fix it.

>Any other ideas?

Well, the command sequence that blew up before was

>> 222 COMPUTE InDupGrp = MatchSequence > 0.
>> 223 SORT CASES InDupGrp(D).

Assuming that's still the case,

(a) Very wild chance: Do you need to do it at all? The code following
the sort, in your original posting, didn't appear to rely on the
data's being sorted by 'InDupGrp'. But I doubt this'll do it;
presumably, you do need the sort, otherwise.

(b) Still relying on your original posting: It struck me that,
earlier in your code, the command

>> 212 SORT CASES BY cust(A) Order_Date_Overall(A) ProductType(D).

apparently worked, on the same file.

Sorting on the binary variable 'InDupGrp' gives a huge number of ties
on the sort key, and I wonder whether that gives the sorting
algorithm some trouble. (Yes, I can give a good argument why it
shouldn't.) I'd try appending the previous key sequence:

COMPUTE InDupGrp = MatchSequence > 0.
SORT CASES BY InDupGrp(D)
cust(A) Order_Date_Overall(A) ProductType(D).

(c) Have you tried the work-around I suggested? That is,

DO IF InDupGrp EQ 1.
. XSAVE OUTFILE=InDupYes.
ELSE.
. XSAVE OUTFILE=InDupNo.
END IF.
EXECUTE.

ADD FILES
/FILE=InDupYes
/FILE=InDupNo.

(Here, InDupYes and InDupNo are file names or file handles - NOT dataset
names - for scratch files; and the code's still not tested.)
....................
Apologies for any crucial points I've missed. But, well, 'any ideas' this is.

-Best of luck and best wishes,
Richard


Peck, Jon

Re: set cache reasonable values

" Apparently [2097151 bytes] is a software determined
>maximum [and cannot always be reached]:"

Well, actually no. That is 2GB (it's measured in K), and that is the maximum address space possible for any 32-bit Windows application. That is doubtless too big for reasonable use.

This doesn't explain the original problem, but trying to make the workspace that big is probably not the solution.
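
The arithmetic behind that figure can be checked with a few lines of Python. This is only a sanity check and assumes that "kilobyte" here means 1024 bytes:

```python
KB = 1024                                # assumption: 1 KB = 1024 bytes

max_workspace_kb = 2_097_151             # upper limit reported by SET WORKSPACE
max_workspace_bytes = max_workspace_kb * KB
two_gib = 2 ** 31                        # 2 GB of address space, in bytes

# The limit is one kilobyte short of 2 GB.
assert max_workspace_kb == 2 ** 21 - 1
assert two_gib - max_workspace_bytes == KB
print(max_workspace_bytes, two_gib)
```

So a request for 2097151 KB asks the operating system for essentially the whole 2GB address space in one block, which leaves no room for the program itself and is consistent with Warning # 44.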

-Jon Peck


Art Kendall-2

Re: set cache reasonable values

In reply to this post by Brian Moore-3

Are you sure you are distinguishing CACHE and SCRATCH?
In the SPSS context:
CACHE usually refers to memory (RAM) or storage (DISK) used for
buffering input and output, which is especially useful when dealing with
storage (DISK) on another machine (e.g., a server). SPSS allows you to specify
how you want to use the Operating System function.

WORKSPACE usually refers to memory (RAM) used by a program for arrays
and instructions. SPSS allows you to adjust this.

SCRATCH usually refers to storage (DISK) used temporarily to hold data.
You can specify what storage you want SPSS to use for SCRATCH.

If you share the storage you may have a quota (limit), or you may be
filling up the device or partition you are using for scratch.
Sometimes people use the term cache loosely in specifying file locations.

Art Kendall
Social Research Consultants


zstatman

Re: set cache reasonable values

This came out regarding tables & Java in V16. Might also help here

====================================

Problem Description:
I am using SPSS 16.0. I produce output which can contain somewhat large
tables.

In previous versions of SPSS, the job runs to completion quickly (~15
seconds or less). On the same system with SPSS 16.0.0, this job takes a long
time or never completes. The application may go gray and unresponsive.

If the output generates, I have similar trouble editing, copying/pasting,
exporting or printing the larger tables. For instance, it may take several
seconds to copy the table or I am unable to print.

Is there any way I can improve this behavior?

Resolution Summary:
The following workaround may help in this situation.

Resolution Description:
If you are running out of memory performing any operations with large tables
or if your jobs are taking a long time to run, we recommend temporarily
adjusting the Maximum Java Heap Size for the client system.

On Linux and Windows, this heap can be adjusted upwards by creating a system
environment variable, called 'SPSSClientMaxHeapLevel', on each SPSS client.
This variable can be set to values 1, 2, or 3, where 1 = 512MB, 2 = 768MB, and
3 = 1024MB. 1 is the default value. E.g.,

Variable: SPSSClientMaxHeapLevel
Value: 2
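
As a quick way to see which level a machine is currently set to, a Python sketch like the following could be used. The level-to-size mapping is taken from the note above (reading "3 = 1024" as 1024MB); the function name is illustrative, and how SPSS itself treats an invalid value is not stated, so the sketch simply returns None in that case:

```python
import os

# Mapping described in the note above; 1 is stated to be the default.
HEAP_LEVEL_MB = {1: 512, 2: 768, 3: 1024}


def client_max_heap_mb(environ=os.environ):
    """Return the heap size in MB implied by SPSSClientMaxHeapLevel, or None."""
    raw = environ.get("SPSSClientMaxHeapLevel", "1")
    try:
        level = int(raw)
    except ValueError:
        return None  # unrecognised value; SPSS's own behaviour is not documented here
    return HEAP_LEVEL_MB.get(level)


print(client_max_heap_mb({"SPSSClientMaxHeapLevel": "2"}))
```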

For Macintosh, the Info.plist file needs to be edited. Info.plist, found at
/Applications/SPSSInc/SPSS16/SPSS16.0.app/Contents/Info.plist, is an XML file
which contains the setting. If you wish to change the max heap value, you
need to do the following:

1. In Terminal, type:
$ cd /Applications/SPSSInc/SPSS16/SPSS16.0.app/Contents
2. Create a copy of the Info.plist file for safekeeping:
$ cp Info.plist Info.plist.keep
3. Edit the Info.plist file. For instance, if you want to increase the max
heap size to 768MB, you'd change -Xmx512M to -Xmx768M:
$ vi Info.plist
(search for "-Xmx512M", make the change in vi, then save Info.plist)
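
The Macintosh steps above could also be scripted. The following is only a minimal sketch in Python, not part of the SPSS resolution: the function name set_max_heap is made up for illustration, and it assumes the plist contains a literal "-Xmx<number>M" option as the note describes. It keeps a backup copy with the same ".keep" suffix used in step 2.

```python
import re
import shutil
from pathlib import Path


def set_max_heap(plist_path, new_mb):
    """Rewrite the -Xmx<N>M option in a plist file; return how many were changed.

    Hypothetical helper: assumes the heap option appears as literal text
    such as "-Xmx512M" somewhere in the file.
    """
    path = Path(plist_path)
    # Keep a safety copy first, mirroring "cp Info.plist Info.plist.keep".
    shutil.copyfile(str(path), str(path) + ".keep")
    text = path.read_text(encoding="utf-8")
    # Replace whatever heap size is currently set with the requested one.
    new_text, count = re.subn(r"-Xmx\d+M", "-Xmx%dM" % new_mb, text)
    if count:
        path.write_text(new_text, encoding="utf-8")
    return count
```

For example, set_max_heap("/Applications/SPSSInc/SPSS16/SPSS16.0.app/Contents/Info.plist", 768) would make the change described in step 3. A return value of 0 means the option was not found and the file was left unchanged.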

Please note: The drawback of setting Maximum Java Heap Size too high is that
it will take available memory away from the backend in single-seat mode,
potentially limiting other procedures that can be run (if they require a lot
of RAM) and possibly introducing video display problems.

================

W

Will
Statistical Services

============
info.statman@earthlink.net
http://home.earthlink.net/~z_statman/
============

Richard Ristow

Re: set cache reasonable values

In reply to this post by Peck, Jon

I'd written,

>>"Apparently [2097151 bytes] is a software determined maximum [and
>>cannot always be reached]:"

At 03:59 PM 11/28/2007, Peck, Jon wrote:

>Well, actually no. That is 2GB (it's measured in K), and that is the
>maximum address space possible for any 32-bit Windows
>application. That is doubtless too big for reasonable use.

THANK you, Jon.
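
For reference, the arithmetic behind Jon's point can be checked in a few lines. This is just a sketch in Python; it assumes, as Jon states, that the figure is expressed in kilobytes, and it takes a kilobyte to be 1024 bytes.

```python
# The limit quoted in the thread, taken to be in kilobytes rather than bytes.
limit_kb = 2097151

limit_bytes = limit_kb * 1024      # convert kilobytes to bytes
two_gb = 2 * 1024 ** 3             # 2 GB, the address-space ceiling Jon mentions

# How far the limit falls short of a full 2 GB.
shortfall_bytes = two_gb - limit_bytes
print(limit_bytes, shortfall_bytes)
```

The limit comes out one kilobyte short of 2 GB, which is consistent with it being an address-space ceiling rather than a recommended setting.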

Brian Moore-3

Re: set cache reasonable values

In reply to this post by Richard Ristow

Thanks for the suggestions. As I continue to try to break up this
problem (such as with the file separation idea) it gets more and more
curious.

In fact, now I'm getting this warning even with much smaller files, and
I have had others run my larger syntax file on their computers without
error.

I'm leaning toward this being some kind of hang-up that may have been
initially caused by the overflowing cache; but now SPSS is not even
checking for free space anymore before warning me.

I've shut down and restarted; but haven't tried anything as drastic as
reinstalling (yet).
Anything in between I could try?

Thanks,
Brian

PS- one last oddity is that I can't find any obvious problems with the
RESULTS I'm getting when warned. (The process is one I run every few weeks
on a transactional database, and the levels are roughly as expected.)

-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Wednesday, November 28, 2007 1:40 PM
To: Brian Moore; [hidden email]
Cc: Gregory Hildebrandt
Subject: Re: set cache reasonable values

At 02:28 PM 11/28/2007, Brian Moore wrote:

>Thanks to everyone for insights on this issue.
>
>I have succeeded in cleaning out the cache, but still getting warnings.
>Apparently [2097151 bytes] is a software determined maximum [and cannot
>always be reached]:
>
>>Warning # 44
>>The operating system could not allocate a memory segment of the size
>>requested by SPSS.
>
>Other specs that may matter:
>-using version 15 (but are looking at upgrading to 16)
>-File size is 420 MB
>-& I have ~20 gigs total free.

Well, that appears to rule out an *intrinsic* problem with disk space,
although you previously got a message that SPSS couldn't get the disk
space it *thought* it needed:

>>Error. Command name: SORT CASES
>>File write error: file name
>>C:\DOCUME~1\BMoore\LOCALS~1\Temp\spss2064\cache.33: No space left on
>>device (DATA1002)
>>This command not executed.

Goodness knows how SPSS got there, but I doubt we can diagnose that,
certainly not fix it.

>Any other ideas?

Well, the command sequence that blew up before was

>> 222 COMPUTE InDupGrp = MatchSequence > 0.
>> 223 SORT CASES InDupGrp(D).

Assuming that's still the case,

(a) Very wild chance: Do you need to do it at all? The code following
the sort, in your original posting, didn't appear to rely on the data's
being sorted by 'InDupGrp'. But I doubt this'll do it; presumably, you
do need the sort, otherwise.

(b) Still relying on your original posting: It struck me that, earlier
in your code, the command

>> 212 SORT CASES BY cust(A) Order_Date_Overall(A) ProductType(D).

apparently worked, on the same file.

Sorting on the binary variable 'InDupGrp' gives a huge number of ties on
the sort key, and I wonder whether that gives the sorting algorithm some
trouble. (Yes, I can give a good argument why it shouldn't.) I'd try
appending the previous key sequence:

COMPUTE InDupGrp = MatchSequence > 0.
SORT CASES BY InDupGrp(D)
cust(A) Order_Date_Overall(A) ProductType(D).

(c) Have you tried the work-around I suggested? That is,

DO IF InDupGrp EQ 1.
. XSAVE OUTFILE=InDupYes.
ELSE.
. XSAVE OUTFILE=InDupNo.
END IF.
EXECUTE.

ADD FILES
/FILE=InDupYes
/FILE=InDupNo.

(Here, InDupYes and InDupNo are file names or file handles - NOT dataset
names - for scratch files; and the code's still not tested.)
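
Why workaround (c) can stand in for the sort: with a two-valued key, writing the cases to two files and concatenating them gives the same order as a stable descending sort on that key. The sketch below illustrates the idea in Python on made-up data. It is not SPSS code, the names are hypothetical, and it makes no claim about how SORT CASES orders ties internally.

```python
def partition_by_flag(rows, flag):
    """Return rows with flag == 1 first, then the rest, keeping file order.

    Assumes the flag is always 0 or 1 (never missing).
    """
    yes = [r for r in rows if r[flag] == 1]   # plays the role of InDupYes
    no = [r for r in rows if r[flag] != 1]    # plays the role of InDupNo
    return yes + no                           # plays the role of ADD FILES


# Made-up cases, already in cust order.
cases = [
    {"cust": 101, "InDupGrp": 0},
    {"cust": 102, "InDupGrp": 1},
    {"cust": 103, "InDupGrp": 0},
    {"cust": 104, "InDupGrp": 1},
]

combined = partition_by_flag(cases, "InDupGrp")

# Python's sort is stable, so ties keep their original order here as well.
stable_sorted = sorted(cases, key=lambda r: r["InDupGrp"], reverse=True)
print(combined == stable_sorted)
```

Each case is written once and read once, so no sort scratch file is needed, and the earlier ordering by cust is kept within each group.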
....................
Apologies for any crucial points I've missed. But, well, 'any ideas'
this is.

-Best of luck and best wishes,
Richard

Gregory Hildebrandt

Re: set cache reasonable values

Brian,

I think with one million data points and several hundred variables, SPSS
starts to have problems. For example, in a similar-sized dataset, I create a
chart in the SPSS Viewer but can't copy it into Excel or PowerPoint, which is
my preference for tables rather than editing the SPSS table. The data seems
to be behind the chart. Similar things happen with a large table that has
fewer than 65,000 rows. However, I did move a moderate-sized table into
"SPSS Pivot Table Object."

The temporary directory in Edit >> Options may fill up very fast, so I have
replaced the default with C:\SPSS14.0\temp. About every other time I
open SPSS, I first delete all the files in the temporary directory.
For sorting, increasing the memory using Set Workspace = 600000 or more has
helped, contrary to what I was told. However, there have been times in the
past when I have had to reduce the memory to permit a procedure like sorting
to work. You may want to start with the default (check "Show Workspace"),
and gradually increase the size.

I wonder if more RAM would help, or if you have used up too high a
proportion of your hard drive.

It may be time to reinstall. Make certain everything is off the hard
drive. It only takes a few minutes. However, once when I went into regedit,
with a member of the SPSS Help desk on the phone, I found remnants of an old
version of SPSS still in the registry, which I manually removed.

With a large file, the SPSS Viewer file also seems to increase in size very
quickly, so one can easily end up with a 20 MB Viewer file. This might affect
your ability to use the Sort procedure. Contrary to the prevailing wisdom,
I have also found situations in which the Syntax file is too large, and have
had to begin a new one. This was with SPSS 11.5, so, perhaps, the problem
has been corrected.

I also wonder if you can copy the file into Access and sort in Access. Then
re-import into SPSS.

Hope this helps.

Greg
