help with mixed models code


Bettina Haidich
Dear list,

I would like your help with SPSS/PASW code for mixed models. I have a sample of 35 subjects measured while walking in four different scenarios: a) wearing shoes only, b) shoes plus metatarsal pads, and c) shoes plus metatarsal bars, placed either c1) perpendicular to the foot axis or c2) oblique to the foot axis. Six steps were recorded for each subject and for each side (right/left). Therefore each subject has 48 measurements (4 groups x 6 steps x 2 sides), and I have constructed a long dataset of 1680 observations (48 x 35 = 1680). The dependent variable is IMPULSE. My problem is that I do not know how to model group and side, which are not independent, so as to account for the within-subject correlation. I used the REPEATED term for group and side, but I am not sure it is right. Also, the pairwise comparisons I am getting for group and side with Bonferroni correction appear to be independent rather than paired t-tests. Is there a way to get paired t-tests within the MIXED procedure? And for the repeated covariance type, should I use AR(1) instead of diagonal?

Here is the code I run in SPSS:

MIXED IMPULSE BY GROUP SIDE GENDER
  /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1)
    SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE)
    PCONVERGE(0.000001, ABSOLUTE)
  /FIXED = GROUP SIDE GENDER GROUP*SIDE GENDER*GROUP GENDER*SIDE
    GENDER*GROUP*SIDE | SSTYPE(3)
  /METHOD = REML
  /PRINT = DESCRIPTIVES
  /REPEATED = STEP*GROUP*SIDE | SUBJECT(ID) COVTYPE(DIAG)
  /EMMEANS = TABLES(GROUP) COMPARE ADJ(BONFERRONI)
  /EMMEANS = TABLES(SIDE) COMPARE ADJ(BONFERRONI)
  /EMMEANS = TABLES(GENDER) COMPARE ADJ(BONFERRONI)
  /EMMEANS = TABLES(GROUP*SIDE)
  /EMMEANS = TABLES(GENDER*GROUP)
  /EMMEANS = TABLES(GENDER*SIDE)
  /EMMEANS = TABLES(GENDER*GROUP*SIDE).
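
For reference, the AR(1) alternative I am asking about would change only the repeated covariance structure; a minimal sketch, keeping the other subcommands as above:

MIXED IMPULSE BY GROUP SIDE GENDER
  /FIXED = GROUP SIDE GENDER GROUP*SIDE GENDER*GROUP GENDER*SIDE
    GENDER*GROUP*SIDE | SSTYPE(3)
  /METHOD = REML
  /REPEATED = STEP*GROUP*SIDE | SUBJECT(ID) COVTYPE(AR1).
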
 Any assistance will be greatly appreciated,

Bettina


CPU Specifications used for SPSS

Dean Tindall
Hi List,

Sorry for the slightly OT post, but I am currently trying to get my
department to agree to an upgrade of our computers, with the aim of
decreasing calculation times and CTDs. At the moment we are lumbered
with a 2 GHz processor and only 2 GB of RAM.

We regularly use files with 300,000+ cases and 1000+ variables, and
calculation times can take up the largest part of an analysis.

So, to aid the business case for upgrading the systems, could you
please let me know what kind of systems you are using?

Thanks in Advance,

Dean



Re: CPU Specifications used for SPSS

Richard Ristow
At 09:08 AM 5/5/2010, Dean Tindall wrote:

>Sorry for the slightly OT post ...

This isn't remotely off-topic. Anything about SPSS is on-topic for SPSSX-L.

>I am attempting to get my department to agree to an upgrade of our
>computers, with the aim of decreasing calculation times and CTDs. At
>the moment we are lumbered with a 2 GHz processor and only 2 GB of RAM.
>
>We regularly use files with 300,000+ cases and 1000+ variables, and
>calculation times can take up the largest part of an analysis.

I've cited, below(1), an earlier but still relevant thread on this
topic; and still earlier, Jon Peck wrote on SPSS's capacity, which I
append(3) in its entirety, with some additions of my own. I would say, now:

. The speed of most SPSS jobs is limited by the disk transfer rate,
and a faster CPU rarely helps much. (But Michael Kruger reports
differently.(2))

. For that reason, consider looking for faster disks. It can also
help to have two disk drives, and arrange so that SPSS reads from one
and writes to the other.

. More RAM will help, possibly quite a bit, especially with your wide files.

But also, consider the efficiency of your SPSS code, where often one
can realize dramatic improvements:

. SPSS should now handle 300,000 cases easily; I'd call that mid-big
by current standards. But 1,000 variables is a lot, and I'd look for
ways to reduce it. Those may include:
   If your records include multiple instances of some repeating
occurrence, like office visits, 'unrolling' to a structure with fewer
variables and more cases (see the sketch after this list).
   If your records include data from multiple sources bearing on the
same entities, keeping data from those sources in separate SPSS files
and using MATCH FILES to combine only those needed for a particular analysis.

. Avoid reading and writing data unnecessarily.
   Remove EXECUTE statements unless you know a reason they're needed.
   Keeping data from different sources separate also avoids a run's
reading data that isn't needed.
   And if you process many subsets of the records in your files, be
sure to structure the run so you don't, say, read the entire file for
every subset.
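
To make the 'unrolling' concrete, here is a minimal sketch with
hypothetical variable names, turning twelve visit variables held on
one record into up to twelve cases per subject:

VARSTOCASES
  /MAKE visit_value FROM visit1 TO visit12
  /INDEX = visit_num
  /KEEP = id gender
  /NULL = DROP.

NULL=DROP discards cases for which the new variable would be
system-missing, so records with fewer visits simply produce fewer cases.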

-With best wishes to you,
  Richard

==========================================
APPENDIX: Other sources, citations and text
==========================================
(1)     Date:   Tue, 30 Jan 2007 09:47:07 -0500
         From:   Christopher Cronin <[hidden email]>
         Subject: Computer Buying Help
         To:     [hidden email]
and follow-up postings

(2)     Date:   Wed, 6 Jan 2010 11:57:23 -0500
         From:   Michael Kruger <[hidden email]>
         Subject: Re: Quad Core Processors
         To:     [hidden email]

I have been running PASW (v. 17.0) on a quad-core processor with
64-bit Windows as the OS at work and at home, and on a notebook with
the 32-bit OS. I can say from my experience with datasets of several
million cases that the 64-bit OS and chip make a big difference in speed.
For the extra money, it's well worth it.

(3)     Date:   Thu, 5 Jun 2003 09:25:37 -0500
         From:   "Peck, Jon" <[hidden email]>
         Subject: Re: Is there a limit of number of variables
                  for recent versions of SPSS
         To:     [hidden email]

There are several points to make regarding very wide files and huge datasets.

First, the theoretical SPSS limits are

Number of variables: (2**31) -1
Number of cases: (2**31) - 1

In calculating these limits, count one variable for each 8 bytes, or
part thereof, of a string variable. An A10 variable counts as two
variables, for example.

Approaching the theoretical limit on the number of variables,
however, is a very bad idea in practice for several reasons.

1. These are the theoretical limits in that you absolutely cannot go
beyond them.  But there are other environmentally imposed limits that
you will surely hit first.  For example, Windows applications are
absolutely limited to 2GB of addressable memory, and 1GB is a more
practical limit.  Each dictionary entry requires about 100 bytes of
memory, because in addition to the variable name, other variable
properties also have to be stored.  (On non-Windows platforms, SPSS
Server could, of course, face different environmental
limits.)  Numerical variable values take 8 bytes as they are held as
double precision floating point values.

2. The overhead of reading and writing extremely wide cases when you
are doubtless not using more than a small fraction of them will limit
performance.  And you don't want to be paging the variable
dictionary.  If you have lots of RAM, you can probably reach between
32,000 and 100,000 variables before memory paging degrades
performance seriously.

3. Dialog boxes cannot display very large variable lists.  You can
use variable sets to restrict the lists to the variables you are
really using, but lists with thousands of variables will always be awkward.

4. Memory usage is not just about the dictionary.  The operating
system will almost always be paging code and data between memory and
disk.  (You can look at paging rates via the Windows Task
Manager).  The more you page, the slower things get, but the variable
dictionary is only one among many objects that the operating system
is juggling.  However, there is another effect.  On NT and later,
Windows automatically caches files (code or data) in memory so that
it can retrieve it quickly.  This cache occupies memory that is
otherwise surplus, so if any application needs it, portions of the
cache are discarded to make room.  You can see this effect quite
clearly if you start SPSS or any other large application; then shut
it down and start it again.  It will load much more quickly the
second time, because it is retrieving the code modules needed at
startup from memory rather than disk.  The Windows cache,
unfortunately, will not help data access very much unless most of the
dataset stays
in memory, because the cache will generally hold the most recently
accessed data.  If you are reading cases sequentially, the one you
just finished with is the LAST one you will want again.

5. These points apply mainly to the number of variables.  The number
of cases is not subject to the same problems, because the cases are
not generally all mapped into memory by SPSS (although Windows may
cache them).  However, there are some procedures that because of
their computational requirements do have to hold the entire dataset
in memory, so those would not scale well up to immense numbers of cases.

The point of having an essentially unlimited number of variables is
not that you really need to go to that limit.  Rather it is to avoid
hitting a limit incrementally.  It's like infinity.  You never want
to go there, but any value smaller is an arbitrary limit, which SPSS
tries to avoid.  It is better not to have a hard stopping rule.

Modern database practice would be to break up your variables into
cohesive subsets and combine these with join (MATCH FILES in SPSS)
operations when you need variables from more than one subset.  SPSS
is not a relational database, but working this way will be much more
efficient and practical with very large numbers of variables.
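
A minimal sketch of that join pattern, with hypothetical file and key
names (both files must already be sorted by the key variable):

MATCH FILES
  /FILE = 'demographics.sav'
  /FILE = 'visits2009.sav'
  /BY id.

The combined result becomes the active dataset, so an analysis reads
only the subsets it actually needs.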


Regards,
Jon Peck
SPSS R & D
----------
Follow-on remarks:

. For most operations, increasing the number of cases will increase
the running time about in proportion. Usually, SPSS can handle a
great many cases gracefully. Jon, above, notes some operations for
which many cases may slow SPSS badly, but often work-arounds can be
found even for these.

. Increasing the number of variables will generally increase the
running time about in proportion, even if you're not using them all,
because the running time is dominated by the time to read the file
from disk, i.e. the total file size.

. After some point hard to estimate (though larger if the machine has
more RAM), increasing the number of variables will increase the
running time out of all proportion, because putting the whole
dictionary and data for one case in RAM may require paging.

. I emphasize Jon's point that "modern database practice would be to
break up your variables into cohesive subsets", i.e. to restructure
with more cases and fewer variables. A typical example is changing
from one record per entity with data for many years, to one record
per entity per year. I've posted a number of solutions in which data
is given such a 'long' representation with many cases, instead of a
'wide' representation with many variables.


Re: CPU Specifications used for SPSS

Garry Gelade
In reply to this post by Dean Tindall
Dean

Just to add to what Mike Kruger has already said: 64-bit does make a
difference. It means you can directly address more memory. With 32-bit you
are limited to directly addressing 4GB of RAM, and adding more won't speed
things up. If you have more than 4GB of RAM, 64-bit will definitely improve
performance.

Garry Gelade


Re: CPU Specifications used for SPSS

Barnett, Adrian (DECD)
In reply to this post by Dean Tindall
Hi Dean
At work I am running an Intel dual core E6600 CPU, with 4GB RAM and 32 bit Windows XP. I have to process large files (900,000 cases and maybe 50 variables) and run into performance issues all the time.

At home I run an Intel Core i7 860 with 8GB RAM under 64 bit Windows 7. Running the same programs and data files there is 2-4 times faster.

As Richard explains, Windows will spend its time on very slow disk paging operations when dealing with large files. By running 64-bit versions of your OS and SPSS, much more memory becomes available than under a 32-bit OS, and your CPU will spend more of its time working on the data and less time waiting for the disk to finish paging to get its next lot of usable data.

SPSS 18 is better at using memory, especially in its 64-bit version, than earlier versions. However, it could still do better. Watching it sort on my home system, I saw it report that the file it was sorting occupied 800MB uncompressed. The sort routine said it was allocating 131 megabytes of RAM to the sort, and proceeded to page data to and from disk. At the time, the operating system reported that 4GB of RAM was in use by everything that was running, but there was another 4GB doing nothing. I don't know why SPSS was not using all this memory - the whole file could have been read into memory several times over. Had it used all the available RAM, the sort would have run in a fraction of the time. CPU utilisation was not spread evenly across the available CPU cores, mostly concentrating on a single core. This is puzzling, because there are sorting algorithms which scale well across multiple CPUs.

Anyway, as Richard has pointed out, disk speed is a key bottleneck. One slightly expensive option you may wish to explore is getting a solid state disk (SSD) and installing the operating system and your biggest data files there. SSDs are quite fast and may well improve your performance significantly.

There is a downside to running a 64-bit OS and applications. ODBC only works between applications of the same bit width: 32-bit to 32-bit and 64-bit to 64-bit. ODBC between 32-bit and 64-bit does not happen. So if you need to export SPSS files to Access, it will only work via ODBC if you have a 64-bit version of Access. Office 2010, reportedly to be released in October, comes in both 32- and 64-bit versions, and I can confirm that ODBC from 64-bit SPSS to 64-bit Access works (there is a free public beta of Office 2010 available from the Microsoft website). If you have corporate data systems which are 32-bit, you won't be able to talk via ODBC to 64-bit SPSS. So some things may be a little awkward until everything becomes 64-bit, but none are impossible, as there are workarounds for the ODBC issue.

Regards

Adrian Barnett
Project Officer
Educational Measurement and Analysis
Data and Educational Measurement
DECS
ph 82261080


Re: CPU Specifications used for SPSS

mpirritano
Does anyone have an opinion on the utility of a program like RamDisk?
RamDisk allows you to use a block of RAM as a disk and put your paging
file there. I've got this running at home. I think it helps. Is it just
in my head?

Thanks
matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648



Re: CPU Specifications used for SPSS

Barnett, Adrian (DECD)
In reply to this post by Barnett, Adrian (DECD)

Hi Jonathon

Thanks for the tip about WORKSPACE and THREADS.

I didn’t realise it was possible to use it to improve memory usage, because the manual says “don’t do it unless SPSS complains”.

Below is all that is said about the use of WORKSPACE:

WORKSPACE allocates more memory for some procedures when you receive a message indicating that the available memory has been used up or indicating that only a given number of variables can be processed. MXCELLS increases the maximum number of cells you can create for a new pivot table when you receive a warning that a pivot table cannot be created because it exceeds the maximum number of cells that are allowed.
- WORKSPACE allocates workspace memory in kilobytes for some procedures that allocate only one block of memory. The default is 6148.
- Do not increase the workspace memory allocation unless the program issues a message that there is not enough memory to complete a procedure.

The section on THREADS discourages you from altering the setting.

I will experiment with both of these and see if anything improves.

I must say I didn’t observe an even allocation of work across cores (4 real and 4 virtual) when my sort was running. The overwhelming majority of work was being done by a single core. The others were doing stuff, but not much. There is a big sort running on my work computer as I write, and one CPU is maxed out while the other is sitting at about 10-15%.

Can CPU utilization be made unbalanced if there is insufficient memory for the second (and subsequent) cores to do anything useful?

Adrian Barnett
Project Officer
Educational Measurement and Analysis
Data and Educational Measurement
DECS
ph 82261080


From: Jon Fry [mailto:[hidden email]]
Sent: Tuesday, 11 May 2010 10:52 AM
To: Barnett, Adrian (DECS)
Cc: [hidden email]
Subject: Re: CPU Specifications used for SPSS

 

Regarding SORT CASES:

SORT CASES can use more memory than it was using here.  32-bit versions use at least 128MB; 64-bit versions use at least 512MB.  If WORKSPACE is set higher, it will use the WORKSPACE setting.  If the available memory (the result of the preceding calculation) is enough to store the entire dataset, it will sort the data in memory.

If the file might be bigger than the available memory, SORT CASES divides the work among a set of threads so it can make use of multiple cores.  It first divides the memory.  On Adrian's four core processor, it divided the 512MB available into four 128MB areas (about 131,000 KB) and gave one area to each thread.  The number of threads it uses is controlled by SET THREADS.
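
A sketch of the corresponding syntax, with hypothetical sizes and
variable names (WORKSPACE is specified in kilobytes):

SET WORKSPACE=1048576 THREADS=2.
SORT CASES BY id visit_date.
* Restore the default workspace afterwards.
SET WORKSPACE=6148.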

Jonathan Fry

"SPSSX(r) Discussion" <[hidden email]> wrote on 05/06/2010 06:04:53 PM:

> From:

>
> "Barnett, Adrian (DECS)" <[hidden email]>

>
> To:

>
> [hidden email]

>
> Date:

>
> 05/07/2010 09:52 AM

>
> Subject:

>
> Re: CPU Specifications used for SPSS

>
> Sent by:

>
> "SPSSX(r) Discussion" <[hidden email]>

>
> Hi Dean
> At work I am running an Intel dual core E6600 CPU, with 4GB RAM and
> 32 bit Windows XP. I have to process large files (900,000 cases and
> maybe 50 variables) and run into performance issues all the time.
>
> At home I run an Intel Core i7 860 with 8GB RAM under 64 bit Windows
> 7. Running the same programs and data files there is 2-4 times faster.
>
> As Richard explains, Windows will spend its time on very slow disk
> paging operations when dealing with large files. By running 64 bit
> versions of your OS and SPSS, much more memory becomes available
> than under a 32 bit OS and your CPU will spend more of its time
> working on the data and less waiting for the disk to finish paging
> to get it its next lot of usable data.
>
> SPSS 18 is better at using memory, especially in its 64 bit version,
> than earlier versions. However, it could still do better. Observing
> it while sorting on my home system, it reported that the file it was
> sorting occupied 800MB when uncompressed. The sort routine said it
> was allocating 131  megabytes of RAM to the sort and proceeded to
> page data to and from disk. At the time, the operating system
> reported that 4GB of RAM was in use by everything that was running,
> but there was another 4GB doing nothing. I don't know why SPSS was
> not using all this memory - the whole file could have been read into
> memory several times over. Had it used all the available RAM, the
> sort would have run in a fraction of the time. CPU utilisation was
> not spread evenly across the available CPU cores, mostly
> concentrating on a single core. This is puzzling because there are
> sorting algorithms which scale well across multiple CPUs.
>
> Anyway, as Richard has pointed out, disk speed is a key bottleneck.
> One slightly expensive option you may wish to explore is getting a
> solid state disk (SSD) and installing the operating system and your
> biggest data files there. SSDs are quite fast and may well improve
> your performance significantly.
>
> There is a downside to running 64 bit OS and applications. ODBC only
> works between applications of the same bit width. 32 bit to 32 bit
> and 64 bit to 64 bit. ODBC between 32 bit and 64 bit does not
> happen. So if you need to export SPSS files to Access, it will only
> work via ODBC if you have a 64 bit version of Access. Office10,
> reportedly to be released in October, comes in both 32 and 64 bit
> versions, and I can confirm that ODBC from 64 bit SPSS to 64 bit
> Access works (there is a free public beta of Office 10 available
> from the Microsoft website). If you have corporate data systems
> which are 32 bit you won't be able to talk via ODBC to 64 bit SPSS.
> So some things may be a little awkward until everything becomes 64
> bit, but none are impossible, as there are workarounds for the ODBC thing.
>
> Regards
>
> Adrian Barnett
> Project Officer
> Educational Measurement and Analysis
> Data and Educational Measurement
> DECS
> ph 82261080
>
> -----Original Message-----
> From: SPSSX(r) Discussion [
[hidden email]] On
> Behalf Of Dean Tindall
> Sent: Wednesday, 5 May 2010 10:39 PM
> To: [hidden email]
> Subject: CPU Specifications used for SPSS
>
> Hi List,
>
> Sorry for the slightly OT post, but I am currently attempting to get my
> department to agree to an upgrade to our computers for the purpose of
> decreasing calculation times and CTDs.  At the moment we are lumbered
> with a 2 gig processor and only 2 gigs of RAM.
>
> We are regularly using files with 300,000+ cases and 1000+ variables and
> calculation times can really take up the most part of the analyses.
>
> So in order to aid the business case for upgrading the systems I was
> wondering if you guys could please let me know what kind of systems you
> are using,
>
> Thanks in Advance,
>
> Dean
> This email was sent from:- Nunwood Consulting Ltd. (registered in
> England and Wales no. 3135953) whose head office is based at:- 7,
> Airport West, Lancaster Way, Yeadon, Leeds, LS19 7ZA.
> Tel +44 (0) 845 372 0101
> Fax +44 (0) 845 372 0102
> Web
http://www.nunwood.com
> Email [hidden email]
>
> This e-mail is confidential and intended solely for the use of the
> individual(s) to whom it is addressed. If you are not the intended
> recipient, be advised that you have received this e-mail in error
> and that any use, dissemination, forwarding, printing, copying of, or
> any action taken in reliance upon it, is strictly prohibited and may
> be illegal.  To review our privacy policy please visit:
>
www.nunwood.com/privacypolicy.html
>
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: CPU Specifications used for SPSS

Barnett, Adrian (DECD)

Hi Jonathon

My interest in the utilization of the separate cores is that, since the sorts are frequently the most time-consuming parts of what are lengthy programs, I’d like to see the cores doing as much useful work as possible. I’d like to see them all basically flat out working on the sort, since that should get it done quicker. I’d also like to do as little disk swapping as possible, so I’d like it to use all the free memory it can.

At the moment it looks like the available memory and CPU capacity are not being used to the maximum.

Based on what the Task Manager was telling me during a run with WORKSPACE at its default, I could see that there was a gigabyte of memory free, so I re-ran with WORKSPACE at 1000000, which should have made 1GB available. The resource messages suggested that it was giving 512MB to each thread. However, the results were a bit puzzling: the same job went from 8 sorts to 2 (which was encouraging), but the reported total CPU time went from 56.8 seconds to 58.34, and elapsed time from 49.76 to 46.33, which was one step forward and another back. There was also a fair amount of disk I/O.

I understand your point that if the CPU load is light, the separate CPUs will be doing different amounts of work (since one can look after it all), but sorting 900,000 records on several string variables of between 5 and 46 characters in length should be pretty demanding. So I am puzzled by what I am observing.

Your earlier description of how threads are allocated, dividing a task between cores when the file might be bigger than physical memory, sounds like what it should be doing routinely. The task should always complete quicker if split among more processors, especially if the file is smaller than memory. Superficially, it looks to me as if splitting among processors when there is LESS memory than would fit the file would be the time NOT to bother allocating extra CPUs, since there’s nothing for them to do while waiting for the disk, and it might even generate more disk I/O due to the overhead.

Thanks for the tip about re-setting WORKSPACE.

Adrian Barnett
Project Officer
Educational Measurement and Analysis
Data and Educational Measurement
DECS
ph 82261080


From: Jon Fry [mailto:[hidden email]]
Sent: Tuesday, 11 May 2010 11:43 PM
To: Barnett, Adrian (DECS)
Cc: [hidden email]
Subject: RE: CPU Specifications used for SPSS

 

Adrian,

It would probably help to set WORKSPACE back to some modest value (20000?) after your sorts. It can hurt to keep it too high.

There is no need to be concerned about CPU usage imbalance. The OS dispatches threads on any available processor; it does not try to balance usage. It may even pick the lowest-numbered available one. So only a heavy CPU load will look balanced.

Most sort problems are not CPU-intensive. If you are seeing the memory budgets, you can also see CPU times and elapsed times. I am sure the elapsed times for all sort phases exceed the CPU times even with multiple threads. My suggestions for setting THREADS: if you think your dataset will fit in memory after compression, use one thread. Otherwise, usually use two threads. Only for CPU-intensive sorting problems will more than two threads pay off.
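
Those suggestions in syntax form, with hypothetical names and sizes:

* Dataset unlikely to fit in memory after compression: two threads.
SET THREADS=2.
SORT CASES BY locality_name street_name.
* After the sorts, set WORKSPACE back to a modest value.
SET WORKSPACE=20000.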

Jonathan




Re: CPU Specifications used for SPSS

Barnett, Adrian (DECD)

Hi Jonathon

Here is a summary of the results, with WORKSPACE set at 6148, 100000, 500000, and 1000000 (all times in seconds; WORKSPACE in KB):
WORKSPACE  Sort      Sort    Whole task  Whole task  No.    Merge     Merge
(KB)       elapsed   CPU     CPU time    elapsed     sorts  elapsed   CPU
   6148    24.60     35.80   55.28       48.66       8      24.00     19.50
 100000    24.10     37.20   56.88       50.13       8      25.90     19.60
 500000    34.90     60.90   71.22       50.76       2      15.70     10.30
1000000    28.70     46.30   56.41       43.56       2      14.80     10.00


Overall elapsed time seems to get worse with increasing memory (counterintuitive), whilst merge times go down, as would be expected from having to merge fewer subsets. It tends to support the advice in the manual that you should just let SPSS work out the best way of doing it! ;-)

It is very puzzling (well, to me anyway) that it should make things worse to have more memory available.

Below are the detailed reports from each run.

Here’s the report from the listing with WORKSPACE set to 1000000:

sort cases locality_name, street_name, st_type, street_suffix,
    number_first, number_last, flat_number, LOT_NUMBER.

Row size: 856 bytes
Row count: 932355
Uncompressed file size: 761.1MB
Elapsed time for sort phase: 28.7 seconds
Processor time for sort phase: 46.3 seconds
Total size of files written by sort phase: 252.3MB

Sort  Memory budget  Row limit  Rows sorted  Input (s)  Sort (s)  Output (s)
#1    500000KB       1340314    339763       2.8        11.9      3.7
#2    500000KB       592592     592592       4.7        19.0      4.9

Merge #1: 2 sequences merged; elapsed time 14.8 seconds; processor time 10.0 seconds

Preceding task required 56.41 seconds CPU time; 43.56 seconds elapsed.

Below is the same section after re-setting WORKSPACE to the default value of 6148:

Row size: 856 bytes
Row count: 932355
Uncompressed file size: 761.1MB
Elapsed time for sort phase: 24.6 seconds
Processor time for sort phase: 35.8 seconds
Total size of files written by sort phase: 252.3MB

Sort  Memory budget  Row limit  Rows sorted  Input (s)  Sort (s)  Output (s)
#1    51200KB        60681      60681        0.7        1.6       1.1
#2    51200KB        137248     129980       1.0        3.6       2.0
#3    51200KB        134089     134089       1.1        3.6       1.3
#4    51200KB        136533     136533       1.1        3.8       2.2
#5    51200KB        137248     137248       1.1        3.8       2.1
#6    51200KB        137248     137248       1.0        4.0       1.3
#7    51200KB        137608     130862       1.0        3.8       3.0
#8    51200KB        136889     65714        0.5        1.9       1.3

Merge #1: 8 sequences merged; elapsed time 24.0 seconds; processor time 19.5 seconds

Preceding task required 55.28 seconds CPU time; 48.66 seconds elapsed.

Here it is set to 500000:

Row size: 856 bytes
Row count: 932355
Uncompressed file size: 761.1MB
Elapsed time for sort phase: 34.9 seconds
Processor time for sort phase: 60.9 seconds
Total size of files written by sort phase: 252.3MB

Sort  Memory budget  Row limit  Rows sorted  Input (s)  Sort (s)  Output (s)
#1    250000KB       296296     296296       2.6        9.0       2.5
#2    250000KB       664935     636059       5.0        21.8      5.5

Merge #1: 2 sequences merged; elapsed time 15.7 seconds; processor time 10.3 seconds

Preceding task required 71.22 seconds CPU time; 50.76 seconds elapsed.

And here it is set to 100000:

Row size: 856 bytes
Row count: 932355
Uncompressed file size: 761.1MB
Elapsed time for sort phase: 24.1 seconds
Processor time for sort phase: 37.2 seconds
Total size of files written by sort phase: 252.3MB

Sort  Memory budget  Row limit  Rows sorted  Input (s)  Sort (s)  Output (s)
#1    51200KB        60681      60681        0.7        1.8       1.9
#2    51200KB        137248     129980       1.1        3.5       2.7
#3    51200KB        134089     134089       1.1        3.7       1.2
#4    51200KB        136533     136533       1.1        4.2       1.2
#5    51200KB        137248     137248       1.1        4.0       1.4
#6    51200KB        137248     137248       1.1        4.2       1.1
#7    51200KB        137608     130862       1.1        4.1       1.3
#8    51200KB        136889     65714        0.6        2.0       0.8

Merge #1: 8 sequences merged; elapsed time 25.9 seconds; processor time 19.6 seconds

Preceding task required 56.88 seconds CPU time; 50.13 seconds elapsed.

Adrian Barnett
Project Officer
Educational Measurement and Analysis
Data and Educational Measurement
DECS
ph 82261080


From: Jon Fry [mailto:[hidden email]]
Sent: Thursday, 13 May 2010 1:18 AM
To: Barnett, Adrian (DECS)
Cc: '[hidden email]'
Subject: RE: CPU Specifications used for SPSS

 

Adrian,

For the two-sort version, what are the reported read, sort, and write times?

Jonathan