SPSSX Discussion

How many cases & variables can be handled in SPSS

Hello!

I'm curious about the question, how many variables and cases can be handled
in one SPSS data editor window?

Thank you

Peter

Chetan Oberoi

Re: How many cases & variables can be handled in SPSS

I have handled up to 256,700 cases and 970 variables in SPSS.

Hope this helps.

Thanks and best regards,
Chetan Oberoi

The information in this e-mail is the property of Evalueserve and is confidential and privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken in reliance on it is prohibited and will be unlawful. If you receive this message in error, please notify the sender immediately and delete all copies of this message.

Lemon, John S.

Re: How many cases & variables can be handled in SPSS

In reply to this post by <Peter M?>

I don't have the exact figures to hand, but I believe the maximum number
of cases is the largest number the computer can store, minus 1, so
something like (2**32)-1. In the early days of SPSS there was a limit of
500 variables, but I believe the limit is now something V-E-R-Y large.

I expect that the people from SPSS will provide the exact values but I
have handled files, albeit slowly, of 1.5 million cases and 100
variables, while conversely I have used files with 180,000 cases and
500+ variables.

Best Wishes

John S. Lemon
DIT - University of Aberdeen
Edward Wright Building: Room G51
Tel: +44 1224 273350
Fax: +44 1224 273372

Richard Ristow

Re: How many cases & variables can be handled in SPSS

In reply to this post by <Peter M?>

At 03:11 AM 6/5/2007, <Peter M?> wrote:

>I'm curious about the question, how many variables and cases can be
>handled in one SPSS data editor window?

This is something of a FAQ, last posted Fri, 5 Jan 2007 13:17:52
-0500.

LIMITS ON VARIABLES AND CASES

Below is a discussion by Jon Peck of SPSS, Inc.; it applies to all
recent versions of SPSS.

I add to what Jon wrote:

. For most operations, increasing the number of cases will increase the
running time roughly in proportion.

. Increasing the number of variables will generally increase the
running time roughly in proportion, even if you're not using them all,
because the running time is dominated by the time to read the file from
disk, i.e. by the total file size.

. Beyond some point that is hard to estimate (though higher if the machine
has more RAM), increasing the number of variables will increase the running
time out of all proportion, because holding the whole dictionary plus the
data for one case in RAM may require paging.

. I emphasize Jon's point that "modern database practice would be to
break up your variables into cohesive subsets", i.e. to restructure
with more cases and fewer variables. A typical example is changing from
one record per entity with data for many years, to one record per
entity per year. I've posted a number of solutions in which data is
given such a 'long' representation, instead of a 'wide' representation
with many variables. But you know your problem, and can judge what's
best done in your instance.
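
To make the 'wide' versus 'long' distinction concrete, here is a minimal sketch in plain Python rather than SPSS syntax. The record layout, the names income_2005 etc., and the helper wide_to_long are all made up for illustration; within SPSS itself the VARSTOCASES command is the usual tool for this kind of restructuring.

```python
# Hypothetical 'wide' data: one record per entity, one variable per year.
wide_rows = [
    {"id": 1, "income_2005": 100.0, "income_2006": 110.0, "income_2007": 125.0},
    {"id": 2, "income_2005": 80.0, "income_2006": 82.5, "income_2007": 90.0},
]


def wide_to_long(rows, id_key, prefix, value_key):
    """Return one record per entity per year instead of one per entity.

    Every variable whose name starts with `prefix` is assumed to end in a
    four-digit year (an assumption of this sketch, not an SPSS rule).
    """
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name.startswith(prefix):
                year = int(name[len(prefix):])
                long_rows.append({id_key: row[id_key], "year": year, value_key: value})
    return long_rows


long_rows = wide_to_long(wide_rows, "id", "income_", "income")
for record in long_rows:
    print(record)
```

The two wide records with three yearly variables each become six long records with three variables each (id, year, income). Adding more years then adds cases, not variables.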

At 10:25 AM 6/5/2003, Peck, Jon [of SPSS, Inc.] wrote:

>There are several points to make regarding very wide files and huge
>datasets.
>
>First, the theoretical SPSS limits are
>
>Number of variables: (2**31) -1
>Number of cases: (2**31) - 1
>
>In calculating these limits, count one for each 8 bytes or part
>thereof of a string variable. An a10 variable counts as two
>variables, for example.
>
>Approaching the theoretical limit on the number of variables, however,
>is a very bad idea in practice for several reasons.
>
>1. These are the theoretical limits in that you absolutely cannot go
>beyond them. But there are other environmentally imposed limits that
>you will surely hit first. For example, Windows applications are
>absolutely limited to 2GB of addressable memory, and 1GB is a more
>practical limit. Each dictionary entry requires about 100 bytes of
>memory, because in addition to the variable name, other variable
>properties also have to be stored. (On non-Windows platforms, SPSS
>Server could, of course, face different environmental
>limits.) Numerical variable values take 8 bytes as they are held as
>double precision floating point values.
>
>2. The overhead of reading and writing extremely wide cases, when you
>are doubtless using no more than a small fraction of the variables, will
>limit performance. And you don't want to be paging the variable
>dictionary. If you have lots of RAM, you can probably reach between
>32,000 and 100,000 variables before memory paging degrades performance
>seriously.
>
>3. Dialog boxes cannot display very large variable lists. You can use
>variable sets to restrict the lists to the variables you are really
>using, but lists with thousands of variables will always be awkward.
>
>4. Memory usage is not just about the dictionary. The operating
>system will almost always be paging code and data between memory and
>disk. (You can look at paging rates via the Windows Task
>Manager). The more you page, the slower things get, but the variable
>dictionary is only one among many objects that the operating system is
>juggling. However, there is another effect. On NT and later, Windows
>automatically caches files (code or data) in memory so that it can
>retrieve them quickly. This cache occupies memory that is otherwise
>surplus, so if any application needs it, portions of the cache are
>discarded to make room. You can see this effect quite clearly if you
>start SPSS or any other large application; then shut it down and start
>it again. It will load much more quickly the second time, because it
>is retrieving the code modules needed at startup from memory rather
>than disk. The Windows cache, unfortunately, will not help data
>access very much unless most of the dataset stays in memory, because
>the cache will generally hold the most recently accessed data. If you
>are reading cases sequentially, the one you just finished with is the
>LAST one you will want again.
>
>5. These points apply mainly to the number of variables. The number
>of cases is not subject to the same problems, because the cases are
>not generally all mapped into memory by SPSS (although Windows may
>cache them). However, there are some procedures that because of their
>computational requirements do have to hold the entire dataset in
>memory, so those would not scale well up to immense numbers of cases.
>
>The point of having an essentially unlimited number of variables is
>not that you really need to go to that limit. Rather it is to avoid
>hitting a limit incrementally. It's like infinity. You never want to
>go there, but any value smaller is an arbitrary limit, which SPSS
>tries to avoid. It is better not to have a hard stopping rule.
>
>Modern database practice would be to break up your variables into
>cohesive subsets and combine these with join (MATCH FILES in SPSS)
>operations when you need variables from more than one subset. SPSS is
>not a relational database, but working this way will be much more
>efficient and practical with very large numbers of variables.
>
>
>Regards,
>Jon Peck
>SPSS R & D
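
The counting rule and the memory figures in Jon's note can be turned into a rough back-of-envelope calculation. The sketch below is plain Python, not anything SPSS provides. The function names are made up, the 100 bytes per dictionary entry is the approximate figure quoted above, and charging one dictionary entry per variable and 8 bytes per counted slot for each case are simplifying assumptions of mine.

```python
MAX_SLOTS = 2**31 - 1          # theoretical limit quoted above
BYTES_PER_SLOT = 8             # numeric values are 8-byte doubles
DICT_BYTES_PER_VARIABLE = 100  # "about 100 bytes" per dictionary entry (approximate)


def slots_for_string(width_bytes):
    """Count one slot for each 8 bytes, or part thereof, of a string variable."""
    return (width_bytes + 7) // 8


def estimate(n_numeric, string_widths):
    """Rough size estimate for a file layout; all results are approximations."""
    slots = n_numeric + sum(slots_for_string(w) for w in string_widths)
    n_variables = n_numeric + len(string_widths)
    return {
        "slots": slots,
        "within_limit": slots <= MAX_SLOTS,
        # Assumption: one dictionary entry per variable, whatever its width.
        "dictionary_bytes": n_variables * DICT_BYTES_PER_VARIABLE,
        # Assumption: each counted slot occupies 8 bytes in every case.
        "bytes_per_case": slots * BYTES_PER_SLOT,
    }


# An A10 string counts as two variables, as in the example above.
print(slots_for_string(10))
# 970 numeric variables plus one A10 string variable.
print(estimate(970, [10]))
```

For the layout in the last line this gives 972 counted slots, about 97,100 bytes of dictionary and 7,776 bytes per case. Multiplying the per-case figure by the number of cases gives a crude idea of the uncompressed file size that has to be read from disk.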