SPSSX Discussion

SPSS Database Capacity

btafoya

SPSS Database Capacity

I have a population of 1.5 Million records that I want to use in SPSS. Will it work? What platform will I need to handle something like this? I will be taking random samples but will need to get descriptive statistics from the entire population.

David Marso

Re: SPSS Database Capacity

Administrator

1.5M records is *NOT* large WRT capacity.
IIRC: Something like 2 Billion cases/2 Billion Variables.
Pretty much only limited by disk space.
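Since disk space is the practical limit, a back-of-the-envelope size estimate is useful. The Python sketch below is illustrative only and is not SPSS syntax. It assumes every variable is numeric and stored as an 8-byte double, and it ignores compression and file headers. The function names are hypothetical.

```python
def estimated_bytes(cases: int, variables: int, bytes_per_value: int = 8) -> int:
    """Rough uncompressed data size, assuming one 8-byte double per value."""
    return cases * variables * bytes_per_value


def human(nbytes: float) -> str:
    """Format a byte count with decimal units (1 GB = 10**9 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    i = 0
    while nbytes >= 1000 and i < len(units) - 1:
        nbytes /= 1000
        i += 1
    return f"{nbytes:.2f} {units[i]}"


if __name__ == "__main__":
    # 1.5M cases x 120 numeric variables, as in the simulation in this thread.
    print(human(estimated_bytes(1_500_000, 120)))     # 1.44 GB
    # 3.5 billion cases x 24 variables, as reported later in the thread.
    print(human(estimated_bytes(3_500_000_000, 24)))  # 672.00 GB
```

Under these assumptions, a 1.5M-case file with 120 variables comes to roughly a gigabyte and a half of raw data. That is well within an ordinary desktop's disk.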
---
The following simulates 1.5M records with 120 vars in a few minutes.
Generates Descriptives in about 1 minute on my MacBookPro (circa 2007).

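* Simulate raw data *.
* 1,500,000 cases, each with 120 variables (v001 to v120).
* TRUNC(UNIFORM(100)) assigns each variable a random integer from 0 to 99.
* END CASE writes each case; END FILE signals end of data after the loop.
INPUT PROGRAM.
LOOP CASEID=1 TO 1500000.
DO REPEAT v=v001 to v120.
compute v=trunc(uniform(100)).
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
exe.
* Descriptive statistics for every variable.
DESC ALL.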

----

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

MaxJasper

Re: SPSS Database Capacity

In reply to this post by btafoya

Take a look at SPSS Complex Samples too.

Max.

John Fiedler

Re: SPSS Database Capacity

In reply to this post by David Marso

About a minute to create and run DESC on David's example using a five-year-old 64-bit ThinkPad.
No problems running 3.5 billion records with two dozen variables on the same PC.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
David Marso
Sent: Tuesday, June 26, 2012 7:52 PM
To: [hidden email]
Subject: Re: SPSS Database Capacity

1.5M records is *NOT* large WRT capacity.
IIRC: Something like 2 Billion cases/2 Billion Variables.
Pretty much only limited by disk space.
---
The following simulates .5M records wit 120 vars in a few minutes.
Generates Descriptives in about 1 minute on my MacBookPro (circa 2007).

* Simulate raw data *.
INPUT PROGRAM.
LOOP CASEID=1 TO 1500000.
DO REPEAT v=v001 to v120.
compute v=trunc(uniform(100)).
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
exe.
DESC ALL.

----

btafoya wrote
>
> I have a population of 1.5 Million records that I want to use in SPSS.
> Will it work? What platform will I need to handle something like this?
> I will be taking random samples but will need to get descriptive
> statistics from the entire population.
>

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/SPSS-Database-Capacity-tp5713809p5713810.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

David Marso

Re: SPSS Database Capacity

Administrator

In reply to this post by btafoya

And of course you can always DL a trial, beat the hell out of it, and see if it suffices for you!

Jon K Peck

Re: SPSS Database Capacity

In reply to this post by btafoya

1.5M records is not a large file by SPSS standards.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: btafoya <[hidden email]>
To: [hidden email]
Date: 06/26/2012 08:39 PM
Subject: [SPSSX-L] SPSS Database Capacity
Sent by: "SPSSX(r) Discussion" <[hidden email]>

David Marso

Re: SPSS Database Capacity

Administrator

Jon K Peck wrote

1.5M records is not a large file by SPSS standards.

Albert-Jan Roskam

Re: SPSS Database Capacity

Hi,

See the entry "How many variables and cases can SPSS for Windows handle?" at http://www.spsstools.net/FAQ.htm. I'm not sure what the differences are on a 64-bit architecture.

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From: David Marso <[hidden email]>
To: [hidden email]
Sent: Friday, June 29, 2012 5:56 AM
Subject: Re: [SPSSX-L] SPSS Database Capacity

Hi Jon,
Please (when you have a moment) restate the actual *ABSOLUTE* max cases X
max vars for both 32-bit and 64-bit systems, maybe even 132-bit systems just for the
hell of it (assuming infinite disk space and RAM). Would love a core dump
of the theory.

Rick Oliver-3

Re: SPSS Database Capacity

I don't think the information on that page is entirely accurate. I'm not aware of any theoretical limits on either cases or variables. Your computing environment will determine practical limits.
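One possible source of the "2 Billion" figure recalled earlier in the thread is the largest signed 32-bit integer, 2**31 - 1. This is only an assumption and is not confirmed anywhere in this thread. The Python arithmetic below checks the case counts reported in the thread against that bound. It says nothing about how SPSS actually stores case counts, and the names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer: 2,147,483,647


def fits_signed_32(n: int) -> bool:
    """True if a count n could be held in a signed 32-bit integer."""
    return 0 <= n <= INT32_MAX


if __name__ == "__main__":
    # Case counts mentioned in this thread.
    for label, n in [
        ("original question", 1_500_000),
        ("3.5 billion records", 3_500_000_000),
        ("QA testbed (more than 4 billion)", 4_000_000_000),
    ]:
        print(f"{label}: {n:,} fits in signed 32-bit? {fits_signed_32(n)}")
```

Both of the billion-scale counts reported in the thread exceed that bound. This is consistent with the statement that there is no practical 2-billion-case ceiling.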

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]
Phone: 312.893.4922 | T/L: 206-4922

From: Albert-Jan Roskam <[hidden email]>
To: [hidden email]
Date: 06/29/2012 05:10 AM
Subject: Re: SPSS Database Capacity
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Jon K Peck

Re: SPSS Database Capacity

In reply to this post by David Marso

Information on absolute limits on cases and variables in Statistics isn't readily available, because those limits are far beyond the practical constraints imposed by the hardware and OS. IOW, you can assume that no such limits exist on the part of Statistics. However, a few procedures, mainly time-series-oriented ones, require the data to be held in memory and are thus likely to run out of resources sooner than the general case. Since Statistics mostly does not hold all the cases in memory, you will run out of patience before you run out of memory.
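Statistics can avoid holding every case in memory because summary statistics, such as those DESCRIPTIVES reports, can be accumulated in a single pass over the data. The Python sketch below illustrates that general idea with Welford's algorithm. It is not SPSS's implementation, and the class and function names are hypothetical.

```python
import math
import random
from typing import Iterable


class RunningStats:
    """Single-pass N, mean, standard deviation, min and max (Welford's algorithm).

    Memory use is constant no matter how many cases are streamed through.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        """Sample variance with an n - 1 denominator (NaN for fewer than 2 cases)."""
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)


def describe(values: Iterable[float]) -> RunningStats:
    """Consume values one at a time; a generator never materializes all cases."""
    stats = RunningStats()
    for x in values:
        stats.add(x)
    return stats


if __name__ == "__main__":
    rng = random.Random(2012)  # arbitrary seed for repeatability
    # Hypothetical stand-in for one simulated variable: integers 0..99.
    cases = (rng.randrange(100) for _ in range(1_500_000))
    s = describe(cases)
    print(s.n, round(s.mean, 2), round(s.stddev, 2), s.minimum, s.maximum)
```

Procedures that need random access to all cases, such as the time-series ones mentioned above, cannot be reduced to a stream like this. That is why they hit memory limits first.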

Our QA testbed goes up to 66,000 variables and more than 4 billion cases. That is an absurd number of variables, but there you are. Certainly, keeping the number of variables reasonable will speed performance. And I often wonder why people are so averse to taking random samples of vast datasets, except in situations where they are searching for very rare, needle-in-a-haystack events.
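In SPSS itself, drawing a random sample is the job of the SAMPLE command, for example `SAMPLE .10.` or `SAMPLE 1000 FROM 1500000.`. The Python sketch below is a language-neutral illustration of why sampling a vast file is cheap. It uses reservoir sampling (Algorithm R), one standard way to draw a fixed-size simple random sample in a single pass without knowing the total number of cases in advance. This is not how SAMPLE is implemented; the function name and seed are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Draw a simple random sample of k items in one pass (Algorithm R).

    The stream length need not be known in advance, and only k items are
    kept in memory. If the stream has fewer than k items, all are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    rng = random.Random(626)  # arbitrary seed so the sample is reproducible
    # Hypothetical case IDs 1..1,500,000; draw 1,000 of them.
    sample = reservoir_sample(range(1, 1_500_001), 1_000, rng)
    print(len(sample), min(sample), max(sample))
```

Fixing the seed makes the sample reproducible, which helps when a result needs to be checked against a second, independently drawn sample.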

BTW, I learned an interesting fact recently about the Watson system that won the Jeopardy contest. It had 155 terabytes of physical memory. Don't try that at home.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: David Marso <[hidden email]>
To: [hidden email]
Date: 06/28/2012 09:59 PM
Subject: Re: [SPSSX-L] SPSS Database Capacity
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Maguin, Eugene

Re: SPSS Database Capacity

Jon,

Perhaps you would comment on this please.

Recently somebody asked me off-list about display problems opening a 37 GB SPSS file with 90.546M records (rows). This person said he got a pop-up message that SPSS can ‘only’ display 90,500,000 rows. Are there display limits, and if so, are those limits determined by the specifications of the computer? What specifications go into determining the limits, and are there settings that can be tweaked to increase them?

Thanks, Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck
Sent: Friday, June 29, 2012 11:03 AM
To: [hidden email]
Subject: Re: SPSS Database Capacity

Poes, Matthew Joseph

Re: SPSS Database Capacity

In reply to this post by Jon K Peck

I believe the aversion to random sampling (as I experience it) comes from people’s tendency to hoard. There is reluctance to give up the data, given the time and money spent on collecting it. Often there is also a belief that if your data are actually a population, then you should analyze the entire population instead of a sample of it, because that is somehow more valid. There doesn’t seem to be a clear understanding that you can infer with near total confidence everything you need from just a sample, as compared with the entire population. Added to that are common misconceptions about drawing multiple random samples to confirm results. People often don’t understand the statistics they want or use, and don’t understand why we can’t just give them a single number. I believe it’s related to the same mindset that has led to numerous requests for how to do a multiple imputation and save out a single data set.
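The point that a sample can tell you nearly everything the full population would can be made concrete. Under simple random sampling without replacement, the standard error of a sample mean is SE = sigma / sqrt(n) * sqrt((N - n) / (N - 1)), where sigma is the population standard deviation and N the population size. The Python sketch below is illustrative only. The sigma used in the demo is an assumption: it is the SD of equally likely integers 0 to 99, like the simulated variables earlier in the thread.

```python
import math


def se_of_mean(sigma: float, n: int, population: int) -> float:
    """Standard error of a sample mean under simple random sampling without
    replacement from a finite population of size N, where sigma is the
    population standard deviation:
        SE = sigma / sqrt(n) * sqrt((N - n) / (N - 1))
    """
    if not 0 < n <= population:
        raise ValueError("need 0 < n <= population")
    if population == 1:
        return 0.0  # a population of one case has no sampling error
    fpc = math.sqrt((population - n) / (population - 1))  # finite population correction
    return sigma / math.sqrt(n) * fpc


if __name__ == "__main__":
    # Assumed sigma: SD of equally likely integers 0..99, like the simulated variables.
    sigma = math.sqrt((100**2 - 1) / 12)
    N = 1_500_000
    for n in (1_000, 10_000, 100_000, N):
        print(f"n={n:>9,}  SE of mean = {se_of_mean(sigma, n, N):.4f}")
```

With N = 1.5 million, the standard error shrinks roughly as 1/sqrt(n) and reaches zero only for a full census. This is why a modest random sample usually pins down a mean very tightly.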

To echo what Jon has mentioned about practical limits, some of my analysis work has required that I let the program run overnight or for a few hours, and large numbers of variables and cases are frequently the cause. One example you could run into is ridge regression. Another, which I use far more often, is mixed modeling (MLM/HLM, etc.). With MLM, I have found that the most common cause of overnight run times is a misspecified model in which I attempted to allow too many variables to be random, to the point of absurdity. Occasionally this will happen with a reasonably specified model, where my a priori theory that some factor should be allowed to vary randomly is reasonable, but where the number of levels and cases is so large that it takes quite a long time to converge on a final model. Again, the main cause is a large number of levels in the model and many blocks within higher levels. This is not an SPSS problem (maybe the efficiency of the algorithm, but I don’t know that others are better); rather, it’s a hardware resource problem.

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck
Sent: Friday, June 29, 2012 10:03 AM
To: [hidden email]
Subject: Re: SPSS Database Capacity

Jon K Peck

Re: SPSS Database Capacity

In reply to this post by Maguin, Eugene

That is a display issue with the Data Editor. In some older versions of SPSS/Statistics (it may still be present, but I think it is gone), the DE limits how many rows it will display, as a memory-management measure. It has nothing to do with processing capabilities in other parts of the system. Obviously no one is going to page through huge numbers of rows. I don't think the 90 million number is correct, though; I think the limit was more like 90,000, and it was configurable (virtual rows). With Server, it is possible to suppress the DE altogether, which is appropriate for very large datasets.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: "Maguin, Eugene" <[hidden email]>
To: [hidden email]
Date: 06/29/2012 09:21 AM
Subject: Re: [SPSSX-L] SPSS Database Capacity
Sent by: "SPSSX(r) Discussion" <[hidden email]>
