Working with Large Datasets?

Working with Large Datasets?

Carolyn Catenhauser
I am currently using the desktop version of SPSS Base and I'm working
with over 1 million cases and SPSS is extremely slow and often crashes.
Aside from changing my computer and/or purchasing the SPSS Server
option, does anyone know of a solution? Thanks for any help!

 

Carolyn

 

________________________________________________________________________

Carolyn Catenhauser, M.A. | Service Management Group | Research Manager
| [hidden email]
<mailto:[hidden email]>  | 816.841.5611

 

 

 


#####################################################################################
This email and any attachments thereto may contain private, confidential,
and privileged material for the sole use of the intended recipient. Any review,
copying, or distribution of this email (or any attachments thereto) by others is
strictly prohibited. If you are not the intended recipient, please contact the sender
immediately and permanently delete the original and any copies of this email and any
attachments thereto.
#####################################################################################

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Working with Large Datasets?

James Wilson-24
Carolyn:

I recently encountered the same problem and I was told that it had to do
with a single-core vs. a dual-core processor. I could run millions of
cases on a computer with a single-core processor (I was using an
aggregated Current Population Survey dataset), but when I was given a
new computer with a dual-core processor, SPSS ran slower than molasses
(I timed it and the program that ran in 15 minutes on my old computer
took nearly 12 hours to run on my new computer). The tech folks at my
university, after multiple discussions with the SPSS gurus, told me that
the new version of SPSS wasn't compatible with dual-core processors when
running large numbers of cases. When I requested and got my old computer
back, I had no problems running the large datasets again. That's the
best information I can give you and who knows how accurate it is. All I
can tell you is that the single-core processor ran the data and the
dual-core processor (despite multiple tweaks and attempted fixes)
wouldn't.

Best,
Jim

Re: Working with Large Datasets?

Hector Maletta
In reply to this post by Carolyn Catenhauser
Carolyn,
Slow it may be, especially if you have many variables per case, but it
should not crash. I customarily handle several million cases without a
glitch.
One immediate piece of advice: work with a slimmer file. Save your complete
file, then save a new file (under a different name), using the KEEP or DROP
subcommand of the SAVE command to retain only the variables you need for your
analysis. Do not forget to include the ID variable(s) as well, in case you
create new variables in the process and later need to match the slim file
with the original fat one.
Even with a slim file, a million cases are a million cases, and may take
some minutes to process.
Regarding crashes, they are more likely to occur for other reasons, such as
reaching memory limits. Some procedures need to store the entire dataset in
memory (for instance CATPCA, and I believe also CLUSTER), and you are likely
to lack room for a million cases in your workspace. Most procedures,
however, read the file case by case, and have no such limitations.
Hector
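The slim-file workflow described above could be sketched in syntax roughly as follows. The file paths and the variable names (id, var1 to var3, newvar) are placeholders invented for illustration, not names from Carolyn's data, and the sketch assumes id uniquely identifies each case:

```spss
* Save a slim working copy holding only the analysis variables plus the ID.
GET FILE='C:\data\fullfile.sav'.
SAVE OUTFILE='C:\data\slimfile.sav'
  /KEEP=id var1 var2 var3.

* Later, after newvar has been created in the slim file, sort it by the key.
GET FILE='C:\data\slimfile.sav'.
SORT CASES BY id.
SAVE OUTFILE='C:\data\slim_sorted.sav'
  /KEEP=id newvar.

* Match the new variable back onto the full file, which is sorted the same way.
GET FILE='C:\data\fullfile.sav'.
SORT CASES BY id.
MATCH FILES /FILE=* /FILE='C:\data\slim_sorted.sav' /BY id.
EXECUTE.
```

Both files have to be sorted by the key variable before MATCH FILES with BY, which is why the ID must travel with the slim file.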

Re: Working with Large Datasets?

Art Kendall
In reply to this post by Carolyn Catenhauser
Possibly doing some routine maintenance on your system will help.
Depending on your platform, you may want to do what Windows calls
"Disk Cleanup":
click <My Computer>
highlight the icon or line that represents one of your disks or
partitions, e.g. C: or D:
click <Properties>
note the size of the disk and how much is used
click <Disk Cleanup> and check/uncheck the categories of files you want to clean up
when done with that,
click <Tools>
click <Defrag Now>

Do this for any hard disk you are using.

Do you have all of the updates to your OS?


Do you have a lot of variables?  Are many of them string?
How large is the system file relative to the available disk space?

Are you getting any error messages with a crash?

Have you checked <help> <workspace>?
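
The workspace allocation can also be inspected and changed from syntax. A minimal sketch; the value 24576 is an arbitrary illustration (SET WORKSPACE takes a size in kilobytes), not a recommendation for any particular machine:

```spss
* Display the current workspace allocation.
SHOW WORKSPACE.

* Raise the workspace for memory-hungry procedures (illustrative value in KB).
SET WORKSPACE=24576.
```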

When you are not going to save transformations or casewise variables
from procedures, do you use /KEEP or /DROP on the GET command?
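
For illustration, reading only part of a file with GET might look like the sketch below. The path and the variable names (id, var1 to var3, comment1, comment2) are hypothetical placeholders:

```spss
* Read only the variables needed for this run.
GET FILE='C:\data\fullfile.sav'
  /KEEP=id var1 var2 var3.

* Or read everything except a few wide string variables.
GET FILE='C:\data\fullfile.sav'
  /DROP=comment1 comment2.
```

Either form leaves the file on disk untouched; only the active dataset is smaller.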

What OS do you have?  What kind of CPU?  What year was your system
built? How much memory (RAM)?  How much Disk space (storage)?  How large
is the system file?


If you are on a Windows platform, have you used task manager to monitor
your cpu and disk use?


Art Kendall
Social Research Consultants




Re: Working with Large Datasets?

Hector Maletta
In reply to this post by James Wilson-24
James: I usually process several million cases on my Intel Core Duo laptop
(Dell Inspiron) without any unnatural slowness. I am using version 15
(scared off version 16 by reports of multiple bugs, and do not have v.17
yet). I do not know which version is "the new version of SPSS" you mention
in your message. I am sure it is not v.15, and I hope it is not v.17.

Hector

Re: Working with Large Datasets?

Hector Maletta
In reply to this post by Hector Maletta
Carolyn,
The SAVE command, like all other commands, can be issued through syntax: open a
syntax window (File > New > Syntax) and write the command:

SAVE OUTFILE='C:\My files\newfile.sav' /KEEP=ID VAR1 VAR2 VAR3 VAR4.

You'd better learn a bit about using syntax in addition to the menu interface.
Start by going to the menu, issuing any instructions, and instead of OK
press PASTE: the corresponding syntax will be pasted on a syntax window for
your inspection and eventual execution. Press help for more enlightenment.

From the menu interface, go to the data editor, press File, SAVE AS, write
the name of the new file, press the VARIABLES button and select the
variables to keep. You may then press OK to execute this order, or PASTE to
have it written as a syntax command, and execute it from there (to execute a
syntax command or a set of commands use the RUN menu item in the syntax
window).

Hector
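
As a rough idea of what to expect, the syntax pasted from the Save As dialog may look something like the first command below; the exact subcommands can differ by version, and the file and variable names are only the placeholders used earlier in this message. The second command is a hypothetical DROP variant for when it is easier to list what to leave out:

```spss
* Roughly what PASTE from File > Save As might produce.
SAVE OUTFILE='C:\My files\newfile.sav'
  /KEEP=ID VAR1 VAR2 VAR3 VAR4
  /COMPRESSED.

* Alternative: keep everything except the listed variables.
SAVE OUTFILE='C:\My files\newfile2.sav'
  /DROP=VAR5 VAR6.
```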


-----Original Message-----
From: karen lewis [mailto:[hidden email]]
Sent: 02 December 2008 15:43
To: 'Hector Maletta'
Subject: RE: Working with Large Datasets?

Where is the "save" command?

Karen

Un-comment syntax in V17 Syntax Editor?

Arthur Burke
Is there a way to un-comment a block of syntax in the V17 Syntax Editor?
I turned the comment feature on and can't get it to turn off.

Art

Re: Working with Large Datasets?

Oliver, Richard
In reply to this post by Hector Maletta
You can also save subsets of variables from the dialog interface. From the menus in the Data Editor Window, choose File>Save As. Then click the Variables button in the Save dialog.

Re: Un-comment syntax in V17 Syntax Editor?

ViAnn Beadle
In reply to this post by Arthur Burke
Delete the *. This is a very primitive commenter!
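
A small illustration of what that means in practice, using a made-up FREQUENCIES command on a placeholder variable var1:

```spss
* The next line is treated as a comment because it starts with an asterisk.
*FREQUENCIES VARIABLES=var1.

* With the leading asterisk deleted, the same line runs as a command again.
FREQUENCIES VARIABLES=var1.
```

For a commented-out block, the asterisk in front of each command in the block has to be removed.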
