Hardware recommendations?

Hardware recommendations?

Charley Trimble
The data files I analyze start out large and grow larger still as I create variables and run various types of analysis.  As a consequence, and as my c: drive becomes increasingly crowded with saved data and miscellaneous files, processing gets slower and slower.  I try to keep my c: drive relatively clear and use an external hard drive for storage.  I also use the temp file option when processing off the external hard drive, but processing still seems to keep getting slower.
Currently, I use the SPSS 15 and SPSS 16 desktop versions on an Intel dual-core processor (2.33 GHz, 667 MHz) with 4.0 GB SDRAM and a 160 GB hard drive.
I'm contemplating a more powerful desktop, but before I dip into my pocketbook, I was wondering if an SPSS user could recommend a hardware setup that will give me faster processing times.  I've not used a server version, but I'm open to whatever the best recommendation is for processing bigger and bigger files.  Thanks.
[hidden email]

====================To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Hardware recommendations?

Albert-Jan Roskam
Hi,

Your hardware configuration doesn't sound too shabby to me at all. Of course, you can always buy faster, bigger, better hardware, but still. I also work routinely with huge files, and one way to speed things up is to develop your syntax on a small random sample, using the SAMPLE command in conjunction with the SET SEED command. If your file is not sorted in a particular way, N OF CASES is also an option. SET SEED allows you to reproduce the random sample, and you can use the Mersenne Twister algorithm for sampling:
SET SEED = 4321 RNG = MT.
SAMPLE .01.
In addition, you can increase the workspace:
SHOW WORKSPACE.
SET WORKSPACE 20000.
You'll get error messages when you set it too high.

After you've tuned/debugged your syntax, you can run it on the entire dataset. Meanwhile, you may have to grab a cup of coffee --not a punishment in my eyes ;-)
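
A minimal sketch of that develop-on-a-sample workflow follows. The file path and variable names are hypothetical. It also rests on one assumption worth checking against the Command Syntax Reference for your release: as far as I know, the Mersenne Twister generator takes its starting value from MTINDEX, while SEED governs the older generator, so the sketch uses MTINDEX.

```spss
* Hypothetical file and variable names, for illustration only.
GET FILE='C:\data\big_file.sav'.

* Assumption: MTINDEX sets the starting value when RNG is MT.
SET RNG=MT MTINDEX=4321.

* TEMPORARY limits the 1 percent sample to the next procedure only.
TEMPORARY.
SAMPLE .01.
FREQUENCIES VARIABLES=region.

* This procedure runs on the full file again.
DESCRIPTIVES VARIABLES=income.
```

Using TEMPORARY means the full file stays in place, so the same syntax can be rerun on all cases simply by deleting the TEMPORARY and SAMPLE lines.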

More generally, it saves a lot of time to cut down on data passes. One way to do that is to use EXECUTE sparingly (see SPSS Programming and Data Management by Raynald Levesque, freely downloadable; see also this list).
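
To illustrate the point about data passes, here is a small sketch with hypothetical variable names. Transformations such as COMPUTE stay pending until something reads the data, so an EXECUTE after each one adds a pass that the following procedure would have made anyway.

```spss
* Hypothetical variable names, for illustration only.
* No EXECUTE is needed between these transformations.
COMPUTE bmi = weight / (height ** 2).
COMPUTE overweight = (bmi > 25).

* Both COMPUTEs are carried out during the single pass made here.
FREQUENCIES VARIABLES=overweight.
```

Written with an EXECUTE after each COMPUTE, the same job would read the whole file three times instead of once.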

Cheers!!
Albert-Jan


Re: Hardware recommendations?

Han, Joanne
Your hardware configuration looks fine.  What graphics card do you have?
I have been shopping for a laptop/desktop, and the salesperson mentioned
that many analysis/modeling packages have begun to rely more
heavily on graphics cards to do the computing.  They do this in order to
reduce the processing time for very complex computations.

Good luck!

Joanne


Re: Hardware recommendations?

Richard Ristow
In reply to this post by Charley Trimble
At 01:01 PM 7/24/2008, Charley Trimble wrote:

>It seems that the type of data files I analyze
>become bigger at the outset and then become even
>bigger as I create variables and run various
>types of analysis.  As a consequence and as my
>c: drive becomes increasingly stacked with my
>saved data or miscellaneous files, the
>processing time becomes slower and slower.  I
>keep my c: drive relatively clear and use an
>external hard drive for that purpose.
>
>I rely upon an Intel dual core processor, 2.33 GHz, 667 MHz and 4.0GB SDRAM

Those are probably much more than adequate for
your work. I doubt you'd see any improvement from an upgrade.

>... and a 160GB hard drive. As my c: drive
>becomes increasingly stacked with my saved data
>or miscellaneous files, the processing time
>becomes slower and slower.  I keep my c: drive
>relatively clear and use an external hard drive for that purpose.

In most SPSS runs, most of the time is spent
reading from disk, or writing to disk. That's probably true of yours.

Always look first at changing the code and logic to reduce disk traffic.

If you use EXECUTE (exe.) statements, remove all
except those very few that are needed. (See "Use
EXECUTE Sparingly" in any edition of Levesque,
Raynald, "SPSS® Programming and Data Management,
A Guide for SPSS® and SAS® Users". SPSS, Inc.,
Chicago, IL, various dates. It is downloadable
free from the SPSS, Inc., Web site.)

Then, it's a good idea to use CACHE after any
transformation program that drops most of the
variables or most of the cases, and to create
analysis files with only the variables needed for the analysis.
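
As a rough sketch of that advice (the paths, variable names and selection condition are all hypothetical): read only the variables you need, cut the file down, then CACHE so that later procedures read the small working copy rather than the original file.

```spss
* Hypothetical paths and variable names, for illustration only.
GET FILE='C:\data\survey_full.sav'
  /KEEP=id region age income.

* Drop most of the cases, then ask for a cached copy of what is left.
SELECT IF (region = 3).
CACHE.

* The cache is built during this pass and reused by the next one.
DESCRIPTIVES VARIABLES=age income.
FREQUENCIES VARIABLES=age.

* Save a slim analysis file holding only what the analysis needs.
SAVE OUTFILE='C:\data\survey_region3.sav'
  /KEEP=id age income.
```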

Beyond that, one would have to look at your code.

>Also, I use the temp file option when processing
>off of the external hard drive

That may be a problem. Check the data transfer
rate for the external drive; it may be much slower than for the internal drive.

>my c: drive becomes increasingly stacked with my
>saved data or miscellaneous files

Another warning sign. It's wise, simply on
general principles, to have free disk space of
several times the total needed for your data.
Among other things, when space is tight, disk
fragmentation will get worse more rapidly. (You could try de-fragmenting.)

But the hardware improvements that are likely to help are:

. A larger c:\ drive, ideally one with the highest possible transfer rate

. A second internal hard drive, also with high
transfer rate; and then adjusting your logic so
that, where possible, data is being read from one
of the drives and written to the other.
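
The two-drive arrangement might look something like this in syntax. The drive letters, paths and variable names are hypothetical, and D: is assumed to be the second internal drive.

```spss
* Hypothetical drive letters, paths and variables, for illustration only.
* Read from the first internal drive.
GET FILE='C:\data\survey_full.sav'.
COMPUTE log_income = LN(income).
SORT CASES BY id.

* Write to the second internal drive.
SAVE OUTFILE='D:\work\survey_sorted.sav'.
```

The idea is that while cases stream through the COMPUTE, one disk is only being read and the other only written.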

-Best of luck to you,
  Richard


Re: Hardware recommendations?

Marta Garcia-Granero
Just a couple of comments:

- There is a nice freeware utility called ATF-cleaner (easy to find; if
you can't, I can send it to you, it's 50 KB) that cleans the drive of all
garbage and temporary files. Follow the cleaning with a good
defragmentation (I have that scheduled for Fridays) and disk speed will
increase a lot.
- Set the temporary folder used by SPSS on a different drive from the
Windows swap file. This works like a charm for Adobe Photoshop when
handling very big picture files (in raw format).

Best regards,
Marta

--
For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/
