Dear listserv members,

I'm working on a series of datasets, each with only about 500 variables and 1,000 cases. The files are all unusually large for that content (36 MB to 40 MB), and when I try to process them (get the file, merge the files, run restructuring), they are so slow that my computer can't finish, and they seem to consume all of my processing memory (3 GB). I don't think it's simply a file-size problem, because I handle much larger datasets (from a different study) and those usually take only seconds. Could someone help me understand what's wrong with these files, and whether there are ways to make them workable?

Thanks!
Kei

Kei Kawashima-Ginsberg, Ph.D.
Lead Researcher
The Center for Information and Research on Civic Learning and Engagement
Jonathan M. Tisch College of Citizenship and Public Service
Tufts University
617-627-2529
www.civicyouth.org
Kei,

I have some general questions that I think the other listers would like answered as well before we can help (unless someone has already come across this and it's a known issue!). There are lots of reasons why this problem might occur. Can you answer the following?

1. Windows or Mac?
2. How many files are you merging?
3. What version of SPSS are you using, and are you patched up to the newest version?
4. Are you restructuring/merging via syntax? If so, can you copy/paste it so we can see?
5. Is SPSS reporting any errors to you, or is it just crashing?
6. Are the files you are trying to merge native to SPSS (i.e., .sav), or are they in different formats (e.g., .txt or .csv)? This shouldn't be a problem, but every bit of new information will help!
7. How do you know it is taking all of your PC's 3 GB of memory? If you are on Windows, are you watching memory usage through Task Manager, or was this a generalization based on your PC slowing down?

Thanks,

J. R. Carroll
Grad Student in Pre-Doc Psychology at CSUS
Research Assistant for Just About Everyone.
Email: [hidden email] -or- [hidden email]
Phone: (916) 628-4204
In reply to this post by Kawashima-Ginsberg, Kei
Two possibilities come to mind. First, might this file have a huge collection of documents in it? These carry over through merges, so sometimes they build up. You can use Utilities > Data File Comments to see what is there and the DROP DOCUMENTS command to get rid of them.
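For example, something along these lines (a minimal sketch; the output path here is made up):

* List any documents attached to the active file.
DISPLAY DOCUMENTS.
* Delete them, then re-save so the file on disk shrinks.
DROP DOCUMENTS.
SAVE OUTFILE='C:\temp\w4 MINI file2 nodocs.sav'.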
Another possibility is that you have a huge collection of value labels, even if they are not being used; a sketch for checking and clearing those follows below.

HTH,
Jon Peck
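If the labels do turn out to be the bloat, something like this could strip them (a sketch only, worth testing on a copy of the file first; as I recall, naming variables on VALUE LABELS with no label pairs erases their existing labels, and the variable names here are the ones from Kei's syntax, the save path made up):

* See which variables carry large sets of value labels.
DISPLAY DICTIONARY.
* Wipe the value labels on the listed variables, then re-save.
VALUE LABELS Dutyw4 Neighborw4 Skillsw4 Participatew4 AECw4.
SAVE OUTFILE='C:\temp\w4 MINI file2 nolabels.sav'.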
In reply to this post by J. R. Carroll
Thank you for your responses. I will try dropping the data documents (I have not tried that yet), and it is true that there are a good number of value labels, though not an unusual number. I answer Justin's questions below. I really appreciate your help, and I'm sorry I didn't specify these details earlier.

Kei

From: Justin Carroll [mailto:[hidden email]]

1. Windows or Mac? Windows XP.

2. How many files are you merging? Five in total, and I've cut each down to only about 60 variables, which did not reduce the file size by very much.

3. What version of SPSS are you using, and are you patched up to the newest version? PASW 18, and I think I'm up to date in all areas, as our university's IT department takes care of that.

4. Are you restructuring/merging via syntax? They are a little long, so I'll abbreviate some, but here's the syntax I used to cut down on the number of variables:

SAVE OUTFILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w4 MINI file2.sav'
  /KEEP FamID
  w4yid…
  w4ysex
  In4Hw4
  w4ySingPar
  w4yGradeOK
  Dutyw4
  Neighborw4
  Skillsw4
  Participatew4
  AECw4.

I did this for w4, w5, w6, and w7, all containing pretty much the same variables. Merge syntax here:

MATCH FILES /FILE=*
  /FILE='DataSet8'
  /BY FamID.
EXECUTE.

VARSTOCASES
  /MAKE Duty FROM Dutyw4 Dutyw5 Dutyw6 Dutyw7
  /MAKE Neighbor FROM Neighborw4 Neighborw5 Neighborw6 Neighborw7
  /MAKE Skills FROM Skillsw4 Skillsw5 Skillsw6 Skillsw7
  /MAKE Participate FROM Participatew4 Participatew5 Participatew6 Participatew7
  /MAKE AEC FROM AECw4 AECw5 AECw6 AECw7
  /INDEX=Index1(4)
  /KEEP=FamID
  /NULL=KEEP
  /COUNT=Ntime "Number of timepoints".

5. Is SPSS reporting any errors, or is it just crashing? It will always try to run the commands, and in some cases it did work, after about half an hour of running the MATCH FILES procedure. The only error I've received is "server time out." I also tried running these commands on older versions of these datasets (their content is not updated, so I can't use them in the analysis), which are much smaller (more like 2 MB each), and the commands run just fine.

6. Are the files native to SPSS, or in different formats? I'm not the owner of the data, but I believe it has always been in SPSS, across many versions.

7. How do I know it is taking all 3 GB of memory? I'm looking at Task Manager's meter, and it said over 90% of the memory was in use.

Thanks so much; any help would be really helpful.
Kei
In reply to this post by Kawashima-Ginsberg, Kei
Kei,
You appear to be going through all sorts of pyrotechnics to SAVE these files after RENAMING variables, then MATCHING them (in a rather inefficient way). You then splatter the matched WIDE data into LONG format with VARSTOCASES. It REALLY looks like you need to use ADD FILES rather than MATCH FILES; verify that the variable names in the respective files match up. You are not, by chance, terribly low on disk space?

ADD FILES /FILE=FILE1 /FILE=FILE2 ... /FILE=FileN /BY FamID.
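Fleshed out, that might look something like this (a sketch only: the w5 through w7 file names are guesses patterned on the w4 path in your syntax, each input file must already be sorted by FamID for /BY to work, and wave-specific extras such as w4ysex would need their own RENAME or a /KEEP; this shows just the five repeated measures):

* Stack the four wave files into LONG format directly.
* RENAME maps each wave's suffixed variables onto common names.
* The IN flags record which file each case came from.
ADD FILES
  /FILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w4 MINI file2.sav'
  /RENAME=(Dutyw4 Neighborw4 Skillsw4 Participatew4 AECw4
           = Duty Neighbor Skills Participate AEC) /IN=w4
  /FILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w5 MINI file2.sav'
  /RENAME=(Dutyw5 Neighborw5 Skillsw5 Participatew5 AECw5
           = Duty Neighbor Skills Participate AEC) /IN=w5
  /FILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w6 MINI file2.sav'
  /RENAME=(Dutyw6 Neighborw6 Skillsw6 Participatew6 AECw6
           = Duty Neighbor Skills Participate AEC) /IN=w6
  /FILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w7 MINI file2.sav'
  /RENAME=(Dutyw7 Neighborw7 Skillsw7 Participatew7 AECw7
           = Duty Neighbor Skills Participate AEC) /IN=w7
  /BY FamID.
* Exactly one IN flag is 1 per case, so this recovers the wave number.
COMPUTE wave = 4*w4 + 5*w5 + 6*w6 + 7*w7.
EXECUTE.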
HTH,
David
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services, please feel free to email me.
---
"Do not give what is holy to dogs, nor cast your pearls before swine, lest they trample them under their feet. And when the spirits of the damned possessed the swine, did they go leaping off the bloody cliff into the abyss?"