Dear listserv members,

I'm working on a series of datasets, each with only about 500 variables and 1,000 cases. The files are all unusually large for that content (36 MB to 40 MB), and when I try to process them (get the file, merge the files, run restructuring), they are so slow that my computer can't finish, and they seem to consume all of my processing memory (3 GB). I don't think it's simply a file-size problem, because I handle much larger datasets (from a different study) and those usually take only seconds. Could someone help me understand what's wrong with these files, and whether there are ways to make them workable?

Thanks!
Kei

Kei Kawashima-Ginsberg, Ph.D.
Lead Researcher
The Center for Information and Research on Civic Learning and Engagement
Jonathan M. Tisch College of Citizenship and Public Service
Tufts University
617-627-2529
www.civicyouth.org
Kei,

I have some general questions that I think the other listers would like answered as well before we can help (unless someone has already come across this and it's a known issue!). There are lots of reasons why this problem might occur. Can you answer the following?

1. Windows or Mac?
2. How many files are you merging?
3. What version of SPSS are you using, and are you patched up to the newest version?
4. Are you restructuring/merging via syntax? If so, can you copy/paste it so we can see?
5. Is SPSS reporting any errors to you, or is it just crashing?
6. Are the files you are trying to merge native to SPSS (i.e., .sav), or are they in different formats (e.g., .txt or .csv)? This shouldn't be a problem, but every bit of new information will help!
7. How do you know it is taking all of your PC's 3 GB of memory? If you are on Windows, are you watching memory usage through Task Manager, or was this a generalization based on your PC slowing down?

Thanks,

J. R. Carroll
Grad Student in Pre-Doc Psychology at CSUS
Research Assistant for Just About Everyone.
Email: [hidden email] -or- [hidden email]
Phone: (916) 628-4204
In reply to this post by Kawashima-Ginsberg, Kei
Two possibilities come to mind. First, might this file have a huge collection of documents in it? These carry over through merges, so sometimes they build up. You can use Utilities > Data File Comments to see what is there and the DROP DOCUMENTS command to get rid of them.
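For example, something along these lines (a minimal sketch; the output path here is made up):

* List any documents attached to the active file.
DISPLAY DOCUMENTS.
* Delete them, then re-save so the file on disk shrinks.
DROP DOCUMENTS.
SAVE OUTFILE='C:\temp\w4 MINI file2 nodocs.sav'.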
Another possibility is that you have a huge collection of value labels, even if they are not being used; a sketch for checking and clearing those follows below.

HTH,
Jon Peck
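If the labels do turn out to be the bloat, something like this could strip them (a sketch only, worth testing on a copy of the file first; as I recall, naming variables on VALUE LABELS with no label pairs erases their existing labels, and the variable names here are the ones from Kei's syntax, the save path made up):

* See which variables carry large sets of value labels.
DISPLAY DICTIONARY.
* Wipe the value labels on the listed variables, then re-save.
VALUE LABELS Dutyw4 Neighborw4 Skillsw4 Participatew4 AECw4.
SAVE OUTFILE='C:\temp\w4 MINI file2 nolabels.sav'.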
In reply to this post by J. R. Carroll
Thank you for your responses. I will try dropping the data documents (I have not tried that yet), and it is true that there are a good number of value labels, though not an unusual number. I answer Justin's questions below. I really appreciate your help, and I'm sorry I didn't specify these details earlier.

Kei

From: Justin Carroll [mailto:[hidden email]]

1. Windows or Mac? Windows XP.

2. How many files are you merging? Five in total, and I've cut each down to only about 60 variables, which did not reduce the file size by very much.

3. What version of SPSS are you using, and are you patched up to the newest version? PASW 18, and I think I'm up to date in all areas, as our university's IT department takes care of that.

4. Are you restructuring/merging via syntax? They are a little long, so I'll abbreviate some, but here's the syntax I used to cut down on the number of variables:

SAVE OUTFILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w4 MINI file2.sav'
  /KEEP FamID
  w4yid…
  w4ysex
  In4Hw4
  w4ySingPar
  w4yGradeOK
  Dutyw4
  Neighborw4
  Skillsw4
  Participatew4
  AECw4.

I did this for w4, w5, w6, and w7, all containing pretty much the same variables. Merge syntax here:

MATCH FILES /FILE=*
  /FILE='DataSet8'
  /BY FamID.
EXECUTE.

VARSTOCASES
  /MAKE Duty FROM Dutyw4 Dutyw5 Dutyw6 Dutyw7
  /MAKE Neighbor FROM Neighborw4 Neighborw5 Neighborw6 Neighborw7
  /MAKE Skills FROM Skillsw4 Skillsw5 Skillsw6 Skillsw7
  /MAKE Participate FROM Participatew4 Participatew5 Participatew6 Participatew7
  /MAKE AEC FROM AECw4 AECw5 AECw6 AECw7
  /INDEX=Index1(4)
  /KEEP=FamID
  /NULL=KEEP
  /COUNT=Ntime "Number of timepoints".

5. Is SPSS reporting any errors, or is it just crashing? It will always try to run the commands, and in some cases it did work, after about half an hour of running the MATCH FILES procedure. The only error I've received is "server time out." I also tried running these commands on older versions of these datasets (their content is not updated, so I can't use them in the analysis), which are much smaller (more like 2 MB each), and the commands run just fine.

6. Are the files native to SPSS, or in different formats? I'm not the owner of the data, but I believe it has always been in SPSS, across many versions.

7. How do I know it is taking all 3 GB of memory? I'm looking at Task Manager's meter, and it said over 90% of the memory was in use.

Thanks so much; any help would be really helpful.
Kei
In reply to this post by Kawashima-Ginsberg, Kei
Kei,
You appear to be going through all sorts of pyrotechnics to SAVE these files after RENAMING variables, then MATCHING them (in a rather inefficient way). You then splatter the matched WIDE data into LONG format with VARSTOCASES. It REALLY looks like you need to use ADD FILES rather than MATCH FILES; verify that the variable names in the respective files match up. You are not, by chance, terribly low on disk space?

ADD FILES /FILE=FILE1 /FILE=FILE2 ... /FILE=FileN /BY FamID.
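Fleshed out, that might look something like this (a sketch only: the w5 through w7 file names are guesses patterned on the w4 path in your syntax, each input file must already be sorted by FamID for /BY to work, and wave-specific extras such as w4ysex would need their own RENAME or a /KEEP; this shows just the five repeated measures):

* Stack the four wave files into LONG format directly.
* RENAME maps each wave's suffixed variables onto common names.
* The IN flags record which file each case came from.
ADD FILES
  /FILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w4 MINI file2.sav'
  /RENAME=(Dutyw4 Neighborw4 Skillsw4 Participatew4 AECw4
           = Duty Neighbor Skills Participate AEC) /IN=w4
  /FILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w5 MINI file2.sav'
  /RENAME=(Dutyw5 Neighborw5 Skillsw5 Participatew5 AECw5
           = Duty Neighbor Skills Participate AEC) /IN=w5
  /FILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w6 MINI file2.sav'
  /RENAME=(Dutyw6 Neighborw6 Skillsw6 Participatew6 AECw6
           = Duty Neighbor Skills Participate AEC) /IN=w6
  /FILE='C:\Documents and Settings\kkawas01\Desktop\HLM paper\minifiles\w7 MINI file2.sav'
  /RENAME=(Dutyw7 Neighborw7 Skillsw7 Participatew7 AECw7
           = Duty Neighbor Skills Participate AEC) /IN=w7
  /BY FamID.
* Exactly one IN flag is 1 per case, so this recovers the wave number.
COMPUTE wave = 4*w4 + 5*w5 + 6*w6 + 7*w7.
EXECUTE.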
HTH,
David
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services, please feel free to email me.
---
"Do not give what is holy to dogs, nor cast your pearls before swine, lest they trample them under their feet. And when the spirits of the damned possessed the swine, did they go leaping off the bloody cliff into the abyss?"