Hi, I'm running VARSTOCASES to collapse over 12,000 variables into 100 variables, starting from around 6,000 cases, with the file size approaching 100MB. Is it typical for such an operation to take around 20 minutes?
What's interesting is that the processor/elapsed time in the Notes table of the VARSTOCASES output indicates only a couple of seconds. Why does SPSS freeze for 20 minutes and eventually become active again, having completed a command which, according to itself, took only 2 seconds?
Environment: Intel Core i5 M 520 CPU, 2.4 GHz; 64-bit Windows 7 Pro; 32-bit SPSS (there is a reason why I'm using 32-bit SPSS on 64-bit Windows).
Any insight much appreciated, many thanks in advance. Jignesh
(I don't see any response to this one, so here are my guesses.)
You are reading and writing 100MB, so you have disk I/O time that is not negligible. I could easily expect, say, 1 second per MB, or something up to 4 minutes, just for the I/O time. Disk transfer times can be faster than that for the internal drive, but I don't know what speed SPSS will provide in the ideal condition. I've never timed it, and I don't remember what numbers other people have reported.

I don't know how well SPSS handles the 12,000-variable record, either, or if that makes any difference. Ordinarily, I would expect that the workspace for I/O would expand so that there is no internal paging during processing. That is an aspect that conceivably could be affected by manipulating "workspace". I think.

The fact that you report that SPSS says it is only taking 2 seconds of CPU time is another piece of evidence that the delays are entirely in disk I/O time. If SPSS were reporting high CPU, the task could be slower because of competition for CPU with other tasks.

What else determines I/O time? A fragmented disk will be slower, especially if it is nearly full. (Or, again, competing with another task could slow a job.) If you are using an external drive, that will be slower. If you are using a disk across a network, you should consider network delays as the primary candidate.

-- Rich Ulrich

In reply to: "Processor/elapsed time", Sat, 19 May 2012 15:49:03 +0100
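Rich's back-of-envelope I/O figure can be written out as a tiny calculation. This is only a sketch of that arithmetic: the 1 second per MB rate is his guess rather than a measured SPSS number, and the function name and the assumption of one read pass plus one write pass are illustrative.

```python
def io_seconds(file_mb: float, seconds_per_mb: float = 1.0, passes: int = 2) -> float:
    """Rough transfer time, assuming the file is read once and written once."""
    return file_mb * seconds_per_mb * passes

estimate = io_seconds(100)  # 100 MB read + 100 MB written at the guessed rate
print(f"{estimate:.0f} s = {estimate / 60:.1f} min")  # 200 s = 3.3 min
```

Even with this pessimistic rate the estimate stays well under the 20 minutes reported, which suggests plain disk transfer alone would not account for the whole wait.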
Hi,
Have you tried running this in the Production Facility? That saves a lot of the overhead (GUI/Data viewer), which is especially relevant when you have 12000 (!) vars.
Is there no way to change the file format further upstream in the process?
Regards,
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Administrator
In reply to this post by Jignesh Sutar
Hard to say, but FWIW: what does the VARSTOCASES syntax look like? It is probably pretty long and ugly?
This restructure (OLD SCHOOL) runs in just a couple of minutes on my old Mac with SPSS 11.5 Win Vista:

*---*.
* Simulate raw data *.
INPUT PROGRAM.
LOOP CASEID=1 TO 6000.
DO REPEAT v=v1 to v12000.
compute v=trunc(uniform(100)).
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXE.
**Restructure (OLD SCHOOL) .
* Each wide case becomes 120 output rows of 100 values and XSAVE writes one row per pass of the outer loop.
NUMERIC new1 TO New100 (F4.0).
VECTOR Vraw=V1 TO V12000.
VECTOR Vnew=new1 TO New100.
LOOP raw=1 TO 120.
+ LOOP Vi=1 TO 100.
+ COMPUTE VNew(Vi)=VRaw((raw-1)*100+Vi).
+ END LOOP.
+ XSAVE OUTFILE 'C:\TEMP\RESTR.sav' / KEEP CASEID raw New1 TO New100.
END LOOP.
EXECUTE.
GET FILE 'C:\TEMP\RESTR.sav' .
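The index arithmetic in the VECTOR/LOOP restructure, VRaw((raw-1)*100+Vi), is the part that takes some care to get right. Below is a minimal Python sketch of the same mapping on a toy case. The function name `restructure` and the toy data are illustrative only, and the `- 1` is needed because Python lists are 0-based while SPSS vectors are 1-based.

```python
def restructure(case_values, width=100):
    """Split one wide case into rows of `width` values.

    Mirrors VNew(Vi) = VRaw((raw-1)*width + Vi) with 1-based raw and Vi.
    """
    n_rows = len(case_values) // width
    rows = []
    for raw in range(1, n_rows + 1):
        new = [case_values[(raw - 1) * width + vi - 1]  # -1: 0-based list index
               for vi in range(1, width + 1)]
        rows.append((raw, new))
    return rows

# Toy case: 6 "variables" holding their own 1-based position, 3 per output row.
print(restructure([1, 2, 3, 4, 5, 6], width=3))
# [(1, [1, 2, 3]), (2, [4, 5, 6])]
```

With 12,000 values and the default width of 100, the same function yields 120 rows per case, matching the LOOP raw=1 TO 120 bound in the syntax.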
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
I used a macro to build out what I imagine the garish V2C syntax would be (if your sets are not contiguous then it would be beyond hideous):
GET FILE 'C:\Temp\data6000x12000.sav'.
VARSTOCASES /ID = id
 / MAKE TRANS1 FROM v001 TO v100
 / MAKE TRANS2 FROM v101 TO v200
 / MAKE TRANS3 FROM v201 TO v300
 / MAKE TRANS4 FROM v301 TO v400
 / MAKE TRANS5 FROM v401 TO v500
 / MAKE TRANS6 FROM v501 TO v600
 .................................
 ...............................................................
 ...............................................................yawn........
 / MAKE TRANS118 FROM v11701 TO v11800
 / MAKE TRANS119 FROM v11801 TO v11900
 / MAKE TRANS120 FROM v11901 TO v12000
 /KEEP CASEID.

and it took about 13 minutes to run. The elapsed time from the Notes table said 1:21 (liar liar pants on fire...).
I ran the VECTOR/LOOP/XSAVE/EXE (VLXE) and it came back in 2:52...
Drum roll.... An appropriately partitioned MATRIX program using the RESHAPE operator returns in 1:58 -ALONG with the bookkeeping code- (see below).

Conclusions?
VARSTOCASES is an inefficient dog requiring obscenely verbose syntax.
VLXE is nice but requires some skill to figure out how to map input to output indexes (NO LIMIT on number of cases).
MATRIX is da bomb!! but requires more insight into memory/sample size trade-offs and may be impractical for very long files. It can typically be partitioned for wide files, and the code is compact, intuitive and efficient.

**I attempted to build code which would pull 6000x12000 in one go, but SPSS could not allocate 72,000,000 locations despite my juggling the WORKSPACE setting. This might not be an issue in 64 bit but is a showstopper in 32 bit windoze/32 bit SPSS.
---
* Index file built in the row order of the stacked halves.
* All cases for v001-v6000 (VIndex 1-60) come first, then all cases for v6001-v12000 (VIndex 61-120).
INPUT PROGRAM.
+ LOOP #Half=1 TO 2.
+ LOOP CASEID=1 TO 6000.
+ LOOP VIndex=(#Half-1)*60+1 TO #Half*60.
+ LEAVE CASEID.
+ END CASE.
+ END LOOP.
+ END LOOP.
+ END LOOP.
+ END FILE.
END INPUT PROGRAM.
SAVE OUTFILE "C:\temp\indexes.sav".
GET FILE 'C:\Temp\data6000x12000.sav'.
PRESERVE.
SET WORKSPACE 500000.
MATRIX.
GET DATA /FILE * / VARIABLES v001 to v6000.
SAVE (RESHAPE(data, ncol(data)*nrow(data)/100,100)) / OUTFILE "C:\temp\data1.sav".
GET DATA /FILE * / VARIABLES v6001 to v12000.
SAVE (RESHAPE(data, ncol(data)*nrow(data)/100,100)) / OUTFILE *.
END MATRIX.
RESTORE.
ADD FILES / FILE "C:\temp\data1.sav" / FILE *.
MATCH FILES FILE "C:\temp\indexes.sav" / FILE * .
EXE.
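The bookkeeping for the partitioned MATRIX approach is easier to see on toy data. The sketch below assumes that RESHAPE fills the new matrix in row order, which is what the approach relies on. It reshapes two column partitions separately, stacks them the way ADD FILES would, and prints the (case, chunk) labels that line up with each stacked row. The function names and the 2 x 8 toy data are made up for illustration.

```python
def reshape_by_row(matrix, ncols):
    """Flatten `matrix` row by row, then cut the stream into rows of `ncols`."""
    flat = [x for row in matrix for x in row]
    return [flat[i:i + ncols] for i in range(0, len(flat), ncols)]

def stacked_labels(n_cases, chunks_per_part, n_parts):
    """(case, chunk) labels in the order produced by reshaping each column
    partition separately and then stacking the results."""
    return [(case, part * chunks_per_part + chunk)
            for part in range(n_parts)
            for case in range(1, n_cases + 1)
            for chunk in range(1, chunks_per_part + 1)]

# Toy data: 2 cases x 8 variables, value = case*100 + variable number.
data = [[c * 100 + v for v in range(1, 9)] for c in (1, 2)]
left = [row[:4] for row in data]    # first column partition
right = [row[4:] for row in data]   # second column partition
stacked = reshape_by_row(left, 2) + reshape_by_row(right, 2)
for label, row in zip(stacked_labels(2, 2, 2), stacked):
    print(label, row)
```

Because the stacked file lists every case for the first partition before any row of the second, the CASEID/VIndex labels have to be generated in that same order for a positional MATCH FILES to line up. A later sort by case and chunk restores the case-by-case order if that is wanted.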
Hi All, Thanks for the responses. Thanks Rich for getting the ball rolling. I've just noticed something which might be of interest to some and may shed some light, or perhaps add further confusion.
I'm continuing to run the same VARSTOCASES process, which takes 20-odd minutes. I have a SAVE OUTFILE command after the V2C, and what I am noticing is that when running the entire syntax it actually saves the final restructured file almost instantly, even while SPSS is still showing "Running....VARSTOCASES" in the status bar and hangs for half an hour. And this file, which is saved instantly, does not seem to be an intermediate file: it's the same file size after 2 seconds as it is when SPSS recovers and becomes active again half an hour later.
I have tested running the same procedure using the Production Facility and Python (back end), and both result in the procedure being completed in seconds. Thanks, Jignesh
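The gap between a reported processor time of a few seconds and a much longer wait can be reproduced in miniature. The sketch below is only an analogy: whether SPSS measures the Notes table times this way is an assumption, and the sleep is a stand-in for whatever the front end is waiting on. It shows that time spent waiting counts toward wall-clock time but hardly at all toward processor time.

```python
import time

def timed(task):
    """Return (elapsed wall-clock seconds, processor seconds) for calling task()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    task()
    return time.perf_counter() - wall0, time.process_time() - cpu0

def mostly_waiting():
    sum(range(100_000))   # a little real computation
    time.sleep(0.5)       # stand-in for waiting on disk, network, or a GUI refresh

wall, cpu = timed(mostly_waiting)
print(f"elapsed {wall:.2f} s, processor {cpu:.2f} s")
```

On this reading, the back end finishing in seconds while the window stays frozen would be consistent with the wait happening outside the procedure's own processing.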
On 31 May 2012 17:29, David Marso <[hidden email]> wrote: I used a macro to build out what I what imagine the garish V2C syntax would