Dividing file into 10,000 case chunks


Re: Dividing file into 10,000 case chunks

Kirill Orlov
David,
I tested your updated macro, which now inserts an EXECUTE after every batch of 64 XSAVEs. Admittedly that is not a very elegant solution, but it seems to be the only way around the [curious] limit of "64 XSAVEs at most" in a single transformation sequence.

I ran into some problems with the macro, so I modified it slightly to get it running on my version 20.
Please note that my own style is to omit periods at the end of macro command lines (!LET, !IF, !DO and so on). Some time ago, back around version 15, Ray Levesque proved to me that those periods are not only unnecessary but also slow down macro expansion somewhat. Without the periods, one should not put pseudo-indentation symbols (+ or -) before the macro command lines.

SET MPRINT ON PRINTBACK ON.

DEFINE BreakOut (NBreak !TOKENS(1) /BlockSize !TOKENS(1))
COMPUTE ID= $CASENUM.
!LET !L= !null
!DO !I= 1 !TO !NBreak
-DO IF RANGE(ID, (!I-1)*!BlockSize +1,!I*!BlockSize ).
- XSAVE OUTFILE !QUOTE(!CONCAT('D:\TEMP3\junk',!I,'.sav')).
-END IF.
 !LET !L= !CONCAT(!L,"x")
 !IF (!LENGTH(!L) !EQ 64) !THEN !LET !L= !null
- EXECUTE.
 !IFEND
!DOEND
EXECUTE.
!ENDDEFINE.
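To make it easier to see what BreakOut actually produces, here is a minimal Python sketch (not from the thread) that builds the same command stream: one DO IF / XSAVE / END IF block per chunk, an EXECUTE after every 64th XSAVE, and a final EXECUTE. The function name `breakout_syntax` and its parameters are hypothetical, and unlike the macro, which passes expressions such as `(1-1)*10000 +1` to SPSS for evaluation, the sketch computes the range bounds itself.

```python
def breakout_syntax(n_break, block_size, path_prefix="D:\\TEMP3\\junk", max_xsave=64):
    """Hypothetical helper: return the SPSS commands BreakOut would expand to."""
    lines = ["COMPUTE ID= $CASENUM."]
    pending = 0  # XSAVEs issued since the last EXECUTE
    for i in range(1, n_break + 1):
        lo = (i - 1) * block_size + 1  # first case number of chunk i
        hi = i * block_size            # last case number of chunk i
        lines.append(f"DO IF RANGE(ID, {lo}, {hi}).")
        lines.append(f" XSAVE OUTFILE '{path_prefix}{i}.sav'.")
        lines.append("END IF.")
        pending += 1
        # Flush the transformation sequence before the 64-XSAVE limit is exceeded.
        if pending == max_xsave:
            lines.append("EXECUTE.")
            pending = 0
    lines.append("EXECUTE.")  # final pass, as in the macro
    return "\n".join(lines)
```

For `breakout_syntax(100, 10000)` the output contains 100 XSAVEs split into two batches (64 and 36) by two EXECUTEs. If the Python integration plug-in is installed, such a string could in principle be passed to `spss.Submit` inside BEGIN PROGRAM, but that was not tested here.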


There was also a problem with the generated data. As far as I know, at least up to version 20, when you DECLARE a dataset and then write data into it from MATRIX or another procedure, it does not automatically become the active (working) dataset. So you have to activate it before using it, as shown below.

dataset display.
DATASET DECLARE test.
MATRIX.
SAVE UNIFORM(1000000,5) /OUTFILE test /VARIABLES V01 TO V05.
END MATRIX.
dataset display.
dataset activate test.
dataset display.


An alternative (and, for me, better) way is to save the data from MATRIX into the unnamed working dataset (OUTFILE= *), because it automatically becomes the active one.

dataset display.
MATRIX.
SAVE UNIFORM(1000000,5) /OUTFILE= * /VARIABLES V01 TO V05.
END MATRIX.
dataset display.
dataset name test. /*Optional


Finally, here is your macro call; it worked fine.

BreakOut NBreak= 100 BlockSize= 10000.
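As a quick check of why these arguments fit the test data: 1,000,000 cases in blocks of 10,000 need exactly 100 chunks, and 100 XSAVEs need two batches under the 64-XSAVE limit. The hypothetical helper below (not from the thread) sizes a call for an arbitrary case count, assuming chunks of `block_size` consecutive cases with only the last one possibly shorter.

```python
def chunk_plan(n_cases, block_size, max_xsave=64):
    """Hypothetical helper: size a BreakOut call for n_cases cases."""
    n_break = -(-n_cases // block_size)   # ceiling division: chunks (= XSAVEs) needed
    n_batches = -(-n_break // max_xsave)  # groups of at most 64 XSAVEs between EXECUTEs
    last_chunk = n_cases - (n_break - 1) * block_size  # cases in the final file
    return n_break, n_batches, last_chunk
```

For example, `chunk_plan(1000000, 10000)` gives `(100, 2, 10000)`, matching the call above, while 1,234,567 cases would need NBreak=124 with a last file of 4,567 cases.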

Thanks for sharing your solution!


On 10.04.2013 0:31, David Marso wrote:
Yep! 2 more lines of ugly monkey macro poop.  When will it ever learn to add?
Added code in *BOLD*.
--
DATASET DECLARE test.
MATRIX.
SAVE (UNIFORM(1000000,5))/OUTFILE test / VARIABLES V01 TO V05.
END MATRIX.

DEFINE BreakOut (NBreak !TOKENS(1) / BlockSize !TOKENS(1)).
+  COMPUTE ID=$CASENUM.
*+  !LET !L = "".*
+  !DO !I=1 !TO !NBreak .
+    DO IF RANGE(ID, (!I-1)*!BlockSize +1,!I*!BlockSize ).
+      XSAVE OUTFILE !QUOTE(!CONCAT(!UNQUOTE('G:\TEMP3\junk'),!I,'.sav')).
+    END IF.
*+    !LET !L=!CONCAT(!L,"x")!IF ( !LENGTH(!L) !EQ 64 ) !THEN EXECUTE. !LET !L="" !IFEND *
!DOEND .
EXECUTE.
!ENDDEFINE.

SET MPRINT ON PRINTBACK ON.
BreakOut NBreak=100 BlockSize=10000.
