Re: [SPSSX-L] Dividing file into 10,000 case chunks


Art Kendall
No, I did not intend to leave that part of the file spec in there. I was cannibalizing.

In my defense, I did say NCY.

Of course I would not want to routinely copy the text n times (here n = 38).
That is a very old way to do it. I did say I thought there was a Python way to do it. I did mean to say that the example I was posting was what a generalized approach would be doing.

However, on a one-time run, copying and pasting, using a fixed font and vertically eyeballing, is what we had to do many years ago. It would work if you double-checked the syntax and its results.

BTW does the Python method have built-in randomization before splitting into subsets?

If I were actually doing this on a job I would have checked Developer Works, but I thought it was down for a while.

Also, I still wonder why the OP was splitting the file into separate files.  

MAYBE something like the example would do what the OP needed.
MAYBE bootstrapping would be what the OP would want.
MAYBE the OP would not need to randomize before the split.


compute RandomOrder = uniform(2**31).
sort cases by RandomOrder.
compute SubSet = trunc(($casenum - 1)/10000) + 1.
frequencies variables = subset.
split file by subset.
. . .
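
A quick way to check the subset arithmetic outside SPSS is to count how many cases land in each group. The sketch below is plain Python, not SPSS syntax; it assumes the 380,000-case file from the original question, and the function names are hypothetical. It compares dividing the case number directly with first subtracting 1 and then adding 1 back.

```python
from collections import Counter

N_CASES = 380_000   # file size from the original question
CHUNK = 10_000      # desired cases per subset


def subset_plain(casenum):
    # Mirrors trunc($casenum / 10000): case numbers start at 1, not 0.
    return casenum // CHUNK


def subset_offset(casenum):
    # Mirrors trunc(($casenum - 1) / 10000) + 1.
    return (casenum - 1) // CHUNK + 1


plain_sizes = Counter(subset_plain(c) for c in range(1, N_CASES + 1))
offset_sizes = Counter(subset_offset(c) for c in range(1, N_CASES + 1))

# Number of groups, then the sizes of the first and last group.
print(len(plain_sizes), plain_sizes[0], plain_sizes[38])
print(len(offset_sizes), offset_sizes[1], offset_sizes[38])
```

Because $casenum starts at 1, dividing it directly leaves the first group one case short and pushes the last case into an extra group; the offset form gives 38 groups of 10,000.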

Art Kendall
Social Research Consultants
On 4/8/2013 10:30 AM, Jon K Peck wrote:
I don't think Art meant to have those GET commands in there.  But putting that aside, this is an example of how NOT to do a task even though it would work.

It is painful to write all that code, and, worse, the chances of getting it exactly right are not great - boredom will set in long before that many XSAVE commands are written, so careful testing and code review is required.  Finally, it is very specific to these particular numbers, so it doesn't make a good model for a general solution.

Generalization + correctness + pain reduction = Python
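
As a rough illustration of that point, and not the method from the archive, a few lines of Python can generate the whole DO IF / XSAVE block for any number of files. The function name build_xsave_syntax, its parameters, and the path template are hypothetical. The sketch only builds a string; it assumes that inside SPSS with the Python plug-in the string would be handed to spss.Submit, and that WhichFile already holds values 1 through n.

```python
def build_xsave_syntax(n_files, path_template, index_var="WhichFile"):
    """Return SPSS syntax that routes each case to one of n_files via XSAVE.

    path_template must contain one '{}' placeholder for the file number.
    """
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    lines = []
    for i in range(1, n_files + 1):
        # The first branch opens the structure; the rest extend it.
        keyword = "do if" if i == 1 else "else if"
        lines.append(f"{keyword} {index_var} eq {i}.")
        lines.append(f"xsave outfile = '{path_template.format(i)}'.")
    lines.append("end if.")
    # XSAVE is a transformation, so a data pass is needed to write the files.
    lines.append("execute.")
    return "\n".join(lines)


# Hypothetical output location, modelled on the paths used in this thread.
syntax = build_xsave_syntax(38, r"D:\sbec\tea ids\Master ID List subset {}.sav")
print(syntax)
# Inside SPSS (assumption): import spss; spss.Submit(syntax)
```

Changing the number of files or the output folder then means changing one argument rather than editing 38 commands by hand.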


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Art Kendall [hidden email]
To:        [hidden email],
Date:        04/08/2013 08:16 AM
Subject:        Re: [SPSSX-L] Dividing file into 10,000 case chunks
Sent by:        "SPSSX(r) Discussion" [hidden email]




See the archive for writing out separate files. IIRC there is a Python method.
UNTESTED.
Not sure if you need the +1 on the mod.
You may not need to randomize the order of cases.
NCY -- No Coffee Yet.
This is what you would want a macro or Python to do, or you can just write the 38 sets of XSAVEs.

compute RandomOrder = uniform(2**31).
sort cases by RandomOrder.
compute WhichFile = mod($casenum, 38)+1.
do if WhichFile eq 1.
xsave outfile = 'j:Get file='D:\sbec\tea ids\Master ID List subset 1.sav'.
else if WhichFile eq 2.
xsave outfile = 'j:Get file='D:\sbec\tea ids\Master ID List subset 2.sav'.
else if WhichFile eq 3.
xsave outfile = 'j:Get file='D:\sbec\tea ids\Master ID List subset 3.sav'.

. . .

else if WhichFile eq 38.
xsave outfile = 'j:Get file='D:\sbec\tea ids\Master ID List subset 38.sav'.

else.
print /'oops WhichFile is ' WhichFile.
end if.

frequencies variables = WhichFile.


However why are you doing this?  There may be other approaches.

Art Kendall
Social Research Consultants

On 4/7/2013 8:55 PM, Van Overschelde, Jim [via SPSSX Discussion] wrote:
Hey folks,

I have tried for many hours to figure out how to write a macro to divide a 380,000 case file into 38 files with 10,000 cases.
My most recent attempt gives error: "A macro expansion required more storage than was available.  Try running with more memory."
Suggestions for fixing this code or another method that should work would be greatly appreciated!!

Thanks,
Jim

DEFINE !Looper ()
!DO !i=1 !to 38.
Get file='D:\sbec\tea ids\Master ID List.sav'.
dataset name SSNList.
/* Define Ending point.*/
!let !temp=!blanks(0).
!do !cnt=1 !to !i
 !Let !temp=!concat(!temp,!blanks(10000))
!doEnd.
!Let !EndNum=!Length(!temp).
/* Define start point.*/
!Let !j=!length(!substr(!blanks(!temp),9999)).
!Let !StartNum=!length(!concat(!blanks(!j),!blanks(1))).
Select if $casenum<=!StartNum & $casenum >=!EndNum.
SAVE TRANSLATE OUTFILE=!QUOTE(!CONCAT("d:\sbec\tea ids\newIDs\ID",!i,".txt"))
 /TYPE=CSV
 /MAP
 /REPLACE
 /CELLS=VALUES.
!DOEND.
!ENDDEFINE.
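
The start and end case numbers that the macro assembles with !BLANKS and !LENGTH are ordinary arithmetic once a language with numeric operations is available. A minimal sketch in Python, assuming chunk i is meant to run from case (i-1)*10000+1 through case i*10000; the function name chunk_bounds is hypothetical.

```python
def chunk_bounds(n_cases, chunk_size):
    """Return (chunk_number, first_case, last_case) tuples, all 1-based."""
    n_chunks = -(-n_cases // chunk_size)  # ceiling division
    bounds = []
    for i in range(1, n_chunks + 1):
        first = (i - 1) * chunk_size + 1
        # The last chunk may be short if n_cases is not an exact multiple.
        last = min(i * chunk_size, n_cases)
        bounds.append((i, first, last))
    return bounds


bounds = chunk_bounds(380_000, 10_000)
print(bounds[0])    # first chunk
print(bounds[-1])   # last chunk
```

Each tuple gives the lower and upper limit for one chunk, so a selection on $casenum would keep cases at or above the first value and at or below the second.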




