Cut a large file into several smaller files

Carlos Renato
Dear friends,

I need to cut a large file containing 1 million cases into 1000 files of 1000 cases each. I tried the macro below without success, and I am still working on it. Any suggestions? The source file is in SPSS format (.SAV), and the output files must be saved as tab-delimited text.

The file must be cut in order: the first output file holds the first thousand cases, the second holds the next thousand, and so on.

My attempt is below, but it still fails. The problem is that I am struggling to pass two large vectors as macro arguments; I normally use only one such argument, placed last and read through !CMDEND.

DEFINE LimInf()
 1
 1001
 2001
 3001
 4001
 5001
.......
.....
 995001
 996001
!ENDDEFINE.

DEFINE LimSup()
 1000
 2000
 3000
 4000
 5000
........
 ......
 996000
 997000
!ENDDEFINE.


DEFINE CortaArquivos (!POSITIONAL !TOKENS(1)/ !POSITIONAL !TOKENS(1000)/ !POSITIONAL !TOKENS(1000))

!LET !LI = !2.
!LET !LS = !3.

!DO !V1 !IN (!LI)
!DO !V2 !IN (!LS)
FILTER OFF.
USE !V1 thru !V2 /permanent.
EXECUTE.
SAVE TRANSLATE OUTFILE=!1+!QUOTE(!V1)+!QUOTE('_ATÉ_')+!QUOTE(!V2)+".dat"
  /TYPE=TAB
  /MAP
  /REPLACE
  /FIELDNAMES
  /CELLS=VALUES.
!DOEND.
!DOEND.

!ENDDEFINE.

 CortaArquivos 'D:/renato/' LimInf LimSup.

Re: Cut a large file into several smaller files

David Marso
Administrator
Your macro creates *NESTED* loops, so every lower limit gets paired with every upper limit, and passing 1000 tokens in each argument is a truly FUBAR approach!
Try starting with the following and roll from there!
*Example simulated data *.
INPUT PROGRAM.
LOOP ID=1 TO 1000000.
DO REPEAT V=V1 TO V5.
COMPUTE V = NORMAL(1).
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

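* The loop must reach 999001 (not 999000), otherwise the last block of cases (999001 to 1000000) is never written.
DEFINE SLAM().
!DO !I = 1 !TO 999001 !BY 1000.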
TEMPORARY.
SELECT IF RANGE(ID, !I,!I+999).
SAVE TRANSLATE OUTFILE=!QUOTE(!CONCAT("C:\TEMP\TEST",!I,".DAT"))
  /TYPE=TAB  /MAP  /REPLACE  /FIELDNAMES /CELLS=VALUES.
!DOEND
!ENDDEFINE .
SLAM.
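
The simulated data above already carries a sequential ID variable. A real .SAV file may not, and selecting directly on $CASENUM is a known trap, because $CASENUM counts only the cases that have passed the selection so far. A minimal sketch of the preparation step, assuming the source file has no such variable; the file name is hypothetical:

```spss
* Hypothetical path, to be replaced with the real source file.
GET FILE='D:\renato\source.sav'.
* Store the original case order in a permanent variable before selecting on it.
COMPUTE ID = $CASENUM.
EXECUTE.
```

With ID in place, the SELECT IF RANGE(ID, ...) line in the macro can be used unchanged.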


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Cut a large file into several smaller files

Carlos Renato
Dear David Marso

Thank you so much for the approach. I am still weak on loops, but your answer is showing me how to use them. It worked perfectly and beautifully. I will use it as inspiration for other similar jobs.

Carlos Renato
Statistician - Brazil

Re: Cut a large file into several smaller files

Carlos Renato
Dear friends

I generalized the macro sent by David Marso.

INPUT PROGRAM.
LOOP ID=1 TO 1000000.
DO REPEAT V=V1 TO V5.
COMPUTE V = NORMAL(1).
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

DEFINE MacroParticionaDAT (!POSITIONAL !TOKENS(1)/ !POSITIONAL !TOKENS(1)/ !POSITIONAL !TOKENS(1)).
!LET !LI = !1.
!LET !LS = !2.
!LET !P = !3.
!DO !I = !LI !TO !LS !BY !P.
TEMPORARY.
SELECT IF RANGE(ID, !I,!I+!P-1).
SAVE TRANSLATE OUTFILE=!QUOTE(!CONCAT("D:\Minicurso Teresina\Parte II - SPSS Macros\Exemplo Motivacional I\Data\",!I,".DAT"))
  /TYPE=TAB  /MAP  /REPLACE  /FIELDNAMES /CELLS=VALUES.
!DOEND.
!ENDDEFINE .

MacroParticionaDAT 1 1000000 50000.
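
One way to double-check a partition like this is to count how many cases fall in each block. The sketch below assumes that ID runs from 1 upward without gaps and that the block size is 50000, as in the call above; the variable name block_no is hypothetical. It checks the expected split in the active dataset, not the exported files themselves.

```spss
* Block number of each case for blocks of 50000 that start at ID 1.
COMPUTE block_no = TRUNC((ID - 1) / 50000) + 1.
EXECUTE.
* Every block should show 50000 cases if none is lost or duplicated.
FREQUENCIES VARIABLES=block_no.
```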

Carlos Renato
Statistician - Brazil

Re: Cut a large file into several smaller files

Albert-Jan Roskam
!I+!P-1 --> are you sure this works? AFAIK, you can't do arithmetic with macros (unless it's with that hack method using !blanks, etc.)
 
Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From: Carlos Renato <[hidden email]>
To: [hidden email]
Sent: Thursday, November 24, 2011 6:35 PM
Subject: Re: [SPSSX-L] Cut a large file into several smaller files

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Cut-a-large-file-into-several-smaller-files-tp5014618p5020792.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: Cut a large file into several smaller files

David Marso
Administrator
Sure it works ;-)  I wrote it!
As for the line
>SELECT IF RANGE(ID, !I,!I+!P-1).
the macro simply substitutes the literal values, and SPSS itself does the arithmetic when the command runs. It expands to the equivalent of
SELECT IF RANGE(ID, 1,1+50000-1).
or whatever !I and !P are.
--
Certainly, !DO !I=1 !TO !I+!P-1 will experience an epic fail, because MACRO is innocent of any sort of mathematical insight ;-))
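
The literal substitution can be inspected directly with SET MPRINT, which echoes the commands a macro generates. A minimal sketch, assuming an active dataset that contains ID; the macro !ShowRange and the variable in_block are hypothetical and exist only for this demonstration:

```spss
* Hypothetical helper macro used only to display the expansion.
DEFINE !ShowRange (!POSITIONAL !TOKENS(1) / !POSITIONAL !TOKENS(1))
COMPUTE in_block = RANGE(ID, !1, !1 + !2 - 1).
!ENDDEFINE.

* Print the generated commands in the log while the macro runs.
SET MPRINT=ON.
!ShowRange 1 50000.
EXECUTE.
SET MPRINT=OFF.
```

The log should show the COMPUTE command with the numbers substituted for !1 and !2 and the addition and subtraction still written out, since the macro facility leaves that arithmetic to the command itself.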

Re: Cut a large file into several smaller files

Carlos Renato
Dear friends

I decided to include this subtraction because observations were always missing at the end. It has worked well so far: I cut a file of more than one million cases and it worked.

Carlos Renato

Re: Cut a large file into several smaller files

Carlos Renato
Actually, the reason I included this subtraction is that otherwise two consecutive files share a case: the last case of one file would also be the first case of the next file.
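
The overlap comes from RANGE being inclusive at both ends. A small sketch with six toy cases and blocks of 3, so the second block starts at ID 4; the data and the variable names first_ok, first_bad and second are made up for illustration:

```spss
* Six toy cases with blocks of 3, so the second block starts at ID 4.
DATA LIST FREE / ID.
BEGIN DATA
1 2 3 4 5 6
END DATA.
* With the subtraction the first block stops at ID 3.
COMPUTE first_ok = RANGE(ID, 1, 1+3-1).
* Without the subtraction the first block also takes ID 4.
COMPUTE first_bad = RANGE(ID, 1, 1+3).
COMPUTE second = RANGE(ID, 4, 4+3-1).
LIST.
```

For ID 4 both first_bad and second equal 1, which is the shared case, while first_ok equals 0.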