|
I have a question about using INPUT PROGRAM to create a subset of one file, keeping only the rows whose ID also appears in another file. Yes, I know MATCH FILES with the TABLE subcommand can do it, but due to some restrictions I need to do it via INPUT PROGRAM ... DATA LIST .... Any hints are appreciated. Thanks in advance.

Here is my question:
--------------------
I have 2 text files: one is a list of IDs that I will use in the analysis, the other is a transaction file for my entire database. Both files are already sorted by ID. The text files look like the following (the original files are extremely big; these are just to illustrate the idea):

ID_list.txt
--------------
ID
001
002
005

Transaction.txt
---------------------
ID DollAmt
001 14
001 20
001 15
002 20
002 15
003 40
004 10
005 17
005 17
006 18

What I want to create using INPUT PROGRAM and END INPUT PROGRAM:
---------------------------------------------------------------------------------------------
ID DollAmt
001 14
001 20
001 15
002 20
002 15
005 17
005 17

What I want is a subset of Transaction.txt that contains only the rows whose IDs appear in ID_list.txt. However, I would like to do it without first reading either file and saving it as an SPSS-format data file (because of memory and space issues). That is, I would like to use INPUT PROGRAM and END INPUT PROGRAM etc. to do the task instead of the MATCH FILES ... TABLE ... procedure.

Could you please shed some light on this? Many thanks in advance.
|
At 04:45 PM 6/19/2007, chiu2 wrote:
>[I have] a question about using INPUT PROGRAM to create a subset of
>one file, [i.e. those rows whose] ID of the rows appears in another
>file. Yes, I know proc MATCH FILE and TABLES subcommand can do it,

Indeed. And probably, the way to go. More, below.

>I have 2 text files, one is a list of ID that I will use in the
>analysis, another is a transaction file for my entire database.
>Both files are sorted by ID already. The text files look like the
>following:
>
>ID_list.txt
>--------------
>ID
>001
>002
>005
>
>Transaction.txt
>---------------------
>ID DollAmt
>001 14
>001 20
>001 15
>002 20
>002 15
>003 40
>004 10
>005 17
>005 17
>006 18
>
>What I want to create using INPUT PROGRAM and END INPUT only one way of doing something]:
>----------------------------------------------------------
>ID DollAmt
>001 14
>001 20
>001 15
>002 20
>002 15
>005 17
>005 17
>
>[This is] a subset of the Transaction.txt which contains the rows with
>the IDs [appearing] in the ID_list.txt. I would like to do it such
>that none of the files needed to be read and saved as SPSS format data
>files (because of memory and space issues).

OK, first: if you've got SPSS 14 or 15, I think you have Virtual Active Files (VAFs) for all datasets, and they don't take space. So read in the two files, using DATASET NAME so they're separate datasets, and MATCH FILES; and you should be OK.

It's a pain in the neck to do an interleave in an INPUT PROGRAM; if this doesn't work, I'll give it a try later.

-Good luck,
Richard
|
In reply to this post by chiu2
Thanks a lot, Richard.
Unfortunately, the version of SPSS-X on the mainframe (release 4.1) is pretty old and does not have the new DATASET features. I got an SPSS 14 Syntax Reference and read an example in the REREAD section (p. 1532). That example merges 2 text files using INPUT PROGRAM. I tried to modify it to fit my needs but failed, because my transaction file usually has multiple rows for the same ID. Once an END CASE is issued, both ID_list.txt and Transaction.txt advance to the next row, so all rows after the first one for an ID are skipped (because the ID from ID_list.txt is not retained).

There is a LEAVE command that could be useful, because it could help to retain the ID from ID_list.txt. But I am not good enough to put all this together to make it work. Any help is appreciated. Thanks in advance.
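For illustration only, the retain-and-advance logic described above can be sketched outside SPSS. The following is a minimal Python sketch, not SPSS syntax and not a tested INPUT PROGRAM. It assumes both files are sorted by ID, that IDs are zero-padded to a fixed width (so plain string comparison orders them correctly), and that each file starts with one header line. The names `next_id` and `filter_sorted` are made up for this example. The point it shows is that the current list ID is held across transaction rows and the ID list is advanced only when the transaction ID has moved past it.

```python
import io


def next_id(ids):
    """Return the next non-blank ID from the list, or None at end of file."""
    for line in ids:
        line = line.strip()
        if line:
            return line
    return None


def filter_sorted(id_lines, txn_lines):
    """Yield (id, amount) for transaction rows whose ID is in the ID list.

    Assumption: both inputs are sorted by ID and begin with a header line.
    Only one line from each input is held at a time.
    """
    ids = iter(id_lines)
    txns = iter(txn_lines)
    next(ids, None)    # skip the "ID" header
    next(txns, None)   # skip the "ID DollAmt" header

    wanted = next_id(ids)  # current list ID, retained across transaction rows
    for line in txns:
        fields = line.split()
        if len(fields) < 2:
            continue       # ignore blank or malformed rows
        txn_id, amount = fields[0], fields[1]
        # Advance the ID list only while it is behind the transaction ID.
        while wanted is not None and wanted < txn_id:
            wanted = next_id(ids)
        if wanted is None:
            break          # ID list exhausted, nothing further can match
        if wanted == txn_id:
            yield txn_id, amount


# Sample data copied from the post above.
ID_LIST = "ID\n001\n002\n005\n"
TRANSACTIONS = (
    "ID DollAmt\n001 14\n001 20\n001 15\n002 20\n002 15\n"
    "003 40\n004 10\n005 17\n005 17\n006 18\n"
)

rows = list(filter_sorted(io.StringIO(ID_LIST), io.StringIO(TRANSACTIONS)))
for txn_id, amount in rows:
    print(txn_id, amount)
```

In SPSS terms, `wanted` plays the role of the ID that would need to be retained from one case to the next, while the transaction file keeps advancing one row at a time.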
