Using INPUT PROGRAM to create a subset of datafile


chiu2
I have a question about using INPUT PROGRAM to create a subset of one file
containing only the rows whose ID also appears in another file. Yes, I know
MATCH FILES with the TABLE subcommand can do it, but due to some restrictions
I need to do it via INPUT PROGRAM ... DATA LIST .... Any hints are
appreciated. Thanks in advance.


Here is my question:
--------------------

I have 2 text files: one is a list of the IDs that I will use in the analysis,
the other is a transaction file for my entire database. Both files are already
sorted by ID. The text files look like the following (the original files are
extremely big; these are just to illustrate the idea):

ID_list.txt
--------------
ID
001
002
005

Transaction.txt
---------------------
ID  DollAmt
001 14
001 20
001 15
002 20
002 15
003 40
004 10
005 17
005 17
006 18

What I want to create using INPUT PROGRAM and END INPUT PROGRAM:
---------------------------------------------------------------------------------------------

ID  DollAmt
001 14
001 20
001 15
002 20
002 15
005 17
005 17



What I want to do is create a subset of Transaction.txt containing only the
rows whose IDs appear in ID_list.txt. However, I would like to do it so that
neither file needs to be read and saved as an SPSS-format data file first
(because of memory and space issues). That is, I would like to use
INPUT PROGRAM and END INPUT PROGRAM etc. to do the task instead of the
MATCH FILES … TABLE … procedure. Could you please shed some light on this?
Many thanks in advance.
Re: Using INPUT PROGRAM to create a subset of datafile

Richard Ristow
At 04:45 PM 6/19/2007, chiu2 wrote:

>[I have] a question about using INPUT PROGRAM to create a subset of
>one file, [i.e. those rows whose] ID of the rows appears in another
>file. Yes, I know proc MATCH FILE and TABLES subcommand can do it,

Indeed. And probably, the way to go. More, below.

>I have 2 text files, one is a list of ID that I will use in the
>analysis,
>another is a transaction file for my entire database. Both files are
>sorted by ID already. The text files look like the following:
>
>ID_list.txt
>--------------
>ID
>001
>002
>005
>
>Transaction.txt
>---------------------
>ID  DollAmt
>001 14
>001 20
>001 15
>002 20
>002 15
>003 40
>004 10
>005 17
>005 17
>006 18
>
>What I want to create using INPUT PROGRAM and END INPUT
[or other suitable technique - don't fall in love with
only one way of doing something]:
----------------------------------------------------------

>ID  DollAmt
>001 14
>001 20
>001 15
>002 20
>002 15
>005 17
>005 17
>
>[This is] a subset of the Transaction.txt which contains the rows with
>the IDs [appearing] in the ID_list.txt. I would like to do it such
>that none of the files needed to be read and saved as SPSS format data
>files (because of memory and space issues).

OK, first: if you've got SPSS 14 or 15, I think you have Virtual Active
Files (VAFs) for all datasets, and they don't take space. So read in
the two files, using DATASET NAME so they're separate datasets, and
MATCH FILES; and you should be OK.
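
For readers who have not used a table lookup, the effect of MATCH FILES with a TABLE file followed by selecting the matched rows can be sketched outside SPSS. This is Python, not SPSS syntax, and only an illustration: the function name `filter_by_lookup` is made up for this sketch, and the header lines are assumed to have been skipped already. Note that it holds the whole ID list in memory, which may not suit a very large list.

```python
import io

def filter_by_lookup(id_lines, txn_lines):
    """Yield transaction rows whose first field is in the ID list.

    The ID list is loaded into a set (the "table"); the transaction
    rows are streamed one at a time and never held all at once.
    """
    wanted = {line.strip() for line in id_lines if line.strip()}
    for row in txn_lines:
        fields = row.split()
        if fields and fields[0] in wanted:
            yield row.rstrip("\n")

# Small demo using the sample data from the question (headers omitted).
ids = io.StringIO("001\n002\n005\n")
txns = io.StringIO("001 14\n002 20\n003 40\n005 17\n006 18\n")
for row in filter_by_lookup(ids, txns):
    print(row)
```

Unlike a sorted merge, this lookup does not depend on either file being sorted.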

It's a pain in the neck to do an interleave in an INPUT PROGRAM; if
this doesn't work, I'll give it a try later.

-Good luck,
  Richard
Re: Using INPUT PROGRAM to create a subset of datafile

chiu2
Thanks a lot, Richard.

Unfortunately, the version of SPSS-X on our mainframe (release 4.1) is pretty
old and does not have the new DATASET features.

I got hold of an SPSS 14 Syntax Reference and read an example in the REREAD
section (p. 1532). That example merges 2 text files using INPUT PROGRAM. I
tried to modify it to fit my needs but failed, because my transaction file
usually has multiple rows for the same ID. Once an END CASE is issued, both
ID_list.txt and Transaction.txt advance to the next row, hence all rows after
the first one for an ID are skipped (as the ID from ID_list.txt is not
retained).

There is a LEAVE command that could be useful, because it could help to
retain the ID from ID_list.txt. But I am not skilled enough to put all of
this together and make it work. Any help is appreciated. Thanks in advance.
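
The logic described above (keep the current ID from the list, and read the next ID only when the transaction ID has moved past it) can be sketched as follows. This is Python rather than SPSS syntax, offered only to show the control flow an INPUT PROGRAM would need; it is not a tested SPSS solution. The function name `filter_by_sorted_ids` is made up, and the sketch assumes both files are sorted by ID, that the IDs are zero-padded strings of equal width (so string order matches sort order), and that the header lines have already been skipped.

```python
import io

def filter_by_sorted_ids(id_lines, txn_lines):
    """Yield transaction rows whose ID appears in the sorted ID list.

    Both inputs are read one line at a time; neither file is held in
    memory. `wanted` plays the role of the retained ID: it is NOT
    re-read for every transaction row.
    """
    ids = (line.strip() for line in id_lines if line.strip())
    wanted = next(ids, None)
    for row in txn_lines:
        if not row.strip():
            continue
        row_id = row.split()[0]
        # Read further IDs only while the list lags behind this row.
        while wanted is not None and wanted < row_id:
            wanted = next(ids, None)
        if wanted is None:
            break  # ID list exhausted, so no later row can match
        if row_id == wanted:
            yield row.rstrip("\n")

# Demo with the sample data from the question (headers omitted).
ids = io.StringIO("001\n002\n005\n")
txns = io.StringIO(
    "001 14\n001 20\n001 15\n002 20\n002 15\n"
    "003 40\n004 10\n005 17\n005 17\n006 18\n"
)
for row in filter_by_sorted_ids(ids, txns):
    print(row)
```

With the sample data this prints the seven rows for IDs 001, 002 and 005. The key point is that one transaction row is consumed per pass, while the ID list advances only when needed, so repeated transaction rows for the same ID are all kept.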