Problem with Match file after using aggregate function - reply to Richard Ristow

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Problem with Match file after using aggregate function - reply to Richard Ristow

thara vardhan-2
Hi Richard

Thank you for your response to my problem with match files after
aggregate.

The reason why I sent my posting to you directly rather than to the list
(which I usually do ) is that  I was debating whether I should send a
sample of my data. But then I realised it is  confidential and the only
way I could do it was to create some dummy data.  Anyway in future I will
post all my questions to the listserve.

You have correctly observed that I do not have to create the variable
'combined' at all. I knew that and I tried to aggregate the 2 files but
failed. Now I know why.
Thanks for showing me that the matchfile function should be /Table=*
rather than /File=*

However  I am still facing the same problem - while the match /Table=*
seems to be working - the resultant file sample has 8048 records - same as
the original data file.

DATASET ACTIVATE DataSet1.
DATASET DECLARE sample.
AGGREGATE
   /OUTFILE='sample'
   /BREAK=ERefNum PersonCNI
   /WEAPONRANK_min=MIN(WEAPONRANK)
   /N_BREAK=N.

Data records - 7697


DATASET ACTIVATE sample.
MATCH FILES
   /TABLE=*
   /FILE ='DataSet1'
   /BY ERefNum PersonCNI.

I know the reason - since both Erefnum and personcni have duplicate
records in the  sample file.

Erefnum has 1354 duplicate records and PersonCNI has 93 dupliacte records
in the sample file

Example 1 - I need to retain these records because even though the erefnum
is the same - there are 3 different persons . While another Erefnum has 6
diff personCNI.

    Erefnum              PersonCNI
    12208890              103800
    12208890      26750091
    12208890      332045679

Example 2: I need to retain these records too because a person can be a
victim several times during the year.

 Erefnum              PersonCNI
12345678        020202020
91011121        020202020
12131395        020202020

Is there any way if overcoming this?

Much appreciate any help.

thanks
regards
Thara Vardhan
Senior Statistician
Performance Improvement & Planning

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The information contained in this email is intended for the named recipient(s)
only. It may contain private, confidential, copyright or legally privileged
information.  If you are not the intended recipient or you have received this
email by mistake, please reply to the author and delete this email immediately.
You must not copy, print, forward or distribute this email, nor place reliance
on its contents. This email and any attachment have been virus scanned. However,
you are requested to conduct a virus scan as well.  No liability is accepted
for any loss or damage resulting from a computer virus, or resulting from a delay
or defect in transmission of this email or any attached file. This email does not
constitute a representation by the NSW Police Force unless the author is legally
entitled to do so.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Problem with Match file after using aggregate function - reply to Richard Ristow

Richard Ristow
At 09:45 PM 12/16/2008, Thara Vardhan wrote:

You have correctly observed that I do not have to create the variable
'combined' at all. I knew that and I tried to aggregate the 2 files
but failed. Now I know why.
Thanks for showing me that the matchfile function should be /Table=*
rather than /File=*
The match [with] /Table=* seems to be working

Excellent. So far, so good.

both Erefnum and personcni have duplicate records in the  sample
file. Erefnum has 1354 duplicate records and PersonCNI has 93 duplicate records

Example 1 - I need to retain these records because even though the
erefnum is the same - there are 3 different persons . While another
Erefnum has 6 diff personCNI.

Erefnum       PersonCNI
12208890      103800
12208890      26750091
12208890      332045679

Example 2: I need to retain these records too because a person can be
a victim several times during the year.

Erefnum       PersonCNI
12345678      020202020
91011121      020202020
12131395      020202020

THAT is not a problem. Your key is now the *pair* of variables,
Erefnum and personcni. All of those records, above, have unique keys
when you use the pair of variables.

However  I am still facing the same problem - the resultant file
sample has 8048 records - same as the original data file.

DATASET ACTIVATE DataSet1.
DATASET DECLARE sample.
AGGREGATE
    /OUTFILE='sample'
    /BREAK=ERefNum PersonCNI
    /WEAPONRANK_min=MIN(WEAPONRANK)
    /N_BREAK=N.

Data records - 7697

DATASET ACTIVATE sample.
MATCH FILES
    /TABLE=*
    /FILE ='DataSet1'
    /BY ERefNum PersonCNI.

Right. When you do a MATCH FILES with one /TABLE and one /FILE, the
result has exactly the number of records that are in the /FILE file.

Is there any way if overcoming this?

Lots of ways; the question is, what you want to do.

First, notice that your key was the *combination* of ERefNum
PersonCNI (as you said you wanted). You have fewer records in the
aggregated file because of records that match on *both* those
variables -- that is, more than one record for the same person in the
same incident. Is that valid, or are those what you're trying to eliminate?

In your previous request, you asked,

>I have ranked Incident further classification and Weapon to pick a
>single victim record (for the duplicates) with the most serious
>incident or weapon. (most serious being 1 and so on).

The single victim record within an incident? Or, when one person is a
victim more than once in the same incident, the record with the most
serious incident? You can do the latter by selecting after the MATCH
FILES, above:

SELECT IF  WEAPONRANK_min=WEAPONRANK.

That leaves the possibility of ties for 'worst'; think what you want
to do, in such cases.

Finally, by the way, your code

>DATASET ACTIVATE DataSet1.
>DATASET DECLARE sample.
>AGGREGATE
>    /OUTFILE='sample'
>    /BREAK=ERefNum PersonCNI
>    /WEAPONRANK_min=MIN(WEAPONRANK)
>    /N_BREAK=N.
>
>Data records - 7697
>
>DATASET ACTIVATE sample.
>MATCH FILES
>    /TABLE=*
>    /FILE ='DataSet1'
>    /BY ERefNum PersonCNI.

can be replaced by the simpler
DATASET ACTIVATE DataSet1.
AGGREGATE
    /OUTFILE=* MODE=ADDVARIABLES /* This is the change */
    /BREAK=ERefNum PersonCNI
    /WEAPONRANK_min=MIN(WEAPONRANK)
    /N_BREAK=N.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD