|
Hi Richard
Thank you for your response to my problem with match files after aggregate. The reason why I sent my posting to you directly rather than to the list (which I usually do ) is that I was debating whether I should send a sample of my data. But then I realised it is confidential and the only way I could do it was to create some dummy data. Anyway in future I will post all my questions to the listserve. You have correctly observed that I do not have to create the variable 'combined' at all. I knew that and I tried to aggregate the 2 files but failed. Now I know why. Thanks for showing me that the matchfile function should be /Table=* rather than /File=* However I am still facing the same problem - while the match /Table=* seems to be working - the resultant file sample has 8048 records - same as the original data file. DATASET ACTIVATE DataSet1. DATASET DECLARE sample. AGGREGATE /OUTFILE='sample' /BREAK=ERefNum PersonCNI /WEAPONRANK_min=MIN(WEAPONRANK) /N_BREAK=N. Data records - 7697 DATASET ACTIVATE sample. MATCH FILES /TABLE=* /FILE ='DataSet1' /BY ERefNum PersonCNI. I know the reason - since both Erefnum and personcni have duplicate records in the sample file. Erefnum has 1354 duplicate records and PersonCNI has 93 dupliacte records in the sample file Example 1 - I need to retain these records because even though the erefnum is the same - there are 3 different persons . While another Erefnum has 6 diff personCNI. Erefnum PersonCNI 12208890 103800 12208890 26750091 12208890 332045679 Example 2: I need to retain these records too because a person can be a victim several times during the year. Erefnum PersonCNI 12345678 020202020 91011121 020202020 12131395 020202020 Is there any way if overcoming this? Much appreciate any help. thanks regards Thara Vardhan Senior Statistician Performance Improvement & Planning _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ The information contained in this email is intended for the named recipient(s) only. It may contain private, confidential, copyright or legally privileged information. If you are not the intended recipient or you have received this email by mistake, please reply to the author and delete this email immediately. You must not copy, print, forward or distribute this email, nor place reliance on its contents. This email and any attachment have been virus scanned. However, you are requested to conduct a virus scan as well. No liability is accepted for any loss or damage resulting from a computer virus, or resulting from a delay or defect in transmission of this email or any attached file. This email does not constitute a representation by the NSW Police Force unless the author is legally entitled to do so. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
At 09:45 PM 12/16/2008, Thara Vardhan wrote:
You have correctly observed that I do not have to create the variable 'combined' at all. I knew that and I tried to aggregate the 2 files but failed. Now I know why. Thanks for showing me that the matchfile function should be /Table=* rather than /File=* The match [with] /Table=* seems to be working Excellent. So far, so good. both Erefnum and personcni have duplicate records in the sample file. Erefnum has 1354 duplicate records and PersonCNI has 93 duplicate records Example 1 - I need to retain these records because even though the erefnum is the same - there are 3 different persons . While another Erefnum has 6 diff personCNI. Erefnum PersonCNI 12208890 103800 12208890 26750091 12208890 332045679 Example 2: I need to retain these records too because a person can be a victim several times during the year. Erefnum PersonCNI 12345678 020202020 91011121 020202020 12131395 020202020 THAT is not a problem. Your key is now the *pair* of variables, Erefnum and personcni. All of those records, above, have unique keys when you use the pair of variables. However I am still facing the same problem - the resultant file sample has 8048 records - same as the original data file. DATASET ACTIVATE DataSet1. DATASET DECLARE sample. AGGREGATE /OUTFILE='sample' /BREAK=ERefNum PersonCNI /WEAPONRANK_min=MIN(WEAPONRANK) /N_BREAK=N. Data records - 7697 DATASET ACTIVATE sample. MATCH FILES /TABLE=* /FILE ='DataSet1' /BY ERefNum PersonCNI. Right. When you do a MATCH FILES with one /TABLE and one /FILE, the result has exactly the number of records that are in the /FILE file. Is there any way if overcoming this? Lots of ways; the question is, what you want to do. First, notice that your key was the *combination* of ERefNum PersonCNI (as you said you wanted). You have fewer records in the aggregated file because of records that match on *both* those variables -- that is, more than one record for the same person in the same incident. Is that valid, or are those what you're trying to eliminate? In your previous request, you asked, >I have ranked Incident further classification and Weapon to pick a >single victim record (for the duplicates) with the most serious >incident or weapon. (most serious being 1 and so on). The single victim record within an incident? Or, when one person is a victim more than once in the same incident, the record with the most serious incident? You can do the latter by selecting after the MATCH FILES, above: SELECT IF WEAPONRANK_min=WEAPONRANK. That leaves the possibility of ties for 'worst'; think what you want to do, in such cases. Finally, by the way, your code >DATASET ACTIVATE DataSet1. >DATASET DECLARE sample. >AGGREGATE > /OUTFILE='sample' > /BREAK=ERefNum PersonCNI > /WEAPONRANK_min=MIN(WEAPONRANK) > /N_BREAK=N. > >Data records - 7697 > >DATASET ACTIVATE sample. >MATCH FILES > /TABLE=* > /FILE ='DataSet1' > /BY ERefNum PersonCNI. can be replaced by the simpler DATASET ACTIVATE DataSet1. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /* This is the change */ /BREAK=ERefNum PersonCNI /WEAPONRANK_min=MIN(WEAPONRANK) /N_BREAK=N. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
