Fellow SPSS users,
I have a set of data which I know has duplicates in it. The option of having the data provider go through their records and signify which are duplicates and which aren't is not an option. I have run the duplicate cases by incident number and age in order to weed out cases which I don't believe to be duplicates and am left with a set of data similar to that at the end of the e-mail below. There are around 400 cases which are differentiated from each other only by incident number and outcome - the age of the offenders are the same. It is possible that this same syntax will have to be run against a much larger number of cases in the future. In this case, '1' stands for arrested and 'N' for not arrested. I need syntax that will delete one record with an 'N' for each record where there is a '1' on the incident. Here are some of the possible scenarios and what i would like to keep using syntax. In each scenario, you can assume that all cases have the same incident number although the complete data set has 199 incident numbers. The number of offenders per incident is always between 2 and 9. The datasets at the bottom go through each of these scenarios in the same order as they are presented here. The first set is the data with the duplicates I want to delete and the second is with the duplicates I wish to delete dropped...problem and solution. I greatly appreciate any help that you may be able to give and will be glad to clarify any questions. Thanks! -Ariel Barak Scenario 1) Data Solution N N N N Scenario 2) Data Solution 1 1 N Scenario 3) Data Solution 1 1 1 1 N Scenario 4) Data Solution 1 1 N N N Scenario 5) Data Solution 1 1 N N N N N N N Scenario 6) Data Solution 1 1 1 1 N N Scenario 7) Data Solution 1 1 1 1 data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . begin data 14386912419N 14386912419N 264872871231 26487287123N 371863475451 371863475451 37186347545N 648172350341 64817235034N 64817235034N 715484287291 71548428729N 71548428729N 71548428729N 71548428729N 864708752551 864708752551 86470875255N 86470875255N 904687125411 904687125411 end data. value labels outcome '1' 'Arrested' 'N' 'Not Arrested'. DATASET NAME Problem. data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . begin data 14386912419N 14386912419N 264872871231 371863475451 371863475451 648172350341 64817235034N 715484287291 71548428729N 71548428729N 71548428729N 864708752551 864708752551 904687125411 904687125411 end data. value labels outcome '1' 'Arrested' 'N' 'Not Arrested'. DATASET NAME Solution. |
Just a thought, it seems like you could sort so that the one you want to
drop always FOLLOWS the one you would want to keep then use the LAG function to identify duplicates. Something like this: Compute drop=(id=lag(id) and outcome="N" and lag(outcome="1")). This will create drop=1 for any record with the same ID where the current outcome is N and there exists another outcome=1. Melissa The bubbling brook would lose its song if you removed the rocks. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ariel barak Sent: Tuesday, March 20, 2007 3:26 PM To: [hidden email] Subject: [SPSSX-L] Dropping duplicates Fellow SPSS users, I have a set of data which I know has duplicates in it. The option of having the data provider go through their records and signify which are duplicates and which aren't is not an option. I have run the duplicate cases by incident number and age in order to weed out cases which I don't believe to be duplicates and am left with a set of data similar to that at the end of the e-mail below. There are around 400 cases which are differentiated from each other only by incident number and outcome - the age of the offenders are the same. It is possible that this same syntax will have to be run against a much larger number of cases in the future. In this case, '1' stands for arrested and 'N' for not arrested. I need syntax that will delete one record with an 'N' for each record where there is a '1' on the incident. Here are some of the possible scenarios and what i would like to keep using syntax. In each scenario, you can assume that all cases have the same incident number although the complete data set has 199 incident numbers. The number of offenders per incident is always between 2 and 9. The datasets at the bottom go through each of these scenarios in the same order as they are presented here. The first set is the data with the duplicates I want to delete and the second is with the duplicates I wish to delete dropped...problem and solution. I greatly appreciate any help that you may be able to give and will be glad to clarify any questions. Thanks! -Ariel Barak Scenario 1) Data Solution N N N N Scenario 2) Data Solution 1 1 N Scenario 3) Data Solution 1 1 1 1 N Scenario 4) Data Solution 1 1 N N N Scenario 5) Data Solution 1 1 N N N N N N N Scenario 6) Data Solution 1 1 1 1 N N Scenario 7) Data Solution 1 1 1 1 data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . begin data 14386912419N 14386912419N 264872871231 26487287123N 371863475451 371863475451 37186347545N 648172350341 64817235034N 64817235034N 715484287291 71548428729N 71548428729N 71548428729N 71548428729N 864708752551 864708752551 86470875255N 86470875255N 904687125411 904687125411 end data. value labels outcome '1' 'Arrested' 'N' 'Not Arrested'. DATASET NAME Problem. data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . begin data 14386912419N 14386912419N 264872871231 371863475451 371863475451 648172350341 64817235034N 715484287291 71548428729N 71548428729N 71548428729N 864708752551 864708752551 904687125411 904687125411 end data. value labels outcome '1' 'Arrested' 'N' 'Not Arrested'. DATASET NAME Solution. PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. |
Hi Melissa and Fellow SPSS users,
I think you're pointing me in the right direction...however, when I tried the syntax you suggested and I got this error: COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND lag(outcome ='1')). >Error # 4323 in column 85. Text: ) >The first argument of the LAG function must be a variable. It must not be >a constant or an expression. >This command not executed. EXE. The issue is with the second lag command...any thoughts on how to get around this? Your help is GREATLY appreciated. -Ariel On 3/21/07, Melissa Ives <[hidden email]> wrote: > > Just a thought, it seems like you could sort so that the one you want to > drop always FOLLOWS the one you would want to keep then use the LAG > function to identify duplicates. Something like this: > > Compute drop=(id=lag(id) and outcome="N" and lag(outcome="1")). > > This will create drop=1 for any record with the same ID where the > current outcome is N and there exists another outcome=1. > > Melissa > The bubbling brook would lose its song if you removed the rocks. > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of > ariel barak > Sent: Tuesday, March 20, 2007 3:26 PM > To: [hidden email] > Subject: [SPSSX-L] Dropping duplicates > > Fellow SPSS users, > > I have a set of data which I know has duplicates in it. The option of > having the data provider go through their records and signify which are > duplicates and which aren't is not an option. I have run the duplicate > cases by incident number and age in order to weed out cases which I > don't believe to be duplicates and am left with a set of data similar to > that at the end of the e-mail below. There are around 400 cases which > are differentiated from each other only by incident number and outcome - > the age of the offenders are the same. It is possible that this same > syntax will have to be run against a much larger number of cases in the > future. > > In this case, '1' stands for arrested and 'N' for not arrested. I need > syntax that will delete one record with an 'N' for each record where > there is a '1' on the incident. Here are some of the possible scenarios > and what i would like to keep using syntax. In each scenario, you can > assume that all cases have the same incident number although the > complete data set has 199 incident numbers. The number of offenders per > incident is always between 2 and 9. > > The datasets at the bottom go through each of these scenarios in the > same order as they are presented here. The first set is the data with > the duplicates I want to delete and the second is with the duplicates I > wish to delete dropped...problem and solution. > > I greatly appreciate any help that you may be able to give and will be > glad to clarify any questions. Thanks! > > -Ariel Barak > > Scenario 1) > Data Solution > N N > N N > > Scenario 2) > Data Solution > 1 1 > N > > Scenario 3) > Data Solution > 1 1 > 1 1 > N > > Scenario 4) > Data Solution > 1 1 > N N > N > > Scenario 5) > Data Solution > 1 1 > N N > N N > N N > N > Scenario 6) > Data Solution > 1 1 > 1 1 > N > N > > Scenario 7) > Data Solution > 1 1 > 1 1 > > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . > begin data > 14386912419N > 14386912419N > 264872871231 > 26487287123N > 371863475451 > 371863475451 > 37186347545N > 648172350341 > 64817235034N > 64817235034N > 715484287291 > 71548428729N > 71548428729N > 71548428729N > 71548428729N > 864708752551 > 864708752551 > 86470875255N > 86470875255N > 904687125411 > 904687125411 > end data. > > value labels outcome > '1' 'Arrested' > 'N' 'Not Arrested'. > > DATASET NAME Problem. > > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . > begin data > 14386912419N > 14386912419N > 264872871231 > 371863475451 > 371863475451 > 648172350341 > 64817235034N > 715484287291 > 71548428729N > 71548428729N > 71548428729N > 864708752551 > 864708752551 > 904687125411 > 904687125411 > end data. > > value labels outcome > '1' 'Arrested' > 'N' 'Not Arrested'. > > DATASET NAME Solution. > |
In reply to this post by ariel barak
My bad. Move the close parentheses to after 'lag(outcome' and before
the '='1')' instead of after it... Like this. COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND lag(outcome)='1'). Melissa The bubbling brook would lose its song if you removed the rocks. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ariel barak Sent: Wednesday, March 21, 2007 10:26 AM To: [hidden email] Subject: Re: [SPSSX-L] Dropping duplicates Hi Melissa and Fellow SPSS users, I think you're pointing me in the right direction...however, when I tried the syntax you suggested and I got this error: COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND lag(outcome ='1')). >Error # 4323 in column 85. Text: ) >The first argument of the LAG function must be a variable. It must not >be a constant or an expression. >This command not executed. EXE. The issue is with the second lag command...any thoughts on how to get around this? Your help is GREATLY appreciated. -Ariel On 3/21/07, Melissa Ives <[hidden email]> wrote: > > Just a thought, it seems like you could sort so that the one you want > to drop always FOLLOWS the one you would want to keep then use the LAG > function to identify duplicates. Something like this: > > Compute drop=(id=lag(id) and outcome="N" and lag(outcome="1")). > > This will create drop=1 for any record with the same ID where the > current outcome is N and there exists another outcome=1. > > Melissa > The bubbling brook would lose its song if you removed the rocks. > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf > Of ariel barak > Sent: Tuesday, March 20, 2007 3:26 PM > To: [hidden email] > Subject: [SPSSX-L] Dropping duplicates > > Fellow SPSS users, > > I have a set of data which I know has duplicates in it. The option of > having the data provider go through their records and signify which > are duplicates and which aren't is not an option. I have run the > duplicate cases by incident number and age in order to weed out cases > which I don't believe to be duplicates and am left with a set of data > similar to that at the end of the e-mail below. There are around 400 > cases which are differentiated from each other only by incident number > and outcome - the age of the offenders are the same. It is possible > that this same syntax will have to be run against a much larger number > of cases in the future. > > In this case, '1' stands for arrested and 'N' for not arrested. I need > syntax that will delete one record with an 'N' for each record where > there is a '1' on the incident. Here are some of the possible > scenarios and what i would like to keep using syntax. In each > scenario, you can assume that all cases have the same incident number > although the complete data set has 199 incident numbers. The number of > offenders per incident is always between 2 and 9. > > The datasets at the bottom go through each of these scenarios in the > same order as they are presented here. The first set is the data with > the duplicates I want to delete and the second is with the duplicates > I wish to delete dropped...problem and solution. > > I greatly appreciate any help that you may be able to give and will be > glad to clarify any questions. Thanks! > > -Ariel Barak > > Scenario 1) > Data Solution > N N > N N > > Scenario 2) > Data Solution > 1 1 > N > > Scenario 3) > Data Solution > 1 1 > 1 1 > N > > Scenario 4) > Data Solution > 1 1 > N N > N > > Scenario 5) > Data Solution > 1 1 > N N > N N > N N > N > Scenario 6) > Data Solution > 1 1 > 1 1 > N > N > > Scenario 7) > Data Solution > 1 1 > 1 1 > > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . > begin data > 14386912419N > 14386912419N > 264872871231 > 26487287123N > 371863475451 > 371863475451 > 37186347545N > 648172350341 > 64817235034N > 64817235034N > 715484287291 > 71548428729N > 71548428729N > 71548428729N > 71548428729N > 864708752551 > 864708752551 > 86470875255N > 86470875255N > 904687125411 > 904687125411 > end data. > > value labels outcome > '1' 'Arrested' > 'N' 'Not Arrested'. > > DATASET NAME Problem. > > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . > begin data > 14386912419N > 14386912419N > 264872871231 > 371863475451 > 371863475451 > 648172350341 > 64817235034N > 715484287291 > 71548428729N > 71548428729N > 71548428729N > 864708752551 > 864708752551 > 904687125411 > 904687125411 > end data. > > value labels outcome > '1' 'Arrested' > 'N' 'Not Arrested'. > > DATASET NAME Solution. > PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. |
Melissa and Fellow SPSS users,
Thanks for the quick response Melissa. The syntax you suggested is now working almost perfectly. The only issue is that it does not flag both 'N's in Scenario 6 in my initial e-mail: Scenario 6) Data Solution 1 1 1 1 N N It flags the first 'N' but not the second. Further, if the data were as in the new scenario below, it would only flag the first 'N' instead of all of them. Scenario 8) Data Solution 1 1 1 1 1 1 1 1 1 1 N N N Basically, if the number of 1's on an incident is equal to or greater than the number of 'N's, all 'N's need to be flagged. Any ideas on how to correct this? Thanks again, -Ariel On 3/21/07, Melissa Ives <[hidden email]> wrote: > My bad. Move the close parentheses to after 'lag(outcome' and before > the '='1')' instead of after it... Like this. > > COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND > lag(outcome)='1'). > > Melissa > The bubbling brook would lose its song if you removed the rocks. > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of > ariel barak > Sent: Wednesday, March 21, 2007 10:26 AM > To: [hidden email] > Subject: Re: [SPSSX-L] Dropping duplicates > > Hi Melissa and Fellow SPSS users, > > I think you're pointing me in the right direction...however, when I > tried the syntax you suggested and I got this error: > > > COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND > lag(outcome > ='1')). > > >Error # 4323 in column 85. Text: ) > >The first argument of the LAG function must be a variable. It must not > > >be a constant or an expression. > >This command not executed. > > EXE. > The issue is with the second lag command...any thoughts on how to get > around this? > > Your help is GREATLY appreciated. > > -Ariel > > > On 3/21/07, Melissa Ives <[hidden email]> wrote: > > > > Just a thought, it seems like you could sort so that the one you want > > to drop always FOLLOWS the one you would want to keep then use the LAG > > > function to identify duplicates. Something like this: > > > > Compute drop=(id=lag(id) and outcome="N" and lag(outcome="1")). > > > > This will create drop=1 for any record with the same ID where the > > current outcome is N and there exists another outcome=1. > > > > Melissa > > The bubbling brook would lose its song if you removed the rocks. > > > > > > -----Original Message----- > > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf > > Of ariel barak > > Sent: Tuesday, March 20, 2007 3:26 PM > > To: [hidden email] > > Subject: [SPSSX-L] Dropping duplicates > > > > Fellow SPSS users, > > > > I have a set of data which I know has duplicates in it. The option of > > having the data provider go through their records and signify which > > are duplicates and which aren't is not an option. I have run the > > duplicate cases by incident number and age in order to weed out cases > > which I don't believe to be duplicates and am left with a set of data > > similar to that at the end of the e-mail below. There are around 400 > > cases which are differentiated from each other only by incident number > > > and outcome - the age of the offenders are the same. It is possible > > that this same syntax will have to be run against a much larger number > > > of cases in the future. > > > > In this case, '1' stands for arrested and 'N' for not arrested. I need > > > syntax that will delete one record with an 'N' for each record where > > there is a '1' on the incident. Here are some of the possible > > scenarios and what i would like to keep using syntax. In each > > scenario, you can assume that all cases have the same incident number > > although the complete data set has 199 incident numbers. The number of > > > offenders per incident is always between 2 and 9. > > > > The datasets at the bottom go through each of these scenarios in the > > same order as they are presented here. The first set is the data with > > the duplicates I want to delete and the second is with the duplicates > > I wish to delete dropped...problem and solution. > > > > I greatly appreciate any help that you may be able to give and will be > > > glad to clarify any questions. Thanks! > > > > -Ariel Barak > > > > Scenario 1) > > Data Solution > > N N > > N N > > > > Scenario 2) > > Data Solution > > 1 1 > > N > > > > Scenario 3) > > Data Solution > > 1 1 > > 1 1 > > N > > > > Scenario 4) > > Data Solution > > 1 1 > > N N > > N > > > > Scenario 5) > > Data Solution > > 1 1 > > N N > > N N > > N N > > N > > Scenario 6) > > Data Solution > > 1 1 > > 1 1 > > N > > N > > > > Scenario 7) > > Data Solution > > 1 1 > > 1 1 > > > > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . > > begin data > > 14386912419N > > 14386912419N > > 264872871231 > > 26487287123N > > 371863475451 > > 371863475451 > > 37186347545N > > 648172350341 > > 64817235034N > > 64817235034N > > 715484287291 > > 71548428729N > > 71548428729N > > 71548428729N > > 71548428729N > > 864708752551 > > 864708752551 > > 86470875255N > > 86470875255N > > 904687125411 > > 904687125411 > > end data. > > > > value labels outcome > > '1' 'Arrested' > > 'N' 'Not Arrested'. > > > > DATASET NAME Problem. > > > > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . > > begin data > > 14386912419N > > 14386912419N > > 264872871231 > > 371863475451 > > 371863475451 > > 648172350341 > > 64817235034N > > 715484287291 > > 71548428729N > > 71548428729N > > 71548428729N > > 864708752551 > > 864708752551 > > 904687125411 > > 904687125411 > > end data. > > > > value labels outcome > > '1' 'Arrested' > > 'N' 'Not Arrested'. > > > > DATASET NAME Solution. |
In reply to this post by ariel barak
You should be able to have the 1 response from the first N response
'cascade' down to any other responses with something like this: If (incidentnumber=lag(incidentnumber) AND outcome='N' AND lag(drop)='1') drop=lag(drop). Melissa The bubbling brook would lose its song if you removed the rocks. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ariel barak Sent: Wednesday, March 21, 2007 11:01 AM To: [hidden email] Subject: Re: [SPSSX-L] Dropping duplicates Melissa and Fellow SPSS users, Thanks for the quick response Melissa. The syntax you suggested is now working almost perfectly. The only issue is that it does not flag both 'N's in Scenario 6 in my initial e-mail: Scenario 6) Data Solution 1 1 1 1 N N It flags the first 'N' but not the second. Further, if the data were as in the new scenario below, it would only flag the first 'N' instead of all of them. Scenario 8) Data Solution 1 1 1 1 1 1 1 1 1 1 N N N Basically, if the number of 1's on an incident is equal to or greater than the number of 'N's, all 'N's need to be flagged. Any ideas on how to correct this? Thanks again, -Ariel On 3/21/07, Melissa Ives <[hidden email]> wrote: > My bad. Move the close parentheses to after 'lag(outcome' and before > the '='1')' instead of after it... Like this. > > COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND > lag(outcome)='1'). > > Melissa > The bubbling brook would lose its song if you removed the rocks. > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf > Of ariel barak > Sent: Wednesday, March 21, 2007 10:26 AM > To: [hidden email] > Subject: Re: [SPSSX-L] Dropping duplicates > > Hi Melissa and Fellow SPSS users, > > I think you're pointing me in the right direction...however, when I > tried the syntax you suggested and I got this error: > > > COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND > lag(outcome > ='1')). > > >Error # 4323 in column 85. Text: ) > >The first argument of the LAG function must be a variable. It must > >not > > >be a constant or an expression. > >This command not executed. > > EXE. > The issue is with the second lag command...any thoughts on how to get > around this? > > Your help is GREATLY appreciated. > > -Ariel > > > On 3/21/07, Melissa Ives <[hidden email]> wrote: > > > > Just a thought, it seems like you could sort so that the one you > > want to drop always FOLLOWS the one you would want to keep then use > > the LAG > > > function to identify duplicates. Something like this: > > > > Compute drop=(id=lag(id) and outcome="N" and lag(outcome="1")). > > > > This will create drop=1 for any record with the same ID where the > > current outcome is N and there exists another outcome=1. > > > > Melissa > > The bubbling brook would lose its song if you removed the rocks. > > > > > > -----Original Message----- > > From: SPSSX(r) Discussion [mailto:[hidden email]] On > > Behalf Of ariel barak > > Sent: Tuesday, March 20, 2007 3:26 PM > > To: [hidden email] > > Subject: [SPSSX-L] Dropping duplicates > > > > Fellow SPSS users, > > > > I have a set of data which I know has duplicates in it. The option > > of having the data provider go through their records and signify > > which are duplicates and which aren't is not an option. I have run > > the duplicate cases by incident number and age in order to weed out > > cases which I don't believe to be duplicates and am left with a set > > of data similar to that at the end of the e-mail below. There are > > around 400 cases which are differentiated from each other only by > > incident number > > > and outcome - the age of the offenders are the same. It is possible > > that this same syntax will have to be run against a much larger > > number > > > of cases in the future. > > > > In this case, '1' stands for arrested and 'N' for not arrested. I > > need > > > syntax that will delete one record with an 'N' for each record where > > there is a '1' on the incident. Here are some of the possible > > scenarios and what i would like to keep using syntax. In each > > scenario, you can assume that all cases have the same incident > > number although the complete data set has 199 incident numbers. The > > number of > > > offenders per incident is always between 2 and 9. > > > > The datasets at the bottom go through each of these scenarios in the > > same order as they are presented here. The first set is the data > > with the duplicates I want to delete and the second is with the > > duplicates I wish to delete dropped...problem and solution. > > > > I greatly appreciate any help that you may be able to give and will > > be > > > glad to clarify any questions. Thanks! > > > > -Ariel Barak > > > > Scenario 1) > > Data Solution > > N N > > N N > > > > Scenario 2) > > Data Solution > > 1 1 > > N > > > > Scenario 3) > > Data Solution > > 1 1 > > 1 1 > > N > > > > Scenario 4) > > Data Solution > > 1 1 > > N N > > N > > > > Scenario 5) > > Data Solution > > 1 1 > > N N > > N N > > N N > > N > > Scenario 6) > > Data Solution > > 1 1 > > 1 1 > > N > > N > > > > Scenario 7) > > Data Solution > > 1 1 > > 1 1 > > > > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . > > begin data > > 14386912419N > > 14386912419N > > 264872871231 > > 26487287123N > > 371863475451 > > 371863475451 > > 37186347545N > > 648172350341 > > 64817235034N > > 64817235034N > > 715484287291 > > 71548428729N > > 71548428729N > > 71548428729N > > 71548428729N > > 864708752551 > > 864708752551 > > 86470875255N > > 86470875255N > > 904687125411 > > 904687125411 > > end data. > > > > value labels outcome > > '1' 'Arrested' > > 'N' 'Not Arrested'. > > > > DATASET NAME Problem. > > > > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . > > begin data > > 14386912419N > > 14386912419N > > 264872871231 > > 371863475451 > > 371863475451 > > 648172350341 > > 64817235034N > > 715484287291 > > 71548428729N > > 71548428729N > > 71548428729N > > 864708752551 > > 864708752551 > > 904687125411 > > 904687125411 > > end data. > > > > value labels outcome > > '1' 'Arrested' > > 'N' 'Not Arrested'. > > > > DATASET NAME Solution. PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. |
In reply to this post by ariel barak
At 04:25 PM 3/20/2007, ariel barak wrote:
>There are around 400 cases which are differentiated from each other >only by incident number and outcome - the age of the offenders are the >same. > >In this case, '1' stands for arrested and 'N' for not arrested. I need >syntax that will delete one record with an 'N' for each record where >there is a '1' on the [same] incident [and age]. Try this; it appears to get the result you want. It uses SPSS draft variables, which are carried over from case to case, to look back as many cases as necessary. SPSS 15 draft output. DATASET ACTIVATE PROBLEM. LIST. List |-----------------------------|---------------------------| |Output Created |21-MAR-2007 13:43:56 | |-----------------------------|---------------------------| [PROBLEM] incidentnumber age Outcome 143,869,124 19 N 143,869,124 19 N 264,872,871 23 1 264,872,871 23 N 371,863,475 45 1 371,863,475 45 1 371,863,475 45 N 648,172,350 34 1 648,172,350 34 N 648,172,350 34 N 715,484,287 29 1 715,484,287 29 N 715,484,287 29 N 715,484,287 29 N 715,484,287 29 N 864,708,752 55 1 864,708,752 55 1 864,708,752 55 N 864,708,752 55 N 904,687,125 41 1 904,687,125 41 1 Number of cases read: 21 Number of cases listed: 21 * Identify beginning of an incident/person . COMPUTE #New_Guy = 9. DO IF MISSING(LAG(incidentnumber)) OR MISSING(LAG(age)). . COMPUTE #New_Guy = 1. ELSE IF incidentnumber NE LAG(incidentnumber) OR age NE LAG(age). . COMPUTE #New_Guy = 1. ELSE. . COMPUTE #New_Guy = 0. END IF. * Bug check . DO IF #New_Guy EQ 9. . PRINT / 'New_Guy bug: Case ' $CASENUM ' incid. ' incidentnumber ' age ' age. END IF. * "#Drop_N" is the number of 'N' records to be dropped . IF #New_Guy EQ 1 #Drop_N = 0. NUMERIC #Drop_It (F2). DO IF outcome EQ '1'. . COMPUTE #Drop_It = 0. . COMPUTE #Drop_N = #Drop_N + 1. ELSE IF outcome EQ 'N'. . COMPUTE #Drop_It = #Drop_N GT 0. . COMPUTE #Drop_N = #Drop_N - 1. ELSE. . PRINT / 'Drop_N bug: Case ' $CASENUM ' incid. ' incidentnumber ' age ' age ' outcome ' outcome. END IF. . FORMATS #New_Guy #Drop_N (F3). . /*-- DO IF #New_Guy EQ 1 /*-*/. . /*-- PRINT / ' ' /*-*/. . /*-- END IF /*-*/. . /*-- PRINT / /*-*/ /*-- incidentnumber age outcome /*-*/ /*-- 'New: ' #New_Guy ' Drop this: ' #Drop_It /*-*/ /*-- 'Drop yet '#Drop_It /*-*/. . /*-- EXECUTE. SELECT IF NOT #Drop_It. FREQUENCIES outcome. Frequencies |-----------------------------|---------------------------| |Output Created |21-MAR-2007 13:43:57 | |-----------------------------|---------------------------| [PROBLEM] Statistics [suppressed - no missing data] Outcome |-----|---------------|---------|-------|-------------|---------------| | | |Frequency|Percent|Valid Percent|Cumulative | | | | | | |Percent | |-----|---------------|---------|-------|-------------|---------------| |Valid|1 Arrested |9 |60.0 |60.0 |60.0 | | |N Not Arrested|6 |40.0 |40.0 |100.0 | | |---------------|---------|-------|-------------|---------------| | |Total |15 |100.0 |100.0 | | |-----|---------------|---------|-------|-------------|---------------| LIST. List |-----------------------------|---------------------------| |Output Created |21-MAR-2007 13:43:57 | |-----------------------------|---------------------------| [PROBLEM] incidentnumber age Outcome 143,869,124 19 N 143,869,124 19 N 264,872,871 23 1 371,863,475 45 1 371,863,475 45 1 648,172,350 34 1 648,172,350 34 N 715,484,287 29 1 715,484,287 29 N 715,484,287 29 N 715,484,287 29 N 864,708,752 55 1 864,708,752 55 1 904,687,125 41 1 904,687,125 41 1 Number of cases read: 15 Number of cases listed: 15 * ........................................ . * Cross-check: desired result . DATASET ACTIVATE PROBLEM. DATASET DECLARE Selct_N. AGGREGATE OUTFILE=Selct_N /BREAK = incidentnumber age outcome /N_Selctd = NU. DATASET ACTIVATE SOLUTION. DATASET DECLARE Soln_N. AGGREGATE OUTFILE=Soln_N /BREAK = incidentnumber age outcome /N_Wanted = NU. MATCH FILES /FILE=Selct_N /FILE=Soln_N /BY incidentnumber age outcome. RECODE N_Selctd N_Wanted (MISSING = 0). TEMPORARY. SELECT IF N_Selctd NE N_Wanted. LIST. List |-----------------------------|---------------------------| |Output Created |21-MAR-2007 13:43:59 | |-----------------------------|---------------------------| Number of cases read: 0 Number of cases listed: 0 =================== APPENDIX: Test data =================== * ................................................... . * ............ Test data, from posting ............ . * ................................................... . * I. Input to edit . * ................................................... . NEW FILE. data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . begin data 14386912419N 14386912419N 264872871231 26487287123N 371863475451 371863475451 37186347545N 648172350341 64817235034N 64817235034N 715484287291 71548428729N 71548428729N 71548428729N 71548428729N 864708752551 864708752551 86470875255N 86470875255N 904687125411 904687125411 end data. value labels outcome '1' 'Arrested' 'N' 'Not Arrested'. FORMATS incidentnumber (COMMA12) /age (F3). SORT CASES BY incidentnumber age (A) outcome (A). DATASET NAME TestData. * ................................................... . * II. Result desired . * ................................................... . NEW FILE. data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) . begin data 14386912419N 14386912419N 264872871231 371863475451 371863475451 648172350341 64817235034N 715484287291 71548428729N 71548428729N 71548428729N 864708752551 864708752551 904687125411 904687125411 end data. FORMATS incidentnumber (COMMA12) /age (F3). value labels outcome '1' 'Arrested' 'N' 'Not Arrested'. DATASET NAME SOLUTION. * ................................................... . * III. Load input data, to run. * ................................................... . DATASET ACTIVATE TestData. DATASET COPY PROBLEM. * ................................................... . |
Free forum by Nabble | Edit this page |