Posted by Melissa Ives on Mar 21, 2007; 5:04pm
URL: http://spssx-discussion.165.s1.nabble.com/Dropping-duplicates-tp1074608p1074610.html
You should be able to have the drop=1 flag from the first 'N' record
'cascade' down to the following 'N' records on the same incident with
something like this:
IF (incidentnumber=lag(incidentnumber) AND outcome='N' AND
lag(drop)=1) drop=lag(drop).
Melissa
The bubbling brook would lose its song if you removed the rocks.
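For readers who want to see what the cascade would flag, here is a minimal Python sketch of the logic described above (not SPSS syntax, and not taken from the thread). It assumes the records are already sorted so that '1' records come before 'N' records within an incident, and that the IF line runs after the earlier COMPUTE. The function name `cascade_flags` and the `(incident, outcome)` tuple layout are made up for illustration.

```python
def cascade_flags(records):
    """records: list of (incident, outcome) tuples in sorted order.

    Returns one boolean per record; True means the record is flagged to drop.
    """
    flags = []
    for i, (inc, out) in enumerate(records):
        # Does the previous record belong to the same incident?
        same = i > 0 and records[i - 1][0] == inc
        prev_out = records[i - 1][1] if i > 0 else None
        # Step 1 (the earlier COMPUTE): an 'N' directly after a '1'.
        flag = same and out == "N" and prev_out == "1"
        # Step 2 (the IF line): an 'N' directly after a flagged record
        # inherits the flag, so the flag cascades down the incident.
        if same and out == "N" and flags[i - 1]:
            flag = True
        flags.append(flag)
    return flags


# Scenario 6: two '1' records followed by two 'N' records on one incident.
scenario_6 = [(1, "1"), (1, "1"), (1, "N"), (1, "N")]
print(cascade_flags(scenario_6))  # [False, False, True, True]
```

Under this reading, every 'N' that follows a '1' on the same incident is flagged, which covers Scenario 6. It would also flag both 'N's in a 1, N, N incident, so the result is worth checking against Scenarios 4 and 5 in the original message, whose listed solutions keep some 'N' records.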
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ariel barak
Sent: Wednesday, March 21, 2007 11:01 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Dropping duplicates
Melissa and Fellow SPSS users,
Thanks for the quick response Melissa. The syntax you suggested is now
working almost perfectly. The only issue is that it does not flag both
'N's in Scenario 6 in my initial e-mail:
Scenario 6)
Data Solution
1 1
1 1
N
N
It flags the first 'N' but not the second. Further, if the data were as
in the new scenario below, it would only flag the first 'N' instead of
all of them.
Scenario 8)
Data Solution
1 1
1 1
1 1
1 1
1 1
N
N
N
Basically, if the number of 1's on an incident is equal to or greater
than the number of 'N's, all 'N's need to be flagged.
Any ideas on how to correct this?
Thanks again,
-Ariel
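
The rule Ariel describes, one 'N' record dropped for each '1' record on the same incident, can also be expressed as a per-incident count rather than a case-to-case comparison. Below is a minimal Python sketch under that assumption (not SPSS syntax; `flag_drops` and the `(incident, outcome)` tuples are hypothetical names used only for illustration).

```python
from collections import Counter


def flag_drops(records):
    """records: list of (incident, outcome) tuples.

    Returns one boolean per record; True means the record should be dropped.
    """
    # Count the '1' (arrested) records on each incident.
    arrests = Counter(inc for inc, out in records if out == "1")
    seen_n = Counter()  # running count of 'N' records seen per incident
    flags = []
    for inc, out in records:
        if out == "N":
            seen_n[inc] += 1
            # Drop this 'N' only while an unmatched '1' remains.
            flags.append(seen_n[inc] <= arrests[inc])
        else:
            flags.append(False)
    return flags


# Scenario 8: five '1' records and three 'N' records on one incident.
scenario_8 = [(1, "1")] * 5 + [(1, "N")] * 3
print(flag_drops(scenario_8))
# [False, False, False, False, False, True, True, True]
```

Because the flag depends only on the two counts, it reproduces the listed solution for each of Scenarios 1 through 8. An SPSS version of the same idea would need a per-incident count of '1' records and a running count of 'N' records within each incident.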
On 3/21/07, Melissa Ives <[hidden email]> wrote:
> My bad. Move the closing parenthesis so that it comes right after
> 'lag(outcome' and before the ='1' comparison, instead of after it.
> Like this:
>
> COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND
> lag(outcome)='1').
>
> Melissa
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ariel barak
> Sent: Wednesday, March 21, 2007 10:26 AM
> To: [hidden email]
> Subject: Re: [SPSSX-L] Dropping duplicates
>
> Hi Melissa and Fellow SPSS users,
>
> I think you're pointing me in the right direction. However, when I
> tried the syntax you suggested, I got this error:
>
>
> COMPUTE drop=(incidentnumber=lag(incidentnumber) AND outcome='N' AND
> lag(outcome
> ='1')).
>
> >Error # 4323 in column 85. Text: )
> >The first argument of the LAG function must be a variable. It must not
> >be a constant or an expression.
> >This command not executed.
>
> EXE.
> The issue is with the second lag command...any thoughts on how to get
> around this?
>
> Your help is GREATLY appreciated.
>
> -Ariel
>
>
> On 3/21/07, Melissa Ives <[hidden email]> wrote:
> >
> > Just a thought, it seems like you could sort so that the one you
> > want to drop always FOLLOWS the one you would want to keep, then use
> > the LAG function to identify duplicates. Something like this:
> >
> > Compute drop=(id=lag(id) and outcome="N" and lag(outcome="1")).
> >
> > This will create drop=1 for any record with the same ID where the
> > current outcome is N and the record just before it has outcome=1.
> >
> > Melissa
> >
> >
> > -----Original Message-----
> > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ariel barak
> > Sent: Tuesday, March 20, 2007 3:26 PM
> > To: [hidden email]
> > Subject: [SPSSX-L] Dropping duplicates
> >
> > Fellow SPSS users,
> >
> > I have a set of data which I know has duplicates in it. Having the
> > data provider go through their records and mark which are duplicates
> > and which aren't is not an option. I have run the duplicate cases
> > by incident number and age in order to weed out cases which I don't
> > believe to be duplicates and am left with a set of data similar to
> > that at the end of the e-mail below. There are around 400 cases which
> > are differentiated from each other only by incident number and
> > outcome; the ages of the offenders are the same. It is possible that
> > this same syntax will have to be run against a much larger number of
> > cases in the future.
> >
> > In this case, '1' stands for arrested and 'N' for not arrested. I
> > need syntax that will delete one record with an 'N' for each record
> > where there is a '1' on the incident. Here are some of the possible
> > scenarios and what I would like to keep using syntax. In each
> > scenario, you can assume that all cases have the same incident
> > number, although the complete data set has 199 incident numbers. The
> > number of offenders per incident is always between 2 and 9.
> >
> > The datasets at the bottom go through each of these scenarios in the
> > same order as they are presented here. The first set is the data
> > with the duplicates I want to delete and the second is with the
> > duplicates I wish to delete dropped...problem and solution.
> >
> > I greatly appreciate any help that you may be able to give and will
> > be glad to clarify any questions. Thanks!
> >
> > -Ariel Barak
> >
> > Scenario 1)
> > Data Solution
> > N N
> > N N
> >
> > Scenario 2)
> > Data Solution
> > 1 1
> > N
> >
> > Scenario 3)
> > Data Solution
> > 1 1
> > 1 1
> > N
> >
> > Scenario 4)
> > Data Solution
> > 1 1
> > N N
> > N
> >
> > Scenario 5)
> > Data Solution
> > 1 1
> > N N
> > N N
> > N N
> > N
> >
> > Scenario 6)
> > Data Solution
> > 1 1
> > 1 1
> > N
> > N
> >
> > Scenario 7)
> > Data Solution
> > 1 1
> > 1 1
> >
> > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) .
> > begin data
> > 14386912419N
> > 14386912419N
> > 264872871231
> > 26487287123N
> > 371863475451
> > 371863475451
> > 37186347545N
> > 648172350341
> > 64817235034N
> > 64817235034N
> > 715484287291
> > 71548428729N
> > 71548428729N
> > 71548428729N
> > 71548428729N
> > 864708752551
> > 864708752551
> > 86470875255N
> > 86470875255N
> > 904687125411
> > 904687125411
> > end data.
> >
> > value labels outcome
> > '1' 'Arrested'
> > 'N' 'Not Arrested'.
> >
> > DATASET NAME Problem.
> >
> > data list / incidentnumber 1-9 (F) age 10-11 Outcome 12 (A) .
> > begin data
> > 14386912419N
> > 14386912419N
> > 264872871231
> > 371863475451
> > 371863475451
> > 648172350341
> > 64817235034N
> > 715484287291
> > 71548428729N
> > 71548428729N
> > 71548428729N
> > 864708752551
> > 864708752551
> > 904687125411
> > 904687125411
> > end data.
> >
> > value labels outcome
> > '1' 'Arrested'
> > 'N' 'Not Arrested'.
> >
> > DATASET NAME Solution.