Duplicates

Robert Walker
Hi all,

What is the disadvantage of this code vs. "Identify Duplicate Cases" in the GUI?

* De-dupe.
SORT CASES BY BusinessEmail.
COMPUTE DupeFlag=1.
IF BusinessEmail=LAG(BusinessEmail) DupeFlag=2.
SELECT IF DupeFlag=1.
EXECUTE.

Cheers,

Bob Walker
Surveys & Forecasts, LLC
https://www.safllc.com

-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Daniel Bauer
Sent: Monday, September 14, 2020 9:33 AM
To: [hidden email]
Subject: [info] Methods Workshops this December

Hello Everyone -- Patrick and I hope you are safe and healthy wherever you are out in the world during this strange time to be alive. We apologize in advance for any cross-postings, but we wanted to send a brief note to announce that this December we will offer our first Fall Institute featuring short courses on our most popular topics. All classes will be live streamed and tuition has been steeply reduced to expand accessibility (including discounted rates for students). Our Fall curriculum includes:

Introduction to Multilevel Modeling (includes demonstrations in SPSS) --December 2-4: Dan Bauer & Patrick Curran

Applied Research Design Using Mixed Methods --December 7-8: Greg Guest

Introduction to Longitudinal Structural Equation Modeling --December 9-11: Dan Bauer & Patrick Curran

Introduction to Network Analysis
--December 14-16: Doug Steinley

Introduction to Mixture Modeling and Latent Class Analysis --December 14-16: Dan Bauer

Please see curranbauer.org/training/ for complete details.
 
Stay safe....Dan & Patrick

------------------------------------------------------

Homepage: https://curranbauer.org/

YouTube Tutorials: https://bit.ly/380rXQt

Help Desk FAQs: https://bit.ly/37WCSuB
 
-------------------------------------------------------------------

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Duplicates

Charla
Hi Bob-
Your code will not give a value for the first case (because there is no LAG), which means if the first case is a DUP, it will not be identified as such.

Charla


-----Original Message-----
From: Robert Walker <[hidden email]>
To: [hidden email]
Sent: Tue, Sep 15, 2020 1:13 pm
Subject: Duplicates

Re: Duplicates

Robert Walker
Hi Charla,

Thanks. All records are initially assigned a “1”, and that only changes to “2” when a dupe is found. There is no value for LAG in record 1, but since the first record cannot be a dupe of anything before it, its “1” remains.
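
A quick way to check this is to list the flags before dropping any cases. The sketch below reuses the same three lines on a small made-up dataset; the addresses are placeholders, not data from this thread.

```spss
* Hypothetical sample data: the addresses below are placeholders.
DATA LIST LIST / BusinessEmail (A25).
BEGIN DATA
ann@example.com
bob@example.com
ann@example.com
cat@example.com
bob@example.com
END DATA.

SORT CASES BY BusinessEmail.
COMPUTE DupeFlag=1.
IF BusinessEmail=LAG(BusinessEmail) DupeFlag=2.
* Inspect the flags before any cases are dropped.
LIST.
```

After sorting, the listing should show DupeFlag as 1, 2, 1, 2, 1: the first case of each address keeps its 1, including case 1, and only the later repeats are set to 2.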

Cheers,

Bob Walker
Surveys & Forecasts, LLC
https://www.safllc.com


From: Charla Nich <[hidden email]>
Sent: Tuesday, September 15, 2020 5:46 PM
To: Robert Walker <[hidden email]>; [hidden email]
Subject: Re: Duplicates

Re: Duplicates

Bruce Weaver
Administrator
In reply to this post by Robert Walker
Hi Bob.  Different strokes for different folks, but for Yes/No variables, I
prefer coding them such that 1=Yes and 0=No.  For your example, I would do
something like this.

* Create a small dataset to illustrate.
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / BusinessEmail (A25).
BEGIN DATA
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
[hidden email]
END DATA.

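* Sort so that identical addresses sit on adjacent cases.
SORT CASES BY BusinessEmail.
* Code Duplicate such that 1=Yes, 0=No.
COMPUTE Duplicate = BusinessEmail EQ LAG(BusinessEmail).
FORMATS Duplicate(F1).
* List all cases with the flag, then keep only the non-duplicates.
LIST.
SELECT IF NOT Duplicate.
LIST.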


With 1/0 coding, that SELECT IF line works the same as this would:

SELECT IF Duplicate EQ 0.  

I just think it reads better as NOT Duplicate.  
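
One caveat, offered as a sketch rather than a correction: the logical-expression form relies on the comparison being false, not missing, on the first case. With a numeric key, LAG is system-missing on case 1, so Duplicate would be missing there and SELECT IF NOT Duplicate would drop that case. The example below assumes a hypothetical numeric key named CustID with made-up values, and sets the flag to 0 on the first case explicitly.

```spss
* Hypothetical numeric key: CustID and its values are illustrative only.
DATA LIST LIST / CustID (F4).
BEGIN DATA
101
102
101
103
END DATA.

SORT CASES BY CustID.
COMPUTE Duplicate = CustID EQ LAG(CustID).
* LAG is system-missing on the first case, so Duplicate is missing there.
* Set it to 0 explicitly so that SELECT IF NOT Duplicate keeps that case.
IF $CASENUM EQ 1 Duplicate = 0.
FORMATS Duplicate(F1).
SELECT IF NOT Duplicate.
LIST.
```

With the guard in place, one case per CustID should remain (101, 102, 103). Cases where the key itself is missing would still need separate handling.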

HTH.
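
For comparison with the GUI route asked about at the top of the thread, a 1/0 flag can also be built without LAG by using MATCH FILES with the FIRST subcommand. As far as I recall, this is close to what the Identify Duplicate Cases dialog pastes, but treat that as an assumption and paste the syntax from the dialog to compare. PrimaryFirst is an illustrative variable name.

```spss
* Assumes BusinessEmail is the key, as in the thread.
SORT CASES BY BusinessEmail.
MATCH FILES
  /FILE=*
  /BY BusinessEmail
  /FIRST=PrimaryFirst.
* PrimaryFirst is 1 for the first case in each BusinessEmail group, 0 otherwise.
COMPUTE Duplicate = NOT PrimaryFirst.
FORMATS Duplicate(F1).
SELECT IF NOT Duplicate.
LIST.
```

Because FIRST works on BY groups rather than on the previous case's value, it does not depend on what LAG returns for case 1.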



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Duplicates

Robert Walker
Hi Bruce,

Agree, cleaner and easier to interpret. Thanks!

Bob Walker
Surveys & Forecasts, LLC
https://www.safllc.com


-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Bruce Weaver
Sent: Tuesday, September 15, 2020 6:14 PM
To: [hidden email]
Subject: Re: Duplicates
