SPSSX Discussion

Delete cases with more than 80% missing data; Handling duplicates


Nogitsune

Good evening,

I'm new to SPSS and am trying my best to find manuals and guides that will help me understand it better. Right now I am working with a large dataset (3,000 cases and 1,500 variables) of respondents' answers to various psychological measures. I need to delete cases with more than 80% missing data. How can I automate this?

Also, I have about 500 duplicate cases. I need to compare cases with identical names and, in each pair, delete the one with less data filled in. Is there a way to do this without manually going through each pair across all 1,500 variables?

Thank you!

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Jon Peck

Re: Delete cases with more than 80% missing data; Handling duplicates

On the first point, see Transform > Count Values within Cases (COUNT) and select system-missing, or system- and user-missing, as the value to count. Then you can use Data > Select Cases (SELECT IF) to delete the cases with too many missing values.
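The COUNT-then-SELECT IF logic can be sketched outside SPSS in plain Python. This is a minimal illustration under stated assumptions, not SPSS syntax and not SPSS's own Python integration: each case is assumed to be a dict mapping variable names to values, with `None` standing in for a missing value, and the names `count_missing`, `drop_sparse_cases` and the `nmiss` field are hypothetical. A case is kept when at most 80% of the listed variables are missing, so with 1,500 variables a case with 1,200 missing values stays and one with 1,201 is dropped.

```python
# Plain-Python sketch of COUNT ... (SYSMIS) followed by SELECT IF.
# Assumption: a case is a dict {variable_name: value}; None marks a missing value.

MAX_MISSING_PERCENT = 80  # drop cases with MORE than 80% missing


def count_missing(case, variables):
    """Count how many of `variables` are missing (None or absent) in `case`."""
    return sum(1 for var in variables if case.get(var) is None)


def drop_sparse_cases(cases, variables, max_percent=MAX_MISSING_PERCENT):
    """Return the cases with at most `max_percent` missing, each tagged with its count."""
    kept = []
    for case in cases:
        n_missing = count_missing(case, variables)
        # Integer comparison, so floating-point rounding cannot move the cutoff.
        if n_missing * 100 <= max_percent * len(variables):
            # Keep the count (hypothetical field "nmiss") for the duplicate step.
            kept.append({**case, "nmiss": n_missing})
    return kept


if __name__ == "__main__":
    variables = ["v1", "v2", "v3", "v4", "v5"]
    cases = [
        {"name": "ann", "v1": 1, "v2": 2, "v3": 3, "v4": 4, "v5": 5},  # 0% missing
        {"name": "bob", "v1": 1},  # 4 of 5 missing = 80%: kept
        {"name": "cat"},           # 5 of 5 missing = 100%: dropped
    ]
    for case in drop_sparse_cases(cases, variables):
        print(case["name"], case["nmiss"])  # prints "ann 0", then "bob 4"
```

The threshold is checked as `n_missing * 100 <= 80 * len(variables)` rather than with a fractional share, which keeps the 80% boundary exact.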

For the second point, you will already have the missing-value counts from the first step. Sort the file by name and by that count; then SELECT IF with the LAG function can pick out the cases to keep. The exact syntax depends on details such as whether a case can have more than one duplicate.
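The sort-then-LAG idea for duplicates can be sketched the same way in plain Python (again an illustration of the logic, not SPSS syntax). It assumes every case has a non-empty `name` and already carries its missing-value count in a hypothetical `nmiss` field, as a COUNT step would produce. After sorting by name and count, a case is kept only when its name differs from the previous case's name, which is what a comparison against LAG(name) does; because of that, it works for any number of duplicates per name, not just pairs.

```python
# Plain-Python sketch of "sort by name and missing count, then SELECT IF with LAG".
# Assumption: each case is a dict with a non-empty "name" and an integer "nmiss"
# (hypothetical field holding that case's missing-value count).


def keep_most_complete(cases):
    """Keep one case per name: the one with the fewest missing values."""
    # Sort so that, within each name, the most complete case comes first.
    ordered = sorted(cases, key=lambda case: (case["name"], case["nmiss"]))
    kept = []
    previous_name = None  # stands in for LAG(name); nothing precedes the first case
    for case in ordered:
        if case["name"] != previous_name:  # first case of a new name group
            kept.append(case)
        previous_name = case["name"]
    return kept


if __name__ == "__main__":
    cases = [
        {"id": 1, "name": "ann", "nmiss": 10},
        {"id": 2, "name": "bob", "nmiss": 0},
        {"id": 3, "name": "ann", "nmiss": 3},
        {"id": 4, "name": "ann", "nmiss": 7},  # a third "ann": more than a pair
    ]
    print([case["id"] for case in keep_most_complete(cases)])  # prints [3, 2]
```

Ties (two duplicates with the same count) keep whichever case came first in the original file, because Python's sort is stable. The result comes back in name order; re-sort by an ID variable if the original order matters.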

In reply to Kseniya Katsman <[hidden email]>, Mon, Feb 20, 2017 at 8:22 PM.

Jon K Peck
[hidden email]