filtering out cases

filtering out cases

leora lawton
Hi,

I need to remove a number of cases from a large (well,
3,500 cases) data file. The respondent id variable is
entryid, and I want to be able to remove about 100
cases (that have bad data), and they aren't
consecutive.  So for example, I want to remove case
number 1, 6, 20, 28, 48, and so forth.  It looks like
the filter cases entryid ~= 1 would have to be
repeated for each value?  The syntax "entryid ~= 1 or
6 or 20" doesn't work.

I'm wondering if $casenum is the right thing, but I
don't know how to use it either.

so if anyone can help, I'd be grateful!  Either syntax
or GUI instructions welcome.

thanks
Leora





Dr. Leora Lawton
TechSociety Research
"Custom Social Science and Consumer Behavior Research"
2342 Shattuck Avenue PMB 362, Berkeley, CA  94704
(510) 548-6174; fax (510) 548-6175; cell (510) 928-7572
[hidden email]
www.techsociety.com

Re: filtering out cases

Dominic Lusinchi
You don't need $CASENUM since you already have entryid. If you did not have
entryid, you could create it using COMPUTE entryid=$CASENUM. This will create
a variable (entryid) with consecutive numbers from 1 up to n, the number of
cases in your data file.
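
A minimal sketch of that step, assuming the file really has no ID variable yet (in this thread entryid already exists, so running this would overwrite it). The FORMATS line is an optional addition, not part of the original suggestion.

```spss
* Sketch: number the cases 1..n in their current order.
* Do not run this if entryid already exists, since it would be overwritten.
COMPUTE entryid = $CASENUM.
FORMATS entryid (F8.0).
EXECUTE.
```

Because $CASENUM follows the current case order, it seems safest to create the ID before any sorting or selecting.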

But that's not going to help you delete cases with bad data.
You need to create a filter variable based on the criteria that constitute
bad data. Then you can use the SELECT IF statement to keep only those
observations with "good" data.

The complexity of creating a filter depends on the number of variables
having bad data.
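
As a rough sketch of the filter-variable approach: the rule below is purely hypothetical, and age and income are placeholder variable names that do not come from this thread. Substitute whatever actually defines bad data in your file.

```spss
* Sketch only: flag cases that meet a hypothetical bad-data rule.
* age and income are placeholder names, not variables from the thread.
COMPUTE baddata = 0.
IF (MISSING(age) OR income < 0) baddata = 1.
FREQUENCIES VARIABLES=baddata.

* Once the flagged count looks right, keep only the good cases.
SELECT IF (baddata = 0).
EXECUTE.
```

Checking the frequency table of the flag before the SELECT IF gives you a chance to confirm that roughly the expected number of cases will be dropped.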

You might want to check Raynald Levesque's site at spsstools.net. There may
just be a solution to your problem there. Another option is to check this
list's archives at http://listserv.uga.edu/archives/spssx-l.html.

Good luck.

Dominic Lusinchi
Statistician
Far West Research
Statistical Consulting
San Francisco, California
415-664-3032
www.farwestresearch.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
leora lawton
Sent: Sunday, September 17, 2006 10:06 PM
To: [hidden email]
Subject: filtering out cases


Re: filtering out cases

Simon Phillip Freidin
* Keep every case whose entryid is not in the list of bad IDs.
SELECT IF NOT(ANY(entryid, 1, 6, 20, 28, 48)).
EXECUTE.
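
On why "entryid ~= 1 or 6 or 20" fails: as far as I know, each side of OR has to be a complete comparison, so the long form would be entryid ~= 1 AND entryid ~= 6 AND entryid ~= 20, which is what ANY abbreviates. A slightly more cautious variant, sketched here with a hypothetical flag variable named dropcase, marks the cases first so the count can be checked before anything is deleted. The ID list can continue over several lines for the full set of about 100 values.

```spss
* Sketch: flag the listed IDs first, check the count, then drop them.
* dropcase is a hypothetical helper variable, not from the thread.
COMPUTE dropcase = ANY(entryid, 1, 6, 20, 28, 48).
FREQUENCIES VARIABLES=dropcase.

* Keep only the cases that were not flagged.
SELECT IF (dropcase = 0).
EXECUTE.
```

Note that a case with a missing entryid gets a missing flag and is dropped by the SELECT IF as well, which matches the behavior of the one-line version.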

At 03:32 PM 18/09/2006, Dominic Lusinchi wrote:



Research Database Manager and Analyst
Melbourne Institute of Applied Economic and Social Research
The University of Melbourne
Melbourne VIC 3010 Australia
New Tel: (03) 8344 2085 New Fax: (03) 8344 2111
http://www.melbourneinstitute.com/hilda/

Re: filtering out cases

Hal 9000
Back up your original file first. Then, in your working file, put a TEMPORARY
command before any SELECT IF statement if you're not certain it's set up
right, so you can check that it does what you want.

TEMPORARY.
SELECT IF (SOME LOGICAL SET OF CONDITIONS).
FREQUENCIES VARIABLES=SOME USEFUL VAR FOR SEEING GOOD DATA.

After the frequencies command is run, everything is set back to the
pre-select state. A useful technique for those of us who really want to be
SURE!
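
Applied to the case list from this thread, a dry run might look like the sketch below. DESCRIPTIVES is my substitution for the frequencies step, chosen only because a frequency table of a unique ID would be very long; any procedure ends the temporary selection.

```spss
* Sketch: preview the selection without changing the working file.
TEMPORARY.
SELECT IF NOT(ANY(entryid, 1, 6, 20, 28, 48)).
* The N reported here should be the original count minus the dropped cases.
DESCRIPTIVES VARIABLES=entryid.
```

If the reported N is what you expect, rerun the SELECT IF without TEMPORARY to make the deletion permanent in the working file.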

The first part of Raynald's book treats these best-practice axioms well.
Best!
-Gary

On 9/17/06, Simon Freidin <[hidden email]> wrote:

>
> sel if not(any(entryid,1, 6, 20, 28, 48)).
> exe.