SPSSX Discussion

calculate percentage bad data based on another variable

jimjohn

calculate percentage bad data based on another variable

Can someone please help me with this?
Two of the variables in my data set are ID and BadData. There will be many different cases with the same ID. BadData is 1 if there is bad data and 0 otherwise. I want to track the percentage of bad data in my file. If there is bad data, it affects all the other cases that correspond to that ID, so the value of BadData will be the same for all cases that share an ID. What I want to do is calculate the percentage of IDs that have bad data. Does anyone know how I can do this in SPSS? Thanks!

Keith McCormick

Re: calculate percentage bad data based on another variable

Hi,

If I understand what you need, you should perform an AGGREGATE with ID
as the break variable. You can do this with syntax or in the menus.
You will want to SUM the BadData variable. The resulting data set will
have only one row per ID, and if the new SUMBadData variable is
greater than 0, at least one of the related rows is bad. If it is
zero, it must be that all of the rows were OK (not Bad).

You can run FREQUENCIES to find out the proportion, followed by a SELECT IF
if you want a list that has only good, or only bad, IDs.
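
A minimal sketch of the steps above, not tested against your file. It assumes the variables are named ID and BadData, with BadData coded 0/1 as in the question. The name any_bad is made up for illustration.

```spss
* Sketch only: assumes variables named ID and BadData (coded 0 or 1).
* OUTFILE=* replaces the active dataset, so save your data first.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID
  /SUMBadData=SUM(BadData).
* any_bad is an illustrative name: 1 if any row for the ID was bad.
COMPUTE any_bad = (SUMBadData > 0).
FREQUENCIES VARIABLES=any_bad.
```

In the FREQUENCIES table, the percentage shown for any_bad = 1 should be the share of IDs with bad data.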

I hope that helps.

Keith
www.keithmccormick.com

On Tue, May 27, 2008 at 5:54 PM, jimjohn <[hidden email]> wrote:

> Can someone please help me with this?
> Two of the variables in my data set are ID and BadData. There will
> be many different cases with the same ID. BadData is 1 if there is bad data
> and 0 otherwise. I want to track the percentage of bad data in my file. If
> there is bad data, it affects all the other cases that correspond to
> that ID, so the value of BadData will be the same for cases that have the
> same ID. What I want to do is calculate the percentage of IDs that have bad
> data. Does anyone know how I can do this in SPSS? Thanks!

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

ViAnn Beadle

Re: calculate percentage bad data based on another variable

In reply to this post by jimjohn

Once you have the percentage of bad for each ID, what do you want to do with
it? You can use procedures that generate output in the Viewer, or you can
use the AGGREGATE command, which will create a file in which each case holds
the value you need for each ID.

Let's assume that you just want a report. Here are several approaches:

1. The mean of the BadData variable is the proportion of cases for which
BadData equals 1. You can get the mean of BadData for each value of ID using
the SUMMARIZE procedure.
2. The GRAPH and GGRAPH commands have the statistics keyword PGT (percent
greater than), which you can use to calculate the percentage greater than 0
for BadData by ID for a visual check in a chart. This is not a good
solution if you have more than 50 or so IDs.
3. You can split your file by ID and run the DESCRIPTIVES procedure asking for
the mean. If you specify SPLIT FILE with keyword LAYERED, the results will
appear in a single table.
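
Approaches 1 and 3 above could be sketched in syntax roughly as follows. This is an illustration rather than tested output, and it assumes the variables are named ID and BadData, with BadData coded 0/1 as in the original question.

```spss
* Sketch only: assumes variables named ID and BadData (coded 0 or 1).
* Approach 1: mean of BadData (the proportion bad) for each ID.
SUMMARIZE
  /TABLES=BadData BY ID
  /FORMAT=NOLIST TOTAL
  /CELLS=COUNT MEAN.

* Approach 3: the same means through SPLIT FILE and DESCRIPTIVES.
SORT CASES BY ID.
SPLIT FILE LAYERED BY ID.
DESCRIPTIVES VARIABLES=BadData
  /STATISTICS=MEAN.
SPLIT FILE OFF.
```

Both report a mean per ID, not the percentage of IDs that are bad. They are mostly useful for checking that BadData really is constant within each ID.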

Let's assume that you want this as a transformation so that the resulting
data file can be filtered or listed:

Use the PGT function (the same as in GRAPH and GGRAPH) on the AGGREGATE command,
specifying your ID variable as the break variable. You can then select cases
with the result variable greater than 0, sort your IDs on the result
variable, and list the result variable using the LIST CASES or SUMMARIZE
command.
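
A sketch of that AGGREGATE route, under the same assumptions (variables named ID and BadData, BadData coded 0/1). The result name pct_bad is hypothetical. The plain LIST command is used for the listing.

```spss
* Sketch only: assumes variables named ID and BadData (coded 0 or 1).
* pct_bad is an illustrative name for the result variable.
* OUTFILE=* replaces the active dataset, so save your data first.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID
  /pct_bad=PGT(BadData,0).
* Keep only IDs with at least one bad case, highest percentage first.
SELECT IF (pct_bad > 0).
SORT CASES BY pct_bad (D).
LIST VARIABLES=ID pct_bad.
```

PGT returns a percentage from 0 to 100. If BadData is constant within each ID, every listed ID should show 100.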

jimjohn

Re: calculate percentage bad data based on another variable

Thanks a lot guys! I just wanted to see the percentages, so I can get
an idea of what percentage of my data is bad.

jimjohn

Re: calculate percentage bad data based on another variable

In reply to this post by Keith McCormick

Thanks again Keith, but I have a follow-up. When I run AGGREGATE with ID as the break variable, my resulting data set does not have only one row per ID. Each ID still has many different rows that correspond to it. Any idea what I have to do to make AGGREGATE give me one row per ID? This is the syntax that came from what I did. Thanks!

--------------------------------------

AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=ID
/baddata_sum=SUM(baddata).

-----------------------------------------------

Mark Palmberg

Re: calculate percentage bad data based on another variable

I think if you remove MODE=ADDVARIABLES, you'll get what you want.
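
With MODE=ADDVARIABLES, AGGREGATE appends the aggregated value to every case in the active dataset instead of collapsing it, which is why each ID still has many rows. A sketch of the same command without that keyword (variable names taken from the posted syntax):

```spss
* Sketch only: same command as posted, minus MODE=ADDVARIABLES.
* OUTFILE=* now replaces the active dataset with one row per ID.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID
  /baddata_sum=SUM(baddata).
```

Since this overwrites the active dataset, it is probably worth saving the original file first.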

Mark

On Thu, May 29, 2008 at 9:47 AM, jimjohn <[hidden email]> wrote:

> Thanks again Keith, but I have a follow-up. When I run AGGREGATE with ID
> as the break variable, my resulting data set does not have only one row per
> ID. Each ID still has many different rows that correspond to it. Any idea
> what I have to do to make AGGREGATE give me one row per ID? This is the
> syntax that came from what I did. Thanks!
>
> --------------------------------------
>
> AGGREGATE
> /OUTFILE=* MODE=ADDVARIABLES
> /BREAK=ID
> /baddata_sum=SUM(baddata).