|
Can someone please help me with this?
Two of the variables I have in my data set are ID and BadData. There will be many different cases with the same ID. BadData is 1 if there is bad data, and 0 otherwise. I want to track the percentage of bad data in my file. If there is bad data, it affects all the other cases that correspond to that ID, so the value of BadData will be the same for all cases that share an ID. What I want to do is calculate the percentage of IDs that have bad data. Does anyone know how I can do this in SPSS? Thanks! |
|
Hi,
If I understand what you need, you should run AGGREGATE with ID as the break variable. You can do this with syntax or from the menus. You will want to SUM the BadData variable. The resulting data set will have only one row per ID, and if the new SUMBadData variable is 1 or greater, at least one of the related rows is bad. If it is zero, all of the rows were OK (not bad).

You can run FREQUENCIES to find the proportion, followed by a SELECT IF if you want a list that has only good, or only bad, IDs.

I hope that helps.

Keith
www.keithmccormick.com

On Tue, May 27, 2008 at 5:54 PM, jimjohn <[hidden email]> wrote:
> What I want to do is calculate the percent of id's that have bad
> data. does anyone know how i can do this in spss? thx!
> --
> View this message in context: http://www.nabble.com/calculate-percentage-bad-data-based-on-another-variable-tp17500924p17500924.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
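Keith's suggestion could be sketched in syntax roughly as follows. This is a minimal sketch, not his actual code: it assumes the variables are named ID and BadData as in the original post, uses SUMBadData as the name of the aggregated variable, and introduces AnyBad as a hypothetical name for a per-ID flag.

```spss
* Sketch only, assuming variables named ID and BadData as in the original post.
* OUTFILE=* replaces the active dataset, so save the case-level file first.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID
  /SUMBadData=SUM(BadData).

* AnyBad is a hypothetical flag: 1 if any case for the ID is bad, otherwise 0.
COMPUTE AnyBad = (SUMBadData > 0).

* The percent column for AnyBad = 1 is the percentage of IDs with bad data.
FREQUENCIES VARIABLES=AnyBad.
```

Flagging the sum first keeps the frequency table to two rows (good and bad IDs) instead of one row per distinct sum.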
|
In reply to this post by jimjohn
Once you have the percentage of bad for each ID, what do you want to do with it? You can use procedures that generate output in the Viewer, or you can use the AGGREGATE command, which creates a file in which each case holds the value you need for each ID.

Let's assume that you just want a report. Here are several approaches:

1. The mean of the baddata variable is the proportion of cases for which baddata equals 1. You can get the mean of baddata for each value of ID using the SUMMARIZE procedure.
2. The GRAPH and GGRAPH commands have a statistics keyword PGT (percent greater than) that you can use to calculate the percentage greater than 0 for baddata by your ID for a visual check in a chart. This is not a good solution if you have more than 50 or so IDs.
3. You can split your file by ID and run the DESCRIPTIVES procedure asking for the mean. If you specify SPLIT FILE with keyword LAYERED, the results will appear in a single table.

Let's assume that you want this as a transformation so that the resulting data file can be filtered or listed:

Use the PGT function (the same as in GRAPH and GGRAPH) on the AGGREGATE command, specifying your ID variable as the break variable. You can then select cases with the result variable greater than 0, sort your IDs on the result variable, and list the result variable using the LIST CASES or SUMMARIZE command.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of jimjohn
Sent: Tuesday, May 27, 2008 3:55 PM
To: [hidden email]
Subject: calculate percentage bad data based on another variable

What I want to do is calculate the percent of id's that have bad data. does anyone know how i can do this in spss? thx! |
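The report approach (item 1) and the transformation approach described above might look something like this in syntax. It is a sketch under assumptions: the variable names ID and baddata come from the thread, pct_bad is a hypothetical name for the result variable, and the subcommand choices are illustrative rather than anything the poster specified.

```spss
* Report only: the mean of baddata within each ID is the proportion of bad cases.
SUMMARIZE
  /TABLES=baddata BY ID
  /FORMAT=NOLIST TOTAL
  /CELLS=COUNT MEAN.

* Transformation: one row per ID holding the percentage of cases with baddata > 0.
* pct_bad is a hypothetical variable name.
* OUTFILE=* replaces the active dataset, so save the case-level file first.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID
  /pct_bad=PGT(baddata, 0).

* Keep only IDs with some bad data, then sort and list them.
SELECT IF (pct_bad > 0).
SORT CASES BY pct_bad (D).
LIST VARIABLES=ID pct_bad.
```

Because baddata is described as constant within an ID, pct_bad should come out as either 0 or 100 for every ID. Note that SELECT IF permanently drops the unselected rows from the active dataset.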
|
Thanks a lot guys! I just wanted to see the percentages, so I can get an idea of what percentage of my data is bad.

Quoting ViAnn Beadle <[hidden email]>:
> Once you have the percentage of bad for each ID, what do you want to do with
> it? |
|
In reply to this post by Keith McCormick
Thanks again, Keith, but I have a follow-up. When I run AGGREGATE with ID as the break variable, my resulting data set does not have only one row per ID. Each ID still has many rows corresponding to it. Any idea what I have to do to make AGGREGATE give me one row per ID? Thanks! This is the syntax that came from what I did:
--------------------------------------
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /baddata_sum=SUM(baddata).
-----------------------------------------------
|
|
I think if you remove MODE=ADDVARIABLES, you'll get what you want.
Mark

On Thu, May 29, 2008 at 9:47 AM, jimjohn <[hidden email]> wrote:
> When I run AGGREGATE with ID as the break variable, my resulting data set
> does not have only one row per id.
> --
> View this message in context: http://www.nabble.com/calculate-percentage-bad-data-based-on-another-variable-tp17500924p17536267.html |
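Applied to the syntax from the follow-up post, Mark's suggestion would presumably look like the sketch below. With MODE=ADDVARIABLES, AGGREGATE appends the aggregated value to every existing case, which is why the row count did not change. Without it, OUTFILE=* replaces the active dataset with the aggregated file.

```spss
* Same command as in the follow-up post, with MODE=ADDVARIABLES removed.
* OUTFILE=* now replaces the active dataset with one row per ID.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID
  /baddata_sum=SUM(baddata).
```

Since the case-level rows are gone after this runs, it is worth saving the original file beforehand.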
