Treatment for Missing Values - What Options Do I Have ?

Treatment for Missing Values - What Options Do I Have ?

Chao Yawo
Hello,

I am running a logistic regression model (using a Demographic and
Health Surveys dataset) and noticed a drastic reduction in my
sub-population size. I traced the problem to a variable with a lot of
missing cases. As you can see from the table below, this variable
records whether the respondent used a condom at last intercourse.
About a third of the cases (33.78%) are missing.

V761 -- Last intercourse used condom
------------------------------------------------------------
                 |     Freq.   Percent     Valid      Cum.
-----------------+------------------------------------------
Valid    0 No    |      6012     56.16     84.81     84.81
         1 Yes   |      1075     10.04     15.16     99.97
         9       |         2      0.02      0.03    100.00
         Total   |      7089     66.22    100.00
Missing  .       |      3617     33.78
Total            |     10706    100.00
------------------------------------------------------------


According to the DHS (Demographic and Health Surveys):

A “missing value” is defined as a variable that should have a
response, but because of interview errors the question was not asked.
The general rule for the survey data processing is that under no
circumstances an answer should be made up. Instead, a missing value is
assigned in the data file (see:
http://www.measuredhs.com/accesssurveys/Data_quality_use.cfm#1).

So the missing values result from interview errors, and the errors are
not related to my DV. In fact, the DV has only 161 missing cases.
However, since the dependent variable in my model deals with HIV risk, I
need to include sexual risk variables such as V761 in the model.

One option is to ignore the missing values on that single IV, but
then I will have to accept the lower N (sample size) in my
analysis and explain in my write-up that the change in sample size
for the regression results from missing values on some of the
covariates.
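
For reference, the size of the loss under this option can be worked out from the table above. This is only a minimal sketch in Python: the frequencies are copied from the V761 table, and the variable names are made up for illustration. It looks at V761 alone and does not account for missing values on the DV or on other covariates.

```python
# Frequencies copied from the V761 table in the post.
freq = {"No": 6012, "Yes": 1075, "9": 2, "Missing": 3617}

total_n = sum(freq.values())              # all cases in the file
complete_n = total_n - freq["Missing"]    # cases with V761 observed
retained_pct = 100 * complete_n / total_n
dropped_pct = 100 * freq["Missing"] / total_n

print(f"N total = {total_n}, N with V761 observed = {complete_n}")
print(f"retained {retained_pct:.2f}%, dropped {dropped_pct:.2f}%")
```

The two cases coded 9 are counted as observed here because the table lists them as valid; whether they belong in the model is a separate decision.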

Does this sound like a reasonable option?  What other options do I have?

Thanks in advance for your help.

regards, Cy

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Treatment for Missing Values - What Options Do I Have ?

Spousta Jan
Hi Cy,

I assume the variable is an independent variable (predictor) in your model; otherwise you probably have no better choice than to drop the missing cases.

First, try to find out whether the missing values were really caused by the question not being asked. It looks more like non-response: the question was asked but the respondents did not give an answer. If this suspicion is correct, you can treat the variable as having three response levels: yes, no, refused. Recode it into two dummy variables (Refused 1/0 and Yes 1/0) and use ordinary logistic regression.

Otherwise you will probably need to drop either the variable or the missing cases, or use a technique other than classic logistic regression.
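
The dummy-variable recode described above could be sketched as follows. This is an illustration in Python rather than SPSS syntax, and it rests on assumptions: the function name `recode_v761` is hypothetical, system-missing values are represented as `None`, and treating them as "refused" only makes sense if the non-response suspicion turns out to be true.

```python
def recode_v761(value):
    """Return the (yes, refused) dummies for one V761 value.

    value: 0 = No, 1 = Yes, None = no answer recorded.
    'No' is the reference category, so it maps to (0, 0).
    """
    if value is None:
        return (0, 1)
    if value == 1:
        return (1, 0)
    if value == 0:
        return (0, 0)
    # Any other code (such as the two cases coded 9) needs its own decision.
    raise ValueError(f"unexpected code: {value!r}")


sample = [0, 1, None, 0, None]
dummies = [recode_v761(v) for v in sample]
print(dummies)
```

Both dummies then enter the logistic regression together, so no case is dropped because of V761.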

Good luck,

Jan



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chao Yawo
Sent: Wednesday, July 15, 2009 1:20 AM
To: [hidden email]
Subject: Treatment for Missing Values - What Options Do I Have ?


Fisher's Exact questions

Jarrod Teo-2
Hi all,
 
I have a few questions on Fisher's Exact.
 
1) When do you use Fisher's Exact?

2) Where do you find that in SPSS?

3) I heard that there are supposed to be 2-tailed and 1-tailed versions of Fisher's Exact. When do you use the 2-tailed and when the 1-tailed version? Under what hypotheses?
 
Thanks in advance for your help.
 
Regards
Dorraj



Re: Fisher's Exact questions

Spousta Jan
Hi,
 
1)
http://en.wikipedia.org/wiki/Fisher_exact_test

2)
For 2x2 tables, request /STATISTICS=CHISQ in CROSSTABS. For larger tables, you need a special module to conduct exact tests.

3)
http://en.wikipedia.org/wiki/Two-tailed_test
If you know in advance that independence can be violated in only one direction (e.g. a higher proportion than expected in one cell, and not a lower one), use the one-sided test. Otherwise use the two-sided test.
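
To make the one-sided versus two-sided distinction concrete, here is a minimal sketch of Fisher's exact test for a 2x2 table in plain Python (not SPSS). The function name `fisher_exact_2x2` is made up for illustration, and the two-sided p-value uses one common definition (the sum over all tables that are no more probable than the observed one); other definitions exist and a given package may use a different one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (p_greater, p_less, p_two_sided) for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible top-left counts
    p_obs = prob(a)
    p_greater = sum(prob(x) for x in range(a, hi + 1))   # top-left cell this large or larger
    p_less = sum(prob(x) for x in range(lo, a + 1))      # top-left cell this small or smaller
    # Two-sided: add up every table that is no more probable than the observed one.
    p_two = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return p_greater, p_less, min(1.0, p_two)


p_greater, p_less, p_two = fisher_exact_2x2(3, 1, 1, 3)
print(p_greater, p_less, p_two)
```

For the table [[3, 1], [1, 3]], the one-sided p-value in the "higher than expected" direction is 17/70, while the two-sided p-value is 34/70. The direction has to be fixed before looking at the data.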
 
HTH
 
Jan
 


From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of DorraJ Oet
Sent: Wednesday, July 15, 2009 11:22 AM
To: [hidden email]
Subject: Fisher's Exact questions


Re: Fisher's Exact questions

Bruce Weaver
Administrator
In addition to the info Jan has given, note that Fisher's exact test is quite conservative, and is probably not the best option.  A recent simulation study by Campbell (2007, Statistics in Medicine) showed that the "N-1" chi-square performs much better.  Here are my notes on that, including some SPSS syntax, and a link to Campbell's website.

   http://sites.google.com/a/lakeheadu.ca/bweaver/Home/statistics/notes/chisqr_assumptions
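
As a rough illustration of the statistic mentioned above (this is not the SPSS syntax from the linked notes), the "N-1" chi-square for a 2x2 table is the ordinary Pearson chi-square multiplied by (N-1)/N. The sketch below is in plain Python; the function name `n_minus_1_chisq` is hypothetical, and the p-value is taken from a chi-square distribution with 1 degree of freedom.

```python
from math import erfc, sqrt


def n_minus_1_chisq(a, b, c, d):
    """Return (statistic, p_value) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    # Pearson chi-square is n * (ad - bc)^2 / margins; here n is replaced by n - 1.
    stat = (n - 1) * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


stat, p_value = n_minus_1_chisq(3, 1, 1, 3)
print(stat, p_value)
```

For the table [[3, 1], [1, 3]] the Pearson statistic is 2.0 and the "N-1" version is 1.75. The notes linked above discuss when this test is appropriate.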

Cheers,
Bruce

Spousta Jan wrote
Hi,
 
1)
http://en.wikipedia.org/wiki/Fisher_exact_test
 
2)
For 2x2 tables, demand /STATISTICS=CHISQ in CROSSTABS. For bigger tables, you need a special module to conduct exact tests.
 
3)
http://en.wikipedia.org/wiki/Two-tailed_test
If you know in advance, that the independence can be affected only in one way (e.g. higher proportion than expected in one cell, and not lower), use one-sided test. Otherwise use the two-sided.
 
HTH
 
Jan
 

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Treatment for Missing Values - What Options Do I Have ?

Arthur Burke
In reply to this post by Chao Yawo
Cy, this situation has important implications for modeling and
inference. You might consider asking for guidance on SRMSNET, which is
populated by specialists in survey methodology.

http://www.amstat.org/sections/srms/srms_net.html

Art
Art Burke
Northwest Regional Educational Laboratory
101 SW Main St, Suite 500
Portland, OR 97204-3213

