There are 2 variables (ratio scale) in my data in which more than 15% of the values are absolute zeros, i.e. 0 = no loan and 0 = no asset. When entering the data into SPSS, how should I deal with these absolute zero values? Should I define all zeros as "missing values", which I think might change the meaning of these zeros? Or should I not do anything and leave the data as they are?
I am going to analyse the data using logistic regression. Thanks.
I don't understand where the problem is. Unlike discriminant function analysis, logistic regression makes no assumptions about how explanatory variables are distributed.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Treating 0 as missing for those variables will result in folks with zeroes being excluded from your model. What population are you trying to make an inference about? Does it include or exclude people who have no loan, or no assets?
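A minimal Python sketch of the exclusion point above, assuming a small hypothetical data set (the names `leverage`, `directorships`, `outcome`, `zeros_to_missing` and `complete_cases` are illustrative, not from the thread). Recoding 0 to missing and then keeping only complete cases, which is what listwise deletion does, silently removes every firm with no loan or no assets from the model.

```python
from typing import Optional

# Hypothetical record type: variable name -> value (None means missing).
Record = dict[str, Optional[float]]


def zeros_to_missing(rows: list[Record], variables: list[str]) -> list[Record]:
    """Return copies of the rows with every 0 in `variables` recoded to None."""
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy so the original data are left untouched
        for var in variables:
            if new_row[var] == 0:
                new_row[var] = None
        recoded.append(new_row)
    return recoded


def complete_cases(rows: list[Record], variables: list[str]) -> list[Record]:
    """Listwise deletion: keep only rows with no missing value in any model variable."""
    return [row for row in rows if all(row[var] is not None for var in variables)]


# Hypothetical example data.
rows: list[Record] = [
    {"leverage": 0.0, "directorships": 3, "outcome": 1},
    {"leverage": 25.0, "directorships": 0, "outcome": 0},
    {"leverage": 40.0, "directorships": 2, "outcome": 1},
    {"leverage": 0.0, "directorships": 0, "outcome": 0},
    {"leverage": 12.5, "directorships": 1, "outcome": 0},
]
model_vars = ["leverage", "directorships", "outcome"]

kept = complete_cases(zeros_to_missing(rows, ["leverage", "directorships"]), model_vars)
print(len(rows), "->", len(kept))  # 5 -> 2
```

Declaring 0 as a user-missing value with SPSS's MISSING VALUES command should have the analogous effect, since LOGISTIC REGRESSION drops cases that are missing on any variable in the model.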
Do you have two binary variables or two continuous variables? If the latter, what units are they in? Approximately what shapes do the distributions have? Do they appear zero-inflated? What research questions are you hoping to answer? How were the data collected? Is there a natural hierarchy in your data (e.g., persons nested in families)?

Please provide more details.

Ryan

On Sat, Mar 12, 2011 at 8:25 PM, Bruce Weaver <[hidden email]> wrote:
> [snip, previous]
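For the "zero-inflated" and "shape of the distribution" questions, a minimal Python sketch that reports the share of exact zeros and a sample skewness. It assumes the adjusted Fisher-Pearson coefficient G1, the bias-adjusted version many packages report; the function name and data are hypothetical.

```python
import math


def zero_share_and_skewness(values: list[float]) -> tuple[float, float]:
    """Return (share of exact zeros, adjusted Fisher-Pearson skewness G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values for G1")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    g1 = m3 / m2 ** 1.5
    big_g1 = math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment
    zero_share = sum(1 for v in values if v == 0) / n
    return zero_share, big_g1


# Hypothetical directorship counts with a cluster of zeros.
share, skew = zero_share_and_skewness([0, 0, 1, 2, 3])
print(f"zeros: {share:.0%}, skewness: {skew:.2f}")
```

A large zero share combined with positive skew in the remaining values is the pattern Ryan is asking about; computing the skewness separately on the nonzero values can also show whether the zeros are what drives it.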
When "zero" is nothing like an equal-interval extension of the other numbers observed, the proper way to model either score is to use two variables: the new variable will be an indicator variable, 0/1 for No/Yes, which indicates that "directors" (say) exist. With both variables going into the model, it does not actually matter what value is used for the sort-of-missing score. In some cases it is easier to note the effects if the Missing is assigned the mean of the rest.

The same solution can work when a zero = no-event exists for data that otherwise seem appropriate for a log or reciprocal transform (neither of which is defined at zero): use the transform, and set the missing to the mean of the rest.

-- 
Rich Ulrich

________________________________
> Date: Sun, 13 Mar 2011 09:03:41 -0700
> From: [hidden email]
> Subject: Re: how to handle absolute zero values?
> To: [hidden email]
>
> Dear Ryan,
>
> 1) Type of data: continuous
> 2) Unit of measurement: (1) leverage: percentage; (2) directorship: integer
> 3) Skewness: (1) leverage: 1.94; (2) directorship: 1.42
> 4) Research question: (1) does the level of leverage affect a company's
> decision to perform.....; (2) does the number of directorships affect a
> company's decision to perform.....
> 5) Zero values: (1) leverage: 19% of total sample; (2) directorship: 15%
> 6) Data collection: secondary data from company annual reports
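A minimal Python sketch of the indicator coding Rich describes, assuming one variable per call; `indicator_coding` and the example data are hypothetical. It produces a 0/1 "has a nonzero value" indicator plus a score column in which the zero cases are set to the mean of the (optionally transformed) nonzero scores.

```python
import math
from typing import Callable, Optional


def indicator_coding(
    values: list[float],
    transform: Optional[Callable[[float], float]] = None,
) -> tuple[list[int], list[float]]:
    """Recode one variable with structural zeros into two model variables.

    has_value: 1 if the value is nonzero (e.g., the firm has directors), else 0.
    score: the (optionally transformed) value for nonzero cases; zero cases get
    the mean of the transformed nonzero scores as a placeholder.
    """
    f = transform if transform is not None else (lambda v: v)
    has_value = [0 if v == 0 else 1 for v in values]
    nonzero_scores = [f(v) for v in values if v != 0]
    if not nonzero_scores:
        raise ValueError("need at least one nonzero value")
    fill = sum(nonzero_scores) / len(nonzero_scores)
    score = [f(v) if v != 0 else fill for v in values]
    return has_value, score


# Hypothetical directorship counts.
directorships = [0, 2, 4, 0, 6]
has_dir, dir_score = indicator_coding(directorships)
# Log-transformed variant: log(0) is undefined, so only nonzero values are logged.
_, log_dir_score = indicator_coding(directorships, transform=math.log)
print(has_dir, dir_score)
```

Why the fill value "does not actually matter": with an intercept b0, indicator coefficient b1 and score slope b2, the zero cases have linear predictor b0 + b2*c for any fill value c. Changing c is absorbed by the intercept and the indicator coefficient, so the fitted probabilities and the slope among the nonzero cases stay the same. Using the mean of the rest simply makes the indicator's coefficient easier to read.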
Rich is describing Cohen & Cohen's "indicator" method for dealing with missing data. It is no longer viewed favorably for situations where the data are truly missing (e.g., http://people.oregonstate.edu/~acock/growth-curves/working%20with%20missing%20values.pdf). But that is not the case here: the zeroes are legitimate values. I don't recall ever reading anything that distinguished between truly missing data and the kind of gap present in this case. But the indicator method might be all right here.
Bruce
> Date: Mon, 14 Mar 2011 08:52:24 -0700
> From: [hidden email]
> Subject: Re: how to handle absolute zero values?
> To: [hidden email]
>
> Rich is describing Cohen & Cohen's "indicator" method for dealing with
> missing data. It is no longer viewed favorably for situations where the
> data are truly missing (e.g.,
> http://people.oregonstate.edu/~acock/growth-curves/working%20with%20missing%20values.pdf).

That article is (properly) harsh on the strategy of simply substituting the mean for Missing. It reviews that strategy, along with strategies for deletion and imputation. It does *not* review the "indicator" method, though it mentions it in passing and notes difficulties when several indicators are highly correlated. What the article says about mean substitution yielding reduced standard deviations (owing to the inclusion of a set of zero-variance mean scores) should be kept in mind. The article is a pretty good pragmatic overview, though it does not address the important topic of how *many* Missings have to be accounted for.

> But that is not the case here--the zeroes are legitimate values. I don't
> recall ever reading anything that made the distinction between truly missing
> and the kind of gap present in this case. But the indicator method might be
> all right here.
>
> Bruce

I say that it is a logical problem. If the zero is not a natural extension of the scale, and you care about the effects of this variable, then you either use an indicator to allow for separate means, or you separate the sample into two parts (zeroes vs. others) to allow for entirely separate regressions.

> Rich Ulrich wrote:
> [snip, previous]
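A minimal Python sketch of the second option, splitting the sample into zero and nonzero cases so each part can get its own regression. The helper names and data are hypothetical; the per-group outcome rate stands in for whatever model you would actually fit in each part.

```python
def split_on_zero(rows, variable):
    """Partition rows into (zero cases, nonzero cases) on `variable`."""
    zeros = [row for row in rows if row[variable] == 0]
    others = [row for row in rows if row[variable] != 0]
    return zeros, others


def outcome_rate(rows, outcome="outcome"):
    """Proportion of rows with outcome == 1 (a crude per-group summary)."""
    return sum(row[outcome] for row in rows) / len(rows) if rows else float("nan")


# Hypothetical data; names are illustrative only.
rows = [
    {"leverage": 0.0, "outcome": 0},
    {"leverage": 0.0, "outcome": 1},
    {"leverage": 25.0, "outcome": 1},
    {"leverage": 40.0, "outcome": 1},
    {"leverage": 12.5, "outcome": 0},
]
no_debt, with_debt = split_on_zero(rows, "leverage")
print(len(no_debt), outcome_rate(no_debt))
print(len(with_debt), outcome_rate(with_debt))
```

Within the zero group the split variable is constant, so it cannot enter that group's regression; only the other predictors can be estimated there, while the nonzero group keeps the variable's slope.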
What led you to consider converting zeroes to missing values? What's the issue?
Ryan
On Sun, Mar 13, 2011 at 12:03 PM, lcl23 <[hidden email]> wrote:
If the variable doesn't mean anything in the analysis, then it doesn't matter what you do, as far as the outcome of the analysis is concerned.

Since you don't have a lot of experience with analyses, you should probably try three or four different things, just to see what they do, how they differ, and whether *any* of them seems to give a meaningful result. We don't know enough about the problem to say whether it makes sense to use the original scoring (can you argue that the variable has equal intervals?); to use that scoring but drop the "zero" cases as missing; to keep the zero and include a second, indicator variable for zero/other; or to use some other transformation.

-- 
Rich Ulrich

________________________________
> Date: Mon, 14 Mar 2011 20:17:31 -0700
> From: [hidden email]
> Subject: Re: how to handle absolute zero values?
> To: [hidden email]
>
> Would it be alright if I don't do anything about those zeroes?
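A minimal Python sketch for setting up the "try three or four different things" comparison: it builds the alternative codings Rich lists for one predictor. The names are hypothetical, the values are assumed non-negative, and log(1 + x) is only one possible choice for "some other transformation", picked here for illustration.

```python
import math


def candidate_codings(values):
    """Build alternative codings of one predictor that has many zeros.

    Returns a dict: coding name -> (indices of rows kept, predictor columns).
    Assumes non-negative values (needed for the log1p option).
    """
    all_rows = list(range(len(values)))
    nonzero_rows = [i for i, v in enumerate(values) if v != 0]
    has_value = [0 if v == 0 else 1 for v in values]
    return {
        # 1. Original scoring, zeros treated as ordinary values.
        "original": (all_rows, {"x": list(values)}),
        # 2. Zeros declared missing: those rows drop out of the model.
        "zeros_missing": (nonzero_rows, {"x": [values[i] for i in nonzero_rows]}),
        # 3. Keep the zeros and add a zero/other indicator.
        "zero_plus_indicator": (all_rows, {"x": list(values), "has_value": has_value}),
        # 4. One possible "other transformation" (an assumption): log(1 + x).
        "log1p": (all_rows, {"log1p_x": [math.log1p(v) for v in values]}),
    }


leverage = [0.0, 35.0, 0.0, 12.0]  # hypothetical percentages
for name, (kept_rows, columns) in candidate_codings(leverage).items():
    print(name, "rows kept:", len(kept_rows), "columns:", sorted(columns))
```

Each coding can then be fed to the same logistic regression. Keep in mind that the "zeros_missing" fit uses fewer cases, so its log-likelihood or fit statistics are not directly comparable with those of the other three.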