Re: SPSSX-L Digest - 24 Aug 2006 to 25 Aug 2006 (#2006-235)

Re: SPSSX-L Digest - 24 Aug 2006 to 25 Aug 2006 (#2006-235)

Fahim Jafary
Jason, thank you so much.  It worked like a charm.

Fahim


At 09:00 AM 8/26/2006, you wrote:

>There are 23 messages totalling 1353 lines in this issue.
>
>Topics of the day:
>
>   1. Performance Measures
>   2. Outlier detection
>   3. (C)NLR with log-likelihood loss function (6)
>   4. Basic Stat Question (3)
>   5. running ANOVA on binary data?
>   6. Interpretation from transformed variable in ANOVA (6)
>   7. Missing confidence interval for 25th percentile (2)
>   8. Syntax to search cases?
>   9. Adding numbers in columns - HELP please. (2)
>
>----------------------------------------------------------------------
>
>Date:    Thu, 24 Aug 2006 21:26:18 -0700
>From:    Michael Healy <[hidden email]>
>Subject: Re: Performance Measures
>
>Hi, Richard Ristow's summary of how to optimize SPSS commands was excellent
>and I just wanted to add one additional point.  The slowest part of an
>analysis is going to be reading and writing data files from the disk, thus
>you should use 10 or 15k RPM disks (I'm not sure whether there really is
>much of an advantage for 15k disks when working with large files).  An even
>better solution is to use a RAID configuration in which 2 or more disks are
>combined into a single volume and data are written across the disks. You can
>create a RAID using the Disk Management tool in Windows XP.  There are a
>number of different RAID types that can be used, but RAID 0 would be a good
>solution as long as you back up regularly.  More expensive RAID software and
>hardware options are available that will offer you better speed and data
>protection capabilities.
>--Mike
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 01:17:40 -0400
>From:    Richard Ristow <[hidden email]>
>Subject: Re: Outlier detection
>
>To agree wholeheartedly, with a few notes:
>
>At 09:20 PM 8/23/2006, Statisticsdoc wrote:
>
> >Setting a cutoff related to standard deviations to detect outliers is
> >useful in many situations, but there are exceptions.  For example,
> >standard deviations are more appropriate for normally distributed
> >data.  With skewed data (e.g. income, days of sickness absence, etc.)
> >some valid values may be very large when measured in standard
> >deviations.
>
>Statistical folk wisdom in some circles is that you'll never see truly
>normally distributed data. And one of the most common departures is far more
>extreme values than a normal distribution would have. Often this is most
>marked at large multiples of the SD, where the observed values can be
>rare and still many times the frequency expected for a normal distribution.
>
> >You might want to consider setting cutoffs for outliers based on
> >plausibility (e.g., when dealing with high school students who claim
> >that they consume 100 beers on a daily basis). [Or] "wild values" -
> >numbers that do not make sense and can be considered out of range.
>
>This is the easy case: 'outliers' that can confidently be identified as
>data errors. Easy, in that there's no subtlety about the analysis. The
>correct handling is clear: identify them, and drop them from the analysis.
>(Or, if possible, go back to the data source, and correct them.) But it
>IS the easy case. It disposes of a lot of apparent outliers, but still
>leaves you to deal with the real ones.
>
>One point that's only recently become clear to me is that rare,
>unusually large values can give a variance in parameter estimates far
>larger than statistical procedures will assign. That's because sampling
>will commonly include very few of them; the variance in the number
>sampled will be high; and their effect on the parameters, of course,
>very large. Worst case is the difference between having any of the
>largest values in your sample, and having none.
>
>Bootstrapping should probably be used to estimate parameter variances
>in these cases.
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 09:55:09 +0200
>From:    Spousta Jan <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>Hi Gary,
>
>If I am not mistaken, this is just what the LOGISTIC REGRESSION command
>is for. Why not try it instead of torturing nonlinear regression?
>
>LOGISTIC REGRESSION y
>   /METHOD = ENTER x1 x2 .
>
>Greetings
>
>Jan
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Gary Rosin
>Sent: Friday, August 25, 2006 4:55 AM
>To: [hidden email]
>Subject: (C)NLR with log-likelihood loss function
>
>I'm trying to use nonlinear regression to fit a regression using the
>logit transformation (0<y<1):
>
>          y = 1/(1+exp(-(b0+b1*x1+b2*x2)))
>
>I'm using SPSS 14.0.2, and need a macro/syntax that will let me use
>log-likelihood as the loss function.  The data is in the form of cases
>with y(i), x1(i), and x2(i).
>
>Any suggestions?
>
>      ---
>Prof. Gary S. Rosin                 [hidden email]
>South Texas College of Law
>1303 San Jacinto             Voice:  (713) 646-1854
>Houston, TX  77002-7000    Fax:           646-1766
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 07:32:56 -0400
>From:    "Smith, Brenda R." <[hidden email]>
>Subject: Basic Stat Question
>
>Please help.  I have data that I have to compare which has several
>variables.
>I need a way to rank them to tell which locality overall is number one based
>on the variable.
>I tried just ranking them in ascending order but each column ranks
>differently.
>
>Brenda Smith
>757.823.8751 (voice)
>757.823.2057 (fax)
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 07:14:29 -0500
>From:    "Beadle, ViAnn" <[hidden email]>
>Subject: Re: Basic Stat Question
>
>If I understand your posting, this is a question about constructing a
>scale across multiple dimensions. That is, are you trying to tell which
>locality is number one based upon one variable or several variables? If it's
>one variable, then that is the only variable you rank. If it is based upon
>multiple variables, then you have to create a scale.
>
>When I see rankings like this in magazines (e.g., best places to raise a
>family), each ranker has a different way of doing it.
>
>Here's one approach.
>
>Convert all of your scalar variables to z-scores so that they have the same
>scale.
>
>Are all of your variables scalar, or do you have some that are nominal?
>Assign values to categories of your categorical variables to be included
>in the scale. Good categories get positive numbers and bad categories
>get negative numbers, and then z-score them as well.
>
>Add the z-scored variables that are "good dimensions" (e.g., Family
>Income or your converted categories) and subtract the "bad dimensions"
>(e.g., Crimes per 1000 population). This gives you a single number that
>you can then convert into a rank.
>
>If you think that some variables are more important than others, give
>them a higher weight in the scale by multiplying the z-score by a number
>while adding it.
>
>Disclaimer: I'm no expert at this, but based upon rankings I've seen,
>this appears to be how some of them are done. Note that they might use a
>T-score (which has a mean of 50 and a standard deviation of 10), but the
>z-score is equivalent and easily produced from the DESCRIPTIVES procedure.
>Whatever you do, clearly describe your methodology when publishing your
>results to others.
>
>I hope you've been following the discussions about outliers on this list,
>because you'll want to check your data to make sure that any invalid
>data doesn't corrupt the rankings.
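>
>A minimal syntax sketch of that approach (untested; income, crime, and
>locality are purely illustrative variable names):
>
>* DESCRIPTIVES /SAVE adds the z-scores as Zincome and Zcrime.
>DESCRIPTIVES VARIABLES=income crime /SAVE.
>* Composite: good dimensions added, bad dimensions subtracted;
>* here income is (arbitrarily) weighted twice as heavily as crime.
>COMPUTE score = 2*Zincome - Zcrime.
>* Rank descending so the highest composite gets rank 1.
>RANK VARIABLES=score (D) /RANK INTO overall_rank.
>LIST locality score overall_rank.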
>
>________________________________
>
>From: SPSSX(r) Discussion on behalf of Smith, Brenda R.
>Sent: Fri 8/25/2006 6:32 AM
>To: [hidden email]
>Subject: Basic Stat Question
>
>
>
>Please help.  I have data that I have to compare which has several
>variables.
>I need a way to rank them to tell which locality overall is number one based
>on the variable.
>I tried just ranking them in ascending order but each column ranks
>differently.
>
>Brenda Smith
>757.823.8751 (voice)
>757.823.2057 (fax)
>
>------------------------------
>
>Date:    Mon, 21 Aug 2006 11:27:10 -0700
>From:    Dominic Lusinchi <[hidden email]>
>Subject: Re: running ANOVA on binary data?
>
>Without knowing more about your data... the most appropriate technique to
>use would be logistic regression where the dependent variable can be binary,
>and the independents can be a mix (categorical, continuous).
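>
>A bare-bones example of that approach (untested; infected, treatment, and
>age are illustrative variable names, with treatment declared categorical):
>
>LOGISTIC REGRESSION infected
>  /METHOD = ENTER treatment age
>  /CATEGORICAL = treatment .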
>
>Dominic Lusinchi
>Statistician
>Far West Research
>Statistical Consulting
>San Francisco, California
>415-664-3032
>www.farwestresearch.com
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Robinson Aschoff
>Sent: Monday, August 21, 2006 10:54 AM
>To: [hidden email]
>Subject: running ANOVA on binary data?
>
>Hello,
>
>I would like to ask if you are aware of any problems (violated
>assumptions) if you run an ANOVA on binary data (e.g. 0 / 1 coded for
>answer "no" "yes" or "patient not infected" "patient infected"). How
>severe are those violations? Would you consider running an ANOVA in this
>case "common practice" or not recommendable? Does anybody happen to know
>where this aspect is discussed in literature?
>
>Thanks a lot. I really appreciate your help.
>
>Sincerely,
>Robinson Aschoff
>
>I hope this hasn't been asked before a lot. I didn't find it in the
>archive, though.
>----------------------------------------------------------------
>Felix-Robinson Aschoff
>Information Management Research Group
>Department of Informatics
>University of Zurich
>Binzmuehlestrasse 14
>CH-8050 Zurich, Switzerland
>
>E-Mail: [hidden email]
>Phone: +41 (0)44 635 6690
>Fax: +41 (0)44 635 6809
>Room: 2.D.11
>http://www.ifi.unizh.ch/im
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 13:52:35 +0100
>From:    Jatender Mohal <[hidden email]>
>Subject: Interpretation from transformed variable in ANOVA
>
>Hi list,
>
>While working on GLM univariate, the skewed dependent variable was
>transformed to follow a normal distribution.
>
>While interpreting the descriptive statistics estimated from the model, do I
>need to consider the values after transformation? If so, how might it affect
>the interpretation?
>
>Thanks in Advance
>
>
>
>Mohal
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 11:04:02 -0300
>From:    Hector Maletta <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>You shouldn't seek an easy applause
>By cheating on assumptions.
>You shouldn't try to force a Gauss
>With such daring presumption.
>
>However, if you do, the interpretation is on the transformed data. For
>instance, if you worked on the logarithm of the original variable, any
>conclusion from ANOVA refers to the logarithm, not to the original variable.
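>
>As a small illustration (untested sketch; y and group are placeholder
>names):
>
>COMPUTE lny = LN(y).
>GLM lny BY group
>  /EMMEANS = TABLES(group).
>* The estimated marginal means are means of LN(y); taking EXP() of them
>* gives geometric means of y on the original scale, not arithmetic means.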
>
>Hector
>
>
>(Nor should you criticize this poem for failing to have a proper rhyme).
>
>-----Mensaje original-----
>De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de
>Jatender Mohal
>Enviado el: Friday, August 25, 2006 9:53 AM
>Para: [hidden email]
>Asunto: Interpretation from transformed variable in ANOVA
>
>Hi list,
>
>While working on GLM univariate, the skewed dependent variable was
>transformed to follow a normal distribution.
>
>While interpreting the descriptive statistics estimated from the model, do I
>need to consider the values after transformation? If so, how might it affect
>the interpretation?
>
>Thanks in Advance
>
>
>
>Mohal
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 15:44:17 +0100
>From:    Jatender Mohal <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>Hello Hector,
>
>Thanks for the signal!
>
>Still, a few things on my mind...
>Screening a continuous variable for the normality assumption (univariate or
>multivariate) is an important early step in inferential statistics. If the
>data are not normal, the possible solutions are (1) nonparametric tests or
>(2) a suitable transformation of the non-normal data.
>Nonparametric tests are fine, but why is the solution considered degraded
>when the data are forced toward normality by a suitable transformation to
>satisfy the model's assumptions?
>Or is this largely a subjective issue, since most inferential statistics are
>robust to departures from their assumptions?
>
>Kind regards
>
>Mohal
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Hector Maletta
>Sent: 25 August 2006 15:04
>To: [hidden email]
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>You shouldn't seek an easy applause
>By cheating on assumptions.
>You shouldn't try to force a Gauss
>With such daring presumption.
>
>However, if you do, the interpretation is on the transformed data. For
>instance, if you worked on the logarithm of the original variable, any
>conclusion from ANOVA refers to the logarithm, not to the original variable.
>
>Hector
>
>
>(Nor should you criticize this poem for failing to have a proper rhyme).
>
>-----Mensaje original-----
>De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de
>Jatender Mohal
>Enviado el: Friday, August 25, 2006 9:53 AM
>Para: [hidden email]
>Asunto: Interpretation from transformed variable in ANOVA
>
>Hi list,
>
>While working on GLM univariate, the skewed dependent variable was
>transformed to follow a normal distribution.
>
>While interpreting the descriptive statistics estimated from the model, do I
>need to consider the values after transformation? If so, how might it affect
>the interpretation?
>
>Thanks in Advance
>
>
>
>Mohal
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 10:08:13 -0500
>From:    Gary Rosin <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>So it is, if you are using individual data.  I have grouped data,
>where
>
>      y(i) = the proportion of group i that "passed"
>    x1(i) = the mean of predictor x1 for group i
>    x2(i) = predictor x2 for group i
>
>I used probit/logit to get a model, but the statistics supplied
>with that are skimpy.  I want to use the parameters from the
>probit/logit model as the initial parameters for a weighted
>(C)NLR.  I tried using the default least squares regressions
>(all 4 of them), but the resulting parameters varied somewhat
>from those of the probit/logit model.  I wondered what would
>happen if instead I used MLEs as the loss function.
>
>I could disaggregate the data into individual cases--I think I
>recently saw a macro for that--but I want to stretch, and to get
>familiar with implementing MLEs in (C)NLR.
>
>Gary
>
> >Jan Spousta <[hidden email]> wrote:
> >
> >If I am not mistaken, this is just what is LOGISTIC REGRESSION
> >command for. Why not to try it instead of torturing nonlinear
> >regression?
> >
> >LOGISTIC REGRESSION y
> >   /METHOD = ENTER x1 x2 .
>
> >Gary Rosin wrote:
> >
> >I'm trying to use nonlinear regression to fit a regression using the
> >logit transformation (0<y<1):
> >
> >          y = 1/(1+exp(-(b0+b1*x1+b2*x2)))
> >
> >I'm using SPSS 14.0.2, and need a macro/syntax that will let me use
> >log-likelihood as the loss function.  The data is in the form of cases
> >with y(i), x1(i), and x2(i).
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 12:24:59 -0300
>From:    Hector Maletta <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>The matter has been discussed many times on this list. A normal Gaussian
>frequency distribution of the dependent variable is NOT a requirement of the
>General Linear Model in its various incarnations such as ANOVA or Linear
>Regression. Where normality enters the scene is in two places:
>1. Normal sampling distribution: The sample is a random sample, and
>therefore differences between all possible samples follow a normal
>distribution, whose mean tends to coincide with the population mean as
>sample size increases. As a result, normal significance tests apply.
>2. Normal distribution of residuals: In a regression equation like y=a+bx+e,
>errors or residuals "e" for each value of X are normally distributed about
>the Y value predicted by the regression equation, with zero mean. Therefore,
>the least squares algorithm applies.
>Now, even if not absolutely forbidden, a variable whose distribution is
>extremely skewed may nonetheless have a very high variance, and the sample
>size required to obtain a given level of standard error or a given level of
>significance will be correspondingly larger. Also, if the variable
>distribution is extremely skewed, some extreme values may have a
>disproportionate influence on the results; and the situation is also likely
>to be accompanied by heteroskedasticity (i.e. the variance of residuals may
>be different for different parts of the variable range). Notice, however,
>that non-normality of the variable is neither a necessary nor a sufficient
>cause for heteroskedasticity. The latter can be present in the tail of a
>normal curve, or absent in the tail of a skewed curve. Also, notice that the
>literature tends to suggest that moderate heteroskedasticity is tolerable, in
>the sense of not causing immoderate damage to the quality of results
>obtained by regression or ANOVA.
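>
>One quick way to look at both issues in SPSS (untested sketch; y and group
>are placeholder names) is to save and examine the residuals:
>
>GLM y BY group
>  /SAVE = RESID
>  /PRINT = HOMOGENEITY.
>* GLM saves the residuals as RES_1; EXAMINE then gives normal probability
>* plots and tests for them.
>EXAMINE VARIABLES = RES_1
>  /PLOT = NPPLOT.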
>Hector
>
>-----Mensaje original-----
>De: Jatender Mohal [mailto:[hidden email]]
>Enviado el: Friday, August 25, 2006 11:44 AM
>Para: 'Hector Maletta'
>CC: [hidden email]
>Asunto: RE: Interpretation from transformed variable in ANOVA
>
>Hello Hector,
>
>Thanks for the signal!
>
>Still, a few things on my mind...
>Screening a continuous variable for the normality assumption (univariate or
>multivariate) is an important early step in inferential statistics. If the
>data are not normal, the possible solutions are (1) nonparametric tests or
>(2) a suitable transformation of the non-normal data.
>Nonparametric tests are fine, but why is the solution considered degraded
>when the data are forced toward normality by a suitable transformation to
>satisfy the model's assumptions?
>Or is this largely a subjective issue, since most inferential statistics are
>robust to departures from their assumptions?
>
>Kind regards
>
>Mohal
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Hector Maletta
>Sent: 25 August 2006 15:04
>To: [hidden email]
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>You shouldn't seek an easy applause
>By cheating on assumptions.
>You shouldn't try to force a Gauss
>With such daring presumption.
>
>However, if you do, the interpretation is on the transformed data. For
>instance, if you worked on the logarithm of the original variable, any
>conclusion from ANOVA refers to the logarithm, not to the original variable.
>
>Hector
>
>
>(Nor should you criticize this poem for failing to have a proper rhyme).
>
>-----Mensaje original-----
>De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de
>Jatender Mohal
>Enviado el: Friday, August 25, 2006 9:53 AM
>Para: [hidden email]
>Asunto: Interpretation from transformed variable in ANOVA
>
>Hi list,
>
>While working on GLM univariate, the skewed dependent variable was
>transformed to follow a normal distribution.
>
>While interpreting the descriptive statistics estimated from the model, do I
>need to consider the values after transformation? If so, how might it affect
>the interpretation?
>
>Thanks in Advance
>
>
>
>Mohal
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 16:49:52 +0100
>From:    Margaret MacDougall <[hidden email]>
>Subject: Missing confidence interval for 25th percentile
>
>Dear all
>
>   I would be most grateful for some explanation as to why on some
> occasions the output for a Kaplan-Meier analysis can list the value for
> the 25th percentile of the survival time distribution (under '75.00') but
> not the confidence interval.  Does this problem have something to do with
> the use of Greenwood's formula in calculating the standard error for this
> percentile? In fact, what formula does SPSS use to obtain the standard error?
>
>   Many thanks for your input.
>
>   Best wishes
>
>   Margaret
>
>
>
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 18:45:55 +0200
>From:    Marta García-Granero
><[hidden email]>
>Subject: Re: Missing confidence interval for 25th percentile
>
>Hi Margaret
>
>MM>   I would be most grateful for some explanation as to why on
>MM> some occasions the output for a Kaplan-Meier analysis can list the
>MM> value for the 25th percentile of the survival time distribution
>MM> (under '75.00') but not the confidence interval.  Does this
>MM> problem have something to do with the use of Greenwood's formula
>MM> in calculating the standard error for this percentile?
>
>I think that the formula needs to have survival values on both sides
>of the percentile.
>
>MM> In fact, what formula does SPSS use to obtain the standard error?
>
>You can get that info from the installation CD, folder "Algorithms"
>file KM.pdf (if you don't have access to those files you can also
>download them from:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/Statistics/algorithms/index.html
>
>If the link is broken across two lines, copy and paste the pieces together.
>If requested to log in, use "Guest" as both the user name and password.
>
>
>--
>Regards,
>Dr. Marta García-Granero,PhD           mailto:[hidden email]
>Statistician
>
>---
>"It is unwise to use a statistical procedure whose use one does
>not understand. SPSS syntax guide cannot supply this knowledge, and it
>is certainly no substitute for the basic understanding of statistics
>and statistical thinking that is essential for the wise choice of
>methods and the correct interpretation of their results".
>
>(Adapted from the WinPepi manual - I'm sure Joe Abramson will not mind)
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 14:04:25 -0300
>From:    Hector Maletta <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>You are welcome. A small correction to the first point of my message, which
>may seem obvious to many, but worth clarifying anyway. Added words are
>capitalized:
>1. Normal sampling distribution: The sample is a random sample, and
>therefore differences between THE MEANS OF all possible samples OF THE SAME
>POPULATION follow a normal distribution whose mean tends to coincide with
>the population mean as sample size increases. As a result, normal
>significance tests apply.
>Hector
>
>-----Mensaje original-----
>De: Jatender Mohal [mailto:[hidden email]]
>Enviado el: Friday, August 25, 2006 1:58 PM
>Para: 'Hector Maletta'
>Asunto: RE: Interpretation from transformed variable in ANOVA
>
>Hector,
>
>Thanks for your suggestions
>
>Kind regards
>Mohal
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Hector Maletta
>Sent: 25 August 2006 16:25
>To: [hidden email]
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>The matter has been discussed many times on this list. A normal Gaussian
>frequency distribution of the dependent variable is NOT a requirement of the
>General Linear Model in its various incarnations such as ANOVA or Linear
>Regression. Where normality enters the scene is in two places:
>1. Normal sampling distribution: The sample is a random sample, and
>therefore differences between all possible samples follow a normal
>distribution, whose mean tends to coincide with the population mean as
>sample size increases. As a result, normal significance tests apply.
>2. Normal distribution of residuals: In a regression equation like y=a+bx+e,
>errors or residuals "e" for each value of X are normally distributed about
>the Y value predicted by the regression equation, with zero mean. Therefore,
>the least squares algorithm applies.
>Now, even if not absolutely forbidden, a variable whose distribution is
>extremely skewed may nonetheless have a very high variance, and the sample
>size required to obtain a given level of standard error or a given level of
>significance will be correspondingly larger. Also, if the variable
>distribution is extremely skewed, some extreme values may have a
>disproportionate influence on the results; and the situation is also likely
>to be accompanied by heteroskedasticity (i.e. the variance of residuals may
>be different for different parts of the variable range). Notice, however,
>that non-normality of the variable is neither a necessary nor a sufficient
>cause for heteroskedasticity. The latter can be present in the tail of a
>normal curve, or absent in the tail of a skewed curve. Also, notice that the
>literature tends to suggest that moderate heteroskedasticity is tolerable, in
>the sense of not causing immoderate damage to the quality of results
>obtained by regression or ANOVA.
>Hector
>
>-----Mensaje original-----
>De: Jatender Mohal [mailto:[hidden email]]
>Enviado el: Friday, August 25, 2006 11:44 AM
>Para: 'Hector Maletta'
>CC: [hidden email]
>Asunto: RE: Interpretation from transformed variable in ANOVA
>
>Hello Hector,
>
>Thanks for the signal!
>
>Still, a few things on my mind...
>Screening a continuous variable for the normality assumption (univariate or
>multivariate) is an important early step in inferential statistics. If the
>data are not normal, the possible solutions are (1) nonparametric tests or
>(2) a suitable transformation of the non-normal data.
>Nonparametric tests are fine, but why is the solution considered degraded
>when the data are forced toward normality by a suitable transformation to
>satisfy the model's assumptions?
>Or is this largely a subjective issue, since most inferential statistics are
>robust to departures from their assumptions?
>
>Kind regards
>
>Mohal
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Hector Maletta
>Sent: 25 August 2006 15:04
>To: [hidden email]
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>You shouldn't seek an easy applause
>By cheating on assumptions.
>You shouldn't try to force a Gauss
>With such daring presumption.
>
>However, if you do, the interpretation is on the transformed data. For
>instance, if you worked on the logarithm of the original variable, any
>conclusion from ANOVA refers to the logarithm, not to the original variable.
>
>Hector
>
>
>(Nor should you criticize this poem for failing to have a proper rhyme).
>
>-----Mensaje original-----
>De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de
>Jatender Mohal
>Enviado el: Friday, August 25, 2006 9:53 AM
>Para: [hidden email]
>Asunto: Interpretation from transformed variable in ANOVA
>
>Hi list,
>
>While working on GLM univariate, the skewed dependent variable was
>transformed to follow a normal distribution.
>
>While interpreting the descriptive statistics estimated from the model, do I
>need to consider the values after transformation? If so, how might it affect
>the interpretation?
>
>Thanks in Advance
>
>
>
>Mohal
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 13:34:27 -0500
>From:    Gary Rosin <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>I could easily do a logit transformation of the y proportions
>and then do a (C)NLR regression using the MLE from
>spssbase.pdf.
>
>I have some questions/concerns, though:
>
>      1.  Regressing on logit(y) = ln(p/(1-p)) minimizes the residuals of
>the transformed variable, rather than the residuals of the original
>variable.
>
>      2.  Does anyone have the macro/syntax for the log-likelihood
>function or for the partial derivatives?
>
>Gary
>
>
>Marta García-Granero <[hidden email]> wrote:
> >The formula for the log-likelihood function and MLE estimates can be found
> >here:
> >
> >http://support.spss.com/Tech/Products/SPSS/Documentation/Statistics/algorithms/14.0/logistic_regression.pdf
> >
> >Alternatively, look in the installation CD, folder "Algorithms"
> >file "logistic_regression.pdf".
> >
> > >>I'm trying to use nonlinear regression to fit a regression using the
> > >>logit transformation (0<y<1):
> > >>
> > >>          y = 1/(1+exp(-(b0+b1*x1+b2*x2)))
> > >>
> > >>I'm using SPSS 14.0.2, and need a macro/syntax that will let me use
> > >>log-likelihood as the loss function.  The data is in the form of cases
> > >>with y(i), x1(i), and x2(i).
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 21:19:14 +0200
>From:    Marta García-Granero
><[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>Hi again Gary
>
>Some afterthoughts
>
>1) Take a look at this document:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/13.0/SPSS%20Regression%20Models%2013.0.pdf
>
>There's a chapter dedicated to NLR.
>
>(if requested to login, use "guest" as user and password).
>
>It's in a QUITE well hidden page (just positive criticism, spss-dot-com
>people) on SPSS's extensive website (I found it just by chance) that
>deserves a visit:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/index.html
>
>
>2) The fact that the data are grouped doesn't mean you can't use
>logistic regression, provided you have the sample sizes for each group
>"i".
>
>I can give you more details tomorrow, if you are interested.
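>
>One common way to do that (untested sketch; y is the observed proportion, n
>a hypothetical group-size variable, x1 and x2 the predictors, and
>'expanded.sav' an arbitrary file name) is to expand each group into a
>"passed" and a "failed" record, weight by the counts, and run ordinary
>logistic regression:
>
>LOOP #k = 1 TO 2.
>DO IF (#k = 1).
>COMPUTE pass = 1.
>COMPUTE wt = n * y.
>ELSE.
>COMPUTE pass = 0.
>COMPUTE wt = n * (1 - y).
>END IF.
>* XSAVE writes one case per loop pass: a 'passed' and a 'failed' record.
>XSAVE OUTFILE = 'expanded.sav' /KEEP = pass wt x1 x2.
>END LOOP.
>EXECUTE.
>GET FILE = 'expanded.sav'.
>WEIGHT BY wt.
>LOGISTIC REGRESSION pass
>  /METHOD = ENTER x1 x2 .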
>
>
>Regards
>
>Marta
>
>GR> So it is, if you are using individual data.  I have grouped data,
>GR> where
>
>GR>      y(i) = the proportion of group i that "passed"
>GR>    x1(i) = the mean of predictor x1 for group i
>GR>    x2(i) = predictor x2 for group i
>
>GR> I used probit/logit to get a model, but the statistics supplied
>GR> with that are skimpy.  I want to use the parameters from the
>GR> probit/logit model as the initial parameters for a weighted
>GR> (C)NLR.  I tried using the default least squares regressions
>GR> (all 4 of them), but the resulting parameters varied somewhat
>GR> from those of the probit/logit model.  I wondered what would
>GR> happen if instead I used MLEs as the loss function.
>
>GR> I could disaggregate the data into individual cases--I think I
>GR> recently saw a macro for that--but I want to stretch, and to get
>GR> familiar with implementing MLEs in (C)NLR.
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 15:06:58 -0500
>From:    Anthony Babinec <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>I do not have a worked example and don't
>know if I have the time to work one up.
>
>You need to use a MODEL PROGRAM block before
>CNLR; you can optionally specify the derivatives in
>a DERIVATIVES block, but SPSS has an ability
>to use numerical derivatives if you don't
>have or are unable to derive them; and you
>must specify the loss function on the /LOSS
>subcommand of CNLR.
>
>The need for these is hinted at in the syntax
>discussion for NLR and CNLR. The question seems
>to be: what is the loss function for aggregate
>logistic regression, and what are the associated
>derivatives?
>
>Here's a thought: use SPSS to disaggregate the
>data, and then find the loss function and derivatives
>in textbook sources such as Greene's Econometric Analysis.
>
>It would be nice to have a general ML estimation engine
>in SPSS, but I don't know if there's another place
>to look beside CNLR.
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Marta García-Granero
>Sent: Friday, August 25, 2006 2:19 PM
>To: [hidden email]
>Subject: Re: (C)NLR with log-likelihood loss function
>
>Hi again Gary
>
>Some afterthoughts
>
>1) Take a look at this document:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/13.0/SPSS%20Regression%20Models%2013.0.pdf
>
>There's a chapter dedicated to NLR.
>
>(if requested to login, use "guest" as user and password).
>
>It's in a QUITE well hidden page (just positive criticism, spss-dot-com
>people) on SPSS's extensive website (I found it just by chance) that
>deserves a visit:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/index.html
>
>
>2) The fact that the data are grouped doesn't mean you can't use
>logistic regression, provided you have the sample sizes for each group
>"i".
>
>I can give you more details tomorrow, if you are interested.
>
>
>Regards
>
>Marta
>
>GR> So it is, if you are using individual data.  I have grouped data,
>GR> where
>
>GR>      y(i) = the proportion of group i that "passed"
>GR>    x1(i) = the mean of predictor x1 for group i
>GR>    x2(i) = predictor x2 for group i
>
>GR> I used probit/logit to get a model, but the statistics supplied
>GR> with that are skimpy.  I want to use the parameters from the
>GR> probit/logit model as the initial parameters for a weighted
>GR> (C)NLR.  I tried using the default least squares regressions
>GR> (all 4 of them), but the resulting parameters varied somewhat
>GR> from those of the probit/logit model.  I wondered what would
>GR> happen if instead I used MLEs as the loss function.
>
>GR> I could disaggregate the data into individual cases--I think I
>GR> recently saw a macro for that--but I want to stretch, and to get
>GR> familiar with implementing MLEs in (C)NLR.
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 16:20:32 -0500
>From:    Gary Rosin <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>   Anthony Babinec <[hidden email]> wrote:
>
> >The need for these is hinted at in the syntax
> >discussion for NLR and CNLR. The question seems
> >to be: what is the loss function for aggregate
> >logistic regression, and what are the associated
> >derivatives?
>
>You can get the algorithm off the "algorithm" link
>at the bottom of the logistic regression help page.
>That gives both the log likelihood function,
>
>Sum(i=1 to n) [w(i)*y(i)*ln(prob(i)) + w(i)*(1-y(i))*ln(1-Prob(i))]
>
>and the partial derivatives for the B(j) parameters:
>
>Sum(i=1 to n) [w(i)*(y(i)-Prob(i))*x(i,j)]
>
>In both, w(i)'s are case weights, y(i)'s are observed proportions,
>prob(i)'s are the fitted proportions, and x(i,j)'s are the values of
>the predictors for the cases.
>
>The question is how to work the macro/syntax.
>
>Gary
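>
>A minimal sketch of how the pieces might fit together (untested; it assumes
>the grouped file has y = observed proportion, x1, x2, and a hypothetical
>group-size variable n; starting values would ideally come from the earlier
>probit/logit fit):
>
>MODEL PROGRAM B0=0 B1=0 B2=0.
>COMPUTE PRED_ = 1 / (1 + EXP(-(B0 + B1*x1 + B2*x2))).
>* Negative binomial log-likelihood contribution of each group, weighted by
>* its size; CNLR minimizes the sum of LOSS_ over cases. (Groups with y
>* exactly 0 or 1 need care, since LN(0) is undefined.)
>COMPUTE LOSS_ = -n * (y * LN(PRED_) + (1 - y) * LN(1 - PRED_)).
>CNLR y
>  /LOSS = LOSS_ .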
>
>
> >-----Original Message-----
> >From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> >Marta García-Granero
> >Sent: Friday, August 25, 2006 2:19 PM
> >To: [hidden email]
> >Subject: Re: (C)NLR with log-likelihood loss function
> >
> >Hi again Gary
> >
> >Some afterthoughts
> >
> >1) Take a look at this document:
> >
> >http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/13.0/SPSS%20Regression%20Models%2013.0.pdf
> >
> >There's a chapter dedicated to NLR.
> >
> >(if requested to login, use "guest" as user and password).
> >
> >It's in a QUITE well hidden page (just positive criticism, spss-dot-com
> >people) on SPSS's extensive website (I found it just by chance) that
> >deserves a visit:
> >
> >http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/index.html
> >
> >
> >2) The fact that the data are grouped doesn't mean you can't use
> >logistic regression, provided you have the sample sizes for each group
> >"i".
> >
> >I can give you more details tomorrow, if you are interested.
> >
> >
> >Regards
> >
> >Marta
> >
> >GR> So it is, if you are using individual data.  I have grouped data,
> >GR> where
> >
> >GR>      y(i) = the proportion of group i that "passed"
> >GR>    x1(i) = the mean of predictor x1 for group i
> >GR>    x2(i) = predictor x2 for group i
> >
> >GR> I used probit/logit to get a model, but the statistics supplied
> >GR> with that are skimpy.  I want to use the parameters from the
> >GR> probit/logit model as the initial parameters for a weighted
> >GR> (C)NLR.  I tried using the default least squares regressions
> >GR> (all 4 of them), but the resulting parameters varied somewhat
> >GR> from those of the probit/logit model.  I wondered what would
> >GR> happen if instead I used MLEs as the loss function.
> >
> >GR> I could disaggregate the data into individual cases--I think I
> >GR> recently saw a macro for that--but I want to stretch, and to get
> >GR> familiar with implementing MLEs in (C)NLR.
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 19:41:37 -0300
>From:    Hector Maletta <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>Or more exactly:
>1. Normal sampling distribution: The sample is a random sample, and
>therefore THE MEANS OF all possible samples OF THE SAME POPULATION follow a
>normal distribution whose mean tends to coincide with the population mean as
>sample size increases. As a result, normal significance tests apply.
>Hector
>
>-----Mensaje original-----
>De: Hector Maletta [mailto:[hidden email]]
>Enviado el: Friday, August 25, 2006 2:04 PM
>Para: 'Jatender Mohal'
>CC: '[hidden email]'
>Asunto: RE: Interpretation from transformed variable in ANOVA
>
>You are welcome. A small correction to the first point of my message, which
>may seem obvious to many, but worth clarifying anyway. Added words are
>capitalized:
>1. Normal sampling distribution: The sample is a random sample, and
>therefore differences between THE MEANS OF all possible samples OF THE SAME
>POPULATION follow a normal distribution whose mean tends to coincide with
>the population mean as sample size increases. As a result, normal
>significance tests apply.
>Hector
>
>-----Mensaje original-----
>De: Jatender Mohal [mailto:[hidden email]]
>Enviado el: Friday, August 25, 2006 1:58 PM
>Para: 'Hector Maletta'
>Asunto: RE: Interpretation from transformed variable in ANOVA
>
>Hector,
>
>Thanks for your suggestions
>
>Kind regards
>Mohal
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Hector Maletta
>Sent: 25 August 2006 16:25
>To: [hidden email]
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>The matter has been discussed many times on this list. A normal Gaussian
>frequency distribution of the dependent variable is NOT a requirement of the
>General Linear Model in its various incarnations such as ANOVA or Linear
>Regression. Where normality enters the scene is in two places:
>1. Normal sampling distribution: The sample is a random sample, and
>therefore differences between all possible samples follow a normal
>distribution, whose mean tends to coincide with the population mean as
>sample size increases. As a result, normal significance tests apply.
>2. Normal distribution of residuals: In a regression equation like y=a+bx+e,
>errors or residuals "e" for each value of X are normally distributed about
>the Y value predicted by the regression equation, with zero mean. Therefore,
>the least squares algorithm applies.
>Now, even if not absolutely forbidden, a variable whose distribution is
>extremely skewed may nonetheless have a very high variance, and the sample
>size required to obtain a given level of standard error or a given level of
>significance will be correspondingly larger. Also, if the variable
>distribution is extremely skewed, some extreme values may have a
>disproportionate influence on the results; and the situation is also likely
>to be accompanied by heteroskedasticity (i.e. the variance of residuals may
>be different for different parts of the variable range). Notice, however,
>that non-normality of the variable is neither a necessary nor a sufficient
>cause for heteroskedasticity. The latter can be present in the tail of a
>normal curve, or absent in the tail of a skewed curve. Also, notice that the
>literature tends to suggest that moderate heteroskedasticity is tolerable, in
>the sense of not causing immoderate damage to the quality of results
>obtained by regression or ANOVA.
>Hector
>
>-----Mensaje original-----
>De: Jatender Mohal [mailto:[hidden email]]
>Enviado el: Friday, August 25, 2006 11:44 AM
>Para: 'Hector Maletta'
>CC: [hidden email]
>Asunto: RE: Interpretation from transformed variable in ANOVA
>
>Hello Hector,
>
>Thanks for the signal!
>
>Still, a few things on my mind...
>Screening a continuous variable for the normality assumption (univariate or
>multivariate) is an important early step in inferential statistics. If the
>data are not normal, the possible solutions are (1) nonparametric tests or
>(2) a suitable transformation of the non-normal data.
>Nonparametric tests are fine, but why is the solution considered degraded
>when the data are forced toward normality by a suitable transformation to
>satisfy the model's assumptions?
>Or is this largely a subjective issue, since most inferential statistics are
>robust to departures from their assumptions?
>
>Kind regards
>
>Mohal
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
>Hector Maletta
>Sent: 25 August 2006 15:04
>To: [hidden email]
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>You shouldn't seek an easy applause
>By cheating on assumptions.
>You shouldn't try to force a Gauss
>With such daring presumption.
>
>However, if you do, the interpretation is on the transformed data. For
>instance, if you worked on the logarithm of the original variable, any
>conclusion from ANOVA refers to the logarithm, not to the original variable.
>
>Hector
>
>
>(Nor should you criticize this poem for failing to have a proper rhyme).
>
>-----Mensaje original-----
>De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de
>Jatender Mohal
>Enviado el: Friday, August 25, 2006 9:53 AM
>Para: [hidden email]
>Asunto: Interpretation from transformed variable in ANOVA
>
>Hi list,
>
>While working on GLM univariate, the skewed dependent variable was
>transformed to follow a normal distribution.
>
>While interpreting the descriptive statistics estimated from the model, do I
>need to consider the values after transformation? If so, how might it affect
>the interpretation?
>
>Thanks in Advance
>
>
>
>Mohal
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 09:08:36 -0400
>From:    "Woodward, Charlotte" <[hidden email]>
>Subject: Re: Syntax to search cases?
>
>Lise,
>Sort your file on var1 (ascending).
>
>Then run this syntax:
>
>AGGREGATE
>   /OUTFILE='tempfilename.sav'
>   /BREAK=var1
>   /var4 = FIRST(var2).
>
>This will create a temporary unduplicated file with student and score.
>Then sort your file by var3 (ascending) and run this syntax:
>
>* The lookup key is renamed so the table matches on the friend's ID (var3).
>MATCH FILES /FILE=*
>   /TABLE='tempfilename.sav'
>   /RENAME (var1 = var3)
>   /BY var3.
>EXECUTE.
>
>This will add the variable var4 to your data file.  This is the variable
>that will be the friend's score.
>
>Probably a long way around, but it should get the job done.
>
>
>Charlotte Woodward
>Data Coordinator/Analyst
>Office of Planning and IR
>[hidden email]
>
>------------------------------
>
>Date:    Fri, 25 Aug 2006 10:04:27 -0400
>From:    Ken Chui <[hidden email]>
>Subject: Re: Basic Stat Question
>
>If you go to Transform > Rank Cases, you can generate new rank variables by
>clicking over the variables you wish to rank.  There are a couple more buttons
>you can navigate through if you wish to specify a different ranking method
>and ways of dealing with ties.
>
>Hope this helps.
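>
>The equivalent syntax would be something like this (untested; var1 to var3
>are placeholder variable names):
>
>RANK VARIABLES = var1 var2 var3 (A)
>  /TIES = MEAN
>  /PRINT = NO .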
>
>On Fri, 25 Aug 2006 07:32:56 -0400, Smith, Brenda R. <[hidden email]> wrote:
>
> >Please help.  I have data that I have to compare which has several
> variables.
> >I need a way to rank them to tell which locality overall is number one based
> >on the variable.
> >I tried just ranking them in ascending order but each column ranks
> >differently.
> >
> >Brenda Smith
> >757.823.8751 (voice)
> >757.823.2057 (fax)
>
>------------------------------
>
>Date:    Sat, 26 Aug 2006 05:29:00 +0500
>From:    Fahim Jafary <[hidden email]>
>Subject: Adding numbers in columns - HELP please.
>
>I have a dataset which records ECG changes in variables v1 to v6 as
>positive and negative numbers as well as zeros (if there is no change).
>
>I want to be able to
>
>1.  Add the positive numbers from v1 to v6 to get a cumulative "score",
>ignoring the negative values (so, for values 1, 2, 1, -2, 1, -3   I should
>get a "score" of 5)
>2.  Add the positive AND negative numbers from v1 to v6 to get a second
>cumulative "score" but IGNORING the negative or positive sign (so for
>values 1, 2, 1, -2, 1, -3  I should get a "score" of 10)
>3.  Add the negative numbers from v1 to v6 to get a third "score" but
>IGNORING the positive values (so for values 1, 2, 1, -2, 1, -3  I should
>get a score of 5).
>
>Can someone help me with the syntax on this one.  I'll be very grateful
>
>Fahim H. Jafary
>Aga Khan University Hospital
>Karachi, Pakistan
>
>------------------------------
>
>Date:    Sat, 26 Aug 2006 11:26:18 +1000
>From:    Jason Burke <[hidden email]>
>Subject: Re: Adding numbers in columns - HELP please.
>
>Hi Fahim,
>
>This should achieve what you require:
>
>* Sum the positive values, the absolute values, and the negative values of
>* v1 to v6. Note that neg_score keeps its negative sign; wrap it in ABS()
>* if a positive total is wanted.
>VECTOR var = v1 TO v6 .
>LOOP #n = 1 TO 6 .
>IF (var(#n) GE 0) pos_score = SUM(pos_score, var(#n)) .
>COMPUTE abs_score = SUM(abs_score, ABS(var(#n))) .
>IF (var(#n) LE 0) neg_score = SUM(neg_score, var(#n)) .
>END LOOP .
>EXECUTE .
>
>Cheers,
>
>
>Jason
>
>On 8/26/06, Fahim Jafary <[hidden email]> wrote:
> > I have a dataset which records ECG changes in variables v1 to v6 as
> > positive and negative numbers as well as zeros (if there is no change).
> >
> > I want to be able to
> >
> > 1.  Add the positive numbers from v1 to v6 to get a cumulative "score",
> > ignoring the negative values (so, for values 1, 2, 1, -2, 1, -3   I should
> > get a "score" of 5)
> > 2.  Add the positive AND negative numbers from v1 to v6 to get a second
> > cumulative "score" but IGNORING the negative or positive sign (so for
> > values 1, 2, 1, -2, 1, -3  I should get a "score" of 10)
> > 3.  Add the negative numbers from v1 to v6 to get a third "score" but
> > IGNORING the positive values (so for values 1, 2, 1, -2, 1, -3  I should
> > get a score of 5).
> >
> > Can someone help me with the syntax on this one.  I'll be very grateful
> >
> > Fahim H. Jafary
> > Aga Khan University Hospital
> > Karachi, Pakistan
> >
>
>------------------------------
>
>End of SPSSX-L Digest - 24 Aug 2006 to 25 Aug 2006 (#2006-235)
>**************************************************************