Jason, thank you so much. It worked like a charm.
Fahim

At 09:00 AM 8/26/2006, you wrote:

>There are 23 messages totalling 1353 lines in this issue.
>
>Topics of the day:
>
>  1. Performance Measures
>  2. Outlier detection
>  3. (C)NLR with log-likelihood loss function (6)
>  4. Basic Stat Question (3)
>  5. running ANOVA on binary data?
>  6. Interpretation from transformed variable in ANOVA (6)
>  7. Missing confidence interval for 25th percentile (2)
>  8. Syntax to search cases?
>  9. Adding numbers in columns - HELP please. (2)
>
>----------------------------------------------------------------------
>
>Date: Thu, 24 Aug 2006 21:26:18 -0700
>From: Michael Healy <[hidden email]>
>Subject: Re: Performance Measures
>
>Hi, Richard Ristow's summary of how to optimize SPSS commands was
>excellent, and I just wanted to add one additional point. The slowest
>part of an analysis is going to be reading and writing data files from
>the disk, so you should use 10k or 15k RPM disks (I'm not sure whether
>there really is much of an advantage to 15k disks when working with
>large files). An even better solution is to use a RAID configuration,
>in which two or more disks are combined into a single volume and data
>are written across the disks. You can create a RAID using the Disk
>Management tool in Windows XP. There are a number of different RAID
>types that can be used, but RAID 0 would be a good solution as long as
>you back up regularly. More expensive RAID software and hardware
>options are available that will offer better speed and data protection.
>--Mike
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 01:17:40 -0400
>From: Richard Ristow <[hidden email]>
>Subject: Re: Outlier detection
>
>To agree wholeheartedly, with a few notes:
>
>At 09:20 PM 8/23/2006, Statisticsdoc wrote:
>
> >Setting a cutoff related to standard deviations to detect outliers is
> >useful in many situations, but there are exceptions. For example,
> >standard deviations are more appropriate for normally distributed
> >data. With skewed data (e.g. income, days of sickness absence, etc.)
> >some valid values may be very large when measured in standard
> >deviations.
>
>Statistical folk wisdom in some circles is that you'll never see truly
>normally distributed data. One of the most common departures is far
>more extreme values than a normal distribution would have. This is
>often most marked at large multiples of the SD, where observed values
>can be rare and still occur at many times the frequency a normal
>distribution would predict.
>
> >You might want to consider setting cutoffs for outliers based on
> >plausibility (e.g., when dealing with high school students who claim
> >that they consume 100 beers on a daily basis). Or "wild values" -
> >numbers that do not make sense and can be considered out of range.
>
>This is the easy case: 'outliers' that can confidently be identified as
>data errors. Easy, in that there's no subtlety about the analysis. The
>correct handling is clear: identify them, and drop them from analysis.
>(Or, if possible, go back to the data source, and correct them.) But it
>IS the easy case. It disposes of a lot of apparent outliers, but still
>leaves you to deal with the real ones.
>
>One point that's only recently become clear to me is that rare,
>unusually large values can give a variance in parameter estimates far
>larger than statistical procedures will assign. That's because sampling
>will commonly include very few of them; the variance in the number
>sampled will be high; and their effect on the parameters, of course,
>very large.
>Worst case is the difference between having any of the largest values
>in your sample, and having none.
>
>Bootstrapping should probably be used to estimate parameter variances
>in these cases.
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 09:55:09 +0200
>From: Spousta Jan <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>Hi Gary,
>
>If I am not mistaken, this is just what the LOGISTIC REGRESSION command
>is for. Why not try it instead of torturing nonlinear regression?
>
>LOGISTIC REGRESSION y
>  /METHOD = ENTER x1 x2 .
>
>Greetings
>
>Jan
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gary Rosin
>Sent: Friday, August 25, 2006 4:55 AM
>To: [hidden email]
>Subject: (C)NLR with log-likelihood loss function
>
>I'm trying to use nonlinear regression to fit a regression using the
>logit transformation (0<y<1):
>
>  y = 1/(1+exp(-(b0+b1*x1+b2*x2)))
>
>I'm using SPSS 14.0.2, and need a macro/syntax that will let me use
>log-likelihood as the loss function. The data are in the form of cases
>with y(i), x1(i), and x2(i).
>
>Any suggestions?
>
>---
>Prof. Gary S. Rosin            [hidden email]
>South Texas College of Law
>1303 San Jacinto               Voice: (713) 646-1854
>Houston, TX 77002-7000         Fax:         646-1766
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 07:32:56 -0400
>From: "Smith, Brenda R." <[hidden email]>
>Subject: Basic Stat Question
>
>Please help. I have data that I have to compare which has several
>variables. I need a way to rank the localities to tell which locality
>overall is number one based on the variables. I tried just ranking them
>in ascending order, but each column ranks differently.
>
>Brenda Smith
>757.823.8751 (voice)
>757.823.2057 (fax)
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 07:14:29 -0500
>From: "Beadle, ViAnn" <[hidden email]>
>Subject: Re: Basic Stat Question
>
>If I understand your posting, this is a question about constructing a
>scale across multiple dimensions. That is, are you trying to tell which
>locality is number one based upon one variable or several variables? If
>it's one variable, then that is the only variable you rank. If it is
>based upon multiple variables, then you have to create a scale.
>
>When I see rankings like this in magazines (e.g., best places to raise
>a family), each ranker has a different way of doing it.
>
>Here's one approach.
>
>Convert all of your scalar variables to z-scores so that they have the
>same scale.
>
>Are all of your variables scalar, or do you have some that are nominal?
>Assign values to categories of your categorical variables to be
>included in the scale. Good categories get positive numbers and bad
>categories get negative numbers; then z-score them as well.
>
>Add the z-scored variables that are "good dimensions" (e.g., family
>income or your converted categories) and subtract the "bad dimensions"
>(e.g., crimes per 1000 population). This gives you a single number that
>you can then convert into a rank.
>
>If you think that some variables are more important than others, give
>them a higher weight in the scale by multiplying the z-score by a
>factor when adding it.
>
>Disclaimer: I'm no expert at this, but based upon rankings I've seen,
>this appears to be how some of them are done. Note that they might use
>a T-score, which has a mean of 50 and a standard deviation of 10, but
>the z-score is equivalent and easily produced by the DESCRIPTIVES
>procedure. Whatever you do, clearly describe your methodology when
>publishing your results to others.
>
>I hope you've been following the discussions about outliers on this
>list, because you'll want to check your data to make sure that any
>invalid data don't corrupt the rankings.
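
A minimal sketch of ViAnn's recipe in SPSS syntax. It is illustrative
only: the variable names income and crime and the doubled weight on
income are assumptions, not taken from the original posts.

DESCRIPTIVES VARIABLES=income crime /SAVE .
* /SAVE writes standardized copies of each variable as Zincome, Zcrime.
COMPUTE score = 2*Zincome - Zcrime .
* "Good" dimension added (here with weight 2), "bad" one subtracted.
RANK VARIABLES=score (D) .
* Descending rank, so the top locality gets rank 1 (saved as Rscore).
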
>
>------------------------------
>
>Date: Mon, 21 Aug 2006 11:27:10 -0700
>From: Dominic Lusinchi <[hidden email]>
>Subject: Re: running ANOVA on binary data?
>
>Without knowing more about your data... the most appropriate technique
>to use would be logistic regression, where the dependent variable can
>be binary and the independents can be a mix (categorical, continuous).
>
>Dominic Lusinchi
>Statistician
>Far West Research
>Statistical Consulting
>San Francisco, California
>415-664-3032
>www.farwestresearch.com
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Robinson Aschoff
>Sent: Monday, August 21, 2006 10:54 AM
>To: [hidden email]
>Subject: running ANOVA on binary data?
>
>Hello,
>
>I would like to ask if you are aware of any problems (violated
>assumptions) if you run an ANOVA on binary data (e.g. 0/1 coded for
>answers "no"/"yes" or "patient not infected"/"patient infected"). How
>severe are those violations? Would you consider running an ANOVA in
>this case "common practice" or not recommendable? Does anybody happen
>to know where this aspect is discussed in the literature?
>
>Thanks a lot. I really appreciate your help.
>
>Sincerely,
>Robinson Aschoff
>
>I hope this hasn't been asked a lot before. I didn't find it in the
>archive, though.
>----------------------------------------------------------------
>Felix-Robinson Aschoff
>Information Management Research Group
>Department of Informatics
>University of Zurich
>Binzmuehlestrasse 14
>CH-8050 Zurich, Switzerland
>
>E-Mail: [hidden email]
>Phone: +41 (0)44 635 6690
>Fax: +41 (0)44 635 6809
>Room: 2.D.11
>http://www.ifi.unizh.ch/im
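
A minimal sketch of Dominic's suggestion in SPSS syntax. The variable
names are assumptions for illustration: infected as the 0/1 outcome,
age as a continuous predictor, and clinic standing in for a categorical
predictor.

LOGISTIC REGRESSION VARIABLES infected
  /METHOD = ENTER age clinic
  /CATEGORICAL = clinic .
* /CATEGORICAL makes SPSS code clinic with contrasts
* instead of treating its category codes as numbers.
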
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 13:52:35 +0100
>From: Jatender Mohal <[hidden email]>
>Subject: Interpretation from transformed variable in ANOVA
>
>Hi list,
>
>While working on GLM univariate, I transformed a skewed dependent
>variable to follow a normal distribution.
>
>When interpreting the descriptives estimated from the model, do I need
>to consider the values after transformation? If so, how may it affect
>the interpretation?
>
>Thanks in advance
>
>Mohal
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 11:04:02 -0300
>From: Hector Maletta <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>You shouldn't seek an easy applause
>By cheating on assumptions.
>You shouldn't try to force a Gauss
>With such daring presumption.
>
>However, if you do, the interpretation is on the transformed data. For
>instance, if you worked on the logarithm of the original variable, any
>conclusion from ANOVA refers to the logarithm, not to the original
>variable.
>
>Hector
>
>(Nor should you criticize this poem for failing to have a proper rhyme.)
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 15:44:17 +0100
>From: Jatender Mohal <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>Hello Hector,
>
>Thanks for the signal!
>
>Still, things in mind... Screening a continuous variable for the
>normality assumption (univariate or multivariate) is an important early
>step in inferential statistics. If the data are not normal, the
>possible solutions are (1) nonparametric tests or (2) a suitable
>transformation of the non-normal data. Nonparametric tests, okay - but
>why is the solution degraded when the data are forced to normality by a
>suitable transformation to satisfy the model's assumptions? Is it a
>very subjective issue, in that most inferential statistics are robust
>to departures from the assumptions?
>
>Kind regards
>
>Mohal
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 10:08:13 -0500
>From: Gary Rosin <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>So it is, if you are using individual data. I have grouped data, where
>
>  y(i)  = the proportion of group i that "passed"
>  x1(i) = the mean of predictor x1 for group i
>  x2(i) = predictor x2 for group i
>
>I used probit/logit to get a model, but the statistics supplied with
>that are skimpy. I want to use the parameters from the probit/logit
>model as the initial parameters for a weighted (C)NLR. I tried using
>the default least-squares regressions (all 4 of them), but the
>resulting parameters varied somewhat from those of the probit/logit
>model. I wondered what would happen if instead I used MLEs as the loss
>function.
>
>I could disaggregate the data into individual cases--I think I recently
>saw a macro for that--but I want to stretch, and to get familiar with
>implementing MLEs in (C)NLR.
>
>Gary
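
Marta and Anthony both point, further down, to disaggregating grouped
data so that ordinary LOGISTIC REGRESSION applies. A minimal sketch of
that route, assuming a group-size variable n that the posts don't show
(illustrative, not Gary's actual setup):

* Expand each group into a "passed" and a "failed" record,
* weighted by the corresponding counts.
COMPUTE npass = RND(n*y) .
COMPUTE nfail = n - npass .
VARSTOCASES /MAKE freq FROM npass nfail /INDEX = outcome(2) .
* outcome is 1 for the npass record and 2 for the nfail record.
COMPUTE pass = (outcome = 1) .
SELECT IF (freq > 0) .
WEIGHT BY freq .
LOGISTIC REGRESSION VARIABLES pass
  /METHOD = ENTER x1 x2 .
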
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 12:24:59 -0300
>From: Hector Maletta <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>The matter has been discussed many times on this list. A normal
>Gaussian frequency distribution of the dependent variable is NOT a
>requirement of the Generalized Linear Model in its various incarnations
>such as ANOVA or linear regression. Where normality enters the scene is
>in two places:
>
>1. Normal sampling distribution: The sample is a random sample, and
>therefore differences between all possible samples follow a normal
>distribution, whose mean tends to coincide with the population mean as
>sample size increases. As a result, normal significance tests apply.
>
>2. Normal distribution of residuals: In a regression equation like
>y = a + bx + e, the errors or residuals "e" for each value of x are
>normally distributed about the y value predicted by the regression
>equation, with zero mean. Therefore, the least-squares algorithm
>applies.
>
>Now, even if not absolutely forbidden, a variable whose distribution is
>extremely skewed may nonetheless have a very high variance, and the
>sample size required to obtain a given level of standard error or a
>given level of significance will be correspondingly larger. Also, if
>the variable's distribution is extremely skewed, some extreme values
>may have a disproportionate influence on the results, and the situation
>is also likely to be accompanied by heteroskedasticity (i.e., the
>variance of the residuals may be different in different parts of the
>variable's range). Notice, however, that non-normality of the variable
>is neither a necessary nor a sufficient cause of heteroskedasticity.
>The latter can be present in the tail of a normal curve, or absent in
>the tail of a skewed curve. Also, notice that the literature tends to
>suggest that moderate heteroskedasticity is tolerable, in the sense of
>not causing immoderate damage to the quality of results obtained by
>regression or ANOVA.
>
>Hector
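
Point 2 can be checked directly by saving the residuals and examining
them. A minimal sketch, with y and x as placeholder variable names:

REGRESSION
  /DEPENDENT y
  /METHOD = ENTER x
  /SAVE RESID(res_y) .
* Normal probability plot (and normality tests) of the saved residuals.
EXAMINE VARIABLES = res_y
  /PLOT NPPLOT .
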
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 16:49:52 +0100
>From: Margaret MacDougall <[hidden email]>
>Subject: Missing confidence interval for 25th percentile
>
>Dear all,
>
>I would be most grateful for some explanation as to why, on some
>occasions, the output for a Kaplan-Meier analysis can list the value
>for the 25th percentile of the survival-time distribution (under
>'75.00') but not the confidence interval. Does this problem have
>something to do with the use of Greenwood's formula in calculating the
>standard error for this percentile? In fact, what formula does SPSS use
>to obtain the standard error?
>
>Many thanks for your input.
>
>Best wishes
>
>Margaret
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 18:45:55 +0200
>From: Marta García-Granero <[hidden email]>
>Subject: Re: Missing confidence interval for 25th percentile
>
>Hi Margaret
>
>MM> I would be most grateful for some explanation as to why on
>MM> some occasions the output for a Kaplan-Meier analysis can list the
>MM> value for the 25th percentile of the survival time distribution
>MM> (under '75.00') but not the confidence interval. Does this
>MM> problem have something to do with the use of Greenwood's formula
>MM> in calculating the standard error for this percentile?
>
>I think that the formula needs to have survival values on both sides
>of the percentile.
>
>MM> In fact, what formula does SPSS use to obtain the standard error?
>
>You can get that info from the installation CD, folder "Algorithms",
>file KM.pdf. If you don't have access to those files, you can also
>download them from:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/Statistics/algorithms/index.html
>
>(If the link is broken across two lines, copy and paste the pieces
>together. If requested to log in, use "Guest" both as user and
>password.)
>
>--
>Regards,
>Dr. Marta García-Granero, PhD        mailto:[hidden email]
>Statistician
>
>---
>"It is unwise to use a statistical procedure whose use one does not
>understand. The SPSS syntax guide cannot supply this knowledge, and it
>is certainly no substitute for the basic understanding of statistics
>and statistical thinking that is essential for the wise choice of
>methods and the correct interpretation of their results."
>
>(Adapted from the WinPepi manual - I'm sure Joe Abramson will not mind)
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 14:04:25 -0300
>From: Hector Maletta <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>You are welcome. A small correction to the first point of my message,
>which may seem obvious to many, but is worth clarifying anyway. Added
>words are capitalized:
>
>1. Normal sampling distribution: The sample is a random sample, and
>therefore differences between THE MEANS OF all possible samples OF THE
>SAME POPULATION follow a normal distribution whose mean tends to
>coincide with the population mean as sample size increases. As a
>result, normal significance tests apply.
>
>Hector
>
>-----Mensaje original-----
>De: Jatender Mohal [mailto:[hidden email]]
>Enviado el: Friday, August 25, 2006 1:58 PM
>Para: 'Hector Maletta'
>Asunto: RE: Interpretation from transformed variable in ANOVA
>
>Hector,
>
>Thanks for your suggestions
>
>Kind regards
>Mohal
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 13:34:27 -0500
>From: Gary Rosin <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>I could easily do a logit transformation of the y proportions and then
>do a (C)NLR regression using the MLE formulas from spssbase.pdf.
>
>I have some questions/concerns, though:
>
>  1. Regressing on logit(y) = ln[y/(1-y)] minimizes the residuals of
>the transformed variable, rather than the residuals of the original
>variable.
>
>  2. Does anyone have the macro/syntax for the log-likelihood function
>or for the partial derivatives?
>
>Gary
>
>Marta García-Granero <[hidden email]> wrote:
> >The formula for the log-likelihood function and MLE estimates can be
> >found here:
> >
> >http://support.spss.com/Tech/Products/SPSS/Documentation/Statistics/algorithms/14.0/logistic_regression.pdf
> >
> >Alternatively, look in the installation CD, folder "Algorithms",
> >file "logistic_regression.pdf".
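
A minimal sketch of what that might look like in CNLR, following the
manual's pattern of computing both the prediction and the loss in the
MODEL PROGRAM block (check the NLR/CNLR syntax chapter for the exact
rules). The case-weight variable w and the zero starting values are
assumptions; Gary would substitute his probit/logit estimates as
starting values.

MODEL PROGRAM b0=0 b1=0 b2=0 .
COMPUTE pred_ = 1 / (1 + EXP(-(b0 + b1*x1 + b2*x2))) .
* Negative log-likelihood contribution of one case;
* CNLR minimizes the sum of loss_ over cases.
COMPUTE loss_ = -(w * (y*LN(pred_) + (1 - y)*LN(1 - pred_))) .
CNLR y
  /PRED = pred_
  /LOSS = loss_ .
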
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 21:19:14 +0200
>From: Marta García-Granero <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>Hi again Gary
>
>Some afterthoughts:
>
>1) Take a look at this document:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/13.0/SPSS%20Regression%20Models%2013.0.pdf
>
>There's a chapter dedicated to NLR. (If requested to log in, use
>"guest" as user and password.)
>
>It's on a QUITE well hidden page (just positive criticism,
>spss-dot-com people) of SPSS's extensive website (I found it only by
>chance) that deserves a visit:
>
>http://support.spss.com/Tech/Products/SPSS/Documentation/SPSSforWindows/index.html
>
>2) The fact that the data are grouped doesn't mean you can't use
>logistic regression, provided you have the sample sizes for each group
>"i".
>
>I can give you more details tomorrow, if you are interested.
>
>Regards
>
>Marta
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 15:06:58 -0500
>From: Anthony Babinec <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>I do not have a worked example and don't know if I have the time to
>work one up.
>
>You need to use a MODEL PROGRAM block before CNLR; you can optionally
>specify the derivatives in a DERIVATIVES block, though SPSS is able to
>use numerical derivatives if you don't have, or are unable to derive,
>them; and you must specify the loss function on the /LOSS subcommand
>of CNLR.
>
>The need for these is hinted at in the syntax discussion for NLR and
>CNLR. The question seems to be: what is the loss function for
>aggregate logistic regression, and what are the associated
>derivatives?
>
>Here's a thought: use SPSS to disaggregate the data, and then find the
>loss function and derivatives in textbook sources such as Greene's
>Econometric Analysis.
>
>It would be nice to have a general ML estimation engine in SPSS, but I
>don't know if there's another place to look besides CNLR.
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 16:20:32 -0500
>From: Gary Rosin <[hidden email]>
>Subject: Re: (C)NLR with log-likelihood loss function
>
>Anthony Babinec <[hidden email]> wrote:
>
> >The need for these is hinted at in the syntax discussion for NLR and
> >CNLR. The question seems to be: what is the loss function for
> >aggregate logistic regression, and what are the associated
> >derivatives?
>
>You can get the algorithm from the "algorithm" link at the bottom of
>the logistic regression help page. That gives both the log-likelihood
>function,
>
>  Sum(i=1 to n) [ w(i)*y(i)*ln(Prob(i)) + w(i)*(1-y(i))*ln(1-Prob(i)) ]
>
>and the partial derivatives with respect to the B(j) parameters:
>
>  Sum(i=1 to n) [ w(i)*(y(i)-Prob(i))*x(i,j) ]
>
>In both, the w(i) are case weights, the y(i) are observed proportions,
>the Prob(i) are the fitted proportions, and the x(i,j) are the values
>of the predictors for the cases.
>
>The question is how to work the macro/syntax.
>
>Gary
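
Wiring those formulas into the earlier sketch, a DERIVATIVES block can
carry the analytic derivatives. Note that the DERIVATIVES block
supplies derivatives of PRED_ with respect to each parameter, not of
the log-likelihood itself; whether CNLR exploits them when a custom
/LOSS is given is a question for the NLR/CNLR chapter, and numerical
derivatives are the fallback Anthony mentioned. Variable names as
before are assumptions.

MODEL PROGRAM b0=0 b1=0 b2=0 .
COMPUTE pred_ = 1 / (1 + EXP(-(b0 + b1*x1 + b2*x2))) .
COMPUTE loss_ = -(w * (y*LN(pred_) + (1 - y)*LN(1 - pred_))) .
DERIVATIVES .
* For the logistic curve, d(pred)/d(b0) = pred*(1-pred), and each
* slope's derivative picks up its own predictor as an extra factor.
COMPUTE d.b0 = pred_ * (1 - pred_) .
COMPUTE d.b1 = pred_ * (1 - pred_) * x1 .
COMPUTE d.b2 = pred_ * (1 - pred_) * x2 .
CNLR y
  /PRED = pred_
  /LOSS = loss_ .
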
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 19:41:37 -0300
>From: Hector Maletta <[hidden email]>
>Subject: Re: Interpretation from transformed variable in ANOVA
>
>Or, more exactly:
>
>1. Normal sampling distribution: The sample is a random sample, and
>therefore THE MEANS OF all possible samples OF THE SAME POPULATION
>follow a normal distribution whose mean tends to coincide with the
>population mean as sample size increases. As a result, normal
>significance tests apply.
>
>Hector
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 09:08:36 -0400
>From: "Woodward, Charlotte" <[hidden email]>
>Subject: Re: Syntax to search cases?
>
>Lise,
>
>Sort your file on var1 (ascending). Then run this syntax:
>
>AGGREGATE
>  /OUTFILE='tempfilename.sav'
>  /BREAK=var1
>  /var4 = FIRST(var2).
>
>This will create a temporary unduplicated file with student and score.
>
>Then sort your file by var3 (ascending) and run this syntax:
>
>MATCH FILES /FILE=*
>  /TABLE='tempfilename.sav'
>  /RENAME (var1 = var3)
>  /BY var3.
>EXECUTE.
>
>(The lookup table is keyed by var1, so it is renamed here to match the
>lookup key var3 in the active file.) This will add the variable var4
>to your data file; var4 will be the friend's score.
>
>Probably a long way around, but it should get the job done.
>
>Charlotte Woodward
>Data Coordinator/Analyst
>Office of Planning and IR
>[hidden email]
>
>------------------------------
>
>Date: Fri, 25 Aug 2006 10:04:27 -0400
>From: Ken Chui <[hidden email]>
>Subject: Re: Basic Stat Question
>
>If you go to Transform > Rank Cases, you can generate new rank
>variables by clicking over the variables you wish to rank. There are a
>couple more buttons you can navigate through if you wish to specify a
>different ranking method and ways of dealing with ties.
>
>Hope this helps.
>
>------------------------------
>
>Date: Sat, 26 Aug 2006 05:29:00 +0500
>From: Fahim Jafary <[hidden email]>
>Subject: Adding numbers in columns - HELP please.
>
>I have a dataset which records ECG changes in variables v1 to v6 as
>positive and negative numbers, as well as zeros (if there is no
>change).
>
>I want to be able to:
>
>1. Add the positive numbers from v1 to v6 to get a cumulative "score",
>ignoring the negative values (so, for values 1, 2, 1, -2, 1, -3 I
>should get a "score" of 5).
>2. Add the positive AND negative numbers from v1 to v6 to get a second
>cumulative "score", but IGNORING the negative or positive sign (so for
>the same values I should get a "score" of 10).
>3. Add the negative numbers from v1 to v6 to get a third "score", but
>IGNORING the positive values (so for the same values I should get a
>"score" of 5).
>
>Can someone help me with the syntax on this one? I'll be very
>grateful.
>
>Fahim H. Jafary
>Aga Khan University Hospital
>Karachi, Pakistan
>
>------------------------------
>
>Date: Sat, 26 Aug 2006 11:26:18 +1000
>From: Jason Burke <[hidden email]>
>Subject: Re: Adding numbers in columns - HELP please.
>
>Hi Fahim,
>
>This should achieve what you require:
>
>* pos_score sums the positive values; abs_score sums absolute values.
>* neg_score sums the sizes of the negative values - the ABS() makes it
>  the positive "score" of 5 that you asked for.
>VECTOR var = v1 TO v6 .
>LOOP #n = 1 TO 6 .
>IF (var(#n) GE 0) pos_score = SUM(pos_score, var(#n)) .
>COMPUTE abs_score = SUM(abs_score, ABS(var(#n))) .
>IF (var(#n) LE 0) neg_score = SUM(neg_score, ABS(var(#n))) .
>END LOOP .
>EXECUTE .
>
>Cheers,
>
>Jason
>
>------------------------------
>
>End of SPSSX-L Digest - 24 Aug 2006 to 25 Aug 2006 (#2006-235)
>**************************************************************