Hi All,
Does anybody have an example of using the Poisson regression module for Python (downloadable from spss.com/devcentral) with example data? I'd like to have a look at what other people are doing in order to apply it to my work.

Thanks in advance,
Mike
|
If you're running 15.0, you might want to check out the new GENLIN procedure, which can fit Poisson regressions.
Alex

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Michael Pearmain
Sent: Tuesday, November 07, 2006 4:56 AM
To: [hidden email]
Subject: Poisson regression python module.
|
Hi everybody
RA> If you're running 15.0, you might want to check out the new
RA> GENLIN procedure, which can fit Poisson regressions.

Just in case anyone is interested, here is a worked example (from Campbell's "Statistics at Square Two") of Poisson regression solved using GENLIN (SPSS 15) or GENLOG (older SPSS versions). Results are consistent with the ones obtained with Stata (as shown in that book) and with POISSREG.exe (from the PEPI 4.0 freeware statistical package).

Sorry, but I'm not Pythoning yet (that will change soon).

Cheers,
Marta

* Example dataset (from MJ Campbell "Statistics at Square Two", BMJ books) *.
DATA LIST list / id(f2.0) agegroup(f8.0) smoker(f1.0) pyears(f8.0) deaths(f4.0).
BEGIN DATA
1 0 0 18790 2
2 1 0 10673 12
3 2 0 5712 28
4 3 0 2585 28
5 4 0 1462 31
6 0 1 52407 32
7 1 1 43248 104
8 2 1 28612 206
9 3 1 12663 186
10 4 1 5317 102
END DATA.
DOCUMENT 'Coronary deaths from British male doctors. Doll & Hill (Nat Cancer Inst Monog 1966; 19:205-68)'.
VARIABLE LABELS agegroup "Age group".
VALUE LABELS agegroup
 0 "35-44 years"
 1 "45-54 years"
 2 "55-64 years"
 3 "65-74 years"
 4 "75-84 years".
VARIABLE LABELS smoker "Smoking status".
VALUE LABELS smoker
 0 "No"
 1 "Yes".

* Using SPSS 15 - GENLIN *.
COMPUTE logpyears=LN(pyears).
GENLIN deaths BY agegroup smoker (ORDER=DESCENDING)
 /MODEL agegroup smoker INTERCEPT=YES OFFSET=logpyears
  DISTRIBUTION=POISSON LINK=LOG.

* Older SPSS: use GENLOG *.
* GENLOG uses the last group as reference group: agegroup needs recoding *.
RECODE agegroup (0=5) .
ADD VALUE LABELS agegroup 0 "" 5 "35-44 years".
RECODE smoker (0=2) .
ADD VALUE LABELS smoker 0 "" 2 "No".
FREQUENCIES VARIABLES=agegroup smoker
 /ORDER VARIABLES .

* Statistical analysis *.
GENLOG agegroup smoker
 /CSTRUCTURE=pyears
 /MODEL=POISSON
 /PRINT FREQ RESID ESTIM
 /PLOT NONE
 /CRITERIA =DELTA(0)
 /DESIGN agegroup smoker .
|
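For readers who want to see what the fit above is doing, here is a minimal sketch in plain Python of a Poisson regression with a log link and log(person-years) as offset, fitted by Newton-Raphson on the same ten rows. It is not the SPSS poisson_regression module and not GENLIN; the names design_row, solve and fit_poisson are made up for this illustration, and the reference categories (35-44 years, non-smoker) are an assumption chosen to match the parameter names that appear in the thread's output.

```python
import math

# Data as listed in the thread: (agegroup, smoker, person-years, deaths)
ROWS = [
    (0, 0, 18790, 2), (1, 0, 10673, 12), (2, 0, 5712, 28),
    (3, 0, 2585, 28), (4, 0, 1462, 31), (0, 1, 52407, 32),
    (1, 1, 43248, 104), (2, 1, 28612, 206), (3, 1, 12663, 186),
    (4, 1, 5317, 102),
]


def design_row(agegroup, smoker):
    """Intercept, four age dummies (reference: 35-44 years), smoker dummy."""
    age_dummies = [1.0 if agegroup == g else 0.0 for g in (1, 2, 3, 4)]
    return [1.0] + age_dummies + [float(smoker)]


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def fit_poisson(rows, iterations=50, tol=1e-10):
    """Newton-Raphson for log(mu) = log(pyears) + X * beta."""
    xs = [design_row(age, smk) for age, smk, _, _ in rows]
    offsets = [math.log(py) for _, _, py, _ in rows]
    ys = [deaths for _, _, _, deaths in rows]
    p = len(xs[0])
    beta = [0.0] * p
    # Start the intercept at the log of the overall death rate
    beta[0] = math.log(sum(ys) / sum(py for _, _, py, _ in rows))
    for _ in range(iterations):
        mus = [math.exp(off + sum(b * x for b, x in zip(beta, row)))
               for off, row in zip(offsets, xs)]
        # Score vector X'(y - mu) and information matrix X' diag(mu) X
        score = [sum(row[j] * (y - mu) for row, y, mu in zip(xs, ys, mus))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * mu for row, mu in zip(xs, mus))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


coefficients = fit_poisson(ROWS)
names = ["Intercept", "age 45-54", "age 55-64", "age 65-74", "age 75-84", "smoker"]
for name, value in zip(names, coefficients):
    print(f"{name:10s} {value: .3f}")
```

If the sketch is right, the printed coefficients should be close to the estimates reported elsewhere in this thread for the same data and reference categories (for example, about -7.919 for the intercept and .355 for smoker). Standard errors are not computed here.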
With a little care to make sure that the same reference categories are used, the Python poisson_regression module delivers parameter estimates identical to GENLIN. See the Draft Viewer output below.
* Example dataset (from MJ Campbell "Statistics at Square Two", BMJ books) *.
DATA LIST list / id(f2.0) agegroup(f8.0) smoker(f1.0) pyears(f8.0) deaths(f4.0).
BEGIN DATA
1 0 0 18790 2
2 1 0 10673 12
3 2 0 5712 28
4 3 0 2585 28
5 4 0 1462 31
6 0 1 52407 32
7 1 1 43248 104
8 2 1 28612 206
9 3 1 12663 186
10 4 1 5317 102
END DATA.
DOCUMENT 'Coronary deaths from British male doctors. Doll & Hill (Nat Cancer Inst Monog 1966; 19:205-68)'.
VARIABLE LABELS agegroup "Age group".
VALUE LABELS agegroup
 0 "35-44 years"
 1 "45-54 years"
 2 "55-64 years"
 3 "65-74 years"
 4 "75-84 years".
VARIABLE LABELS smoker "Smoking status".
VALUE LABELS smoker
 0 "No"
 1 "Yes".
DATASET NAME campbell .

* GENLOG uses the last group as reference group: agegroup needs recoding *.
RECODE agegroup (0=5) .
ADD VALUE LABELS agegroup 0 "" 5 "35-44 years".
RECODE smoker (0=2) .
ADD VALUE LABELS smoker 0 "" 2 "No".

BEGIN PROGRAM.
from poisson_regression import *
#help(poisson_regression)
poisson_regression(dependent="deaths", factors=["agegroup", "smoker"], ratevar="pyears")
END PROGRAM.

************************************************************************.
* ... lots of output omitted ...
************************************************************************.

Parameter Estimates
|--------------|-------------------|--------|----------|-----------------------|-----------------------|
|              |                   |        |          |95% Confidence Interval|95% Trimmed Range      |
|              |Parameter          |Estimate|Std. Error|Lower Bound|Upper Bound|Lower Bound|Upper Bound|
|--------------|-------------------|--------|----------|-----------|-----------|-----------|-----------|
|Bootstrap(a,b)|Intercept          |-7.919  |.613      |-9.127     |-6.712     |-9.148     |-7.301     |
|              |agegroup_45_54years|1.484   |.593      |.315       |2.653      |.840       |2.923      |
|              |agegroup_55_64years|2.627   |.613      |1.418      |3.836      |2.341      |4.020      |
|              |agegroup_65_74years|3.350   |.585      |2.197      |4.504      |3.135      |4.755      |
|              |agegroup_75_84years|3.700   |.645      |2.428      |4.972      |3.447      |5.294      |
|              |smoker_Yes         |.355    |.305      |-.247      |.956       |-.100      |1.081      |
|--------------|-------------------|--------|----------|-----------|-----------|-----------|-----------|
a Based on 210 samples.
b Loss function value equals 33.600.

Correlations of Parameter Estimates
|---------|-------------------|---------|-------------------|-------------------|-------------------|-------------------|----------|
|         |                   |Intercept|agegroup_45_54years|agegroup_55_64years|agegroup_65_74years|agegroup_75_84years|smoker_Yes|
|---------|-------------------|---------|-------------------|-------------------|-------------------|-------------------|----------|
|Bootstrap|Intercept          |1.000    |-.840              |-.908              |-.816              |-.820              |-.147     |
|         |agegroup_45_54years|-.840    |1.000              |.819               |.689               |.727               |-.126     |
|         |agegroup_55_64years|-.908    |.819               |1.000              |.774               |.764               |-.082     |
|         |agegroup_65_74years|-.816    |.689               |.774               |1.000              |.735               |-.063     |
|         |agegroup_75_84years|-.820    |.727               |.764               |.735               |1.000              |-.061     |
|         |smoker_Yes         |-.147    |-.126              |-.082              |-.063              |-.061              |1.000     |
|---------|-------------------|---------|-------------------|-------------------|-------------------|-------------------|----------|

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marta García-Granero
Sent: Tuesday, November 07, 2006 12:34 PM
To: [hidden email]
Subject: Re: Poisson regression python module.
|
John,
I'd like to respectfully ask where SPSS is going with Python specifically and, more generally, with add-on languages such as Python and Sax Basic. I ask this out of a curiosity driven by the above Poisson regression example. It seems to me that Python could be or will become either a parallel to the current syntax or a replacement for it. Either way, I am curious as to the thinking that drove this decision (and I can imagine that it was a decision not undertaken lightly). For instance, are there groups of SPSS users for whom Python is a benefit relative to syntax?

Gene Maguin
|
Assume a binary logistic regression, where the overall proportion of positive responses is well above 0.50.

If I run a BLR using the default classification cutoff of 0.50, the classification table shows that the resulting model is better at predicting positive responses than it is at predicting negative responses.

If I raise the classification cutoff to just under the observed overall proportion of positive responses, the overall proportion of correct classifications declines somewhat, but the positive and negative responses are more equivalent: the model predicts positives and negatives about equally well.

My initial reaction is to prefer the more balanced model that results from using the higher classification cutoff. Thoughts?

Gary

---
Gary S. Rosin
Professor of Law
South Texas College of Law
1303 San Jacinto
Houston, TX 77002
<[hidden email]>
713-646-1854
|
In reply to this post by Maguin, Eugene
When using Probit (logit) or GenLin (logit) models in SPSS 15.0, I notice no obvious equivalent to changing the classification cutoff in binary logistic. What am I missing?

---
Gary S. Rosin
Professor of Law
South Texas College of Law
1303 San Jacinto
Houston, TX 77002
<[hidden email]>
713-646-1854
|
In reply to this post by Gary Rosin
The relative importance of a false negative prediction and a false
positive prediction will depend on the particular situation you are analyzing (e.g. a wrong prediction that a convict will commit a crime may lead a parole board to deny release, resulting in the prisoner serving a longer prison sentence; a wrong prediction that the convict will not commit a crime may result in the prisoner's release, and the commission of a new crime). Your question is not a statistics question. You have to make a judgment about the tradeoffs here.

David Greenberg, Sociology Department, New York University

----- Original Message -----
From: Gary Rosin <[hidden email]>
Date: Tuesday, November 7, 2006 6:21 pm
Subject: Binary Logistic Classification Cutoffs
|
In reply to this post by Gary Rosin
There is no equivalent control in Probit or Genlin. You can obtain equivalent results in Genlin by saving the predicted probabilities, computing new variables for each competing classification cutoff, and then running Crosstabs (or Ctables) to create the classification tables.
Note that when choosing between cutoffs, saving the predicted probabilities can actually save you some time, because you won't be refitting the logistic regression to test each cutoff. You can also run the predicted probabilities through ROC Curve, which can help you find the right balance between specificity and sensitivity in your choice of cutoff.

Alex

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gary Rosin
Sent: Tuesday, November 07, 2006 5:29 PM
To: [hidden email]
Subject: Classification cutoffs & Probit & GenLin
|
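The approach described above (save the predicted probabilities once, then build a classification table for each candidate cutoff) can be sketched outside SPSS as follows. This is only an illustration in Python: the function names and the ten cases are invented, and classifying a case as positive when its probability is at or above the cutoff is an assumption of the sketch, not a statement about how any SPSS procedure breaks ties.

```python
def classification_table(observed, probabilities, cutoff):
    """Cross-tabulate observed 0/1 outcomes against predictions at one cutoff."""
    table = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(observed, probabilities):
        predicted = 1 if p >= cutoff else 0  # assumed tie rule, see lead-in
        if y == 1 and predicted == 1:
            table["tp"] += 1
        elif y == 0 and predicted == 1:
            table["fp"] += 1
        elif y == 0 and predicted == 0:
            table["tn"] += 1
        else:
            table["fn"] += 1
    return table


def summarize(table):
    """Sensitivity, specificity and overall accuracy from a classification table."""
    positives = table["tp"] + table["fn"]
    negatives = table["tn"] + table["fp"]
    total = positives + negatives
    return {
        "sensitivity": table["tp"] / positives if positives else float("nan"),
        "specificity": table["tn"] / negatives if negatives else float("nan"),
        "accuracy": (table["tp"] + table["tn"]) / total if total else float("nan"),
    }


# Invented data: 7 positives and 3 negatives, with saved predicted probabilities
observed = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
probabilities = [0.90, 0.85, 0.80, 0.75, 0.71, 0.65, 0.55, 0.72, 0.60, 0.40]

for cutoff in (0.5, 0.7):
    table = classification_table(observed, probabilities, cutoff)
    print(cutoff, table, summarize(table))
```

With these invented numbers, the higher cutoff lowers overall accuracy while bringing sensitivity and specificity closer together, which is the pattern Gary describes. Which cutoff is preferable still depends on the relative cost of the two kinds of error.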
In reply to this post by Maguin, Eugene
Gene,
I don't speak for SPSS here, so I will not comment on what the company was thinking. However, there is one group of users that springs to mind immediately for whom Python is of greatest benefit relative to syntax: MACRO users. Or to be precise: those who need the kind of functionality that should be provided by MACRO, but who are not using it because of its difficulty.

But check SPSS Developer Central http://www.spss.com/devcentral/index.cfm for more of where I'm going with Python. There should be a couple of new downloads this week.

John

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Tuesday, November 07, 2006 3:37 PM
To: [hidden email]
Subject: Re: Poisson regression python module.
|
I've been having some email problems and had thought I sent a response earlier today. Apologies to those of you who may receive this twice.
1. SPSS has no intention of abandoning traditional SPSS syntax.

2. By adding the capability of using a general purpose programming language within SPSS, we can offer a major increase in the capabilities of SPSS by combining the power of such a language with the traditional strengths of the SPSS statistical and data management engine. You can find a discussion of the benefits on SPSS Developer Central (www.spss.com/devcentral) in the Directions presentation article, which is linked on the right-hand side of the main page. The SPSS Programming and Data Management book, also linked there, illustrates many of the capabilities of this combination. In brief, a general purpose language such as Python or VB, the two currently provided, allows building more flexible and more robust jobs and automating many tasks that previously had to be done manually.

3. While programmability offers many capabilities for those building statistical applications, it also benefits users who do not need or want to learn the new language: they can take advantage of modules built by SPSS or by other users. Recent examples include the partial least squares regression and raking modules, which will be made available soon for download on Developer Central, and the expansion of the SPSS transformation system using the trans and extendedTransforms modules.

4. In the future, SPSS will continue to offer modules that use the traditional syntax, but some new capabilities will be offered as programmability modules. The combination of SPSS and user-written modules means that the capabilities of the SPSS system can advance faster than would be the case with the traditional development methods.

Regards.

Kyle Weeks, Ph.D.
Director of Product Management, SPSS Product Line
Product Management
SPSS Inc.
[hidden email]
www.spss.com

________________________________
From: SPSSX(r) Discussion on behalf of Bauer, John H.
Sent: Wed 11/8/2006 3:10 PM
To: [hidden email]
Subject: Re: Poisson regression python module.
|
In reply to this post by Bauer, John H.
Hi everybody
Going back to the subject of Poisson regression with SPSS (I haven't learnt Python, so I don't think I can say anything about where this SPSS-Python marriage is going), one thing that strikes me as unusual in the output (with GENLOG, GENLIN or the Python module) is the lack of EXP(b) with its confidence interval (it is called IRR: Incidence Rate Ratio). In logistic or Cox regression, SPSS output gives EXP(b) (OR in the first case and HR in the second), which is more easily interpreted than the coefficient on the log scale. Neither GENLIN nor the Python module does.

This is my small contribution to improve the output (the last table mimics Stata output):

* POISSON REGRESSION WITH SPSS 15 *.

* Sample dataset *.
DATA LIST list /id(F2.0) agegroup(F8.0) smoker(F1.0) pyears(F8.0) deaths(F4.0).
BEGIN DATA
1 0 0 18790 2
2 1 0 10673 12
3 2 0 5712 28
4 3 0 2585 28
5 4 0 1462 31
6 0 1 52407 32
7 1 1 43248 104
8 2 1 28612 206
9 3 1 12663 186
10 4 1 5317 102
END DATA.
DOCUMENT 'Coronary deaths from British male doctors. Doll & Hill (Nat Cancer Inst Monog 1966; 19:205-68)'.
VARIABLE LABELS agegroup "Age group".
VALUE LABELS agegroup
 0 "35-44 years"
 1 "45-54 years"
 2 "55-64 years"
 3 "65-74 years"
 4 "75-84 years".
VARIABLE LABELS smoker "Smoking status".
VALUE LABELS smoker
 0 "No"
 1 "Yes".
DATASET NAME Campbell .

* OMS (to capture the parameter estimates) *.
DATASET DECLARE Coefficients.
SET OLANG=ENGLISH.
OMS
 /SELECT TABLES
 /IF COMMANDS = 'Generalized Linear Models'
     SUBTYPES = 'ParameterEstimates'
 /DESTINATION FORMAT = SAV OUTFILE = Coefficients.

* GENLIN *.
COMPUTE logpyears=LN(pyears).
GENLIN deaths BY agegroup smoker (ORDER=DESCENDING)
 /MODEL agegroup smoker INTERCEPT=YES OFFSET=logpyears
  DISTRIBUTION=POISSON LINK=LOG.
OMSEND.

* Computing & displaying IRR *.
DATASET ACTIVATE Coefficients.
DELETE VARIABLES Command_ TO Label_.
SELECT IF (NOT MISSING(Sig)) AND (Var1 NE '(Intercept)').
COMPUTE IRR=EXP(B).
COMPUTE LowerIRR=EXP(Lower).
COMPUTE UpperIRR=EXP(Upper).
* SQRT returns the absolute value of Z, so restore the sign of B *.
COMPUTE Zvalue=SQRT(WaldChiSquare).
IF (B LT 0) Zvalue=-Zvalue.
VAR LABEL Var1 'Parameter'
 / IRR 'IRR'
 / LowerIRR 'Lower 95% CL for IRR'
 / UpperIRR 'Upper 95% CL for IRR'
 / Zvalue 'Z'
 / Sig 'Sig.'.
FORMAT IRR TO Zvalue (F8.4).
COMPUTE id=$casenum.
SORT CASES BY id(D).
OMS
 /SELECT TABLES
 /IF COMMANDS = 'Summarize'
     SUBTYPES = 'Case Processing Summary'
 /DESTINATION VIEWER = NO.
SUMMARIZE
 /TABLES=Var1 IRR Zvalue Sig LowerIRR UpperIRR
 /FORMAT=LIST NOCASENUM NOTOTAL
 /TITLE='Poisson regression: Incidence Rate Ratio (IRR) & 95% Wald CI'
 /MISSING=VARIABLE
 /CELLS=NONE.
DATASET ACTIVATE Campbell.
DATASET CLOSE Coefficients.
OMSEND.

Regards,
Marta
|
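The arithmetic behind the IRR table above is just exponentiation of the coefficient and of its Wald confidence limits. A minimal Python sketch follows, assuming the limits are computed as b plus or minus a normal critical value times the standard error; the function name incidence_rate_ratio is hypothetical, and the inputs are used purely as illustrative numbers (the smoker estimate and bootstrap standard error from the Python module's table earlier in this thread).

```python
import math


def incidence_rate_ratio(b, se, z_crit=1.959964):
    """Return (IRR, lower, upper, z) for a log-scale coefficient and its SE."""
    irr = math.exp(b)
    lower = math.exp(b - z_crit * se)  # lower 95% Wald limit on the ratio scale
    upper = math.exp(b + z_crit * se)  # upper 95% Wald limit on the ratio scale
    z = b / se                         # signed Wald z statistic
    return irr, lower, upper, z


# Illustrative inputs only (see lead-in)
irr, lower, upper, z = incidence_rate_ratio(0.355, 0.305)
print(f"IRR={irr:.4f}  95% CI=({lower:.4f}, {upper:.4f})  z={z:.4f}")
```

Dividing b by its standard error keeps the sign of z, which the square root of the Wald chi-square on its own does not.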
Hi
Thursday, November 9, 2006, 10:36:45 AM, I wrote:

MGG> Going back to the subject of Poisson regression with SPSS [...]
MGG> one thing that strikes me as unusual in the output (with
MGG> GENLOG, GENLIN or the Python module) is the lack of EXP(b)
MGG> with its confidence interval (it is called IRR: Incidence Rate
MGG> Ratio).

Ooooops! I didn't spot the "/PRINT SOLUTION(EXPONENTIATED)" subcommand in GENLIN, sorry. I'm still very new to SPSS 15 and haven't had time to try every option...

Regards,
Marta
|
In reply to this post by Marta García-Granero
Hi Eric
I'd rather follow this exchange of mail on the list, if you don't mind.

ejff> Dear Marta
ejff> That's what I did before and was answered the same tip by Kyle (from SPSS).
ejff> Thanks anyway.

Well, that's how things go... If you don't have SPSS 15, then you have to work a bit harder to fit a negative binomial regression model.

ejff> By the way when it says "You may be able to use the following set of command
ejff> syntax, after editing": 'after editing' means 'getting the
ejff> dataset'?

No, it means that you have to replace the variable names supplied with the example (v1, v2, v3...) with the ones you are going to use to fit the model. Let's take a look at resolution 54271 (see my comments in the syntax):

Resolution Description:

SPSS releases through Release 14 have no procedure designed to fit negative binomial regression models. The new GENLIN procedure in Release 15 includes the ability to fit negative binomial regression models. In releases prior to 15, the following approach may be of use.

The CNLR procedure fits nonlinear regression models, including ones with user defined loss functions. You may be able to use the following set of command syntax, after editing, to fit negative binomial regression models:

* Change y to the actual dependent variable.
* Add as many parameters b0, b1, ... to the model program as needed.
* Change v1, v2, v3... into the names of the independent variables.
* Modify "compute bx" to be the sum of parameters times independent variables.

* MLE FOR NEGATIVE BINOMIAL (x = threshold p = prob) .

* My comments: define as many "b" as independent variables + 1 (don't forget the intercept!), those are only starting values *.
Model program x = 1.5 b0 = 0.0 b1 = 1 b2 = 1 b3 = 1 .

* Replace "v1", "v2", "v3"... with the names of the independent variables of your model *.
compute bx = b0+b1*v1+b2*v2+b3*v3 .

* Now, leave the following commands unchanged *.
compute k = exp(bx) .
compute pred_ = x/k .
COMPUTE loss_ = -(lngamma(x+y)-lngamma(x)-lngamma(y+1)+x*bx-(x+y)*ln(1+k)) .
CNLR y
 /PRED pred_
 /LOSS loss_
 /BOUNDS x >= 1 .

>> You'd better ask the whole list, since some SPSS people might give you
>> a better answer than mine. See this SPSS Resolution:
>>
>> http://support.spss.com/tech/troubleshooting/ressearchdetail.asp?ID=54271
>>
>> (if requested to login, use Guest both as user & password)
>>
>> I have never tried it myself.
>>
>> ejff> Right, I wish to run a Poisson regression but I have trouble with
>> ejff> overdispersion. Note that the DV is a rate and IV are categorical and
>> ejff> continuous. Thing is:
>>
>> ejff> - I don't have SPSS 15 and therefore not allowed to use the brand new
>> Genlin option in order to run a negative binomial regression;
>> ejff> - SAS offers the possibility of including a correction term within a
>> Poisson regression. Nevertheless SAS does not accept continuous independent
>> variable and I'm afraid not to be able to categorize the ones I intend to use.
>>
>> ejff> So: are you aware of the possibility of this correcting term in SPSS
>> (version 12 or 14)? How can I manage to take into account a variance greater (or
>> smaller) than the mean?

--
Regards,
Dr. Marta García-Granero, PhD
mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does not understand. SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abramson will not mind)
|
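To make the loss_ expression above easier to read, here is a small Python sketch that evaluates the same formula for a single observation and checks that it behaves like the negative log of a negative binomial probability. It only mirrors the k, pred_ and loss_ expressions quoted from the resolution; the function names nb_loss and nb_mean are invented, and nothing here reproduces what CNLR does when it minimizes the summed loss.

```python
import math


def nb_loss(y, x, bx):
    """Per-case loss, written term by term like the loss_ expression."""
    k = math.exp(bx)
    return -(math.lgamma(x + y) - math.lgamma(x) - math.lgamma(y + 1)
             + x * bx - (x + y) * math.log(1 + k))


def nb_mean(x, bx):
    """Predicted count, as in 'compute pred_ = x/k'."""
    return x / math.exp(bx)


# With x = 1.5 and bx = 0 the implied success probability is k/(1+k) = 0.5
x, bx = 1.5, 0.0
probabilities = [math.exp(-nb_loss(y, x, bx)) for y in range(500)]
total_probability = sum(probabilities)
mean_count = sum(y * p for y, p in enumerate(probabilities))
print(total_probability, mean_count, nb_mean(x, bx))
```

Reading the formula this way, exp(-loss_) is a negative binomial probability with size parameter x and success probability k/(1+k), and its mean is x/k, which is what pred_ returns. One consequence worth keeping in mind when comparing with other software: the predicted mean is x*exp(-bx), so on the log scale the coefficients enter with the opposite sign to a model written as log(mean) = bx, and the intercept absorbs ln(x).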
Hi Marta
Thanks for your concern.

Another couple of questions: I did as I was told, edited the variables and then ran the CNLR. But the output looked quite odd, at least to me:

- First, does CNLR accept categorical IVs? I put one in the model (the other IVs are continuous) and got just one estimate, which is not easy to interpret.
- I ran an NB regression in SAS and the results are quite different. I know this might be due to a difference in the calculation algorithm used by each package, but I still think I did not handle the regression very well.
- Next, when I add the BOOTSTRAP subcommand, I get valueless estimated SEs.

Any other tips?

Thanks again,
E.
|
Hi Eric
ejff> Another couple questions: I did as was told, edited the variables and then ran
ejff> the CNLR. But the output appeared quite weird, at least for my understanding:
ejff> - first, does CNLR accept categorical IV? I put one in the model (others IV are
ejff> continuous) and just got one estimator, which does not make it quite easy to
ejff> interpret.

Not directly: you have to dummy code them and pass them to CNLR as separate variables (each dummy with its own "Bi" coefficient).

For instance, to dummy code agegroup (with the following categories: 1: 25-35 years; 2: 35-45 years; 3: 45-55 years; 4: 55-65 years) using the first category as the reference, use the following syntax:

DO REPEAT A=agegrp1 agegrp2 agegrp3 /B=2 3 4.
COMPUTE A=(agegroup EQ B).
END REPEAT.

This will create 3 dummy variables, called agegrp1 to agegrp3. Pass them to the CNLR syntax as b1*agegrp1, b2*agegrp2, b3*agegrp3...

ejff> - I did a NB regression using SAS and the results are quite different. I know it
ejff> might be a difference in calculation algorithm used in each software but still
ejff> I think I did not handle the regression very well.

Try now and see if the results agree.

ejff> - next, when I add the BOOTSTRAP subcommand, I get valueless estimated SE.

I don't know; check them after dummy coding the categorical variable and running the syntax again.

Regards,
Marta
|
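The dummy coding described above (one 0/1 indicator per category, leaving out the reference category) can be sketched in Python as follows. The function name dummy_code and the sample values are invented for illustration; this is not part of CNLR or of any SPSS module.

```python
def dummy_code(values, levels, reference):
    """Return one 0/1 indicator list per non-reference level."""
    return {
        level: [1 if value == level else 0 for value in values]
        for level in levels
        if level != reference
    }


# Invented sample: agegroup codes 1 to 4, with category 1 as the reference
agegroup = [1, 2, 2, 3, 4, 1, 4]
dummies = dummy_code(agegroup, levels=[1, 2, 3, 4], reference=1)
for level, column in dummies.items():
    print(f"agegroup == {level}: {column}")
```

A case in the reference category has zeros on all three indicators, so each dummy's coefficient is read as a contrast with the reference category.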
In reply to this post by Bauer, John H.
All,
Here is the setup. Two datasets are allegedly identical. Consider variables x1 to x10. Possible values include sysmis and user missing (two or three values). Let's do this in syntax and not through the identify duplicates thing.

If there are no user missing, then this will work.

  COMPUTE MATCH=0.
  DO IF (FAMID EQ LAG(FAMID)).
  + DO REPEAT X=X1 TO X10.
  + IF (X EQ LAG(X)) MATCH=MATCH+1.
  + IF (SYSMIS(X) AND SYSMIS(LAG(X))) MATCH=MATCH+1.
  + END REPEAT.
  ELSE.
  + COMPUTE MATCH=99.
  END IF.
  IF ($CASENUM EQ 1) MATCH=99. /* NEEDED FOR PAIR 1 RECORD 1.

Now add user missing. I'd like to say

  COMPUTE MATCH=0.
  DO IF (FAMID EQ LAG(FAMID) AND PARID EQ LAG(PARID)).
  + DO REPEAT X=C1PRCA1 TO C1PRCA9.
  + IF (X EQ VALUE(LAG(X))) MATCH=MATCH+1.
  + IF (SYSMIS(X) AND SYSMIS(LAG(X))) MATCH=MATCH+1.
  + END REPEAT.
  ELSE.
  + COMPUTE MATCH=99.
  END IF.
  IF ($CASENUM EQ 1) MATCH=99. /* NEEDED FOR PAIR 1 RECORD 1.

However, the Value and Lag functions don't work together, in any sequence. A plausible alternative is

  COMPUTE MATCH=0.
  DO IF (FAMID EQ LAG(FAMID) AND PARID EQ LAG(PARID)).
  + DO REPEAT X=C1PRCA1 TO C1PRCA9.
  + COMPUTE #TEMP=LAG(X).
  + IF (X EQ VALUE(#TEMP)) MATCH=MATCH+1.
  + IF (SYSMIS(X) AND SYSMIS(#TEMP)) MATCH=MATCH+1.
  + END REPEAT.
  ELSE.
  + COMPUTE MATCH=99.
  END IF.
  IF ($CASENUM EQ 1) MATCH=99. /* NEEDED FOR PAIR 1 RECORD 1.

But the problem here is that a user missing value is reset to sysmis in this statement:

  + COMPUTE #TEMP=LAG(X).

So things don't work correctly. The only alternative I can think of is to set user missing off, execute the comparison and then set user missing back on.

My question: is there a one-step alternative?

Follow-up question: will the identify duplicates thing work correctly in this case?

Thanks,
Gene Maguin |
Hi all,
Does anyone know if there is an easy way to compare two data files to determine if their data contents are identical? I know how to do this by exporting SPSS files to Excel and then writing a formula to compare cell-by-cell, but I know there must be an easier way in SPSS.

Thanks very much,
Fiona Graff

Any information, including protected health information (PHI), transmitted in this email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential and or exempt from disclosure under applicable Federal or State law. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon, protected health information (PHI) by persons or entities other than the intended recipient is prohibited. If you received this email in error, please contact the sender and delete the material from any computer. |
In reply to this post by Maguin, Eugene
At 05:09 PM 11/13/2006, Gene Maguin wrote:
>Two datasets are allegedly identical. [This will be tested by
>interleaving them, and comparing values in successive cases.] Consider
>variables x1 to x10.
>
>If there are no user missing, then this will work.
>
>COMPUTE MATCH=0.
>DO IF (FAMID EQ LAG(FAMID)).
>+ DO REPEAT X=X1 TO X10.
>+ IF (X EQ LAG(X)) MATCH=MATCH+1.
>+ IF (SYSMIS(X) AND SYSMIS(LAG(X))) MATCH=MATCH+1.
>+ END REPEAT.
>ELSE.
>+ COMPUTE MATCH=99.
>END IF.
>IF ($CASENUM EQ 1) MATCH=99. /* NEEDED FOR PAIR 1 RECORD 1.
>
>Now add user missing. I'd like to say
>
>COMPUTE MATCH=0.
>DO IF (FAMID EQ LAG(FAMID) AND PARID EQ LAG(PARID)).
>+ DO REPEAT X=C1PRCA1 TO C1PRCA9.
>+ IF (X EQ VALUE(LAG(X))) MATCH=MATCH+1.
>+ IF (SYSMIS(X) AND SYSMIS(LAG(X))) MATCH=MATCH+1.
>+ END REPEAT.
>ELSE.
>+ COMPUTE MATCH=99.
>END IF.
>IF ($CASENUM EQ 1) MATCH=99. /* NEEDED FOR PAIR 1 RECORD 1.
>
>However, the Value and Lag functions don't work together--in any
>sequence.

Barf. One of those things SPSS didn't fully think out. (If it'll make you feel any 'better', VALUE also doesn't work for variables referenced as vector elements; see thread "VECTOR and VALUE problem", Thu, 20 Jan 2005 ff. If it'll make you feel still 'better', that one surprised the SPSS folks.)

As an alternative to LAG, there's logic using LEAVE - except I'd use scratch variables, which always behave as if LEAVE had been specified for them, and which don't clutter up the final file. I'd try something like this (not tested):

  * Prepare the variables to be "left": .
  NUMERIC #X1 TO #X9 (F5.3). /* or any other format

  * Check for match with previous case, as before: .
  COMPUTE MATCH=0.
  DO IF (FAMID EQ LAG(FAMID) AND PARID EQ LAG(PARID)).
  + DO REPEAT X = C1PRCA1 TO C1PRCA9 /LAG_X = #X1 TO #X9.
  + IF (VALUE(X) EQ LAG_X) MATCH=MATCH+1.
  + IF (SYSMIS(X) AND SYSMIS(LAG_X)) MATCH=MATCH+1.
  + END REPEAT.
  ELSE.
  + COMPUTE MATCH=99.
  END IF.
  IF ($CASENUM EQ 1) MATCH=99. /* NEEDED FOR PAIR 1 RECORD 1.

  * Save the variables to be "left" for the next case.
  * (I'm retaining your indents so I can write this .
  * code easily by copying yours.) .
  + DO REPEAT X = C1PRCA1 TO C1PRCA9 /LAG_X = #X1 TO #X9.
  + COMPUTE LAG_X = VALUE(X).
  + END REPEAT.
|
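The "carry the previous case forward" logic in the reply above can be sketched in Python to make the intended behaviour concrete. This is an illustration under stated assumptions, not SPSS code and not tested against SPSS: `None` stands in for system-missing, user-missing codes (98 and 99 here) are treated as ordinary raw values, and the names `count_matches`, `match_counts`, and the sample cases are hypothetical.

```python
def count_matches(current, previous):
    """Count fields whose raw values agree between two records."""
    matches = 0
    for cur, prev in zip(current, previous):
        if cur is None and prev is None:
            matches += 1  # both system-missing
        elif cur is not None and cur == prev:
            matches += 1  # same raw value, user-missing codes included
    return matches


def match_counts(cases):
    """cases: (key, values) tuples, interleaved and sorted by key."""
    results = []
    previous = None  # plays the role of the scratch variables
    for key, values in cases:
        if previous is not None and previous[0] == key:
            results.append(count_matches(values, previous[1]))
        else:
            results.append(99)  # first record of a pair
        previous = (key, values)  # retained for the next case
    return results


# Two hypothetical pairs; 98 and 99 are user-missing codes.
cases = [
    (1, [5, 99, None]),
    (1, [5, 99, None]),
    (2, [3, 98, 7]),
    (2, [3, 99, None]),
]
print(match_counts(cases))  # [99, 3, 99, 1]
```

The point of the sketch is that the previous case's raw values are stored explicitly after each comparison, so nothing converts a user-missing code to system-missing along the way.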
In reply to this post by Fiona Graff
Hi Fiona
I'm not aware of a way to do what you want with SPSS, but there are free (and payware) utilities available which will compare plain text (data) files. So if you are prepared to export your SPSS files to a plain ASCII file, you can tell the utility to compare the 2 files.

What the ones I'm used to will do is open both files side by side and highlight lines which have been inserted in one but not the other, and then draw a line over to the other file to show where it was put in. Ditto for lines which have been altered. Some then have the ability to nominate which bits from one file you want duplicated into the other.

I did a quick Google on "file comparison" and "freeware" and the first one it came up with was this one:
http://www.prestosoft.com/ps.asp?page=edp_examdiff
I've not used it so I can't comment on it, but this is a way you could start.

Some text editors have a file comparison utility built in, so if you spend most of your program editing time in one of these, it is very convenient. Some people use TextPad, which has a built-in file comparison utility, but it is payware. A freeware editor with built-in comparison is ConText:
http://context.cx/component/option,com_frontpage/Itemid,1/
I've not used either of these but their list of features looks useful and extensive.

Hope this is of some help

Regards
Adrian
--
Adrian Barnett
Senior Project Officer
Research, Analysis and Evaluation
Strategic Planning and Research Branch
Policy and Intergovernment Relations
Department of Health
Ph: +61 8 82266615
Fax: +61 8 82267088

This e-mail may contain confidential information, which also may be legally privileged. Only the intended recipient(s) may access, use, distribute or copy this e-mail. If this e-mail is received in error, please inform the sender by return e-mail and delete the original. If there are doubts about the validity of this message, please contact the sender by telephone. It is the recipient's responsibility to check the e-mail and any attached files for viruses.

> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]]
> On Behalf Of Fiona Graff
> Sent: Tuesday, 14 November 2006 9:37
> To: [hidden email]
> Subject: Comparing two data sets
>
> Hi all,
>
> Does anyone know if there is an easy way to compare two data
> files to determine if their data contents are identical? I
> know how to do this by exporting SPSS files to excel and then
> writing a formula to compare cell-by-cell, but I know there
> must be an easier way in SPSS.
>
> Thanks very much,
>
> Fiona Graff
|
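If no comparison utility is at hand, the export-and-compare approach described above can also be done with Python's standard `difflib` module. The following is a minimal sketch only: the function name `compare_exports`, the file labels, and the data rows are made up for illustration, and in practice the two lists would be read from the exported text files.

```python
import difflib


def compare_exports(lines_a, lines_b, name_a="file_a.txt", name_b="file_b.txt"):
    """Return unified-diff lines; an empty list means no differences."""
    return list(
        difflib.unified_diff(
            lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm=""
        )
    )


# Hypothetical exported rows; the second row differs in its last field.
export_a = ["1 0 0 100 2", "2 1 0 200 12", "3 2 0 300 28"]
export_b = ["1 0 0 100 2", "2 1 0 200 13", "3 2 0 300 28"]

diff = compare_exports(export_a, export_b)
print("\n".join(diff) if diff else "identical")
```

Lines prefixed with `-` appear only in the first export and lines prefixed with `+` only in the second. Note that this compares the exported text, so both files need to be written with the same variable order and formats for the result to be meaningful.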