|
Hi all,
Does anyone know how to export a regression equation to the output? I'm looking to cross-validate a regression model from one dataset to another and score cases in that separate dataset. For example, below is the (logistic) regression equation that I had to create by hand by copying and pasting each coefficient cell of the regression table in the output into this equation. Is there any function of SPSS that will do this automatically?

COMPUTE probscore = EXP(-1.8837110889221
  + -0.0610812604199899 * educationlevel
  + 0.38059306773648 * military
  + -0.262865101225782 * cross
  + 0.36407564076912 * southsa
  + -0.622317432758787 * Canada
  + -0.342604201465917 * doctorate
  + 0.195794807986137 * counseling
  + -0.932893661626645 * distanceeducation
  + -1.1392796442106 * healthservices
  + 0.434347821351045 * humanservices
  + -1.89066122627975 * technologymanagement
  + -0.498233972265128 * Business
  + -1.07565245282589 * MKCode
  + 0.152397590137721 * WinNT6OS
  + -0.00727179535309702 * dayofweek2
  + -1.98753437730861 * LiveCareer
  + 0.326307767213455 * DanRosenfeld
  + -1.88329168706132 * snagajob
  + -1.12664042058232 * FindTuition
  + -0.227924359312289 * CPAdeal
  + 0.966410333846172 * emailideal) .
EXECUTE .

Thanks,
Adam
|
Administrator
|
If you are trying to save the predicted probabilities for a cross-validation data set, do the following:
1. Merge (via ADD FILES) the original data set with the new (cross-validation) data set. Use the /IN sub-command to create a flag variable telling you which data set each case belongs to. E.g.,

* Assuming the original dataset is open & active .
ADD FILES
  file = * /
  file = 'cvdata' / in = crossval .
EXE.

2. IF the outcome variable exists in the new data set, compute a copy of the outcome variable, but only for the original data set. E.g.,

if NOT crossval DVcopy = dv.

This step is not necessary if the DV does not exist in the cross-validation data set (or is missing for all cases in the cross-validation data).

3. Run your model using the copy of the outcome variable, and save the predicted probabilities from the model. By using the copy of the outcome variable, you ensure that only the original data set is used for building the model, but predicted probabilities will be saved for all cases in the file.

If you want fitted log-odds instead of (or in addition to) the predicted probabilities, they are easy enough to compute.

compute log_odds = ln(predprob / (1 - predprob)).

One reason you might want the log-odds is that if you plot the data, things that are linear in the model will look linear in the plot. With predicted probabilities, that will not be the case.

And don't forget, you've got the CROSSVAL flag variable (1 = cross-validation data, 0 = original data) you can use to separate the two data sets.

HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
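
For readers who want to check the log-odds arithmetic outside SPSS, here is a minimal Python sketch of the two conversions. The function names are made up for illustration, and the only number used is the intercept from the equation in the question. It also shows that EXP applied to a linear predictor gives the odds, and that the probability is the odds divided by one plus the odds.

```python
import math


def inv_logit(log_odds: float) -> float:
    """Convert fitted log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def logit(prob: float) -> float:
    """Same transformation as ln(predprob / (1 - predprob))."""
    return math.log(prob / (1.0 - prob))


# Intercept from the question, used here as a stand-in linear predictor
# (i.e. a case with every predictor equal to zero).
linear_predictor = -1.8837110889221

odds = math.exp(linear_predictor)   # EXP(...) alone yields the odds
prob = inv_logit(linear_predictor)  # probability implied by those odds

print(f"odds = {odds:.4f}, probability = {prob:.4f}")
```

The two functions are inverses of each other, so converting saved probabilities to log-odds and back should return the original values up to rounding.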
|
Thanks Bruce. I have used this trick in the past. The problem is that I'm working with very large datasets, and to merge and rerun the model often locks up the computer for a bit. I was hoping for a shortcut, and we also want to provide this equation in the report.
Thanks,
Adam

On Fri, May 28, 2010 at 11:42 AM, Bruce Weaver <[hidden email]> wrote:
> If you are trying to save the predicted probabilities for a cross-validation
|
Administrator
|
Oh, I see. How about using OMS to send the table of coefficients out to a data set when you're running the original model? That file will contain all of the coefficients & variable names you need. With a bit of data management & use of string functions, you should be able to cobble together the terms you need for computing your equation, and send them out to a text file with WRITE OUTFILE. Then use INCLUDE (or INSERT) FILE to run that syntax on your cross-validation dataset.
--
Bruce Weaver
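
The "cobbling" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients are taken to be already available as (variable name, B) pairs, however they were pulled from the OMS output, and build_scoring_syntax, logodds, and the example values passed in are hypothetical names chosen here, not anything SPSS provides. The generated syntax computes the linear predictor first and then the probability.

```python
def build_scoring_syntax(target, intercept, coefs):
    """Return SPSS COMPUTE syntax that scores cases from logistic coefficients.

    target    -- name of the probability variable to create
    intercept -- the model constant
    coefs     -- iterable of (variable_name, coefficient) pairs
    """
    # repr() keeps full double precision; very small coefficients would
    # print in exponent form, so check how your SPSS version reads those.
    terms = [repr(float(intercept))]
    terms += [f"{float(b)!r} * {name}" for name, b in coefs]
    linear = " + ".join(terms)
    return "\n".join([
        f"COMPUTE logodds = {linear} .",
        f"COMPUTE {target} = 1 / (1 + EXP(-logodds)) .",
        "EXECUTE .",
    ])


# First few terms of the equation from the question, for illustration only.
syntax = build_scoring_syntax(
    "probscore",
    -1.8837110889221,
    [("educationlevel", -0.0610812604199899), ("military", 0.38059306773648)],
)
print(syntax)
```

The returned string could be written to a syntax file and run on the cross-validation dataset with INSERT FILE, and the same text could be pasted into the report.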
|
In reply to this post by Adam Troy
Or combine OMS with a little Python and get away from writing out text files and "cobbling".
Regards,
Jon Peck (From Florence)
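
As a rough idea of what the scoring logic looks like once the coefficients are in Python, here is a plain-Python sketch. It deliberately leaves out the SPSS side (reading the OMS table and writing scores back to the dataset), and score_case, the coefficient dictionary, and the two example cases are hypothetical stand-ins rather than part of any SPSS API.

```python
import math


def score_case(case, intercept, coefs):
    """Predicted probability for one case under a logistic model.

    case      -- mapping of variable name to value for this case
    intercept -- the model constant
    coefs     -- mapping of variable name to coefficient (B)
    """
    log_odds = intercept + sum(b * case[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two coefficients from the question; the cases are invented for illustration.
intercept = -1.8837110889221
coefs = {"military": 0.38059306773648, "Canada": -0.622317432758787}
cases = [
    {"military": 1, "Canada": 0},
    {"military": 0, "Canada": 1},
]

scores = [score_case(c, intercept, coefs) for c in cases]
for case, p in zip(cases, scores):
    print(case, round(p, 4))
```

A positive coefficient raises the predicted probability and a negative one lowers it, so the first case scores higher than the second.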
|
In reply to this post by Adam Troy
If you are using very large data sets, three things that might work are:
1) If you have the server version of SPSS, see the option to save the XML for scoring the other set.
2) Create a data set with only the variables involved in any of the computations, MATCH just those data, and use the /IN option to create a flag to subset the data. It still may use less of your time than using options you are not used to.
3) The equation itself is there in the output; hard-code the equation in syntax.

compute yhat = constant + (b1*x1) + (b2*x2) ... .
compute yresid = y-yhat.
Art Kendall
Social Research Consultants |
