Re: SPSS OLS Regression to develop Mortgage Model for a financial institution

Quentin Zavala

Hello Rich,

 

Thank you for your response. It gave me many good points to think about. It has been seven years since I completed an MS thesis in which I used logistic and linear regressions without a second thought. The regional aspect of the formula, for example, doesn’t apply to all regions: our branches are located in 4 separate counties in California, so 99% of mortgages are in California within those 4 counties, and this distribution is skewed.

 

Let me declare I’m not a statistician, although my boss is and has 40 years of direct marketing experience. He was tasked with developing a mortgage model so we can determine which variables best predict what type of members may apply for a mortgage. He told me to use the stepwise method, but I’ll have to investigate this method as you recommended.

 

To clarify the variables: the dependent variable is a dichotomously coded Mortgage variable (0 = No, 1 = Yes) representing whether a customer has a current mortgage. I used it instead of the mortgage balance, which ranges from 0 to 1,536,757 and applies to 3.3% of the cases, or 12,307. The independent variables consisted of ordinal and interval/ratio data. Basic demographic variables were dummy coded; for example, male and female were separated out into their own variables (0 = no, 1 = yes). Some of the continuous variables were FICO score and age. Other variables covered products, which were dummy coded as well (0 = no participation, 1 = participation).

 

I first used the entire sample, or universe, of cases. Then I fit the regression on a 27% random sample (drawn using SPSS’s probability algorithm in V. 19), applied the resulting equation to the full sample, and scored the data set to create the gain chart.

 

What would you recommend in terms of which method of regression (Enter, Stepwise, Remove, Forward, Backward) to use? And should Mortgage Bal be used instead of the binary variable, given that I’m using linear regression models?

 

Thank you in advance for any further comments,

 

 

Quentin Zavala, MS

 

 

 

Quentin Zavala

SchoolsFirst Federal Credit Union
Business Analyst, Research and Analytics

714-258-4000  ext 8601

qzavala[hidden email]


Re: SPSS OLS Regression to develop Mortgage Model for a financial institution

ViAnn Beadle

Why OLS? This seems to be a classic case for CART or CHAID.
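A minimal sketch of what a tree method does at its first step, in Python with made-up data: it searches one predictor for the cut point that best separates mortgage holders from non-holders. The ages, the flags, and the function names `gini` and `best_split` are assumptions for illustration. Real CART or CHAID procedures search all predictors, split recursively, and use their own criteria (CHAID uses chi-square tests rather than Gini impurity).

```python
# First step of a CART-style tree: find the single threshold on one
# predictor that best separates holders (1) from non-holders (0),
# scored by Gini impurity. All data below are hypothetical.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    best = (None, gini(labels))  # baseline: no split at all
    n = len(values)
    # Skip the maximum value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical ages and mortgage flags (not from the poster's data).
ages = [22, 25, 31, 38, 42, 47, 55, 63, 68, 71]
has_mortgage = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]

threshold, impurity = best_split(ages, has_mortgage)
print(threshold, round(impurity, 3))
```

In this toy data the mortgage holders sit in the middle of the age range, a pattern that a single linear term cannot follow but that a tree can capture with successive splits.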

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Quentin Zavala
Sent: Thursday, May 24, 2012 1:03 PM
To: [hidden email]
Subject: Re: SPSS OLS Regression to develop Mortgage Model for a financial institution

 


Re: SPSS OLS Regression to develop Mortgage Model for a financial institution

Art Kendall
In reply to this post by Quentin Zavala
Check legal requirements for credit unions. Gender may be a "protected" class like race, and specifically illegal to use in mortgage models to qualify applicants.

Also, why not some kind of tree model?

In regression you might want to use some kind of stepped (aka hierarchical) model, but stepwise approaches are notorious for capitalizing on chance.

Also, by "may apply" do you mean "will be allowed to" or "are likely to"?
Art Kendall
Social Research Consultants


Re: SPSS OLS Regression to develop Mortgage Model for a financial institution

Rich Ulrich
In reply to this post by Quentin Zavala
One thing you did right was to create your Gain chart
as (mostly) scores projected from a different subsample,
so you aren't showing merely a "fit."

Okay, you don't have states -- that was just a "for instance". 
You do have Counties.  You do have months or years of data.
When you select subsets this way, or any other way, there are
two questions to consider.  Do you consistently get similar Gain
charts?  - If any single selection fails, that is a warning that
other results are leaning on biases and artifacts; and that this
set of Predictions, though it may seem to fit your data today, is
apt to become obsolete pretty damn fast.  (Now, I am reminded
of the studies that show how quickly the best computerized Wall
Street stock-fund strategies deteriorate.  A fine, new stock-picking
strategy works for a year or so.)
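
A rough sketch of that consistency check, in Python with made-up numbers: compute cumulative gains separately for each subset and compare them. The county labels, scores, outcomes, and the function name `cumulative_gains` are all hypothetical; in practice the scores would come from the fitted equation applied to held-out cases.

```python
# Compare cumulative gains across subsets (e.g. counties or years).
# Scores and outcomes below are invented for illustration only.

def cumulative_gains(scores, outcomes, n_bins=5):
    """Fraction of all positives captured in the top 1..n_bins score bins."""
    # Rank cases from highest to lowest score and keep only the outcomes.
    ranked = [y for _, y in sorted(zip(scores, outcomes), key=lambda p: -p[0])]
    total = sum(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(len(ranked) * b / n_bins)
        gains.append(sum(ranked[:cutoff]) / total if total else 0.0)
    return gains

# Hypothetical scored cases for two hypothetical counties.
subsets = {
    "county_a": ([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
                 [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]),
    "county_b": ([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
                 [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]),
}

for name, (scores, outcomes) in subsets.items():
    print(name, [round(g, 2) for g in cumulative_gains(scores, outcomes)])
```

In this toy example the top bins of county_a capture the mortgage holders quickly, while county_b's curve stays close to what random ordering would give, which is the kind of single-subset failure worth treating as a warning.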

 - Do you get the same stepwise-selected variables?  - Probably not.
Forward or backward?  The best generality I've read is that if you
intend to keep most of the variables, start with them all in.  If you will
keep only a few, start with them out.

For my purposes, I've never wanted an equation that uses an
explicit "suppressor" variable -- If the variables are confounding
each other, I want to untangle the confounding.  That is: Find
the two (or more) variables that are collinear, and compute a
new composite score that is the difference or ratio (or whatever)
that plays a sensible role; and use that new variable as a predictor.

Sometimes what invites a suppressor variable is a problem in
scaling, where the top or bottom of a measurement needs to
be compressed or expanded in order to have linear prediction,
and the suppressor is trying to correct that fit.
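
A small sketch of the untangling idea, in Python: check how strongly two predictors move together, then replace the pair with one composite that has a sensible meaning. The variable names (`total_deposits`, `savings_balance`, `savings_share`), the numbers, and the choice of a ratio are assumptions for illustration, not anything from the poster's data.

```python
# Detect two collinear predictors and build a composite from them.
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical member data: savings balance is part of total deposits,
# so the two rise and fall together.
total_deposits = [12000, 30000, 8000, 55000, 21000, 40000]
savings_balance = [9000, 27000, 2000, 50000, 6000, 38000]

r_raw = pearson(total_deposits, savings_balance)

# Composite: share of deposits held in savings, a single interpretable score.
savings_share = [s / t for s, t in zip(savings_balance, total_deposits)]
r_share = pearson(total_deposits, savings_share)

print(round(r_raw, 3), round(r_share, 3))
```

Entering `total_deposits` together with `savings_share` asks two more distinct questions (how much money, and how it is held) than entering the two raw balances does.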


I'm still at a loss to figure out what you are attempting to
achieve.  As I read it, you are using other variables to "predict"
which customers presently have a loan.  So, why?  You *know*
that already.  Let's see: People not yet retired; and living in the
suburbs, who are buying a house.  I don't imagine what goals
you have in mind for fitting to that criterion, so I can't guess
whether using the Balance (House versus Car?  Current versus
original?) would help anything.
But if you want to see how it works, why not take 15 minutes to
try it?  For an estimate that shows some difference, I would try
a regression using just the people with a positive balance.
 - Using the log of balance would create a predictor with better
variance characteristics.
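
A quick sketch of that 15-minute experiment, in Python with invented numbers: keep only members with a positive balance, take the log of balance, and fit a one-predictor line. The member records, the use of age as the predictor, the function name `ols_fit`, and treating log balance as the outcome are all assumptions made for illustration.

```python
# Regression on log(balance) using only members with a positive balance.
import math

def ols_fit(xs, ys):
    """Slope and intercept of a one-predictor least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical members as (age, mortgage balance); zero means no mortgage.
members = [(28, 0), (34, 410000), (41, 265000), (47, 0),
           (52, 150000), (58, 90000), (63, 0), (66, 35000)]

# Keep only the positive balances, since log(0) is undefined.
borrowers = [(age, bal) for age, bal in members if bal > 0]
ages = [age for age, _ in borrowers]
log_balance = [math.log(bal) for _, bal in borrowers]

slope, intercept = ols_fit(ages, log_balance)
print(round(slope, 4), round(intercept, 2))

# The raw balances span more than a tenfold range; on the log scale
# that becomes a spread of only a few units.
raw_ratio = max(b for _, b in borrowers) / min(b for _, b in borrowers)
log_spread = max(log_balance) - min(log_balance)
print(round(raw_ratio, 1), round(log_spread, 2))
```

On the log scale a slope reads as an approximate proportional change in balance per year of age, and a few very large balances no longer dominate the fit.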

--
Rich Ulrich



Re: SPSS OLS Regression to develop Mortgage Model for a financial institution

Art Kendall
In reply to this post by Art Kendall
The choice of blocks is a matter of the substantive nature of the question.
If you have a large number of cases, you could bunch together variables that are pretty much the same thing.
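
A minimal sketch of entering variables in blocks, in Python with invented data: fit the first block, then add the second, and look at the change in R-squared. The block contents (demographics first, finances second), the variable names, the numbers, and the helper functions `solve` and `r_squared` are assumptions for illustration; in SPSS the same comparison would come from entering the blocks in sequence in REGRESSION.

```python
# Hierarchical (blockwise) entry: compare R^2 before and after adding a block.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data for eight members.
age_decades = [2.5, 3.1, 3.8, 4.2, 4.7, 5.5, 6.3, 6.8]
tenure_decades = [0.2, 0.5, 1.0, 0.4, 1.5, 2.0, 1.2, 3.0]
income_10k = [4.0, 5.5, 9.0, 8.5, 10.0, 9.5, 6.0, 5.0]
investments_10k = [0.5, 1.0, 2.0, 3.5, 1.5, 4.0, 5.0, 6.0]
has_mortgage = [0, 0, 1, 1, 1, 1, 0, 0]

block1 = [age_decades, tenure_decades]      # demographics
block2 = [income_10k, investments_10k]      # finances

r2_step1 = r_squared(block1, has_mortgage)
r2_step2 = r_squared(block1 + block2, has_mortgage)
print(round(r2_step1, 3), round(r2_step2, 3), round(r2_step2 - r2_step1, 3))
```

The order of the blocks is the analyst's decision, made on substantive grounds; the R-squared change only reports what the later block adds once the earlier one is already in.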

Art Kendall
Social Research Consultants

On 5/24/2012 7:15 PM, Quentin Zavala wrote:

Kendall,

 

Are there any definitive guidelines for hierarchical regression on which IVs one should put together in the same block? Should one group IVs that are related by an a posteriori relationship? For example, should I enter IVs into blocks by some relationship among these types of variables, such as personal income, liquid investments, retirement, et cetera?

 

Thank you,

 

 

Quentin Zavala

SchoolsFirst Federal Credit Union
Business Analyst, Research and Analytics

714-258-4000  ext 8601

qzavala[hidden email]

 

 

