SPSSX Discussion

Predictive Analytics Seminar: May 27-28, New York City

Elise Johnson

Predictive Analytics Seminar: May 27-28, New York City

Hi all,

I wanted to let you know about our training seminar on predictive analytics - coming May, Oct, and Nov in NYC, Stockholm, DC and other cities. This is intensive training for marketers, managers and business professionals to make actionable sense of customer data by predicting buying behavior, churn, etc. Past attendees provided rave reviews.

Here's more info:
----------------------

Training Program: Predictive Analytics for Business, Marketing and Web

A two-day intensive seminar brought to you by Prediction Impact, Inc.

Dates: May 27-28, Oct 14-15, Oct 18-19, and Nov 11-12, 2009
Locations: NYC (May), Stockholm (Oct), DC (Oct), San Francisco (Nov)

93% rate this program Excellent or Very Good.
**The official training program of Predictive Analytics World**
**Offered in conjunction with eMetrics events**

Also see our Online Training: Predictive Analytics Applied - immediate access at any time: www.predictionimpact.com/predictive-analytics-online-training.html

ABOUT THIS SEMINAR:

Business metrics do a great job summarizing the past. But if you want to predict how customers will respond in the future, there is one place to turn--predictive analytics. By learning from your abundant historical data, predictive analytics provides the marketer something beyond standard business reports and sales forecasts: actionable predictions for each customer. These predictions encompass all channels, both online and off, foreseeing which customers will buy, click, respond, convert or cancel. If you predict it, you own it.

The customer predictions generated by predictive analytics deliver more relevant content to each customer, improving response rates, click rates, buying behavior, retention and overall profit. For online applications such as e-marketing and customer care recommendations, predictive analytics acts in real-time, dynamically selecting the ad, web content or cross-sell product each visitor is most likely to click on or respond to, according to that visitor's profile. This is AB selection, rather than just AB testing.

Predictive Analytics for Business, Marketing and Web is a concentrated training program that includes interactive breakout sessions and a brief hands-on exercise. In two days we cover:

- The techniques, tips and pointers you need in order to run a successful predictive analytics and data mining initiative

- How to strategically position and tactically deploy predictive analytics and data mining at your company

- How to bridge the prevalent gap between technical understanding and practical use

- How a predictive model works, how it's created and how much revenue it generates

- Several detailed case studies that demonstrate predictive analytics in action and make the concepts concrete

- NEW TOPIC: Five Ways to Lower Costs with Predictive Analytics

No background in statistics or modeling is required. The only specific knowledge assumed for this training program is moderate experience with Microsoft Excel or equivalent.

For more information, visit www.predictionimpact.com/predictive-analytics-training.html, or e-mail us at training@predictionimpact.com. You may also call (415) 683-1146.

Cross-Registration Special: Attendees earn $250 off the Predictive Analytics World Conference

SNEAK PREVIEW VIDEO: www.predictionimpact.com/predictive-analytics-times.html

$100 off early registration, 3 weeks ahead

Garry Gelade

Python NumPy installation problem

Dear List

I am using SPSS 15 under Windows x64 with Python 2.4. I need to use a modified
Bessel function to do some calculations. This is available in the Python
SciPy package, which needs NumPy. However, I can't figure out how to
install NumPy. I downloaded the NumPy zip file (there is no .exe installer
for NumPy that is compatible with 2.4) and I used the command
python setup.py install

I get several error messages:

Non-existing path in numpy\distutils: site.cfg
No module named msvccompiler in numpy.distutils trying from distutils
The .NET Framework SDK needs to be installed before building extensions for
Python

Actually, .NET Framework 3.5, and .NET Framework SDK 2.0 are installed.

Any ideas how I can get NumPy installed? Or could I change to Python 2.5? Is
SPSS15 compatible with Python 2.5?

Thanks for any thoughts.

Garry Gelade
Business Analytic Ltd

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Garry Gelade

Re: Python NumPy installation problem

OK, I found a workaround. There is an earlier version of NumPy with a
Windows installer:

numpy-1.2.1-win32-superpack-python2.4.exe

It can be found at
http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103

It seems to install fine, but I haven't done any serious tests other than to
show it reads arrays OK.

Garry
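
If SciPy cannot be installed at all, one possible fallback for integer orders is to sum the standard power series for the modified Bessel function directly. The sketch below is an illustration and not part of the original posts: the name `bessel_i` and its parameters are hypothetical, it assumes the function needed is the first-kind I_n(x) with integer n >= 0, and it is written for a current Python rather than 2.4. Where SciPy is available, `scipy.special.iv` is the usual choice.

```python
import math


def bessel_i(n, x, tol=1e-15, max_terms=500):
    """Modified Bessel function of the first kind, I_n(x), for integer n >= 0.

    Sums the power series  sum_k (x/2)**(2k+n) / (k! * (k+n)!)  term by term.
    """
    if n < 0 or int(n) != n:
        raise ValueError("n must be a non-negative integer")
    n = int(n)
    half = x / 2.0
    term = half ** n / math.factorial(n)   # k = 0 term of the series
    total = term
    for k in range(1, max_terms):
        # Ratio of consecutive terms is (x/2)^2 / (k * (k + n)).
        term *= half * half / (k * (k + n))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


print(bessel_i(0, 1.0))   # about 1.2661
print(bessel_i(1, 1.0))   # about 0.5652
```

The term-by-term recurrence avoids computing large factorials explicitly; for large arguments a library routine is likely to be more accurate.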

Caroline Davis-2

Regression Analysis

Hello list!

I would like to perform a regression analysis for 4, however here are my constraints:

Var 1: Prototype rating (Likert scale 1 to 5), not normally distributed

Var 2: Influence rating (Likert scale 1 to 5), not normally distributed

Var 3: Accuracy (binary 0 or 1)

Var 4: Score on test (normally distributed).

The goal is to determine how well Var 1-3 predict Var 4. Is a regression analysis the best way to get at this question? It seems tricky, because Var 1 & 2 are not normally distributed. If I do a regression analysis for the accuracy and test variables, should it be a binary logistic regression, with accuracy as the dependent variable and score as the independent variable?

Thanks for any suggestions you may have.

Caroline

Garry Gelade

Re: Regression Analysis

Dear Caroline

It is not a requirement for regression that your independent variables be normally distributed. (Remember for example that dummy variables, or binary variables like your var3 are often used as IVs.)

The requirement is that your DV is normally distributed, so you should have no problem.

Garry Gelade

Business Analytic Ltd

Marta Garcia-Granero

Re: Regression Analysis

In reply to this post by Caroline Davis-2

Caroline Davis wrote:

> Hello list!
>
> I would like to perform a regression analysis for 4, however here are
> my constraints:
>
> Var 1: Prototype rating (Likert scale 1 to 5), not normally distributed
> Var 2: Influence rating (Likert scale 1 to 5), not normally distributed
> Var 3: Accuracy (binary 0 or 1)
> Var 4: Score on test (normally distributed).
>
> The goal is to determine how well Var 1-3 predict Var 4. Is a
> regression analysis the best way to get at this question? It seems
> tricky, because Var 1 & 2 are not normally distributed.

Normality of the IVs (or "predictor variables") is NOT a condition for
linear regression. You should, however, check for linearity in the response
to your Likert predictors. Binary (0/1 coded) variables are also OK, so you
don't have to worry about Accuracy (Var 3).

> If I do a regression analysis for the accuracy and test variables,
> should it be a binary logistic regression, with accuracy as the
> dependent variable and score as the independent variable?

Then you would be predicting Accuracy as a function of the other
variables, including Var 4 (clearly not your goal).
>
> Thanks for any suggestions you may have.
1) Is the sample size large enough? (You don't mention it.) As a rule of
thumb, there should be 10 to 20 cases for each IV (30 to 60 cases for your
study).
2) Draw scatter plots of Var 4 against Var 1 first, then Var 4 against Var 2.
Visually check for departures from linearity. Recode Var 1 and/or
Var 2 if necessary.
3) Are there any missing values? Listwise deletion might lower your
sample size a lot.

HTH,
Marta García-Granero

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/
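
The three checks listed above can be sketched in a few lines of Python. Everything here is illustrative: the records are made-up values, `None` stands for a missing response, and the column order (Var 1, Var 2, Var 3, Var 4) is an assumption. The per-level means are a numeric stand-in for the scatter plot: roughly equal steps between adjacent Likert levels would suggest a linear trend.

```python
from collections import defaultdict

# Hypothetical records: (var1, var2, var3, var4); None marks a missing value.
records = [
    (1, 2, 0, 48.0), (2, 2, 0, 52.0), (3, 3, 1, 55.0), (4, 4, 1, 61.0),
    (5, 5, 1, 64.0), (1, 1, 0, 47.0), (2, None, 1, 53.0), (3, 3, 0, None),
    (4, 5, 1, 60.0), (5, 4, 0, 66.0),
]
n_predictors = 3

# 3) Listwise deletion: keep only cases with no missing value.
complete = [r for r in records if None not in r]

# 1) Rule of thumb: 10 to 20 complete cases per predictor.
cases_per_iv = len(complete) / n_predictors
print("complete cases:", len(complete), "cases per IV:", round(cases_per_iv, 1))

# 2) Mean of Var 4 at each level of Var 1 (repeat for Var 2).
by_level = defaultdict(list)
for var1, _, _, var4 in complete:
    by_level[var1].append(var4)
level_means = {lvl: sum(v) / len(v) for lvl, v in sorted(by_level.items())}
print(level_means)
```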

Marta Garcia-Granero

Re: Regression Analysis

In reply to this post by Garry Gelade

Garry Gelade wrote:
>
> It is not a requirement for regression that your independent variables
> be normally distributed. (Remember for example that dummy variables,
> or binary variables like your var3 are often used as IVs.)
>
> The requirement is that your DV is normally distributed, so you should
> have no problem.

Tiny correction: it is NOT the DV itself which should be normally
distributed, but its residuals.

Caroline, I forgot to add that to the list of suggestions. Save the
residuals for the final model and check their normality. Also, consider
the possibility of interaction terms: "is the effect of Var 1 (or Var 2)
modified by another variable (Var 3, for instance)?"

Best regards,
Marta

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/
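
As a rough illustration of the residual check and the interaction term, here is a minimal NumPy sketch. The data are synthetic stand-ins generated on the spot (not Caroline's data), the coefficients used to generate them are arbitrary, and the Var 1 x Var 3 interaction is just one possible choice. Skewness and excess kurtosis near zero are consistent with normal residuals; a Q-Q plot or a formal test would be the natural next step.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the four variables (not real data).
var1 = rng.integers(1, 6, size=n)   # Likert 1-5
var2 = rng.integers(1, 6, size=n)   # Likert 1-5
var3 = rng.integers(0, 2, size=n)   # binary 0/1
var4 = (40 + 3 * var1 + 2 * var2 + 5 * var3
        + 1.5 * var1 * var3 + rng.normal(0, 4, size=n))

# Design matrix: intercept, main effects, and a Var 1 x Var 3 interaction.
X = np.column_stack([np.ones(n), var1, var2, var3, var1 * var3])
coef, *_ = np.linalg.lstsq(X, var4, rcond=None)

# Residuals of the fitted model: these, not Var 4 itself, are checked.
residuals = var4 - X @ coef
z = (residuals - residuals.mean()) / residuals.std()
skewness = float(np.mean(z ** 3))
excess_kurtosis = float(np.mean(z ** 4) - 3.0)

print("coefficients:", np.round(coef, 2))
print("skewness:", round(skewness, 3), "excess kurtosis:", round(excess_kurtosis, 3))
```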

Art Kendall

Re: Regression Analysis

In reply to this post by Caroline Davis-2

Normality of the predictors is not an issue.
Normality of the DV is not an issue. What you want to check is the normality of the residuals.

Using accuracy as a DV is not consistent with your first question. Is there a reason to explore accuracy as a DV?

Art Kendall
Social Research Consultants

stace swayne

Tetrachoric Correlations

In reply to this post by Marta Garcia-Granero

Dear list,

I have 15 variables (v1-v15), all of them dichotomous items. I would like to run a tetrachoric correlation, but I don't have an example of syntax for doing that. Does anyone on the list have code for this, or can you direct me to a source?

All suggestions are appreciated,

Stace

Marta Garcia-Granero

Re: Tetrachoric Correlations

stace swayne wrote:
>
> I have 15 variables (v1-v15) and they are dichotomous items. I would
> like to run a tetrachoric correlation, but I don't have an example of
> syntax for doing that. Does anyone on the list have code for this or
> can you direct me to a source?
>
Check Dirk Enzmann's page:
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Software/Enzmann_Software.html

Close to the end (you'll have to scroll down quite a lot), you will find
this:

TetCorr:
DOS program and source code (Pascal) for computing a matrix of
tetrachoric correlation coefficients of up to 50 variables and a maximum
of 8,000 cases (see also: r_tetra).

Since the program reads text files, you can export your data from SPSS as
text, run TetCorr, and then import the output (a text file with the
correlation matrix) back into SPSS. Detailed instructions for doing that
are provided too.

HTH,
Marta García-Granero

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/
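
For a quick look at a single pair of items, the classical cosine-pi formula approximates the tetrachoric correlation from the 2x2 table of counts. The sketch below is only a rough approximation that works best when both items are split near 50/50; it is not a substitute for a proper tetrachoric routine such as TetCorr. The function names and the cell counts are made up for illustration.

```python
import math


def phi_coefficient(a, b, c, d):
    """Pearson correlation of two 0/1 items from their 2x2 cell counts."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a = both items 1, b = first 1 / second 0,
    c = first 0 / second 1, d = both items 0.
    """
    if b * c == 0:
        # No discordant cells: the odds ratio is infinite (or undefined).
        return 1.0 if a * d > 0 else float("nan")
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical counts for one pair of items, e.g. v1 and v2.
print(phi_coefficient(40, 10, 10, 40))      # about 0.6
print(tetrachoric_cos_pi(40, 10, 10, 40))   # about 0.809
```

For 15 items the same two functions would be applied to each of the 105 item pairs to fill a correlation matrix.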

ViAnn Beadle

Re: Tetrachoric Correlations

Will tetrachoric really make much of a difference compared to the
point-biserial (Pearson Corr) with dichotomous variables?

Swank, Paul R

Re: Tetrachoric Correlations

It depends on the frequency breakdown of the dichotomies. Tetrachoric correlations are based on the assumption of an underlying normal distribution and approximate the correlation between the supposedly normal underlying variables. They should not be used if the variables are true dichotomies or the underlying distribution is non-normal, although for the latter it is a matter of degree. If the dichotomies are badly skewed but the underlying distribution is normal, there could be substantial differences between the point-biserial and tetrachoric correlations.

Paul

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston
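
The effect of skewed splits can be illustrated with a small simulation. This is a sketch under stated assumptions, not a result from the thread: the latent correlation of 0.6, the two cut points, the sample size and the name `phi_from_latent` are all arbitrary choices. It draws bivariate-normal scores, dichotomizes both at the same cut point, and computes the Pearson (phi) correlation of the resulting 0/1 items.

```python
import math
import random


def phi_from_latent(rho, cut, n=100_000, seed=0):
    """Phi correlation of two items made by cutting bivariate-normal
    scores with latent correlation rho at the threshold `cut`."""
    rng = random.Random(seed)
    a = b = c = d = 0
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        x, y = z1 > cut, z2 > cut
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


rho = 0.6
phi_even = phi_from_latent(rho, cut=0.0)     # roughly a 50/50 split
phi_skewed = phi_from_latent(rho, cut=1.5)   # roughly 7% of cases score 1
print(round(phi_even, 3), round(phi_skewed, 3))
```

Under these assumptions both phi values fall below the latent 0.6, and the skewed split falls further below it, which is the gap a tetrachoric estimate is meant to close.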

Swank, Paul R

Re: Tetrachoric Correlations

In reply to this post by stace swayne

Likewise, polychoric and polyserial correlations are based on underlying normal distributions. Thus, if you have a dichotomous variable that is a true dichotomy, say gender for instance, these procedures should not be used. However, if you have a test item that is either right or wrong, you can make a good case that there is an underlying distribution. More difficult is the normality issue. One question is how non-normal the underlying distribution has to be before it causes problems. I'm not sure anyone knows the answer to that.

Paul

Dr. Paul R. Swank,

Professor and Director of Research

Children's Learning Institute

University of Texas Health Science Center-Houston

From: Peck, Jon [mailto:[hidden email]]
Sent: Tuesday, May 12, 2009 2:05 PM
To: Swank, Paul R
Subject: Re: Re: [SPSSX-L] Tetrachoric Correlations

What about polyserial/polychoric procedures, such as those available as an extension command?

King Douglas

Large files and SPSS Text Analysis for Surveys

Folks,

Version 2 of SPSS Text Analysis for Surveys handles open-ended responses of up to 1,600 cases pretty well, but bogs down at 16,000 cases (both file sizes are approximations).

We are open to suggestions, including upgrading to the latest version, regarding how to make the application handle larger files much faster and/or more efficiently.

Thanks,

King Douglas
American Airlines Customer Research