History question

History question

Salbod
Good afternoon,

Can anyone recommend a chapter (or article) on the History of Statistics that includes a discussion of software (e.g., Statjob, SPSS, SAS) and hopefully mentions the work of Jacob Cohen? I’m looking for material to supplement a text in an undergraduate history of psychology course.

Any suggestions are most welcome. Thank you.

Stephen Salbod, Pace University, NYC
Re: History question

Mike
On Sunday, December 06, 2015 10:37 AM, Stephen Salbod wrote:
>Good afternoon,
>
>Can anyone recommend a chapter (or article) on the History of Statistics
>that includes a discussion of software (e.g., Statjob, SPSS, SAS) and
>hopefully mentions the work of Jacob Cohen. I’m looking for material to
>supplement a text in an undergraduate history of psychology course.
>
>Any suggestions are most welcome. Thank you.

A few points:

(1)  I would suggest going over to the Jstor database (www.jstor.org --
Pace should have access to it) and searching the journal "American
Statistician" which had numerous reviews of statistical software over
the decades. An article that I found particularly useful when I was in
grad school in the 1970s is the following:

Wilkinson, L., & Dallal, G. E. (1977). Accuracy of Sample Moments
Calculations among Widely Used Statistical Programs. The American
Statistician, 31(3), 128–131.

Both Wilkinson's and Dallal's names should be familiar to folks.
Wilkinson would develop Systat, which was bought out by SPSS,
but SPSS would eventually sell or "spin off" Systat. I'm not sure
what Wilkinson's relationship, if any, is with SPSS today.
Wilkinson would also be lead author on the American
Psychological Association's (APA) task force on the use of
statistics.  Gerry Dallal would go on to be a productive researcher
and developed the DOS-based STATOOLS package in the 1990s
that supplemented the then-current version of Systat (see:
http://www.tufts.edu/~gdallal/STATPKG.HTM ).

The importance of Wilkinson and Dallal's (1977) paper is
reflected in its abstract:

|Four widely used statistical program packages -- BMDP, SPSS,
|DATATEXT, and OSIRIS -- were compared for computational
|accuracy on sample means, standard deviations, and correlations.
|Only one, BMDP, was not seriously inaccurate in calculations on
|a data set of three observations. Further, SPSS computed inaccurate
|statistics in a discriminant analysis on a real data set of 848
|observations.
|It is recommended that the desk calculator algorithm, found in most
|of these programs, not be used in packages which may run on short
|word length machines.

The key problem was that not enough space was allocated to represent
a number and this would produce underflow errors (i.e., very small
numbers getting truncated without warning and used in subsequent
calculations) and overflow errors (i.e., very large numbers that got
truncated).  This was a particular problem in calculating the Sum of
Squares because many programs used the "computational formula" (i.e.,
sumX^2 - [(sumX)^2/N]) instead of the "definitional" or mean-deviation
formula (i.e., sum(X - MeanX)^2).  The BMDP series (RIP) was the only
one that correctly calculated the statistics because it used the
mean-deviation form (which requires a double pass reading of the
data -- the other form only requires one reading of the data).  Most
software was modified to overcome these problems but Wilkinson would
develop a set of testing procedures for newer versions of statistical
software; see:
https://www.cs.uic.edu/~wilkinson/Publications/accuracy.pdf
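
To make the difference between the two formulas concrete, here is a minimal Python sketch. It is not taken from Wilkinson and Dallal's paper: the three data values, the helper names (f32, ss_computational, ss_definitional), and the use of IEEE single precision as a stand-in for a "short word length" machine are all assumptions chosen for illustration. Every intermediate result is rounded to 32 bits, and the Sum of Squares is computed both ways for three observations whose true Sum of Squares is 2.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ss_computational(xs):
    """One-pass "computational formula": sumX^2 - (sumX)^2 / N, in single precision."""
    sum_x = 0.0
    sum_x2 = 0.0
    for x in xs:
        x = f32(x)
        sum_x = f32(sum_x + x)
        sum_x2 = f32(sum_x2 + f32(x * x))
    n = len(xs)
    return f32(sum_x2 - f32(f32(sum_x * sum_x) / n))


def ss_definitional(xs):
    """Two-pass mean-deviation formula: sum((X - MeanX)^2), in single precision."""
    n = len(xs)
    total = 0.0
    for x in xs:                      # first pass: the mean
        total = f32(total + f32(x))
    mean = f32(total / n)
    ss = 0.0
    for x in xs:                      # second pass: squared deviations
        d = f32(f32(x) - mean)
        ss = f32(ss + f32(d * d))
    return ss


# Hypothetical data: three observations with a large mean and a tiny spread.
data = [10001.0, 10002.0, 10003.0]

print("true SS        :", 2.0)
print("definitional SS:", ss_definitional(data))
print("computational SS:", ss_computational(data))
```

With these values the two-pass result is exactly 2.0. The one-pass result cannot be: the two terms being subtracted are each about 300 million, and at that magnitude single precision can only represent multiples of 32, so the small true value is lost in the subtraction.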

One has to wonder how many research reports and dissertations reported
statistics that were affected by the errors that Wilkinson and Dallal
discovered, but I have the feeling that most psychologists are blissfully
unaware that there was any problem at all (making the assumption that all
statistical software is 100% accurate).

See also:
Norusis, M. J., Van Eck, N., Montanelli, R. G., Wilkinson, L., Dallal,
G. E., Neter, J., … Conover, W. J. (1978). Letters to the Editor.
The American Statistician, 32(3), 113–114.

Berk, K. N., & Francis, I. S. (1978). A Review of the Manuals for BMDP
and SPSS. Journal of the American Statistical Association, 73(361),
65–71.

Dallal, G. E. (1988). Statistical Microcomputing-Like It Is. The
American Statistician, 42(3), 212–216.

McCullough, B. D. (1998). Assessing the Reliability of Statistical
Software: Part I. The American Statistician, 52(4), 358–366.

(2) Regarding Jack Cohen: I was a grad student in experimental psych
at SUNY-Stony Brook but I got a one-year pre-doc fellowship to do
research at NYU with Martin Braine in 1978-1979.  I sat in on Jack's
year-long grad statistics course and he did not use any computer
software in the course. Later, when I worked at New York State
Psychiatric Institute (NYSPI), where Jack had a one-day-a-week
appointment in the Biometrics department (Pat Cohen worked full-time
there), I had opportunities to consult with him on the statistical
analyses I was doing on an NIMH grant-funded project.  By this time I
was pretty much expert in BMDP but was stuck on what the appropriate
analysis was for the data that the project had (it was an experimental
design). I went over BMDP ANOVA (2V) analyses with Jack and we discussed
the problems at a conceptual level (it is now clear that a multilevel
analysis would have been most appropriate but neither I nor Jack
had worked this out; Joe Fleiss was also consulted and pointed me
in that direction but he couldn't be bothered to provide additional
help).
In these consultations Jack didn't suggest alternative analyses in
other software (e.g., SAS) and I didn't get any sense of what software
he actually used.  Most data analysts and statisticians at NYSPI would
do their computer work in the second sub-level of the then "new
building" where the IBM medium-sized computer was located and which
had a room full of IBM 3270 series terminals (which one guy referred
to as the "Checker Cab" [NYC ref] of computer terminals; the ADM 3a,
popular with Unix systems and other mainframes, was a 90-pound
weakling by comparison).  I spent a lot of time in the "terminal room"
and came across many people (Pat Shrout would pop in now and then)
but I never saw Jack there.  It is possible that Biometrics had their
own terminals, which would explain why I never saw him around the
computer center.  All of the above is just to say that, in retrospect,
I don't know which statistical software package he used; moreover,
if memory serves, I don't remember any reference to specific software
in any of his major publications.

On a sidenote: Jack got his Ph.D. at NYU in the School of Education
in 1950 and his dissertation was a comparison of the factor structure
of the Wechsler Adult Intelligence Scale (WAIS) for different psychiatric
diagnostic groups (Jack worked as a psychologist at the Bronx VA
Hospital in the late 1940s and used patients there for his dissertation;
he would join the psychology department at NYU a few years later).
Jack does not mention the use of any computer-based analysis
but he does refer to "mechanical computation", which I
take to mean some type of calculating machine (for some examples
of what was available circa 1950, see:
http://americanhistory.si.edu/collections/search/main?edan_q=set_name:%22Calculating+Machines%22 ).
The Friden calculators appear to have many functions that were
developed to do complex calculations for WW II operations; see:
http://www.rauck.net/friden/History-03.htm
Given that he had 300 subjects/participants and he had to calculate
correlation matrices and factor analysis results, this had to be
a labor-intensive, time-consuming activity.

-Mike Palij
New York University
[hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: History question

Bruce Weaver
Administrator
Thanks Mike, that was interesting.  I know Jerry Dallal from his posts to sci.stat.consult & sci.stat.math (a few years ago, when those groups were still very active), but did not know about his work with Wilkinson.

Jan de Leeuw (from UCLA) has written an overview, which may be of interest.  It can be seen via the following links:

   http://gifi.stat.ucla.edu/janspubs/2009/reports/deleeuw_R_09a.pdf
   http://gifi.stat.ucla.edu/janspubs/2011/chapters/deleeuw_C_11.pdf

The reference for the second link is:

J. De Leeuw. Statistical Software: An Overview. In M. Lovric, editor, International Encyclopedia of Statistical Science, pages 1470-1473. Springer Verlag, 2011.

I'm not convinced de Leeuw has all of the details about SPSS right.  E.g., I believe the name was changed to PASW before IBM bought out SPSS, and it has certainly gone back to IBM-SPSS-Statistics for the last several releases (see http://www-01.ibm.com/support/docview.wss?uid=swg21476197).

This makes me wonder how accurate de Leeuw's characterizations of the other packages are.  

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: History question

Mike
On Monday, December 07, 2015 4:10 PM, Bruce Weaver wrote:
>Thanks Mike, that was interesting.  I know Jerry Dallal from his posts to
>sci.stat.consult & sci.stat.math (a few years ago, when those groups were
>still very active), but did not know about his work with Wilkinson.

You mean posts like this:
https://groups.google.com/forum/#!search/sct.stat.consult$20palij$20dallel/sci.stat.consult/s9MsZisNIIQ/HTMOxwLmDXwJ

;-)

>Jan de Leeuw (from UCLA) has written an overview, which may be of interest.
>It can be seen via the following links:
>
>   http://gifi.stat.ucla.edu/janspubs/2009/reports/deleeuw_R_09a.pdf
>   http://gifi.stat.ucla.edu/janspubs/2011/chapters/deleeuw_C_11.pdf
>
>The reference for the second link is:
>
>J. De Leeuw. Statistical Software: An Overview. In M. Lovric, editor,
>/International Encyclopedia of Statistical Science/, pages 1470-1473.
>Springer Verlag, 2011.
>
>I'm not convinced de Leeuw has all of the details about SPSS right. E.g., I
>believe the name was changed to PASW /before/ IBM bought out SPSS, and it
>has certainly gone back to IBM-SPSS-Statistics for the last several releases
>(see http://www-01.ibm.com/support/docview.wss?uid=swg21476197).

He's right.  The Wikipedia entry on SPSS says the following:

|SPSS Inc announced on July 28, 2009 that it was being acquired by
|IBM for US$1.2 billion.[5] Because of a dispute about ownership of the
|name "SPSS", between 2009 and 2010, the product was referred to
|as PASW (Predictive Analytics SoftWare).[6] As of January 2010, it
|became "SPSS: An IBM Company". Complete transfer of business
|to IBM was done by October 1, 2010. By that date, SPSS: An IBM
|Company ceased to exist. IBM SPSS is now fully integrated into the
|IBM Corporation, and is one of the brands under IBM Software Group's
|Business Analytics Portfolio, together with IBM Algorithmics, IBM
|Cognos and IBM OpenPages

The reference for footnote [5] is an IBM press release and for [6] is
a news item in Scientific American.

>This makes me wonder how accurate de Leeuw's characterizations of the other
>packages are.


Re: History question

Jon Peck
For sure, the ill-fated PASW name change occurred prior to the IBM acquisition but as a happy side effect of the acquisition, it was immediately changed back by IBM.  The "SPSS: An IBM Company" identifier was a temporary moniker used only until IBM completed the full Transfer of Business: something to do with lawyers.

On Mon, Dec 7, 2015 at 3:28 PM, Mike Palij <[hidden email]> wrote:
On Monday, December 07, 2015 4:10 PM, Bruce Weaver wrote:>
Thanks Mike, that was interesting.  I know Jerry Dallal from his posts to
sci.stat.consult & sci.stat.math (a few years ago, when those groups were
still very active), but did now know about his work with Wilkinson.

You mean posts like this:
https://groups.google.com/forum/#!search/sct.stat.consult$20palij$20dallel/sci.stat.consult/s9MsZisNIIQ/HTMOxwLmDXwJ

;-)

Jan de Leeuw (from UCLA) has written an overview, which may be of interest.
It can be seen via the following links:

  http://gifi.stat.ucla.edu/janspubs/2009/reports/deleeuw_R_09a.pdf
 http://gifi.stat.ucla.edu/janspubs/2011/chapters/deleeuw_C_11.pdf

The reference for the second link is:

J. De Leeuw. Statistical Software: An Overview. In M. Lovric, editor,
/International Encyclopedia of Statistical Science/, pages 1470-1473.
Springer Verlag, 2011.

I'm not convinced de Leeuw has all of the details about SPSS right. E.g., I
believe the name was changed to PASW /before/ IBM bought out SPSS, and it
has certainly gone back to IMB-SPSS-Statistics for the last several releases
(see http://www-01.ibm.com/support/docview.wss?uid=swg21476197).

He's right.  The Wikipedia entry on SPS says the following:

|SPSS Inc announced on July 28, 2009 that it was being acquired by
|IBM for US$1.2 billion.[5] Because of a dispute about ownership of the
|name "SPSS", between 2009 and 2010, the product was referred to
|as PASW (Predictive Analytics SoftWare).[6] As of January 2010, it
|became "SPSS: An IBM Company". Complete transfer of business
|to IBM was done by October 1, 2010. By that date, SPSS: An IBM
|Company ceased to exist. IBM SPSS is now fully integrated into the
|IBM Corporation, and is one of the brands under IBM Software Group's
|Business Analytics Portfolio, together with IBM Algorithmics, IBM
|Cognos and IBM OpenPages

The reference for footnote [5] is an IBM press release and for [6{ is
a news item in Scientific American.

This makes me wonder how accurate de Leeuw's characterizations of the other
packages are.


Mike wrote
On Sunday, December 06, 2015 10:37 AM, Stephen Salbod wrote:
Good afternoon,

Can anyone recommend a chapter (or article) on the History of
Statistics
that includes a discussion of software (e.g., Statjob, SPSS, SAS) and
hopefully mentions the work of Jacob Cohen. I’m looking for material to
supplement a text in an undergraduate history of psychology course.

Any suggests are most welcome. Thank you.

A few points:

(1)  I would suggest going over to the Jstor database (www.jstor.org -- Pace should have access to it) and searching the journal "American
Statistician" which had numerous reviews of statistical software over
the decades. An article that I found particularly useful when I was in
grad school in the 1970s is the following:

Wilkinson, L., & Dallal, G. E.. (1977). Accuracy of Sample Moments
Calculations among Widely Used Statistical Programs. The American
Statistician, 31(3), 128–131.

Both Wilkinson's and Dallal's names should be familiar to folks.
Wilkinson would develop Systat which was bought out by SPSS
but SPSS would eventually sell or "spin off" Systat. I'm not sure
what Wilkinson's relationship, if any, is with SPSS today.
Wilkinson would also be lead author on the American
Psychological Association's (APA) task force on the use of
statistics.  Gerry Dallal would go on to be a productive researcher
and developed the DOS-based STATOOLS package in the 1990s
that supplemented the then version of Systat (see:
http://www.tufts.edu/~gdallal/STATPKG.HTM  ).

The importance of Wilkinson and Dallal's (1977) paper is
reflected in its abstract:

|Four widely used statistical program packages -- BMDP, SPSS,
|DATATEXT, and OSIRIS -- were compared for computational
|accuracy on sample means, standard deviations, and correlations.
|Only one, BMDP, was not seriously inaccurate in calculations on
|a data set of three observations. Further, SPSS computed inaccurate
|statistics in a discriminant analysis on a real data set of 848
|observations. It is recommended that the desk calculator algorithm,
|found in most of these programs, not be used in packages which may
|run on short word length machines.

The key problem was that not enough space was allocated to represent
a number and this would produce underflow errors (i.e., very small
numbers getting truncated without warning and used in subsequent
calculations) and overflow errors (i.e., very large numbers that got
truncated). This was a particular problem in calculating the Sum of
Squares because many programs used the "computational formula"
(i.e., sumX^2 - [(sumX)^2/N]) instead of the "definitional" or
mean-deviation formula (i.e., sum(X - MeanX)^2).  The BMDP series (RIP)
was the only one that correctly calculated the statistics because it
used the mean-deviation form (which required a double pass reading of
the data -- the other form only requires one reading of the data).
Most software was modified to overcome these problems but Wilkinson
would develop a set of testing procedures for newer versions of
statistical software; see:
https://www.cs.uic.edu/~wilkinson/Publications/accuracy.pdf
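
To make the difference between the two formulas concrete, here is a
minimal Python sketch (it is not the code of any of the packages
discussed). It assumes a 32-bit single-precision word, simulated by
rounding every intermediate result with the standard struct module.
The helper names (f32, ss_one_pass, ss_two_pass) and the three data
values are illustrative choices, not taken from the Wilkinson and
Dallal paper.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ss_one_pass(data):
    """Sum of squares via the computational formula: sumX^2 - (sumX)^2/N.

    Every intermediate result is rounded to single precision to mimic
    a short word length machine (an assumption of this sketch).
    """
    n = len(data)
    sum_x = 0.0
    sum_x2 = 0.0
    for x in data:
        x = f32(x)
        sum_x = f32(sum_x + x)
        sum_x2 = f32(sum_x2 + f32(x * x))
    return f32(sum_x2 - f32(f32(sum_x * sum_x) / n))


def ss_two_pass(data):
    """Sum of squares via the mean-deviation formula: sum(X - MeanX)^2."""
    n = len(data)
    total = 0.0
    for x in data:                 # first pass: the mean
        total = f32(total + f32(x))
    mean = f32(total / n)
    ss = 0.0
    for x in data:                 # second pass: squared deviations
        d = f32(f32(x) - mean)
        ss = f32(ss + f32(d * d))
    return ss


# Three observations with a large mean and a small spread; true SS = 2.
data = [9001.0, 9002.0, 9003.0]
print("one pass:", ss_one_pass(data))
print("two pass:", ss_two_pass(data))
```

On this data the two-pass version returns the exact sum of squares,
2.0, while the one-pass version does not, because the two large terms
it subtracts have already been rounded to the nearest representable
single-precision value.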

One has to wonder how many research reports and dissertations reported
statistics that were affected by the errors that Wilkinson and Dallal
discovered but I have the feeling that most psychologists are blissfully
unaware that there was any problem at all (making the assumption that
all statistical software is 100% accurate).

See also:
Norusis, M. J., Van Eck, N., Montanelli, R. G., Wilkinson, L., Dallal,
G. E., Neter, J., … Conover, W. J.. (1978). Letters to the Editor.
The American Statistician, 32(3), 113–114.

Berk, K. N., & Francis, I. S.. (1978). A Review of the Manuals for BMDP
and SPSS. Journal of the American Statistical Association, 73(361),
65–71.

Dallal, G. E.. (1988). Statistical Microcomputing-Like It Is. The
American Statistician, 42(3), 212–216.

McCullough, B. D.. (1998). Assessing the Reliability of Statistical
Software: Part I. The American Statistician, 52(4), 358–366.

(2) Regarding Jack Cohen: I was a grad student in experimental psych
at SUNY-Stony Brook but I got a one-year pre-doc fellowship to do
research at NYU with Martin Braine in 1978-1979.  I sat in on Jack's
year long grad statistics course and he did not use any computer
software in the course. Later, when I worked at the New York State
Psychiatric Institute (NYSPI) where Jack had a one day a week
appointment in the Biometrics department (Pat Cohen worked full-time
there), I had opportunities to consult with him on the statistical
analyses I was doing on a NIMH grant-funded project.  By this time I
was pretty
much expert in BMDP but was stuck on what the appropriate analysis
was for the data that the project had (it was an experimental design).
I went over BMDP ANOVA (2V) analyses with Jack and we discussed
the problems at a conceptual level (it is now clear that a multilevel
analysis would have been most appropriate but neither I nor Jack
had worked this out; Joe Fleiss was also consulted and pointed me
in that direction but he couldn't be bothered to provide additional
help).
In these consultations Jack didn't suggest alternative analyses in
other software (e.g., SAS) and I didn't get any sense of what software
he actually used.  Most data analysts and statisticians at NYSPI would
do their computer work in the second sub-level of the then "new
building" where the IBM medium sized computer was located and
had a room full of IBM 3270 series terminals (which one guy referred
to as the "Checker Cab" [NYC ref] of computer terminals; the ADM 3a
popular with Unix systems and other mainframes was a 90-pound
weakling by comparison).  I spent a lot of time in the "terminal room" and
came across many people (Pat Shrout would pop in now and then)
but I never saw Jack there.  It is possible that Biometrics had their
own terminals which would explain why I never saw him around the
computer center.  All of the above is just to say that, in retrospect,
I don't know which statistical software package he used; moreover,
if memory serves, I don't remember any reference to specific software
in any of his major publications.

On a sidenote: Jack got his Ph.D. at NYU in the School of Education
in 1950 and his dissertation was the comparison of the factor structure
of the Wechsler Adult Intelligence Scale (WAIS) for different psychiatric
diagnostic groups (Jack worked as a psychologist at the Bronx VA
Hospital in the late 1940s and used patients there for his dissertation;
he would join the psychology department at NYU a few years later).
Jack does not mention any computer-based analysis
but he does refer to "mechanical computation" which I
take to mean some type of calculating machine (for some examples
of what was available back circa 1950, see:
http://americanhistory.si.edu/collections/search/main?edan_q=set_name:%22Calculating+Machines%22
The Friden calculators appear to have many functions that were
developed to do complex calculations for WW II operations; see:
http://www.rauck.net/friden/History-03.htm ).
Given that he had 300 subjects/participants and he had to calculate
correlation matrices and factor analysis results, this had to be
some labor-intensive, time-consuming activity.

-Mike Palij
New York University

mp26@


=====================
To manage your subscription to SPSSX-L, send a message to

LISTSERV@.UGA

 (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/History-question-tp5731092p5731096.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.




--
Jon K Peck
[hidden email]
