Good afternoon,
Can anyone recommend a chapter (or article) on the History of Statistics that includes a discussion of software (e.g., Statjob, SPSS, SAS) and hopefully mentions the work of Jacob Cohen? I’m looking for material to supplement a text in an undergraduate history of psychology course.

Any suggestions are most welcome. Thank you.

Stephen Salbod, Pace University, NYC
On Sunday, December 06, 2015 10:37 AM, Stephen Salbod wrote:
>Good afternoon,
>
>Can anyone recommend a chapter (or article) on the History of Statistics
>that includes a discussion of software (e.g., Statjob, SPSS, SAS) and
>hopefully mentions the work of Jacob Cohen. I’m looking for material to
>supplement a text in an undergraduate history of psychology course.
>
>Any suggestions are most welcome. Thank you.

A few points:

(1) I would suggest going over to the JSTOR database (www.jstor.org -- Pace should have access to it) and searching the journal "The American Statistician", which has published numerous reviews of statistical software over the decades. An article that I found particularly useful when I was in grad school in the 1970s is the following:

Wilkinson, L., & Dallal, G. E. (1977). Accuracy of Sample Moments Calculations among Widely Used Statistical Programs. The American Statistician, 31(3), 128–131.

Both Wilkinson's and Dallal's names should be familiar to folks. Wilkinson would develop Systat, which was bought out by SPSS, but SPSS would eventually sell or "spin off" Systat. I'm not sure what Wilkinson's relationship, if any, is with SPSS today. Wilkinson would also be lead author on the American Psychological Association's (APA) task force on the use of statistics. Gerry Dallal would go on to be a productive researcher and developed the DOS-based STATOOLS package in the 1990s that supplemented the then-current version of Systat (see: http://www.tufts.edu/~gdallal/STATPKG.HTM ).

The importance of the Wilkinson and Dallal (1977) paper is reflected in its abstract:

|Four widely used statistical program packages-BMDP, SPSS,
|DATATEXT, and OSIRIS-were compared for computational
|accuracy on sample means, standard deviations, and correlations.
|Only one, BMDP, was not seriously inaccurate in calculations on
|a data set of three observations. Further, SPSS computed inaccurate
|statistics in a discriminant analysis on a real data set of 848 observations.
|It is recommended that the desk calculator algorithm, found in most
|of these programs, not be used in packages which may run on short
|word length machines.

The key problem was that not enough space was allocated to represent a number, and this would produce underflow errors (i.e., very small numbers getting truncated without warning and used in subsequent calculations) and overflow errors (i.e., very large numbers that got truncated). This was a particular problem in calculating the Sum of Squares because many programs used the "computational formula" (i.e., sum(X^2) - (sum X)^2/N) instead of the "definitional" or mean-deviation formula (i.e., sum((X - MeanX)^2)). The BMDP series (RIP) was the only one that correctly calculated the statistics because it used the mean-deviation form (which requires two passes through the data -- the other form requires only one). Most software was modified to overcome these problems, and Wilkinson would later develop a set of testing procedures for newer versions of statistical software; see:
https://www.cs.uic.edu/~wilkinson/Publications/accuracy.pdf

One has to wonder how many research reports and dissertations reported statistics that were affected by the errors that Wilkinson and Dallal discovered, but I have the feeling that most psychologists are blissfully unaware that there was any problem at all (making the assumption that all statistical software is 100% accurate).

See also:

Norusis, M. J., Van Eck, N., Montanelli, R. G., Wilkinson, L., Dallal, G. E., Neter, J., … Conover, W. J. (1978). Letters to the Editor. The American Statistician, 32(3), 113–114.

Berk, K. N., & Francis, I. S. (1978). A Review of the Manuals for BMDP and SPSS. Journal of the American Statistical Association, 73(361), 65–71.

Dallal, G. E. (1988). Statistical Microcomputing-Like It Is. The American Statistician, 42(3), 212–216.

McCullough, B. D. (1998). Assessing the Reliability of Statistical Software: Part I. The American Statistician, 52(4), 358–366.

(2) Regarding Jack Cohen: I was a grad student in experimental psych at SUNY-Stony Brook but I got a one-year pre-doc fellowship to do research at NYU with Martin Braine in 1978-1979. I sat in on Jack's year-long grad statistics course and he did not use any computer software in the course. Later, when I worked at New York State Psychiatric Institute (NYSPI), where Jack had a one-day-a-week appointment in the Biometrics department (Pat Cohen worked full-time there), I had opportunities to consult with him on the statistical analyses I was doing on an NIMH grant-funded project. By this time I was pretty much expert in BMDP but was stuck on what the appropriate analysis was for the data that the project had (it was an experimental design). I went over BMDP ANOVA (2V) analyses with Jack and we discussed the problems at a conceptual level (it is now clear that a multilevel analysis would have been most appropriate but neither I nor Jack had worked this out; Joe Fleiss was also consulted and pointed me in that direction but he couldn't be bothered to provide additional help). In these consultations Jack didn't suggest alternative analyses in other software (e.g., SAS) and I didn't get any sense of what software he actually used.

Most data analysts and statisticians at NYSPI would do their computer work in the second sub-level of the then "new building", where the IBM medium-sized computer was located along with a room full of IBM 3270 series terminals (which one guy referred to as the "Checker Cab" [NYC ref] of computer terminals; the ADM 3a, popular with Unix systems and other mainframes, was a 90-pound weakling by comparison). I spent a lot of time in the "terminal room" and came across many people (Pat Shrout would pop in now and then) but I never saw Jack there. It is possible that Biometrics had their own terminals, which would explain why I never saw him around the computer center. All of the above is just to say that, in retrospect, I don't know which statistical software package he used. Moreover, if memory serves, I don't remember any reference to specific software in any of his major publications.

On a sidenote: Jack got his Ph.D. at NYU in the School of Education in 1950 and his dissertation was a comparison of the factor structure of the Wechsler Adult Intelligence Scale (WAIS) for different psychiatric diagnostic groups (Jack worked as a psychologist at the Bronx VA Hospital in the late 1940s and used patients there for his dissertation; he would join the psychology department at NYU a few years later). Jack does not mention any computer-based analysis being used but he does refer to "mechanical computation", which I take to mean some type of calculating machine (for some examples of what was available circa 1950, see:
http://americanhistory.si.edu/collections/search/main?edan_q=set_name:%22Calculating+Machines%22
The Friden calculators appear to have many functions that were developed to do complex calculations for WW II operations; see:
http://www.rauck.net/friden/History-03.htm ).
Given that he had 300 subjects/participants and he had to calculate correlation matrices and factor analysis results, this had to be a labor-intensive, time-consuming activity.

-Mike Palij
New York University
[hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
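The contrast Mike draws between the one-pass "computational formula" and the two-pass mean-deviation formula can be tried out on a modern machine. What follows is only a rough sketch under stated assumptions: the three data values are made up for illustration (they are not the Wilkinson and Dallal test data), the helper `f32` is a hypothetical stand-in that mimics a short word length by rounding every intermediate result to a 32-bit float, and the function names are invented here. It shows the rounding loss that appears when two large, nearly equal totals are subtracted.

```python
import struct


def f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ss_one_pass(data):
    """Computational formula: sum(X^2) - (sum X)^2 / N, one read of the data."""
    n = len(data)
    sum_x = 0.0
    sum_x2 = 0.0
    for x in data:
        x = f32(x)
        sum_x = f32(sum_x + x)
        sum_x2 = f32(sum_x2 + f32(x * x))
    # Two large, nearly equal numbers are subtracted here.
    return f32(sum_x2 - f32(f32(sum_x * sum_x) / n))


def ss_two_pass(data):
    """Mean-deviation formula: sum((X - mean)^2), two reads of the data."""
    n = len(data)
    total = 0.0
    for x in data:  # first pass: the mean
        total = f32(total + f32(x))
    mean = f32(total / n)
    ss = 0.0
    for x in data:  # second pass: squared deviations
        d = f32(f32(x) - mean)
        ss = f32(ss + f32(d * d))
    return ss


# Illustrative data only; the true sum of squares is 2.
data = [10001.0, 10002.0, 10003.0]
print("one-pass:", ss_one_pass(data))
print("two-pass:", ss_two_pass(data))
```

With these values the two-pass version recovers the true sum of squares, while the one-pass version does not, because the squares are too large to be held exactly in a 32-bit float. In ordinary 64-bit arithmetic both versions agree on this tiny example, which may be part of why the problem is easy to overlook today.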
Thanks Mike, that was interesting. I know Jerry Dallal from his posts to sci.stat.consult & sci.stat.math (a few years ago, when those groups were still very active), but did not know about his work with Wilkinson.

Jan de Leeuw (from UCLA) has written an overview, which may be of interest. It can be seen via the following links:

http://gifi.stat.ucla.edu/janspubs/2009/reports/deleeuw_R_09a.pdf
http://gifi.stat.ucla.edu/janspubs/2011/chapters/deleeuw_C_11.pdf

The reference for the second link is:

J. De Leeuw. Statistical Software: An Overview. In M. Lovric, editor, International Encyclopedia of Statistical Science, pages 1470-1473. Springer Verlag, 2011.

I'm not convinced de Leeuw has all of the details about SPSS right. E.g., I believe the name was changed to PASW before IBM bought out SPSS, and it has certainly gone back to IBM-SPSS-Statistics for the last several releases (see http://www-01.ibm.com/support/docview.wss?uid=swg21476197). This makes me wonder how accurate de Leeuw's characterizations of the other packages are.

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
On Monday, December 07, 2015 4:10 PM, Bruce Weaver wrote:
>Thanks Mike, that was interesting. I know Jerry Dallal from his posts to
>sci.stat.consult & sci.stat.math (a few years ago, when those groups were
>still very active), but did not know about his work with Wilkinson.

You mean posts like this:
https://groups.google.com/forum/#!search/sct.stat.consult$20palij$20dallel/sci.stat.consult/s9MsZisNIIQ/HTMOxwLmDXwJ
;-)

>Jan de Leeuw (from UCLA) has written an overview, which may be of interest.
>It can be seen via the following links:
>
> http://gifi.stat.ucla.edu/janspubs/2009/reports/deleeuw_R_09a.pdf
> http://gifi.stat.ucla.edu/janspubs/2011/chapters/deleeuw_C_11.pdf
>
>The reference for the second link is:
>
>J. De Leeuw. Statistical Software: An Overview. In M. Lovric, editor,
>/International Encyclopedia of Statistical Science/, pages 1470-1473.
>Springer Verlag, 2011.
>
>I'm not convinced de Leeuw has all of the details about SPSS right.
>E.g., I believe the name was changed to PASW /before/ IBM bought out SPSS,
>and it has certainly gone back to IBM-SPSS-Statistics for the last several
>releases (see http://www-01.ibm.com/support/docview.wss?uid=swg21476197).

He's right. The Wikipedia entry on SPSS says the following:

|SPSS Inc announced on July 28, 2009 that it was being acquired by
|IBM for US$1.2 billion.[5] Because of a dispute about ownership of the
|name "SPSS", between 2009 and 2010, the product was referred to
|as PASW (Predictive Analytics SoftWare).[6] As of January 2010, it
|became "SPSS: An IBM Company". Complete transfer of business
|to IBM was done by October 1, 2010. By that date, SPSS: An IBM
|Company ceased to exist. IBM SPSS is now fully integrated into the
|IBM Corporation, and is one of the brands under IBM Software Group's
|Business Analytics Portfolio, together with IBM Algorithmics, IBM
|Cognos and IBM OpenPages

The reference for footnote [5] is an IBM press release and for [6] is a news item in Scientific American.
>This makes me wonder how accurate de Leeuw's characterizations of the other
>packages are.
For sure, the ill-fated PASW name change occurred prior to the IBM acquisition, but as a happy side effect of the acquisition, IBM immediately changed it back. The "SPSS: An IBM Company" identifier was a temporary moniker used only until IBM completed the full Transfer of Business: something to do with lawyers.