Hi all
I found a strange behaviour in SPSS 15.0.1. When I use the DELETE VARIABLES command, the command that follows it (see the example below) misbehaves: the non-missing value in ABB is then not recognized, and a value of 1 is written into the variable 'StrangeBehaviour' even though ABB is not missing. This happens with and without a subsequent EXECUTE command after the delete statement. The same code runs perfectly well when the 'DELETE VAR Flag' line is omitted. Can somebody reproduce this?

Christian

* Code to reproduce the bug.
new file.
input program.
loop #i = 1 to 200.
compute test = rv.normal(100,15).
end case.
end loop.
end file.
end input program.
execute.
If Test > 70 pat = 1.
IF Test < 80 flag = 1.
IF Flag = 1 ABB = 1.
Exec.
DELETE VAR Flag.
Exec.
* Here comes the bug.
IF Pat = 1 AND (Missing(ABB)) StrangeBehaviour = 1.
Exec.

*******************************
la volta statistics
Christian Schmidhauser, Dr.phil.II
Weinbergstrasse 108
CH-8006 Zürich
Tel: +41 (043) 233 98 01
Fax: +41 (043) 233 98 02
email: mailto:[hidden email]
internet: http://www.lavolta.ch/
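PS: A workaround that should sidestep the problem -- only a sketch, based on nothing more than the observation above that the code behaves when the deletion is left out -- is to postpone DELETE VARIABLES until after every transformation that still depends on the helper variable has been executed, and drop Flag as the very last step:

* Sketch of a workaround (untested): delete Flag only at the end.
If Test > 70 pat = 1.
IF Test < 80 flag = 1.
IF Flag = 1 ABB = 1.
IF Pat = 1 AND (Missing(ABB)) StrangeBehaviour = 1.
Exec.
DELETE VAR Flag.
Exec.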
Hi Christian,
I confirm that I reproduce the described behavior.

Raynald Levesque
[hidden email]
Website: www.spsstools.net
In reply to this post by la volta statistics
At 07:58 AM 4/13/2007, la volta statistics wrote:
>I found a strange behaviour in SPSS 15.0.1. When I use the DELETE
>VARIABLES command, the command that follows it (see the example below)
>misbehaves.

At 07:46 PM 4/13/2007, Raynald Levesque wrote:

>I confirm that I reproduce the described behavior.

So do I. As a twist, the two statements
. IF Flag = 1 ABB = 1.
. COMPUTE ABB2 = Flag.
should (and do) give exactly the same values for variables ABB and ABB2. However, the strange behavior is exhibited only when using variable ABB; using ABB2 gives the proper result. (Not tested: using *only* the code for ABB2.)

SPSS 15.0.1 draft output (WRR-not saved separately). Code as originally posted, except
. 'Exec' statements replaced by 'LIST'.
. Test data generation modified, to work with fewer cases.
. Added code creating and using ABB2, as described above.

>A value of 1 is written into the variable 'StrangeBehaviour' even
>though ABB is not missing.

It is also so written despite 'pat' not having value 1.

Final output listing:
....................
|-----------------------------|---------------------------|
|Output Created               |13-APR-2007 20:37:38       |
|-----------------------------|---------------------------|

 test  pat  ABB  ABB2  StrangeBehaviour  Strange2
99.24  1.00  .  .  1.00  1.00
81.29  1.00  .  .  1.00  1.00
93.77  1.00  .  .  1.00  1.00
90.37  1.00  .  .  1.00  1.00
13.84  .  1.00  1.00  1.00  .
92.29  1.00  .  .  1.00  1.00
21.62  .  1.00  1.00  1.00  .
36.19  .  1.00  1.00  1.00  .
58.75  .  1.00  1.00  1.00  .
29.08  .  1.00  1.00  1.00  .
89.07  1.00  .  .  1.00  1.00
 5.83  .  1.00  1.00  1.00  .
50.49  .  1.00  1.00  1.00  .
80.59  1.00  .  .  1.00  1.00
 9.21  .  1.00  1.00  1.00  .
 4.36  .  1.00  1.00  1.00  .
91.93  1.00  .  .  1.00  1.00
67.93  1.00  1.00  1.00  1.00  .
76.72  1.00  .  .  1.00  1.00
61.97  1.00  1.00  1.00  1.00  .

Number of cases read:  20    Number of cases listed:  20
....................

Full code and output:
....................
* Code to reproduce the bug.
new file.
input program.
loop #i = 1 to 20.
compute test = rv.uniform(0,100).
end case.
end loop.
end file.
end input program.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |13-APR-2007 20:37:37       |
|-----------------------------|---------------------------|

 test
99.24
81.29
93.77
90.37
13.84
92.29
21.62
36.19
58.75
29.08
89.07
 5.83
50.49
80.59
 9.21
 4.36
91.93
67.93
76.72
61.97

Number of cases read:  20    Number of cases listed:  20

If Test > 60 pat = 1.
IF Test < 70 flag = 1.
IF Flag = 1 ABB = 1.
COMPUTE ABB2 = Flag.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |13-APR-2007 20:37:38       |
|-----------------------------|---------------------------|

 test  pat  flag  ABB  ABB2
99.24  1.00  .  .  .
81.29  1.00  .  .  .
93.77  1.00  .  .  .
90.37  1.00  .  .  .
13.84  .  1.00  1.00  1.00
92.29  1.00  .  .  .
21.62  .  1.00  1.00  1.00
36.19  .  1.00  1.00  1.00
58.75  .  1.00  1.00  1.00
29.08  .  1.00  1.00  1.00
89.07  1.00  .  .  .
 5.83  .  1.00  1.00  1.00
50.49  .  1.00  1.00  1.00
80.59  1.00  .  .  .
 9.21  .  1.00  1.00  1.00
 4.36  .  1.00  1.00  1.00
91.93  1.00  .  .  .
67.93  1.00  1.00  1.00  1.00
76.72  1.00  .  .  .
61.97  1.00  1.00  1.00  1.00

Number of cases read:  20    Number of cases listed:  20

DELETE VAR Flag.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |13-APR-2007 20:37:38       |
|-----------------------------|---------------------------|

 test  pat  ABB  ABB2
99.24  1.00  .  .
81.29  1.00  .  .
93.77  1.00  .  .
90.37  1.00  .  .
13.84  .  1.00  1.00
92.29  1.00  .  .
21.62  .  1.00  1.00
36.19  .  1.00  1.00
58.75  .  1.00  1.00
29.08  .  1.00  1.00
89.07  1.00  .  .
 5.83  .  1.00  1.00
50.49  .  1.00  1.00
80.59  1.00  .  .
 9.21  .  1.00  1.00
 4.36  .  1.00  1.00
91.93  1.00  .  .
67.93  1.00  1.00  1.00
76.72  1.00  .  .
61.97  1.00  1.00  1.00

Number of cases read:  20    Number of cases listed:  20

* Here comes the bug.
IF Pat = 1 AND (Missing(ABB)) StrangeBehaviour = 1.
IF Pat = 1 AND (Missing(ABB2)) Strange2 = 1.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |13-APR-2007 20:37:38       |
|-----------------------------|---------------------------|

 test  pat  ABB  ABB2  StrangeBehaviour  Strange2
99.24  1.00  .  .  1.00  1.00
81.29  1.00  .  .  1.00  1.00
93.77  1.00  .  .  1.00  1.00
90.37  1.00  .  .  1.00  1.00
13.84  .  1.00  1.00  1.00  .
92.29  1.00  .  .  1.00  1.00
21.62  .  1.00  1.00  1.00  .
36.19  .  1.00  1.00  1.00  .
58.75  .  1.00  1.00  1.00  .
29.08  .  1.00  1.00  1.00  .
89.07  1.00  .  .  1.00  1.00
 5.83  .  1.00  1.00  1.00  .
50.49  .  1.00  1.00  1.00  .
80.59  1.00  .  .  1.00  1.00
 9.21  .  1.00  1.00  1.00  .
 4.36  .  1.00  1.00  1.00  .
91.93  1.00  .  .  1.00  1.00
67.93  1.00  1.00  1.00  1.00  .
76.72  1.00  .  .  1.00  1.00
61.97  1.00  1.00  1.00  1.00  .

Number of cases read:  20    Number of cases listed:  20
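For completeness, here is a sketch of the variant flagged above as not tested -- creating only the COMPUTE-based copy (ABB2) and skipping the IF-based ABB entirely. It is assembled from the code above but I have not run it:

* Untested sketch: ABB2 (COMPUTE-based) only, no IF-based ABB.
new file.
input program.
loop #i = 1 to 20.
compute test = rv.uniform(0,100).
end case.
end loop.
end file.
end input program.
If Test > 60 pat = 1.
IF Test < 70 flag = 1.
COMPUTE ABB2 = Flag.
LIST.
DELETE VAR Flag.
LIST.
IF Pat = 1 AND (Missing(ABB2)) Strange2 = 1.
LIST.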
Hi Richard and Ray
Thanks for responding. Yes, I have contacted SPSS's Tech Support (in Switzerland), but have got no answer so far. I know it is easy to work around the situation I describe. However, I find it annoying to have to check my data for bugs after having written perfectly valid syntax.

Christian

By the way, I recently found another bug (confirmed by SPSS). They told me they will fix it in version 16. It involves the VARSTOCASES command. Here is an example: I select six cases from a larger data set according to a certain criterion. Sometimes these six cases have only missing values. I save the selected cases and subsequently use an ADD FILES command. When all six files are empty, the resulting file is empty as well. I then use a VARSTOCASES command, which should give me the value 0.00 in a variable named 'trans' for all nine cases (INDEX = Index1(9)). I then run an AGGREGATE command to calculate the mean and the standard deviation of 'trans'. If all six cases selected in the first step had only missing values, I should get a mean of 0.00 and a standard deviation of 0.00. However, when I do this repeatedly as part of a larger routine, I sometimes get an absurdly high value for one of the values in 'trans'. It does not happen every time; the occurrences are not regular, and the wrong mean is not always the same value. Typically, the wrong value appears once or twice when I repeat the procedure 10 times.

Below is code that reproduces the error (at least on my computer) and prints the calculated means. It runs 10 times through the ADD FILES, VARSTOCASES, and AGGREGATE commands. The result is, for example:

mean = .0000
mean = 9.7E+288
mean = .0000
mean = .0000
mean = .0000
mean = .0000
mean = .0000
mean = .0000
mean = 2.9E+184
mean = .0000

The syntax
1. creates six empty files
2. adds them together (resulting in an empty file)
3. restructures the empty file (VARSTOCASES)
4. aggregates the restructured file
5. writes the aggregated values into a syntax file
6. prints the values
7. erases the created syntax files (from step 5)

* BEGIN OF SYNTAX.
* Syntax to reproduce the wrong mean.
*************************************.
NEW FILE.
DATASET CLOSE all.
DATA LIST FREE /qrel Q1a Q2a Q3a Q4a Q5a Q6a Q7a Q8a Q9a
  imp_1 imp_2 imp_3 imp_4 imp_5 imp_6 imp_7 imp_8 imp_9 Caegory Questions.
begin data.
end data.

* Make Data.
DEFINE !DoTbl ().
!LET !nby = 6.
!DO !cnt=1 !TO !nby.
!LET !Path = !QUOTE(!CONCAT('c:\Temp\tmp_',!cnt,'.sav') ).
Save Outfile = !Path.
!DOEND.
!ENDDEFINE.
!DoTbl.

* Make mean.
************.
DEFINE !DoData ().
!LET !nby = 10.
!DO !cnt=1 !TO !nby.
!LET !Path1 = !QUOTE(!CONCAT('c:\Temp\tmp_1.sav') ).
!LET !Path2 = !QUOTE(!CONCAT('c:\Temp\tmp_2.sav') ).
!LET !Path3 = !QUOTE(!CONCAT('c:\Temp\tmp_3.sav') ).
!LET !Path4 = !QUOTE(!CONCAT('c:\Temp\tmp_4.sav') ).
!LET !Path5 = !QUOTE(!CONCAT('c:\Temp\tmp_5.sav') ).
!LET !Path6 = !QUOTE(!CONCAT('c:\Temp\tmp_6.sav') ).
New File.
DATASET CLOSE ALL.
Get FILE = !Path1.
ADD FILES /FILE=*
 /FILE = !Path2
 /FILE = !Path3
 /FILE = !Path4
 /FILE = !Path5
 /FILE = !Path6.
Exec.
OMS /DESTINATION VIEWER=NO .
VARSTOCASES
 /MAKE trans FROM Q1a to Q9a
 /INDEX = Index1(9)
 /NULL = DROP
 /DROP = QREL imp_1 to imp_9 Caegory Questions
 /COUNT = Counter .
Exec.
OMSEND.
Compute dummy = 1.
DATASET DECLARE Agg.
AGGREGATE
 /OUTFILE ='Agg'
 /BREAK = dummy
 /Self_mean = MEAN(trans)
 /Self_sd = SD(trans).
Exec.
DATASET ACTIVE agg.
FORMAT Self_Mean (F8.4).
FORMAT Self_SD (F8.4).
!LET !PathA = !QUOTE(!CONCAT('c:\Temp\myMean_',!cnt,'.sps') ).
!LET !PathB = !QUOTE(!CONCAT('c:\Temp\myMean_',!cnt,'.sps') ).
!LET !myMean = !QUOTE(!CONCAT("DEFINE !Tb_Mean",!cnt,"()") ).
!LET !mySD = !QUOTE(!CONCAT("DEFINE !Tb_Std",!cnt,"()") ).
WRITE OUTFILE !PathA/!myMean/Self_Mean/"!ENDDEFINE."/"Exec.".
WRITE OUTFILE !PathA/!mySD/Self_SD/"!ENDDEFINE."/"Exec.".
EXEC.
!DOEND.
!ENDDEFINE.
!DoData.
Exec.

* Print the created means (should all be '.0000').
***************************************************.
Title "Results:".
DEFINE !DoPrint ().
!LET !nby = 10.
!DO !cnt=1 !TO !nby.
!LET !PathA = !QUOTE(!CONCAT('c:\Temp\myMean_',!cnt,'.sps') ).
INSERT FILE = !PathA.
!LET !Print = !CONCAT('mean = ','!Tb_mean',!cnt).
TITLE !Print.
Exec.
!DOEND.
!ENDDEFINE.
!DoPrint.
Echo " ".

* Clean up.
**********.
DEFINE !DoClean ().
!LET !nby = 10.
!DO !cnt=1 !TO !nby.
!LET !PathA = !QUOTE(!CONCAT('c:\Temp\myMean_',!cnt,'.sps') ).
ERASE FILE = !PathA.
!DOEND.
!LET !nby2 = 6.
!DO !cnt2=1 !TO !nby2.
!LET !Path = !QUOTE(!CONCAT('c:\Temp\tmp_',!cnt2,'.sav') ).
ERASE FILE= !Path.
!DOEND.
!ENDDEFINE.
!DoClean.
EXEC.
* END OF SYNTAX.
*******************************.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Saturday, April 14, 2007 21:35
To: [hidden email]
Subject: Re: Bug in DELETE VAR command?

At 07:58 AM 4/13/2007, la volta statistics wrote:

>I found a strange behaviour of SPSS 15.0.1.

Raynald Levesque and I have both posted, confirming the behavior seen. I think it's time to open a case with Tech Support; have you done this?

As a follow-up: I don't see the problem in 14.0.2. The following is 14.0.2 draft output (WRR-code & output not saved separately), and the results appear to be what they should be. It's the code I ran and posted for 15.0.1, though the values will be different because I didn't explicitly seed the random number generator.

Final output listing:
....................
IF Pat = 1 AND (Missing(ABB)) StrangeBehaviour = 1.
IF Pat = 1 AND (Missing(ABB2)) Strange2 = 1.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |14-APR-2007 15:23:40       |
|-----------------------------|---------------------------|

 test  pat  ABB  ABB2  StrangeBehaviour  Strange2
28.99  .  1.00  1.00  .  .
 5.52  .  1.00  1.00  .  .
 8.93  .  1.00  1.00  .  .
 5.61  .  1.00  1.00  .  .
60.25  1.00  1.00  1.00  .  .
87.04  1.00  .  .  1.00  1.00
67.79  1.00  1.00  1.00  .  .
63.91  1.00  1.00  1.00  .  .
95.94  1.00  .  .  1.00  1.00
66.97  1.00  1.00  1.00  .  .
75.40  1.00  .  .  1.00  1.00
19.18  .  1.00  1.00  .  .
66.89  1.00  1.00  1.00  .  .
40.62  .  1.00  1.00  .  .
 6.53  .  1.00  1.00  .  .
52.25  .  1.00  1.00  .  .
56.46  .  1.00  1.00  .  .
67.43  1.00  1.00  1.00  .  .
54.80  .  1.00  1.00  .  .
77.63  1.00  .  .  1.00  1.00

Number of cases read:  20    Number of cases listed:  20
....................

Full code and output:
....................
* Code to reproduce the bug.
new file.
input program.
loop #i = 1 to 20.
compute test = rv.uniform(0,100).
end case.
end loop.
end file.
end input program.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |14-APR-2007 15:23:39       |
|-----------------------------|---------------------------|

 test
28.99
 5.52
 8.93
 5.61
60.25
87.04
67.79
63.91
95.94
66.97
75.40
19.18
66.89
40.62
 6.53
52.25
56.46
67.43
54.80
77.63

Number of cases read:  20    Number of cases listed:  20

If Test > 60 pat = 1.
IF Test < 70 flag = 1.
IF Flag = 1 ABB = 1.
COMPUTE ABB2 = Flag.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |14-APR-2007 15:23:40       |
|-----------------------------|---------------------------|

 test  pat  flag  ABB  ABB2
28.99  .  1.00  1.00  1.00
 5.52  .  1.00  1.00  1.00
 8.93  .  1.00  1.00  1.00
 5.61  .  1.00  1.00  1.00
60.25  1.00  1.00  1.00  1.00
87.04  1.00  .  .  .
67.79  1.00  1.00  1.00  1.00
63.91  1.00  1.00  1.00  1.00
95.94  1.00  .  .  .
66.97  1.00  1.00  1.00  1.00
75.40  1.00  .  .  .
19.18  .  1.00  1.00  1.00
66.89  1.00  1.00  1.00  1.00
40.62  .  1.00  1.00  1.00
 6.53  .  1.00  1.00  1.00
52.25  .  1.00  1.00  1.00
56.46  .  1.00  1.00  1.00
67.43  1.00  1.00  1.00  1.00
54.80  .  1.00  1.00  1.00
77.63  1.00  .  .  .

Number of cases read:  20    Number of cases listed:  20

DELETE VAR Flag.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |14-APR-2007 15:23:40       |
|-----------------------------|---------------------------|

 test  pat  ABB  ABB2
28.99  .  1.00  1.00
 5.52  .  1.00  1.00
 8.93  .  1.00  1.00
 5.61  .  1.00  1.00
60.25  1.00  1.00  1.00
87.04  1.00  .  .
67.79  1.00  1.00  1.00
63.91  1.00  1.00  1.00
95.94  1.00  .  .
66.97  1.00  1.00  1.00
75.40  1.00  .  .
19.18  .  1.00  1.00
66.89  1.00  1.00  1.00
40.62  .  1.00  1.00
 6.53  .  1.00  1.00
52.25  .  1.00  1.00
56.46  .  1.00  1.00
67.43  1.00  1.00  1.00
54.80  .  1.00  1.00
77.63  1.00  .  .

Number of cases read:  20    Number of cases listed:  20

* Here comes the bug.
IF Pat = 1 AND (Missing(ABB)) StrangeBehaviour = 1.
IF Pat = 1 AND (Missing(ABB2)) Strange2 = 1.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |14-APR-2007 15:23:40       |
|-----------------------------|---------------------------|

 test  pat  ABB  ABB2  StrangeBehaviour  Strange2
28.99  .  1.00  1.00  .  .
 5.52  .  1.00  1.00  .  .
 8.93  .  1.00  1.00  .  .
 5.61  .  1.00  1.00  .  .
60.25  1.00  1.00  1.00  .  .
87.04  1.00  .  .  1.00  1.00
67.79  1.00  1.00  1.00  .  .
63.91  1.00  1.00  1.00  .  .
95.94  1.00  .  .  1.00  1.00
66.97  1.00  1.00  1.00  .  .
75.40  1.00  .  .  1.00  1.00
19.18  .  1.00  1.00  .  .
66.89  1.00  1.00  1.00  .  .
40.62  .  1.00  1.00  .  .
 6.53  .  1.00  1.00  .  .
52.25  .  1.00  1.00  .  .
56.46  .  1.00  1.00  .  .
67.43  1.00  1.00  1.00  .  .
54.80  .  1.00  1.00  .  .
77.63  1.00  .  .  1.00  1.00

Number of cases read:  20    Number of cases listed:  20
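PS: In the meantime I am thinking of adding a sanity check right after the AGGREGATE step, so that a corrupted run at least announces itself instead of silently feeding a wrong mean into the rest of the routine. This is only a sketch, using the names from the syntax above and an arbitrary threshold of 1000 (any value far outside the plausible range of 'trans' would do):

* Sketch only: flag implausible aggregated means (threshold chosen arbitrarily).
DATASET ACTIVATE Agg.
COMPUTE suspect = (SYSMIS(Self_mean) OR ABS(Self_mean) > 1000).
EXECUTE.
LIST Self_mean Self_sd suspect.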
In reply to this post by Richard Ristow
Dear colleagues,
I have been working on a project wherein I am creating charts to show how mean GPAs compare ACROSS several different ability groups and BETWEEN two residential groups. I have done the sig tests and all, but am developing charts to SHOW the data to some administrators. Yes, I can show tables of numbers and p values, and present statements that mean 1 is significantly greater than mean 2, etc. Honestly, that is not really what this audience is interested in. I really don't think tables of numbers will make an impact.

The charting scheme I use makes use of error bars to show the uncertainty of my point estimates. As I began thinking about the numbers I had available (SDs, SEs, 95% CIs), I began to wonder, 'Which should I use?' I have used CIs and never really considered an alternative (perhaps due to the nature of my research). I figured someone out there had a really good answer. Well, I have been disappointed. My research has revealed the following: whether you use SE, SD, or CI depends largely on your discipline, your question, your sample size, etc. Strangely enough, I really don't recall a common use of error bars in the education/psych journals that I grew up with. All this is a slap in the face of my hero (cited below) and inventor of the box plot, John Wilder Tukey, who would have considered the display of uncertainty in charts nothing less than standard (read: required) practice. SPSS certainly makes it easy enough to include each of the three types and alter the criteria (CI 90%, SE*2, etc.).

OK, all of that to ask this: Have you all, in your years of experience and practice, developed an opinion about the proper (or preferred) use of SD, SE, or CI error bars in particular situations or with particular audiences in the social sciences? Do you prefer SE (typically about half the width of the CI with decent sample sizes)? Do you even think it's important? I really wonder about this last question, considering the lack of use in the published ed and psych literature.

I respectfully surrender the soapbox to the next confused (or bored) soul.

Mark

********************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more than an exact answer to an approximate question.' --a paraphrase of J. W. Tukey (1962)
http://psyphz.psych.wisc.edu/%7Eshackman/mediation_moderation_resources.htm#Resources_for_Within-Subjects_%28Repeated_
The unpublished paper by Christian Schunn is particularly insightful.

HTH,
Alex Shackman

--
Alexander J. Shackman
Laboratory for Affective Neuroscience
Waisman Laboratory for Brain Imaging & Behavior
University of Wisconsin-Madison
1202 West Johnson Street
Madison, Wisconsin 53706
Telephone: +1 (608) 358-5025
FAX: +1 (608) 265-2875
EMAIL: [hidden email]
http://psyphz.psych.wisc.edu/~shackman
This PowerPoint presentation by Jody Culham was also very helpful
http://defiant.ssc.uwo.ca/Jody_web/Culham_Lab_Docs/Advice/ErrorBars_Lecture5.ppt

********************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more than an exact answer to an approximate question.' --a paraphrase of J. W. Tukey (1962)
In reply to this post by Mark A Davenport MADAVENP
Mark,
I'm sure you will get more learned opinions than mine, but I have also struggled with the issue. I've never considered using CI, but wrestled with SE vs SD. As I understand it, the choice relates to your purpose: use SE if you want to show how "close" your group means are likely to be to the population mean; use SD if you are interested in showing the variability of your particular samples, without concern for the population values. Thus, if you are using statistical tests of significance, SEs are probably most appropriate. That's my 2 cents.

My piggy-back question: Is there a way in SPSS to have error bars appear on bar charts, particularly if using the GUI? I have yet to figure that one out.

Fred

--
Fredric E. Rose, Ph.D.
Assistant Professor of Psychology
Palomar College
760-744-1150 x2344
[hidden email]
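PS: If it helps to see all three side by side, here is a small syntax sketch (the variable names 'score' and 'group' and the dataset name 'ErrStats' are made up for illustration) that aggregates a measure by group and then derives the standard error and a 95% confidence interval from the SD and the group size, using SE = SD / SQRT(N) and CI = mean +/- t(.975, N-1) * SE:

* Sketch only: assumes no missing values on score within a group.
DATASET DECLARE ErrStats.
AGGREGATE
  /OUTFILE='ErrStats'
  /BREAK=group
  /mean_score = MEAN(score)
  /sd_score = SD(score)
  /n_cases = N.
DATASET ACTIVATE ErrStats.
COMPUTE se_score = sd_score / SQRT(n_cases).
COMPUTE ci95_half = IDF.T(0.975, n_cases - 1) * se_score.
COMPUTE ci95_lo = mean_score - ci95_half.
COMPUTE ci95_hi = mean_score + ci95_half.
EXECUTE.
LIST group mean_score sd_score se_score ci95_lo ci95_hi.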
In reply to this post by Mark A Davenport MADAVENP
Hi Mark
At least in my field of work (biomedical research mainly), I've found that the book "How to Report Statistics in Medicine" (Lang & Secic, ACP Series) is a *must have*. It gives clear indications about which approach to presenting the data, either numerically (tables) or graphically, is best, and why. To keep it short:

1) Avoid the use of "dynamite pushers" graphs (*)
2) Avoid the use of SE. If the graph is simply descriptive, use SD; if you want to show the precision of your estimates, then use CI.

(*) A bit of ASCII art (use courier font to view the figures):

[ASCII sketch: a bar with an error bar sitting on top of it -- the "dynamite pusher" style to avoid]

Use this type of graph instead:

[ASCII sketch: a point with an error bar running through it]

I hope this helps,
Marta García-Granero
In reply to this post by Rose, Fred
Ah, this is weird. Maybe Jon or ViAnn can chime in on this one. I was trying to respond to Fredric's question and found that error bars are easy enough in the old chart method and the Interactive Graphs method. However, when I tried to make the same bar chart using the Chart Builder, the error bars option was greyed out. I could never get it to present itself. What gives?

********************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more than an exact answer to an approximate question.' --a paraphrase of J. W. Tukey (1962)
In reply to this post by Alexander J. Shackman-2
Here is another interesting read on the topic of error bars. This is the
one that piqued my self-doubt.

http://scienceblogs.com/cognitivedaily/2007/03/most_researchers_dont_understa.php

********************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more than an exact answer to an approximate question.' --a paraphrase of J. W. Tukey (1962)
In reply to this post by Rose, Fred
To get error bars on charts using the legacy graph dialogs, go to the Options sub-dialog. You can get error bars on bar charts, line charts, area charts, ...

You can also get error bars using the Chart Builder in version 15 by specifying a count, mean, or median statistic and turning on error bars in the properties sheet for the element.
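For syntax users, the legacy GRAPH command will also draw a standalone error bar chart, if I remember the pasted syntax correctly. A minimal sketch (the variable names gpa and restype are made up for illustration, and note that this produces an error bar chart rather than bars with error bars attached):

* Sketch only: mean of gpa by restype with 95% confidence interval bars.
GRAPH
  /ERRORBAR(CI 95)=gpa BY restype.
* The same chart with bars of +/- 2 standard errors instead.
GRAPH
  /ERRORBAR(STERROR 2)=gpa BY restype.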
In reply to this post by Mark A Davenport MADAVENP
Dear Mark,
I can recommend the following article:

Goldstein, H. and Healy, M. J. R. (1995). "The graphical presentation of a collection of means." Journal of the Royal Statistical Society, Series A (Statistics in Society), Vol. 158, No. 1, pp. 175-177.

The method described in this article is often used in the context of multilevel analysis.

Best,
Henrik

--
Henrik Lolle
Institut for Økonomi, Politik og Forvaltning
Fibigerstræde 1
Aalborg Universitet
9200 Aalborg Ø.
Tlf.: 96 35 81 84
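A brief note on what the article proposes, recalled from memory (so treat the exact multiplier as an assumption to check against the paper): rather than plotting conventional 95% intervals, Goldstein and Healy suggest intervals scaled so that two means differ at roughly the 5% level exactly when their intervals fail to overlap; with similar standard errors this works out to about mean +/- 1.39 * SE. A sketch in SPSS syntax, assuming a working file that already holds one mean and SE per group (the variable names group, group_mean, and group_se are made up):

* Sketch only: Goldstein-Healy style comparison intervals, multiplier assumed to be 1.39.
COMPUTE gh_lo = group_mean - 1.39 * group_se.
COMPUTE gh_hi = group_mean + 1.39 * group_se.
EXECUTE.
LIST group gh_lo gh_hi.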