|
Hi Everyone,
Could anyone suggest a method of transforming a data set of percentages in which many values are zero and the rest are small?

I have tried various transformations, including square, square root, log, and Tabachnick & Fidell's (p. 83) suggestion for an "L-shaped with zeros" distribution (i.e. a constant added to each score so that the smallest value is 1), without much success.

I am trying to find a transformation that gives a visualisation of the frequency distribution in which the groups can be compared (and the overlap between them seen).

Many thanks
Clive.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Open a new instance of SPSS. Copy the syntax below to a syntax file.
Click <run>. Click <all>. Is this what you are looking for?

new file.
* generate distributions.
input program.
loop id = 1 to 50.
compute group = 1.
compute original = rv.normal(0, .1).
end case.
end loop.
loop id = 101 to 150.
compute group = 2.
compute original = rv.normal(0, .12).
end case.
end loop.
loop id = 201 to 250.
compute group = 3.
compute original = rv.normal(0, .13).
end case.
end loop.
end file.
end input program.
recode original (lo thru 0=0)(else=copy).
formats group (f1) id(f3).
var level group id (nominal) original (scale).
execute.
* have some data now look at it.
EXAMINE VARIABLES=original BY group
  /ID=id
  /PLOT BOXPLOT HISTOGRAM
  /COMPARE GROUPS
  /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

Art Kendall
Social Research Consultants |
|
What aspects of the data are you trying to visualize? What do you mean by "visualised better"?

Of course a lot depends on the intrinsic meaning of your data. Would such a generating function make substantive sense? Would a transformation?

Try diddling with the generation of the random variables until you get a distribution that looks more like yours, e.g., change the number of groups, the n in each, the proportion of zeros, etc. By lowering the center of the normal function you will have more zeros after the recode. By raising it you will have fewer zeros. It may be that some function other than the normal will generate a better-looking set of data.

If you have a good guess at how the data might be generated, then you have a lead on how to display it, e.g., show just the non-zero values (give a case a weight of zero if its value is zero), or give a small case weight to cases whose value is zero, etc.

Did you try normal curve equivalents (NCEs), or the percentiles of NCEs, if they make sense in your context?

Art Kendall
Social Research Consultants

Downs, Clive wrote:
> Hi Art,
>
> Thank you for this. I've tried your syntax, and it does generate data similar to the data I am working on, except that mine is more extreme: the histograms are "L" shaped.
>
> Your syntax produces the histograms, but what I was looking for was a *data transformation* that would enable the data to be visualised better. In its untransformed form it is so L-shaped that you can't see much of a distribution shape.
>
> I have tried an inverse transformation. This produces a better shape, but I am still thinking through its interpretation.
>
> Thanks
> Regards
> Clive. |
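The remark about lowering or raising the center of the normal can be made concrete. The sketch below is in Python rather than SPSS syntax, purely as an illustration; the function names are hypothetical and not part of the thread. It assumes the same generating process as the syntax above: a normal variate with every value at or below zero recoded to zero. Under that assumption the expected share of zeros is the normal probability of a value at or below zero.

```python
import math
import random


def expected_zero_fraction(mean, sd):
    """P(X <= 0) for X ~ Normal(mean, sd): the share of cases recoded to zero."""
    z = (0.0 - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_floored(n, mean, sd, seed=0):
    """Draw n normal values and floor them at zero, like 'recode (lo thru 0=0)'."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, sd)) for _ in range(n)]


if __name__ == "__main__":
    # Lower centers give more zeros, higher centers give fewer.
    for mean in (-0.1, 0.0, 0.1):
        values = simulate_floored(10_000, mean, 0.1)
        observed = sum(v == 0.0 for v in values) / len(values)
        print(mean, round(expected_zero_fraction(mean, 0.1), 3), round(observed, 3))
```

With the center at zero, as in the original syntax, about half of the cases end up as zeros regardless of the standard deviation; shifting the center is what changes how L-shaped the simulated histogram looks.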
|
Dear List,
I would like to make a histogram with varying bin widths in SPSS (15). For example, a histogram of the following data:

Age     Frequency
---     ---------
0-4        28
5-9        46
10-15      58
16         20
17         31
18-19      64
20-24     149
25-59     316

Any help will be highly appreciated!

Pieter van Groenestijn
--
RadboudUniversiteit Nijmegen
Faculteit Sociale Wetenschappen
Research Technische OndersteuningsGroep
Thomas van Aquinostraat 4.00.51
tel: 024-3612035
fax: 024-3612351
email: [hidden email]
hp: http://www.ru.nl/fsw/rtog/ |
|
In reply to this post by Clive Downs
Hi Clive,
Do you really need to transform your data if your goal is to visualise differences between groups? I would use two charts:
- a bar chart with the percentages of zero and of non-zero values
- a separate histogram for the non-zero values (what is the distribution of the positive values?)

Using GPL, it should also be possible to prepare a nice visualisation within one chart. Maybe someone already has code that does this? It should be a relatively common problem for some types of data.

Regards,
Mariusz |
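The two-chart idea amounts to splitting each group's data into a zero part and a positive part before plotting. A minimal sketch of that split follows, in Python rather than SPSS or GPL; the function name and the sample records are hypothetical, and it assumes all values are non-negative percentages.

```python
from collections import defaultdict


def summarize_by_group(records):
    """Split each group's values into a percentage of zeros and a list of positives.

    records: iterable of (group, value) pairs, with value assumed to be >= 0.
    """
    grouped = defaultdict(list)
    for group, value in records:
        grouped[group].append(value)

    summary = {}
    for group, values in grouped.items():
        positives = sorted(v for v in values if v > 0)
        summary[group] = {
            "n": len(values),
            # Input for the bar chart: share of zero values in the group.
            "pct_zero": 100.0 * (len(values) - len(positives)) / len(values),
            # Input for the separate histogram: non-zero values only.
            "positives": positives,
        }
    return summary


if __name__ == "__main__":
    # Made-up illustration data, not taken from the thread.
    sample = [(1, 0.0), (1, 0.0), (1, 0.5), (1, 1.2),
              (2, 0.0), (2, 0.3), (2, 0.8), (2, 2.0)]
    for group, stats in sorted(summarize_by_group(sample).items()):
        print(group, stats)
```

The `pct_zero` figures would feed the bar chart and the `positives` lists the per-group histograms, so the spike at zero no longer compresses the rest of the distribution.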
|
In reply to this post by P.van Groenestijn
Open a new instance of SPSS. Copy the syntax below to a syntax file.
Click <run>. Click <all>. Is this what you are looking for?

* if the data is already aggregated.
* category is read as numeric so that the numeric missing value and value labels below apply.
data list list /category (f2) kount (f3).
begin data
1 28
2 46
3 58
4 20
5 31
6 64
7 149
8 316
end data.

missing values category (-1).
value labels category
  1 '0-4'
  2 '5-9'
  3 '10-15'
  4 '16'
  5 '17'
  6 '18-19'
  7 '20-24'
  8 '25 thru 59'
  -1 'out of range or missing'.

* syntax below generated via GUI.
* Chart Builder.
* kount is kept numeric here so that bar height reflects the count.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=category kount
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: category=col(source(s), name("category"), unit.category())
  DATA: kount=col(source(s), name("kount"))
  GUIDE: axis(dim(1), label("category"))
  GUIDE: axis(dim(2), label("kount"))
  SCALE: cat(dim(1), include("-1", "1", "2", "3", "4", "5", "6", "7", "8"))
  ELEMENT: interval(position(category*kount), shape.interior(shape.square))
END GPL.

* if you have raw case-wise data, recode it into your categories first.
recode oldvar (0 thru 4 =1) (5 thru 9 =2) (10 thru 15 =3)
  (16=4) (17=5) (18,19 = 6) (20 thru 24= 7) (25 thru 59 =8)
  (else= -1) into category.
frequencies vars = category ... / histogram ...

Art Kendall
Social Research Consultants |
|
Dear Art,
Thank you for your reply.

It is, however, not exactly what I am looking for.

In the output the widths of the bars should vary. In your example, the width of category 1 should be 5 times the width of category 4. In Stata this can be done as explained at http://www.stata.com/support/faqs/graphics/histvary.html

Is it possible to make a similar graph in SPSS 15?

Pieter van Groenestijn |
|
A possible kludge?
I do not have time to test this, but suppose that for each age value you had a case with the count and the number of years in the interval, i.e., you assume that events are evenly spread across the interval.

data list list /interval (f2) age (f2) kount (f3) width (f2).
begin data
1 0 28 5
1 1 28 5
1 2 28 5
1 3 28 5
1 4 28 5
2 5 46 5
2 6 46 5
2 7 46 5
2 8 46 5
2 9 46 5
3 10 58 6
3 11 58 6
. . .
8 58 316 35
8 59 316 35
end data.
compute y = kount/width.

Perhaps you can have each interval be a distinct color (blue for odd-numbered intervals, green for even?), or you might be able to edit the graph to remove the interior bar outlines within an interval. If you cannot do that in the graph editor, you could do it in Photoshop.

HTH

Art Kendall
Social Research Consultants |
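The arithmetic behind the kludge (y = kount/width) can be checked outside SPSS. The sketch below is in Python, purely as an illustration, and uses the age table from the original question. The names `AGE_BINS` and `bar_heights` are hypothetical, and it assumes both age bounds are inclusive, so that 0-4 spans 5 years and 10-15 spans 6, matching the width column in the data above.

```python
# (lower age, upper age, frequency); both bounds inclusive, from the table in the thread.
AGE_BINS = [
    (0, 4, 28),
    (5, 9, 46),
    (10, 15, 58),
    (16, 16, 20),
    (17, 17, 31),
    (18, 19, 64),
    (20, 24, 149),
    (25, 59, 316),
]


def bar_heights(bins):
    """Return (lower, upper, width, height) per bin, with height = count / width.

    Plotting height against width makes each bar's area equal its frequency,
    which is what a histogram with unequal bin widths requires.
    """
    rows = []
    for lower, upper, count in bins:
        width = upper - lower + 1  # number of whole years in the interval
        rows.append((lower, upper, width, count / width))
    return rows


if __name__ == "__main__":
    for lower, upper, width, height in bar_heights(AGE_BINS):
        print(f"{lower:>2}-{upper:<2}  width={width:>2}  height={height:.2f}")
```

On this scale the single-year bins at 16 and 17 are no longer dwarfed by the 25-59 bin, because the 316 cases in that bin are spread over 35 years.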
|
In reply to this post by P.van Groenestijn
Dear Colleagues,
I apologize for this non-SPSS statistics question. Perhaps someone can clear up my muddled thinking about estimating power and estimating margin of error. Is a margin of error calculation a particular type of power calculation? Or are the two mathematically unrelated? My intuition is that they are related, but I do not have a logical argument for persuading anyone else!
Thank you,
John
|
