|
Dear all,
I am working on a household dataset from India (n=21331) and am trying to calculate a Gini coefficient for income for each village (n=1451). I understand the formula and syntax for calculating Gini coefficients and am using the syntax below. My problem is that I can’t seem to figure out the proper way to calculate one for each village (variable name IDPSU). I have tried do if loops, select if’s and filters but can’t get it right. I may very well be missing something very simple. I would be very grateful for any help in solving this.

Thanks in advance
Mattias

* Step 1.
SORT CASES BY INCOME.

* Step 2.
AGGREGATE OUTFILE = *
  /PRESORTED
  /BREAK = INCOME
  /persons = N .
WEIGHT BY persons.

* Step 3.
COMPUTE brk = 1.
AGGREGATE OUTFILE = incagg.sav
  /BREAK = brk
  /suminc = SUM(INCOME).
MATCH FILES / FILE = *
  / TABLE = incagg.sav
  / BY brk .
EXECUTE.

* Step 4 .
DO IF ($CASENUM = 1).
+ COMPUTE cincome = persons * income .
ELSE.
+ COMPUTE cincome = LAG(cincome) + persons * income .
END IF.

* Step 5 .
COMPUTE pcinc = cincome/suminc .
EXECUTE.

* Step 6.
RANK VARIABLES=income (A)
  /RFRACTION into cdfinc
  /PRINT=YES
  /TIES=HIGH .

* Step 7.
COMPUTE d1 = ($casenum = 1).
COMPUTE d2 = ($casenum = 1).
* Note that it doesn't matter whether D1 or D2 is the Y variable
* in the D1-D2 pair.
* D1 and D2 are identical and are created to allow you to draw a
* diagonal line on the graph.
GRAPH
  /SCATTERPLOT(OVERLAY)=cdfinc d2 WITH pcinc d1 (PAIR)
  /MISSING=LISTWISE
  /TITLE= 'Lorenz Curve for Income'.

* Step 8.
* Calculate and print the Gini coefficient.
* For last case, LAREA is area under the Lorenz curve.
DO IF ($casenum = 1) .
+ COMPUTE larea = 0.
ELSE.
+ COMPUTE larea = LAG(larea) + (cdfinc - LAG(cdfinc)) * (pcinc + LAG(pcinc))/2 .
END IF.
IF (cdfinc = 1) gini = (.5 - larea)/.5 .
REPORT
  /VARIABLES gini (VALUES)
  /BREAK (TOTAL) '' (SKIP(1))
  /SUMMARY MAX( gini) SKIP(1) '' . |
|
Administrator
|
Maybe you need to include IDPSU in the code? Say, in the AGGREGATE as a BREAK variable and in the LAG logic?
You don't provide the formula for Gini or a usable reference and I don't feel like looking it up and reinventing code.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
In reply to this post by Mattias
Here is an example of calculating Gini coefficients across split files. It requires the R Essentials. If you are interested in comparing these villages, some notion of a confidence interval for the Gini coefficients might be helpful, so this code produces bootstrapped CIs as well.
This example uses the employee data.sav file shipped with Statistics and computes the Gini statistics for salary for each job category.

sort cases by jobcat.
split file by jobcat.
begin program r.
tryCatch(library(DescTools), error=function(e) {
  install.packages("DescTools", repos="http://cran.us.r-project.org")
  library(DescTools)
})
while (!spssdata.IsLastSplit()) {
  dta = spssdata.GetSplitDataFromSPSS("jobcat salary")
  g = Gini(dta[[2]], conf.level=.95)
  print(sprintf("jobcat = %s %s %s %s", dta[[1,1]], g[1], g[2], g[3]))
}
end program.

The first statement in the R code, tryCatch..., tries to load the DescTools package and, if that fails, installs it from the R repository and loads it. Once the package is installed, you could remove that code. |
|
Hi,
Thank you for helping out. I tried running the syntax you suggested but get the following error message:

Error # 6887. Command name: begin program
External program failed during initialization.
Execution of this command stops.
Additional error message: create startx process is failure.

Could it be that the syntax is no longer up to date?

Thanks
Mattias |
|
Administrator
|
More likely that python or R are absent or not properly installed?
|
|
In reply to this post by Mattias
This indicates a problem with the R plugin
installation. Did you install the appropriate version of R for your
version of Statistics and the corresponding R Essentials (including matching
the bit size)?
Run this to see if your R plugin connection works at all.

begin program r.
print(sessionInfo())
end program.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Mattias <[hidden email]>
To: [hidden email],
Date: 02/20/2014 05:31 AM
Subject: Re: [SPSSX-L] Calculating Gini coefficients for each subset (villages) of large data set
Sent by: "SPSSX(r) Discussion" <[hidden email]>

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Calculating-Gini-coefficients-for-each-subset-villages-of-large-data-set-tp5724495p5724575.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
I have never used R before and installed it only for this. When I run the last syntax you suggested Jon I get the following error message:
begin program r.
print(sessionInfo())
end program.

>Warning # 6894. Command name: begin program
>The external program exit unexpectedly and lost its content, a new exteranl
>program will startup to execute the rest of job.
>Error # 6887. Command name: begin program
>External program failed during initialization.
>Execution of this command stops.
Additional error message: create startx process is failure. |
|
In reply to this post by Jon K Peck
I am running SPSS 20 on Windows 7 Professional 64-bit and have installed R 3.0.2 (both 32- and 64-bit) and now also the Essentials for R for SPSS 20. I keep getting the same error message. I then tried installing R 2.12.1, but the same problem remains.
I also read here (https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014729480) that I am not the only person with the same problem. What should I do? |
|
Statistics 20 uses only R2.12.x. Any
other R versions are irrelevant. Make sure that you install the R
Essentials 32 or 64-bit version that matches with the 32 or 64-bit version
of Statistics. Just because you are running 64-bit Win7 doesn't mean
that you are running 64-bit Statistics.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Mattias <[hidden email]>
To: [hidden email],
Date: 02/20/2014 09:11 AM
Subject: Re: [SPSSX-L] Calculating Gini coefficients for each subset (villages) of large data set
Sent by: "SPSSX(r) Discussion" <[hidden email]> |
|
In reply to this post by Mattias
A side note. At 08:19 AM 2/14/2014, Mattias wrote:

>I am working on a household dataset from India (n=21331) and am
>trying to calculate a Gini coefficient for income for each village (n=1451).

That's a mean of 21331/1451=14.7 households per village. Isn't that a little small for calculating a Gini coefficient? Should villages, especially smaller ones, be combined into, say, regional groups before calculating the Gini coefficient? |
|
And the population is probably not evenly
distributed. Using the Gini code I suggested, though, will include
confidence intervals in the output.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Richard Ristow <[hidden email]>
To: [hidden email],
Date: 02/20/2014 10:59 AM
Subject: Re: [SPSSX-L] Calculating Gini coefficients for each subset (villages) of large data set
Sent by: "SPSSX(r) Discussion" <[hidden email]> |
|
I finally managed to get the Essentials for R working (at least I get no error messages…)
First - Richard wrote:

>That's a mean of 21331/1451=14.7 households per village. Isn't that a
>little small for calculating a Gini coefficient?

Yes it is, which is why I became interested in the syntax Jon suggested; because it provides CI’s.

I have data on household income from the variable INCOME and village id from the variable IDPSU. Following Jon’s original suggestion I ran the following syntax:

sort cases by IDPSU.
split file by IDPSU.
begin program r.
tryCatch(library(DescTools), error=function(e) {
  install.packages("DescTools", repos="http://cran.us.r-project.org")
  library(DescTools)
})
while (!spssdata.IsLastSplit()) {
  dta = spssdata.GetSplitDataFromSPSS("IDPSU INCOME")
  g = Gini(dta[[2]], conf.level=.95)
  print(sprintf("IDPSU = %s %s %s %s", dta[[1,1]], g[1], g[2], g[3]))
}
end program.
split file off.

I get no error messages but also no Gini coefficients computed… What is not working? |
|
In reply to this post by Jon K Peck
I'd written,

>>[You have] a mean of 21331/1451=14.7 households per village. Isn't
>>that a little small for calculating a Gini coefficient?

At 02:25 PM 2/20/2014, Jon K Peck wrote:

>And the population is probably not evenly distributed.

Most definitely. If the mean is 15 households per village, you'll probably have many fewer for some villages, making the problem worse.

>Using the Gini code I suggested, though, will include confidence
>intervals in the output.

However -- and this isn't about SPSS, or R, or the package -- Gini coefficients derived from small samples, and their calculated confidence intervals, should be viewed skeptically. If the distribution of income (or whatever) is highly concentrated, a small sample may underestimate the Gini coefficient badly, through failing to include any of the small, very-high-income subgroup in the sample. I doubt that any statistical methodology can compensate for this. |
|
Hi Richard
I was interested in your comment because it so happens I am calculating a lot of Ginis myself (though I'm using Python, not R).

You said "a small sample may underestimate the Gini coefficient badly, through failing to include any of the small, very-high-income subgroup in the sample."

I don't get your point, I'm afraid. Of course the Gini coefficient calculated on a small sample will be subject to sampling error, and unrepresentative of the population at large. But it seems to me Mattias is not trying to estimate the Gini coefficient for the population at large; he is interested in the village level of analysis, and the distribution of village Ginis. If the high income group in the population is small, few villages will have high income members, and the distribution of village Ginis will (correctly) reflect that.

Regards
Garry

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: 21 February 2014 17:49
To: [hidden email]
Subject: Re: Calculating Gini coefficients for each subset (villages) of large data set |
|
At 03:54 AM 2/24/2014, Garry Gelade wrote:

>You said "a small sample may underestimate the Gini coefficient
>badly, through failing to include any of the small, very-high-income
>subgroup in the sample."
>
>Of course the Gini coefficient calculated on a small sample will be
>subject to sampling error, and unrepresentative of the population at
>large, But it seems to me Mattias is not trying to estimate the
>Gini coefficient for the population at large, he is interested in
>the village level of analysis, and the distribution of village
>Ginis. If the high income group in the population is small, few
>villages will have high income members, and the distribution of
>village Ginis will (correctly) reflect that.

Calculating a Gini coefficient is a problem in numerical integration; and numerical integration gets less reliable as the function being integrated gets more 'peaked'.

If Mattias has everybody in each village, his village Gini coefficients will be right, period. But if he's sampling -- here's a concrete example, to illustrate the problem.

Suppose there's a population of (large) size P, with mean income I. Suppose, also, that one half the income is equally distributed among 99% of the population, and that the remaining half is evenly distributed among the remaining 1%. (So, total income is P*I; total income for the lower 99% and the upper 1% are each P*I/2.)

Then, the Gini coefficient can be calculated exactly (see APPENDIX, below); it is 0.49.

Now, suppose you draw a sample of size 10. With probability 0.99**10=~0.904, the sample will include no high-income individuals; everybody in the sample will have income I/1.98, and the empirical Gini coefficient is 0.

===============================================================
APPENDIX: Gini coefficient, for the hypothetical case:

In the lower 99%, mean individual income (and, by hypothesis, actual income) is (P*I/2)/(0.99*P) = I/1.98 =~ 0.505*I; in the top 1% mean (and actual) individual income is (P*I/2)/(0.01*P) = 50*I.

If p is the cumulative proportion of the population and i the cumulative proportion of income, the Lorenz curve becomes

   (A)  i = p/1.98                p <= 0.99
   (B)  i = 0.5 + 50*(p-0.99)     p >= 0.99

where, if i=0.5, p=0.99.

The Gini coefficient is 1-2*(area under Lorenz curve). Here, that area is (by area of triangles and trapezoids)

   area (A): 0.99*0.5/2     = 0.25*0.99 = 0.2475
   area (B): .01*(1+0.5)/2  = 0.75*0.01 = 0.0075
                                          ------
                                          0.0255

So the Gini coefficient is 1-2*.0255= 0.49 |
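The arithmetic in the example can be checked with a few lines of Python. This is only a sketch of that one hypothetical population: the variable names are made up for illustration, and the sampling probability assumes draws are independent, which is reasonable only because the population is taken to be large.

```python
# Two-group population from the example: 99% of people share half the income.
low_pop_share = 0.99      # fraction of the population in the lower group
low_income_share = 0.5    # fraction of total income held by that group

# Area under the piecewise-linear Lorenz curve: a triangle plus a trapezoid.
area_a = low_pop_share * low_income_share / 2
area_b = (1 - low_pop_share) * (low_income_share + 1.0) / 2
lorenz_area = area_a + area_b
gini = 1 - 2 * lorenz_area

# Chance that a sample of 10 contains nobody from the top 1%
# (independent draws assumed, i.e. a large population).
sample_size = 10
p_no_high_income = low_pop_share ** sample_size

print(f"area under Lorenz curve = {lorenz_area:.4f}")
print(f"Gini = {gini:.2f}")
print(f"P(no high-income member in sample) = {p_no_high_income:.3f}")
```

The total area comes out as 0.2550, which gives the Gini coefficient of 0.49 stated in the post.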
|
Correcting an apparent transcription error in the example, at 01:35 PM 2/24/2014, I wrote:

>The Gini coefficient is 1-2*(area under Lorenz curve). [In the
>example distribution], that area is (by area of triangles and trapezoids)

Below, for "0.0255" read "0.2550". The Gini-coefficient calculation uses the correct value.

>area (A): 0.99*0.5/2     = 0.25*0.99 = 0.2475
>area (B): .01*(1+0.5)/2  = 0.75*0.01 = 0.0075
>                                       ------
>                                       0.0255
>
>So the Gini coefficient is 1-2*.0255= 0.49 |
|
What alternative measure of inequality would you suggest I use?
I am running multilevel models and am, among other things, interested in the association between village inequality and household behavior. Households are distributed into villages like this:

N_IDPSU

          Frequency   Percent   Valid Percent   Cumulative Percent
Valid  1          6        ,0              ,0                   ,0
       2         44        ,2              ,2                   ,2
       3         99        ,5              ,5                   ,7
       4        144        ,7              ,7                  1,4
       5        160        ,8              ,8                  2,1
       6        276       1,3             1,3                  3,4
       7        322       1,5             1,5                  4,9
       8        432       2,0             2,0                  7,0
       9        369       1,7             1,7                  8,7
      10        560       2,6             2,6                 11,3
      11        759       3,6             3,6                 14,9
      12        768       3,6             3,6                 18,5
      13        936       4,4             4,4                 22,9
      14       1358       6,4             6,4                 29,2
      15       1650       7,7             7,7                 37,0
      16       2416      11,3            11,3                 48,3
      17       2618      12,3            12,3                 60,6
      18       2304      10,8            10,8                 71,4
      19       1254       5,9             5,9                 77,2
      20       1140       5,3             5,3                 82,6
      21        315       1,5             1,5                 84,1
      22        396       1,9             1,9                 85,9
      23        483       2,3             2,3                 88,2
      24        432       2,0             2,0                 90,2
      25        225       1,1             1,1                 91,3
      26        234       1,1             1,1                 92,4
      27        297       1,4             1,4                 93,7
      28        252       1,2             1,2                 94,9
      29        232       1,1             1,1                 96,0
      30        120        ,6              ,6                 96,6
      31        155        ,7              ,7                 97,3
      32         96        ,5              ,5                 97,8
      33         33        ,2              ,2                 97,9
      34         68        ,3              ,3                 98,2
      35        105        ,5              ,5                 98,7
      36        144        ,7              ,7                 99,4
      37         74        ,3              ,3                 99,7
      55         55        ,3              ,3                100,0
   Total      21331     100,0           100,0 |
|
After I wrote about the problems inherent in calculating a Gini coefficient based on a small sample, at 04:10 AM 2/25/2014, Mattias wrote:

>What alternative measure of inequality would you suggest I use?

I'm out of my depth on this; I'm far below professional level in the relevant social sciences, and I don't know the village societies you're studying at all. You need to talk with subject specialists -- people who are familiar with village societies of the kind you're studying; with people who've done similar investigations; and, if possible, with someone who has both kinds of experience.

You don't say how you got your samples, or what fraction of the village population you usually sampled. If you have the whole population, your Gini coefficient is right, but not necessarily relevant; especially for small villages, you have to consider whether a group of villages is the relevant economic unit. (To repeat myself: I cannot judge this. I know nothing about the villages you are studying; you know a good deal; and you should have access to people who know more.)

You may have to exclude some villages from analysis because your sample is too small. From the table you sent, I see that for 9% of the villages, your sample is fewer than 10 households; you may have to exclude those, unless you can pool them with others. But, again, talk with subject specialists about these issues.

The Wikipedia article on the Gini coefficient seems pretty good, but you can read it as well as I can. The most helpful lead I found in it is,

>"Small sample bias - sparsely populated regions
>more likely to have low Gini coefficient": Gini
>index has a downward-bias for small populations.[56] ...

That cites the article: George Deltas (February 2003). "The Small-Sample Bias of the Gini Coefficient: Results and Implications for Empirical Research". The Review of Economics and Statistics 85 (1): 226-234. doi:10.1162/rest.2003.85.1.226.

Reviewing that article would be a place to start. Here, you will benefit from the help of methodologists more than subject specialists. (Is it clear what I mean by "methodologists" and "subject specialists"? The former are familiar with analytic techniques; the latter, with the people you are studying -- though they may well know techniques as well. The two can help you in different ways.)

The second-most helpful hint I found in the Wikipedia article was a caution: the Gini index measures inequality, not prosperity. It may be that people in all income quantiles of a more prosperous village have higher incomes than those in a less prosperous village, even if the more prosperous village has higher inequality. Relative prosperity surely also affects the behaviors you're studying.

Finally, a hint I found in Webbing around about the Gini coefficient, but can't find again: If, by any chance, you have household mean income for your villages, compare it with mean income from your sample. If the mean income of the sample is notably lower, that suggests there's a high-income group whom your sample missed.

-Go forth in peace, and best of success to you,
Richard Ristow |
|
In reply to this post by Richard Ristow
Hi Richard
Yes, your calculation is correct, but it has nothing to do with Gini or problems of numerical integration.

It shows that drawing a small sample (N=10) from a *large* population containing a small proportion of high income individuals is likely to give a misleading Gini for the population at large. As I see it, that is likely to happen whatever index of inequality you use. It's simply that when taking a small sample from a large population, the high income individuals are likely to be excluded from the sample, and the inequality measure will be biased.

A similar sort of thing happens when we calculate an SD (sometimes used as a measure of inequality). If we assume your example population, and give it a mean income of 1, then 1% have an income of 50, and 99% have an income of .505. The SD of income for the whole population is SQRT( .01*(50 - 1)^2 + .99*(.505 - 1)^2 ) = 4.9. But on your (correct) logic, a sample of 10 is highly likely to consist of individuals with identical income, giving a sample SD of zero.

So I wouldn't say Gini is particularly susceptible to bias in this sort of situation. Using a different measure of inequality would lead to similar problems.

Best Regards
Garry

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: 24 February 2014 18:35
To: [hidden email]
Subject: Re: Calculating Gini coefficients for each subset (villages) of large data set |
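The standard-deviation figure can be reproduced in the same way. The sketch below assumes the same hypothetical two-group population with mean income 1; the names are illustrative, and the lower-group income is computed as 0.5/0.99 rather than the rounded .505.

```python
import math
import statistics

mean_income = 1.0
high_share, high_income = 0.01, 50.0
low_share, low_income = 0.99, 0.5 / 0.99   # about 0.505

# Population variance of a two-point income distribution.
variance = (high_share * (high_income - mean_income) ** 2
            + low_share * (low_income - mean_income) ** 2)
population_sd = math.sqrt(variance)

# A sample of ten households drawn only from the lower group has no spread.
sample = [low_income] * 10
sample_sd = statistics.stdev(sample)

print(f"population SD = {population_sd:.2f}")
print(f"SD of a sample with ten identical incomes = {sample_sd:.2f}")
```

The population SD comes out at roughly 4.9, while the all-low-income sample has an SD of zero, which is the contrast the post describes.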
|
Hi Richard and Garry,
I think that you both make important points, and that the one does not exclude the other, as I understand the two of you to be thinking. Garry’s comment makes a lot of sense to me: it is essentially a question of how representative the (small) sample is. Richard refers to work (e.g. the Deltas (2003) article) which shows that Gini coefficients are biased downward when samples are small, as they are in my data.

I asked about possible alternative measures that would be more appropriate for small samples because I also found the Deltas (2003) article you mention, which by the way also argues that “The small sample bias is especially relevant when [..] the Gini is used to compare income inequality across sub-populations, some of which may have very small sample sizes”. Deltas suggests a ‘small sample adjusted’ Gini instead when samples are small.

At the same time, the Wiki on income inequality measures describes population independence as one of four properties any measure of inequality should fulfil, in the following way: “the income inequality metric should not depend on whether an economy has a large or small population. An economy with only a few people should not be automatically judged by the metric as being more equal than a large economy with lots of people. This means that the metric should be independent of the level of population”

I have spent most of my time being what you call a subject specialist, for example through conducting surveys and by doing field work in village India. It is indeed of particular substantive interest to investigate village level inequality because the village is a social unit of specific importance in India. In this project, however, I am working on secondary data from a large scale survey, and this is the first time I use an inequality measure, so I am trying to find my way. As such I am anxious to understand the limitations of, for example, the Gini, and also reluctant to use unconventional “adjusted” Gini measures such as that suggested by Deltas (2003).

As for how to actually calculate Gini coefficients for villages - which was my original question - I am still looking for a way to do this in terms of syntax. Jon's suggestion to use an R plug-in is exhausted for now, since it appears I will need special technical assistance to install things properly. I am now at a stage where I am considering cutting and pasting a syntax section for each village id, since I cannot find a correct way of looping. Any suggestions for syntax in SPSS?

Mattias |
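Since the looping logic is the sticking point, here is a minimal sketch of the per-village calculation in plain Python (the language Garry mentions using), not SPSS syntax. It assumes the data have been exported as (IDPSU, INCOME) pairs with non-negative incomes; the function names and the sample rows are made up for illustration. The Gini uses the standard rank-weighted formula for a discrete sample, and the "adjusted" column multiplies by n/(n-1), which is my understanding of the small-sample correction discussed in Deltas (2003) and should be checked against the article before use.

```python
from collections import defaultdict


def gini(incomes):
    """Gini coefficient of non-negative incomes (0 means perfect equality)."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return float("nan")
    # Rank-weighted form: sum of (2*rank - n - 1) * x over the sorted values.
    weighted = sum((2 * rank - n - 1) * x for rank, x in enumerate(xs, start=1))
    return weighted / (n * total)


def gini_by_group(rows):
    """Map each group id to (n, Gini, n/(n-1)-adjusted Gini)."""
    groups = defaultdict(list)
    for group_id, income in rows:
        groups[group_id].append(income)
    result = {}
    for group_id, incomes in groups.items():
        n, g = len(incomes), gini(incomes)
        adjusted = g * n / (n - 1) if n > 1 else float("nan")
        result[group_id] = (n, g, adjusted)
    return result


# Made-up (IDPSU, INCOME) pairs, for illustration only.
rows = [(101, 10), (101, 10), (101, 10), (101, 10),
        (102, 0), (102, 0), (102, 0), (102, 40),
        (103, 10), (103, 20), (103, 30), (103, 40)]

per_village = gini_by_group(rows)
for village, (n, g, g_adj) in sorted(per_village.items()):
    print(f"IDPSU={village} n={n} Gini={g:.3f} adjusted={g_adj:.3f}")
```

Grouping the rows once and then computing one coefficient per group avoids copying a syntax section for each village id. In the toy data, the village where one household holds all the income gets an unadjusted Gini of 0.75 with n=4, which illustrates the small-sample ceiling of (n-1)/n that the adjustment is meant to remove.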
