in reply to your gini index calculation in spss

Lakshmikanth Makaraju
Hi,

You have given an answer on calculating the Gini index using SPSS.

I just want to know something. Generally, the Lorenz curve is the graph of
the cumulative proportion of income against the cumulative proportion of
persons.

Does your procedure represent the same thing or not? Here you have taken
the cumulative proportion of income and the cumulative distribution
function of income and drawn the Lorenz curve from them.

Can I use this procedure to calculate the Gini index?

Why is the result different from the Gini index calculated in Microsoft
Excel?

Your help will be appreciated.

Thanks and regards,

Lakshmikanth.



A.
It would be quite feasible to draw a Lorenz curve and calculate the
Gini coefficient in SPSS. Given a variable INCOME in the open data
file, the basic steps would be:



1. Sort the file by ascending INCOME.


2. If the active data file is not already aggregated by INCOME,
then use the AGGREGATE procedure to do so, using the N function
to save the number of cases at each income value to a new variable,
which we'll call PERSONS for this example. Replace the active data
file with this aggregate file and assign PERSONS as the weight
variable.


If you run AGGREGATE from a syntax window, adding the
PRESORTED keyword as in the example below, the memory demands of
the AGGREGATE procedure will be greatly reduced. Use of this keyword
will only work when the file has actually been sorted by the break
variable, as directed in step 1.


If you started with a file that was aggregated by INCOME,
make sure that the file is weighted by a variable that indicates
the number of cases that are represented by each row of data.


3. Use the AGGREGATE procedure to save the sum of income to a new
variable SUMINC in a new file and then merge that file with the
current data set.


4. Use conditional COMPUTE commands to calculate the cumulative sum
of INCOME, with each income value multiplied by PERSONS, as a new
variable, say CINCOME.


5. Compute PCINC (cumulative proportion of income) as CINCOME/SUMINC.


6. Use the RANK procedure to store the empirical Cumulative
Distribution Function (CDF) result of INCOME in the new variable
CDFINC. For every INCOME value, CDFINC will be the proportion of the
sample with incomes less than or equal to that value.


7. The Lorenz curve will be the graph of CDFINC (horizontal axis)
by PCINC. Before running the GRAPH command, compute two new variables,
D1 and D2, such that both variables = 1 for the first case and 0 for
all other cases. In the GRAPH procedure, choose Scatterplot->Overlay.
The Y-X pairs will be PCINC-CDFINC and D1-D2. When the graph is drawn,
open it in the Chart Editor by right-clicking anywhere in the graph.
Use the Chart->Axis menu to label the axis and perhaps place a grid
on the graph. Use the Interpolation tool and choose straight-line
interpolation for each of the D1-D2 and PCINC-CDFINC pairs. The former
line will be the diagonal line that represents equal income
distribution. The latter line will be the Lorenz curve.


8. Trapezoidal integration is used to compute the area under the
Lorenz curve. This area is subtracted from .5 and that difference is
divided by .5 to provide the Gini coefficient.
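
Written out, step 8 amounts to the formula below. The symbols are
introduced here for illustration and are not part of the resolution:
F_i and L_i stand for CDFINC and PCINC at the i-th distinct income value
in ascending order, k is the number of distinct income values, A is the
area under the Lorenz curve, and G is the Gini coefficient. It assumes
the curve starts at the origin, so F_0 = L_0 = 0.

```latex
\[
A = \sum_{i=1}^{k} \frac{(F_i - F_{i-1})\,(L_i + L_{i-1})}{2},
\qquad
G = \frac{0.5 - A}{0.5} = 1 - 2A
\]
```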


The following commands illustrate the above steps. For further
guidance in editing the graph with the Chart Editor, consult the
SPSS Base User's Guide for your version of SPSS.


* Step 1.
SORT CASES BY INCOME.
* Step 2.
AGGREGATE OUTFILE = *
/PRESORTED
/BREAK = INCOME
/persons = N .
WEIGHT BY persons.
* Step 3.
COMPUTE brk = 1.
AGGREGATE OUTFILE = incagg.sav
/BREAK = brk
/suminc = SUM(INCOME).


MATCH FILES / FILE = * / TABLE = incagg.sav / BY brk .
EXECUTE.


* Step 4 .
DO IF ($CASENUM = 1).
+ COMPUTE cincome = persons * income .
ELSE.
+ COMPUTE cincome = LAG(cincome) + persons * income .
END IF.
* Step 5 .
COMPUTE pcinc = cincome/suminc .
EXECUTE.


* Step 6.
RANK VARIABLES=income (A)
/RFRACTION into cdfinc
/PRINT=YES
/TIES=HIGH .


* Step 7.
COMPUTE d1 = ($casenum = 1).
COMPUTE d2 = ($casenum = 1).
* Note that it doesn't matter whether D1 or D2 is the Y variable
* in the D1-D2 pair.
* D1 and D2 are identical and are created to allow you to draw a
* diagonal line on the graph.
GRAPH
/SCATTERPLOT(OVERLAY)=cdfinc d2 WITH pcinc d1 (PAIR)
/MISSING=LISTWISE
/TITLE= 'Lorenz Curve for Income'.


* Step 8.
* Calculate and print the Gini coefficient.
* For last case, LAREA is area under the Lorenz curve.
* The first trapezoid runs from the origin (0,0) to the first point.
DO IF ($casenum = 1) .
+ COMPUTE larea = cdfinc * pcinc / 2 .
ELSE.
+ COMPUTE larea = LAG(larea) +
(cdfinc - LAG(cdfinc)) * (pcinc + LAG(pcinc))/2 .
END IF.
IF (cdfinc = 1) gini = (.5 - larea)/.5 .
REPORT
/VARIABLES gini (VALUES)
/BREAK (TOTAL) '' (SKIP(1))
/SUMMARY MAX( gini) SKIP(1) '' .
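
As a cross-check outside SPSS, the same eight steps can be sketched in a
few lines of plain Python. This is only an illustration, not part of the
resolution: the function names lorenz_points and gini are made up here,
the sketch assumes a non-empty list of incomes with a positive total,
and it starts the Lorenz curve at the origin (0, 0).

```python
from collections import Counter


def lorenz_points(incomes):
    """Return (cdfinc, pcinc): one Lorenz point per distinct income value."""
    counts = Counter(incomes)      # step 2: persons at each income value
    n = len(incomes)               # total number of persons
    total = sum(incomes)           # step 3: SUMINC
    cdfinc, pcinc = [], []
    cum_persons = 0
    cum_income = 0.0
    for value in sorted(counts):   # step 1: ascending income
        cum_persons += counts[value]
        cum_income += counts[value] * value   # step 4: CINCOME
        cdfinc.append(cum_persons / n)        # step 6: share of persons
        pcinc.append(cum_income / total)      # step 5: share of income
    return cdfinc, pcinc


def gini(incomes):
    """Gini coefficient by trapezoidal integration of the Lorenz curve."""
    cdfinc, pcinc = lorenz_points(incomes)
    area = 0.0
    prev_x, prev_y = 0.0, 0.0      # assumption: curve starts at the origin
    for x, y in zip(cdfinc, pcinc):
        area += (x - prev_x) * (y + prev_y) / 2   # step 8: one trapezoid
        prev_x, prev_y = x, y
    return (0.5 - area) / 0.5


if __name__ == "__main__":
    print(gini([10, 20, 20, 50]))
```

For the four incomes 10, 20, 20 and 50 the area under the curve is 0.35,
so the Gini coefficient is 0.3 (up to floating-point rounding). Running
the same small data set through SPSS and Excel is one way to find where
two calculations start to differ.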




--
Lakshmikanth Makaraju
Research Associate.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: in reply to your gini index calculation in spss

Muir Houston-2
The following links give some reasons why doing statistics in Excel may
be problematic:

http://www.cs.uiowa.edu/~jcryer/JSMTalk2001.pdf

http://www.coventry.ac.uk/ec/~nhunt/pottel.pdf

http://www.practicalstats.com/xlsstats/excelstats.html



Re: in reply to your gini index calculation in spss

SPSS Support
 Hello Lakshmikanth,
  The variable cdfinc, which is represented in the horizontal axis of the Lorenz curve in the resolution that you quoted (Resolution 18022), is defined in Step 6 of the overview section of the resolution.

"6. Use the RANK procedure to store the empirical Cumulative Distribution Function (CDF) result of INCOME in the new variable CDFINC. For every INCOME value, CDFINC will be the proportion of the sample with incomes less than or equal to that value."

As noted there, CDFINC is the empirical distribution function for income. For each observed income value, it does represent the cumulative proportion of persons, i.e. the proportion of persons that have incomes less than or equal to that value.
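
A small Python sketch may make the equivalence concrete. It is an illustration only: the function name ecdf_ties_high is made up here, and it follows the definition quoted in Step 6 (tied incomes all receive the highest cumulative proportion) rather than reproducing the RANK procedure itself.

```python
def ecdf_ties_high(incomes):
    """For each person, the proportion of persons with income <= theirs."""
    n = len(incomes)
    return [sum(1 for other in incomes if other <= x) / n for x in incomes]


if __name__ == "__main__":
    # Four persons; the two with income 20 are tied.
    print(ecdf_ties_high([10, 20, 20, 50]))
```

With incomes 10, 20, 20 and 50, the values are 0.25, 0.75, 0.75 and 1.0: one of four persons earns 10 or less, three of four earn 20 or less, and all four earn 50 or less. That is the cumulative proportion of persons, which is what the horizontal axis of a Lorenz curve requires.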

  On a side note, the AGGREGATE and MATCH FILES commands in Step 3 can now be condensed into a single AGGREGATE command. Adding the keyword "MODE=ADDVARIABLES" copies the summary values to the current active file as new variables, filling in the correct summary value for each case's break group. The revised Step 3 would then become:

* Step 3.
COMPUTE brk = 1.
AGGREGATE OUTFILE = *  MODE = ADDVARIABLES
  /BREAK = brk
  /suminc = SUM(INCOME).

The "MODE=ADDVARIABLES" option was introduced in SPSS 13.

  I hope this helps. I can't speak to what Excel may be doing to calculate the Gini index.

David Matheson
