Ranking and Selection Procedure for Gini Index

The Gini index, which is derived from the Lorenz curve and measures income inequality within a population, can be applied to the ranking and selection of populations. Many procedures are available for ordering and ranking income distributions where the ordering is not linear. However, researchers are often interested not in ordering the populations but in selecting the best (or worst) of the available populations, that is, the one indicating a lower (or higher) level of income disparity within the population. Madhuri S. Mulekar (2005) discussed the estimation of overlap of income distributions and selection in terms of the Gini measure of income inequality. In this paper, we simulate the ranking and selection of populations based on the Gini index of income inequality for the case in which the variances of the income distributions are equal and known, and for the case in which they are unequal but known.


Introduction
There are many different numerical measures, such as the Gini coefficient (Soltow, 1971), coefficient of variation (Braun, 1988), Theil index (Theil, 1967), Atkinson ratio (Atkinson, 1970), and Nelson ratio (Nelson, 1984) among others, used to express the degree of inequality or variability in income among the members of a given population. However, Lorenz curve-based measures seem to be the most popular ones used in practice. Of those, the Gini coefficient is the most widely used Lorenz curve-based measure of income inequality. Besides income or wealth inequality, the Gini index is also used to study inequality in health (Illsey and Le Grand, 1987), inter-individual inequality in age at death (Anand, et al., 2001), and life tables (Shkolnikov, Andreev, and Begun, 2003). Income inequality measures range from simple to complex. When comparing different populations, the measured differences in coefficients are often small. Although methods of estimating the variances of some inequality measures (Gastwirth, 1974; Sandstrom, et al., 1988) have already been developed, these measures are very commonly used in applications, for comparative purposes and trend analysis, without any consideration of sampling variation. In other words, comparisons are carried out based simply on point estimates of the coefficients. The Gini index is no exception. For example, Braun (1988) ranked U.S. states in terms of several family income inequality measures, and Moore (1996) ranked income distributions in 26 countries using geometric mean and Atkinson index related generalized welfare measures, without estimating standard errors. Karoly (1990) compared inequality in individual wage and salary income for the years 1967 to 1986 using ten different inequality measures, including the Gini index, and estimated the variances of eight of the ten measures. Again, the variance of the Gini coefficient was not estimated because of the expense involved in its computation.
Sometimes researchers are not interested in just ranking the populations by income inequality; they are more interested in selecting (or identifying), from a group of populations, the population with the least (or most) inequality. Here we simulate a procedure for selecting a population based on the Gini coefficient and obtain the optimal sample size needed to make such a selection.

Gini index and calculation method
Suppose $f(x)$ is the density of income $X$, and $F(x) = \int_0^x f(u)\,du$ is the cumulative distribution function (CDF) of $X$ for the individuals or families. For mean income of the population, $\mu = \int_0^\infty x f(x)\,dx$, the Gini index is defined as
\[ G = 1 - 2\int_0^\infty \Phi(x) f(x)\,dx. \]
Here $\Phi(x)$ is the share of the total income received by the population with income less than or equal to $x$. It is defined as
\[ \Phi(x) = \frac{1}{\mu}\int_0^x u f(u)\,du \]
and is known as the Lorenz curve. The Gini index is thus a function of the Lorenz curve. Consider the empirical Lorenz curve characterized by a set of ordinates $\{\Phi_i \mid i = 1, 2, \ldots, L\}$ corresponding to abscissas $\{p_i \mid i = 1, 2, \ldots, L\}$. An income quantile $\xi_p$ on a Lorenz curve corresponding to abscissa $p$ $(0 < p < 1)$ is given by $F(\xi_p) = p$. Then, corresponding to a set of $L$ abscissas $p_1 < p_2 < \cdots < p_L$, there is a set of $L$ population income quantiles $\xi_1 < \xi_2 < \cdots < \xi_L$ and a set of $L$ population Lorenz curve ordinates
\[ \Phi_i = \Phi(\xi_i) = \frac{p_i \gamma_i}{\mu}, \qquad i = 1, 2, \ldots, L. \]
This result is obtained from equation 1 of Beach and Davidson (1983), where $\gamma_i$ $(i = 1, 2, \ldots, L)$ is the conditional mean of incomes less than or equal to $\xi_i$, i.e.
\[ \gamma_i = E(X \mid X \le \xi_i). \]
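Denote the conditional variance of $x$ given quantile $\xi_i$ $(i = 1, 2, \ldots, L)$ by
\[ \lambda_i^2 = E\big[(X - \gamma_i)^2 \mid X \le \xi_i\big]. \]
Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ from the given population. Denote the ordered observations by $x_{[1]} \le x_{[2]} \le \cdots \le x_{[n]}$. Then the $p$th income quantile $\xi_p$ can be estimated by the $r$th order statistic $x_{[r]}$, where $r = [np]$ is the greatest integer less than or equal to $np$. The sample estimates of the Lorenz curve ordinates can be computed as (Beach and Davidson, 1983, Eq 3)
\[ \hat\Phi_i = \frac{p_i \hat\gamma_i}{\hat\mu}, \qquad \hat\gamma_i = \frac{1}{r_i}\sum_{j=1}^{r_i} x_{[j]}, \qquad \hat\mu = \frac{1}{n}\sum_{j=1}^{n} x_j, \]
with $r_i = [np_i]$. Then the Gini coefficient can be estimated from the sampled values as
\[ \hat G = 1 - \sum_{i=0}^{L} (p_{i+1} - p_i)(\hat\Phi_{i+1} + \hat\Phi_i), \]
with $p_0 = \hat\Phi_0 = 0$ and $p_{L+1} = \hat\Phi_{L+1} = 1$. The Gini coefficient can also be described as measuring how far the Lorenz curve departs from the line of equality. Its sampling variance is determined by the covariances $v_{ii'}$ of the Lorenz curve ordinates:
\[ \mathrm{Var}(\hat G) \approx \frac{1}{n}\sum_{i=1}^{L}\sum_{i'=1}^{L} (p_{i+1} - p_{i-1})(p_{i'+1} - p_{i'-1})\, v_{ii'}, \]
where for $i \le i' = 1, 2, \ldots, L$,
\[ \omega_{ii'} = p_i\big[\lambda_i^2 + (1 - p_{i'})(\xi_i - \gamma_i)(\xi_{i'} - \gamma_{i'}) + (\xi_i - \gamma_i)(\gamma_{i'} - \gamma_i)\big] \]
and
\[ v_{ii'} = \frac{1}{\mu^2}\,\omega_{ii'} + \frac{p_i\gamma_i}{\mu^2}\,\frac{p_{i'}\gamma_{i'}}{\mu^2}\,\sigma^2 - \frac{p_i\gamma_i}{\mu^3}\,\omega_{i',L+1} - \frac{p_{i'}\gamma_{i'}}{\mu^3}\,\omega_{i,L+1}. \]
Note that $V = [v_{ii'}]$ gives the covariance matrix of the Lorenz curve ordinates (Beach and Davidson, 1983, and Gail and Gastwirth, 1978), $\xi_{L+1}$ is the maximum income in the population (so that $p_{L+1} = 1$ and $\gamma_{L+1} = \mu$), and $\sigma^2$ is the variance of incomes in the population.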
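
The estimation steps of this section can be sketched in code. The following is a minimal illustration, assuming that the Lorenz ordinates are estimated as $p_i\hat\gamma_i/\hat\mu$ with $r_i = [np_i]$ and that the Gini coefficient is obtained from them by the trapezoidal rule. The function names `lorenz_ordinates` and `gini_from_lorenz` are hypothetical and do not come from the cited papers.

```python
import math


def lorenz_ordinates(incomes, abscissas):
    """Estimate Lorenz curve ordinates at abscissas p_1 < ... < p_L."""
    xs = sorted(incomes)                      # ordered observations x_[1] <= ... <= x_[n]
    n = len(xs)
    mean = sum(xs) / n                        # estimate of the mean income
    ordinates = []
    for p in abscissas:
        r = math.floor(n * p + 1e-9)          # r = [np]; small offset guards against float error
        if r == 0:
            ordinates.append(0.0)
            continue
        gamma_hat = sum(xs[:r]) / r           # mean of the r smallest incomes
        ordinates.append(p * gamma_hat / mean)
    return ordinates


def gini_from_lorenz(abscissas, ordinates):
    """Trapezoidal estimate: 1 - sum (p_{i+1} - p_i)(Phi_{i+1} + Phi_i)."""
    ps = [0.0] + list(abscissas) + [1.0]      # add end points (0, 0) and (1, 1)
    phis = [0.0] + list(ordinates) + [1.0]
    twice_area = sum(
        (ps[i + 1] - ps[i]) * (phis[i + 1] + phis[i]) for i in range(len(ps) - 1)
    )
    return 1.0 - twice_area


deciles = [i / 10 for i in range(1, 10)]
sample = [12.0, 15.5, 9.0, 30.0, 22.0, 18.0, 11.0, 45.0, 16.0, 20.0]
print(gini_from_lorenz(deciles, lorenz_ordinates(sample, deciles)))
```

With only a few abscissas the trapezoidal rule treats the Lorenz curve as piecewise linear, so a finer grid of $p_i$ generally gives an estimate closer to the full-sample Gini coefficient.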

Selection procedure
Consider $k$ independent populations, $\pi_1, \pi_2, \ldots, \pi_k$. Let $f_i(x)$ be the density function of income in population $\pi_i$ $(i = 1, 2, \ldots, k)$. Suppose the Gini coefficient $G_i$ is the true income inequality existing in $\pi_i$ $(i = 1, 2, \ldots, k)$. Of course the true value of $G_i$ in the population is unknown, and so it is also unknown which population has the largest coefficient. Here the problem considered is that of selecting the population corresponding to the largest income inequality, i.e. $\max\{G_i, i = 1, 2, \ldots, k\}$. The ordered values of the inequality measures are denoted by $G_{[1]} \le G_{[2]} \le \cdots \le G_{[k]}$. Selection of the population corresponding to $G_{[k]}$, or of any population with income inequality measure equal to $G_{[k]}$, is considered a correct selection (CS). The problem of selecting the population with the least inequality, i.e. the most equal distribution of incomes, can be treated similarly by selecting the population corresponding to $G_{[1]}$. Denote the estimated Gini coefficient for the population with coefficient $G_{[i]}$ by $\hat G_{(i)}$. Let us denote the parameter space, $\Omega$, as
\[ \Omega = \{(G_1, G_2, \ldots, G_k) : 0 \le G_i \le 1,\ i = 1, 2, \ldots, k\}. \]
Although we would like to select the unique population associated with $G_{[k]}$, for a specified difference or distance $d^*$ any population in the indifference zone (IZ), i.e. any population whose Gini coefficient lies within $d^*$ of $G_{[k]}$, will be accepted as a correct decision. Therefore, if more than one population belongs to IZ, we use some randomization scheme to select a population from IZ as the best one. If a non-randomized scheme is used, it should be uncorrelated with the risk involved.
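
The selection rule with randomized tie-breaking can be illustrated with a short sketch. This is only one possible reading of the procedure: the function name `select_best` and the optional tolerance `d_star` applied to the estimates are assumptions made for illustration, not part of the source.

```python
import random


def select_best(gini_estimates, d_star=0.0, rng=None):
    """Return the index of the population judged to have the largest Gini index.

    Every population whose estimate lies within d_star of the largest
    estimate is treated as a contender (d_star = 0 means exact ties only),
    and one contender is drawn with equal probability.
    """
    rng = rng or random.Random()
    top = max(gini_estimates)
    contenders = [i for i, g in enumerate(gini_estimates) if top - g <= d_star]
    return rng.choice(contenders)


print(select_best([0.21, 0.35, 0.28]))  # index of the largest estimate
```

Passing a seeded `random.Random` instance makes the tie-breaking reproducible, which is convenient in simulation studies.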

In case of variance equality
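Select the population corresponding to the largest estimated Gini coefficient, $\hat G_{[k]} = \max\{\hat G_i, i = 1, 2, \ldots, k\}$, as the best population. If there are tied populations, select one of the contenders for first place by giving equal opportunity to each of the contenders. For a predetermined risk, we are interested in obtaining the optimal sample size needed to control this risk. Note that the population with the largest inequality does not necessarily produce the sample with the largest estimate of the Gini coefficient. Using the complement of the risk, also referred to as the probability of correct selection, P(CS), the above problem reduces to that of determining the optimal sample size needed for
\[ P(CS) \ge p^* \quad \text{whenever} \quad G_{[k]} - G_{[k-1]} \ge d^*. \]
Here $p^*$ (the desired probability of correct selection) and $d^*$ (the desired distance between the best population and the second best population) are specified by the experimenter before sampling begins. The experimenter uses experience and judgment in specifying the constants $(d^*, p^*)$. The sample size is an increasing function of $p^*$ and a decreasing function of $d^*$. With the estimated Gini coefficient for the population with coefficient $G_{[i]}$ denoted by $\hat G_{(i)}$, the P(CS) is given by
\[ P(CS) = P\big(\hat G_{(k)} > \hat G_{(i)},\ i = 1, 2, \ldots, k-1\big) \]
when $n_1 = n_2 = \cdots = n_k = n$, where $n_i$ is the size of the sample from the $i$th population. Assuming equal variances, i.e. $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$, where $\sigma_i^2/n$ is the variance of the estimated Gini coefficient of the $i$th population, the probability of correct selection, evaluated where the best population leads each of the others by exactly $d^*$, becomes
\[ P(CS) = \int_{-\infty}^{\infty} H^{k-1}(y + \tau)\,dH(y). \]
Here $H(y)$ is the CDF of the standard normal variate. The relation between the constant $\tau$ and the distance $d^*$ is given by
\[ \tau = \frac{d^*\sqrt{n}}{\sigma}. \]
Here $\tau$ can be interpreted as the standardized difference between Gini coefficients, expressed in the scale of $\sigma/\sqrt{n}$ instead of the original scale. Choosing $\tau$ so that the integral equals $p^*$, equating it with $d^*\sqrt{n}/\sigma$ and solving for $n$, we get
\[ n = \left(\frac{\tau\sigma}{d^*}\right)^2. \]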
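
The sample-size calculation can be sketched numerically. The code below assumes the standard form $P(CS) = \int H^{k-1}(y+\tau)\,dH(y)$ with $\tau = d^*\sqrt{n}/\sigma$, evaluates the integral by Simpson's rule, finds by bisection the $\tau$ for which it equals $p^*$, and rounds $n = (\tau\sigma/d^*)^2$ up to an integer. The function names, the integration limits and the step count are illustrative assumptions, and the value passed for `sigma` in the example is hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is H, pdf is its density


def prob_correct_selection(tau, k, lo=-8.0, hi=8.0, steps=2000):
    """Integrate H(y + tau)^(k-1) * h(y) over [lo, hi] by Simpson's rule."""
    width = (hi - lo) / steps                 # steps must be even
    total = 0.0
    for j in range(steps + 1):
        y = lo + j * width
        weight = 1 if j in (0, steps) else (4 if j % 2 else 2)
        total += weight * _STD.cdf(y + tau) ** (k - 1) * _STD.pdf(y)
    return total * width / 3


def solve_tau(p_star, k, tol=1e-6):
    """Bisection for the tau at which P(CS) equals p_star (P(CS) increases in tau)."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_correct_selection(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_size(p_star, d_star, sigma, k):
    """Smallest integer n with n >= (tau * sigma / d_star)^2."""
    tau = solve_tau(p_star, k)
    return math.ceil((tau * sigma / d_star) ** 2)


print(sample_size(p_star=0.90, d_star=0.05, sigma=0.1, k=5))
```

For $k = 2$ the integral reduces to $H(\tau/\sqrt{2})$, which offers a quick check on the numerical routine.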

In case of unequal but known variances:
In this situation, first calculate $n_0$, where $n_0$ is given by the following equation:
\[ n_0 = \left(\frac{\tau\sigma_0}{d^*}\right)^2. \]
Because this method is based on the equality of variances, here we take $\sigma_0 = 1$, and $\tau$ is the constant determined by $k$ and $p^*$ as in the equal-variance case.
If only one population satisfies the selection condition, it is the best population; if more than one population satisfies the condition, we use some randomization scheme to select one of them as the best population. In this article we take $p^*$ equal to 0.90.
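
One common way to use $n_0$ when the variances are unequal but known is to allocate sample sizes in proportion to the variances, so that $\sigma_i^2/n_i$ is the same for every population and the equal-variance theory applies with unit variance. The text does not spell out this allocation, so the rule $n_i = n_0\sigma_i^2$ in the sketch below is an assumption, as are the function name `allocate_sample_sizes` and the example values.

```python
import math


def allocate_sample_sizes(tau, d_star, sigmas):
    """Assumed allocation: n_i = ceil(n_0 * sigma_i^2), with n_0 = (tau / d_star)^2.

    tau    : constant determined by k and p* (for example from a table)
    d_star : smallest difference between Gini coefficients worth detecting
    sigmas : known standard deviations sigma_1, ..., sigma_k
    """
    n0 = (tau / d_star) ** 2                  # equal-variance formula with sigma_0 = 1
    return [math.ceil(n0 * s ** 2) for s in sigmas]


# Hypothetical inputs chosen only to show the call.
print(allocate_sample_sizes(tau=2.0, d_star=0.5, sigmas=[1.0, 1.5, 2.0]))
```

Under this allocation, populations with larger variances receive proportionally larger samples.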

Conclusion:
We consider five hypothetical normal populations with different parameters as the income density populations. The means and standard deviations of the populations differ. Here the population with the largest Gini index is defined as the best population. We consider several parameter settings and compare the results obtained for each. The results are shown in the rows, columns and diagonals of the tables above.
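
A simulation of this kind can be sketched as follows. The means and standard deviations are hypothetical and are not the values used in the study. For simplicity the sketch uses the order-statistic form of the sample Gini coefficient rather than the Lorenz-ordinate estimate, and it identifies the truly best population through the ratio of standard deviation to mean, since the Gini index of a normal distribution equals $\sigma/(\mu\sqrt{\pi})$.

```python
import random


def sample_gini(xs):
    """Sample Gini coefficient: sum((2i - n - 1) * x_[i]) / (n * sum(x))."""
    xs = sorted(xs)
    n = len(xs)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * sum(xs))


def estimate_pcs(means, sds, n, reps=2000, seed=0):
    """Monte Carlo estimate of the probability of correct selection."""
    rng = random.Random(seed)
    ratios = [s / m for m, s in zip(means, sds)]      # proportional to the true Gini index
    best = max(range(len(means)), key=ratios.__getitem__)
    hits = 0
    for _ in range(reps):
        ginis = [
            sample_gini([rng.gauss(m, s) for _ in range(n)])
            for m, s in zip(means, sds)
        ]
        top = max(ginis)
        pick = rng.choice([i for i, g in enumerate(ginis) if g == top])  # random tie-break
        hits += pick == best
    return hits / reps


# Hypothetical populations: equal means, increasing standard deviations.
means = (100.0, 100.0, 100.0, 100.0, 100.0)
sds = (10.0, 12.0, 14.0, 16.0, 18.0)
print(estimate_pcs(means, sds, n=50, reps=500))
```

Repeating the call over a grid of sample sizes shows how the estimated probability of correct selection approaches the target $p^*$ as $n$ grows.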