Estimation in Mixtures: The Rectangular Case

In this paper we consider the mixture of two rectangular distributions. The methods used for estimating the parameters are (i) the moment ratio method and (ii) extreme value estimation ($\hat{b}$). This problem has been addressed to some extent by Pavan Kumar (2003), in the context of obtaining a stopping rule to estimate the population minimum in sampling with replacement from a finite population. Based on simulation, the extreme value estimation ($\hat{b}$) method performs relatively better than the moment ratio method in terms of the percentage of samples that yield acceptable estimates.


Introduction
Many studies dealing with mixtures of populations have been reported. However, most of them seem to discuss mainly the theoretical aspects of parameter estimation rather than the practical goodness of the estimates obtained by the approaches they propose (Rider (1961), Tallis and Light (1968), Blischke (1962), Craigmile and Titterington (1997), Hussain and Liu (2009)). Further, they seem to assume that the samples from which the parameters are to be estimated are actually mixtures from a given family of distributions, and do not consider whether the sample itself allows one to check whether it has come from a single population or from a mixture population.
The present study tries to address these questions, particularly in the context of a two-component rectangular mixture population.

Application of Finite Mixture Distribution
Mixtures of distributions arise frequently both in biological and physical sciences, reliability in life testing, engineering, medical and social sciences. Here is an example of mixtures.
In fishery biology, it is often desired to measure certain characteristics in a natural population of fish. For this purpose, samples of fish are taken and the desired trait is measured for each fish in the sample. However, many characteristics vary markedly with the age of the fish. The trait then has a distinct distribution for each age group, so that the population has a mixture of distributions (Keewhan Choi and Bulgren (1968)).

Review of Literature
For completeness we mention below some studies in this regard and make a few observations on them. Rider (1961) considered the problem of a mixture of two exponential distributions. He noted that the method of moments can give unacceptable estimates; however, when the parameters $a$ and $b$ of the constituent distributions are not equal ($a \ne b$), the estimates obtained are consistent. He also found the variances of the asymptotic distributions of $\hat{a}$ and $\hat{b}$, and observed that a chi-square test for a single exponential can be misleading: the null hypothesis that the data come from a single exponential often gets accepted when a chi-square goodness-of-fit test is carried out. Tallis and Light (1968) use fractional moments to estimate the parameters of a mixed exponential distribution, compare the method of moments with the maximum likelihood method, and observe that maximum likelihood involves a much larger amount of calculation without increasing the acceptability of the estimates. Blischke (1962) considers a mixture of two binomials with the same sample size $n$ but different parameters $p_1$ and $p_2$, with mixing parameters $p$ and $q = 1 - p$, and points out some theoretical anomalies about the relative efficacy of the estimators. See also Day (1969). Particularly interesting is the recent work of Hussain and Liu (2009), which utilizes the L-moment approach to the estimation of parameters.

The Present Approach
The present approach is expected to be useful particularly where the range of the variate concerned is doubly finite. In particular we consider the case of rectangular two-component mixtures.
Functions of moments are used to link the theory with the observed sample moments, and ratios of these functions (Craigmile and Titterington (1997), Hussain and Liu (2009)) are used to compute the parameter estimates explicitly. Since ratios of sample moments are involved in the process, the estimates are likely to exhibit rather wide fluctuations around the true parameter values, particularly in small samples. Let $a$ and $b$ ($b > a$) be the lower and upper limits of the range of a random variate. Then $\min\{x_i, i = 1, 2, \ldots, n\}$ and $\max\{x_i, i = 1, 2, \ldots, n\}$ are consistent but biased estimators of $a$ and $b$. The relative bias can be (heuristically) assumed to be of order $1/(n+1)$, and asymptotically unbiased estimators of $a$ and $b$ can be taken as $\hat{a} = \min\{x_i, i = 1, 2, \ldots, n\} - 1/(n+1)$ and $\hat{b} = \max\{x_i, i = 1, 2, \ldots, n\} + 1/(n+1)$. The range parameters of the rectangular mixture population are finite and can be estimated efficiently by the sample extreme values; the estimates so obtained can then be plugged into the estimation formulas, thus reducing the fluctuation of the estimates. This has been found to be the case, as revealed by the simulation studies (see Appendix).
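The bias-corrected extreme value estimators described above can be sketched as follows. This is a minimal illustration, assuming the additive correction $1/(n+1)$ stated in the text; the function name `extreme_value_estimates`, the seed and the example range (2, 5) are hypothetical choices made only for demonstration.

```python
import random


def extreme_value_estimates(sample):
    """Return (a_hat, b_hat): sample min and max shifted outward by 1/(n+1)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    correction = 1.0 / (n + 1)  # heuristic bias correction quoted in the text
    a_hat = min(sample) - correction
    b_hat = max(sample) + correction
    return a_hat, b_hat


# Illustrative use on a simulated rectangular sample (range chosen arbitrarily).
rng = random.Random(0)
sample = [rng.uniform(2.0, 5.0) for _ in range(50)]
a_hat, b_hat = extreme_value_estimates(sample)
print(a_hat, b_hat)
```

Because the sample minimum can never fall below the true lower limit and the sample maximum can never exceed the true upper limit, the correction always pushes both estimates outward.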

The paper is divided into two sections
Section A deals with the simpler case, where the mixture components have the ranges $0 \le x_1 \le a$, $0 \le x_2 \le b$ ($b > a$) and mixing ratio $p:q$, $p + q = 1$. It may be noted that this case does not seem to have been considered so far.
Explicit formulas linking functions of moments with simple functions of the parameters are obtained, and explicit expressions for the parameter estimates in terms of ratios of these moment functions are derived. This is followed by a simulation study to examine the practical utility of the approach.
For a few selected sets of parameter values $a$, $b$ and $p$ ($q = 1 - p$), ns = 100 samples of each of the sizes n = 20, 50, 100 are generated, and estimates are computed from these samples.
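A minimal sketch of how such samples could be generated is given below. It assumes the standard two-stage construction of a mixture (choose the component with probability $p$, then draw uniformly from its range); the function names and the parameter values a = 1, b = 2, p = 0.3 are hypothetical and are not claimed to be those used in the study.

```python
import random


def sample_rectangular_mixture(a, b, p, n, rng):
    """Draw n values: from U(0, a) with probability p, otherwise from U(0, b)."""
    if not (0 < a < b) or not (0 <= p <= 1):
        raise ValueError("require 0 < a < b and 0 <= p <= 1")
    return [
        rng.uniform(0.0, a) if rng.random() < p else rng.uniform(0.0, b)
        for _ in range(n)
    ]


def generate_samples(a, b, p, sizes=(20, 50, 100), ns=100, seed=0):
    """Return {n: list of ns samples of size n} for each sample size n."""
    rng = random.Random(seed)
    return {
        n: [sample_rectangular_mixture(a, b, p, n, rng) for _ in range(ns)]
        for n in sizes
    }


samples = generate_samples(a=1.0, b=2.0, p=0.3)
print({n: len(group) for n, group in samples.items()})
```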
As already noted, one often finds the observed estimates to be unreasonable ($a$ or $b$ being negative, or $p$ not being in the range $0 \le p \le 1$). For the same samples, the estimates are computed using the formulas directly and also by modifying them by first estimating $b$ ($> a$) by the sample maximum (with a heuristically obtained correction for bias). It is found that in the latter case the estimates are more likely to be acceptable.
Section B deals with the more general 5-parameter case, where the two component populations have range parameters $a_1 \le x_1 \le b_1$, $a_2 \le x_2 \le b_2$ and mixing ratio $p:q$. Three cases arise in this context. The distribution may have (1) a gap ($a_1 < b_1 < a_2 < b_2$), (2) an overlap ($a_1 < a_2 < b_1 < b_2$), or (3) one range imbedded within the other ($a_1 < a_2 < b_2 < b_1$). Here $a_1$ and the maximum of ($b_1$, $b_2$) are estimated quite efficiently by the sample minimum and maximum, respectively. Estimating the other parameters is more complicated. Attempts are nevertheless made to distinguish between the three possibilities (gap, overlap and imbedding) on the basis of the sample itself, since this may make further study of estimation easier. These are also explored through simulation.
BMSA Volume 2

Moment Ratio Method:
First, we consider the case of ranges with common origin 0 ($0 \le x_1 \le a$, $0 \le x_2 \le b$, mixing ratio $p:q$). Let $f_1(x)$ and $f_2(x)$ be the densities of the two populations, with mixing ratio $p:q$. Then
$f(x) = p f_1(x) + q f_2(x)$.
Let $m_k'$ denote the $k$th raw moment. In the present case, the raw moments of the mixture are
$m_k' = \dfrac{p a^k + q b^k}{k + 1}$.
The first four raw moments then give
$pa + qb = 2m_1'$ (1a)
$pa^2 + qb^2 = 3m_2'$ (1b)
$pa^3 + qb^3 = 4m_3'$ (1c)
$pa^4 + qb^4 = 5m_4'$ (1d)
Hence,
$\ldots{}^{2}\,(a^{2} + b^{2} + ab)$ (2c)
Here $s$, $s_1$ and $s_2$ are functions of the raw moments, from which it follows that … Solving (3a) and (3b) using the observed (that is, sample) moments, one gets the estimates as $\hat{a} =$ … However, the expression for $a - b$ involves a square root, with two possible solutions for $\hat{a} - \hat{b}$, which may lead to negative values of $a$ and $b$ and possibly also to complex numbers as the estimates.
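The following sketch shows one way the moment ratio estimates could be computed. Since the intermediate equations are not fully legible in the source, it rests on a reconstruction that should be treated as an assumption: writing $s = 3m_2' - 4m_1'^2$, $s_1 = 4m_3' - 6m_1'm_2'$ and $s_2 = 5m_4' - 8m_1'm_3'$, the relations (1a) to (1d) imply $s = pq(a-b)^2$, $s_1 = s(a+b)$ and $s_2 = s(a^2 + b^2 + ab)$, so that $a + b = s_1/s$ and $(a-b)^2 = 4s_2/s - 3(s_1/s)^2$. The identities themselves follow algebraically from (1a) to (1d); their labelling as $s$, $s_1$, $s_2$ and the function names are hypothetical.

```python
import math


def raw_moments(sample, orders=(1, 2, 3, 4)):
    """Sample raw moments m_k' = (1/n) * sum(x^k) for the requested orders."""
    n = len(sample)
    return [sum(x ** k for x in sample) / n for k in orders]


def moment_ratio_estimates(m1, m2, m3, m4):
    """Return (a_hat, b_hat, p_hat), or None when no real solution exists."""
    s = 3 * m2 - 4 * m1 ** 2       # assumed to equal pq (a - b)^2
    s1 = 4 * m3 - 6 * m1 * m2      # assumed to equal pq (a - b)^2 (a + b)
    s2 = 5 * m4 - 8 * m1 * m3      # assumed to equal pq (a - b)^2 (a^2 + ab + b^2)
    if s == 0:
        return None
    total = s1 / s                       # estimate of a + b
    disc = 4 * s2 / s - 3 * total ** 2   # estimate of (a - b)^2
    if disc <= 0:
        return None                      # square root would be zero or complex
    diff = math.sqrt(disc)
    a_hat = (total - diff) / 2           # root chosen so that a_hat < b_hat
    b_hat = (total + diff) / 2
    p_hat = (b_hat - 2 * m1) / (b_hat - a_hat)   # from (1a) with q = 1 - p
    return a_hat, b_hat, p_hat


# Check with exact population moments for the hypothetical a = 1, b = 2, p = 0.3.
a, b, p = 1.0, 2.0, 0.3
exact = [(p * a ** k + (1 - p) * b ** k) / (k + 1) for k in (1, 2, 3, 4)]
print(moment_ratio_estimates(*exact))
```

With sample moments in place of exact ones, `disc` can turn negative and the ratios can be unstable when `s` is close to zero, which is consistent with the unacceptable estimates discussed in the text.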

Extreme Value Estimation (  b )
This problem is considerably mitigated by first estimating $b$ ($> a$) by … (… being greater than 1), since $\max(x)$ will necessarily underestimate $b$.
Using this estimate of $b$ in the subsequent steps of estimation considerably reduces the likelihood of getting unacceptable estimates of $a$ and $b$.
Here, we can use the first two raw moments. Substituting $\hat{b}$ in (1a) and (1b) above, we get the other two estimates, $\hat{a}$ and $\hat{p}$.
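The substitution step can be sketched as follows. With $q = 1 - p$, (1a) gives $p(a - b) = 2m_1' - b$ and (1b) gives $p(a^2 - b^2) = 3m_2' - b^2$; dividing the second by the first yields $a + b$, from which $a$ and then $p$ follow. The bias correction applied to the sample maximum is not legible at this point in the source, so the additive correction $1/(n+1)$ used here is an assumption, and the function names are hypothetical.

```python
def solve_given_b(m1, m2, b_hat):
    """Solve (1a) and (1b) for (a_hat, p_hat) once b_hat is fixed."""
    denom = 2 * m1 - b_hat                 # equals p (a - b) by (1a)
    if denom == 0:
        return None
    a_hat = (3 * m2 - b_hat ** 2) / denom - b_hat   # (a + b) minus b
    if a_hat == b_hat:
        return None
    p_hat = denom / (a_hat - b_hat)
    return a_hat, p_hat


def extreme_value_method(sample):
    """Return (a_hat, b_hat, p_hat) with b_hat taken from the sample maximum."""
    n = len(sample)
    m1 = sum(sample) / n
    m2 = sum(x * x for x in sample) / n
    b_hat = max(sample) + 1.0 / (n + 1)    # assumed additive bias correction
    solved = solve_given_b(m1, m2, b_hat)
    if solved is None:
        return None
    a_hat, p_hat = solved
    return a_hat, b_hat, p_hat


# Check with exact moments for the hypothetical a = 1, b = 2, p = 0.3.
print(solve_given_b(0.85, 3.1 / 3.0, 2.0))
```

Only the first two moments enter this calculation, so the higher-order sample moments, which fluctuate most in small samples, are avoided.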
We now present simulation results for a few parameter values. For each parameter set, ns = 100 samples of each of the sizes n = 20, 50, 100 are generated. The mean, standard deviation, minimum and maximum of the ns estimates of each parameter are reported, along with the percentage of acceptable estimates, for both methods. We shall now present the more general 5-parameter case.
We compute the first two raw moments and the standard deviation for each of these ns samples. Table 2(a) presents the first two raw moments and the standard deviation for $a_1$, $b_1$, $a_2$, $b_2$ and different values of $p$. This helps in arriving at a heuristic for deciding on the type (relative range position) of the two component populations giving rise to the mixture.
Estimates of $a$, $b$ and $p$ obtained by (i) the moment ratio method and (ii) extreme value estimation (estimating $\hat{b}$ first and then the other two parameters) are given for the five chosen values of $p$. Also given are the percentages of samples yielding acceptable estimates of the parameters by the two approaches. Thus, for instance, from Table 1(b) one has for $p = 0.3$ that only 38% of the samples give acceptable estimates for $a$, $b$ as well as $p$, while the corresponding figures for the second approach are 65%, 65% and 37%. In general, the estimates ($\hat{a}$, $\hat{b}$, $\hat{p}$) appear to be much nearer to the true values in the latter approach. As seen from Table 1(c), when the sample size is 100 the performance of the moment ratio approach remains quite poor, while the other approach is on the whole very satisfactory. However, even in the latter approach a few samples give absurd results, though the number of such samples is quite small. One possible cause is that these samples contain outliers or exhibit bunching in the sampling process.
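A comparison of this kind can be sketched as a small simulation. The code below is illustrative only and does not attempt to reproduce the reported tables. It assumes the moment functions $3m_2' - 4m_1'^2$, $4m_3' - 6m_1'm_2'$ and $5m_4' - 8m_1'm_3'$ implied by (1a) to (1d), an additive correction $1/(n+1)$ for the sample maximum, and a joint acceptability criterion ($\hat{a} > 0$, $\hat{b} > 0$, $0 \le \hat{p} \le 1$), whereas the text reports percentages per parameter. All names and the parameter values a = 1, b = 2, p = 0.3 are hypothetical.

```python
import math
import random


def draw_sample(a, b, p, n, rng):
    """n draws: U(0, a) with probability p, otherwise U(0, b)."""
    return [rng.uniform(0.0, a if rng.random() < p else b) for _ in range(n)]


def raw_moment(sample, k):
    return sum(x ** k for x in sample) / len(sample)


def moment_ratio(sample):
    """Estimates from four raw moments; None if no real solution exists."""
    m1, m2, m3, m4 = (raw_moment(sample, k) for k in (1, 2, 3, 4))
    s = 3 * m2 - 4 * m1 ** 2
    if s == 0:
        return None
    total = (4 * m3 - 6 * m1 * m2) / s                      # a + b
    disc = 4 * (5 * m4 - 8 * m1 * m3) / s - 3 * total ** 2  # (a - b)^2
    if disc <= 0:
        return None
    b_hat = (total + math.sqrt(disc)) / 2
    a_hat = total - b_hat
    return a_hat, b_hat, (b_hat - 2 * m1) / (b_hat - a_hat)


def extreme_value(sample):
    """Estimate b from the sample maximum, then solve (1a), (1b) for a and p."""
    n = len(sample)
    m1, m2 = raw_moment(sample, 1), raw_moment(sample, 2)
    b_hat = max(sample) + 1.0 / (n + 1)   # assumed additive bias correction
    denom = 2 * m1 - b_hat
    if denom == 0:
        return None
    a_hat = (3 * m2 - b_hat ** 2) / denom - b_hat
    if a_hat == b_hat:
        return None
    return a_hat, b_hat, denom / (a_hat - b_hat)


def acceptable(est):
    """Joint criterion (an assumption): a_hat, b_hat > 0 and 0 <= p_hat <= 1."""
    if est is None:
        return False
    a_hat, b_hat, p_hat = est
    return a_hat > 0 and b_hat > 0 and 0 <= p_hat <= 1


def percent_acceptable(a, b, p, n, ns=100, seed=0):
    """Percentage of acceptable samples for (moment ratio, extreme value)."""
    rng = random.Random(seed)
    samples = [draw_sample(a, b, p, n, rng) for _ in range(ns)]
    return tuple(
        100.0 * sum(acceptable(method(s)) for s in samples) / ns
        for method in (moment_ratio, extreme_value)
    )


for n in (20, 50, 100):
    print(n, percent_acceptable(1.0, 2.0, 0.3, n))
```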

Simulation Results: (5 -parameter case)
Simulation is carried out for the parameters $a_1 = 2$, $b_1 = 3$, $a_2 = 4$, $b_2 = 5$. For each of the three cases, ns = 100 samples of size n = 50 were generated for $p$ = 0.1, 0.3, 0.5, 0.7, 0.9, as well as for the case of a single rectangular distribution with range $a_1$ to $b_2$. The theoretical moments were computed, and the corresponding sample moments were obtained by simulation. Results are given in Table 2(a). It was hoped that these results would suggest methods for distinguishing between the three possible range relationships by comparing the moments for the single-population case (with $a_1$, $b_2$ given) with those for the three mixture cases. However, the investigation so far has not yielded any definite suggestion in this regard, except that the second raw moment appears considerably larger in the cases of (i) overlap and (ii) gap; in these two cases it can be large when $p$ is small and could become smaller when $p$ is larger. However, as is only to be expected, estimation of $a_1$ and $b_2$ by the sample minimum and maximum gives very satisfactory results.
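The theoretical quantities involved can be sketched as follows, using the standard raw moments of a rectangular distribution on $(l, u)$, namely $(u^{k+1} - l^{k+1}) / ((k+1)(u - l))$, and the fact that the raw moments of a mixture are the $p:q$ weighted averages of the component moments. The function names are hypothetical, and the output is not claimed to match Table 2(a).

```python
import math


def uniform_raw_moment(lo, hi, k):
    """k-th raw moment of the rectangular distribution on (lo, hi)."""
    return (hi ** (k + 1) - lo ** (k + 1)) / ((k + 1) * (hi - lo))


def mixture_summary(a1, b1, a2, b2, p):
    """First two raw moments and standard deviation of the p:q mixture."""
    q = 1.0 - p
    m1 = p * uniform_raw_moment(a1, b1, 1) + q * uniform_raw_moment(a2, b2, 1)
    m2 = p * uniform_raw_moment(a1, b1, 2) + q * uniform_raw_moment(a2, b2, 2)
    return m1, m2, math.sqrt(m2 - m1 ** 2)


def single_summary(lo, hi):
    """Same summary for a single rectangular distribution on (lo, hi)."""
    m1 = uniform_raw_moment(lo, hi, 1)
    m2 = uniform_raw_moment(lo, hi, 2)
    return m1, m2, math.sqrt(m2 - m1 ** 2)


# Gap configuration from the text: a1 = 2, b1 = 3, a2 = 4, b2 = 5.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, mixture_summary(2.0, 3.0, 4.0, 5.0, p))
print("single", single_summary(2.0, 5.0))
```

At $p = 0.5$ the gap mixture and the single rectangular distribution on (2, 5) share the same mean, so any difference between them has to show up in the second moment or the standard deviation.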

Conclusions
1. Even when the sample size is relatively small, the extreme value estimation approach gives more encouraging results than the moment ratio approach: with extreme value estimation ($\hat{b}$), the estimates appear to be much nearer to the true values than those from the moment ratio method.