Quantile

Quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. The q-quantiles are the values that partition a finite, ordered set of data into q subsets of (nearly) equal size. There are q-1 q-quantiles, one for each integer k satisfying 0<k<q. The k-th q-quantile of a random variable is a value x such that the probability that the random variable is less than x is at most \frac{k}{q} and the probability that it is greater than x is at most \frac{q-k}{q}.
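
As a small worked illustration of this definition (the fair six-sided die is an example chosen here, not taken from NM Dev), consider the median, that is q = 2 and k = 1. Both x = 3 and x = 4 satisfy the two conditions, and so does every value in between, which is one reason several competing sample-quantile definitions exist.

```latex
\begin{align*}
&\text{Conditions for the $k$-th $q$-quantile: } &
\Pr[X < x] &\le \frac{k}{q}, &
\Pr[X > x] &\le \frac{q-k}{q} \\
&\text{Fair die, } q = 2,\ k = 1,\ x = 3: &
\Pr[X < 3] &= \frac{2}{6} \le \frac{1}{2}, &
\Pr[X > 3] &= \frac{3}{6} \le \frac{1}{2} \\
&\text{Fair die, } q = 2,\ k = 1,\ x = 4: &
\Pr[X < 4] &= \frac{3}{6} \le \frac{1}{2}, &
\Pr[X > 4] &= \frac{2}{6} \le \frac{1}{2} \\
&\text{Fair die, } q = 2,\ k = 1,\ x = 2: &
\Pr[X < 2] &= \frac{1}{6} \le \frac{1}{2}, &
\Pr[X > 2] &= \frac{4}{6} > \frac{1}{2}
\end{align*}
```

The last row shows that x = 2 fails the second condition, so it is not a median of the die.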

Quantiles can also be applied to continuous distributions, providing a way to generalize rank statistics to continuous variables. When the cumulative distribution function of a random variable is known, the q-quantiles are obtained by applying the quantile function (the inverse of the cumulative distribution function) to the values \{\frac{1}{q},\frac{2}{q},\dots,\frac{q-1}{q}\}.
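
To make the role of the quantile function concrete, here is a minimal sketch in plain Kotlin that does not use NM Dev. It assumes an exponential distribution, whose quantile function has the closed form Q(p) = -ln(1 - p) / lambda. The names exponentialQuantile and qQuantiles are hypothetical helpers introduced only for this illustration.

```kotlin
import kotlin.math.ln

// Hypothetical helper: quantile function (inverse CDF) of an exponential
// distribution with rate lambda. From F(x) = 1 - exp(-lambda * x) it follows
// that Q(p) = -ln(1 - p) / lambda.
fun exponentialQuantile(p: Double, lambda: Double): Double {
    require(p >= 0.0 && p < 1.0) { "p must be in [0, 1)" }
    require(lambda > 0.0) { "lambda must be positive" }
    return -ln(1.0 - p) / lambda
}

// Hypothetical helper: the q-quantiles are the quantile function evaluated
// at 1/q, 2/q, ..., (q-1)/q.
fun qQuantiles(q: Int, quantileFunction: (Double) -> Double): List<Double> {
    require(q >= 2) { "q must be at least 2" }
    return (1 until q).map { k -> quantileFunction(k.toDouble() / q) }
}

fun main() {
    // quartiles (q = 4) of the exponential distribution with rate 1
    val quartiles = qQuantiles(4) { p -> exponentialQuantile(p, 1.0) }
    println(quartiles)
}
```

For rate 1 the three quartiles are ln(4/3), ln 2 and ln 4, roughly 0.288, 0.693 and 1.386.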

In NM Dev, the class Quantile computes the quantiles of a data set. It implements nine quantile definitions:

  1. INVERSE_OF_EMPIRICAL_CDF: the inverse of the empirical distribution function
  2. INVERSE_OF_EMPIRICAL_CDF_WITH_AVERAGING_AT_DISCONTINUITIES: the inverse of the empirical distribution function with averaging at discontinuities
  3. NEAREST_EVEN_ORDER_STATISTICS: the nearest even order statistic as in SAS
  4. LINEAR_INTERPOLATION_OF_EMPIRICAL_CDF: the linear interpolation of the empirical CDF
  5. MIDWAY_THROUGH_STEPS_OF_EMPIRICAL_CDF: a piecewise linear function where the knots are the values midway through the steps of the empirical CDF 
  6. MINITAB_SPSS: the definition in Minitab and SPSS
  7. S: the definition in S
  8. APPROXIMATELY_MEDIAN_UNBIASED: the resulting quantile estimates are approximately median-unbiased regardless of the distribution of the sample
  9. APPROXIMATELY_UNBIASED_IF_DATA_IS_NORMAL: the resulting quantile estimates are approximately unbiased for the expected order statistics if the sample is normally distributed
				
The following example computes quantiles of a small data set under two of these definitions.

// the data set, and the probabilities at which the quantiles are evaluated
val values = doubleArrayOf(0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 6.0, 7.0, 8.0, 9.0)
val qs = doubleArrayOf(1e-10, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
    0.9, 0.95, 1.0)

// APPROXIMATELY_MEDIAN_UNBIASED
println("APPROXIMATELY_MEDIAN_UNBIASED")
// create Quantile object
val quantile1 = Quantile(values, Quantile.QuantileType.APPROXIMATELY_MEDIAN_UNBIASED)

println("Sample size: " + quantile1.N())

for (i in qs) {
    println("Q(" + i + ") = " + quantile1.value(i))
}

println()

// NEAREST_EVEN_ORDER_STATISTICS
println("NEAREST_EVEN_ORDER_STATISTICS")
// create Quantile object
val quantile2 = Quantile(values, Quantile.QuantileType.NEAREST_EVEN_ORDER_STATISTICS)

println("Sample size: " + quantile2.N())

for (i in qs) {
    println("Q(" + i + ") = " + quantile2.value(i))
}

				
			
				
The output is:

APPROXIMATELY_MEDIAN_UNBIASED
Sample size: 10
Q(1.0E-10) = 0.0
Q(0.1) = 0.3666666666666667
Q(0.15) = 0.8833333333333333
Q(0.2) = 1.4
Q(0.3) = 2.4333333333333336
Q(0.4) = 3.0
Q(0.5) = 3.0
Q(0.6) = 4.6
Q(0.7) = 6.566666666666666
Q(0.8) = 7.6
Q(0.9) = 8.633333333333333
Q(0.95) = 9.0
Q(1.0) = 9.0

NEAREST_EVEN_ORDER_STATISTICS
Sample size: 10
Q(1.0E-10) = 0.0
Q(0.1) = 0.0
Q(0.15) = 1.0
Q(0.2) = 1.0
Q(0.3) = 2.0
Q(0.4) = 3.0
Q(0.5) = 3.0
Q(0.6) = 3.0
Q(0.7) = 6.0
Q(0.8) = 7.0
Q(0.9) = 8.0
Q(0.95) = 9.0
Q(1.0) = 9.0
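
The APPROXIMATELY_MEDIAN_UNBIASED values can be reproduced by hand with a short sketch. It assumes that this definition corresponds to the Hyndman and Fan "type 8" estimator, which places the quantile at the fractional position h = (N + 1/3) p + 1/3 in the sorted sample and interpolates linearly between the neighbouring order statistics. That correspondence is an assumption, not something the text above states, although it agrees with the listed output. The function medianUnbiasedQuantile is a hypothetical helper and is not part of NM Dev.

```kotlin
import kotlin.math.floor

// Hypothetical helper, not the NM Dev implementation: Hyndman-Fan "type 8"
// sample quantile (assumed to match APPROXIMATELY_MEDIAN_UNBIASED).
fun medianUnbiasedQuantile(data: DoubleArray, p: Double): Double {
    require(data.isNotEmpty()) { "data must not be empty" }
    require(p >= 0.0 && p <= 1.0) { "p must be in [0, 1]" }
    val x = data.sortedArray()
    val n = x.size
    // fractional 1-based position in the sorted sample
    val h = (n + 1.0 / 3.0) * p + 1.0 / 3.0
    // positions outside [1, n] are clamped to the smallest / largest value
    if (h <= 1.0) return x[0]
    if (h >= n.toDouble()) return x[n - 1]
    val j = floor(h).toInt()   // lower order statistic, 1 <= j <= n - 1
    val g = h - j              // interpolation weight in [0, 1)
    return x[j - 1] + g * (x[j] - x[j - 1])
}

fun main() {
    val values = doubleArrayOf(0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 6.0, 7.0, 8.0, 9.0)
    for (p in doubleArrayOf(0.1, 0.5, 0.6, 0.95)) {
        println("Q($p) = ${medianUnbiasedQuantile(values, p)}")
    }
}
```

For example, with N = 10 and p = 0.6 the position is h = 6.2 + 1/3, about 6.533, so the estimate lies 0.533 of the way from the 6th order statistic (3.0) to the 7th (6.0), which gives 4.6. For p = 0.95 the position exceeds N, so the estimate is the maximum, 9.0. Both agree with the output above up to floating-point rounding.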