Package dev.nm.stat.test.distribution
Class AndersonDarlingPValue
- java.lang.Object
  - dev.nm.stat.test.distribution.AndersonDarlingPValue
public class AndersonDarlingPValue extends Object
This algorithm calculates the p-value for a given Anderson-Darling statistic and number of samples. The calculation is based on the interpolation formula (section 4, p. 920): \[ t_m\left ( \alpha \right ) = b_0 + \frac{b_1}{\sqrt m} + \frac{b_2}{m} \] where the coefficients for each α are estimated by OLS regression on the data in Table 1, and m is the total number of samples minus 1.

We use a two-step procedure to interpolate the data in Table 1.

In the first step, the independent variables (regressors) are 1/√m and 1/m, where m = 1, ..., 10, 1000000. The dependent variable is the statistic corresponding to each of the upper percentiles 0.25, 0.1, 0.05, 0.025, 0.01. For each percentile, the value predicted at the actual number of samples minus 1 is stored. This step therefore consists of 5 OLS regressions and yields 5 predicted values.

In the second step, the independent variables are the 5 predicted values and their squares, and the dependent variable is the p-value, taken from {0.25, 0.1, 0.05, 0.025, 0.01}. The p-value corresponding to the actual statistic is then predicted from this second-order regression, \[ \alpha = c_0 + c_1 t + c_2 t^2 \] evaluated at the observed statistic t.

The details of this step are not given in the paper. The calculation of a p-value for a statistic that is not in the table is documented by a single sentence (right column, paragraph 3, p. 920): "Similarly, one could interpolate and even extrapolate p-value for the observed Anderson-Darling statistic; see Section 7 for an example." The author suggests using linear extrapolation. We use second-order extrapolation for two reasons:

1) Regressing the p-values against the statistics in Table 1, we found that the coefficient of the second-order term is significant in most cases and that the R² value is higher than for the regression that includes only the first-order term. This indicates that including the second-order term makes the extrapolation more accurate. Take m = 1 as an example: the p-value of the second-order coefficient is 0.03352 and the corresponding R² is 0.9994, whereas the R² of the regression that includes only the first-order term is 0.9939.
2) The R program also includes the second-order term.
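The two-step procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code of this class: the class name TwoStepInterpolationSketch and its methods ols, predictCriticalValue and pValue are hypothetical, the OLS fit is done with plain normal equations rather than a library routine, and the numbers in main are synthetic placeholders, not the values of Table 1.

```java
import java.util.Arrays;

/** Hypothetical sketch of the two-step interpolation; not the library's implementation. */
final class TwoStepInterpolationSketch {

    /** OLS via the normal equations: solves (X'X) b = X'y by Gauss-Jordan elimination. */
    static double[] ols(double[][] x, double[] y) {
        int n = x.length, p = x[0].length;
        double[][] a = new double[p][p + 1]; // augmented matrix [X'X | X'y]
        for (int i = 0; i < n; i++) {
            for (int r = 0; r < p; r++) {
                for (int c = 0; c < p; c++) {
                    a[r][c] += x[i][r] * x[i][c];
                }
                a[r][p] += x[i][r] * y[i];
            }
        }
        for (int col = 0; col < p; col++) {
            int pivot = col; // partial pivoting for numerical stability
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            for (int r = 0; r < p; r++) {
                if (r == col) continue;
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] b = new double[p];
        for (int r = 0; r < p; r++) b[r] = a[r][p] / a[r][r];
        return b;
    }

    /** Step 1: fit t = b0 + b1/sqrt(m) + b2/m over the tabulated m, then predict at mActual. */
    static double predictCriticalValue(double[] ms, double[] ts, double mActual) {
        double[][] x = new double[ms.length][];
        for (int i = 0; i < ms.length; i++) {
            x[i] = new double[] {1.0, 1.0 / Math.sqrt(ms[i]), 1.0 / ms[i]};
        }
        double[] b = ols(x, ts);
        return b[0] + b[1] / Math.sqrt(mActual) + b[2] / mActual;
    }

    /** Step 2: fit alpha = c0 + c1*t + c2*t^2 over the predicted values, evaluate at the observed statistic. */
    static double pValue(double[] alphas, double[] predicted, double observed) {
        double[][] x = new double[predicted.length][];
        for (int i = 0; i < predicted.length; i++) {
            x[i] = new double[] {1.0, predicted[i], predicted[i] * predicted[i]};
        }
        double[] c = ols(x, alphas);
        return c[0] + c[1] * observed + c[2] * observed * observed;
    }

    public static void main(String[] args) {
        double[] ms = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000000};
        double[] alphas = {0.25, 0.1, 0.05, 0.025, 0.01};
        // ASSUMPTION: synthetic placeholder scales, NOT the values of Table 1.
        double[] scale = {1.2, 1.9, 2.5, 3.0, 3.8};
        double mActual = 12; // hypothetical: 13 samples, so m = 12

        double[] predicted = new double[alphas.length];
        for (int j = 0; j < alphas.length; j++) {
            double[] ts = new double[ms.length];
            for (int i = 0; i < ms.length; i++) {
                // Synthetic stand-in for one percentile column of the table.
                ts[i] = scale[j] * (1.0 + 0.5 / Math.sqrt(ms[i]) - 0.2 / ms[i]);
            }
            predicted[j] = predictCriticalValue(ms, ts, mActual);
        }
        double observed = 2.2; // hypothetical observed statistic
        System.out.println("predicted = " + Arrays.toString(predicted));
        System.out.println("p-value   = " + pValue(alphas, predicted, observed));
    }
}
```

With real tabulated values in place of the synthetic ones, the same two calls would give one predicted statistic per percentile and then a single interpolated or extrapolated p-value. A second-order fit is not constrained to the interval [0, 1], so a production implementation would presumably need to guard against out-of-range results.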
Constructor Summary
Constructor: AndersonDarlingPValue(int m)
Description: Construct the Anderson-Darling distribution for a particular number of samples.