This algorithm calculates the p-value when the Anderson-Darling statistic and the number of
samples are given. The p-value is calculated by the interpolation formula (section 4, p.920):
\[
t_m\left ( \alpha \right ) = b_0 + \frac{b_1}{\sqrt m} + \frac{b_2}{m}
\]
where the coefficients for each α are calculated by OLS regression using data in Table 1.
m is the total number of samples minus 1.
We use a two-step procedure to interpolate the data in Table 1.
In the first step, the independent variables are \(1/\sqrt{m}\) and \(1/m\), where
m = 1, ... 10, 1000000. The dependent variable is the statistic corresponding to each of the
upper percentiles 0.25, 0.1, 0.05, 0.025, 0.01. For each percentile, the value predicted at the
actual number of samples minus 1 is stored. There are therefore 5 OLS regressions in this step
and 5 predicted values.
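The first step can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the function name `interpolate_statistics` is hypothetical, and the table used in the demo is synthetic (generated from arbitrary coefficients), not the published Table 1.

```python
import numpy as np


def interpolate_statistics(m_rows, table, m):
    """Fit t = b0 + b1/sqrt(m) + b2/m for each percentile column, then predict at m.

    m_rows: the tabulated m values (one per table row).
    table:  2-D array, one row per tabulated m, one column per upper percentile.
    m:      the actual number of samples minus 1.
    """
    m_rows = np.asarray(m_rows, dtype=float)
    table = np.asarray(table, dtype=float)
    # Design matrix with columns 1, 1/sqrt(m), 1/m.
    design = np.column_stack([np.ones_like(m_rows), 1 / np.sqrt(m_rows), 1 / m_rows])
    # A 2-D right-hand side makes lstsq fit one OLS regression per column.
    coef, *_ = np.linalg.lstsq(design, table, rcond=None)
    x_new = np.array([1.0, 1 / np.sqrt(m), 1 / m])
    return x_new @ coef  # one predicted statistic per percentile


# Synthetic stand-in for Table 1 (NOT the published values): each column is
# generated from arbitrary known coefficients so the fit can be checked.
m_rows = np.array([1.0, 2.0, 4.0, 10.0, 1_000_000.0])
true_coef = np.array([
    [1.0, 1.5, 2.0, 2.5, 3.0],       # b0 for each of the 5 percentiles
    [0.2, 0.3, 0.4, 0.5, 0.6],       # b1
    [-0.1, -0.2, -0.3, -0.4, -0.5],  # b2
])
design = np.column_stack([np.ones_like(m_rows), 1 / np.sqrt(m_rows), 1 / m_rows])
table = design @ true_coef

predicted = interpolate_statistics(m_rows, table, m=5)
print(predicted)
```

Passing the whole table as the right-hand side runs the 5 regressions in one call; fitting each column separately would give the same coefficients.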
In the second step, the independent variables are the 5 predicted values and their squares, and
the dependent variable is the p-value, taken from {0.25, 0.1, 0.05, 0.025, 0.01}. The p-value
corresponding to the observed statistic is then predicted from this regression, whose inputs are
the first-step values \(t_m(\alpha) = b_0 + b_1/\sqrt{m} + b_2/m\).
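A rough sketch of the second step, under the assumption that the p-value is regressed on the interpolated statistic and its square by ordinary least squares. The function name `p_value_from_quadratic` and the demo inputs are hypothetical, and the demo numbers are constructed for illustration rather than taken from Table 1.

```python
import numpy as np


def p_value_from_quadratic(t_pred, t_obs, alphas=(0.25, 0.1, 0.05, 0.025, 0.01)):
    """Regress p on (1, t, t^2) over the interpolated statistics, then predict at t_obs.

    t_pred: the interpolated statistics, one per upper percentile.
    t_obs:  the observed Anderson-Darling statistic.
    alphas: the upper percentiles matching t_pred, in the same order.
    """
    t_pred = np.asarray(t_pred, dtype=float)
    p = np.asarray(alphas, dtype=float)
    # Design matrix with columns 1, t, t^2.
    design = np.column_stack([np.ones_like(t_pred), t_pred, t_pred ** 2])
    coef, *_ = np.linalg.lstsq(design, p, rcond=None)
    # Evaluate the fitted quadratic at the observed statistic.
    return float(coef @ np.array([1.0, t_obs, t_obs ** 2]))


# Illustrative inputs only: the "percentiles" here are built to be exactly
# quadratic in t so the result is easy to verify by hand.
t_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p_demo = 0.5 - 0.1 * t_demo + 0.004 * t_demo ** 2
p_hat = p_value_from_quadratic(t_demo, 2.5, alphas=p_demo)
print(p_hat)
```

Because the fitted curve is an unconstrained quadratic, a statistic far outside the tabulated range can give a value outside [0, 1]; this sketch does not clamp the result.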
The details of this step are not given in the paper. How to calculate the p-value when the
statistic is not in the table is covered by only one sentence (right column, paragraph 3,
p. 920): "Similarly, one could interpolate and even extrapolate p-value for the
observed Anderson-Darling statistic; see Section 7 for an example." The author suggests using
linear extrapolation. We use second-order extrapolation for two reasons:
1) We regressed the p-values against the statistics in Table 1 and found that the coefficient
of the second-order term is significant in most cases, and that the R-squared value is higher
than for the regression that includes only the first-order term. This indicates that including
the second-order term makes the extrapolation more accurate. Take m=1 as an example: the
p-value of the second-order coefficient is 0.03352, and the corresponding R-squared is 0.9994.
By contrast, the R-squared of the regression that includes only the first-order term is 0.9939.
2) The R program also includes the second-order term.
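The comparison behind reason 1 can be outlined as below. This is only a sketch: `r_squared` is a hypothetical helper, the data are synthetic rather than the Table 1 values, and the significance test on the second-order coefficient is not computed here.

```python
import numpy as np


def r_squared(x, y, degree):
    """R-squared of an OLS polynomial fit of y on x with the given degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Vandermonde matrix: columns x^degree, ..., x, 1.
    design = np.vander(x, degree + 1)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


# Synthetic data with visible curvature (NOT the published table values).
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = x_demo ** 2

r2_linear = r_squared(x_demo, y_demo, degree=1)
r2_quadratic = r_squared(x_demo, y_demo, degree=2)
print(r2_linear, r2_quadratic)
```

With only 5 points per fit, adding a term always raises R-squared somewhat, which is why the text also relies on the significance of the second-order coefficient.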