Class ACERAnalysis


  • public class ACERAnalysis
    extends Object
    The Average Conditional Exceedance Rate (ACER) method estimates the cdf of the maximum \(M\) of a series of observations: \[ F_M(\eta) = \Pr(\max(X_1, X_2, \ldots, X_N) \le \eta) \] Under the assumption of k-step Markov-like dependence, the cdf \(F_M(\eta)\) becomes: \[ \begin{eqnarray} F_M(\eta) \approx P_k(\eta) & = & \exp(- \sum_{j=k}^N \alpha_{kj}(\eta) - \alpha_{k-1,k-1}(\eta) - \ldots - \alpha_{11}(\eta)) \\ & \approx & \exp(- \sum_{j=k}^N \alpha_{kj}(\eta)) \\ & \approx & \exp(- \epsilon_k(\eta) N) \end{eqnarray} \] where \(\alpha_{kj}(\eta)\) is the probability that the j-th observation exceeds \(\eta\) conditional on the previous (k-1) observations not exceeding it, and \(\epsilon_k(\eta)\) is the mean of these conditional probabilities as \(N \to \infty\). The result can be used to estimate an extreme value distribution.
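
As a minimal numeric sketch of the last approximation, \(F_M(\eta) \approx \exp(-\epsilon_k(\eta) N)\), the following assumes that an ACER value at some level \(\eta\) is already available. The class name, method name, and input numbers are hypothetical and are not part of this library.

```java
public class AcerCdfSketch {

    /** Approximates F_M(eta) as exp(-epsilon_k(eta) * N) for a given ACER value. */
    public static double cdfOfMaximum(double acerValue, int nObservations) {
        return Math.exp(-acerValue * nObservations);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: an ACER value of 0.001 at some level eta, N = 365 observations.
        double p = cdfOfMaximum(0.001, 365);
        // p approximates the probability that the maximum of the 365 observations does not exceed eta.
        System.out.println(p);
    }
}
```

A smaller ACER value (a higher barrier level) pushes the result towards 1, as expected for the cdf of a maximum.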

    This algorithm works as follows:

    1. Count the threshold exceedances that follow (k-1) previous non-exceedances in the observations to find the empirical values of \(P_k(\eta_i)\), where \(\eta_i\) are equally spaced barrier levels (thresholds) greater than the given tail marker.
    2. Using the empirical \(P_k(\eta_i)\), fit the sub-asymptotic form of the Gumbel distribution, i.e., the ACER function: \[ \hat{\epsilon}_k(\eta) = q_k \exp(-a_k (\eta - b_k)^{c_k}), \quad \eta \ge \eta_1. \]
    3. With the fitted parameters, calculate the confidence interval of the fitted ACER function.
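
The counting in step 1 can be sketched as follows. This is an illustration only: it uses the simple estimate "conditional exceedance count divided by (N - k + 1)", which may differ from the estimator this class actually implements. The class name, method names, data, and level spacing are all assumptions.

```java
public class EmpiricalAcerSketch {

    /** Counts exceedances of eta that follow (k - 1) consecutive non-exceedances. */
    public static int countConditionalExceedances(double[] x, double eta, int k) {
        int count = 0;
        for (int j = k - 1; j < x.length; j++) {
            if (x[j] <= eta) {
                continue; // no exceedance at position j
            }
            boolean previousBelow = true;
            for (int i = j - k + 1; i < j; i++) {
                if (x[i] > eta) {
                    previousBelow = false; // one of the previous (k - 1) points exceeds eta
                    break;
                }
            }
            if (previousBelow) {
                count++;
            }
        }
        return count;
    }

    /** Simple empirical ACER value: conditional exceedance count divided by (N - k + 1). */
    public static double empiricalAcer(double[] x, double eta, int k) {
        return (double) countConditionalExceedances(x, eta, k) / (x.length - k + 1);
    }

    public static void main(String[] args) {
        double[] x = {1, 5, 6, 2, 1, 7, 3, 8, 9, 2}; // made-up observations
        double tailMarker = 4.0;                      // assumed tail marker
        int nLevels = 3;
        double step = 1.0;                            // assumed spacing between barrier levels
        for (int i = 0; i < nLevels; i++) {
            double eta = tailMarker + i * step;
            System.out.printf("eta=%.1f  k=1: %.4f  k=2: %.4f%n",
                    eta, empiricalAcer(x, eta, 1), empiricalAcer(x, eta, 2));
        }
    }
}
```

With k = 1 every exceedance is counted, while with k = 2 consecutive exceedances in a cluster are counted only once, which is how the k-step memory reduces the effect of clustering.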

    Another well-known estimation method is Peaks Over Threshold (POT). The POT method assumes independence among extreme events and therefore always requires declustering, which discards all non-peak data and is considered wasteful. The ACER method, in contrast, accounts for Markov-like dependence (i.e., k-step memory) in the time series, with k=1 as the special case of independent events. That is, a threshold exceedance is counted as an occurrence only if the previous (k-1) points do not exceed the threshold. Experiments show that k=2 (i.e., conditional on one previous non-exceedance) is accurate enough for a wide range of data.

    The R equivalent function is acer::acer.analysis.

    • Constructor Detail

      • ACERAnalysis

        public ACERAnalysis()
        Create an instance with the default values. That is,
         
         this(2, 300, 0.95, true, true);
         
         
      • ACERAnalysis

        public ACERAnalysis(int kStepMemory,
                            int nLevels,
                            double confidenceLevel,
                            boolean usePeaksOnly,
                            boolean weightedByPeakCount)
        Create an instance with various options listed below.
        • the value of k in the assumed k-step memory model (k = 1 means no dependency on previous observations; k = 2 is good enough for most cases)
        • the number of barrier levels \(\eta_i\) for estimating the ACER values at different levels
        • the confidence level for computing the confidence interval of the estimated ACER function
        • whether or not to use only peaks in the observations for estimation (peaks are defined as data points that are preceded and followed by smaller values)
        • whether or not to put more emphasis on periods in which more events occur
        Parameters:
        kStepMemory - the value of k in the k-step memory model
        nLevels - the number of barrier levels to be used for estimation
        confidenceLevel - the confidence level for computing confidence interval
        usePeaksOnly - true to use only the peaks in the observations for estimation
        weightedByPeakCount - true to weight each period by the number of peaks in it
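
The peak definition used by usePeaksOnly (a data point preceded and followed by smaller values) can be illustrated with the sketch below. It assumes strict inequalities and that the first and last points are never peaks; the class and method names are hypothetical, and the library's own handling of ties and end points may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class PeakSketch {

    /** Returns the points that are strictly greater than both of their neighbours. */
    public static List<Double> peaks(double[] x) {
        List<Double> result = new ArrayList<>();
        // The first and last points have only one neighbour and are skipped (assumption).
        for (int i = 1; i < x.length - 1; i++) {
            if (x[i - 1] < x[i] && x[i] > x[i + 1]) {
                result.add(x[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] x = {1, 3, 2, 5, 5, 4, 6, 1}; // made-up observations
        System.out.println(peaks(x));
    }
}
```

Note that the flat top (5, 5) in the made-up data is not reported as a peak under the strict-inequality assumption.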
    • Method Detail

      • run

        public ACERAnalysis.Result run(double[] observations,
                                       double tailMarker)
        Run the analysis with single-period observations. Tail marker is used to determine the start of the distribution tail, i.e., extreme values.
        Parameters:
        observations - the observations of a single period
        tailMarker - the appropriately chosen tail level \(\eta_1\)
        Returns:
        the analysis result
      • run

        public ACERAnalysis.Result run(double[][] observations,
                                       double tailMarker)
        Run the analysis with multi-period observations. Tail marker is used to determine the start of the distribution tail, i.e., extreme values.
        Parameters:
        observations - the observations (one row for each period; rows may have different lengths)
        tailMarker - the appropriately chosen tail level \(\eta_1\)
        Returns:
        the analysis result