Class PCAbySVD

  • All Implemented Interfaces:
    PCA

    public class PCAbySVD
    extends Object
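    This class performs Principal Component Analysis (PCA) on a data matrix using the Singular Value Decomposition (SVD) of that matrix, which is generally preferred over an eigendecomposition of the covariance matrix for numerical accuracy.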

    PCA essentially rotates the set of points around their mean in order to align with the principal components. This moves as much of the variance as possible (using an orthogonal transformation) into the first few dimensions. The values in the remaining dimensions, therefore, tend to be small and may be dropped with minimal loss of information.
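The rotation idea can be made concrete with a small plain-Java sketch that does not use the library. It estimates the first principal axis of a 2-D point cloud by power iteration on the sample covariance matrix and reports the share of the total variance lying along that axis. Power iteration is a different technique from the SVD route this class takes, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch, not the PCAbySVD implementation: finds the first
// principal axis with power iteration on the sample covariance matrix
// (a different technique from the SVD route this class takes).
final class FirstPrincipalAxis {

    /** Sample covariance matrix (divisor n - 1) of the columns of data (n x p). */
    static double[][] covariance(double[][] data) {
        int n = data.length, p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data)
            for (int j = 0; j < p; j++) mean[j] += row[j];
        for (int j = 0; j < p; j++) mean[j] /= n;
        double[][] cov = new double[p][p];
        for (double[] row : data)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++)
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
        return cov;
    }

    /**
     * Unit eigenvector for the largest eigenvalue of a symmetric positive
     * semi-definite matrix. Assumes that eigenvalue is strictly the largest and
     * that the all-equal start vector is not orthogonal to its eigenvector.
     */
    static double[] dominantEigenvector(double[][] a, int iterations) {
        int p = a.length;
        double[] v = new double[p];
        Arrays.fill(v, 1.0 / Math.sqrt(p));
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[p];
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++) w[j] += a[j][k] * v[k];
            double norm = 0.0;
            for (double x : w) norm += x * x;
            norm = Math.sqrt(norm);
            for (int j = 0; j < p; j++) v[j] = w[j] / norm; // renormalize each step
        }
        return v;
    }

    /** v' A v: the variance along unit vector v when A is a covariance matrix. */
    static double varianceAlong(double[][] a, double[] v) {
        double s = 0.0;
        for (int j = 0; j < v.length; j++)
            for (int k = 0; k < v.length; k++) s += v[j] * a[j][k] * v[k];
        return s;
    }

    public static void main(String[] args) {
        // Five points lying close to the line y = x
        double[][] data = {{1, 1.1}, {2, 1.9}, {3, 3.2}, {4, 3.9}, {5, 5.1}};
        double[][] cov = covariance(data);
        double[] axis = dominantEigenvector(cov, 100);
        double total = cov[0][0] + cov[1][1]; // trace = total variance
        System.out.println("first axis: " + Arrays.toString(axis));
        System.out.println("share of variance: " + varianceAlong(cov, axis) / total);
    }
}
```

Running main should show an axis close to the diagonal direction (both components near 0.707) and a variance share close to 1, so the second rotated coordinate could be dropped with little loss of information.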

    The equivalent R function is prcomp.

    See Also:
    • K. V. Mardia, J. T. Kent and J. M. Bibby, "Multivariate Analysis," London, Academic Press, 1979.
    • W. N. Venables and B. D. Ripley, "Modern Applied Statistics with S," New York, Springer-Verlag, 2002.
    • Wikipedia: Principal component analysis
    • Constructor Summary

      Constructors
      PCAbySVD(Matrix data)
          Performs Principal Component Analysis, using the preferred SVD method, on a centered and scaled data matrix.
      PCAbySVD(Matrix data, boolean centered, boolean scaled)
          Performs Principal Component Analysis, using the preferred SVD method, on a data matrix (possibly centered and/or scaled).
      PCAbySVD(Matrix data, Vector mean, Vector scale)
          Performs Principal Component Analysis, using the preferred SVD method, on a data matrix with (optional) mean vector and scaling vector provided.
    • Constructor Detail

      • PCAbySVD

        public PCAbySVD(Matrix data,
                        Vector mean,
                        Vector scale)
        Performs Principal Component Analysis, using the preferred SVD method, on a data matrix with (optional) mean vector and scaling vector provided.
        Parameters:
        data - a matrix that represents the original data
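        mean - an optional mean vector (of length nFactors) that is subtracted from the data; this constructor has no centered flag, so a supplied mean is always applied
        scale - an optional scaling vector (of length nFactors) by which each variable is divided; this constructor has no scaled flag, so a supplied scale is always applied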
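The effect of the mean and scale arguments can be sketched as a column-wise transformation, x_ij = (data_ij - mean_j) / scale_j. The helper below is hypothetical plain Java, not library code. Subtracting before dividing mirrors R's scale(), and treating a null vector as "skip this step" is an assumption about what "optional" means here.

```java
// Hypothetical helper (not part of the library): forms X from the raw data and
// optional per-variable mean and scale vectors. Treating null as "skip this
// step" is an assumption of this sketch, not documented behaviour of PCAbySVD.
final class Preprocess {

    static double[][] centerAndScale(double[][] data, double[] mean, double[] scale) {
        int n = data.length, p = data[0].length;
        if (mean != null && mean.length != p)
            throw new IllegalArgumentException("mean must have one entry per variable");
        if (scale != null && scale.length != p)
            throw new IllegalArgumentException("scale must have one entry per variable");
        double[][] x = new double[n][p];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                double v = data[i][j];
                if (mean != null) v -= mean[j];   // shift variable j first
                if (scale != null) v /= scale[j]; // then divide by its scale
                x[i][j] = v;
            }
        }
        return x;
    }
}
```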
      • PCAbySVD

        public PCAbySVD(Matrix data,
                        boolean centered,
                        boolean scaled)
        Performs Principal Component Analysis, using the preferred SVD method, on a data matrix (possibly centered and/or scaled).
        Parameters:
        data - a matrix that represents the original data
        centered - a logical value indicating whether the variables should be shifted to have zero mean
        scaled - a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (N.B. scaling is generally advisable, but it should only be used if no variable is constant, because a constant variable has zero variance and cannot be rescaled to unit variance)
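A minimal sketch of the statistics the centered and scaled flags usually imply, assuming (as in R's prcomp) that centering subtracts the column means and scaling divides by the sample standard deviation with divisor n - 1. The class is hypothetical; it also shows why a constant variable is a problem, since its standard deviation is zero.

```java
// Hypothetical sketch, not taken from the PCAbySVD source: column means,
// sample standard deviations (divisor n - 1) and a constant-column check.
final class ColumnStats {

    static double[] means(double[][] data) {
        int n = data.length, p = data[0].length;
        double[] m = new double[p];
        for (double[] row : data)
            for (int j = 0; j < p; j++) m[j] += row[j];
        for (int j = 0; j < p; j++) m[j] /= n;
        return m;
    }

    static double[] sampleSds(double[][] data) {
        int n = data.length, p = data[0].length;
        double[] m = means(data);
        double[] sd = new double[p];
        for (double[] row : data)
            for (int j = 0; j < p; j++) sd[j] += (row[j] - m[j]) * (row[j] - m[j]);
        for (int j = 0; j < p; j++) sd[j] = Math.sqrt(sd[j] / (n - 1));
        return sd;
    }

    /** True if some column holds a single repeated value; scaling it would divide by zero. */
    static boolean hasConstantColumn(double[][] data) {
        for (int j = 0; j < data[0].length; j++) {
            boolean constant = true;
            for (double[] row : data) {
                if (row[j] != data[0][j]) { constant = false; break; }
            }
            if (constant) return true;
        }
        return false;
    }
}
```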
      • PCAbySVD

        public PCAbySVD(Matrix data)
        Performs Principal Component Analysis, using the preferred SVD method, on a centered and scaled data matrix.
        Parameters:
        data - a matrix that represents the original data
    • Method Detail

      • mean

        public Vector mean()
        Description copied from interface: PCA
        Gets the sample means that were subtracted.
        Specified by:
        mean in interface PCA
        Returns:
        the sample means of each variable in the original data
      • scale

        public Vector scale()
        Description copied from interface: PCA
        Gets the scalings applied to each variable.
        Specified by:
        scale in interface PCA
        Returns:
        the scalings applied to each variable in the original data
      • svd

        public SVD svd()
        Gets the Singular Value Decomposition (SVD) of matrix X.
        Returns:
        the Singular Value Decomposition (SVD) of matrix X
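In the standard SVD formulation of PCA (the one R's prcomp uses), the decomposition relates to the quantities reported by this class as below, with n = nObs, p = nFactors, n ≥ p and a thin SVD. That PCAbySVD follows exactly these conventions is an assumption. The second line requires the columns of X to be centered; it shows that the columns of V are eigenvectors of the sample covariance matrix (the correlation matrix when X is also scaled) with eigenvalues d_i^2/(n-1), so V serves as the loading matrix.

```latex
\begin{align*}
X &= U D V^{\top}, \qquad D = \operatorname{diag}(d_1, \dots, d_p),\quad d_1 \ge \dots \ge d_p \ge 0, \\
\frac{X^{\top} X}{n-1} &= V \,\frac{D^{2}}{n-1}\, V^{\top}, \\
\text{scores} &= X V = U D V^{\top} V = U D .
\end{align*}
```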
      • sdPrincipalComponents

        public DenseVector sdPrincipalComponents()
        Gets the standard deviations of the principal components, i.e., the square roots of the eigenvalues of the correlation (or covariance) matrix. The calculation itself is done with the singular values of the data matrix rather than with an eigendecomposition.
        Returns:
        the standard deviations of the principal components
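Converting singular values into component standard deviations is a simple rescaling. This hypothetical sketch assumes the prcomp convention sd_i = d_i / sqrt(max(1, n - 1)). It also includes a total-variance helper, because the squared standard deviations of all components should add up to the total variance of the centered X.

```java
// Sketch assuming the prcomp convention sd_i = d_i / sqrt(max(1, n - 1));
// whether PCAbySVD uses exactly this divisor is an assumption.
final class PcSd {

    /** Standard deviations of the principal components from the singular values of X. */
    static double[] fromSingularValues(double[] d, int nObs) {
        double denom = Math.sqrt(Math.max(1, nObs - 1));
        double[] sd = new double[d.length];
        for (int i = 0; i < d.length; i++) sd[i] = d[i] / denom;
        return sd;
    }

    /** Total variance of a column-centered X: squared Frobenius norm divided by n - 1. */
    static double totalVariance(double[][] x) {
        double ss = 0.0;
        for (double[] row : x)
            for (double v : row) ss += v * v;
        return ss / Math.max(1, x.length - 1);
    }
}
```

For example, X = {{1, 0}, {-1, 0}, {0, 2}, {0, -2}} has centered, orthogonal columns, so its singular values are the column norms sqrt(8) and sqrt(2). With n = 4 the squared standard deviations are 8/3 and 2/3, which add up to totalVariance(X) = 10/3.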
      • loadings

        public Matrix loadings()
        Description copied from interface: PCA
        Gets the matrix of variable loadings. The signs of the columns of the loading matrix are arbitrary.
        Returns:
        the matrix of variable loadings
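Because a loading column can be negated together with the matching score column without changing anything else, software often fixes a sign convention after the fact. The sketch below is plain Java with hypothetical names; the "largest-magnitude entry positive" rule is one common choice, not something PCAbySVD is documented to apply.

```java
// Sketch with hypothetical helper names (not library API). Layout assumed:
// loadings is p x r (one column per component), scores is n x r.
final class SignConvention {

    /** Negates column k of m in place. */
    static void flipColumn(double[][] m, int k) {
        for (double[] row : m) row[k] = -row[k];
    }

    /**
     * One possible convention: make the largest-magnitude entry of each loading
     * column positive, flipping the matching score column too.
     */
    static void normalizeSigns(double[][] loadings, double[][] scores) {
        int r = loadings[0].length;
        for (int k = 0; k < r; k++) {
            int arg = 0;
            for (int j = 1; j < loadings.length; j++)
                if (Math.abs(loadings[j][k]) > Math.abs(loadings[arg][k])) arg = j;
            if (loadings[arg][k] < 0) {
                flipColumn(loadings, k);
                flipColumn(scores, k);
            }
        }
    }

    /** scores * loadings^T; with all components of an orthogonal loading matrix kept, this reproduces X. */
    static double[][] reconstruct(double[][] scores, double[][] loadings) {
        int n = scores.length, p = loadings.length, r = loadings[0].length;
        double[][] x = new double[n][p];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < r; k++) x[i][j] += scores[i][k] * loadings[j][k];
        return x;
    }
}
```

Flipping signs this way leaves reconstruct(scores, loadings) unchanged, which is why either sign is an equally valid answer.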
      • data

        public ImmutableMatrix data()
        Gets the original data matrix.
        Returns:
        the original data matrix
      • nObs

        public int nObs()
        Description copied from interface: PCA
        Gets the number of observations (the sample size) in the original data.
        Specified by:
        nObs in interface PCA
        Returns:
        nObs, the number of observations in the original data
      • nFactors

        public int nFactors()
        Description copied from interface: PCA
        Gets the number of variables in the original data.
        Specified by:
        nFactors in interface PCA
        Returns:
        nFactors, the number of variables in the original data
      • X

        public Matrix X()
        Description copied from interface: PCA
        Gets the (possibly centered and/or scaled) data matrix X used for the PCA.
        Specified by:
        X in interface PCA
        Returns:
        the (possibly centered and/or scaled) data matrix X
      • sdPrincipalComponent

        public double sdPrincipalComponent(int i)
        Description copied from interface: PCA
        Gets the standard deviation of the i-th principal component.
        Specified by:
        sdPrincipalComponent in interface PCA
        Parameters:
        i - an index, counting from 1
        Returns:
        the standard deviation of the i-th principal component.
      • loading

        public Vector loading(int i)
        Description copied from interface: PCA
        Gets the loading vector of the i-th principal component.
        Specified by:
        loading in interface PCA
        Parameters:
        i - an index, counting from 1
        Returns:
        the loading vector of the i-th principal component
      • proportionVar

        public Vector proportionVar()
        Description copied from interface: PCA
        Gets the proportion of overall variance explained by each of the principal components.
        Specified by:
        proportionVar in interface PCA
        Returns:
        the proportion of overall variance explained by each of the principal components
      • proportionVar

        public double proportionVar(int i)
        Description copied from interface: PCA
        Gets the proportion of overall variance explained by the i-th principal component.
        Specified by:
        proportionVar in interface PCA
        Parameters:
        i - an index, counting from 1
        Returns:
        the proportion of overall variance explained by the i-th principal component
      • cumulativeProportionVar

        public DenseVector cumulativeProportionVar()
        Description copied from interface: PCA
        Gets the cumulative proportion of overall variance explained by the principal components.
        Specified by:
        cumulativeProportionVar in interface PCA
        Returns:
        the cumulative proportion of overall variance explained by the principal components
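The variance proportions follow from the component standard deviations: prop_i = sd_i^2 / (sd_1^2 + ... + sd_p^2), and the cumulative version is the running sum. This hypothetical sketch assumes that definition (the one R's summary of a prcomp object reports); array element i - 1 corresponds to index i in the 1-based accessors above.

```java
// Sketch: prop_i = sd_i^2 / sum_j sd_j^2 and its running total. Array element
// i - 1 stands for index i in the 1-based accessors (mapping assumed).
final class VarianceShares {

    static double[] proportions(double[] sd) {
        double total = 0.0;
        for (double s : sd) total += s * s;   // total variance = sum of eigenvalues
        double[] prop = new double[sd.length];
        for (int i = 0; i < sd.length; i++) prop[i] = sd[i] * sd[i] / total;
        return prop;
    }

    static double[] cumulative(double[] prop) {
        double[] cum = new double[prop.length];
        double running = 0.0;
        for (int i = 0; i < prop.length; i++) {
            running += prop[i];
            cum[i] = running;
        }
        return cum;
    }
}
```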
      • scores

        public Matrix scores()
        Description copied from interface: PCA
        Gets the scores of the supplied data on the principal components, i.e., the coordinates of each observation along the principal components. The signs of the columns of the scores are arbitrary.
        Specified by:
        scores in interface PCA
        Returns:
        the scores of supplied data on the principal components
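In the prcomp convention the scores are the data matrix X rotated into the principal-component axes, scores = X × loadings (R's x %*% rotation). The hypothetical sketch below performs that product; passing only the first r loading columns yields the reduced r-dimensional representation described at the top of this page.

```java
// Sketch: scores as the product of the (possibly centered and/or scaled) data
// matrix X (n x p) and the loading matrix (p x r), as in prcomp's x %*% rotation.
final class PcScores {

    static double[][] scores(double[][] x, double[][] loadings) {
        int n = x.length, p = loadings.length, r = loadings[0].length;
        if (x[0].length != p)
            throw new IllegalArgumentException("X must have one column per loading row");
        double[][] s = new double[n][r];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < r; k++)
                for (int j = 0; j < p; j++) s[i][k] += x[i][j] * loadings[j][k];
        return s;
    }
}
```

With X = {{1, 0}, {-1, 0}, {0, 2}, {0, -2}} and loadings {{0, 1}, {1, 0}} (the second variable has the larger variance, so it comes first), the third observation's scores are {2, 0}.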