Class RandomProjection

  • All Implemented Interfaces:
    java.io.Serializable, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, UnsupervisedFilter

    public class RandomProjection
    extends Filter
    implements UnsupervisedFilter, OptionHandler, TechnicalInformationHandler
    Reduces the dimensionality of the data by projecting it onto a lower-dimensional subspace using a random matrix with columns of unit length. Like PCA, it reduces the number of attributes while preserving much of the variation in the data, but at a much lower computational cost.
    It first applies the NominalToBinary filter to convert all attributes to numeric before reducing the dimension. It preserves the class attribute.
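
The projection step can be pictured as a matrix-vector product applied to the non-class values of each instance. The following is a minimal sketch in plain Java, not the filter's actual implementation: the class name ProjectionSketch, the k-by-d matrix layout, and the convention that the class value sits in the last array position are assumptions made for illustration, and any normalisation of the matrix is left out.

```java
import java.util.Arrays;

public class ProjectionSketch {

    /**
     * Projects the non-class values of one instance onto k dimensions.
     * The matrix has k rows and d columns, where d = values.length - 1.
     * The last element of values is treated as the class value (assumption).
     */
    public static double[] project(double[][] matrix, double[] values) {
        int k = matrix.length;
        int d = values.length - 1;
        double[] out = new double[k + 1];
        for (int i = 0; i < k; i++) {
            double sum = 0.0;
            for (int j = 0; j < d; j++) {
                sum += matrix[i][j] * values[j];
            }
            out[i] = sum;
        }
        out[k] = values[d]; // class value is copied through unchanged
        return out;
    }

    public static void main(String[] args) {
        // Toy 2 x 4 matrix: reduces four attributes to two.
        double[][] r = { {1, 0, -1, 0}, {0, 1, 0, 1} };
        double[] instance = {2.0, 3.0, 5.0, 7.0, 1.0}; // last value is the class
        System.out.println(Arrays.toString(project(r, instance)));
    }
}
```

With the toy matrix in main, the four attribute values are reduced to two projected values, followed by the unchanged class value.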

    For more information, see:

    Dmitriy Fradkin, David Madigan: Experiments with random projections for machine learning. In: KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA, 517-522, 2003.

    BibTeX:

     @inproceedings{Fradkin2003,
        address = {New York, NY, USA},
        author = {Dmitriy Fradkin and David Madigan},
        booktitle = {KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining},
        pages = {517-522},
        publisher = {ACM Press},
        title = {Experiments with random projections for machine learning},
        year = {2003}
     }
     

    Valid options are:

     -N <number>
      The number of dimensions (attributes) the data should be reduced to
      (default 10; exclusive of the class attribute, if it is set).
     
     -D [SPARSE1|SPARSE2|GAUSSIAN]
      The distribution to use for calculating the random matrix.
      Sparse1 is:
        sqrt(3)*{-1 with prob(1/6), 0 with prob(2/3), +1 with prob(1/6)}
      Sparse2 is:
        {-1 with prob(1/2), +1 with prob(1/2)}
     
     -P <percent>
      The percentage of dimensions (attributes) the data should
      be reduced to (exclusive of the class attribute, if it is set). The -N
      option is ignored if this option is present and is greater
      than zero.
     
     -M
      Replace missing values using the ReplaceMissingValues filter
     
     -R <num>
      The random seed for the random number generator used for
      calculating the random matrix (default 42).
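
To illustrate the -D and -R options, the sketch below draws matrix entries from the three distributions using a seeded java.util.Random, so the same seed yields the same matrix. It is an illustration only, not the filter's code: RandomMatrixSketch and its members are hypothetical names, the GAUSSIAN branch is assumed to be a standard normal draw (the option text does not spell this out), and any scaling or normalisation the filter applies afterwards is omitted.

```java
import java.util.Random;

public class RandomMatrixSketch {

    public enum Distribution { SPARSE1, SPARSE2, GAUSSIAN }

    /** Draws a single matrix entry from the chosen distribution. */
    public static double sample(Distribution dist, Random rnd) {
        switch (dist) {
            case SPARSE1: {
                // sqrt(3) * {-1 with prob 1/6, 0 with prob 2/3, +1 with prob 1/6}
                double u = rnd.nextDouble();
                if (u < 1.0 / 6.0) {
                    return -Math.sqrt(3.0);
                }
                if (u < 5.0 / 6.0) {
                    return 0.0;
                }
                return Math.sqrt(3.0);
            }
            case SPARSE2:
                // {-1 with prob 1/2, +1 with prob 1/2}
                return rnd.nextBoolean() ? 1.0 : -1.0;
            default:
                // GAUSSIAN: assumed to be a standard normal draw
                return rnd.nextGaussian();
        }
    }

    /** Fills a rows x cols matrix; the same seed always gives the same matrix. */
    public static double[][] build(int rows, int cols, Distribution dist, long seed) {
        Random rnd = new Random(seed);
        double[][] m = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                m[i][j] = sample(dist, rnd);
            }
        }
        return m;
    }
}
```

Calling build(10, 100, RandomMatrixSketch.Distribution.SPARSE1, 42L) twice returns identical matrices, which mirrors the purpose of a fixed random seed.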
     
    Version:
    $Revision: 10832 $ [1.0 - 22 July 2003 - Initial version (Ashraf M. Kibriya)]
    Author:
    Ashraf M. Kibriya (amk14@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Field Detail

      • TAGS_DSTRS_TYPE

        public static final Tag[] TAGS_DSTRS_TYPE
        The types of distributions that can be used for calculating the random matrix
    • Constructor Detail

      • RandomProjection

        public RandomProjection()
    • Method Detail

      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions​(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -N <number>
          The number of dimensions (attributes) the data should be reduced to
          (default 10; exclusive of the class attribute, if it is set).
         
         -D [SPARSE1|SPARSE2|GAUSSIAN]
          The distribution to use for calculating the random matrix.
          Sparse1 is:
            sqrt(3)*{-1 with prob(1/6), 0 with prob(2/3), +1 with prob(1/6)}
          Sparse2 is:
            {-1 with prob(1/2), +1 with prob(1/2)}
         
         -P <percent>
          The percentage of dimensions (attributes) the data should
          be reduced to (exclusive of the class attribute, if it is set). The -N
          option is ignored if this option is present and is greater
          than zero.
         
         -M
          Replace missing values using the ReplaceMissingValues filter
         
         -R <num>
          The random seed for the random number generator used for
          calculating the random matrix (default 42).
         
        Specified by:
        setOptions in interface OptionHandler
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • getOptions

        public java.lang.String[] getOptions()
        Gets the current settings of the filter.
        Specified by:
        getOptions in interface OptionHandler
        Returns:
        an array of strings suitable for passing to setOptions
      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this filter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • numberOfAttributesTipText

        public java.lang.String numberOfAttributesTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumberOfAttributes

        public void setNumberOfAttributes​(int newAttNum)
        Sets the number of attributes (dimensions) the data should be reduced to
        Parameters:
        newAttNum - the target number of dimensions
      • getNumberOfAttributes

        public int getNumberOfAttributes()
        Gets the current number of attributes (dimensionality) to which the data will be reduced.
        Returns:
        the number of dimensions
      • percentTipText

        public java.lang.String percentTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setPercent

        public void setPercent​(double newPercent)
        Sets the percentage of the original attributes (dimensions) that the data should be reduced to
        Parameters:
        newPercent - the percentage of attributes
      • getPercent

        public double getPercent()
        Gets the percentage of the original attributes (dimensions) that the data will be reduced to
        Returns:
        the percentage of attributes
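
The option descriptions state that the percentage, when present and greater than zero, takes precedence over the fixed number of attributes. A minimal sketch of that rule follows. The class and method names are hypothetical, the percentage is assumed to be on a 0 to 100 scale, and truncation toward zero is an assumption, since the documentation does not state a rounding rule.

```java
public class TargetDimensionSketch {

    /**
     * Chooses the target dimensionality.
     *
     * @param numNonClassAttributes attributes in the input, excluding the class
     * @param numberOfAttributes    the fixed target (the -N value)
     * @param percent               the percentage target (the -P value), 0 if unset
     */
    public static int targetDimension(int numNonClassAttributes,
                                      int numberOfAttributes,
                                      double percent) {
        if (percent > 0) {
            // -P overrides -N; rounding by truncation is an assumption
            return (int) (numNonClassAttributes * percent / 100.0);
        }
        return numberOfAttributes;
    }
}
```

Under these assumptions, an input with 200 non-class attributes and a percentage of 25 would be reduced to 50 attributes regardless of the -N setting.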
      • randomSeedTipText

        public java.lang.String randomSeedTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setRandomSeed

        public void setRandomSeed​(long seed)
        Sets the random seed of the random number generator
        Parameters:
        seed - the random seed value
      • getRandomSeed

        public long getRandomSeed()
        Gets the random seed of the random number generator
        Returns:
        the random seed value
      • distributionTipText

        public java.lang.String distributionTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDistribution

        public void setDistribution​(SelectedTag newDstr)
        Sets the distribution to use for calculating the random matrix
        Parameters:
        newDstr - the distribution to use
      • getDistribution

        public SelectedTag getDistribution()
        Returns the distribution currently used for calculating the random matrix
        Returns:
        the current distribution
      • replaceMissingValuesTipText

        public java.lang.String replaceMissingValuesTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setReplaceMissingValues

        public void setReplaceMissingValues​(boolean t)
        Sets whether to use the ReplaceMissingValues filter
        Parameters:
        t - if true, the ReplaceMissingValues filter is used
      • getReplaceMissingValues

        public boolean getReplaceMissingValues()
        Gets whether the ReplaceMissingValues filter is used
        Returns:
        true if the replace missing values filter is used
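
For context, ReplaceMissingValues substitutes missing values with means (numeric attributes) and modes (nominal attributes) computed from the training data. The sketch below shows only the numeric case on a plain array. It is not the Weka filter: the names are hypothetical, and NaN is assumed to mark a missing value.

```java
public class MeanImputationSketch {

    /** Returns a copy of data in which each NaN is replaced by its column mean. */
    public static double[][] replaceMissingWithMeans(double[][] data) {
        if (data.length == 0) {
            return new double[0][0];
        }
        int cols = data[0].length;
        double[] means = new double[cols];
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            int count = 0;
            for (double[] row : data) {
                if (!Double.isNaN(row[j])) {
                    sum += row[j];
                    count++;
                }
            }
            // A column with no observed values falls back to 0 (assumption)
            means[j] = count > 0 ? sum / count : 0.0;
        }
        double[][] out = new double[data.length][cols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                out[i][j] = Double.isNaN(data[i][j]) ? means[j] : data[i][j];
            }
        }
        return out;
    }
}
```

Filling in missing values before projecting matters because a single missing value would otherwise propagate into every projected attribute of that instance.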
      • setInputFormat

        public boolean setInputFormat​(Instances instanceInfo)
                               throws java.lang.Exception
        Sets the format of the input instances.
        Overrides:
        setInputFormat in class Filter
        Parameters:
        instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
        Returns:
        true if the outputFormat may be collected immediately
        Throws:
        java.lang.Exception - if the input format can't be set successfully
      • input

        public boolean input​(Instance instance)
                      throws java.lang.Exception
        Input an instance for filtering.
        Overrides:
        input in class Filter
        Parameters:
        instance - the input instance
        Returns:
        true if the filtered instance may now be collected with output().
        Throws:
        java.lang.IllegalStateException - if no input format has been set
        java.lang.NullPointerException - if the input format has not been defined.
        java.lang.Exception - if the input instance was not of the correct format or if there was a problem with the filtering.
      • batchFinished

        public boolean batchFinished()
                              throws java.lang.Exception
        Signify that this batch of input to the filter is finished.
        Overrides:
        batchFinished in class Filter
        Returns:
        true if there are instances pending output
        Throws:
        java.lang.NullPointerException - if no input structure has been defined
        java.lang.Exception - if there was a problem finishing the batch.
      • main

        public static void main​(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - should contain arguments to the filter: use -h for help