Class SubsetSizeForwardSelection

  • All Implemented Interfaces:
    java.io.Serializable, OptionHandler, RevisionHandler

    public class SubsetSizeForwardSelection
    extends ASSearch
    implements OptionHandler
    SubsetSizeForwardSelection:

    Extension of LinearForwardSelection. The search performs an internal cross-validation (the seed and number of folds can be specified). A LinearForwardSelection is performed on each fold to determine the optimal subset size (using the given SubsetSizeEvaluator). Finally, a LinearForwardSelection up to the optimal subset size is performed on the whole data set.
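
    The procedure described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not Weka's implementation: the names SubsetSizeSketch, toyMerit, forwardSelect and bestSize are hypothetical, a plain greedy forward selection stands in for LinearForwardSelection, and a single merit function per fold stands in for both the search on that fold and its scoring by the SubsetSizeEvaluator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleFunction;

final class SubsetSizeSketch {

    /** Toy merit (hypothetical): rewards "useful" attributes, penalizes subset size. */
    static ToDoubleFunction<List<Integer>> toyMerit(Set<Integer> useful) {
        return subset -> {
            long hits = subset.stream().filter(useful::contains).count();
            return hits - 0.1 * subset.size();
        };
    }

    /** Greedy forward selection; returns attributes in the order added, at most maxSize. */
    static List<Integer> forwardSelect(int numAttrs, int maxSize,
                                       ToDoubleFunction<List<Integer>> merit) {
        List<Integer> selected = new ArrayList<>();
        while (selected.size() < maxSize) {
            int bestAttr = -1;
            double bestMerit = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < numAttrs; a++) {
                if (selected.contains(a)) continue;
                selected.add(a);                       // try adding attribute a
                double m = merit.applyAsDouble(selected);
                selected.remove(selected.size() - 1);  // undo the trial
                if (m > bestMerit) { bestMerit = m; bestAttr = a; }
            }
            if (bestAttr < 0) break;                   // no attributes left
            selected.add(bestAttr);
        }
        return selected;
    }

    /** Sums the merit of each subset size over all folds and returns the best size. */
    static int bestSize(int numAttrs, List<ToDoubleFunction<List<Integer>>> foldMerits) {
        double[] total = new double[numAttrs + 1];
        for (ToDoubleFunction<List<Integer>> merit : foldMerits) {
            List<Integer> order = forwardSelect(numAttrs, numAttrs, merit);
            for (int size = 1; size <= order.size(); size++) {
                total[size] += merit.applyAsDouble(order.subList(0, size));
            }
        }
        int best = 1;
        for (int size = 2; size <= numAttrs; size++) {
            if (total[size] > total[best]) best = size; // ties keep the smaller size
        }
        return best;
    }

    public static void main(String[] args) {
        int numAttrs = 4;
        ToDoubleFunction<List<Integer>> merit = toyMerit(Set.of(0, 2));
        // Stand-in for the internal cross-validation: one merit function per fold.
        int size = bestSize(numAttrs, List.of(merit, merit, merit));
        // Final forward selection on the "whole data", capped at the chosen size.
        System.out.println("size=" + size + " subset=" + forwardSelect(numAttrs, size, merit));
    }
}
```

    In the real search, the number of folds and the seed correspond to the -F and -R options, and the per-fold scoring is done by the evaluator given with -E.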

    For more information see:

    Martin Guetlein (2006). Large Scale Attribute Selection Using Wrappers. Freiburg, Germany.

    Valid options are:

     -I
      Perform initial ranking to select the
      top-ranked attributes.
     
     -K <num>
      Number of top-ranked attributes that are 
      taken into account by the search.
     
     -T <0 = fixed-set | 1 = fixed-width>
      Type of Linear Forward Selection (default = 0).
     
     -S <num>
      Size of lookup cache for evaluated subsets.
      Expressed as a multiple of the number of
      attributes in the data set. (default = 1)
     
     -E <subset evaluator>
      Subset evaluator used for subset-size determination.
      Place any evaluator options LAST on the command line
      following a "--". eg.: -- -M
     
     -F <num>
      Number of cross validation folds
      for subset size determination (default = 5).
     
     -R <num>
      Seed for cross validation
      subset size determination. (default = 1)
     
     -Z
      verbose on/off
     
     Options specific to evaluator weka.attributeSelection.ClassifierSubsetEval:
     
     -B <classifier>
      class name of the classifier to use for accuracy estimation.
      Place any classifier options LAST on the command line
      following a "--". eg.:
       -B weka.classifiers.bayes.NaiveBayes ... -- -K
      (default: weka.classifiers.rules.ZeroR)
     
     -T
      Use the training data to estimate accuracy.
     
     -H <filename>
      Name of the hold out/test set to 
      estimate accuracy on.
     
     Options specific to scheme weka.classifiers.rules.ZeroR:
     
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
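
     To show how the options listed above combine, the following sketch assembles an option array in plain Java. The class name OptionsSketch, the helper buildOptions and the chosen values are illustrative assumptions; only the flags themselves come from the list above. Everything after "--" is meant for the nested subset evaluator, not for the search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OptionsSketch {

    /** Builds a search option array; evaluator options follow the "--" separator. */
    static String[] buildOptions(int topK, int folds, int seed,
                                 String evaluatorClass, String... evaluatorOptions) {
        List<String> opts = new ArrayList<>();
        opts.add("-I");                                   // perform initial ranking
        opts.add("-K"); opts.add(String.valueOf(topK));   // top-ranked attributes to use
        opts.add("-T"); opts.add("0");                    // 0 = fixed-set
        opts.add("-E"); opts.add(evaluatorClass);         // subset-size evaluator
        opts.add("-F"); opts.add(String.valueOf(folds));  // cross-validation folds
        opts.add("-R"); opts.add(String.valueOf(seed));   // cross-validation seed
        if (evaluatorOptions.length > 0) {
            opts.add("--");                               // evaluator options come LAST
            opts.addAll(Arrays.asList(evaluatorOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions(50, 5, 1,
                "weka.attributeSelection.ClassifierSubsetEval",
                "-B", "weka.classifiers.bayes.NaiveBayes");
        System.out.println(String.join(" ", options));
    }
}
```

     An array built this way could be passed to setOptions(String[]) on a SubsetSizeForwardSelection instance.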
     
    Version:
    $Revision: 11198 $
    Author:
    Martin Guetlein (martin.guetlein@gmail.com)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static Tag[] TAGS_TYPE  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int getLookupCacheSize()
      Return the maximum size of the evaluated subset cache (expressed as a multiplier for the number of attributes in a data set).
      int getNumSubsetSizeCVFolds()
      Get the number of cross validation folds for subset size determination (default = 5).
      int getNumUsedAttributes()
      Get the number of top-ranked attributes that are taken into account by the search process.
      java.lang.String[] getOptions()
      Gets the current settings of this search (SubsetSizeForwardSelection).
      boolean getPerformRanking()
      Get whether an initial ranking should be performed to select the top-ranked attributes.
      java.lang.String getRevision()
      Returns the revision string.
      int getSeed()
      Seed for cross validation subset size determination.
      ASEvaluation getSubsetSizeEvaluator()
      Get the subset evaluator used for subset size determination.
      TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      SelectedTag getType()
      Get the type
      boolean getVerbose()
      Get whether output is to be verbose
      java.lang.String globalInfo()
      Returns a string describing this search method
      java.util.Enumeration listOptions()
      Returns an enumeration describing the available options.
      java.lang.String lookupCacheSizeTipText()
      Returns the tip text for this property
      java.lang.String numSubsetSizeCVFoldsTipText()
      Returns the tip text for this property
      java.lang.String numUsedAttributesTipText()
      Returns the tip text for this property
      java.lang.String performRankingTipText()
      Returns the tip text for this property
      int[] search​(ASEvaluation ASEval, Instances data)
      Searches the attribute subset space by subset size forward selection
      java.lang.String seedTipText()
      Returns the tip text for this property
      void setLookupCacheSize​(int size)
      Set the maximum size of the evaluated subset cache (hashtable).
      void setNumSubsetSizeCVFolds​(int f)
      Set the number of cross validation folds for subset size determination (default = 5).
      void setNumUsedAttributes​(int k)
      Set the number of top-ranked attributes that are taken into account by the search process.
      void setOptions​(java.lang.String[] options)
      Parses a given list of options.
      void setPerformRanking​(boolean b)
      Perform initial ranking to select top-ranked attributes.
      void setSeed​(int s)
      Seed for cross validation subset size determination.
      void setSubsetSizeEvaluator​(ASEvaluation eval)
      Set the subset evaluator to use for subset size determination.
      void setType​(SelectedTag t)
      Set the type
      void setVerbose​(boolean b)
      Set whether verbose output should be generated.
      java.lang.String subsetSizeEvaluatorTipText()
      Returns the tip text for this property
      java.lang.String toString()
      Returns a description of the search as a String.
      java.lang.String typeTipText()
      Returns the tip text for this property
      java.lang.String verboseTipText()
      Returns the tip text for this property
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • TAGS_TYPE

        public static final Tag[] TAGS_TYPE
    • Constructor Detail

      • SubsetSizeForwardSelection

        public SubsetSizeForwardSelection()
        Constructor
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this search method
        Returns:
        a description of the search method suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Returns:
        the technical information about this class
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions​(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options. Valid options are:

        -I
        Perform initial ranking to select top-ranked attributes.

        -K
        Number of top-ranked attributes that are taken into account.

        -T <0 = fixed-set | 1 = fixed-width>
        Type of Linear Forward Selection (default = 0).

        -S
        Size of lookup cache for evaluated subsets. Expressed as a multiple of the number of attributes in the data set. (default = 1).

        -E
        Class name of the subset evaluator to use for subset size determination (default = null, in which case the same subset evaluator as for ranking and final forward selection is used). Place any evaluator options LAST on the command line following a "--". eg. -E weka.attributeSelection.ClassifierSubsetEval ... -- -M

        -F
        Number of cross validation folds for subset size determination (default = 5).

        -R
        Seed for cross validation subset size determination. (default = 1)

        -Z
        verbose on/off.

        Specified by:
        setOptions in interface OptionHandler
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • setLookupCacheSize

        public void setLookupCacheSize​(int size)
        Set the maximum size of the evaluated subset cache (hashtable). This is expressed as a multiplier for the number of attributes in the data set. (default = 1).
        Parameters:
        size - the maximum size of the hashtable
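
        As a rough illustration of what a cache bounded by "multiplier times number of attributes" could look like, here is a minimal sketch in plain Java. The class SubsetMeritCache, the BitSet keys and the oldest-first eviction are all assumptions made for this example; the documentation above only states that the cache is a hashtable whose maximum size is expressed as a multiple of the attribute count.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

final class SubsetMeritCache {
    private final int maxEntries;
    private final LinkedHashMap<BitSet, Double> merits;

    SubsetMeritCache(int multiplier, int numAttributes) {
        // Capacity is a multiple of the attribute count, e.g. 1 x 100 attributes = 100 entries.
        this.maxEntries = multiplier * numAttributes;
        this.merits = new LinkedHashMap<BitSet, Double>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<BitSet, Double> eldest) {
                // Assumed policy: drop the oldest evaluated subset once the cache is full.
                return size() > maxEntries;
            }
        };
    }

    /** Stores the merit of a subset; the key is copied so later changes do not corrupt it. */
    void put(BitSet subset, double merit) {
        merits.put((BitSet) subset.clone(), merit);
    }

    /** Returns the cached merit, or null if the subset has not been cached (or was evicted). */
    Double get(BitSet subset) {
        return merits.get(subset);
    }

    int size() {
        return merits.size();
    }
}
```

        With a multiplier of 1 and two attributes, for example, such a cache would hold at most two evaluated subsets before evicting the oldest one.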
      • getLookupCacheSize

        public int getLookupCacheSize()
        Return the maximum size of the evaluated subset cache (expressed as a multiplier for the number of attributes in a data set).
        Returns:
        the maximum size of the hashtable.
      • lookupCacheSizeTipText

        public java.lang.String lookupCacheSizeTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • performRankingTipText

        public java.lang.String performRankingTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setPerformRanking

        public void setPerformRanking​(boolean b)
        Perform initial ranking to select top-ranked attributes.
        Parameters:
        b - true if initial ranking should be performed
      • getPerformRanking

        public boolean getPerformRanking()
        Get whether an initial ranking should be performed to select the top-ranked attributes.
        Returns:
        true if initial ranking should be performed
      • numUsedAttributesTipText

        public java.lang.String numUsedAttributesTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumUsedAttributes

        public void setNumUsedAttributes​(int k)
                                  throws java.lang.Exception
        Set the number of top-ranked attributes that are taken into account by the search process.
        Parameters:
        k - the number of attributes
        Throws:
        java.lang.Exception - if k is less than 2
      • getNumUsedAttributes

        public int getNumUsedAttributes()
        Get the number of top-ranked attributes that are taken into account by the search process.
        Returns:
        the number of top-ranked attributes that are taken into account
      • typeTipText

        public java.lang.String typeTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setType

        public void setType​(SelectedTag t)
        Set the type
        Parameters:
        t - the Linear Forward Selection type
      • getType

        public SelectedTag getType()
        Get the type
        Returns:
        the Linear Forward Selection type
      • subsetSizeEvaluatorTipText

        public java.lang.String subsetSizeEvaluatorTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSubsetSizeEvaluator

        public void setSubsetSizeEvaluator​(ASEvaluation eval)
                                    throws java.lang.Exception
        Set the subset evaluator to use for subset size determination.
        Parameters:
        eval - the subset evaluator to use for subset size determination.
        Throws:
        java.lang.Exception
      • getSubsetSizeEvaluator

        public ASEvaluation getSubsetSizeEvaluator()
        Get the subset evaluator used for subset size determination.
        Returns:
        the evaluator used for subset size determination.
      • numSubsetSizeCVFoldsTipText

        public java.lang.String numSubsetSizeCVFoldsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumSubsetSizeCVFolds

        public void setNumSubsetSizeCVFolds​(int f)
        Set the number of cross validation folds for subset size determination (default = 5).
        Parameters:
        f - number of folds
      • getNumSubsetSizeCVFolds

        public int getNumSubsetSizeCVFolds()
        Get the number of cross validation folds for subset size determination (default = 5).
        Returns:
        number of folds
      • seedTipText

        public java.lang.String seedTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSeed

        public void setSeed​(int s)
        Seed for cross validation subset size determination. (default = 1)
        Parameters:
        s - seed
      • getSeed

        public int getSeed()
        Seed for cross validation subset size determination. (default = 1)
        Returns:
        seed
      • verboseTipText

        public java.lang.String verboseTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setVerbose

        public void setVerbose​(boolean b)
        Set whether verbose output should be generated.
        Parameters:
        b - true if output is to be verbose.
      • getVerbose

        public boolean getVerbose()
        Get whether output is to be verbose
        Returns:
        true if output will be verbose
      • getOptions

        public java.lang.String[] getOptions()
        Gets the current settings of this search (SubsetSizeForwardSelection).
        Specified by:
        getOptions in interface OptionHandler
        Returns:
        an array of strings suitable for passing to setOptions()
      • toString

        public java.lang.String toString()
        Returns a description of the search as a String.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a description of the search
      • search

        public int[] search​(ASEvaluation ASEval,
                            Instances data)
                     throws java.lang.Exception
        Searches the attribute subset space by subset size forward selection
        Specified by:
        search in class ASSearch
        Parameters:
        ASEval - the attribute evaluator to guide the search
        data - the training instances.
        Returns:
        an array (not necessarily ordered) of selected attribute indexes
        Throws:
        java.lang.Exception - if the search can't be completed