Class Dagging

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

    public class Dagging
    extends RandomizableSingleClassifierEnhancer
    implements TechnicalInformationHandler
    This meta classifier creates a number of disjoint, stratified folds out of the data and feeds each chunk of data to a copy of the supplied base classifier. Predictions are made via majority vote, since all the generated base classifiers are put into the Vote meta classifier.
    Useful for base classifiers whose training time grows quadratically or worse with the number of training instances.
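    To see why splitting helps, assume (as an idealization, not a measured result) that training the base classifier on m instances costs about c·m^p time for some constant c and exponent p ≥ 2. With n training instances split into k disjoint folds of roughly n/k instances each, the total cost works out as follows:

```latex
T_{\text{single}}(n) = c\,n^{p},
\qquad
T_{\text{dagging}}(n, k) = k \cdot c\left(\frac{n}{k}\right)^{p} = \frac{c\,n^{p}}{k^{\,p-1}}
```

    For a quadratic learner (p = 2) and the default of 10 folds, this idealized model predicts roughly a tenfold reduction in total training time; for a cubic learner (p = 3), roughly a hundredfold. The trade-off is that each base model sees only about n/k instances.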

    For more information, see:
    Ting, K. M., Witten, I. H.: Stacking Bagged and Dagged Models. In: Fourteenth International Conference on Machine Learning, San Francisco, CA, 367-375, 1997.

    BibTeX:

     @inproceedings{Ting1997,
        address = {San Francisco, CA},
        author = {Ting, K. M. and Witten, I. H.},
        booktitle = {Fourteenth international Conference on Machine Learning},
        editor = {D. H. Fisher},
        pages = {367-375},
        publisher = {Morgan Kaufmann Publishers},
        title = {Stacking Bagged and Dagged Models},
        year = {1997}
     }
     

    Valid options are:

     -F <folds>
      The number of folds for splitting the training set into
      smaller chunks for the base classifier.
      (default 10)
     -verbose
      Whether to print additional information while building the
      classifier.
      (default is off)
     -S <num>
      Random number seed.
      (default 1)
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     -W
      Full name of base classifier.
      (default: weka.classifiers.functions.SMO)
     
     Options specific to classifier weka.classifiers.functions.SMO:
     
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     -no-checks
      Turns off all checks - use with caution!
      Turning them off assumes that data is purely numeric, doesn't
      contain any missing values, and has a nominal class. Turning them
      off also means that no header information will be stored if the
      machine is linear. Finally, it also assumes that no instance has
      a weight equal to 0.
      (default: checks on)
     -C <double>
      The complexity constant C. (default 1)
     -N
      How to preprocess the training data: 0=normalize, 1=standardize, 2=neither. (default 0=normalize)
     -L <double>
      The tolerance parameter. (default 1.0e-3)
     -P <double>
      The epsilon for round-off error. (default 1.0e-12)
     -M
      Fit logistic models to SVM outputs. 
     -V <double>
      The number of folds for the internal
      cross-validation. (default -1, use training data)
     -W <double>
      The random number seed. (default 1)
     -K <classname and parameters>
      The Kernel to use.
      (default: weka.classifiers.functions.supportVector.PolyKernel)
     
     Options specific to kernel weka.classifiers.functions.supportVector.PolyKernel:
     
     -D
      Enables debugging output (if available) to be printed.
      (default: off)
     -no-checks
      Turns off all checks - use with caution!
      (default: checks on)
     -C <num>
      The size of the cache (a prime number), 0 for full cache and 
      -1 to turn it off.
      (default: 250007)
     -E <num>
      The exponent to use.
      (default: 1.0)
     -L
      Use lower-order terms.
      (default: no)
    Options after -- are passed to the designated classifier.
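    The "--" convention means an option array has two parts: everything before the first "--" configures Dagging itself, and everything after it is handed to the base classifier. The stand-alone sketch below illustrates that split; the class OptionSplitter and its method splitAtDoubleDash are hypothetical names for illustration, not part of Weka's API.

```java
import java.util.Arrays;

/** Hypothetical helper: splits a Weka-style option array at the first "--". */
public class OptionSplitter {

    /**
     * Returns two arrays: the options before the first "--" (for the meta
     * classifier) and the options after it (for the base classifier).
     */
    public static String[][] splitAtDoubleDash(String[] options) {
        int idx = Arrays.asList(options).indexOf("--");
        if (idx < 0) {
            // No "--": every option belongs to the meta classifier.
            return new String[][] { options.clone(), new String[0] };
        }
        String[] outer = Arrays.copyOfRange(options, 0, idx);
        String[] inner = Arrays.copyOfRange(options, idx + 1, options.length);
        return new String[][] { outer, inner };
    }

    public static void main(String[] args) {
        String[] opts = { "-F", "5", "-S", "3",
                          "-W", "weka.classifiers.functions.SMO",
                          "--", "-C", "2.0" };
        String[][] parts = splitAtDoubleDash(opts);
        System.out.println("Dagging options: " + Arrays.toString(parts[0]));
        System.out.println("SMO options:     " + Arrays.toString(parts[1]));
    }
}
```

    In this example, "-C 2.0" ends up with the base classifier (SMO's complexity constant), while "-F", "-S" and "-W" stay with Dagging.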

    Version:
    $Revision: 5306 $
    Author:
    Bernhard Pfahringer (bernhard at cs dot waikato dot ac dot nz), FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Vote, Serialized Form
    • Constructor Detail

      • Dagging

        public Dagging()
        Constructor.
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing the classifier.
        Returns:
        a description suitable for displaying in the Explorer/Experimenter GUI
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options. The valid options are identical to those listed in the class description above; options after -- are passed to the designated classifier.

        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableSingleClassifierEnhancer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
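        As an illustration of how Dagging's own options and their documented defaults (-F 10, -S 1, -verbose off) could be read from an option array, here is a stand-alone sketch. It assumes, as a simplification, that consumed entries are blanked out and that scanning stops at "--" so base-classifier options are left alone. The class DaggingOptionsSketch and its helpers takeOption and takeFlag are hypothetical; this is not Weka's implementation.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch of parsing Dagging's own options. */
public class DaggingOptionsSketch {

    int numFolds = 10;       // -F default
    int seed = 1;            // -S default
    boolean verbose = false; // -verbose default

    /** Returns the value after flag and blanks both entries, or null if absent. */
    static String takeOption(String flag, String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                break; // everything after "--" belongs to the base classifier
            }
            if (options[i].equals(flag)) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("No value given for " + flag);
                }
                String value = options[i + 1];
                options[i] = "";      // mark both entries as consumed
                options[i + 1] = "";
                return value;
            }
        }
        return null;
    }

    /** Returns true (and blanks the entry) if the boolean flag is present. */
    static boolean takeFlag(String flag, String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                break;
            }
            if (options[i].equals(flag)) {
                options[i] = "";
                return true;
            }
        }
        return false;
    }

    /** Applies -F, -S and -verbose, falling back to the documented defaults. */
    void setOptions(String[] options) {
        String folds = takeOption("-F", options);
        numFolds = (folds == null) ? 10 : Integer.parseInt(folds);
        String s = takeOption("-S", options);
        seed = (s == null) ? 1 : Integer.parseInt(s);
        verbose = takeFlag("-verbose", options);
    }

    public static void main(String[] args) {
        String[] opts = { "-F", "4", "-verbose" };
        DaggingOptionsSketch d = new DaggingOptionsSketch();
        d.setOptions(opts);
        System.out.println(d.numFolds + " " + d.seed + " " + d.verbose);
        System.out.println(Arrays.toString(opts)); // consumed entries are now blank
    }
}
```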
      • getNumFolds

        public int getNumFolds()
        Gets the number of folds to use for splitting the training set.
        Returns:
        the number of folds
      • setNumFolds

        public void setNumFolds(int value)
        Sets the number of folds to use for splitting the training set.
        Parameters:
        value - the new number of folds
      • numFoldsTipText

        public java.lang.String numFoldsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the Explorer/Experimenter GUI
      • setVerbose

        public void setVerbose(boolean value)
        Sets the verbose state.
        Parameters:
        value - the verbose state
      • getVerbose

        public boolean getVerbose()
        Gets the verbose state.
        Returns:
        the verbose state
      • verboseTipText

        public java.lang.String verboseTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the Explorer/Experimenter GUI
      • buildClassifier

        public void buildClassifier(Instances data)
                             throws java.lang.Exception
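        Builds the classifier: splits the training data into disjoint, stratified folds and trains a copy of the base classifier on each fold.
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        data - the training data used to build the ensemble of base classifiers.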
        Throws:
        java.lang.Exception - if the classifier could not be built successfully
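        The class description says the training data is split into disjoint, stratified folds, one per base classifier. The sketch below shows one simple way to produce such folds over instance indices: shuffle with a seed, group by class label, then deal the instances round-robin so every index lands in exactly one fold and each class is spread evenly. It is an illustration under those assumptions, not necessarily the exact procedure Weka uses; StratifiedFolds and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Hypothetical sketch: split instance indices into k disjoint, stratified folds. */
public class StratifiedFolds {

    /**
     * labels[i] is the class label of instance i. Returns k lists of instance
     * indices; every index appears in exactly one fold, and each class is
     * spread as evenly as possible across the folds.
     */
    public static List<List<Integer>> split(int[] labels, int k, long seed) {
        // Shuffle first so fold membership does not depend on input order.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        // Group shuffled indices by class label (TreeMap gives a fixed class order).
        Map<Integer, List<Integer>> byClass = new TreeMap<>();
        for (int idx : order) {
            byClass.computeIfAbsent(labels[idx], c -> new ArrayList<>()).add(idx);
        }

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }

        // Deal class by class, round-robin; the rotation continues across
        // classes, so overall fold sizes differ by at most one.
        int next = 0;
        for (List<Integer> members : byClass.values()) {
            for (int idx : members) {
                folds.get(next).add(idx);
                next = (next + 1) % k;
            }
        }
        return folds;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1};
        List<List<Integer>> folds = split(labels, 2, 1L);
        for (List<Integer> fold : folds) {
            long ones = fold.stream().filter(i -> labels[i] == 1).count();
            System.out.println("size=" + fold.size() + " class1=" + ones);
        }
    }
}
```

        Each resulting fold would then serve as the training set for its own copy of the base classifier, so with k folds a base learner only ever sees about 1/k of the data.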
      • distributionForInstance

        public double[] distributionForInstance(Instance instance)
                                         throws java.lang.Exception
        Calculates the class membership probabilities for the given test instance.
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        instance - the instance to be classified
        Returns:
        predicted class probability distribution
        Throws:
        java.lang.Exception - if distribution can't be computed successfully
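        The class description states that predictions are combined by majority vote through the Vote meta classifier. As a sketch, assuming each fold's base classifier casts exactly one vote for its predicted class index, the votes can be turned into a class distribution by counting and normalizing. Weka's Vote meta classifier may combine base-model outputs differently depending on its configuration, so treat this as an illustration of the idea; MajorityVote and distribution are hypothetical names.

```java
/** Hypothetical sketch: turn base-classifier votes into a class distribution. */
public class MajorityVote {

    /**
     * predictions[j] is the class index predicted by base classifier j.
     * Returns the fraction of votes each of the numClasses classes received.
     */
    public static double[] distribution(int[] predictions, int numClasses) {
        if (predictions.length == 0) {
            throw new IllegalArgumentException("need at least one vote");
        }
        double[] dist = new double[numClasses];
        for (int p : predictions) {
            dist[p] += 1.0; // one vote per base classifier
        }
        for (int c = 0; c < numClasses; c++) {
            dist[c] /= predictions.length; // normalize so the entries sum to 1
        }
        return dist;
    }

    public static void main(String[] args) {
        // Ten base classifiers (one per fold) voting among three classes.
        int[] votes = {0, 2, 2, 1, 2, 0, 2, 2, 1, 2};
        double[] dist = distribution(votes, 3);
        System.out.println(java.util.Arrays.toString(dist));
    }
}
```

        The predicted class is then the index with the largest entry; ties would need a rule of their own, which this sketch leaves to the caller.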
      • toString

        public java.lang.String toString()
        Returns a description of the classifier.
        Overrides:
        toString in class java.lang.Object
        Returns:
        description of the classifier as a string
      • main

        public static void main(java.lang.String[] args)
        Main method for testing this class.
        Parameters:
        args - the options