Class BIRCHCluster

  • All Implemented Interfaces:
    java.io.Serializable, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

    public class BIRCHCluster
    extends ClusterGenerator
    implements TechnicalInformationHandler
    Cluster data generator designed for the BIRCH System

    Dataset is generated with instances in K clusters.
    Instances are 2-d data points.
    Each cluster is characterized by the number of data points in it, its radius and its center. The location of the cluster centers is determined by the pattern parameter. Three patterns are currently supported: grid, sine and random.
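
As a rough illustration of how a pattern can determine the cluster centers, the sketch below places K centers row by row on a square grid. It is not taken from the Weka source: the class name GridPatternSketch, the method gridCenters and the spacing formula (distance multiplier times the mean of the minimum and maximum radius) are assumptions made for this example, and the actual generator may place centers differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Weka's BIRCHCluster code. */
class GridPatternSketch {

    /**
     * Places k cluster centers row by row on a square grid.
     * ASSUMPTION: neighbouring centers are distMult * (minRadius + maxRadius) / 2 apart.
     */
    static List<double[]> gridCenters(int k, double minRadius, double maxRadius, double distMult) {
        // Smallest square grid (side x side) that can hold k centers.
        int side = (int) Math.ceil(Math.sqrt(k));
        double spacing = distMult * (minRadius + maxRadius) / 2.0;

        List<double[]> centers = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            int row = i / side;
            int col = i % side;
            centers.add(new double[] {col * spacing, row * spacing});
        }
        return centers;
    }

    public static void main(String[] args) {
        // Values mirror the documented defaults: 4 clusters, radius 0.1..sqrt(2), multiplier 4.0.
        for (double[] c : gridCenters(4, 0.1, Math.sqrt(2.0), 4.0)) {
            System.out.printf("(%.3f, %.3f)%n", c[0], c[1]);
        }
    }
}
```

With this assumed spacing, a larger distance multiplier spreads the clusters further apart relative to their radii, which reduces overlap between neighbouring clusters.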

    For more information refer to:

    Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: ACM SIGMOD International Conference on Management of Data, 103-114, 1996.

    BibTeX:

     @inproceedings{Zhang1996,
        author = {Tian Zhang and Raghu Ramakrishnan and Miron Livny},
        booktitle = {ACM SIGMOD International Conference on Management of Data},
        pages = {103-114},
        publisher = {ACM Press},
        title = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
        year = {1996}
     }
     

    Valid options are:

     -h
      Prints this help.
     -o <file>
      The name of the output file, otherwise the generated data is
      printed to stdout.
     -r <name>
      The name of the relation.
     -d
      Whether to print debug information.
     -S
      The seed for random function (default 1)
     -a <num>
      The number of attributes (default 10).
     -c
      Class Flag, if set, the cluster is listed in extra attribute.
     -b <range>
      The indices for boolean attributes.
     -m <range>
      The indices for nominal attributes.
     -k <num>
      The number of clusters (default 4)
     -G
      Set pattern to grid (default is random).
      This flag cannot be used at the same time as flag I.
      The pattern is random, if neither flag G nor flag I is set.
     -I
      Set pattern to sine (default is random).
      This flag cannot be used at the same time as flag G.
      The pattern is random, if neither flag G nor flag I is set.
     -N <num>..<num>
      The range of number of instances per cluster (default 1..50).
      Lower number must be between 0 and 2500,
      upper number must be between 50 and 2500.
     -R <num>..<num>
      The range of radius per cluster (default 0.1..1.4142135623730951).
      Lower number must be between 0 and SQRT(2), 
      upper number must be between SQRT(2) and SQRT(32).
     -M <num>
      The distance multiplier (default 4.0).
     -C <num>
      The number of cycles (default 4).
     -O
      Sets the input order to ORDERED. If the flag is not set, the
      input order is RANDOMIZED. RANDOMIZED is currently not
      implemented, so the input order is always ORDERED.
     -P <num>
      The noise rate in percent (default 0.0).
      Can be between 0% and 30%. (Remark: The original 
      algorithm only allows noise up to 10%.)
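
The range options use the form <num>..<num>. The sketch below shows one way such a value could be parsed and checked against the bounds documented for -N. The class InstanceRangeSketch and the method parseInstanceRange are hypothetical names for this example and are not part of Weka; the check that the lower bound does not exceed the upper bound is an additional assumption, not something the option text states.

```java
/** Hypothetical helper for illustration; not part of Weka. */
class InstanceRangeSketch {

    /** Parses "<lower>..<upper>" and applies the bounds documented for the -N option. */
    static int[] parseInstanceRange(String value) {
        int sep = value.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("Expected <num>..<num>, got: " + value);
        }
        int lower = Integer.parseInt(value.substring(0, sep).trim());
        int upper = Integer.parseInt(value.substring(sep + 2).trim());

        // Documented bounds: lower in [0, 2500], upper in [50, 2500].
        if (lower < 0 || lower > 2500) {
            throw new IllegalArgumentException("Lower number must be between 0 and 2500: " + lower);
        }
        if (upper < 50 || upper > 2500) {
            throw new IllegalArgumentException("Upper number must be between 50 and 2500: " + upper);
        }
        // ASSUMPTION: a range with lower > upper is rejected.
        if (lower > upper) {
            throw new IllegalArgumentException("Lower number exceeds upper number: " + value);
        }
        return new int[] {lower, upper};
    }

    public static void main(String[] args) {
        int[] range = parseInstanceRange("1..50"); // the documented default
        System.out.println(range[0] + " to " + range[1]);
    }
}
```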
    Version:
    $Revision: 1.8 $
    Author:
    Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • GRID

        public static final int GRID
        Constant set for choice of pattern. (option G)
        See Also:
        Constant Field Values
      • SINE

        public static final int SINE
        Constant set for choice of pattern. (option I)
        See Also:
        Constant Field Values
      • RANDOM

        public static final int RANDOM
        Constant set for choice of pattern. (default)
        See Also:
        Constant Field Values
      • TAGS_PATTERN

        public static final Tag[] TAGS_PATTERN
        the pattern tags
      • ORDERED

        public static final int ORDERED
        Constant set for input order (option O)
        See Also:
        Constant Field Values
      • RANDOMIZED

        public static final int RANDOMIZED
        Constant set for input order (default)
        See Also:
        Constant Field Values
      • TAGS_INPUTORDER

        public static final Tag[] TAGS_INPUTORDER
        the input order tags
    • Constructor Detail

      • BIRCHCluster

        public BIRCHCluster()
        initializes the generator with default values
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this data generator.
        Returns:
        a description of the data generator suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class ClusterGenerator
        Returns:
        an enumeration of all the available options
      • setOptions

        public void setOptions​(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a list of options for this object.

        Valid options are:

         -h
          Prints this help.
         -o <file>
          The name of the output file, otherwise the generated data is
          printed to stdout.
         -r <name>
          The name of the relation.
         -d
          Whether to print debug information.
         -S
          The seed for random function (default 1)
         -a <num>
          The number of attributes (default 10).
         -c
          Class Flag, if set, the cluster is listed in extra attribute.
         -b <range>
          The indices for boolean attributes.
         -m <range>
          The indices for nominal attributes.
         -k <num>
          The number of clusters (default 4)
         -G
          Set pattern to grid (default is random).
          This flag cannot be used at the same time as flag I.
          The pattern is random, if neither flag G nor flag I is set.
         -I
          Set pattern to sine (default is random).
          This flag cannot be used at the same time as flag G.
          The pattern is random, if neither flag G nor flag I is set.
         -N <num>..<num>
          The range of number of instances per cluster (default 1..50).
          Lower number must be between 0 and 2500,
          upper number must be between 50 and 2500.
         -R <num>..<num>
          The range of radius per cluster (default 0.1..1.4142135623730951).
          Lower number must be between 0 and SQRT(2), 
          upper number must be between SQRT(2) and SQRT(32).
         -M <num>
          The distance multiplier (default 4.0).
         -C <num>
          The number of cycles (default 4).
         -O
          Sets the input order to ORDERED. If the flag is not set, the
          input order is RANDOMIZED. RANDOMIZED is currently not
          implemented, so the input order is always ORDERED.
         -P <num>
          The noise rate in percent (default 0.0).
          Can be between 0% and 30%. (Remark: The original 
          algorithm only allows noise up to 10%.)
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class ClusterGenerator
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • setNumClusters

        public void setNumClusters​(int numClusters)
        Sets the number of clusters the dataset should have.
        Parameters:
        numClusters - the new number of clusters
      • getNumClusters

        public int getNumClusters()
        Gets the number of clusters the dataset should have.
        Returns:
        the number of clusters the dataset should have
      • numClustersTipText

        public java.lang.String numClustersTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getMinInstNum

        public int getMinInstNum()
        Gets the lower boundary for instances per cluster.
        Returns:
        the lower boundary for instances per cluster
      • setMinInstNum

        public void setMinInstNum​(int newMinInstNum)
        Sets the lower boundary for instances per cluster.
        Parameters:
        newMinInstNum - new lower boundary for instances per cluster
      • minInstNumTipText

        public java.lang.String minInstNumTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getMaxInstNum

        public int getMaxInstNum()
        Gets the upper boundary for instances per cluster.
        Returns:
        the upper boundary for instances per cluster
      • setMaxInstNum

        public void setMaxInstNum​(int newMaxInstNum)
        Sets the upper boundary for instances per cluster.
        Parameters:
        newMaxInstNum - new upper boundary for instances per cluster
      • maxInstNumTipText

        public java.lang.String maxInstNumTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getMinRadius

        public double getMinRadius()
        Gets the lower boundary for the radiuses of the clusters.
        Returns:
        the lower boundary for the radiuses of the clusters
      • setMinRadius

        public void setMinRadius​(double newMinRadius)
        Sets the lower boundary for the radiuses of the clusters.
        Parameters:
        newMinRadius - new lower boundary for the radiuses of the clusters
      • minRadiusTipText

        public java.lang.String minRadiusTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getMaxRadius

        public double getMaxRadius()
        Gets the upper boundary for the radiuses of the clusters.
        Returns:
        the upper boundary for the radiuses of the clusters
      • setMaxRadius

        public void setMaxRadius​(double newMaxRadius)
        Sets the upper boundary for the radiuses of the clusters.
        Parameters:
        newMaxRadius - new upper boundary for the radiuses of the clusters
      • maxRadiusTipText

        public java.lang.String maxRadiusTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getPattern

        public SelectedTag getPattern()
        Gets the pattern type.
        Returns:
        the current pattern type
      • setPattern

        public void setPattern​(SelectedTag value)
        Sets the pattern type.
        Parameters:
        value - new pattern type
      • patternTipText

        public java.lang.String patternTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getDistMult

        public double getDistMult()
        Gets the distance multiplier.
        Returns:
        the distance multiplier
      • setDistMult

        public void setDistMult​(double newDistMult)
        Sets the distance multiplier.
        Parameters:
        newDistMult - new distance multiplier
      • distMultTipText

        public java.lang.String distMultTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getNumCycles

        public int getNumCycles()
        Gets the number of cycles.
        Returns:
        the number of cycles
      • setNumCycles

        public void setNumCycles​(int newNumCycles)
        Sets the number of cycles.
        Parameters:
        newNumCycles - new number of cycles
      • numCyclesTipText

        public java.lang.String numCyclesTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getInputOrder

        public SelectedTag getInputOrder()
        Gets the input order.
        Returns:
        the current input order
      • setInputOrder

        public void setInputOrder​(SelectedTag value)
        Sets the input order.
        Parameters:
        value - new input order
      • inputOrderTipText

        public java.lang.String inputOrderTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getOrderedFlag

        public boolean getOrderedFlag()
        Gets the ordered flag (option O).
        Returns:
        true if ordered flag is set
      • getNoiseRate

        public double getNoiseRate()
        Gets the percentage of noise set.
        Returns:
        the percentage of noise set
      • setNoiseRate

        public void setNoiseRate​(double newNoiseRate)
        Sets the percentage of noise set.
        Parameters:
        newNoiseRate - new percentage of noise
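
To make the noise rate concrete, the sketch below validates a percentage against the documented 0% to 30% range and converts it into a number of noise points. The class NoiseRateSketch and both method names are hypothetical, and rounding the product to the nearest integer is an assumption for this example; how the generator itself derives the noise count is not stated here.

```java
/** Hypothetical illustration; not Weka's implementation. */
class NoiseRateSketch {

    /** Returns the rate unchanged if it lies in the documented range of 0% to 30%. */
    static double checkNoiseRate(double percent) {
        if (percent < 0.0 || percent > 30.0) {
            throw new IllegalArgumentException("Noise rate must be between 0 and 30 percent: " + percent);
        }
        return percent;
    }

    /**
     * Converts a noise rate in percent into a number of noise points.
     * ASSUMPTION: the count is the rounded share of totalInstances.
     */
    static int noiseCount(int totalInstances, double percent) {
        return (int) Math.round(totalInstances * checkNoiseRate(percent) / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(noiseCount(200, 10.0));
    }
}
```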
      • noiseRateTipText

        public java.lang.String noiseRateTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getSingleModeFlag

        public boolean getSingleModeFlag()
        Gets the single mode flag.
        Specified by:
        getSingleModeFlag in class DataGenerator
        Returns:
        true if the method generateExample can be used
      • generateExample

        public Instance generateExample()
                                 throws java.lang.Exception
        Generate an example of the dataset.
        Specified by:
        generateExample in class DataGenerator
        Returns:
        the instance generated
        Throws:
        java.lang.Exception - if the format is not defined or if examples
        cannot be generated one by one because voting is chosen
      • generateExamples

        public Instances generateExamples()
                                   throws java.lang.Exception
        Generate all examples of the dataset.
        Specified by:
        generateExamples in class DataGenerator
        Returns:
        the instances generated
        Throws:
        java.lang.Exception - if format not defined
      • generateExamples

        public Instances generateExamples​(java.util.Random random,
                                          Instances format)
                                   throws java.lang.Exception
        Generate all examples of the dataset.
        Parameters:
        random - the random number generator to use
        format - the dataset format
        Returns:
        the instances generated
        Throws:
        java.lang.Exception - if format not defined
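
Passing an explicit java.util.Random makes generation reproducible: two generators created with the same seed yield the same sequence of values. The sketch below demonstrates this with 2-d points scattered around a center. The class SeededPointsSketch, the method samplePoints and the Gaussian spread with a per-dimension standard deviation of radius / sqrt(2) are assumptions for this example and are not taken from the Weka source.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration; not Weka's implementation. */
class SeededPointsSketch {

    /**
     * Draws n 2-d points around the center (cx, cy).
     * ASSUMPTION: each coordinate is Gaussian with standard deviation radius / sqrt(2).
     */
    static double[][] samplePoints(Random random, double cx, double cy, double radius, int n) {
        double sigma = radius / Math.sqrt(2.0);
        double[][] points = new double[n][2];
        for (int i = 0; i < n; i++) {
            points[i][0] = cx + random.nextGaussian() * sigma;
            points[i][1] = cy + random.nextGaussian() * sigma;
        }
        return points;
    }

    public static void main(String[] args) {
        // Same seed, same points.
        double[][] first = samplePoints(new Random(1L), 0.0, 0.0, 1.0, 5);
        double[][] second = samplePoints(new Random(1L), 0.0, 0.0, 1.0, 5);
        System.out.println(Arrays.deepEquals(first, second));
    }
}
```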
      • generateFinished

        public java.lang.String generateFinished()
                                          throws java.lang.Exception
        Compiles documentation about the data generation after the generation process
        Specified by:
        generateFinished in class DataGenerator
        Returns:
        string with additional information about generated dataset
        Throws:
        java.lang.Exception - no input structure has been defined
      • generateStart

        public java.lang.String generateStart()
        Compiles documentation about the data generation before the generation process
        Specified by:
        generateStart in class DataGenerator
        Returns:
        string with additional information
      • getRevision

        public java.lang.String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface RevisionHandler
        Returns:
        the revision
      • main

        public static void main​(java.lang.String[] args)
        Main method for testing this class.
        Parameters:
        args - should contain arguments for the data producer: