
DWDM 2b, 2c, 2d experiment

 //2b//

  • Downloading and/or installation of WEKA data mining toolkit.

    1. Go to the Weka website, http://www.cs.waikato.ac.nz/ml/weka/, and download the software.

    2. Select the appropriate link corresponding to the version of the software based on your operating system and whether or not you already have Java VM running on your machine.

    3. The link will forward you to a site where you can download the software from a mirror site. Save the self-extracting executable to disk and then double click on it to install Weka. Answer yes or next to the questions during the installation.

    4. Click yes to accept the Java agreement if necessary. After you install the program, Weka should appear on your Start menu under Programs (if you are using Windows).

    5. To run Weka, open the Start menu, select Programs, then Weka. You will see the Weka GUI Chooser. Select Explorer to launch the Weka Explorer.

 //2c//
  • Understand the features of WEKA toolkit such as Explorer, Knowledge Flow interface, Experimenter, command-line interface.

  • ANS:

    The Weka GUI Chooser (class weka.gui.GUIChooser) provides a starting point for launching Weka's main GUI applications and supporting tools. If you prefer an MDI (“Multiple Document Interface”) appearance, an alternative launcher called “Main” (class weka.gui.Main) provides it.





The GUI Chooser lets you launch five different applications:

  • The Explorer is the central panel where most data mining tasks are performed.

  • The Experimenter panel is used to run experiments and conduct statistical tests between learning schemes.

  • The KnowledgeFlow panel provides a drag-and-drop interface: you place components on a canvas, connect them to form a knowledge flow, and then analyze the data and results.

  • The Workbench panel is an all-in-one application that combines the other interfaces (Explorer, Experimenter, KnowledgeFlow, Simple CLI) as user-selectable perspectives in a single window.

  • The Simple CLI panel provides a simple command-line interface for running WEKA commands directly.



 //2d//
Navigate the options available in WEKA (e.g. Select attributes panel, Preprocess panel, Classify panel, Cluster panel, Associate panel and Visualize panel).
ANS:
EXPLORER PANEL
Preprocess Panel
A variety of dataset formats can be loaded: WEKA's ARFF format (.arff extension), CSV format (.csv extension), C4.5 format (.data & .names extensions), or serialized Instances format (.bsi extension).
Load a standard dataset from the data/ directory of your Weka installation, for example data/breast-cancer.arff.
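
To make the ARFF layout concrete, here is a minimal sketch of how such a file is organised: a header with @relation and @attribute lines, followed by @data rows. The toy dataset and the parse_arff helper are hypothetical illustrations, not one of Weka's bundled files and not Weka's own loader; the sketch only handles unquoted names and dense, comma-separated rows.

```python
# Minimal ARFF reader: an illustrative sketch, not Weka's parser.
# Assumes unquoted names, nominal or numeric attributes, dense data rows.

SAMPLE_ARFF = """\
% hypothetical toy dataset, not shipped with Weka
@relation toy-weather
@attribute outlook {sunny,overcast,rainy}
@attribute temperature numeric
@attribute play {yes,no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Return (relation name, list of (attribute, type), list of data rows)."""
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):  # nominal attribute: list its values
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, len(attributes), "attributes,", len(rows), "instances")
```

The Preprocess panel shows the same information after loading a file: the relation name, the attribute list, and the number of instances.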

Classify Panel

Test Options

  1. The result of applying the chosen classifier will be tested according to the options that are set by clicking in the Test options box.

  2. There are four test modes:

    • Use training set: The classifier is evaluated on how well it predicts the class of the instances it was trained on.

    • Supplied test set: The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on.

    • Cross-validation: The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field.

    • Percentage split: The classifier is evaluated on how well it predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field.

  3. Click the “Start” button to run the ZeroR classifier on the dataset and summarize the results.
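
The test modes above can be sketched in a few lines. ZeroR ignores every attribute and always predicts the majority class, so only the class labels are needed. The label list and helper names below are hypothetical, and the sketch makes simplifying assumptions: it does no shuffling or stratification, and it treats the percentage as the share of data used for training, with the remainder held out.

```python
from collections import Counter


def zero_r(train_labels):
    """ZeroR: ignore all attributes and predict the most frequent class."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(prediction, test_labels):
    """Fraction of test labels equal to the single predicted class."""
    return sum(1 for y in test_labels if y == prediction) / len(test_labels)


def percentage_split(labels, train_percent):
    """Assumption: the first train_percent% trains, the rest is held out."""
    cut = round(len(labels) * train_percent / 100)
    return labels[:cut], labels[cut:]


def cross_validation(labels, folds):
    """Plain k-fold: every fold serves as the test set exactly once."""
    scores = []
    for k in range(folds):
        test = labels[k::folds]
        train = [y for i, y in enumerate(labels) if i % folds != k]
        scores.append(accuracy(zero_r(train), test))
    return sum(scores) / folds


# Hypothetical class labels: 7 "no" and 3 "yes"
labels = ["no", "no", "yes", "no", "no", "yes", "no", "no", "yes", "no"]

# Use training set: train and test on the same labels
print("training set:", accuracy(zero_r(labels), labels))

# Percentage split: train on 70%, test on the held-out 30%
train, test = percentage_split(labels, 70)
print("percentage split:", accuracy(zero_r(train), test))

# Cross-validation with 5 folds
print("cross-validation:", cross_validation(labels, 5))
```

The "Supplied test set" mode corresponds to calling accuracy with labels read from a separate file.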


Cluster Panel

  1. Click the “Start” button to run the EM clustering algorithm on the dataset and summarize the results.

Associate Panel
Click the “Start” button to run the Apriori association algorithm on the dataset and summarize the results.
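
As a rough illustration of what an Apriori-style run computes, the sketch below finds frequent itemsets level by level and then measures the confidence of a rule. The transactions and function names are hypothetical, and this is a simplified version of the idea (it omits the subset-pruning step of the full algorithm); it is not Weka's implementation.

```python
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(min_support):
    """Level-wise search: size-k candidates come from frequent size-(k-1) sets."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support({i}) >= min_support]
    found = list(level)
    while level:
        size = len(level[0]) + 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        level = [c for c in candidates if support(c) >= min_support]
        found.extend(level)
    return found


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


for itemset in frequent_itemsets(0.5):
    print(sorted(itemset), support(itemset))

print("butter -> bread:", confidence({"butter"}, {"bread"}))
```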
Select Attributes Panel
Click the “Start” button to run the CfsSubsetEval algorithm with a BestFirst search on the dataset and summarize the results.
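
Correlation-based feature selection (CFS) scores a subset of attributes by rewarding correlation with the class and penalising correlation among the attributes themselves. The sketch below evaluates the commonly cited CFS merit heuristic for a subset; the correlation values are hypothetical inputs, and the code does not reproduce how Weka estimates them or how BestFirst searches the subsets.

```python
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit: k * mean(r_cf) / sqrt(k + k * (k - 1) * mean(r_ff)).

    feature_class_corr: one correlation per feature with the class.
    feature_feature_corr: one correlation per pair of features (may be empty).
    """
    k = len(feature_class_corr)
    mean_cf = sum(feature_class_corr) / k
    if feature_feature_corr:
        mean_ff = sum(feature_feature_corr) / len(feature_feature_corr)
    else:
        mean_ff = 0.0
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)


# Two hypothetical features, each correlated 0.6 with the class
independent = cfs_merit([0.6, 0.6], [0.0])  # features unrelated to each other
redundant = cfs_merit([0.6, 0.6], [1.0])    # features duplicate each other

print("independent pair:", round(independent, 4))
print("redundant pair:  ", round(redundant, 4))
```

The redundant pair scores no better than a single feature would, which is why the search tends to keep attributes that add new information.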


Visualize Panel
Increase the point size and the jitter, then click the “Update” button to get a clearer plot of the categorical attributes of the loaded dataset.
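
Jitter helps because categorical values land on exactly the same coordinates, so many instances are drawn on top of each other. A minimal sketch of the idea, assuming jitter is a small uniform random offset added to each plotted coordinate (the function name and values are hypothetical, and Weka's exact jitter computation may differ):

```python
import random


def jitter(values, amount, seed=0):
    """Add a uniform random offset in [-amount, amount] to each value."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [v + rng.uniform(-amount, amount) for v in values]


# Five instances that fall on only two distinct category positions
positions = [1, 1, 1, 2, 2]
spread = jitter(positions, 0.2)

print("distinct positions before:", len(set(positions)))
print("distinct positions after: ", len(set(spread)))
```

After jittering, overlapping points are displaced slightly, so the density of each category becomes visible in the plot.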





