DWDM 2b, 2c experiment

//2b//

Downloading and/or installation of WEKA data mining toolkit.

Go to the Weka website, http://www.cs.waikato.ac.nz/ml/weka/, and download the software.
Select the link that matches your operating system and whether or not a Java VM is already installed on your machine.
The link forwards you to a mirror site from which you can download the software. Save the self-extracting executable to disk, then double-click it to install Weka. Answer Yes or Next to the prompts during installation.
Click Yes to accept the Java agreement if necessary. After installation, Weka should appear in your Start menu under Programs (if you are using Windows).
To run Weka, open the Start menu and select Programs, then Weka. The Weka GUI Chooser appears. Select Explorer to launch the Weka Explorer.

//2c//
Understand the features of the WEKA toolkit such as the Explorer, Knowledge Flow interface, Experimenter, and command-line interface.

ANS:

The Weka GUI Chooser (class weka.gui.GUIChooser) provides a starting point for launching Weka’s main GUI applications and supporting tools. If you prefer an MDI (“Multiple Document Interface”) appearance, use the alternative launcher called “Main” (class weka.gui.Main).

The GUI Chooser lets you run five different applications:

The Explorer is the central panel where most data mining tasks are performed.
The Experimenter panel is used to run experiments and conduct statistical tests between learning schemes.
The KnowledgeFlow panel provides a drag-and-drop interface: you place components on a canvas, connect them to form a knowledge flow, and analyze the data and results.
The Workbench is an all-in-one application that combines the other applications (Explorer, Experimenter, KnowledgeFlow, and so on) in a single window.
The Simple CLI panel provides a command-line interface for running WEKA commands directly.

//2d//
Navigate the options available in WEKA (e.g. Select attributes panel, Preprocess panel, Classify panel, Cluster panel, Associate panel and Visualize panel)

ANS:

EXPLORER PANEL

Preprocess Panel

A variety of dataset formats can be loaded: WEKA’s ARFF format (.arff extension), CSV format (.csv extension), C4.5 format (.data & .names extensions), or serialized Instances format (.bsi extension).

Load a standard dataset from the data/ directory of your Weka installation, specifically data/breast-cancer.arff.
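
For orientation, an ARFF file is plain text: a header that names the relation and declares its attributes, followed by a data section with one comma-separated instance per line. The Python sketch below reads a tiny made-up ARFF snippet (not the real breast-cancer.arff) using only the standard library. The dataset, the function name parse_arff and the simplifications (no quoted values, no sparse rows) are assumptions for illustration; this is not WEKA's own loader.

```python
# Minimal ARFF reader: a sketch, not WEKA's parser.
# Assumes unquoted names/values and dense (non-sparse) data rows.
SAMPLE_ARFF = """\
% Hypothetical toy dataset, invented for illustration only
@relation toy-weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

def parse_arff(text):
    """Return (relation, attribute_names, rows) from ARFF text."""
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # skip blank lines and comments
            continue
        lower = line.lower()                     # header keywords are case-insensitive
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            # "@attribute <name> <type>": keep only the name here
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, attributes, len(rows))
```

The last attribute (play in this toy file) plays the role of the class, which matches the Explorer's default of treating the final attribute as the class.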

Classify Panel

Test Options

The result of applying the chosen classifier will be tested according to the options that are set by clicking in the Test options box.
There are four test modes:

Use training set: The classifier is evaluated on how well it predicts the class of the instances it was trained on.
Supplied test set: The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on.
Cross-validation: The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field.
Percentage split: The classifier is trained on a percentage of the data and evaluated on how well it predicts the remainder, which is held out for testing. The value entered in the % field is the share used for training, so the amount held out is whatever is left.
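
To make the difference between cross-validation and percentage split concrete, the Python sketch below partitions instance indices both ways. It is a simplified illustration, not WEKA's implementation: the function names are hypothetical, and the sketch skips the random shuffling that WEKA applies by default before splitting.

```python
def k_fold_indices(n_instances, folds):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    indices = list(range(n_instances))
    for k in range(folds):
        test = indices[k::folds]                 # every folds-th instance, offset k
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test

def percentage_split(n_instances, train_percent):
    """Split indices into a leading training part and a held-out test part."""
    cut = round(n_instances * train_percent / 100)
    indices = list(range(n_instances))
    return indices[:cut], indices[cut:]

# 10 instances, 5 folds: every fold trains on 8 instances and tests on 2
folds = list(k_fold_indices(10, 5))

# 10 instances, 66% for training: 7 train, 3 held out
train, test = percentage_split(10, 66)
print(len(folds), folds[0][1], train, test)
```

With cross-validation every instance is used for testing once, whereas a percentage split tests only on the held-out part.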

Click the “Start” button to run the ZeroR classifier on the dataset and summarize the results.
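
ZeroR is a baseline: it ignores every attribute and always predicts the most frequent class. The Python sketch below shows that idea on invented labels, evaluated in the "Use training set" mode. The class name ZeroRSketch and the data are hypothetical; this is not WEKA's code.

```python
from collections import Counter

class ZeroRSketch:
    """Baseline that ignores all attributes and always predicts the majority class."""

    def fit(self, labels):
        # The single most frequent label becomes the constant prediction
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        return [self.prediction] * n_instances

# Hypothetical class labels, invented for illustration
train_labels = ["no"] * 7 + ["yes"] * 3

model = ZeroRSketch().fit(train_labels)
predicted = model.predict(len(train_labels))

# Evaluate on the same labels the model was fitted on
correct = sum(p == a for p, a in zip(predicted, train_labels))
accuracy = correct / len(train_labels)
print(model.prediction, accuracy)
```

The accuracy of such a baseline equals the share of the majority class, which gives a floor that any real classifier should beat.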

Cluster Panel

Click the “Start” button to run the EM clustering algorithm on the dataset and summarize the results.

Associate Panel

Click the “Start” button to run the Apriori association algorithm on the dataset and summarize the results.
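
The core of Apriori is a level-wise search for itemsets whose support meets a minimum threshold, from which rules are then scored by confidence. The Python sketch below shows that core on a made-up set of transactions. The function name, the data and the threshold are assumptions for illustration; WEKA's Apriori has further options and its own rule ranking that are not modeled here.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for itemsets meeting min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    size = 1
    while current:
        kept = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                kept[candidate] = s
        result.update(kept)
        size += 1
        # Join step: unions of frequent itemsets that are exactly one item larger
        candidates = {a | b for a in kept for b in kept if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent
        current = [c for c in candidates
                   if all(frozenset(sub) in kept for sub in combinations(c, size - 1))]
    return result

# Hypothetical market-basket transactions, invented for illustration
transactions = [frozenset(t) for t in (
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
)]

result = frequent_itemsets(transactions, min_support=0.5)

# Confidence of the rule butter -> bread = support(both) / support(butter)
confidence = result[frozenset(["bread", "butter"])] / result[frozenset(["butter"])]
print(len(result), confidence)
```

The prune step is what keeps the search manageable: an itemset can only be frequent if all of its subsets are frequent.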

Select Attributes Panel

Click the “Start” button to run the CfsSubsetEval algorithm with a BestFirst search on the dataset and summarize the results.

Visualize Panel

Increase the point size and the jitter, then click the “Update” button to get a clearer plot of the categorical attributes of the loaded dataset.
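
Jitter helps with categorical attributes because many instances share exactly the same coordinates and would otherwise be drawn on top of each other. The Python sketch below shows the general idea: a small random offset is added to each plotted position, for display only. The function name, the noise range and the data are assumptions; how WEKA computes its jitter internally is not covered here.

```python
import random

def jitter(values, amount, seed=0):
    """Shift each value by uniform random noise in [-amount, +amount] (display only)."""
    rng = random.Random(seed)                    # fixed seed keeps the plot reproducible
    return [v + rng.uniform(-amount, amount) for v in values]

# Hypothetical categorical attribute encoded as integer positions on an axis
positions = [0, 0, 0, 1, 1, 2]

plotted = jitter(positions, amount=0.2)
print(plotted)
```

The underlying data are unchanged; only the plotted positions move, so overlapping points spread into a small visible cloud around each category.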

Jhansi Rani Kunaparaju
