DWDM 2b, 2c experiment

 //2b//

  • Downloading and/or installation of WEKA data mining toolkit.

    1. Go to the Weka website, http://www.cs.waikato.ac.nz/ml/weka/, and download the software.

    2. Select the link that matches your operating system and whether or not you already have a Java VM running on your machine.

    3. The link forwards you to a mirror site where you can download the software. Save the self-extracting executable to disk, then double-click it to install Weka. Answer yes or next to the questions during the installation.

    4. Click yes to accept the Java agreement if necessary. After installation, Weka should appear on your Start menu under Programs (if you are using Windows).

    5. To run Weka, open the Start menu and select Programs, then Weka. The Weka GUI Chooser appears. Select Explorer to launch the Weka Explorer.

 //2c//

  • Understand the features of the WEKA toolkit, such as the Explorer, Knowledge Flow interface, Experimenter, and command-line interface.

  • ANS:

    The Weka GUI Chooser (class weka.gui.GUIChooser) provides a starting point for launching Weka's main GUI applications and supporting tools. If you prefer an MDI (“Multiple Document Interface”) appearance, use the alternative launcher called “Main” (class weka.gui.Main).





The GUI Chooser lets you run five different applications:

  • The Explorer is the central panel where most data mining tasks are performed.

  • The Experimenter panel is used to run experiments and conduct statistical tests between learning schemes.

  • The KnowledgeFlow panel provides a drag-and-drop interface: you place components on a canvas, connect them to form a knowledge flow, and analyze the data and results.

  • The Workbench is an all-in-one application that combines the other applications within user-selectable perspectives.

  • The Simple CLI panel provides a simple command-line interface for running WEKA commands directly.



 //2d//
Navigate the options available in WEKA (e.g., the Select attributes panel, Preprocess panel, Classify panel, Cluster panel, Associate panel and Visualize panel).
ANS:
EXPLORER PANEL
Preprocess Panel
A variety of dataset formats can be loaded: WEKA's ARFF format (.arff extension), CSV format (.csv extension), C4.5 format (.data & .names extensions), or serialized Instances format (.bsi extension).
Load a standard dataset from the data/ directory of your Weka installation, specifically data/breast-cancer.arff.
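
To get a feel for what an ARFF file contains, here is a minimal Python sketch that reads a tiny ARFF document. The sample data and the parse_arff helper are invented for illustration; the sketch assumes one value per comma and does not handle quoted values, sparse rows, or the other details that WEKA's own loader supports.

```python
def parse_arff(text):
    """Parse a minimal ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            # After @data, every line is one comma-separated instance
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Toy dataset, invented for illustration (not a file shipped with WEKA)
SAMPLE = """\
% weather toy data
@relation weather
@attribute outlook {sunny,overcast,rainy}
@attribute temperature numeric
@attribute play {yes,no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes)
print(rows)
```

The header (@relation, @attribute) describes the columns, and everything after @data is the instances, which is the same split the Preprocess panel shows as the attribute list and the instance count.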

Classify Panel

Test Options

  1. The result of applying the chosen classifier will be tested according to the options that are set by clicking in the Test options box.

  2. There are four test modes:

    • Use training set: The classifier is evaluated on how well it predicts the class of the instances it was trained on.

    • Supplied test set: The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on.

    • Cross-validation: The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field.

    • Percentage split: The classifier is evaluated on how well it predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field.

  3. Click the “Start” button to run the ZeroR classifier on the dataset and summarize the results.
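
The test modes above can be sketched in a few lines of Python using ZeroR, which ignores every attribute and always predicts the majority class. This is a simplified illustration, not WEKA's implementation: the labels are invented, the folds are taken in a fixed interleaved order (no shuffling or stratification), and the percentage is assumed to be the share of data used for training.

```python
from collections import Counter


def zero_r(train_labels):
    """ZeroR: ignore all attributes and always predict the majority class."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(prediction, test_labels):
    """Fraction of test labels equal to the single predicted class."""
    return sum(1 for y in test_labels if y == prediction) / len(test_labels)


def percentage_split(labels, train_percent):
    """Assumption: the first train_percent% trains, the rest is held out."""
    cut = int(len(labels) * train_percent / 100)
    return labels[:cut], labels[cut:]


def cross_validate(labels, folds):
    """Plain k-fold CV: each fold is the test set exactly once."""
    scores = []
    for i in range(folds):
        test = labels[i::folds]
        train = [y for j, y in enumerate(labels) if j % folds != i]
        scores.append(accuracy(zero_r(train), test))
    return sum(scores) / folds


# Invented class labels: 7 "yes" and 3 "no"
labels = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

# Use training set: evaluate on the same data the model was built from
train_acc = accuracy(zero_r(labels), labels)

# Percentage split: build on one part, test on the held-out part
train_part, test_part = percentage_split(labels, 66)
split_acc = accuracy(zero_r(train_part), test_part)

# Cross-validation with 5 folds
cv_acc = cross_validate(labels, 5)

print(train_acc, split_acc, cv_acc)
```

Because ZeroR only learns the majority class, its accuracy is a useful baseline: any real classifier chosen in the Classify panel should beat it.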


Cluster Panel

  1. Click the “Start” button to run the EM clustering algorithm on the dataset and summarize the results.
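
For intuition about what EM clustering does, the following Python sketch fits two Gaussian clusters to one numeric attribute. It is a deliberately reduced illustration under stated assumptions: exactly two clusters, a single attribute, invented data, and a fixed starting point. WEKA's EM clusterer handles many attributes and has its own initialization, so this is not its algorithm line for line.

```python
import math


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1D Gaussian mixture with plain EM."""
    means = [min(data), max(data)]  # crude but deterministic start
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: how responsible is each cluster for each point?
        resp = []
        for x in data:
            dens = [
                weights[k]
                * math.exp(-((x - means[k]) ** 2) / (2 * variances[k]))
                / math.sqrt(2 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each cluster from its soft assignments
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # floor so a cluster never collapses to zero variance
            )
            weights[k] = nk / len(data)
    return means, variances, weights


# Invented data: one group near 1 and one group near 5
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights = em_two_gaussians(data)
print(means, weights)
```

The E-step gives every instance a probability of belonging to each cluster, and the M-step updates the cluster parameters from those probabilities, which is why EM output reports per-cluster means and standard deviations rather than hard assignments only.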

Associate Panel
Click the “Start” button to run the Apriori association algorithm on the dataset and summarize the results.
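
The core idea behind Apriori, level-wise search for frequent itemsets, can be sketched in Python as below. The transactions and the min_support value are invented for illustration, and the sketch stops at frequent itemsets; it does not generate rules or mirror the options of WEKA's Apriori implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # size-1 candidates
    result = {}
    while current:
        # Count how many transactions contain each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Join step: merge frequent itemsets into candidates one item larger
        keys = list(frequent)
        size = len(keys[0]) + 1 if keys else 0
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        # Prune step: every subset of a candidate must itself be frequent
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]
    return result


# Invented market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what makes the search tractable: {bread, butter, milk} is never counted because its subset {butter, milk} is already known to be infrequent.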
Select Attributes Panel
Click the “Start” button to run the CfsSubsetEval algorithm with a BestFirst search on the dataset and summarize the results.
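
Correlation-based feature selection (CFS) scores a subset of attributes by rewarding correlation with the class and penalizing redundancy among the attributes. A minimal Python sketch of that merit score is shown below. The correlation values are invented inputs, how the correlations are measured is left out, and the function name cfs_merit is hypothetical rather than part of WEKA.

```python
import math


def cfs_merit(k, avg_feature_class_corr, avg_feature_feature_corr):
    """CFS merit of a subset of k attributes.

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)
    where r_cf is the mean attribute-class correlation and
    r_ff is the mean attribute-attribute correlation.
    """
    return (k * avg_feature_class_corr) / math.sqrt(
        k + k * (k - 1) * avg_feature_feature_corr
    )


# Invented numbers: same predictive strength, different redundancy
low_redundancy = cfs_merit(3, 0.6, 0.2)
high_redundancy = cfs_merit(3, 0.6, 0.8)
single_attribute = cfs_merit(1, 0.6, 0.0)

print(low_redundancy, high_redundancy, single_attribute)
```

A search strategy such as BestFirst then explores different subsets and keeps the one with the highest merit, so attributes that duplicate each other tend to be dropped.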


Visualize Panel
Increase the point size and the jitter, then click the “Update” button to get a clearer plot of the categorical attributes of the loaded dataset.
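
Jitter helps because instances with the same categorical value are drawn on top of each other; adding a small random offset spreads them out so their density becomes visible. The Python sketch below illustrates the idea with invented coordinates. The jitter helper and its amount parameter are hypothetical and do not correspond to WEKA's internal scaling of the Jitter slider.

```python
import random


def jitter(values, amount, seed=0):
    """Shift each value by a random offset in [-amount, +amount]."""
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Invented positions: a categorical attribute encoded as 1 or 2
values = [1, 1, 1, 2, 2]
jittered = jitter(values, amount=0.2)
print(jittered)
```

Without jitter the five points collapse into two visible dots; with it, all five can be told apart while still staying close to their original category.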
