Iris Dataset Classification using Radius Neighbors Classifier

The Iris dataset is like the "Hello World" of Machine Learning. This dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems. You can find this dataset on Kaggle, and learn more about it from its Wikipedia page and the UCI Machine Learning Repository.

The Iris dataset includes three iris species (Iris setosa, Iris virginica and Iris versicolor) with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other. The columns in this dataset are:

  • Id
  • SepalLengthCm
  • SepalWidthCm
  • PetalLengthCm
  • PetalWidthCm
  • Species
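As a quick check of the description above, here is a minimal sketch that loads the copy of the dataset bundled with scikit-learn. This is an assumption for illustration: the bundled copy has no Id column and names its features differently from the Kaggle CSV listed above.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the copy of Iris that ships with scikit-learn (not the Kaggle CSV)
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # 150 flowers, 4 measurements each
print(iris.feature_names)  # sepal/petal length and width, in cm

# Count how many samples belong to each species
counts = np.bincount(y)
print(dict(zip(iris.target_names, counts)))
```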

    Here, we will classify the iris species in this dataset using RadiusNeighborsClassifier. It is part of the sklearn.neighbors module, which provides both unsupervised and supervised neighbors-based learning methods. RadiusNeighborsClassifier belongs to the supervised side: neighbors-based classification for data with discrete labels.

    The classifier predicts the label of a query point by a vote among the training samples that lie within a fixed distance of it. That distance is set by the "radius" argument, which defaults to 1.0. You can check out the other hyperparameters on the sklearn page. Using it is pretty simple. First, create an instance of the RadiusNeighborsClassifier class:
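A minimal sketch of this step, assuming scikit-learn is installed. The variable name clf is only illustrative, and the radius is written out even though 1.0 is the default:

```python
from sklearn.neighbors import RadiusNeighborsClassifier

# radius=1.0 is the library default; it is spelled out here for clarity
clf = RadiusNeighborsClassifier(radius=1.0)
```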

    Then train the model:
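A sketch of the training step might look like the following. It assumes the scikit-learn bundled copy of Iris rather than the Kaggle CSV, and the 80/20 split and random_state are arbitrary choices made for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsClassifier

# Features (4 measurements per flower) and integer species labels
X, y = load_iris(return_X_y=True)

# Hold out 20% of the flowers, keeping the species balanced in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fitting stores the training samples so neighbors can be looked up later
clf = RadiusNeighborsClassifier(radius=1.0)
clf.fit(X_train, y_train)
```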

    The prediction comes from a vote among the neighbors within the radius, so besides a single label from predict(), you can get an estimated probability for every class using the predict_proba() method.
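The sketch below shows both calls side by side. It assumes the bundled scikit-learn copy of Iris and, to keep the example safe, queries rows taken from the training data: a query point with no training sample inside the radius raises an error unless the outlier_label argument is set.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import RadiusNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = RadiusNeighborsClassifier(radius=1.0).fit(X, y)

# One flower from each species; each row is a training sample,
# so it always has at least itself inside the radius
samples = X[[0, 50, 100]]

# One row per sample, one column per class (ordered as in clf.classes_)
proba = clf.predict_proba(samples)
# The single predicted label for each sample
labels = clf.predict(samples)

print(proba)
print(labels)
```

Each row of proba sums to 1, since it is the share of neighbor votes that each class received.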

    You can find a complete implementation of this algorithm in our GitHub repository.