Datasets for practicing Logistic Regression

I was looking for a list of machine learning datasets for comparing Logistic Regression models, but I couldn’t find one easily, so I spent some time curating a list based on my needs.

This post is a collection of those datasets, which you can download for your own use.

1. Iris Dataset

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
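To see why the linearly separable class is the easy case, here is a minimal sketch of binary Logistic Regression trained with plain gradient descent on a one-vs-rest target. The class names (`A`, `B`, `C`), the single feature and its values are made-up stand-ins rather than actual Iris measurements, and `fit_logistic` is a hypothetical helper written for this illustration, not a library function.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=10000):
    """Fit a one-feature logistic regression with batch gradient descent.

    Hypothetical helper for illustration; lr and epochs are arbitrary choices.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up toy data (NOT real Iris values): class "A" is separable
# from the other two classes on this single feature.
labels = ["A", "A", "A", "B", "C", "C"]
feature = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]

# One-vs-rest recoding: 1 for class "A", 0 for everything else.
targets = [1 if label == "A" else 0 for label in labels]

w, b = fit_logistic(feature, targets)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in feature]
print(predictions == targets)
```

On the real dataset, the same one-vs-rest recoding applied to either of the two overlapping classes would not be expected to reach a perfect fit with a linear decision boundary.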


2. Titanic Dataset

The task is to use machine learning to build a model that predicts which passengers survived the Titanic shipwreck.


3. Bank Marketing Dataset

The data relates to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed (‘yes’) or not (‘no’).


4. Haberman’s Survival Data Set

The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer.


5. Census Income Dataset

Predict whether income exceeds $50K/yr based on census data.


6. Wine Quality Data Set

Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.
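Wine quality is recorded as a score between 0 and 10, so it has to be turned into a two-class target before a binary Logistic Regression model can be fitted. The sketch below shows one possible way to do that. The cutoff of 7 for a "good" wine and the name `binarize_quality` are assumptions made for this example, not part of the dataset, and the scores in the usage lines are made up.

```python
def binarize_quality(scores, threshold=7):
    """Map integer quality scores to 1 (good) or 0 (not good).

    The threshold is an assumed cutoff; choose one that suits your analysis.
    """
    return [1 if score >= threshold else 0 for score in scores]


# Made-up scores for illustration only.
example_scores = [3, 5, 6, 7, 8]
binary_labels = binarize_quality(example_scores)
print(binary_labels)
```

A different threshold changes the class balance, so it is worth checking how many samples end up in each class before training.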


7. Credit Card Dataset

This dataset comes from research on customers’ default payments in Taiwan, which compared the predictive accuracy of the probability of default across six data mining methods.


8. Pima Indian Diabetes Dataset

The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.



I found this interesting post on Quora about how to find the dataset you need on Kaggle: