Abstract

We present MedMNIST, a collection of 10 pre-processed open medical image datasets. MedMNIST is standardized to perform classification tasks on lightweight 28 × 28 images, which requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST can be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source and commercial AutoML tools.

Key Features

Educational: Our multi-modal data, drawn from multiple open medical image datasets with Creative Commons (CC) licenses, is easy to use for educational purposes.
Standardized: Data is pre-processed into the same format, so users need no background knowledge.
Diverse: The multi-modal datasets cover diverse data scales (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label).
Lightweight: The small image size of 28 × 28 is friendly for rapid prototyping and for experimenting with multi-modal machine learning and AutoML algorithms.
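
To make the task types above concrete, here is a minimal sketch of how 28 × 28 images and their labels might be laid out in memory. The array shapes, the variable names and the random placeholder data are assumptions made for illustration only; they are not the released file format. The class counts (9, 14, 5, 2) are taken from the overview table.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # hypothetical number of samples, for illustration only

# Placeholder grayscale images at the standardized 28 x 28 resolution.
images = rng.integers(0, 256, size=(n, 28, 28), dtype=np.uint8)

# Assumed label layouts, one per task type:
multi_class = rng.integers(0, 9, size=(n, 1))   # one of 9 classes (PathMNIST-like)
multi_label = rng.integers(0, 2, size=(n, 14))  # 14 independent binary labels (ChestMNIST-like)
ordinal = rng.integers(0, 5, size=(n, 1))       # ordered grades 0..4 (RetinaMNIST-like)
binary = rng.integers(0, 2, size=(n, 1))        # 2 classes (BreastMNIST-like)

# Flatten each image to a 784-dimensional vector scaled to [0, 1],
# a form that classical (non-convolutional) classifiers can consume.
features = images.reshape(n, -1).astype(np.float32) / 255.0

print(features.shape, multi_class.shape, multi_label.shape)
```

Because every subset shares the same image size, the same flattening or model input layer can be reused across all ten datasets; only the label handling changes with the task type.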

Please note that this dataset is NOT intended for clinical use.

Materials

An Overview of MedMNIST Dataset
Name	Data Modality	Tasks (# Classes/Labels)	# Training	# Validation	# Test
PathMNIST	Pathology	Multi-Class (9)	89,996	10,004	7,180
ChestMNIST	Chest X-ray	Multi-Label (14) Binary-Class (2)	78,468	11,219	22,433
DermaMNIST	Dermatoscope	Multi-Class (7)	7,007	1,003	2,005
OCTMNIST	OCT	Multi-Class (4)	97,477	10,832	1,000
PneumoniaMNIST	Chest X-ray	Binary-Class (2)	4,708	524	624
RetinaMNIST	Fundus Camera	Ordinal Regression (5)	1,080	120	400
BreastMNIST	Breast Ultrasound	Binary-Class (2)	546	78	156
OrganMNIST_Axial	Abdominal CT	Multi-Class (11)	34,581	6,491	17,778
OrganMNIST_Coronal	Abdominal CT	Multi-Class (11)	13,000	2,392	8,268
OrganMNIST_Sagittal	Abdominal CT	Multi-Class (11)	13,940	2,452	8,829
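
The split sizes in the table can be summarized with a short script. This is only a convenience sketch: the numbers are copied from the table above, while the dictionary and function names are hypothetical.

```python
# Split sizes copied from the overview table: (training, validation, test).
SPLITS = {
    "PathMNIST": (89_996, 10_004, 7_180),
    "ChestMNIST": (78_468, 11_219, 22_433),
    "DermaMNIST": (7_007, 1_003, 2_005),
    "OCTMNIST": (97_477, 10_832, 1_000),
    "PneumoniaMNIST": (4_708, 524, 624),
    "RetinaMNIST": (1_080, 120, 400),
    "BreastMNIST": (546, 78, 156),
    "OrganMNIST_Axial": (34_581, 6_491, 17_778),
    "OrganMNIST_Coronal": (13_000, 2_392, 8_268),
    "OrganMNIST_Sagittal": (13_940, 2_452, 8_829),
}


def summarize(splits):
    """Return {name: (total, train_frac, val_frac, test_frac)}."""
    summary = {}
    for name, (train, val, test) in splits.items():
        total = train + val + test
        summary[name] = (total, train / total, val / total, test / total)
    return summary


summary = summarize(SPLITS)
smallest = min(summary, key=lambda name: summary[name][0])
largest = max(summary, key=lambda name: summary[name][0])

for name, (total, tr, va, te) in summary.items():
    print(f"{name:<20} total={total:>7,}  train={tr:.1%}  val={va:.1%}  test={te:.1%}")
print("smallest:", smallest, "| largest:", largest)
```

The split ratios differ between subsets (for example, OCTMNIST has a comparatively small test set), so per-dataset metrics are not estimated from equally sized test sets.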

Benchmarking

Methods	PathMNIST		ChestMNIST		DermaMNIST		OCTMNIST		PneumoniaMNIST
Methods	AUC	ACC	AUC	ACC	AUC	ACC	AUC	ACC	AUC	ACC
ResNet-18 (28)	0.972	0.844	0.706	0.947	0.899	0.721	0.951	0.758	0.957	0.843
ResNet-18 (224)	0.978	0.860	0.713	0.948	0.896	0.727	0.960	0.752	0.970	0.861
ResNet-50 (28)	0.979	0.864	0.692	0.947	0.886	0.710	0.939	0.745	0.949	0.857
ResNet-50 (224)	0.978	0.848	0.706	0.947	0.895	0.719	0.951	0.750	0.968	0.896
auto-sklearn	0.500	0.186	0.647	0.642	0.906	0.734	0.883	0.595	0.947	0.865
AutoKeras	0.979	0.864	0.715	0.939	0.921	0.756	0.956	0.736	0.970	0.918
Google AutoML Vision	0.982	0.811	0.718	0.947	0.925	0.766	0.965	0.732	0.993	0.941

Benchmarking Performance on MedMNIST Dataset
Methods	RetinaMNIST		BreastMNIST		OrganMNIST (Axial)		OrganMNIST (Coronal)		OrganMNIST (Sagittal)
Methods	AUC	ACC	AUC	ACC	AUC	ACC	AUC	ACC	AUC	ACC
ResNet-18 (28)	0.727	0.515	0.897	0.859	0.995	0.921	0.990	0.889	0.967	0.762
ResNet-18 (224)	0.721	0.543	0.915	0.878	0.997	0.931	0.991	0.907	0.974	0.777
ResNet-50 (28)	0.719	0.490	0.879	0.853	0.995	0.916	0.990	0.893	0.968	0.746
ResNet-50 (224)	0.717	0.555	0.863	0.833	0.997	0.931	0.992	0.898	0.970	0.770
auto-sklearn	0.694	0.525	0.848	0.808	0.797	0.563	0.898	0.676	0.855	0.601
AutoKeras	0.655	0.420	0.833	0.801	0.996	0.929	0.992	0.915	0.972	0.803
Google AutoML Vision	0.762	0.530	0.932	0.865	0.988	0.818	0.986	0.861	0.964	0.706
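
One possible way to compare methods across the decathlon is to average each method's AUC over the ten datasets. The sketch below does this with values copied from the two tables above; the unweighted mean is an illustrative choice and is not a score reported in the tables, and the variable names are hypothetical.

```python
from statistics import mean

# AUC per method, copied from the benchmark tables, in the order:
# Path, Chest, Derma, OCT, Pneumonia, Retina, Breast, Organ (Axial, Coronal, Sagittal).
AUC = {
    "ResNet-18 (28)":       [0.972, 0.706, 0.899, 0.951, 0.957, 0.727, 0.897, 0.995, 0.990, 0.967],
    "ResNet-18 (224)":      [0.978, 0.713, 0.896, 0.960, 0.970, 0.721, 0.915, 0.997, 0.991, 0.974],
    "ResNet-50 (28)":       [0.979, 0.692, 0.886, 0.939, 0.949, 0.719, 0.879, 0.995, 0.990, 0.968],
    "ResNet-50 (224)":      [0.978, 0.706, 0.895, 0.951, 0.968, 0.717, 0.863, 0.997, 0.992, 0.970],
    "auto-sklearn":         [0.500, 0.647, 0.906, 0.883, 0.947, 0.694, 0.848, 0.797, 0.898, 0.855],
    "AutoKeras":            [0.979, 0.715, 0.921, 0.956, 0.970, 0.655, 0.833, 0.996, 0.992, 0.972],
    "Google AutoML Vision": [0.982, 0.718, 0.925, 0.965, 0.993, 0.762, 0.932, 0.988, 0.986, 0.964],
}

# Unweighted mean over the ten datasets (an assumption of this sketch).
mean_auc = {method: mean(values) for method, values in AUC.items()}

# Rank methods from highest to lowest mean AUC.
ranking = sorted(mean_auc.items(), key=lambda item: item[1], reverse=True)

for method, score in ranking:
    print(f"{method:<22} mean AUC = {score:.4f}")
```

An unweighted mean treats every dataset equally regardless of its size or difficulty, so it should be read alongside the per-dataset numbers rather than as a replacement for them.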

Citation and Licenses

If you find this project useful, please cite our paper as:

Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis," IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021.

or using BibTeX:

@inproceedings{medmnistv1,
    title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
    author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
    booktitle={IEEE 18th International Symposium on Biomedical Imaging (ISBI)},
    pages={191--195},
    year={2021}
}

In addition, please cite the corresponding source paper if you use any subset of MedMNIST. Each subset uses the same license as its source dataset.
