We present MedMNIST, a collection of 10 pre-processed open medical image datasets. MedMNIST is standardized for classification tasks on lightweight 28 × 28 images and requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000) and in tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST can be used for educational purposes, rapid prototyping, multi-modal machine learning, or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source and commercial AutoML tools.
Please note that this dataset is NOT intended for clinical use.
Name | Data Modality | Tasks (# Classes/Labels) | # Training | # Validation | # Test |
---|---|---|---|---|---|
PathMNIST | Pathology | Multi-Class (9) | 89,996 | 10,004 | 7,180 |
ChestMNIST | Chest X-ray | Multi-Label (14) Binary-Class (2) | 78,468 | 11,219 | 22,433 |
DermaMNIST | Dermatoscope | Multi-Class (7) | 7,007 | 1,003 | 2,005 |
OCTMNIST | OCT | Multi-Class (4) | 97,477 | 10,832 | 1,000 |
PneumoniaMNIST | Chest X-ray | Binary-Class (2) | 4,708 | 524 | 624 |
RetinaMNIST | Fundus Camera | Ordinal Regression (5) | 1,080 | 120 | 400 |
BreastMNIST | Breast Ultrasound | Binary-Class (2) | 546 | 78 | 156 |
OrganMNIST_Axial | Abdominal CT | Multi-Class (11) | 34,581 | 6,491 | 17,778 |
OrganMNIST_Coronal | Abdominal CT | Multi-Class (11) | 13,000 | 2,392 | 8,268 |
OrganMNIST_Sagittal | Abdominal CT | Multi-Class (11) | 13,940 | 2,452 | 8,829 |
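To make the range of data scales concrete, the following minimal sketch adds up the split sizes listed in the table and reports each dataset's total size and test fraction. The numbers are copied from the table; the `SPLITS` dictionary and the `summarize` helper are hypothetical names introduced only for this illustration and are not part of any MedMNIST tooling.

```python
# Split sizes (training, validation, test) copied from the table above.
# SPLITS and summarize() are illustrative names, not a MedMNIST API.
SPLITS = {
    "PathMNIST": (89_996, 10_004, 7_180),
    "ChestMNIST": (78_468, 11_219, 22_433),
    "DermaMNIST": (7_007, 1_003, 2_005),
    "OCTMNIST": (97_477, 10_832, 1_000),
    "PneumoniaMNIST": (4_708, 524, 624),
    "RetinaMNIST": (1_080, 120, 400),
    "BreastMNIST": (546, 78, 156),
    "OrganMNIST_Axial": (34_581, 6_491, 17_778),
    "OrganMNIST_Coronal": (13_000, 2_392, 8_268),
    "OrganMNIST_Sagittal": (13_940, 2_452, 8_829),
}


def summarize(splits):
    """Return {name: (total_samples, test_fraction)} for each dataset."""
    summary = {}
    for name, (train, val, test) in splits.items():
        total = train + val + test
        summary[name] = (total, test / total)
    return summary


if __name__ == "__main__":
    # Print datasets from smallest to largest total size.
    rows = sorted(summarize(SPLITS).items(), key=lambda kv: kv[1][0])
    for name, (total, test_frac) in rows:
        print(f"{name:<20} total={total:>7,}  test={test_frac:.1%}")
```

Sorting by total size shows the spread directly: the smallest dataset has fewer than a thousand images in total, while the largest exceed one hundred thousand.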
Methods | PathMNIST AUC | PathMNIST ACC | ChestMNIST AUC | ChestMNIST ACC | DermaMNIST AUC | DermaMNIST ACC | OCTMNIST AUC | OCTMNIST ACC | PneumoniaMNIST AUC | PneumoniaMNIST ACC |
---|---|---|---|---|---|---|---|---|---|---|
ResNet-18 (28) | 0.972 | 0.844 | 0.706 | 0.947 | 0.899 | 0.721 | 0.951 | 0.758 | 0.957 | 0.843 |
ResNet-18 (224) | 0.978 | 0.860 | 0.713 | 0.948 | 0.896 | 0.727 | 0.960 | 0.752 | 0.970 | 0.861 |
ResNet-50 (28) | 0.979 | 0.864 | 0.692 | 0.947 | 0.886 | 0.710 | 0.939 | 0.745 | 0.949 | 0.857 |
ResNet-50 (224) | 0.978 | 0.848 | 0.706 | 0.947 | 0.895 | 0.719 | 0.951 | 0.750 | 0.968 | 0.896 |
auto-sklearn | 0.500 | 0.186 | 0.647 | 0.642 | 0.906 | 0.734 | 0.883 | 0.595 | 0.947 | 0.865 |
AutoKeras | 0.979 | 0.864 | 0.715 | 0.939 | 0.921 | 0.756 | 0.956 | 0.736 | 0.970 | 0.918 |
Google AutoML Vision | 0.982 | 0.811 | 0.718 | 0.947 | 0.925 | 0.766 | 0.965 | 0.732 | 0.993 | 0.941 |
Methods | RetinaMNIST AUC | RetinaMNIST ACC | BreastMNIST AUC | BreastMNIST ACC | OrganMNIST (Axial) AUC | OrganMNIST (Axial) ACC | OrganMNIST (Coronal) AUC | OrganMNIST (Coronal) ACC | OrganMNIST (Sagittal) AUC | OrganMNIST (Sagittal) ACC |
---|---|---|---|---|---|---|---|---|---|---|
ResNet-18 (28) | 0.727 | 0.515 | 0.897 | 0.859 | 0.995 | 0.921 | 0.990 | 0.889 | 0.967 | 0.762 |
ResNet-18 (224) | 0.721 | 0.543 | 0.915 | 0.878 | 0.997 | 0.931 | 0.991 | 0.907 | 0.974 | 0.777 |
ResNet-50 (28) | 0.719 | 0.490 | 0.879 | 0.853 | 0.995 | 0.916 | 0.990 | 0.893 | 0.968 | 0.746 |
ResNet-50 (224) | 0.717 | 0.555 | 0.863 | 0.833 | 0.997 | 0.931 | 0.992 | 0.898 | 0.970 | 0.770 |
auto-sklearn | 0.694 | 0.525 | 0.848 | 0.808 | 0.797 | 0.563 | 0.898 | 0.676 | 0.855 | 0.601 |
AutoKeras | 0.655 | 0.420 | 0.833 | 0.801 | 0.996 | 0.929 | 0.992 | 0.915 | 0.972 | 0.803 |
Google AutoML Vision | 0.762 | 0.530 | 0.932 | 0.865 | 0.988 | 0.818 | 0.986 | 0.861 | 0.964 | 0.706 |
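The benchmark tables report two metrics per dataset, AUC and ACC. As a rough illustration of what these two numbers measure, here is a minimal pure-Python sketch for the binary case. It is not the evaluation code behind the tables: the 0.5 decision threshold, the pairwise AUC formulation, and the function names `accuracy` and `auc` are assumptions made for this example, and this page does not specify how the metrics are aggregated for multi-class, multi-label, or ordinal tasks.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, y_score):
    """Binary ROC AUC computed by pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample,
    with ties counted as one half.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print("ACC:", accuracy(y_true, y_score))
    print("AUC:", auc(y_true, y_score))
```

AUC depends only on how the scores rank positives against negatives, whereas ACC depends on the chosen threshold, so the two metrics can diverge on the same predictions.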
If you find this project useful, please cite our paper as:
Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis," IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021.
or using BibTeX:
@inproceedings{medmnistv1,
  title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
  author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
  booktitle={IEEE 18th International Symposium on Biomedical Imaging (ISBI)},
  pages={191--195},
  year={2021}
}
Please also cite the corresponding source paper for any MedMNIST subset you use. Each subset is released under the same license as its source dataset.