{{about.title}}



{{i+1}} {{affiliation}}




Abstract

We introduce MedMNIST v2, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28 x 28 (2D) or 28 x 28 x 28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of 708,069 2D images and 10,214 3D images in total, could support numerous research / educational purposes in biomedical image analysis, computer vision and machine learning. We benchmark several baseline methods on MedMNIST v2, including 2D / 3D neural networks and open-source / commercial AutoML tools.


Key Features

  • Educational: Our multi-modal data, drawn from multiple open medical image datasets with Creative Commons (CC) licenses, is easy to use for educational purposes.
  • Standardized: Data is pre-processed into the same format, so users need no background knowledge.
  • Diverse: The multi-modal datasets cover diverse data scales (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label).
  • Lightweight: The small size is friendly for rapid prototyping and for experimenting with multi-modal machine learning and AutoML algorithms.

Please note that this dataset is NOT intended for clinical use.
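
As a rough illustration of what the standardized format means in practice, the sketch below checks that a sample has the 28 x 28 (2D) or 28 x 28 x 28 (3D) shape described above and flattens it into a feature vector. The helper names (shape_of, is_standard_sample, flatten) and the all-zero placeholder samples are hypothetical, and the sketch assumes single-channel images held as nested lists. It says nothing about how the dataset files are actually stored or loaded.

```python
from typing import Tuple

SIDE = 28  # edge length stated on this page for both 2D and 3D samples


def shape_of(sample) -> Tuple[int, ...]:
    """Return the shape of a regular nested list, e.g. (28, 28)."""
    dims = []
    while isinstance(sample, (list, tuple)):
        dims.append(len(sample))
        sample = sample[0] if sample else None
    return tuple(dims)


def is_standard_sample(sample) -> bool:
    """True if the sample is 28 x 28 (2D) or 28 x 28 x 28 (3D)."""
    return shape_of(sample) in {(SIDE, SIDE), (SIDE, SIDE, SIDE)}


def flatten(sample) -> list:
    """Flatten a nested sample into a 1D feature vector."""
    if not isinstance(sample, (list, tuple)):
        return [sample]
    return [value for part in sample for value in flatten(part)]


# Hypothetical all-zero placeholders standing in for real images (assumption).
image_2d = [[0] * SIDE for _ in range(SIDE)]
volume_3d = [[[0] * SIDE for _ in range(SIDE)] for _ in range(SIDE)]

print(shape_of(image_2d), is_standard_sample(image_2d))    # (28, 28) True
print(shape_of(volume_3d), is_standard_sample(volume_3d))  # (28, 28, 28) True
print(len(flatten(image_2d)), len(flatten(volume_3d)))     # 784 21952
```

Flattening is shown only because fixed-size inputs make it trivial to feed the same sample to a classical model or to a neural network. Any real pipeline would more likely work with array libraries than nested lists.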


Materials

The MedMNIST v2 dataset consists of 12 pre-processed 2D datasets and 6 pre-processed 3D datasets from selected sources, covering primary data modalities (e.g., X-Ray, OCT, Ultrasound, CT, Electron Microscope), diverse classification tasks (binary/multi-class, ordinal regression and multi-label) and data scales (from 100 to 100,000). For simplicity, we refer to the collection of all 2D datasets as MedMNIST2D, and that of all 3D datasets as MedMNIST3D.
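
To make the four task types concrete, here is a minimal sketch of how labels for each of them could be validated. The encoding is an assumption made for illustration (an integer class index for binary, multi-class and ordinal tasks, and a 0/1 indicator vector for multi-label tasks). This page does not specify the label format, and the names Task and is_valid_label are hypothetical.

```python
from enum import Enum


class Task(Enum):
    BINARY = "binary"
    MULTI_CLASS = "multi-class"
    ORDINAL_REGRESSION = "ordinal regression"
    MULTI_LABEL = "multi-label"


def is_valid_label(task: Task, label, n_classes: int) -> bool:
    """Check a label against the assumed encoding for the given task type."""
    if task is Task.MULTI_LABEL:
        # Assumed encoding: one 0/1 flag per label, several may be set at once.
        return (
            isinstance(label, (list, tuple))
            and len(label) == n_classes
            and all(flag in (0, 1) for flag in label)
        )
    # Assumed encoding: a single integer index. For ordinal regression the
    # index order is meaningful (e.g. a severity grade); for multi-class it is not.
    upper = 2 if task is Task.BINARY else n_classes
    return isinstance(label, int) and 0 <= label < upper


print(is_valid_label(Task.BINARY, 1, 2))                   # True
print(is_valid_label(Task.MULTI_CLASS, 5, 5))              # False
print(is_valid_label(Task.ORDINAL_REGRESSION, 4, 5))       # True
print(is_valid_label(Task.MULTI_LABEL, [0, 1, 1], 3))      # True
```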


MedMNIST2D

An Overview of MedMNIST2D in MedMNIST v2.
MedMNIST2D           Data Modality         Tasks (# Classes/Labels)   # Samples            # Training / Validation / Test
{{subset.dataset}}   {{subset.modality}}   {{subset.task}}            {{subset.samples}}   {{subset.splits}}

Facts of {{selected2d.dataset}}

Data Modality: {{selected2d.modality}}
Task: {{selected2d.task}}
Number of Samples: {{selected2d.samples}} ({{selected2d.splits}})
Source Data:

{{citation}}

License: {{selected2d.license}}

MedMNIST3D

An Overview of MedMNIST3D in MedMNIST v2.
MedMNIST3D           Data Modality         Tasks (# Classes/Labels)   # Samples            # Training / Validation / Test
{{subset.dataset}}   {{subset.modality}}   {{subset.task}}            {{subset.samples}}   {{subset.splits}}

Facts of {{selected3d.dataset}}

{{format3d}}
Data Modality: {{selected3d.modality}}
Task: {{selected3d.task}}
Number of Samples: {{selected3d.samples}} ({{selected3d.splits}})
Source Data:

{{citation}}

License: {{selected3d.license}}

Benchmarking

Benchmarking Performance on MedMNIST2D.

Methods               PathMNIST       ChestMNIST      DermaMNIST      OCTMNIST        PneumoniaMNIST  RetinaMNIST
                      AUC    ACC      AUC    ACC      AUC    ACC      AUC    ACC      AUC    ACC      AUC    ACC
ResNet-18 (28)        0.983  0.907    0.768  0.947    0.917  0.735    0.943  0.743    0.944  0.854    0.717  0.524
ResNet-18 (224)       0.989  0.909    0.773  0.947    0.920  0.754    0.958  0.763    0.956  0.864    0.710  0.493
ResNet-50 (28)        0.990  0.911    0.769  0.947    0.913  0.735    0.952  0.762    0.948  0.854    0.726  0.528
ResNet-50 (224)       0.989  0.892    0.773  0.948    0.912  0.731    0.958  0.776    0.962  0.884    0.716  0.511
auto-sklearn          0.934  0.716    0.649  0.779    0.902  0.719    0.887  0.601    0.942  0.855    0.690  0.515
AutoKeras             0.959  0.834    0.742  0.937    0.915  0.749    0.955  0.763    0.947  0.878    0.719  0.503
Google AutoML Vision  0.944  0.728    0.778  0.948    0.914  0.768    0.963  0.771    0.991  0.946    0.750  0.531

Methods               BreastMNIST     BloodMNIST      TissueMNIST     OrganAMNIST     OrganCMNIST     OrganSMNIST
                      AUC    ACC      AUC    ACC      AUC    ACC      AUC    ACC      AUC    ACC      AUC    ACC
ResNet-18 (28)        0.901  0.863    0.998  0.958    0.930  0.676    0.997  0.935    0.992  0.900    0.972  0.782
ResNet-18 (224)       0.891  0.833    0.998  0.963    0.933  0.681    0.998  0.951    0.994  0.920    0.974  0.778
ResNet-50 (28)        0.857  0.812    0.997  0.956    0.931  0.680    0.997  0.935    0.992  0.905    0.972  0.770
ResNet-50 (224)       0.866  0.842    0.997  0.950    0.932  0.680    0.998  0.947    0.993  0.911    0.975  0.785
auto-sklearn          0.836  0.803    0.984  0.878    0.828  0.532    0.963  0.762    0.976  0.829    0.945  0.672
AutoKeras             0.871  0.831    0.998  0.961    0.941  0.703    0.994  0.905    0.990  0.879    0.974  0.813
Google AutoML Vision  0.919  0.861    0.998  0.966    0.924  0.673    0.990  0.886    0.988  0.877    0.964  0.749

Benchmarking Performance on MedMNIST3D.

Methods           OrganMNIST3D     NoduleMNIST3D    FractureMNIST3D  AdrenalMNIST3D   VesselMNIST3D    SynapseMNIST3D
                  AUC    ACC       AUC    ACC       AUC    ACC       AUC    ACC       AUC    ACC       AUC    ACC
ResNet-18 + 2.5D  0.977  0.788     0.885  0.903     0.587  0.451     0.718  0.772     0.748  0.846     0.634  0.696
ResNet-18 + 3D    0.996  0.907     0.915  0.908     0.712  0.508     0.827  0.721     0.874  0.877     0.820  0.745
ResNet-18 + ACS   0.994  0.900     0.888  0.910     0.714  0.497     0.839  0.754     0.930  0.928     0.705  0.722
ResNet-50 + 2.5D  0.974  0.769     0.861  0.911     0.552  0.397     0.732  0.763     0.751  0.877     0.669  0.735
ResNet-50 + 3D    0.994  0.883     0.902  0.910     0.725  0.494     0.828  0.745     0.907  0.918     0.851  0.795
ResNet-50 + ACS   0.994  0.889     0.924  0.906     0.750  0.517     0.828  0.758     0.912  0.858     0.719  0.709
auto-sklearn      0.977  0.814     0.872  0.926     0.628  0.453     0.828  0.802     0.910  0.915     0.631  0.730
AutoKeras         0.979  0.804     0.847  0.902     0.642  0.458     0.804  0.705     0.773  0.894     0.538  0.724
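
For readers unfamiliar with the two metrics reported in the tables, the following is a minimal sketch of accuracy (ACC) and area under the ROC curve (AUC) for the binary case. It is not the evaluation code behind the numbers above, and this page does not describe how multi-class or multi-label scores were aggregated. The function names and the toy labels and scores are assumptions chosen for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not taken from any MedMNIST subset).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(binary_auc(y_true, scores))  # 0.75
print(accuracy(y_true, y_pred))    # 0.75
```

AUC depends only on how the scores rank the samples, whereas ACC depends on the decision threshold, which is one reason a method can lead on one metric and trail on the other.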

Citation and Licenses

If you find this project useful, please cite both the v1 and v2 papers:

  Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. "MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification". arXiv preprint arXiv:2008.#TODO, 2021.

  Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis". IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021.

or using BibTeX:

@article{medmnistv2,
    title={MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification},
    author={Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing},
    journal={arXiv preprint arXiv:2008.#TODO},
    year={2021}
}

@inproceedings{medmnistv1,
    title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
    author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
    booktitle={IEEE 18th International Symposium on Biomedical Imaging (ISBI)},
    pages={191--195},
    year={2021}
}

Each subset keeps the same license as its source dataset. If you use any subset of MedMNIST, please also cite the corresponding paper for the source data.


Copyright © {{site.year}} {{site.copyright}}

This website is hosted on GitHub. Updated on {{site.lastUpdated}}.