

Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms that predict a diverse range of properties of solid materials. It is hosted and maintained by the Materials Project.


  • 134 total task submissions
  • 19 algorithms
  • 1 benchmark test suite


Leaderboard: General Purpose Algorithms on matbench_v0.1

More information about this benchmark is available on the Benchmark Info page.

| Task name | Samples | Algorithm | Verified MAE (unit) or ROC-AUC | Notes |
|---|---|---|---|---|
| matbench_steels | 312 | MODNet (v0.1.12) | 87.7627 (MPa) | |
| matbench_jdft2d | 636 | MODNet (v0.1.12) | 33.1918 (meV/atom) | |
| matbench_phonons | 1,265 | MegNet (kgcnn v2.1.0) | 28.7606 (cm^-1) | structure required |
| matbench_expt_gap | 4,604 | MODNet (v0.1.12) | 0.3327 (eV) | |
| matbench_dielectric | 4,764 | MODNet (v0.1.12) | 0.2711 (unitless) | |
| matbench_expt_is_metal | 4,921 | AMMExpress v2020 | 0.9209 | |
| matbench_glass | 5,680 | MODNet (v0.1.12) | 0.9603 | |
| matbench_log_gvrh | 10,987 | ALIGNN | 0.0715 (log10(GPa)) | structure required |
| matbench_log_kvrh | 10,987 | MODNet (v0.1.10) | 0.0548 (log10(GPa)) | |
| matbench_perovskites | 18,928 | ALIGNN | 0.0288 (eV/unit cell) | structure required |
| matbench_mp_gap | 106,113 | ALIGNN | 0.1861 (eV) | structure required |
| matbench_mp_is_metal | 106,113 | CGCNN v2019 | 0.9520 | structure required |
| matbench_mp_e_form | 132,752 | ALIGNN | 0.0215 (eV/atom) | structure required |
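The table reports MAE (in each task's units) for regression tasks and ROC-AUC (unitless) for the three classification tasks: matbench_expt_is_metal, matbench_glass, and matbench_mp_is_metal. The following is a minimal pure-Python sketch of both metrics for illustration only; it is not the scoring code used by the Matbench package, and the function names are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as half a win). O(n_pos * n_neg), fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: regression error and a binary "is metal" style classifier score.
print(mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form of ROC-AUC is easy to read but quadratic in the number of samples; for datasets of ~100k entries, a rank-based implementation such as scikit-learn's `roc_auc_score` is the practical choice.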

On the leaderboard plot, scaled errors for regression tasks are computed as the ratio of the mean absolute error (MAE) to the mean absolute deviation (MAD) of the true values:

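$$ \text{Scaled Error} = \frac{\text{MAE}}{\text{MAD}} = \frac{\sum_{i=1}^{N} \lvert y_i - y_i^{\text{pred}} \rvert}{\sum_{i=1}^{N} \lvert y_i - \bar{y} \rvert} $$

where $\bar{y}$ is the mean of the $N$ true values $y_i$. The $1/N$ factors in the definitions of MAE and MAD cancel, leaving a ratio of sums.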
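The formula translates directly into a few lines of Python. This is an illustrative sketch with a hypothetical function name, not the leaderboard's own plotting code:

```python
from statistics import mean


def scaled_error(y_true, y_pred):
    """Scaled error = MAE / MAD, where MAD is the mean absolute deviation
    of the true values from their own mean."""
    y_bar = mean(y_true)
    mae = mean([abs(y - p) for y, p in zip(y_true, y_pred)])
    mad = mean([abs(y - y_bar) for y in y_true])
    if mad == 0:
        # All targets identical: the ratio is undefined.
        raise ValueError("MAD is zero; scaled error is undefined")
    return mae / mad


y_true = [1.0, 2.0, 3.0]
print(scaled_error(y_true, [1.5, 2.0, 2.0]))  # a model's predictions
print(scaled_error(y_true, [2.0, 2.0, 2.0]))  # always predicts the mean
```

A model that always predicts $\bar{y}$ has MAE equal to MAD and therefore a scaled error of exactly 1, so values below 1 indicate an improvement over that trivial baseline. Because the ratio is unitless, it lets tasks with very different units share one plot.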

Discovery Leaderboard: General Purpose Algorithms on matbench_discovery 0.1.0

Matbench Discovery is an interactive leaderboard and associated PyPI package that together make it easy to benchmark ML energy models on a task designed to closely simulate a high-throughput discovery campaign for new stable inorganic crystals. Matbench Discovery compares ML structure-relaxation methods on the WBM dataset by ranking ~250k generated structures according to their predicted stability relative to the convex hull (~42k of these structures are stable). Matbench Discovery is developed by Janosh Riebesell.

| Model | F1 | R² | DAF | Precision | Recall | Accuracy | TPR | FPR | TNR | FNR | MAE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Voronoi Random Forest | 0.34 | -0.32 | 1.51 | 0.26 | 0.52 | 0.66 | 0.52 | 0.31 | 0.69 | 0.48 | 0.14 | 0.21 |
| BOWSR + MEGNet | 0.44 | 0.15 | 1.90 | 0.32 | 0.74 | 0.68 | 0.74 | 0.33 | 0.67 | 0.26 | 0.11 | 0.16 |
| Wrenformer | 0.48 | -0.04 | 2.13 | 0.36 | 0.71 | 0.74 | 0.71 | 0.26 | 0.74 | 0.29 | 0.10 | 0.18 |
| MEGNet | 0.49 | -0.35 | 2.94 | 0.51 | 0.48 | 0.83 | 0.48 | 0.10 | 0.90 | 0.52 | 0.13 | 0.21 |
| CGCNN+P | 0.51 | 0.02 | 2.38 | 0.41 | 0.69 | 0.78 | 0.69 | 0.21 | 0.79 | 0.31 | 0.11 | 0.18 |
| CGCNN | 0.52 | -0.61 | 2.62 | 0.45 | 0.60 | 0.81 | 0.60 | 0.15 | 0.85 | 0.40 | 0.14 | 0.23 |
| M3GNet + MEGNet | 0.53 | 0.46 | 2.65 | 0.45 | 0.64 | 0.80 | 0.64 | 0.16 | 0.84 | 0.36 | 0.09 | 0.13 |
| M3GNet | 0.58 | 0.59 | 2.66 | 0.45 | 0.79 | 0.80 | 0.79 | 0.20 | 0.80 | 0.21 | 0.07 | 0.12 |
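The classification columns follow from labelling each structure as predicted stable or unstable and comparing against the true label. The sketch below assumes that a structure counts as stable when its energy above the convex hull is at or below 0 eV/atom, and that the discovery acceleration factor (DAF) is precision divided by the prevalence of truly stable structures. The function name and threshold parameter are illustrative assumptions, not the matbench-discovery package's API.

```python
def discovery_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Score stability predictions from true and predicted energies above
    the convex hull (assumed eV/atom). A structure is 'stable' when its
    energy above hull is <= threshold (assumed 0)."""
    tp = fp = tn = fn = 0
    for true, pred in zip(e_hull_true, e_hull_pred):
        actual_stable = true <= threshold
        predicted_stable = pred <= threshold
        if predicted_stable and actual_stable:
            tp += 1
        elif predicted_stable:
            fp += 1
        elif actual_stable:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    n = tp + fp + tn + fn
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to the true positive rate
    prevalence = ratio(tp + fn, n)  # fraction of truly stable structures
    return {
        "F1": ratio(2 * precision * recall, precision + recall),
        "DAF": ratio(precision, prevalence),
        "Precision": precision,
        "Recall": recall,
        "Accuracy": ratio(tp + tn, n),
        "TPR": recall,
        "FPR": ratio(fp, fp + tn),
        "TNR": ratio(tn, fp + tn),
        "FNR": ratio(fn, tp + fn),
    }


# Toy energies above hull (eV/atom), invented purely for illustration.
e_true = [-0.1, -0.05, 0.02, 0.3, 0.1, -0.2]
e_pred = [-0.08, 0.01, -0.01, 0.25, 0.2, -0.1]
metrics = discovery_metrics(e_true, e_pred)
print(metrics)
```

Reading DAF this way gives a rough consistency check on the table: with ~42k stable structures out of ~250k, the prevalence is about 0.17, so a precision of 0.45 corresponds to a DAF of roughly 2.7, close to the 2.66 listed for M3GNet. TPR duplicates recall, and TNR and FNR are the complements of FPR and TPR, which is why those pairs of columns sum to 1.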


Matbench is an ImageNet for materials science: a curated set of 13 supervised, pre-cleaned, ready-to-use ML tasks for benchmarking and fair comparison. The tasks span a wide domain of inorganic materials science applications, including electronic, thermodynamic, mechanical, and thermal properties of crystals, 2D materials, disordered metals, and more.

The Matbench Python package provides everything needed to use Matbench with your ML algorithm in about 10 lines of code. The website and the online repository contain full result files, citations, methodologies, and code for the algorithms shown.


What can Matbench offer?

This website

  • Leaderboard of results for state-of-the-art materials ML algorithms on standardized test problems
  • Interactively explore and download the tasks on MPContribs-ML, a platform hosted by The Materials Project. See Benchmark Info for links to each dataset.
  • Every result is backed by a peer-reviewed publication and/or a Jupyter notebook documenting how it was obtained (similar to Papers With Code)
  • Glossary of all algorithms' results on the Matbench problems

The Matbench Python package

  • Probe ML algorithms' strengths and weaknesses across a wide range of materials property prediction tasks
  • Run a full benchmark in ~10 lines of code
  • Submit results as a PR to the Matbench repo to compare with other algorithms and appear on the leaderboard
  • Benchmark both general-purpose ML models and algorithms specialized for particular domains

Summary of Matbench's Tasks

Matbench's 13 tasks can be broken down into several categories. They include both small datasets (fewer than 10,000 samples), which are typical of experimental materials data, and larger datasets from computational modelling methods such as density functional theory (DFT).
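Using the sample counts from the leaderboard table, the size split can be checked with a few lines of Python. This is only an illustrative sketch: the 10,000-sample cutoff comes from the text above, and dataset size alone does not tell you whether a task's data are experimental or computed.

```python
# Sample counts per task, copied from the leaderboard table on this page.
TASK_SIZES = {
    "matbench_steels": 312,
    "matbench_jdft2d": 636,
    "matbench_phonons": 1265,
    "matbench_expt_gap": 4604,
    "matbench_dielectric": 4764,
    "matbench_expt_is_metal": 4921,
    "matbench_glass": 5680,
    "matbench_log_gvrh": 10987,
    "matbench_log_kvrh": 10987,
    "matbench_perovskites": 18928,
    "matbench_mp_gap": 106113,
    "matbench_mp_is_metal": 106113,
    "matbench_mp_e_form": 132752,
}
SMALL_THRESHOLD = 10_000  # "small" means fewer than 10,000 samples

small = sorted(t for t, n in TASK_SIZES.items() if n < SMALL_THRESHOLD)
large = sorted(t for t, n in TASK_SIZES.items() if n >= SMALL_THRESHOLD)
print(f"{len(small)} small tasks, {len(large)} large tasks")
# prints "7 small tasks, 6 large tasks"
```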


Each task in Matbench consists of three things:

  1. A set of inputs: crystal structures or chemical compositions.
  2. A set of outputs: target properties, such as formation energy.
  3. A test procedure: a way to score your algorithm.

The Matbench Python package provides functions for getting the first two (packaged together for each task as a dataset) as well as running the test procedure. See the How to use documentation page to get started.
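To make the three pieces of a task concrete, here is a minimal, self-contained sketch in plain Python. Everything in it is hypothetical: the placeholder inputs and outputs, the `kfold_indices` and `run_task` helpers, the 5-fold split, and the mean-value baseline model are illustrative assumptions, not the Matbench package's API or its exact test procedure.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def run_task(inputs, outputs, fit, predict, k=5, seed=0):
    """Hypothetical test procedure: for each fold, train on the other k-1
    folds and record the MAE on the held-out fold."""
    scores = []
    for test_idx in kfold_indices(len(inputs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(inputs)) if i not in held_out]
        model = fit([inputs[i] for i in train_idx], [outputs[i] for i in train_idx])
        preds = predict(model, [inputs[i] for i in test_idx])
        truth = [outputs[i] for i in test_idx]
        scores.append(sum(abs(t - p) for t, p in zip(truth, preds)) / len(truth))
    return scores


# Dummy baseline "model": ignores inputs and predicts the training-set mean.
def fit_mean(train_inputs, train_outputs):
    return sum(train_outputs) / len(train_outputs)


def predict_mean(model, test_inputs):
    return [model] * len(test_inputs)


# Placeholder task data (not a real Matbench dataset).
inputs = [f"material_{i}" for i in range(10)]
outputs = [float(i) for i in range(10)]
scores = run_task(inputs, outputs, fit_mean, predict_mean)
print(scores, sum(scores) / len(scores))
```

Fixing the shuffle seed means every algorithm is scored on identical train/test splits, which is what makes results on a shared leaderboard comparable.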

Citing Matbench

You can find details and results of the benchmark in our paper "Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm". Please consider citing this paper if you use Matbench v0.1 for benchmarking, comparison, or prototyping.

You can cite Matbench using this reference:

Dunn, A., Wang, Q., Ganose, A., Dopp, D., Jain, A. 
Benchmarking Materials Property Prediction Methods: 
The Matbench Test Set and Automatminer Reference Algorithm. 
npj Computational Materials 6, 138 (2020).