Leaderboard

Matbench is an automated leaderboard for benchmarking state-of-the-art ML algorithms that predict a diverse range of properties of solid materials. It is hosted and maintained by the Materials Project.

	Materials Properties	Materials Discovery
Task Submissions	`180`	`9`
Algorithms	`28`	`9`
Benchmark Task Suite	`1`	`1`

Leaderboard-Property: General Purpose Algorithms on `matbench_v0.1`

Find more information about this benchmark on the benchmark info page.

Task name	Samples	Algorithm	Verified MAE (unit) or ROCAUC	Notes
matbench_steels	312	MODNet (v0.1.12)	87.7627 (MPa)
matbench_jdft2d	636	MODNet (v0.1.12)	33.1918 (meV/atom)
matbench_phonons	1,265	MegNet (kgcnn v2.1.0)	28.7606 (cm^-1)	structure required
matbench_expt_gap	4,604	MODNet (v0.1.12)	0.3327 (eV)
matbench_dielectric	4,764	MODNet (v0.1.12)	0.2711 (unitless)
matbench_expt_is_metal	4,921	AMMExpress v2020	0.9209
matbench_glass	5,680	MODNet (v0.1.12)	0.9603
matbench_log_gvrh	10,987	coNGN	0.0670 (log10(GPa))	structure required
matbench_log_kvrh	10,987	coNGN	0.0491 (log10(GPa))	structure required
matbench_perovskites	18,928	coGN	0.0269 (eV/unit cell)	structure required
matbench_mp_gap	106,113	coGN	0.1559 (eV)	structure required
matbench_mp_is_metal	106,113	CGCNN v2019	0.9520	structure required
matbench_mp_e_form	132,752	coGN	0.0170 (eV/atom)	structure required

For regression tasks, the scaled errors shown on the leaderboard plot are the ratio of mean absolute error (MAE) to mean absolute deviation (MAD):

$$ \text{Scaled Error} = \frac{\text{MAE}}{\text{MAD}} = \frac{\sum_{i=1}^{N} | y_i - y_i^{\text{pred}} |}{\sum_{i=1}^{N} | y_i - \bar{y} | } $$

Scaled errors for classification tasks are assessed as:

$$ \text{Scaled Error} = \frac{1 - \text{ROCAUC}}{0.5} $$
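
The two scaled-error formulas can be sketched in a few lines of Python. This is a minimal illustration rather than the leaderboard's own code; the function names `scaled_error_regression` and `scaled_error_classification` are hypothetical, and the toy values are invented for the example. Under both definitions, a trivial predictor (always predicting the mean, or a ROCAUC of 0.5) gets a scaled error of 1.

```python
from statistics import fmean


def scaled_error_regression(y_true, y_pred):
    """MAE divided by MAD (mean absolute deviation of the true values)."""
    y_mean = fmean(y_true)
    mae = fmean([abs(t - p) for t, p in zip(y_true, y_pred)])
    mad = fmean([abs(t - y_mean) for t in y_true])
    return mae / mad


def scaled_error_classification(rocauc):
    """(1 - ROCAUC) / 0.5, so a random classifier (ROCAUC = 0.5) scores 1."""
    return (1.0 - rocauc) / 0.5


# Toy values, invented for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(scaled_error_regression(y_true, y_pred))      # 0.25
print(scaled_error_regression(y_true, [2.5] * 4))   # 1.0 (always predicts the mean)
print(scaled_error_classification(0.5))             # 1.0 (random classifier)
```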

Leaderboard-Discovery: General Purpose Algorithms on `matbench_discovery 0.1.0`

Matbench Discovery is an interactive leaderboard and associated PyPI package that together make it easy to benchmark ML energy models on a task designed to closely simulate a high-throughput discovery campaign for new stable inorganic crystals. It compares ML structure-relaxation methods on the WBM dataset by ranking ~250k generated structures, about 42k of which are stable, according to predicted hull stability. Matbench Discovery is developed by Janosh Riebesell.

model	F1	DAF	Precision	TPR	TNR	Accuracy	MAE	RMSE	R²
CHGNet	0.59	3.06	0.52	0.67	0.87	0.84	0.07	0.11	0.61
M3GNet	0.58	2.66	0.45	0.79	0.80	0.80	0.07	0.12	0.59
MEGNet	0.52	2.70	0.46	0.59	0.86	0.81	0.13	0.20	-0.27
CGCNN	0.52	2.62	0.45	0.60	0.85	0.81	0.14	0.23	-0.61
CGCNN+P	0.51	2.38	0.41	0.69	0.79	0.78	0.11	0.18	0.02
Wrenformer	0.48	2.13	0.36	0.71	0.74	0.74	0.10	0.18	-0.04
BOWSR + MEGNet	0.44	1.90	0.32	0.74	0.67	0.68	0.11	0.16	0.15
Voronoi RF	0.34	1.51	0.26	0.52	0.69	0.66	0.14	0.21	-0.32
dummy	0.19	1.01	0.17	0.23	0.77	0.68	0.12	0.18	0.00
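
The classification columns in the table above all derive from a stable/unstable confusion matrix. The sketch below shows one way to compute them. It assumes that DAF stands for the discovery acceleration factor, taken here as precision divided by the fraction of truly stable structures in the test set; that definition is not stated in the text above. The function name `discovery_metrics` and the counts in the example are hypothetical.

```python
def discovery_metrics(tp, fp, tn, fn):
    """Classification metrics from confusion-matrix counts.

    tp/fn: stable structures predicted stable/unstable;
    fp/tn: unstable structures predicted stable/unstable.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    tpr = tp / (tp + fn)          # true positive rate (recall)
    tnr = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / total
    f1 = 2 * precision * tpr / (precision + tpr)
    prevalence = (tp + fn) / total  # fraction of structures that are stable
    daf = precision / prevalence    # assumed definition of DAF
    return {
        "F1": f1,
        "DAF": daf,
        "Precision": precision,
        "TPR": tpr,
        "TNR": tnr,
        "Accuracy": accuracy,
    }


# Invented counts for illustration; not taken from any model in the table.
for name, value in discovery_metrics(tp=30, fp=20, tn=140, fn=10).items():
    print(f"{name}: {value:.2f}")
```

Under this definition, a model that selects candidates at random has a precision equal to the prevalence and therefore a DAF of about 1, which matches the `dummy` row.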

Overview

Matbench is an ImageNet for materials science: a curated set of 13 supervised, pre-cleaned, ready-to-use ML tasks for benchmarking and fair comparison. The tasks span a wide range of inorganic materials science applications, including electronic, thermodynamic, mechanical, and thermal properties of crystals, 2D materials, disordered metals, and more.

The Matbench Python package provides everything needed to use Matbench with your ML algorithm in about 10 lines of code. The web pages and online repository contain full result files, citations, methodologies, and code for the algorithms shown.

What can Matbench offer?

This website

Leaderboard of results for state-of-the-art materials ML algorithms on standardized test problems
Interactively explore and download the tasks on MPContribs-ML, a platform hosted by The Materials Project. See Benchmark Info for links to each dataset.
Every result is backed by a peer-reviewed publication and/or a Jupyter notebook (similar to Papers With Code) that shows how the results were obtained.
Glossary of all algorithms' results on the Matbench problems

The Matbench Python package

Probe ML algorithms' strengths and weaknesses across a wide range of materials property prediction tasks
Run a full benchmark in ~10 lines of code
Submit results as a PR to the Matbench repo to compare with other algorithms and appear on the leaderboard
Benchmark both general-purpose ML models and algorithms specialized for particular domains

Summary of Matbench's Tasks

Matbench's 13 tasks can be broken down into several categories. They include both small datasets (fewer than 10,000 samples), which are typical of experimental materials data, and larger datasets generated by computational methods such as density functional theory (DFT).

Each task in Matbench consists of three things:

A set of inputs: crystal structures or chemical compositions.
A set of outputs: target properties, such as formation energy.
A test procedure: a way to get a score for your algorithm.

The Matbench Python package provides functions for getting the first two (packaged together for each task as a dataset) as well as running the test procedure. See the How to use documentation page to get started.
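
As a rough sketch of how the three pieces fit together, the loop below loads a task, trains on each fold's training data, and records predictions on the held-out inputs. It assumes the `matbench` package is installed and that its `MatbenchBenchmark` interface (`tasks`, `folds`, `get_train_and_val_data`, `get_test_data`, `record`, `to_file`) behaves as in the package's documentation; check the How to use page for the version you have. `MeanBaseline`, `run_benchmark`, and the output file name are hypothetical and exist only for this illustration.

```python
from statistics import fmean


class MeanBaseline:
    """Hypothetical placeholder model: always predicts the training mean.

    Only meaningful for regression tasks; swap in your own algorithm.
    """

    def fit(self, inputs, targets):
        self.mean_ = fmean(targets)
        return self

    def predict(self, inputs):
        return [self.mean_ for _ in inputs]


def run_benchmark(model, task_names, out_path="my_benchmark.json.gz"):
    """Run `model` on the named Matbench tasks and save the results."""
    # Imported here so the baseline above can be used without matbench installed.
    from matbench.bench import MatbenchBenchmark

    mb = MatbenchBenchmark(autoload=False, subset=task_names)
    for task in mb.tasks:
        task.load()  # fetches the dataset (inputs and outputs) for this task
        for fold in task.folds:
            # Inputs and outputs for training and validation on this fold.
            train_inputs, train_targets = task.get_train_and_val_data(fold)
            model.fit(train_inputs, train_targets)
            # Test inputs only; the targets stay hidden for scoring.
            test_inputs = task.get_test_data(fold, include_target=False)
            task.record(fold, model.predict(test_inputs))
    mb.to_file(out_path)
    return mb
```

For example, `run_benchmark(MeanBaseline(), ["matbench_steels"])` would run the baseline on the smallest task in the leaderboard table. Running it requires network access to download the dataset.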

Citing Matbench

You can find details and results on the benchmark in our paper Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm. Please consider citing this paper if you use Matbench v0.1 for benchmarking, comparison, or prototyping.

You can cite Matbench using this reference:

Dunn, A., Wang, Q., Ganose, A., Dopp, D., Jain, A. 
Benchmarking Materials Property Prediction Methods: 
The Matbench Test Set and Automatminer Reference Algorithm. 
npj Computational Materials 6, 138 (2020). 
https://doi.org/10.1038/s41524-020-00406-3