
Matbench Perovskite Dataset

Exploratory Data Analysis (EDA). MPContribs link

# matminer needed for loading data
!pip install pymatviz matminer
import pandas as pd
import plotly.express as px
import plotly.io as pio
from matbench_discovery.structure.prototype import get_protostructure_label
from matminer.datasets import load_dataset
from tqdm import tqdm

import pymatviz as pmv
from pymatviz.enums import Key


pmv.set_plotly_template("pymatviz_white")


__author__ = "Janosh Riebesell"
__date__ = "2022-03-19"


# make plotly figures render both locally and on GitHub.
# https://github.com/plotly/plotly.py/issues/931#issuecomment-2098209279
pio.renderers.default = "png"
df_perov = load_dataset("matbench_perovskites")

moyo_spg_num_key = "moyopy_spg_num"
df_perov[moyo_spg_num_key] = [
    struct.get_symmetry_dataset(backend="moyopy", return_raw_dataset=True).number
    for struct in tqdm(df_perov[Key.structure])
]
df_perov[Key.volume] = df_perov[Key.structure].map(lambda struct: struct.volume)

df_perov[Key.formula] = df_perov[Key.structure].map(lambda cryst: cryst.formula)

df_perov[Key.crystal_system] = df_perov[moyo_spg_num_key].map(
    pmv.utils.spg_to_crystal_sys
)
100%|██████████| 18928/18928 [00:03<00:00, 6209.99it/s]
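The crystal-system column above is derived from space group numbers. As a minimal standalone sketch of that mapping (not the actual implementation of pmv.utils.spg_to_crystal_sys, which is not shown here), the standard ranges of international space group numbers can be encoded as follows. The function name crystal_system_from_spg is hypothetical.

```python
def crystal_system_from_spg(spg_num: int) -> str:
    """Map an international space group number (1-230) to its crystal system."""
    if not 1 <= spg_num <= 230:
        raise ValueError(f"space group number must be in 1-230, got {spg_num}")
    # inclusive upper bound of each crystal system's space group range
    upper_bounds = [
        (2, "triclinic"),
        (15, "monoclinic"),
        (74, "orthorhombic"),
        (142, "tetragonal"),
        (167, "trigonal"),
        (194, "hexagonal"),
        (230, "cubic"),
    ]
    for upper, system in upper_bounds:
        if spg_num <= upper:
            return system
    raise AssertionError("unreachable: all valid numbers are covered above")


# the ideal cubic perovskite structure has space group 221 (Pm-3m)
print(crystal_system_from_spg(221))
```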
fig = pmv.structure_3d(df_perov[Key.structure].iloc[:12])
fig.layout.paper_bgcolor = "rgba(255, 255, 255, 0.5)"
fig.show()
labels = {"e_form": "Formation Energy (eV/atom)"}

fig = px.histogram(df_perov, x="e_form", nbins=300, labels=labels)

title = "Matbench Perovskites Formation Energy Distribution"
fig.layout.title.update(text=title, x=0.5)
fig.layout.margin.update(b=10, l=10, r=10, t=40)

# fillcolor has no visible effect on a line shape, so set the line color instead
fig.add_vline(x=0, line=dict(color="black", width=2, dash="dot"))
fig = pmv.ptable_heatmap_plotly(
    df_perov[Key.formula], exclude_elements=["O"], heat_mode="percent"
)
title = "<b>Elements in Matbench Perovskites dataset</b>"
fig.layout.title.update(text=title, x=0.36, y=0.9)
fig.show()
fig = px.bar(df_perov[Key.crystal_system].value_counts())
fig.layout.title.update(text="Crystal systems in Matbench Perovskites", x=0.5)
fig.layout.update(showlegend=False, margin_t=50)
fig.show()
fig = px.scatter(df_perov, x="volume", y="e_form", color=moyo_spg_num_key)
fig.layout.title = dict(text="Matbench Perovskites Formation Energy vs. Volume", x=0.5)
fig.layout.coloraxis.colorbar.update(
    orientation="h", y=0, x=1, xanchor="right", thickness=10, len=0.6
)
fig.layout.margin.update(b=10, l=10, r=10, t=40)
fig.show()
fig = pmv.spacegroup_sunburst(df_perov[moyo_spg_num_key], show_counts="percent")
fig.layout.title.update(text="Matbench Perovskites spacegroup sunburst", x=0.5)
fig.layout.margin.update(b=0, l=0, r=0, t=40)
fig.show()
df_perov[Key.protostructure_moyo] = df_perov[Key.structure].map(
    get_protostructure_label
)
df_perov[Key.protostructure_moyo].value_counts()
# originally generated with aviary calling out to Aflow CLI, takes ~6h when running
# uninterrupted. see https://github.com/CompRhys/aviary/blob/14b2ab204ec/aviary/wren/utils.py#L158
aflow_protostructure_key = "aflow_wyckoff"
df_perov[f"{Key.protostructure}_aflow"] = pd.read_csv(
    # 2022-05-17-matbench_perovskites_aflow_labels.csv
    "https://docs.google.com/spreadsheets/d/"
    "1Mhk5t3Ac_aHOTWMjZ1DL4LtUBIB21nWt7oy2t3M-fQU/export?format=csv"
)[aflow_protostructure_key]
# uncomment line to cache expensive aflow results
# %store df_perov

# uncomment line to reload cached dataframe
# %store -r df_perov

Extract spacegroups and crystal systems from the Aflow Wyckoff labels.

aflow_spg_num_key = "aflow_spg_num"
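# the Aflow labels were stored above under f"{Key.protostructure}_aflow";
# the 3rd underscore-separated field of each label is the space group number
df_perov[aflow_spg_num_key] = (
    df_perov[f"{Key.protostructure}_aflow"].str.split("_").str[2].astype(int)
)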
df_perov["aflow_crys_sys"] = df_perov[aflow_spg_num_key].map(
    pmv.utils.spg_to_crystal_sys
)
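To make the string handling above concrete, here is a small sketch that splits a single Aflow-style label into named parts. The assumed field order (anonymous formula, Pearson symbol, space group number, Wyckoff letters, then a colon and the chemical system), the helper name parse_aflow_label, and the sample label are illustrative assumptions, not values taken from the dataset.

```python
def parse_aflow_label(label: str) -> dict[str, object]:
    """Split an Aflow-style prototype label into its underscore-separated parts."""
    # everything after the first colon is treated as the chemical system
    prototype, _, chemsys = label.partition(":")
    parts = prototype.split("_")
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 underscore-separated fields: {label!r}")
    return {
        "anonymous_formula": parts[0],
        "pearson_symbol": parts[1],
        "spg_num": int(parts[2]),  # same field as .str.split("_").str[2] above
        "wyckoff_letters": parts[3:],
        "chemsys": chemsys,
    }


# hypothetical label, made up for illustration
sample = parse_aflow_label("ABC3_cP5_221_a_b_c:Ba-O-Ti")
print(sample["spg_num"])
```

Converting the third field with int() mirrors the .astype(int) call in the cell above and fails loudly if a label does not follow the assumed format.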

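Surprisingly large disparity between Spglib and Aflow spacegroups

Spglib is fast, while Aflow uses a slower, adaptive, but presumably more correct algorithm. Note that the moyopy_spg_num column compared below was computed with the moyopy backend of get_symmetry_dataset, not with Spglib directly.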
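One way to put a number on this disparity is to count how often the two sets of space group numbers agree and which mismatched pairs occur most often. The following is a minimal sketch using only the standard library; the helper name spacegroup_agreement and the toy inputs are hypothetical.

```python
from collections import Counter


def spacegroup_agreement(spgs_a, spgs_b, top_n=3):
    """Return the fraction of matching entries and the most common mismatches."""
    spgs_a, spgs_b = list(spgs_a), list(spgs_b)
    if len(spgs_a) != len(spgs_b):
        raise ValueError("both sequences must have the same length")
    if not spgs_a:
        raise ValueError("sequences must not be empty")
    # count (a, b) pairs where the two methods disagree
    mismatches = Counter((a, b) for a, b in zip(spgs_a, spgs_b) if a != b)
    n_mismatched = sum(mismatches.values())
    frac_equal = 1 - n_mismatched / len(spgs_a)
    return frac_equal, mismatches.most_common(top_n)


# toy data, not taken from the dataset
frac_equal, top_mismatches = spacegroup_agreement([221, 221, 62, 1], [221, 62, 62, 2])
print(f"{frac_equal:.0%} agreement, most common mismatches: {top_mismatches}")
```

With the dataframe from this notebook, the call would be spacegroup_agreement(df_perov[moyo_spg_num_key], df_perov[aflow_spg_num_key]).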

fig = pmv.sankey_from_2_df_cols(df_perov, [moyo_spg_num_key, aflow_spg_num_key])
title = "Spglib vs Aflow Spacegroups<br>for the Matbench Perovskites dataset"
fig.layout.title = dict(text=title, x=0.5)
fig.show()

# pmv.io.save_and_compress_svg(fig, "sankey-spglib-vs-aflow-spacegroups")
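# no "spglib_crys_sys" column is created in this notebook; the crystal systems
# derived from the moyopy space groups are stored under Key.crystal_system
fig = pmv.sankey_from_2_df_cols(df_perov, [Key.crystal_system, "aflow_crys_sys"])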
title = "Spglib vs Aflow Crystal systems<br>for the Matbench Perovskites dataset"
fig.layout.title = dict(text=title, x=0.5)
fig.show()