Materials Project Formation Energy Distribution

The Materials Project (MP) formation energies show a curious bimodal distribution. Since these energies are the result of a carefully fitted correction scheme (see pymatgen.entries.compatibility.MaterialsProject2020Compatibility) that takes into account only the composition of a compound, let's look at which elements dominate the upper and lower modes.

If you are running this notebook on Google Colab, install the Python 3.7-compatible release:
!pip install pymatviz==0.5.1

import pandas as pd
import plotly.express as px
import plotly.io as pio
from pymatgen.ext.matproj import MPRester

from pymatviz import count_elements, ptable_heatmap_plotly


__author__ = "Janosh Riebesell"
__date__ = "2022-08-11"

pio.templates.default = "plotly_white"
# Interactive plotly figures don't show up on GitHub.
# https://github.com/plotly/plotly.py/issues/931
# change renderer from "png" to "notebook" to get hover tooltips back
# (but blank plots on GitHub!)
pio.renderers.default = "png"

Check whether the full MP database has a bimodal formation energy distribution. Short answer: yes, it does.

PMG_MAPI_KEY = "your Materials Project API key"

e_form_all_mp = MPRester(PMG_MAPI_KEY).query(
    {}, ["material_id", "formula", "formation_energy_per_atom"]
)

df_e_form_all_mp = pd.DataFrame(e_form_all_mp).set_index("material_id")

# cache MP data
# %store df_e_form_all_mp

# load cached MP data
%store -r df_e_form_all_mp

labels = {"formation_energy_per_atom": "Formation energy [eV/atom]"}
fig = px.histogram(
    df_e_form_all_mp,
    x="formation_energy_per_atom",
    nbins=200,
    range_x=(-5, 3),
    labels=labels,
)

e_form_valley = -1.35
fig.add_vline(e_form_valley, line=dict(color="orange", dash="dash"))
fig.add_annotation(
    text=f"{e_form_valley} eV/atom",
    x=e_form_valley - 1,
    y=0.05,
    yref="paper",
    font=dict(size=14, color="orange"),
    showarrow=False,
)
fig.update_layout(title=dict(text=f"All {len(df_e_form_all_mp):,} MP entries", x=0.5))
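
The valley position `e_form_valley` is hardcoded above. As a rough cross-check, one could estimate where such a valley lies directly from a histogram. The following is a minimal sketch, not part of the original analysis: `find_valley` is a hypothetical helper, and the input is synthetic two-Gaussian data with made-up centers and widths standing in for the MP formation energies.

```python
import random


def find_valley(values, lo=-5.0, hi=3.0, n_bins=80):
    """Return the center of the deepest dip between two histogram peaks."""
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for val in values:
        if lo <= val < hi:
            hist[min(int((val - lo) / width), n_bins - 1)] += 1

    # 5-bin moving average to suppress bin-to-bin noise
    smooth = []
    for idx in range(n_bins):
        window = hist[max(0, idx - 2) : idx + 3]
        smooth.append(sum(window) / len(window))

    # tallest smoothed bin strictly to the left / right of each position
    left_max, right_max = [0.0] * n_bins, [0.0] * n_bins
    for idx in range(1, n_bins):
        left_max[idx] = max(left_max[idx - 1], smooth[idx - 1])
    for idx in range(n_bins - 2, -1, -1):
        right_max[idx] = max(right_max[idx + 1], smooth[idx + 1])

    # depth of a bin = how far it sits below the lower of its flanking peaks
    depths = [min(left_max[i], right_max[i]) - smooth[i] for i in range(n_bins)]
    deepest = max(range(n_bins), key=depths.__getitem__)
    return lo + (deepest + 0.5) * width


# Synthetic stand-in for the formation energies: two Gaussian modes.
# Centers and widths are made up for illustration, not fitted to MP data.
rng = random.Random(0)
synthetic = [rng.gauss(-2.5, 0.4) for _ in range(5000)]
synthetic += [rng.gauss(-0.4, 0.5) for _ in range(5000)]

valley = find_valley(synthetic)
print(f"estimated valley at {valley:.2f} eV/atom")
```

With the real data, `find_valley(df_e_form_all_mp.formation_energy_per_atom)` would be the analogous call. The result depends on the bin count and smoothing window, so treat it as a sanity check on the chosen threshold rather than a precise value.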


First, let's plot elemental prevalence as a heatmap over the periodic table for entries above and below e_form_valley.

for comparator, label in ((">", "higher"), ("<=", "lower")):
    fig = ptable_heatmap_plotly(
        df_e_form_all_mp.query(
            f"formation_energy_per_atom {comparator} {e_form_valley}"
        ).formula
    )
    title = f"{label.title()} mode {comparator} {e_form_valley} eV/atom"
    fig.update_layout(title=dict(text=title, x=0.4, y=0.94))
    fig.show()

It looks like the lower mode is mostly oxides, whereas the higher mode is more diverse, also containing many nitrides, sulfides and selenides.
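
To put a number on that impression, one could compute the share of entries in each mode that contain a given anion. The sketch below is illustrative only: `contains_element` and `element_share` are hypothetical helpers (not pymatviz functions), the formula matching is a simple regex rather than a full composition parser, and the toy entries use round placeholder energies, not MP values.

```python
import re


def contains_element(formula, symbol):
    """True if `symbol` appears as a whole element symbol in `formula`."""
    # The negative lookahead stops "O" from matching inside "Os", "S" inside "Se"
    return re.search(rf"{symbol}(?![a-z])", formula) is not None


def element_share(formulas, symbol):
    """Fraction of formulas that contain the given element."""
    formulas = list(formulas)
    return sum(contains_element(f, symbol) for f in formulas) / len(formulas)


# Hypothetical toy entries as (formula, formation energy in eV/atom).
# The energies are round placeholders, not values from Materials Project.
toy_entries = [
    ("Al2O3", -3.0),
    ("MgO", -3.0),
    ("TiO2", -3.0),
    ("LiF", -3.0),
    ("GaN", -0.5),
    ("ZnS", -0.5),
    ("CdSe", -0.5),
    ("CoSi", -0.5),
]
e_form_valley = -1.35  # same threshold as in the notebook

lower = [formula for formula, e_form in toy_entries if e_form <= e_form_valley]
higher = [formula for formula, e_form in toy_entries if e_form > e_form_valley]

for symbol in ("O", "N", "S", "Se"):
    print(
        f"{symbol:>2}: lower mode {element_share(lower, symbol):.0%}, "
        f"higher mode {element_share(higher, symbol):.0%}"
    )
```

Assuming the `formula` column holds plain formula strings, the same helper could be applied to the two `df_e_form_all_mp.query(...)` selections used for the heatmaps.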

Another way to visualize this is with bar charts of the 20 most common elements in each mode.

for comparator, label in ((">", "higher"), ("<=", "lower")):
    elem_counts = count_elements(
        df_e_form_all_mp.query(
            f"formation_energy_per_atom {comparator} {e_form_valley}"
        ).formula
    )
    fig = px.bar(elem_counts.nlargest(20))
    title = f"{label.title()} mode {comparator} {e_form_valley} eV/atom"
    fig.update_layout(title=dict(text=title, x=0.5))
    fig.show()

Note that this significant lowering of oxide formation energies relative to other anions might be, at least in part, an artifact of too little experimental data outside oxide systems. In other words, perhaps nitrides, selenides, etc. should receive stronger corrections as well, but MP does not apply them because there is insufficient experimental data to fit a robust correction scheme for those systems.