New InterCat paper - Luuk Kempen, Raffaele Cheula and Mie Andersen

27. maj 2026 af Karin Vittrup

How accurate are foundational machine learning interatomic potentials for heterogeneous catalysis?

This is the question we answer in our new paper, now published in JCP. The short answer: it depends. (Read the paper for the long answer!)

Common benchmarks for foundational MLIPs test these models on ordered, crystalline bulk materials, which are far from the messy reality of catalytic surfaces. So we investigated zero-shot performance of 80 different foundational MLIPs on catalysis-relevant tasks—adsorption energies, reaction barriers, vibrational properties—across metals, oxides, and metal–oxide interfaces, totaling over 770,000 energy evaluations.

Some highlights:
Current-generation MLIPs are more than accurate enough for initial screening;
Many MLIPs fail catastrophically on magnetic materials (Co, Ni), which is linked to their training data sets;
eSEN, Orb, and UMA are among the best-performing models, but no single model universally performs best, so model selection really matters!

See paper