Thursday, May 15, 2025

Computational Chemistry Unlocked: A Record-Breaking Dataset to Train AI Models Has Launched

A leap forward in AI models

Scientists around the world can now begin training their own MLIPs on OMol25. They can also use the FAIR lab’s open-access universal model, also released today. The universal model was trained on OMol25 and FAIR lab’s other open-source datasets – which they have been releasing since 2020 – and is designed to work “out of the box” for many applications. However, the universal model and any other MLIPs trained with the dataset are expected to improve over time, as researchers learn how to best leverage the vast amount of data at their fingertips.

To measure and track model performance, the collaboration has provided evaluations: sets of challenges that measure how accurately a model completes useful tasks. The team strove to develop exceptionally thorough evaluations to give fellow researchers more confidence in the capabilities of MLIPs trained on the dataset. “Once you get to chemistry like atomic bonds breaking and reforming and molecules with variable charges and spins, researchers are going to be rightfully skeptical of any ML tool,” said Blau, who also played a large role in this component of the project.

Evaluations also drive innovation through friendly competition, as the results are ranked publicly. Potential users can see which models run smoothly, and developers can see how their model stacks up against others.
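For readers curious how a public ranking of this kind could work in principle, here is a minimal sketch in Python. It is not the OMol25 evaluation code: the metric (mean absolute error against reference energies), the function names, the model names, and the numbers are all assumptions made up for illustration.

```python
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def rank_models(predictions_by_model, reference):
    """Return (model name, error) pairs sorted from lowest to highest error."""
    scores = {
        name: mean_absolute_error(predicted, reference)
        for name, predicted in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical reference energies (arbitrary units) and two hypothetical models.
reference_energies = [-1.00, -2.50, -0.75]
predictions = {
    "model_a": [-1.10, -2.40, -0.70],
    "model_b": [-1.01, -2.52, -0.74],
}

leaderboard = rank_models(predictions, reference_energies)
for position, (name, error) in enumerate(leaderboard, start=1):
    print(f"{position}. {name}: mean absolute error = {error:.4f}")
```

In this toy example the model with the smaller average error is listed first. A real evaluation suite would likely cover many tasks and metrics rather than a single error score.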

“Better benchmarks and evaluations have been essential for progress and advancing many fields of ML,” added OMol25 team member Aditi Krishnapriyan, a faculty scientist in Berkeley Lab’s Applied Mathematics and Computational Research Division, and assistant professor of Chemical and Biomolecular Engineering and Electrical Engineering and Computer Sciences at UC Berkeley. Krishnapriyan assisted with the evaluations and helped develop a subset of the chemical simulations.

“Trust is especially critical here because scientists need to rely on these models to produce physically sound results that translate to and can be used for scientific research,” said Krishnapriyan.

By the community, for the community

OMol25 was created by scientists to fill an unmet need for their community, and the ethos of collaboration is woven throughout all aspects of the project. To curate the content in OMol25, the team started with past datasets made by others, as these represent molecular configurations and reactions that are important to researchers in different chemistry specialties. Then they performed more sophisticated simulations on these snapshots using their advanced DFT capabilities. Next, they looked to see what major types of chemistry had not been captured previously, and tried to fill the gaps.

Three-quarters of the dataset is composed of this new content, divided into three major focus areas: biomolecules, electrolytes, and metal complexes (molecules arranged around a central metal ion). There is still a need for snapshots involving polymers – large molecules made of repeating units called monomers. This will be addressed by the upcoming Open Polymer data, a complementary project that also includes collaborators from Lawrence Livermore National Laboratory.

The OMol25 team itself was brought together by the branching connections of the STEM community that span academia and industry. Blau and co-leader Brandon Wood, a research scientist in FAIR, met while working in the lab of Kristin Persson, a Berkeley Lab and UC Berkeley researcher who leads the Materials Project. Wood, Blau, and Larry Zitnick, the FAIR chemistry research director, joined forces on the OMol25 project in Fall 2023. Together, they recruited scientists they admired from UC Berkeley, Carnegie Mellon, New York University, Princeton University, Stanford University, the University of Cambridge, Los Alamos National Laboratory, and Genentech.

“This open dataset is the result of a fantastic team effort, and we can’t wait to see how the community leverages it to explore new directions in AI modeling,” said Wood. 

“It was really exciting to come together to push forward the capabilities available to humanity,” added Blau. 

Blau’s work on OMol25 was funded by Berkeley Lab’s Laboratory Directed Research and Development (LDRD) program. His contributions to the electrolyte modeling portion of the dataset were funded by the Energy Storage Research Alliance, a battery research initiative of the DOE Office of Science. Krishnapriyan’s work was funded by the DOE Office of Science, as part of the Center for Ionomer-based Water Electrolysis.  

# # #

Lawrence Berkeley National Laboratory (Berkeley Lab) is committed to groundbreaking research focused on discovery science and solutions for abundant and reliable energy supplies. The lab’s expertise spans materials, chemistry, physics, biology, earth and environmental science, mathematics, and computing. Researchers from around the world rely on the lab’s world-class scientific facilities for their own pioneering research. Founded in 1931 on the belief that the biggest problems are best addressed by teams, Berkeley Lab and its scientists have been recognized with 16 Nobel Prizes. Berkeley Lab is a multiprogram national laboratory managed by the University of California for the U.S. Department of Energy’s Office of Science.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.
