AI-based search engine can help researchers find new chemical reactions in data archives


Description of the search engine pipeline. First, the engine takes as input the molecular formulas and charges of the searched ions. These can be derived from the reaction system using a hypothesis generation method (a fragment-based or a large language model (LLM)-guided approach) or defined manually (A). Then, it searches for all spectra files that contain the peaks of the two most abundant isotopologues of each input ion (B). Each peak is represented by its mass-to-charge ratio (m/z). These spectra files are called the candidates. A cosine distance threshold is calculated for them (C1). Then, an algorithm that searches for the isotopic distribution of the input formula within a single spectrum is run on all candidate mass spectra (C2). Additional machine learning (ML) models attempt to reduce the number of false positive search results (C3). Credit: Nature Communications (2025). DOI: 10.1038/s41467-025-56905-8
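To make steps B and C2 more concrete, here is a minimal Python sketch of the two basic operations the caption describes: computing an ion's m/z from its formula and charge, checking whether a spectrum contains the peaks of its two most abundant isotopologues, and scoring an observed isotopic pattern against a theoretical one by cosine distance. This is not the authors' engine. The function names, the 5 ppm tolerance, and the truncated element table (C, H, N, O only; other elements raise a `KeyError`) are illustrative assumptions. So is the simplification that the monoisotopic peak (M) and the 13C-dominated M+1 peak are the two most abundant isotopologues, which holds for small C/H/N/O ions but not for ions containing Cl, Br, or metals such as Pd.

```python
import math
import re

# Monoisotopic masses (Da) for a few elements only; a real engine needs a full table.
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
ELECTRON_MASS = 0.00054857990946  # Da
C13_SHIFT = 1.0033548378  # Da, mass difference between 13C and 12C


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a flat formula such as 'C10H12O2' (no brackets or charge signs) into counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mz(formula: str, charge: int) -> float:
    """m/z of the lightest isotopologue of an ion with the given formula and charge."""
    if charge == 0:
        raise ValueError("a neutral species has no m/z")
    mass = sum(MONO_MASS[el] * n for el, n in parse_formula(formula).items())
    # Positive ions have lost electrons, negative ions have gained them.
    return (mass - charge * ELECTRON_MASS) / abs(charge)


def has_peak(peaks_mz: list[float], target_mz: float, ppm: float) -> bool:
    """True if any centroided peak lies within +/- ppm of the target m/z."""
    tolerance = target_mz * ppm * 1e-6
    return any(abs(mz - target_mz) <= tolerance for mz in peaks_mz)


def is_candidate(peaks_mz: list[float], formula: str, charge: int, ppm: float = 5.0) -> bool:
    """Hypothetical step-B filter: keep a spectrum only if both M and M+1 peaks are present.

    Assumes M and the 13C-dominated M+1 are the two most abundant isotopologues.
    """
    m0 = monoisotopic_mz(formula, charge)
    m1 = m0 + C13_SHIFT / abs(charge)
    return has_peak(peaks_mz, m0, ppm) and has_peak(peaks_mz, m1, ppm)


def cosine_distance(theoretical: list[float], observed: list[float]) -> float:
    """1 - cosine similarity between two isotopic intensity patterns (same peak order)."""
    dot = sum(t * o for t, o in zip(theoretical, observed))
    norm = math.sqrt(sum(t * t for t in theoretical)) * math.sqrt(sum(o * o for o in observed))
    if norm == 0:
        raise ValueError("intensity patterns must not be all zeros")
    return 1.0 - dot / norm


if __name__ == "__main__":
    # Toy centroided spectrum containing the C7H7+ ion (m/z ~91.0542) and its M+1 peak.
    spectrum = [91.0542, 92.0576, 105.0699]
    print(is_candidate(spectrum, "C7H7", 1))  # True
    print(cosine_distance([100.0, 7.7], [100.0, 8.1]))  # small distance: similar patterns
```

In this sketch the cheap two-peak check plays the role of the candidate filter, so the more expensive pattern comparison only runs on spectra that pass it; a distance close to 0 means the observed isotopic pattern closely matches the theoretical one, and a threshold on that distance (as in step C1) would decide what counts as a hit.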

In a joint project between the Zelinsky Institute of Organic Chemistry and Skoltech, a research group led by RAS Academician Valentin Ananikov has developed a unique machine-learning-based search engine for analyzing vast amounts of high-resolution mass spectrometry data. Machine learning makes it possible to explore terabytes of accumulated data without running new experiments. The algorithm accelerates the search for new compounds, reduces costs, and makes research more environmentally friendly.

The study is published in Nature Communications.

In a typical laboratory, terabytes of data accumulate over several years, for example, from high-resolution mass spectrometry measurements. But due to the limitations of manual analysis, scientists consider only a small part of this information. Up to 95% of the accumulated data remains unexplored, which means potentially important discoveries are lost. It would take hundreds of years to process such a large amount of information manually, but new AI-based algorithms can conduct the analysis in just a few days.

“Our work is based on an innovative algorithm combining machine learning and analysis of signal distribution in mass spectra, which has significantly reduced false positives when identifying chemical compounds. The new search algorithm has successfully verified historical data on the Mizoroki-Heck reaction and revealed not only already known, but also completely new chemical transformations, including a unique process of cross-combination that has not been previously documented in the scientific literature,” commented Valentin Ananikov, the scientific supervisor of the study.

During organic synthesis, chemists select specific experimental conditions to optimize the reaction and achieve maximum results. After the reaction and sample preparation, the chemical composition is determined and characterized by an analytical system. High-resolution mass spectrometry is often used to implement this strategy due to its high speed of analysis, sensitivity, and easy data accumulation. The method is widely used in analytical chemistry, organic and inorganic chemistry, proteomics, metabolomics, materials science, as well as in many other fields.

The new solution opens up new possibilities in chemical research. The search engine is capable of analyzing data from different fields of chemistry, leading to the discovery of new reactions, catalysts, and mechanisms. The use of existing data not only accelerates scientific progress, but also reduces experiment costs, making science more environmentally friendly.

More information:
Konstantin S. Kozlov et al, Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data, Nature Communications (2025). DOI: 10.1038/s41467-025-56905-8

Provided by
Skolkovo Institute of Science and Technology


Citation:
AI-based search engine can help researchers find new chemical reactions in data archives (2025, March 20)
retrieved 20 March 2025
from




