by Sandra Niemeyer, Leibniz Informationszentrum Technik und Naturwissenschaften / TIB – Leibniz Information Centre for Science and Technology
Despite significant advances in digital technologies, modern scientific results are still communicated using antiquated methods. In nearly 400 years, scientific literature has progressed from physically printed articles to PDFs, but these electronic documents still present results as unstructured text rather than structured data, so they are not machine-readable. This means your computer cannot interpret the information they contain without human assistance.
With millions of scientific articles published annually, the need for machine-assisted information retrieval and processing is rapidly growing. Most efforts to address this need have attempted to train machines to interpret text-based information using artificial intelligence (AI) approaches, usually with limited success.
Recently, a research team from the TIB – Leibniz Information Centre for Science and Technology proposed tackling the problem with a different mindset. Rather than trying to teach machines our language, why not produce science in a language they already understand?
In an article published in Scientific Data, the team introduces reborn articles, an open-source approach that allows researchers to produce scientific findings in a machine-readable format.
Dr. Markus Stocker, first author and head of the Lab Knowledge Infrastructures at the TIB, explained, “Many scientists already use data analysis tools that produce results machines can read. But the standard way of publishing these results is to organize them in a PDF document that is not readable by machines. This means that if anyone wants to reuse these results, which is the entire point of publishing them, they first have to extract and restructure them.
“Wouldn’t it be more efficient if we could publish results in a way that preserves their original structure? That’s what reborn articles enables.”
How reborn articles work
The reborn articles approach works with common data analysis tools like R and Python, and allows researchers to produce results that both humans and machines can read easily. Other researchers can therefore reproduce the analyses themselves and even download reborn article data as Excel or CSV files, which are also machine-readable.
This may seem trivial, but the main alternatives for reusing published data are either to copy and paste individual values from PDF articles by hand, which is time-consuming and error-prone, or to use AI-based tools, which are often inaccurate.
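To make the idea concrete, here is a minimal Python sketch of what preserving the original structure of a result could look like. It is an illustration only and does not use the reborn articles tooling or its API: the measurements, the group names and the helper functions `describe`, `to_csv` and `from_csv` are all hypothetical, and only the Python standard library is used. The point is that results written as CSV or JSON at analysis time can be read back by another program exactly, with no extraction step.

```python
import csv
import io
import json
import statistics

# Hypothetical measurements for illustration; not data from the paper.
control = [4.1, 3.9, 4.3, 4.0]
treated = [4.8, 5.1, 4.9, 5.2]


def describe(label, values):
    """Return one result row as a plain dict, keeping its structure."""
    return {
        "group": label,
        "n": len(values),
        "mean": round(statistics.mean(values), 3),
        "stdev": round(statistics.stdev(values), 3),
    }


def to_csv(rows):
    """Serialize result rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Read rows back from CSV text, restoring numeric types."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "group": row["group"],
                "n": int(row["n"]),
                "mean": float(row["mean"]),
                "stdev": float(row["stdev"]),
            }
        )
    return rows


# The analysis produces structured results directly.
results = [describe("control", control), describe("treated", treated)]
csv_text = to_csv(results)
json_text = json.dumps(results, indent=2)

# Another program recovers the same values without any text extraction.
assert from_csv(csv_text) == results
assert json.loads(json_text) == results

print(csv_text)
```

By contrast, once the same numbers are typeset into a sentence or a table inside a PDF, a reader's program first has to locate and parse them, which is the extraction step the approach aims to avoid.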
Overcoming the current fixation on AI-based information extraction has been a challenge when explaining how the approach works. As co-author and TIB postdoctoral researcher Dr. Lauren Snyder noted, “AI-based extraction tools are a hot topic. It seems every field of science is looking for ways to use large language models and other extraction-related approaches. While they are powerful tools in certain situations, I wonder if fixating on them is not doing us an overall disservice.
“Imagine renovating your home and trying to tackle every job with drilling tools. That just doesn’t make sense. I worry this fixation on information extraction will lead us to miss opportunities to develop tools that can tackle certain tasks more efficiently. I hope our work inspires others to start thinking beyond mainstream approaches.”
Dr. Stocker added, “People have been pointing out the inefficiencies of how we produce scientific knowledge for at least a quarter century. In that time, AI-based extraction has not solved the problem and if we continue with the mindset that extraction is all we can do, by mid-century we might still be struggling with the same problems.
“If instead we had begun using long-existing technologies to ensure scientific knowledge is produced and published machine readable, today we would have vast databases of organized knowledge. While we may be a little late to the game, any time is a good time to begin with disruptive approaches.”
More information:
Markus Stocker et al., Rethinking the production and publication of machine-readable expressions of research findings, Scientific Data (2025). DOI: 10.1038/s41597-025-04905-0. www.nature.com/articles/s41597-025-04905-0
Citation: ‘Reborn articles’: Simple approach enables direct publication of machine-readable scientific findings (2025, April 30)