A small team of AI researchers from Carnegie Mellon University, Stanford University, Harvard University and Princeton University, all in the U.S., has found that over-training large language models may make them harder to fine-tune. In their paper posted on the arXiv preprint server, the group compared the impact of different amounts of training on a single LLM.
Over the past couple of years, as AI researchers have sought to make their models more “intelligent,” many have been driven by the mantra that the more training a model is given, the better it will be in the end. In this new study, the research team found evidence suggesting there may be a point of diminishing returns in language model training.
The researchers came to this conclusion while comparing two differently trained versions of the LLM OLMo-1B. In one scenario, they trained it on 2.3 trillion tokens; in the other, on 3 trillion tokens. They then compared the two by testing them on several benchmarks, such as ARC and AlpacaEval, and found that the model trained on more tokens actually performed worse, by up to 3%.
Surprised by their findings, they ran more tests and found similar results, suggesting that there is some point at which more training starts to make models less “intelligent.” The research team calls it “catastrophic overtraining,” and suggests it is due to what they describe as “progressive sensitivity.”
They further suggest that as the number of training tokens rises, the model becomes more fragile, meaning that fine-tuning, which can be viewed as adding noise, begins to undo the gains in improvement seen before that stress point.

To test their theory, they added Gaussian noise to some of the models and found that doing so led to the same type of performance degradation they had witnessed earlier. They named this point of no return the “inflection point.” After that point, they suggest, any further training reduces the stability of the model, making it more difficult to tune in ways that are useful for a desired set of applications.
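To make the perturbation idea concrete, the sketch below is a minimal illustration, not the authors' code: it adds zero-mean Gaussian noise to every weight of a pretrained checkpoint using PyTorch and Hugging Face transformers. The checkpoint name and the noise scale sigma are assumptions chosen for the example, and benchmark evaluation is left to whatever harness the reader prefers.

    # Minimal sketch (not the authors' code): perturb a pretrained LLM's
    # weights with zero-mean Gaussian noise, then re-run a benchmark of
    # choice to see how much the scores drop. The checkpoint name and
    # sigma are illustrative assumptions.
    import torch
    from transformers import AutoModelForCausalLM

    def add_gaussian_noise(model, sigma=0.01):
        # Add N(0, sigma^2) noise to every parameter tensor in place.
        with torch.no_grad():
            for param in model.parameters():
                param.add_(torch.randn_like(param) * sigma)
        return model

    model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B-hf")  # assumed checkpoint
    perturbed = add_gaussian_noise(model, sigma=0.01)
    # Evaluate `perturbed` on benchmarks such as ARC or AlpacaEval; the paper's
    # "progressive sensitivity" claim is that checkpoints trained on more tokens
    # degrade more under the same noise scale than earlier checkpoints.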
The researchers conclude by suggesting that, moving forward, LLM developers may have to estimate how much training is enough, or find other methods that allow additional training while pushing the inflection point further out.
More information:
Jacob Mitchell Springer et al, Overtrained Language Models Are Harder to Fine-Tune, arXiv (2025). DOI: 10.48550/arxiv.2503.19206
© 2025 Science X Network