A team of AI researchers at the University of California, Los Angeles, working with a colleague from Meta AI, has introduced d1, a framework that uses reinforcement learning to improve the reasoning of diffusion-based large language models. The group posted a paper describing their work and the features of the new framework on the arXiv preprint server.
Over the past couple of years, the use of LLMs has skyrocketed, with millions of people around the world using AI apps for a wide variety of tasks. This has driven a growing need for electricity to power the data centers that run these compute-intensive applications. Researchers have therefore been looking for other ways to provide AI services to the user community. One such approach involves diffusion-based LLMs (dLLMs), used either as a replacement for, or a complement to, conventional LLMs.
dLLMs are AI models that arrive at answers differently than standard LLMs. Instead of generating text autoregressively, one token after another, they use diffusion. Such models were originally used to generate images: they were trained by progressively adding noise to an image until nothing recognizable remained, then learning to reverse the process and recover the original image.
Applying this approach to text involves converting words, or pieces of words, into tokens, the analog of pixels. Masks play the role of noise: tokens are gradually replaced with mask tokens until nothing but masks remains, and the model is trained to reverse the process until only the original tokens are left. The advantage of this approach is that it can require far less computing power than autoregressive LLMs.
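In simplified, hypothetical Python, the forward (masking) and reverse (unmasking) passes of such a text diffusion model look roughly like this; the token names and the single-step "model" are illustrative placeholders, not the actual d1 code:

```python
import random

MASK = "[MASK]"

def forward_mask(tokens, t):
    """Forward 'noising' step for a masked diffusion LM: each token is
    independently replaced by a mask with probability t
    (t = 1.0 corresponds to a fully masked sequence)."""
    return [MASK if random.random() < t else tok for tok in tokens]

def reverse_step(tokens, predict_fn):
    """One reverse (denoising) step: the model fills in every masked
    position; real dLLMs keep only the most confident predictions and
    re-mask the rest for the next step."""
    return [predict_fn(tokens, i) if tok == MASK else tok
            for i, tok in enumerate(tokens)]

# Toy demonstration with a dummy "model" that always predicts "the".
sentence = "diffusion models denoise text in parallel".split()
noised = forward_mask(sentence, t=0.5)
denoised = reverse_step(noised, predict_fn=lambda toks, i: "the")
print(noised)
print(denoised)
```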

What has held back wider use of dLLMs is their inferior reasoning ability. That is where the team in California comes in. They have been working to add reinforcement learning, in which models learn through rewards, to a dLLM as a way to improve its reasoning.
To build d1, the team added a two-stage process. The first stage applies supervised fine-tuning to the model using high-quality reasoning data. The second applies reinforcement learning through an algorithm called diffu-GRPO, which adapts group-relative policy optimization to diffusion models using efficient estimates of token probabilities, together with what the team calls "random prompt masking."
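As a rough illustration rather than the authors' implementation, the two ingredients named above, group-relative reward scoring and random prompt masking, can be sketched in Python; the mask probability and helper names here are assumptions made for the example:

```python
import random

MASK = "[MASK]"

def group_relative_advantages(rewards):
    """GRPO-style advantages: each sampled completion in a group is scored
    relative to the group's mean reward, normalized by its std."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]

def random_prompt_mask(prompt_tokens, mask_prob=0.15):
    """Random prompt masking (illustrative mask_prob): hide a random subset
    of prompt tokens so each gradient update sees a different view of the
    prompt, which acts as a form of regularization during RL training."""
    return [MASK if random.random() < mask_prob else tok for tok in prompt_tokens]

# Toy usage: a group of 4 sampled answers scored 1 (correct) or 0 (wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
print(random_prompt_mask("What is 12 * 7 ?".split()))
```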
Testing of d1 has thus far shown the approach works; models trained with the framework scored higher on math and logical reasoning benchmarks than their base dLLM. The research team suggests the framework is ready for testing by other groups that may choose to adapt their AI models to incorporate the proposed changes.
More information:
Siyan Zhao et al, d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning, arXiv (2025). DOI: 10.48550/arxiv.2504.12216
© 2025 Science X Network