A trio of AI researchers at KAIST AI, in Korea, has developed what they call a Chain-of-Zoom framework that allows the generation of extreme super-resolution imagery using existing super-resolution models without the need for retraining.
In their study published on the arXiv preprint server, Bryan Sangwoo Kim, Jeongsol Kim, and Jong Chul Ye broke down the process of zooming in on an image and then used an existing super-resolution model at each step to refine the image, resulting in incremental improvements in resolution.
The team in Korea began by noting that existing frameworks for improving the resolution of pictures tend to use interpolation or regression when zooming, resulting in blurry imagery. To overcome this problem, they took a new approach: a stepwise zooming process in which each step improves on the one before it.
The researchers call their new framework Chain-of-Zoom (CoZ), due to the chain of processes that are used to improve resolution.
At each step, the new framework uses an existing super-resolution (SR) model to carry out the refinement. While that processing takes place, a vision-language model (VLM) generates descriptive prompts that guide the SR model's generation process. The result is a sharper, zoomed-in view of part of the first image.

The framework then repeats the process, drawing on cues from the VLM at each pass and improving the resolution of the zoomed image each time, until it settles on a final version. To ensure that the prompts given by the VLM were useful, the research team applied reinforcement-learning techniques. In testing, the framework outperformed standard baseline approaches.
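The stepwise loop described above can be illustrated with a short Python sketch. It is not the researchers' code: the function names, the grayscale list-of-lists image format, the center crop, and the 2x zoom factor are all assumptions made for illustration, and the SR model and VLM are replaced by trivial stand-ins (nearest-neighbor upscaling and a one-line text summary). Only the structure follows the description: crop a region, ask for a prompt, upscale the crop with that prompt, and repeat on the result.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def center_crop(img: Image, factor: int) -> Image:
    """Keep the central 1/factor of the image along each axis."""
    h, w = len(img), len(img[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def placeholder_sr(img: Image, prompt: str, factor: int) -> Image:
    """Hypothetical stand-in for a pretrained SR model.

    It only repeats pixels (nearest-neighbor) and ignores the prompt;
    a real SR model would use the prompt to guide generation.
    """
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def placeholder_vlm(img: Image) -> str:
    """Hypothetical stand-in for a VLM: returns a crude text description."""
    mean = sum(map(sum, img)) / (len(img) * len(img[0]))
    return f"region with mean intensity {mean:.2f}"


def chain_of_zoom(
    img: Image,
    steps: int,
    factor: int = 2,
    sr: Callable[[Image, str, int], Image] = placeholder_sr,
    vlm: Callable[[Image], str] = placeholder_vlm,
) -> Tuple[Image, List[str]]:
    """Zoom in `steps` times, feeding each step's output into the next."""
    current, prompts = img, []
    for _ in range(steps):
        crop = center_crop(current, factor)   # region to zoom into
        prompt = vlm(crop)                    # descriptive cue for the SR model
        current = sr(crop, prompt, factor)    # refined image, back at full size
        prompts.append(prompt)
    return current, prompts
```

Calling `chain_of_zoom(image, steps=2, factor=2)` on an 8-by-8 image returns another 8-by-8 image that covers the central quarter of the original along each axis, so the overall magnification is `factor ** steps`. Because the same SR callable is reused at every step, nothing in the loop requires retraining it.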
The researchers note that their framework does not require retraining to improve image quality, which, they suggest, makes it more portable. They also state that users need to be careful about how their framework is used. The zoomed-in image is not real—it has been generated using artificial intelligence.
Thus, if it were used to make out the characters on the license plate of a getaway car from a bank robbery, for example, it might show some very clear letters and numbers, but they might not match those on the real car.
More information:
Bryan Sangwoo Kim et al, Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment, arXiv (2025). DOI: 10.48550/arxiv.2505.18600
Project page: bryanswkim.github.io/chain-of-zoom/
© 2025 Science X Network
Citation: Chain-of-Zoom framework enables extreme super-resolution zoom without retraining (2025, June 4), retrieved 4 June 2025