LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Flow-Based Real-World Super-Resolution

Rollout-group multi-reward fine-tuning for flow-based Real-ISR with LR-referenced faithfulness.

Song Fei¹†, Tian Ye¹†, Sixiang Chen¹, Zhaohu Xing¹, Jianyu Lai¹, Lei Zhu¹,²*

¹ The Hong Kong University of Science and Technology (Guangzhou), ² The Hong Kong University of Science and Technology

† Equal contribution   * Corresponding author

Abstract

Generative real-world image super-resolution can synthesize visually convincing details from severely degraded low-resolution inputs, yet stochastic sampling makes a critical failure mode hard to avoid: outputs may look sharp but be unfaithful to the LR evidence, exhibiting semantic or structural hallucinations.

LucidNFT is a multi-reward RL framework for flow-matching Real-ISR. It introduces three components: (1) LucidConsistency, a degradation-invariant and hallucination-sensitive LR-referenced evaluator trained with content-consistent degradation pools and original-inpainted hard negatives; (2) a decoupled reward normalization strategy that preserves objective-wise contrasts within each LR-conditioned rollout group before fusion; and (3) LucidLR, a large-scale collection of real-world degraded images for robust RL fine-tuning.

LR faithfulness is missing

Without HR references, no-reference perceptual metrics can reward sharp but unsupported details. Real-ISR needs an LR-referenced signal that is robust to degradations and sensitive to hallucination.

Rollout groups need contrast

Preference learning compares multiple stochastic restorations conditioned on the same LR image. Scalarizing heterogeneous rewards before normalization can collapse perceptual-faithfulness distinctions.

Real degradations need scale

RL alignment benefits from diverse LR inputs that induce informative rollout variation. Small benchmark datasets and synthetic pipelines limit degradation coverage.

Method

Overview of LucidConsistency. Same-content views under different degradation levels form positive pools, while original-inpainted pools provide spatially aligned hard negatives for localized AI-generated hallucinations.

LucidConsistency

A Qwen3-VL-Embedding-8B backbone with trainable LoRA adapters learns global and native-token representations through pool-based contrastive losses. At inference, it combines global and local LR-SR consistency into an LR-referenced score.
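
One way to picture the inference-time score is as a blend of a global cosine similarity and an averaged per-token cosine similarity between LR and SR embeddings. The sketch below is only an illustration under stated assumptions: the embeddings are random stand-ins rather than backbone outputs, the LR and SR token grids are assumed to be spatially aligned, and the function name `consistency_score` and the blend weight `alpha` are hypothetical. The source does not specify how the two terms are combined.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Cosine similarity along the last axis."""
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

def consistency_score(lr_global, sr_global, lr_tokens, sr_tokens, alpha=0.5):
    """Blend global and local LR-SR similarity (illustrative only).

    lr_global, sr_global: (dim,) pooled embeddings.
    lr_tokens, sr_tokens: (num_tokens, dim) token embeddings, assumed aligned.
    alpha: weight on the global term (hypothetical default).
    """
    global_sim = cosine(lr_global, sr_global)
    local_sim = cosine(lr_tokens, sr_tokens).mean()
    return float(alpha * global_sim + (1.0 - alpha) * local_sim)

# Random stand-ins for backbone embeddings (assumption: 16 tokens, 64 dims).
rng = np.random.default_rng(0)
lr_g = rng.normal(size=64)
lr_t = rng.normal(size=(16, 64))

# A faithful restoration: embeddings stay close to the LR embeddings.
faithful = consistency_score(
    lr_g, lr_g + 0.05 * rng.normal(size=64),
    lr_t, lr_t + 0.05 * rng.normal(size=(16, 64)),
)

# A localized hallucination: 4 of 16 tokens replaced by unrelated content,
# while the global embedding is left unchanged.
sr_t = lr_t.copy()
sr_t[:4] = rng.normal(size=(4, 64))
hallucinated = consistency_score(lr_g, lr_g, lr_t, sr_t)

print(f"faithful={faithful:.3f} hallucinated={hallucinated:.3f}")
```

In this toy setup the global term alone cannot see the localized change, so the drop in the score comes entirely from the local term, which is the motivation for combining both.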

Decoupled reward normalization

Each reward dimension is normalized within the same LR-conditioned rollout group before fusion. The fused advantage is stabilized at batch level and mapped to the bounded DiffusionNFT reward weight.
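
The steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the equal-weight sum, the clip range `a_max`, the clip-and-shift mapping to [0, 1], and the function names `group_zscore` and `decoupled_reward_weights` are all assumptions, and the toy batch uses 4 rollouts per group for brevity.

```python
import numpy as np

def group_zscore(x, eps=1e-6):
    """Standardize along the rollout axis of a (groups, rollouts) array."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps)

def decoupled_reward_weights(rewards, a_max=2.0, eps=1e-6):
    """rewards: array of shape (num_rewards, groups, rollouts).

    Returns bounded weights in [0, 1] with shape (groups, rollouts).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # 1) Normalize every reward dimension inside its own LR-conditioned group.
    per_reward = np.stack([group_zscore(r, eps) for r in rewards])
    # 2) Fuse only after normalization (equal weights are an assumption).
    fused = per_reward.sum(axis=0)
    # 3) Batch-level stabilization: standardize over all groups and rollouts.
    fused = (fused - fused.mean()) / (fused.std() + eps)
    # 4) Map to a bounded weight (clip-and-shift is an assumption).
    return 0.5 + 0.5 * np.clip(fused / a_max, -1.0, 1.0)

# Toy batch: 2 LR inputs, 4 rollouts each, 2 rewards on very different scales.
iqa = np.array([[70.1, 71.5, 69.8, 72.0], [65.0, 66.2, 64.1, 65.5]])
faith = np.array([[0.93, 0.91, 0.94, 0.92], [0.88, 0.90, 0.89, 0.87]])
weights = decoupled_reward_weights(np.stack([iqa, faith]))
print(weights.round(3))
```

Because each reward is standardized within its group before the sum, a reward with a small numeric range contributes on the same footing as one with a large range.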

LucidNFT fine-tuning

LucidNFT uses UniPercept IQA as the perceptual reward and LucidConsistency as the LR-faithfulness reward. Fine-tuning uses LoRA rank 32, 12 rollouts per LR input, and LucidLR as the real-world LR source.
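
For reference, the stated settings can be collected in a small configuration object. The class name `LucidNFTConfig`, its field names, and the helper method are hypothetical; only the values come from the text above, and any other hyperparameter is deliberately left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LucidNFTConfig:
    """Hypothetical container for the fine-tuning settings stated in the text."""
    lora_rank: int = 32                       # LoRA rank
    rollouts_per_lr: int = 12                 # rollouts per LR input
    perceptual_reward: str = "UniPercept IQA"
    faithfulness_reward: str = "LucidConsistency"
    lr_source: str = "LucidLR"

    def rollouts_per_batch(self, lr_images_per_batch: int) -> int:
        """Total restorations sampled for one batch of LR images."""
        return self.rollouts_per_lr * lr_images_per_batch

cfg = LucidNFTConfig()
print(cfg.rollouts_per_batch(4))  # 12 rollouts x 4 LR images = 48
```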

Advantage separability analysis on LucidFlux using RealLQ250. Decoupled normalization produces larger advantage gaps and more distinct advantage levels than scalar-first reward aggregation under the same DiffusionNFT objective.
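
A toy example may help explain why scalar-first aggregation can shrink advantage gaps. The reward values below are made up for illustration and are not taken from the paper; rollouts 0 and 1 have the same perceptual score and differ only in faithfulness.

```python
import numpy as np

def zscore(x, eps=1e-6):
    """Standardize a 1-D array of rewards from one rollout group."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / (x.std() + eps)

# One rollout group: 4 restorations of the same LR image (made-up values).
iqa = np.array([72.0, 72.0, 68.0, 68.0])    # perceptual reward, large scale
faith = np.array([0.95, 0.90, 0.95, 0.90])  # LR-faithfulness reward, small scale

scalar_first = zscore(iqa + faith)       # fuse raw rewards, then normalize
decoupled = zscore(iqa) + zscore(faith)  # normalize per reward, then fuse

# Advantage gap between rollouts 0 and 1 (same IQA, different faithfulness).
gap_scalar = scalar_first[0] - scalar_first[1]
gap_decoupled = decoupled[0] - decoupled[1]
print(f"scalar-first gap: {gap_scalar:.3f}")   # about 0.025
print(f"decoupled gap:    {gap_decoupled:.3f}")  # about 2.0
```

With scalar-first aggregation the large-scale perceptual reward dominates the sum, so the faithfulness difference is almost invisible in the advantage. Normalizing each reward first restores that contrast.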

LucidLR Dataset

LucidLR is a 20K-image real-world low-quality dataset collected from Wikimedia Commons through its official API. Images are gathered from public low-quality and blurred-image categories, filtered from an approximately 22K-image raw pool with NSFW classification, corrupted-file removal, and manual review.
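
As a rough idea of what category-based collection through the MediaWiki Action API can look like, the sketch below builds a `categorymembers` query URL and parses one response page. It is not the authors' collection script: the category title is a placeholder, the helper names are hypothetical, the sample payload is hand-written, and the filtering stages (NSFW classification, corrupted-file removal, manual review) are not covered.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"

def category_files_url(category, limit=500, cmcontinue=None):
    """Build a query URL listing the files in one Commons category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",
        "cmlimit": limit,
        "format": "json",
    }
    if cmcontinue:
        # Continuation token returned by the previous page of results.
        params["cmcontinue"] = cmcontinue
    return f"{API_URL}?{urlencode(params)}"

def parse_page(payload):
    """Return (file titles, continuation token or None) from one response."""
    members = payload.get("query", {}).get("categorymembers", [])
    titles = [m["title"] for m in members]
    token = payload.get("continue", {}).get("cmcontinue")
    return titles, token

# Offline example: a hand-written payload shaped like an API response.
sample = {
    "continue": {"cmcontinue": "file|ABC|123", "continue": "-||"},
    "query": {"categorymembers": [
        {"pageid": 1, "ns": 6, "title": "File:Example blurry photo.jpg"},
    ]},
}
titles, token = parse_page(sample)
next_url = category_files_url("Category:Placeholder category", cmcontinue=token)
print(titles, next_url)
```

A full crawler would fetch each URL, follow the continuation token until none is returned, and then resolve file titles to image URLs before filtering.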

Representative LucidLR samples with diverse real-world degradations, used as LR inputs for RL fine-tuning.

Comparison of representative real-world datasets used in Real-ISR.

| Dataset | Pairing | Primary Usage | Type | # Images |
|---|---|---|---|---|
| RealSR | Paired | Testing / Benchmark | Real-captured | 100 |
| DRealSR | Paired | Testing / Benchmark | Real-captured | 93 |
| RealLQ250 | Unpaired | Testing / Benchmark | Real-world | 250 |
| LucidLR | Unpaired | RL / Unsupervised Training | Real-world | 20K |

Results

Experiments evaluate LucidNFT on two flow-based Real-ISR models, LucidFlux and DiT4SR. All methods are evaluated at 1024 × 1024 output resolution with 4× upscaling. The paper reports eight no-reference quality metrics, plus LucidConsistency as an LR-referenced consistency score that requires no HR ground truth.

Training curves of LucidNFT and DPO fine-tuning. LucidNFT steadily improves perceptual rewards while keeping LucidConsistency stable.
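
Quantitative comparison on RealLQ250, DRealSR, and RealSR. Higher is better except NIQE.

| Benchmark | Metric | DiffBIRv2 | SeeSR | DreamClear | SUPIR | DiT4SR | DiT4SR(+LucidNFT) | LucidFlux | LucidFlux(+DPO) | LucidFlux(+LucidNFT) |
|---|---|---|---|---|---|---|---|---|---|---|
| RealLQ250 | CLIP-IQA+ ↑ | 0.6919 | 0.7034 | 0.6813 | 0.6532 | 0.7098 | 0.7124 | 0.7208 | 0.7228 | 0.7465 |
| RealLQ250 | Q-Align ↑ | 3.9755 | 4.1423 | 4.0647 | 4.1347 | 4.2270 | 4.2358 | 4.4052 | 4.4430 | 4.4855 |
| RealLQ250 | MUSIQ ↑ | 67.5313 | 70.3757 | 67.0899 | 65.8133 | 71.6682 | 72.1732 | 72.3351 | 72.4504 | 73.4475 |
| RealLQ250 | MANIQA ↑ | 0.4900 | 0.4895 | 0.4405 | 0.3826 | 0.4607 | 0.4719 | 0.5227 | 0.5258 | 0.5443 |
| RealLQ250 | CLIP-IQA ↑ | 0.7137 | 0.7063 | 0.6957 | 0.5767 | 0.7141 | 0.7355 | 0.6855 | 0.6917 | 0.7233 |
| RealLQ250 | NIQE ↓ | 5.1193 | 4.4383 | 3.8709 | 3.6591 | 3.5556 | 3.5007 | 3.7410 | 3.7785 | 3.2532 |
| RealLQ250 | UniPercept IQA ↑ | 65.4760 | 69.2015 | 68.8465 | 68.6430 | 73.0740 | 73.3430 | 70.9300 | 71.1330 | 73.4790 |
| RealLQ250 | VisualQuality-R1 ↑ | 4.3428 | 4.5118 | 4.4430 | 4.4265 | 4.6146 | 4.6304 | 4.5474 | 4.5644 | 4.6510 |
| RealLQ250 | LucidConsistency ↑ | 0.9609 | 0.9466 | 0.9578 | 0.9522 | 0.9052 | 0.9172 | 0.9237 | 0.9296 | 0.9345 |
| DRealSR | CLIP-IQA+ ↑ | 0.6476 | 0.6258 | 0.4462 | 0.5494 | 0.6537 | 0.6757 | 0.6516 | 0.6530 | 0.6867 |
| DRealSR | Q-Align ↑ | 3.0487 | 3.2746 | 2.4214 | 3.4722 | 3.6008 | 3.6641 | 3.7141 | 3.7408 | 3.8423 |
| DRealSR | MUSIQ ↑ | 60.0759 | 61.3222 | 35.1912 | 54.9280 | 63.8051 | 65.1915 | 64.6025 | 64.5607 | 68.1545 |
| DRealSR | MANIQA ↑ | 0.4900 | 0.4505 | 0.2676 | 0.3483 | 0.4419 | 0.4572 | 0.4678 | 0.4669 | 0.5004 |
| DRealSR | CLIP-IQA ↑ | 0.6782 | 0.6760 | 0.4361 | 0.5310 | 0.6732 | 0.7111 | 0.6673 | 0.6713 | 0.7073 |
| DRealSR | NIQE ↓ | 6.4853 | 6.4503 | 7.0164 | 5.9092 | 5.7001 | 5.6329 | 5.0742 | 5.0143 | 4.1788 |
| DRealSR | UniPercept IQA ↑ | 46.2298 | 50.3414 | 34.2473 | 55.1371 | 58.1290 | 59.9328 | 59.9032 | 59.7782 | 63.7782 |
| DRealSR | VisualQuality-R1 ↑ | 3.4796 | 3.6116 | 2.5655 | 3.7349 | 3.9603 | 4.0239 | 3.9955 | 3.9828 | 4.1455 |
| DRealSR | LucidConsistency ↑ | 0.9332 | 0.9275 | 0.9607 | 0.8911 | 0.8438 | 0.8544 | 0.8813 | 0.8890 | 0.8879 |
| RealSR | CLIP-IQA+ ↑ | 0.6543 | 0.6731 | 0.5331 | 0.5640 | 0.6753 | 0.6881 | 0.6669 | 0.6695 | 0.7151 |
| RealSR | Q-Align ↑ | 3.3156 | 3.6073 | 3.0040 | 3.4682 | 3.7106 | 3.7959 | 3.8728 | 3.9147 | 3.9918 |
| RealSR | MUSIQ ↑ | 61.7751 | 67.5660 | 49.4766 | 55.6807 | 67.9828 | 69.1092 | 67.8962 | 67.9362 | 70.5625 |
| RealSR | MANIQA ↑ | 0.4745 | 0.5087 | 0.3092 | 0.3426 | 0.4533 | 0.4654 | 0.4889 | 0.4907 | 0.5284 |
| RealSR | CLIP-IQA ↑ | 0.6806 | 0.6993 | 0.5390 | 0.4857 | 0.6631 | 0.6963 | 0.6359 | 0.6427 | 0.6936 |
| RealSR | NIQE ↓ | 6.0700 | 5.4594 | 5.2873 | 5.2819 | 5.0912 | 4.8332 | 4.8134 | 4.6804 | 3.9526 |
| RealSR | UniPercept IQA ↑ | 53.6550 | 58.0538 | 46.7850 | 56.6063 | 63.2025 | 64.8425 | 60.0925 | 60.4775 | 64.7588 |
| RealSR | VisualQuality-R1 ↑ | 3.8928 | 4.0635 | 3.5028 | 3.7821 | 4.1953 | 4.2429 | 4.1376 | 4.1503 | 4.3389 |
| RealSR | LucidConsistency ↑ | 0.9544 | 0.9138 | 0.9475 | 0.9141 | 0.8318 | 0.8498 | 0.8853 | 0.8932 | 0.8923 |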

Human-aligned LR-faithfulness evaluation on RealSR.

| Evaluator | Criterion | Agreement | Recall@1 | Filter@1 |
|---|---|---|---|---|
| CLIP-IQA | Perceptual Quality | 0.391 | 0.186 | 0.093 |
| MUSIQ | Perceptual Quality | 0.349 | 0.116 | 0.093 |
| Q-Align | Perceptual Quality | 0.322 | 0.093 | 0.047 |
| UniPercept-IQA | Perceptual Quality | 0.322 | 0.070 | 0.093 |
| Qwen3-VL-Embedding-8B | Generic Semantics | 0.643 | 0.465 | 0.302 |
| LucidConsistency | LR Faithfulness | 0.690 | 0.558 | 0.558 |

Ablation study on RealLQ250 using LucidFlux.

| Method | UniPercept ↑ | VQ-R1 ↑ | NIQE ↓ | LucidConsistency ↑ |
|---|---|---|---|---|
| LucidFlux baseline | 70.930 | 4.547 | 3.741 | 0.9237 |
| (A) IQA-only RL | 71.538 | 4.601 | 3.514 | 0.9259 |
| (B) + Frozen semantic reward | 71.366 | 4.596 | 3.529 | 0.9298 |
| (C) + LucidConsistency reward | 71.214 | 4.593 | 3.547 | 0.9341 |
| (D) + Decoupled norm. | 72.703 | 4.633 | 3.372 | 0.9356 |
| (E) + LucidLR | 73.479 | 4.651 | 3.253 | 0.9345 |
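
To read the ablation incrementally, the change contributed by each step can be computed from the reported rows. The snippet below only does arithmetic on the table values; the variable names are illustrative.

```python
# Rows copied from the ablation table, in order:
# (UniPercept, VQ-R1, NIQE, LucidConsistency).
metrics = ("UniPercept", "VQ-R1", "NIQE", "LucidConsistency")
rows = [
    ("baseline", (70.930, 4.547, 3.741, 0.9237)),
    ("A", (71.538, 4.601, 3.514, 0.9259)),
    ("B", (71.366, 4.596, 3.529, 0.9298)),
    ("C", (71.214, 4.593, 3.547, 0.9341)),
    ("D", (72.703, 4.633, 3.372, 0.9356)),
    ("E", (73.479, 4.651, 3.253, 0.9345)),
]

# Difference between each configuration and the one before it.
deltas = {}
for (prev_name, prev), (name, cur) in zip(rows, rows[1:]):
    deltas[f"{prev_name}->{name}"] = tuple(
        round(c - p, 4) for p, c in zip(prev, cur)
    )

for step, diff in deltas.items():
    print(step, dict(zip(metrics, diff)))
```

On these numbers, the step from (C) to (D) gives the largest single-step UniPercept gain (+1.489) while LucidConsistency also rises (+0.0015). The step from (D) to (E) improves the perceptual metrics further with a small LucidConsistency decrease (-0.0011).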

Visual Results

Visual comparison on RealLQ250. LucidNFT variants recover more accurate text structures and finer local textures while preserving LR-supported semantics.
Visual comparison on RealLQ250. LucidFlux(+LucidNFT) improves local details while avoiding LR-inconsistent artifacts.

Citation

@article{fei2026lucidnft,
  title={LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Flow-Based Real-World Super-Resolution},
  author={Fei, Song and Ye, Tian and Chen, Sixiang and Xing, Zhaohu and Lai, Jianyu and Zhu, Lei},
  journal={arXiv},
  year={2026}
}

Contact Us

For questions or collaboration, contact sfei285@connect.hkust-gz.edu.cn, tye610@connect.hkust-gz.edu.cn, or leizhu@hkust-gz.edu.cn.