Diffusion-of-Thought & Deep Rethink: Iterative Self-Calibration in Diffusion Language Models
Author: Qafind Labs
Date: March 15, 2025
Table of Contents
- Abstract
- 1. Introduction
- 2. Background
- 3. Methodology
- 4. Experimental Results
- 5. Discussion
- 6. Conclusion
- References
Abstract
This report examines the evolution of the DeepThink and Deep Rethink paradigms in diffusion language models. We introduce an explicit iterative self-calibration framework and evaluate its impact on solution accuracy, convergence behavior, and user trust.
1. Introduction
Diffusion-of-Thought (DoT) has demonstrated the feasibility of integrating Chain-of-Thought reasoning into diffusion models. Building on DoT’s self-correction and multi-pass sampling, we propose Deep Rethink, a structured paradigm for iterative posterior calibration.
2. Background
- DeepThink: RLHF-enhanced CoT generation emphasizing rapid "think-first-then-speak" inference.
- Deep Rethink: Incorporates posterior calibration steps after initial inference, approximating Bayesian update cycles.
- DoT: Diffusion-of-Thought framework featuring Scheduled and Coupled Sampling for in-training correction.
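The "Bayesian update cycles" that Deep Rethink approximates can be illustrated with a minimal toy sketch: a posterior over candidate answers is repeatedly reweighted by a verifier's likelihood scores. The function name and the numbers below are illustrative, not taken from the report.

```python
# Minimal sketch of an iterative Bayesian update over candidate answers.
# All names and numbers are illustrative, not from the report.

def bayesian_update(prior, likelihood):
    """One calibration cycle: posterior proportional to prior * likelihood, normalized."""
    unnormalized = {a: prior[a] * likelihood[a] for a in prior}
    z = sum(unnormalized.values())
    return {a: p / z for a, p in unnormalized.items()}

# Uniform prior over three candidate answers.
prior = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
# Toy likelihoods from a hypothetical verification pass.
likelihood = {"A": 0.7, "B": 0.2, "C": 0.1}

posterior = prior
for _ in range(3):  # three Rethink cycles sharpen the posterior
    posterior = bayesian_update(posterior, likelihood)

best = max(posterior, key=posterior.get)
```

Each cycle concentrates probability mass on the candidate the verifier favors, which is the behavior the posterior-calibration step is meant to formalize.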
3. Methodology
- Iterative Self-Calibration: Define separate policies for Think and Rethink stages with distinct reward signals.
- Dynamic Early Exit: Allow adaptive termination of Rethink loops based on confidence thresholds.
- Multi-Pass Sampling: Leverage DoTMP to generate and correct one rationale step at a time.
- Self-Consistency Aggregation: Conduct multiple runs and vote on final answers to enhance robustness.
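The four components above can be sketched as a single control loop. `generate_rationale` below is a hypothetical stand-in for the model's sampling-plus-verification pass; the loop structure (confidence-gated early exit, then majority vote across runs) is what the methodology describes.

```python
import random
from collections import Counter

# Hypothetical stand-in for a sampling pass that returns an answer
# together with a confidence score (not the report's actual model).
def generate_rationale(seed):
    rng = random.Random(seed)
    answer = rng.choice(["42", "42", "41"])  # toy candidate answers
    confidence = rng.random()
    return answer, confidence

def rethink(seed, max_loops=5, threshold=0.9):
    """Iterative self-calibration with dynamic early exit."""
    answer, conf = generate_rationale(seed)
    for step in range(max_loops):
        if conf >= threshold:  # dynamic early exit on high confidence
            break
        # Rethink pass: resample and recalibrate.
        answer, conf = generate_rationale(seed + 1000 * (step + 1))
    return answer

def self_consistency(n_runs=9):
    """Run several Rethink chains and majority-vote the final answer."""
    votes = Counter(rethink(seed) for seed in range(n_runs))
    return votes.most_common(1)[0][0]

final = self_consistency()
```

In a real system the Think and Rethink stages would use distinct reward-trained policies rather than one sampler; the sketch only fixes the control flow.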
4. Experimental Results
Key findings on GSM8K and multiplication tasks:
- 100% accuracy: DoT achieves perfect results on multi-digit multiplication with 4–8 diffusion steps.
- 27× speedup: over autoregressive baselines while maintaining comparable accuracy.
- Self-Consistency Boost: +5–6 pp on GSM8K via majority-vote decoding.
Recent experiments on Countdown and Sudoku tasks:
- DLM: Countdown: 91.5%; Sudoku: 100%.
- Autoregressive Baseline: Countdown: 45.8%; Sudoku: 20.7%.
- Projected with Deep Rethink: Countdown accuracy expected to rise above 98% due to iterative posterior corrections; Sudoku remains at 100% with enhanced verification and confidence.
5. Discussion
Deep Rethink formalizes DoT’s implicit posterior correction into explicit policy optimization. This yields:
- Reduced cumulative error and local optima, driving complex tasks closer to theoretical maximum accuracy.
- Enhanced logical consistency in multi-turn dialogue and combinatorial puzzles by iterative self-review.
- Improved user trust through transparent correction loops and higher confidence in answers.
- In Countdown tasks, a projected accuracy gain from 91.5% to above 98%; in Sudoku, 100% accuracy maintained with faster convergence.
6. Conclusion
DoT’s core capabilities align with Deep Rethink’s goal of iterative posterior calibration. By introducing explicit Rethink policies and adaptive loops, we can elevate trustworthiness, precision, and efficiency across diverse reasoning tasks.
References
- Li, X. et al. "Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models." arXiv:2402.07754v3, 2024.
- Qafind Labs. "DeepThink & Deep Rethink Paradigms: A Comparative Study." Internal Tech Report, 2025.