Diffusion-of-Thought & Deep Rethink: Iterative Self-Calibration in Diffusion Language Models

Author: Qafind Labs

Date: March 15, 2025

Table of Contents

  Abstract
  1. Introduction
  2. Background
  3. Methodology
  4. Experimental Results
  5. Discussion
  6. Conclusion
  References

Abstract

This report examines the evolution of the DeepThink and Deep Rethink paradigms in diffusion language models. We introduce an explicit iterative self-calibration framework and evaluate its impact on solution accuracy, convergence properties, and user trust.

1. Introduction

Diffusion-of-Thought (DoT) has demonstrated the feasibility of integrating Chain-of-Thought reasoning into diffusion models. Building on DoT’s self-correction and multi-pass sampling, we propose Deep Rethink, a structured paradigm for iterative posterior calibration.

2. Background

3. Methodology

  1. Iterative Self-Calibration: Define separate policies for Think and Rethink stages with distinct reward signals.
  2. Dynamic Early Exit: Allow adaptive termination of Rethink loops based on confidence thresholds.
  3. Multi-Pass Sampling: Leverage DoTMP, the multi-pass variant of DoT, to generate and correct one rationale step at a time.
  4. Self-Consistency Aggregation: Conduct multiple runs and vote on final answers to enhance robustness.
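
The control flow implied by steps 1, 2, and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's implementation: the names `think`, `rethink`, `threshold`, `max_loops`, and `runs` are hypothetical, each policy is assumed to return an answer together with a confidence score in [0, 1], and the reward signals and the DoTMP sampler itself are left out.

```python
from collections import Counter
from typing import Callable, Tuple

# Hypothetical interfaces (assumptions, not defined in the report):
#   think(question)                      -> (answer, confidence)
#   rethink(question, answer, confidence) -> (revised answer, confidence)
ThinkFn = Callable[[str], Tuple[str, float]]
RethinkFn = Callable[[str, str, float], Tuple[str, float]]


def rethink_with_early_exit(
    question: str,
    think: ThinkFn,
    rethink: RethinkFn,
    threshold: float = 0.9,  # assumed confidence cut-off
    max_loops: int = 4,      # assumed Rethink budget
) -> Tuple[str, float, int]:
    """Run one Think pass, then Rethink until confident or out of budget."""
    answer, conf = think(question)
    loops = 0
    # Dynamic early exit: stop as soon as confidence reaches the threshold.
    while conf < threshold and loops < max_loops:
        answer, conf = rethink(question, answer, conf)
        loops += 1
    return answer, conf, loops


def self_consistent_answer(
    question: str,
    think: ThinkFn,
    rethink: RethinkFn,
    runs: int = 5,  # assumed number of independent runs
    **loop_kwargs,
) -> str:
    """Repeat the calibrated run several times and return the majority answer."""
    answers = [
        rethink_with_early_exit(question, think, rethink, **loop_kwargs)[0]
        for _ in range(runs)
    ]
    # Ties are broken by first occurrence, which is Counter's insertion order.
    return Counter(answers).most_common(1)[0][0]
```

In practice `think` and `rethink` would wrap stochastic samplers, so the majority vote only adds information when individual runs can disagree. The loop count returned by `rethink_with_early_exit` gives one simple way to track how often the early exit triggers.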

4. Experimental Results

Key findings on GSM8K and multiplication tasks:

Recent experiments on Countdown and Sudoku tasks:

5. Discussion

Deep Rethink formalizes DoT’s implicit posterior correction into explicit policy optimization. This yields:

6. Conclusion

DoT’s core capabilities align with Deep Rethink’s goal of iterative posterior calibration. By introducing explicit Rethink policies and adaptive loops, we can elevate trustworthiness, precision, and efficiency across diverse reasoning tasks.

References

  1. Ye, J. et al. "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models." arXiv:2402.07754v3, 2024.
  2. Qafind Labs. "DeepThink & Deep Rethink Paradigms: A Comparative Study." Internal Tech Report, 2025.