33: Stop Thinking So Hard
Show Notes
Large reasoning models have an overthinking problem. They reach the correct answer early in their chain of thought — then keep generating thousands of additional tokens reconsidering, double-checking, and exploring alternatives they'll ultimately discard. A new paper from researchers at UT Austin, EPFL, ENS Paris-Saclay, and Telecom Paris introduces TERMINATOR, an inference-time early-exit strategy that detects when a model has already generated its final answer and stops reasoning immediately.
The key insight is that the first arrival of a model's final answer in its chain of thought is detectable from hidden states. Token confidence spikes distinctly at the answer position. Thinking-word usage shifts — words like "hmm" and "okay" cluster before the answer; words like "another" and "alternatively" cluster after. These signals are real, consistent across math, coding, and science domains, and learnable by a small classifier.
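As a toy illustration of the thinking-word signal described above, the sketch below counts how often pre-answer cues ("hmm", "okay") land before a known answer position and post-answer cues ("another", "alternatively") land after it. The function name and cue lists are illustrative stand-ins, not the paper's actual features — TERMINATOR learns from hidden states, not surface word counts.

```python
# Cue words drawn from the paper's observation: some thinking words
# cluster before the answer's first arrival, others after it.
PRE_ANSWER_CUES = {"hmm", "okay"}
POST_ANSWER_CUES = {"another", "alternatively"}

def cue_signal(tokens, answer_pos):
    """Count cue words landing on the expected side of the answer
    position. A toy, surface-level stand-in for the kind of signal
    the paper reports is learnable from hidden states.

    tokens: list of token strings from a chain of thought
    answer_pos: index where the final answer first appears
    """
    pre = sum(1 for t in tokens[:answer_pos] if t.lower() in PRE_ANSWER_CUES)
    post = sum(1 for t in tokens[answer_pos:] if t.lower() in POST_ANSWER_CUES)
    return pre, post
```

On a trace like `["hmm", "okay", "so", "42", "alternatively", "another"]` with the answer at index 3, both counts come out high on their expected sides, mirroring the clustering the paper reports.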
TERMINATOR is a single transformer layer — initialized from the base model's final layer — with a binary prediction head trained to predict answer arrival at every token position. At inference time, a sliding window over the ten most recent predictions triggers a stop when a majority vote says the answer is already there, injecting a close-thinking token into the token stream. No data-calibrated thresholds. No samples from the test-time distribution. Train once, deploy anywhere.
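The stopping rule above can be sketched in a few lines. This is a minimal reconstruction from the description in the show notes, not the paper's implementation: `early_exit_stream` and its inputs are hypothetical names, and the per-token 0/1 predictions stand in for the classifier head's outputs.

```python
from collections import deque

WINDOW = 10  # sliding window of the most recent per-token predictions

def early_exit_stream(predictions, window=WINDOW):
    """Scan per-token binary 'answer has arrived' predictions and
    return the index at which a majority vote over the sliding window
    first triggers a stop, or None if it never triggers.

    predictions: iterable of 0/1 outputs, one per generated token
    (a stand-in for the classifier head's per-position predictions).
    """
    recent = deque(maxlen=window)
    for i, pred in enumerate(predictions):
        recent.append(pred)
        # Vote only once the window is full; stop when a strict
        # majority says the final answer is already present.
        if len(recent) == window and sum(recent) > window // 2:
            return i  # the caller would inject a close-thinking token here
    return None
```

The majority vote smooths over single-token noise: one spurious positive prediction can't end reasoning on its own, but a sustained run of positives stops it within a few tokens.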
Results
Tested on Qwen3-8B, Qwen3-14B, Ministral-3-8B-Reasoning, and Ministral-3-14B-Reasoning across MATH-500, AIME 2025, HumanEval, and GPQA:
- Best or second-best on 28 out of 32 metrics (accuracy + compression rate)
- MATH-500: ~45% token reduction, accuracy drop under 0.5 percentage points
- AIME 2025: ~30% reduction; TERMINATOR exits too early on hard problems — documented failure mode
- Consistently occupies the best accuracy-efficiency Pareto frontier position versus DEER, Dynasor, Thought Calibration, and NoThinking
Links
- Paper: arXiv:2603.12529 — TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
- Authors: Alliot Nagle (UT Austin), Jakhongir Saydaliev (EPFL), Dhia Garbaya (EPFL / ENS Paris-Saclay), Michael Gastpar (EPFL), Ashok Vardhan Makkuva (Telecom Paris / IP Paris), Hyeji Kim (UT Austin)
Related Work Mentioned
- DEER — chunk-based early exit via token probability thresholds
- Dynasor — periodic intermediate answer consistency checks
- Thought Calibration — linear probes on reasoning step hidden states
- Self-Certainty / Kang et al. — KL divergence confidence metric for reasoning
- DeepSeek-R1 — large reasoning model showing overthinking phenomenon
- Qwen3 — base models used in experiments
- vLLM — inference framework used for dataset curation
Datasets
- MATH — Lightman et al., mathematical problem solving
- AIME 2025 — American Invitational Mathematics Examination
- HumanEval — Chen et al., Python code generation
- GPQA — Rein et al., graduate-level science questions
- OpenScience — NVIDIA, scientific research dataset
- OpenCoder-SFT — Huang et al., code instruction fine-tuning
DTF:FTL is produced by PDX Hackerspace Foundation. Find us on Apple Podcasts, Spotify, or wherever fine podcasts are distributed.