
Add QRS-Tune PUCT hyperparameter tuning and match statistics #1178

Draft
ChinChangYang wants to merge 15 commits into lightvector:master from ChinChangYang:claude/katago-puct-tuning-guide-OXzst

Contributor

ChinChangYang commented Apr 1, 2026

Summary

Adds a tune-params subcommand for automated PUCT hyperparameter tuning using QRS (Quadratic Response Surface) optimization. The optimizer runs sequential head-to-head matches between a base bot and an experiment bot, fitting a quadratic logistic regression model to propose better parameter combinations over time.

Three PUCT parameters are tuned: cpuctExploration, cpuctExplorationLog, and cpuctUtilityStdevPrior.

Changes

New: tune-params subcommand (cpp/command/tuneparams.cpp)

  • Runs head-to-head matches between a fixed base bot (bot0) and an experiment bot (bot1)
  • After each game, feeds the outcome into QRSTuner::addResult()
  • Proposes the next parameter point via QRSTuner::nextSample() (MAP optimum + decaying Gaussian noise)
  • Logs a progress report every 100 trials
  • Prints best-found values and ASCII regression curves per dimension at the end
  • Graceful SIGINT/SIGTERM shutdown (matches pattern in match.cpp)
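
The proposal step described above (MAP optimum plus decaying Gaussian noise) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `sigmaAtTrial` and `proposeSample` are hypothetical names, and clamping to a normalized `[-1, 1]` search box is an assumption about how coordinates are represented.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

//Hypothetical helper: noise scale interpolated linearly from sigmaInit (trial 0)
//to sigmaFin (last trial).
inline double sigmaAtTrial(int trial, int numTrials, double sigmaInit, double sigmaFin) {
  assert(numTrials > 0);
  double t = static_cast<double>(trial) / numTrials;
  t = std::max(0.0, std::min(1.0, t));
  return sigmaInit + (sigmaFin - sigmaInit) * t;
}

//Hypothetical helper: perturb the current MAP optimum with Gaussian noise,
//then clamp each coordinate to the assumed normalized range [-1, 1].
inline std::vector<double> proposeSample(
  const std::vector<double>& mapOptimum, double sigma, std::mt19937_64& rng
) {
  std::normal_distribution<double> noise(0.0, 1.0);
  std::vector<double> x(mapOptimum.size());
  for(size_t i = 0; i < x.size(); i++) {
    double v = mapOptimum[i] + sigma * noise(rng);
    x[i] = std::max(-1.0, std::min(1.0, v));
  }
  return x;
}
```

Early trials therefore explore widely around the fitted optimum, while late trials concentrate near it as the noise scale approaches its final value.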

New: cpp/qrstune/QRSOptimizer.h + QRSOptimizer.cpp

  • QRSModel — Quadratic logistic regression with L2 regularization; Newton-Raphson MAP estimation; feature map is [1, x_i, x_i², x_i·x_j]
  • QRSBuffer — Sample storage with confidence-based pruning: drops samples whose predicted win rate is more than prune_margin below the current MAP best, keeping at least min_keep highest-quality samples
  • QRSTuner — Top-level interface: proposes next sample, records results, refits model periodically, prunes, decays exploration noise linearly from sigma_init to sigma_fin
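
As an illustration of the feature map `[1, x_i, x_i², x_i·x_j]`, the sketch below builds the quadratic feature vector and evaluates a logistic prediction from it. The function names echo those listed in the PR's tests, but the signatures here are assumptions, the ordering of cross terms for more than two dimensions is a guess, and `predictWinRate` is a hypothetical helper that omits the sigmoid clamping the PR mentions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

//Feature count for D dimensions: 1 bias + D linear + D squared + D*(D-1)/2 cross terms.
inline int numFeatures(int D) {
  assert(D >= 0);
  return 1 + 2 * D + D * (D - 1) / 2;
}

//For D=2 with x = (a, b) this yields [1, a, b, a^2, b^2, a*b].
inline std::vector<double> computeFeatures(const std::vector<double>& x) {
  const size_t D = x.size();
  std::vector<double> phi;
  phi.reserve(numFeatures(static_cast<int>(D)));
  phi.push_back(1.0);
  for(size_t i = 0; i < D; i++) phi.push_back(x[i]);
  for(size_t i = 0; i < D; i++) phi.push_back(x[i] * x[i]);
  for(size_t i = 0; i < D; i++)
    for(size_t j = i + 1; j < D; j++)
      phi.push_back(x[i] * x[j]);
  return phi;
}

//Hypothetical helper: predicted win rate = sigmoid(weights . features).
inline double predictWinRate(const std::vector<double>& weights, const std::vector<double>& x) {
  std::vector<double> phi = computeFeatures(x);
  assert(weights.size() == phi.size());
  double logit = 0.0;
  for(size_t k = 0; k < phi.size(); k++) logit += weights[k] * phi[k];
  return 1.0 / (1.0 + std::exp(-logit));
}
```

With the three PUCT parameters tuned here, this layout gives 10 coefficients to fit.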

Enhanced: match command statistics (cpp/command/match.cpp)

  • Tracks pairwise W/L/D counts during the match loop
  • Prints Bradley-Terry MLE Elo (Newton-Raphson solver with convergence warning)
  • Prints a Wilson 95% confidence interval on the win rate for each pairing
  • Prints a one-tailed p-value for whether the experiment bot's win rate exceeds 50%
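
For readers unfamiliar with these statistics, here is a rough sketch of a Wilson 95% interval and a one-tailed test against a 50% win rate. It is not the PR's implementation: the function names are hypothetical, the p-value uses a normal approximation to the binomial (the PR does not say which test it uses), and draws are assumed to have been folded into `wins` and `n` by the caller.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

//Wilson score interval at roughly 95% confidence (z = 1.96). Returns {lower, upper}.
//For n <= 0 this sketch returns the uninformative interval [0, 1] (an assumption).
inline std::pair<double, double> wilsonInterval95(double wins, double n) {
  if(n <= 0.0) return {0.0, 1.0};
  assert(wins >= 0.0 && wins <= n);
  const double z = 1.96;
  const double p = wins / n;
  const double z2n = z * z / n;
  const double denom = 1.0 + z2n;
  const double center = (p + z2n / 2.0) / denom;
  const double half = z * std::sqrt(p * (1.0 - p) / n + z2n / (4.0 * n)) / denom;
  return {center - half, center + half};
}

//One-tailed p-value for H1: true win rate > 0.5, using the normal approximation
//to the binomial under H0: win rate = 0.5.
inline double pValueWinRateAbove50(double wins, double n) {
  if(n <= 0.0) return 1.0;
  const double z = (wins - 0.5 * n) / std::sqrt(0.25 * n);
  return 0.5 * std::erfc(z / std::sqrt(2.0));
}
```

For example, 60 wins in 100 games gives z = 2 and a one-tailed p-value of about 0.023 under this approximation.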

New: cpp/configs/tune_params_example.cfg

Example config (500 trials, numGameThreads = 8, maxVisits = 500) with comments explaining every tuning-specific key and the default search ranges.

Documentation

  • README.md — Added command example for tune-params
  • cpp/README.md — Added qrstune/ to source folder summary and tuneparams.cpp to command list

Files changed

  • cpp/command/tuneparams.cpp — New subcommand
  • cpp/qrstune/QRSOptimizer.h + QRSOptimizer.cpp — QRS optimizer (model, buffer, tuner)
  • cpp/command/match.cpp — Bradley-Terry Elo and match statistics
  • cpp/configs/tune_params_example.cfg — Example config
  • cpp/CMakeLists.txt — Build integration
  • README.md, cpp/README.md — Documentation updates

claude and others added 15 commits March 31, 2026 13:31
Introduce tune-params subcommand for sequential optimization of KataGo
PUCT parameters (cpuctExploration, cpuctExplorationLog,
cpuctUtilityStdevPrior) using QRS-Tune, a quadratic response surface
optimizer with logistic regression and confidence-based pruning.

Add match statistics output with Bradley-Terry Elo ratings, Wilson
confidence intervals, and pairwise win/loss/draw summaries.

New files:
- cpp/qrstune/QRSOptimizer.h: header-only QRS-Tune optimizer library
- cpp/command/tuneparams.cpp: tune-params subcommand implementation

Modified files:
- cpp/CMakeLists.txt: add tuneparams.cpp to build
- cpp/main.h, cpp/main.cpp: register tune-params subcommand
- cpp/command/match.cpp: add Elo/CI/p-value statistics after matches

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
The root-level build/ directory is used for out-of-source CMake builds.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
- Add missing NeuralNet::globalCleanup() call before ScoreValue::freeTables()
  to properly clean up neural net backend state on exit
- Hoist bestWinRate computation out of per-dimension loop in
  printRegressionCurves() (value is invariant across dimensions)
- Remove unnecessary step-number comments that restated the code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename ALL_CAPS constants to camelCase (nDims, paramNames, plotW,
  plotH, qrsDefaultMins/Maxs, rangeMinKeys/MaxKeys, eloPerStrength)
- Change nullptr to NULL to match KataGo's dominant convention
- Change "// Comment" to "//Comment" (no space after //)
- Change "// --- Section ---" separators to "//Section" style
- Leave QRSOptimizer.h unchanged (standalone library, own namespace)

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
…stency

Add graceful SIGINT/SIGTERM shutdown to tuneparams matching the pattern
used by match.cpp and other long-running commands. Fix QRSBuffer::prune
to retain highest-quality samples rather than oldest insertion-order ones
when applying min_keep. Add missing inline on gaussianSolve in header.
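
The intended pruning rule can be sketched as below, assuming each stored sample already has a predicted win rate from the current model. `selectKept` is a hypothetical name and the index-based interface is an assumption made for illustration; it is not the signature of QRSBuffer::prune.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

//Returns the indices of samples to keep, in original insertion order.
//A sample survives if its prediction is within pruneMargin of bestPrediction,
//or if it ranks among the minKeep highest predictions.
inline std::vector<size_t> selectKept(
  const std::vector<double>& predicted, double bestPrediction, double pruneMargin, size_t minKeep
) {
  assert(pruneMargin >= 0.0);
  std::vector<size_t> order(predicted.size());
  std::iota(order.begin(), order.end(), static_cast<size_t>(0));
  //Highest predicted win rate first; stable so ties preserve insertion order.
  std::stable_sort(order.begin(), order.end(),
    [&](size_t a, size_t b) { return predicted[a] > predicted[b]; });
  std::vector<size_t> kept;
  for(size_t rank = 0; rank < order.size(); rank++) {
    const size_t idx = order[rank];
    const bool withinMargin = predicted[idx] >= bestPrediction - pruneMargin;
    if(withinMargin || rank < minKeep) kept.push_back(idx);
  }
  std::sort(kept.begin(), kept.end());
  return kept;
}
```

Ranking by predicted quality before applying the minimum is what distinguishes this from keeping the first `minKeep` samples in insertion order.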

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add convergence detection to computeBradleyTerryElo in match.cpp so
that a warning is logged when the Newton-Raphson solver hits the 200
iteration limit without converging. Change QRSOptimizer.h free functions
from static inline to inline for correct weak external linkage in the
namespaced header-only library.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The header-only design violated KataGo's convention of separating
declarations (.h) from implementations (.cpp). Move all non-trivial
function bodies to QRSOptimizer.cpp, replace #pragma once with
#ifndef guard, trim header includes, and have predict() delegate
to score() to eliminate duplicated logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build directory moved under cpp/build which is already gitignored.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
Move wilsonCI95 and oneTailedPValue from static functions in match.cpp
to FancyMath namespace in core/fancymath.h/.cpp, following KataGo's
pattern of placing reusable math utilities in core namespaces.

Move computeBradleyTerryElo from static function in match.cpp to
ComputeElos namespace in core/elo.h/.cpp, alongside the existing
Elo computation utilities.

match.cpp now calls FancyMath::wilsonCI95(), FancyMath::oneTailedPValue(),
and ComputeElos::computeBradleyTerryElo() instead of file-local statics.

tuneparams.cpp static functions (qrsDimToReal, qrsToPUCT,
printRegressionCurves) are kept as file-local statics since they are
command-specific helpers, matching KataGo's pattern for command files.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
- Use existing ELO_PER_LOG_GAMMA constant instead of recomputing
- Hoist Newton-loop allocations in computeBradleyTerryElo (grad, H, aug, delta)
- Hoist Newton-loop allocations in QRSModel::fit (grad, negH)
- Remove dead sigReceived state in tuneparams.cpp
- Add n <= 0 guard to FancyMath::wilsonCI95 to prevent division by zero
- Use std::move in QRSBuffer::prune for kept sample vectors
- Fix cpp/README.md: qrstune is no longer header-only, fix algorithm name

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
Readability:
- Add file-level comment explaining the QRS-Tune algorithm
- Document feature layout with example (D=2: [1, a, b, a^2, b^2, a*b])
- Name magic numbers: SINGULAR_THRESHOLD, CONVERGENCE_THRESHOLD, SIGMOID_CLAMP
- Rename shadow variable 'f' to 'mult' in gaussianSolve
- Rename terse variables: z->logit, w->hessianWeight, resid->residual,
  maxd->maxStep, b_lin->linearCoeffs, b_quad->quadCoeffs, b_cross->crossCoeffs,
  p_best->bestPrediction, kv->entry, nx/ny->newXs/newYs
- Add phase comments in fit() documenting Newton-Raphson steps
- Add algorithm-level comment above fit() explaining the objective function

Tests (8 test cases):
- numFeatures: verify D=0,1,2,3
- computeFeatures: verify feature vector for D=2
- sigmoid: boundary, midpoint, and clamp behavior
- gaussianSolve: 2x2 system, 3x3 identity, singular detection
- QRSModel fit+predict: 1D and 2D separable data
- QRSModel mapOptimum: optimum better than anti-optimum
- QRSTuner end-to-end: 100 trials with deterministic seed
- QRSBuffer prune: verify buffer size reduction
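
To make the gaussianSolve cases above concrete, here is a hedged sketch of a dense linear solve with singular detection. The name `gaussianSolveSketch`, its signature, and the default threshold of 1e-12 are all assumptions; the PR's own gaussianSolve and SINGULAR_THRESHOLD may differ.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

//Solves A x = b by Gaussian elimination with partial pivoting, working on copies
//of A and b. Returns false if the best available pivot is below singularThreshold.
inline bool gaussianSolveSketch(
  std::vector<std::vector<double>> A, std::vector<double> b,
  std::vector<double>& x, double singularThreshold = 1e-12
) {
  const size_t n = b.size();
  assert(A.size() == n);
  for(size_t col = 0; col < n; col++) {
    //Pick the row with the largest magnitude in this column as the pivot.
    size_t pivot = col;
    for(size_t r = col + 1; r < n; r++)
      if(std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
    if(std::fabs(A[pivot][col]) < singularThreshold) return false;
    std::swap(A[col], A[pivot]);
    std::swap(b[col], b[pivot]);
    //Eliminate this column from all rows below the pivot.
    for(size_t r = col + 1; r < n; r++) {
      const double mult = A[r][col] / A[col][col];
      for(size_t c = col; c < n; c++) A[r][c] -= mult * A[col][c];
      b[r] -= mult * b[col];
    }
  }
  //Back substitution on the upper-triangular system.
  x.assign(n, 0.0);
  for(size_t i = n; i-- > 0;) {
    double sum = b[i];
    for(size_t c = i + 1; c < n; c++) sum -= A[i][c] * x[c];
    x[i] = sum / A[i][i];
  }
  return true;
}
```

A solver of this kind is what a Newton-Raphson step needs: each iteration solves a small linear system in the Hessian and gradient, and a singular result signals that the step cannot be taken.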

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895

2 participants