Draft
22 commits
8e7c955
Add ONNX Runtime backend with configurable execution providers
ChinChangYang Feb 28, 2026
44f1681
Support humanv0 model with SGF metadata encoder in ONNX backend
ChinChangYang Mar 1, 2026
9884cc9
Simplify ONNX backend: use MatMul, add null guard, hoist variable
ChinChangYang Mar 1, 2026
7abd30f
Support loading raw .onnx model files in ONNX backend
ChinChangYang Mar 1, 2026
4234237
Fix tellg() failure check and avoid large model copy in ONNX backend
ChinChangYang Mar 1, 2026
c20d1a8
Fix thread_local misuse and dead code in ONNX backend
ChinChangYang Mar 1, 2026
4b2d37c
Improve ONNX backend robustness and defaults
ChinChangYang Mar 1, 2026
fedb326
Fix fragile output node name lookup in ONNX model builder
ChinChangYang Mar 2, 2026
9b14d01
Fix inconsistent buffer sizing and duplicated config read in ONNX bac…
ChinChangYang Mar 2, 2026
512c6e3
Add -export-onnx flag to export PyTorch checkpoints as ONNX models
ChinChangYang Mar 2, 2026
59c1cfa
Fix empty tensor fallback and hardcoded export dimensions in ONNX bac…
ChinChangYang Mar 2, 2026
9ab8cd3
Fix ONNX export hardcoded dimensions and silent missing output nodes
ChinChangYang Mar 2, 2026
a8a54b8
Fix ownership assertion and add ONNX integration test script
ChinChangYang Mar 3, 2026
6800366
Fix missing commas in model name list and improve ONNX backend docs
ChinChangYang Mar 3, 2026
d0e1e0b
Remove CLAUDE.md
ChinChangYang Mar 4, 2026
7e44e9f
Add ONNX backend documentation across all user-facing docs
ChinChangYang Mar 4, 2026
ef643ed
Add ONNX Runtime license attribution and defensive checks
ChinChangYang Mar 4, 2026
4b00169
Add CUDA and TensorRT execution providers for ONNX backend
ChinChangYang Mar 4, 2026
93810b9
Fix ONNX include to use onnx_pb.h for proper ONNX_API macro definition
claude Mar 24, 2026
21fbc42
Merge pull request #18 from ChinChangYang/claude/fix-onnx-api-macro-V…
ChinChangYang Mar 24, 2026
c4ee034
Use system-installed onnxruntime for ONNX backend
ChinChangYang Mar 23, 2026
f4208c3
Add ONNX_ML compile definition for cross-platform onnx_pb.h compatibi…
ChinChangYang Mar 24, 2026
48 changes: 48 additions & 0 deletions Compiling.md
@@ -151,3 +151,51 @@ As also mentioned in the instructions below but repeated here for visibility, if
* Pre-trained neural nets are available at [the main training website](https://katagotraining.org/).
* You will probably want to edit `configs/gtp_example.cfg` (see "Tuning for Performance" above).
* If using OpenCL, you will want to verify that KataGo is picking up the correct device when you run it (e.g. some systems may have both an Intel CPU OpenCL and GPU OpenCL, if KataGo appears to pick the wrong one, you can correct this by specifying `openclGpuToUse` in `configs/gtp_example.cfg`).

## ONNX Runtime Backend
The ONNX backend uses [ONNX Runtime](https://onnxruntime.ai/) for neural net inference. It supports both standard `.bin.gz` model files (building the ONNX graph internally from the model weights) and raw `.onnx` model files. On macOS, it can use the CoreML execution provider for hardware-accelerated inference on Apple Silicon. On Windows/Linux, it can use the CUDA or TensorRT execution providers for NVIDIA GPU acceleration.

* Requirements
* ONNX Runtime built from source. See the [ONNX Runtime build instructions](https://onnxruntime.ai/docs/build/).
* On macOS with CoreML support, build ONNX Runtime with `--use_coreml`:
```
python3 tools/ci_build/build.py --build_dir build/MacOS \
--config RelWithDebInfo --build_shared_lib --parallel \
--compile_no_warning_as_error --skip_submodule_sync \
--cmake_generator Ninja --use_coreml
```
* On Windows/Linux with CUDA support, build ONNX Runtime with `--use_cuda`:
```
python3 tools/ci_build/build.py --build_dir build/Linux \
--config RelWithDebInfo --build_shared_lib --parallel \
--compile_no_warning_as_error --skip_submodule_sync \
--use_cuda --cudnn_home /usr/local/cuda --cuda_home /usr/local/cuda
```
* For TensorRT support, build with `--use_tensorrt` (also enables CUDA):
```
python3 tools/ci_build/build.py --build_dir build/Linux \
--config RelWithDebInfo --build_shared_lib --parallel \
--compile_no_warning_as_error --skip_submodule_sync \
--use_tensorrt --tensorrt_home /usr/local/TensorRT \
--use_cuda --cudnn_home /usr/local/cuda --cuda_home /usr/local/cuda
```
* zlib, libzip (same as other backends).
* Compile using CMake in the cpp directory:
* `cd KataGo/cpp`
* ```
cmake . -DUSE_BACKEND=ONNX \
-DONNXRUNTIME_ROOT=/path/to/onnxruntime \
-DONNXRUNTIME_BUILD_DIR=/path/to/onnxruntime/build/MacOS/RelWithDebInfo
```
* `ONNXRUNTIME_ROOT` - path to the ONNX Runtime source/install root directory.
* `ONNXRUNTIME_BUILD_DIR` - path to the ONNX Runtime build output directory (e.g. `build/MacOS/RelWithDebInfo`).
* CoreML support is automatically enabled when building on Apple platforms.
* `make`
* Done! You should now have a compiled `katago` executable in your working directory.
* Pre-trained neural nets are available at [the main training website](https://katagotraining.org/).
* You will probably want to edit `configs/gtp_example.cfg` (see "Tuning for Performance" above).
* Using raw `.onnx` model files:
* You can pass an `.onnx` file directly as the `-model` argument instead of a `.bin.gz` file.
* Input/output node names and model version are auto-detected. If auto-detection fails, override them via config keys documented in the ONNX section of `configs/gtp_example.cfg`.
* Selecting the execution provider:
* Set `onnxProvider = cpu` (default), `onnxProvider = coreml` (macOS only), `onnxProvider = cuda`, or `onnxProvider = tensorrt` in your config file.
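* Putting the pieces together, a minimal ONNX section of a GTP config might look like the following sketch. The values are illustrative: `cpu` is the default provider, and the commented-out overrides are only needed when auto-detection fails on a raw `.onnx` file.
    ```
    # Illustrative ONNX backend settings for a gtp config
    onnxProvider = coreml            # cpu (default), coreml, cuda, or tensorrt
    # Only needed if node-name auto-detection fails for a raw .onnx model:
    # onnxInputSpatial = input_spatial
    # onnxOutputPolicy = out_policy
    # onnxModelVersion = 15
    ```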
6 changes: 6 additions & 0 deletions LICENSE
@@ -5,6 +5,12 @@ and/or files, see the individual readmes and/or license files for each one within
subdirectories within cpp/external. Additionally, cpp/core/sha2.cpp derives from another piece of
external code and embeds its own license within that file.

When built with the ONNX backend, this software links dynamically against ONNX Runtime (MIT License,
https://github.com/microsoft/onnxruntime) and its transitive dependencies including ONNX (MIT License,
https://github.com/onnx/onnx) and Protocol Buffers (BSD 3-Clause License,
https://github.com/protocolbuffers/protobuf). These libraries are not distributed with this
repository; see their respective repositories for license details.

Aside from the above, the license for all OTHER content in this repo is as follows:

----------------------------------------
8 changes: 5 additions & 3 deletions README.md
@@ -7,7 +7,7 @@
* [GUIs](#guis)
* [Windows and Linux](#windows-and-linux)
* [MacOS](#macos)
* [OpenCL vs CUDA vs TensorRT vs Eigen](#opencl-vs-cuda-vs-tensorrt-vs-eigen)
* [OpenCL vs CUDA vs TensorRT vs Eigen vs ONNX](#opencl-vs-cuda-vs-tensorrt-vs-eigen-vs-onnx)
* [How To Use](#how-to-use)
* [Tuning for Performance](#tuning-for-performance)
* [Common Questions and Issues](#common-questions-and-issues)
@@ -84,21 +84,23 @@ The community also provides KataGo packages for [Homebrew](https://brew.sh) on MacOS

Use `brew install katago`. The latest config files and networks are installed in KataGo's `share` directory. Find them via `brew list --verbose katago`. A basic way to run katago will be `katago gtp -config $(brew list --verbose katago | grep 'gtp.*\.cfg') -model $(brew list --verbose katago | grep .gz | head -1)`. You should choose the Network according to the release notes here and customize the provided example config as with every other way of installing KataGo.

### OpenCL vs CUDA vs TensorRT vs Eigen
KataGo has four backends, OpenCL (GPU), CUDA (GPU), TensorRT (GPU), and Eigen (CPU).
### OpenCL vs CUDA vs TensorRT vs Eigen vs ONNX
KataGo has five backends, OpenCL (GPU), CUDA (GPU), TensorRT (GPU), Eigen (CPU), and ONNX (CPU/GPU).

The quick summary is:
* **To easily get something working, try OpenCL if you have any good or decent GPU.**
* **For often much better performance on NVIDIA GPUs, try TensorRT**, but you may need to install TensorRT from Nvidia.
* Use Eigen with AVX2 if you don't have a GPU or if your GPU is too old/weak to work with OpenCL, and you just want a plain CPU KataGo.
* Use Eigen without AVX2 if your CPU is old or on a low-end device that doesn't support AVX2.
* The CUDA backend can work for NVIDIA GPUs with CUDA+CUDNN installed but is likely worse than TensorRT.
* Use ONNX if you want to load raw `.onnx` model files, use CoreML on Apple Silicon, or use CUDA/TensorRT via ONNX Runtime on Windows/Linux.

More in detail:
* OpenCL is a general GPU backend that should be able to run with any GPUs or accelerators that support [OpenCL](https://en.wikipedia.org/wiki/OpenCL), including NVIDIA GPUs, AMD GPUs, as well as CPU-based OpenCL implementations or things like Intel Integrated Graphics. This is the most general GPU version of KataGo and doesn't require a complicated install like CUDA does, so it is the most likely to work out of the box as long as you have a fairly modern GPU. **However, it also needs to take some time to tune itself when run for the very first time.** For many systems this will take 5-30 seconds, but on a few older/slower systems it may take many minutes or longer. Also, the quality of OpenCL implementations is sometimes inconsistent, particularly for Intel Integrated Graphics and for AMD GPUs that are more than several years old, so it might not work on very old machines or on specific buggy newer AMD GPUs; see also [Issues with specific GPUs or GPU drivers](#issues-with-specific-gpus-or-gpu-drivers).
* CUDA is a GPU backend specific to NVIDIA GPUs (it will not work with AMD or Intel or any other GPUs) and requires installing [CUDA](https://developer.nvidia.com/cuda-zone) and [CUDNN](https://developer.nvidia.com/cudnn) and a modern NVIDIA GPU. On most GPUs, the OpenCL implementation will actually beat NVIDIA's own CUDA/CUDNN at performance. The exception is for top-end NVIDIA GPUs that support FP16 and tensor cores, in which case sometimes one is better and sometimes the other is better.
* TensorRT is similar to CUDA, but uses NVIDIA's TensorRT framework to run the neural network with more heavily optimized kernels. For modern NVIDIA GPUs, it should work whenever CUDA does and will usually be faster than CUDA or any other backend.
* Eigen is a *CPU* backend that should work widely *without* needing a GPU or fancy drivers. Use this if you don't have a good GPU or really any GPU at all. It will be quite significantly slower than OpenCL or CUDA, but on a good CPU can still often get 10 to 20 playouts per second if using the smaller (15 or 20) block neural nets. Eigen can also be compiled with AVX2 and FMA support, which can provide a big performance boost for Intel and AMD CPUs from the last few years. However, it will not run at all on older CPUs (and possibly even some recent but low-power modern CPUs) that don't support these fancy vector instructions.
* ONNX is a backend that uses [ONNX Runtime](https://onnxruntime.ai/) for inference. It can load both standard `.bin.gz` model files and raw `.onnx` model files directly. It supports CPU inference out of the box, CoreML on macOS for Apple Silicon hardware acceleration, and CUDA/TensorRT execution providers for NVIDIA GPUs on Windows/Linux. Requires building ONNX Runtime from source as a prerequisite. See [Compiling KataGo](Compiling.md) for details.

For **any** implementation, it's recommended that you also tune the number of threads used if you care about optimal performance, as it can make a factor of 2-3 difference in the speed. See "Tuning for Performance" below. However, if you mostly just want to get it working, then the default untuned settings should also be still reasonable.

45 changes: 43 additions & 2 deletions cpp/CMakeLists.txt
@@ -32,7 +32,7 @@ endif()
set(BUILD_DISTRIBUTED 0 CACHE BOOL "Build with http support for contributing to distributed training")
set(USE_BACKEND CACHE STRING "Neural net backend")
string(TOUPPER "${USE_BACKEND}" USE_BACKEND)
set_property(CACHE USE_BACKEND PROPERTY STRINGS "" CUDA TENSORRT OPENCL EIGEN)
set_property(CACHE USE_BACKEND PROPERTY STRINGS "" CUDA TENSORRT OPENCL EIGEN ONNX)

set(USE_TCMALLOC 0 CACHE BOOL "Use TCMalloc")
set(NO_GIT_REVISION 0 CACHE BOOL "Disable embedding the git revision into the compiled exe")
@@ -145,8 +145,14 @@ elseif(USE_BACKEND STREQUAL "EIGEN")
set(NEURALNET_BACKEND_SOURCES
neuralnet/eigenbackend.cpp
)
elseif(USE_BACKEND STREQUAL "ONNX")
message(STATUS "-DUSE_BACKEND=ONNX, using ONNX Runtime backend (loads .bin.gz natively).")
set(NEURALNET_BACKEND_SOURCES
neuralnet/onnxbackend.cpp
neuralnet/onnxmodelbuilder.cpp
)
elseif(USE_BACKEND STREQUAL "")
message(WARNING "${ColorBoldRed}WARNING: Using dummy neural net backend, intended for non-neural-net testing only, will fail on any code path requiring a neural net. To use neural net, specify -DUSE_BACKEND=CUDA or -DUSE_BACKEND=TENSORRT or -DUSE_BACKEND=OPENCL or -DUSE_BACKEND=EIGEN to compile with the respective backend.${ColorReset}")
message(WARNING "${ColorBoldRed}WARNING: Using dummy neural net backend, intended for non-neural-net testing only, will fail on any code path requiring a neural net. To use neural net, specify -DUSE_BACKEND=CUDA or -DUSE_BACKEND=TENSORRT or -DUSE_BACKEND=OPENCL or -DUSE_BACKEND=EIGEN or -DUSE_BACKEND=ONNX to compile with the respective backend.${ColorReset}")
set(NEURALNET_BACKEND_SOURCES neuralnet/dummybackend.cpp)
else()
message(FATAL_ERROR "Unrecognized backend: " ${USE_BACKEND})
@@ -449,6 +455,41 @@ elseif(USE_BACKEND STREQUAL "EIGEN")
endif()
endif()
endif()
elseif(USE_BACKEND STREQUAL "ONNX")
target_compile_definitions(katago PRIVATE USE_ONNX_BACKEND)
find_path(ONNXRUNTIME_INCLUDE_DIR onnxruntime_cxx_api.h
HINTS /opt/homebrew/opt/onnxruntime/include/onnxruntime
/opt/homebrew/include/onnxruntime /usr/local/include/onnxruntime
)
if(NOT ONNXRUNTIME_INCLUDE_DIR)
message(FATAL_ERROR "Could not find onnxruntime headers. Install via: brew install onnxruntime")
endif()
target_include_directories(katago SYSTEM PRIVATE "${ONNXRUNTIME_INCLUDE_DIR}")
find_library(ONNXRUNTIME_LIB onnxruntime
HINTS /opt/homebrew/opt/onnxruntime/lib /opt/homebrew/lib /usr/local/lib
)
if(NOT ONNXRUNTIME_LIB)
message(FATAL_ERROR "Could not find libonnxruntime. Install via: brew install onnxruntime")
endif()
find_path(ONNX_INCLUDE_DIR onnx/onnx-ml.pb.h
HINTS /opt/homebrew/opt/onnx/include /opt/homebrew/include /usr/local/include
)
if(NOT ONNX_INCLUDE_DIR)
message(FATAL_ERROR "Could not find onnx headers. Install via: brew install onnx")
endif()
target_include_directories(katago PRIVATE "${ONNX_INCLUDE_DIR}")
target_compile_definitions(katago PRIVATE ONNX_ML)
find_library(ONNX_PROTO_LIB onnx_proto
HINTS /opt/homebrew/opt/onnx/lib /opt/homebrew/lib /usr/local/lib
)
if(NOT ONNX_PROTO_LIB)
message(FATAL_ERROR "Could not find libonnx_proto. Install via: brew install onnx")
endif()
find_package(PkgConfig REQUIRED)
pkg_check_modules(PROTOBUF REQUIRED protobuf)
target_include_directories(katago PRIVATE ${PROTOBUF_INCLUDE_DIRS})
target_link_directories(katago PRIVATE ${PROTOBUF_LIBRARY_DIRS})
target_link_libraries(katago ${ONNXRUNTIME_LIB} ${ONNX_PROTO_LIB} ${PROTOBUF_LIBRARIES})
endif()

if(USE_BIGGER_BOARDS_EXPENSIVE)
5 changes: 3 additions & 2 deletions cpp/README.md
@@ -9,13 +9,14 @@ Summary of source folders, in approximate dependency order, from lowest level to
* `board.{cpp,h}` - Raw board implementation, without move history. Helper functions for Benson's algorithm and ladder search.
* `boardhistory.{cpp,h}` - Datastructure that does include move history - handles superko, passing, game end, final scoring, komi, handicap detection, etc.
* `graphhash.{cpp,h}` - History-sensitive hash used for [monte-carlo graph search](https://github.com/lightvector/KataGo/blob/master/docs/GraphSearch.md).
* `neuralnet` - Neural net GPU implementation and interface. Contains OpenCL, CUDA, Eigen, TensorRT backends along with common interfaces and model data structures.
* `neuralnet` - Neural net GPU implementation and interface. Contains OpenCL, CUDA, Eigen, TensorRT, and ONNX backends along with common interfaces and model data structures.
* `desc.{cpp,h}` - Data structure holding neural net structure and weights.
* `modelversion.{cpp,h}` - Enumerates the various versions of neural net features and models.
* `nninputs.{cpp,h}` - Implements the input features for the neural net.
* `sgfmetadata.{cpp,h}` - Implements the input features for the [HumanSL neural net](https://github.com/lightvector/KataGo/blob/master/docs/Analysis_Engine.md#human-sl-analysis-guide), for conditioning on various SGF metadata about human players from training data.
* `nninterface.h` - Common interface that is implemented by every low-level neural net backend.
* `{cuda,opencl,eigen,trt,dummy}backend.cpp` - Various backends.
* `{cuda,opencl,eigen,trt,onnx,dummy}backend.cpp` - Various backends.
* `onnxmodelbuilder.{cpp,h}` - Builds ONNX graphs from KataGo model weights for the ONNX backend.
* `nneval.{cpp,h}` - Top-level handle to the neural net used by the rest of the engine, implements thread-safe batching of queries.
* `search` - The main search engine.
* `timecontrols.cpp` - Basic handling of a few possible time controls.
29 changes: 29 additions & 0 deletions cpp/configs/gtp_example.cfg
@@ -517,6 +517,35 @@ searchFactorWhenWinningThreshold = 0.95
# Default: numSearchThreads
# numEigenThreadsPerModel = X

# ------------------------------
# ONNX backend settings
# ------------------------------
# These only apply when using the ONNX version of KataGo.

# Execution provider to use: "cpu" (default), "coreml" (macOS only),
# "cuda" (NVIDIA GPU), or "tensorrt" (NVIDIA GPU, optimized).
# CoreML uses Apple's Neural Engine and GPU for hardware-accelerated inference.
# CUDA and TensorRT require ONNX Runtime built with --use_cuda or --use_tensorrt.
# onnxProvider = cpu

# Override input/output node names for raw .onnx model files.
# When loading a raw .onnx file, KataGo auto-detects node names by searching
# for "spatial", "global", "meta", "policy", "value", "miscvalue", "ownership"
# in the model's node names. Use these settings to override if auto-detection
# picks the wrong nodes.
# onnxInputSpatial = input_spatial
# onnxInputGlobal = input_global
# onnxInputMeta = input_meta
# onnxOutputPolicy = out_policy
# onnxOutputValue = out_value
# onnxOutputMiscvalue = out_miscvalue
# onnxOutputOwnership = out_ownership

# Override the auto-detected model version for raw .onnx model files.
# Model version is normally auto-detected from channel counts. Set this
# to a specific version number (>= 0) if auto-detection picks the wrong one.
# onnxModelVersion = 15
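# The substring-based auto-detection described above can be sketched in Python.
# This is a hypothetical standalone helper mirroring the documented search; the
# real implementation lives in the C++ ONNX backend and may differ in details
# such as tie-breaking, which is exactly why these override settings exist.

```python
# Sketch of node-name auto-detection: for each role keyword, pick the first
# model node whose name contains it (case-insensitive). Note that "value" also
# occurs inside names like "out_miscvalue", so ordering of the candidate node
# names can matter -- one reason manual overrides are provided.
KEYWORDS = ["spatial", "global", "meta", "policy", "value", "miscvalue", "ownership"]

def detect_nodes(node_names):
    detected = {}
    for keyword in KEYWORDS:
        for name in node_names:
            if keyword in name.lower():
                detected[keyword] = name
                break
    return detected

# Example with node names from a hypothetical exported model:
names = ["input_spatial", "input_global", "out_policy", "out_value", "out_miscvalue", "out_ownership"]
print(detect_nodes(names))
```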

# ===========================================================================
# Root move selection and biases
# ===========================================================================
13 changes: 7 additions & 6 deletions cpp/dataio/loadmodel.cpp
@@ -20,30 +20,31 @@ std::time_t to_time_t(TP tp)
static const vector<string> ACCEPTABLE_MODEL_SUFFIXES {
".bin.gz",
".bin",
".onnx",
"model.txt.gz",
"model.txt"
};
static const vector<string> GENERIC_MODEL_NAMES {
"model.bin.gz",
"model.bin",
"model.txt.gz",
"model.txt"
"model.txt",
"Model.bin.gz",
"Model.bin",
"Model.txt.gz",
"Model.txt"
"Model.txt",
"MODEL.bin.gz",
"MODEL.bin",
"MODEL.txt.gz",
"MODEL.txt"
"MODEL.txt",
"model.ckpt",
"Model.ckpt"
"Model.ckpt",
"MODEL.ckpt",
"model.checkpoint",
"Model.checkpoint"
"Model.checkpoint",
"MODEL.checkpoint",
"model",
"Model"
"Model",
"MODEL",
};
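The missing commas fixed in this hunk matter because adjacent C++ string literals concatenate silently: `"model.txt" "Model.bin.gz"` would have fused into one bogus list entry rather than two names. The intended checks can be sketched in Python (hypothetical helper names and a condensed name list; the real logic is the C++ in `loadmodel.cpp`):

```python
# Sketch of the two lists above: a file is a usable model if it ends with an
# acceptable suffix, and a file name counts as "generic" if it matches one of
# the generic names. The condensed set below uses a case-insensitive compare
# in place of the explicit Model/MODEL capitalization variants in the C++ list.
ACCEPTABLE_MODEL_SUFFIXES = [".bin.gz", ".bin", ".onnx", "model.txt.gz", "model.txt"]
GENERIC_MODEL_NAMES = {
    "model.bin.gz", "model.bin", "model.txt.gz", "model.txt",
    "model.ckpt", "model.checkpoint", "model",
}

def is_acceptable_model_file(filename):
    return any(filename.endswith(sfx) for sfx in ACCEPTABLE_MODEL_SUFFIXES)

def is_generic_model_name(filename):
    return filename.lower() in GENERIC_MODEL_NAMES

print(is_acceptable_model_file("kata9x9.onnx"))   # raw ONNX models are accepted
```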

2 changes: 2 additions & 0 deletions cpp/main.cpp
@@ -248,6 +248,8 @@ string Version::getKataGoVersionFullInfo() {
out << "Using OpenCL backend" << endl;
#elif defined(USE_EIGEN_BACKEND)
out << "Using Eigen(CPU) backend" << endl;
#elif defined(USE_ONNX_BACKEND)
out << "Using ONNX backend" << endl;
#else
out << "Using dummy backend" << endl;
#endif