# Foundry Local C++ SDK

The Foundry Local C++ SDK provides a C++17 static library for running AI models locally via [Foundry Local](https://www.foundrylocal.ai/). Discover, download, load, and run inference entirely on your own machine — no cloud required.

> **Windows-only** — requires MSVC or clang-cl (an MSVC-compatible toolchain).

## Features

- **Model catalog** — browse and search all available models; filter by cached or loaded state
- **Lifecycle management** — download, load, unload, and remove models programmatically
- **Chat completions** — synchronous and streaming via OpenAI-compatible types
- **Audio transcription** — transcribe audio files with streaming support
- **Tool calling** — define tools and handle tool-call responses in chat completions
- **Download progress** — wire up a callback for real-time download percentage
- **Model variants** — select specific hardware/quantization variants per model alias
- **Optional web service** — start an OpenAI-compatible REST endpoint
- **Execution providers** — ensure EPs are downloaded and registered for hardware acceleration

## Prerequisites

| Requirement | Notes |
|---|---|
| **Foundry Local / AI Toolkit** | Install via `winget install Microsoft.AIToolkit` or the VS Code AI Toolkit extension |
| **CMake >= 3.20** | Ships with Visual Studio 2022 |
| **Ninja** | Ships with Visual Studio 2022 |
| **vcpkg** | Set the `VCPKG_ROOT` environment variable to your vcpkg installation |
| **MSVC** (or clang-cl) | Visual Studio 2022 Build Tools or the full IDE |

## Building from Source

### 0. Open an x64 developer environment

All commands below must run in a shell that has the x64 MSVC toolchain on the PATH. Choose **one** of the following:

| Method | How to open |
|---|---|
| **Developer Command Prompt** | Start Menu → *"x64 Native Tools Command Prompt for VS 2022"* |
| **Developer PowerShell** | Start Menu → *"Developer PowerShell for VS 2022"* |
| **Inside an existing cmd** | Run `"<VS_INSTALL>\VC\Auxiliary\Build\vcvars64.bat"`, where `<VS_INSTALL>` is your Visual Studio installation path (e.g. `C:\Program Files\Microsoft Visual Studio\2022\Enterprise`) |
| **VS Code terminal** | Open the project folder in VS Code with the CMake Tools extension; it configures the environment automatically |

Verify by running `cl.exe` — the banner should say **x64**.

### 1. Clone & navigate

```bash
git clone <repo-url>
cd foundry-local-sdk/sdk/cpp
```

### 2. Configure (CMake + vcpkg)

```bash
cmake --preset x64-debug
```

This uses the `x64-debug` preset, which:
- Uses the **Ninja** generator
- Resolves dependencies via **vcpkg** (`nlohmann-json`, `ms-gsl`, `wil`, `gtest`)
- Builds with the `x64-windows-static-md` triplet

### 3. Build

```bash
cmake --build --preset x64-debug
```

### Release build

```bash
cmake --preset x64-release
cmake --build --preset x64-release
ctest --preset x64-release
```

## Quick Start

```cpp
#include "foundry_local.h"
#include <iostream>
#include <vector>

using namespace foundry_local;

int main() {
    // 1. Create the manager
    Manager::Create({"MyApp"});
    auto& manager = Manager::Instance();

    // 2. Get a model from the catalog
    auto& catalog = manager.GetCatalog();
    auto* model = catalog.GetModel("phi-3.5-mini");
    if (!model) return 1;

    // 3. Download (if needed) and load
    model->Download();
    model->Load();

    // 4. Chat
    OpenAIChatClient chat(*model);
    std::vector<ChatMessage> messages = {{"user", "Hello!"}};
    ChatSettings settings;
    settings.max_tokens = 128;

    auto response = chat.CompleteChat(messages, settings);
    if (!response.choices.empty() && response.choices[0].message) {
        std::cout << response.choices[0].message->content << "\n";
    }

    // 5. Cleanup
    model->Unload();
    Manager::Destroy();
}
```

## Usage

### Initialization

`Manager` is a singleton. Call `Create` once at startup:

```cpp
Manager::Create(Configuration{"MyApp"}, &myLogger);
```

Access it anywhere afterward via `Manager::Instance()`. Check `Manager::IsInitialized()` to verify creation.

Call `Manager::Destroy()` to perform deterministic cleanup when done.

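The second argument to `Create` is an optional logger. Below is a minimal sketch of a custom logger; the `ILogger` method name and signature shown here are assumptions made for illustration (the real interface is declared in `include/logger.h`):

```cpp
#include "foundry_local.h"
#include <iostream>
#include <string>

using namespace foundry_local;

// Hypothetical ILogger implementation; the Log() name and signature are
// illustrative assumptions. See include/logger.h for the actual interface.
class ConsoleLogger : public ILogger {
public:
    void Log(LogLevel level, const std::string& message) override {
        // Forward SDK log lines to stderr with a numeric severity tag.
        std::cerr << "[foundry:" << static_cast<int>(level) << "] " << message << "\n";
    }
};

int main() {
    ConsoleLogger logger;
    Manager::Create(Configuration{"MyApp"}, &logger);
    // ... use the SDK ...
    Manager::Destroy();
}
```
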
### Catalog

The catalog lists all models known to the Foundry Local Core:

```cpp
auto& catalog = Manager::Instance().GetCatalog();

// List all available models
auto models = catalog.ListModels();
for (auto* m : models)
    std::cout << m->GetAlias() << " — " << m->GetId() << "\n";

// Get a specific model by alias
auto* model = catalog.GetModel("phi-3.5-mini");

// Get a specific variant by its unique model ID
auto* variant = catalog.GetModelVariant("phi-3.5-mini-generic-gpu-4");

// List models already downloaded to the local cache
auto cached = catalog.GetCachedModels();

// List models currently loaded in memory
auto loaded = catalog.GetLoadedModels();
```

### Model Lifecycle

Each model may have multiple variants (different quantizations, hardware targets). The SDK auto-selects the best variant, or you can pick one.

```cpp
// Check and select variants
if (auto* concrete = dynamic_cast<Model*>(model)) {
    for (const auto& v : concrete->GetAllModelVariants()) {
        std::cout << v.GetId() << " (cached: " << v.IsCached() << ")\n";
    }
    // Switch to a specific variant (e.g., CPU)
    for (const auto& variant : concrete->GetAllModelVariants()) {
        if (variant.GetInfo().runtime &&
            variant.GetInfo().runtime->device_type == DeviceType::CPU) {
            concrete->SelectVariant(variant);
            break;
        }
    }
}
```

Download, load, and unload:

```cpp
// Download with progress reporting
model->Download([](float progress) {
    std::cout << "Download: " << progress << "%\n";
});

// Load into memory
model->Load();

// Unload when done
model->Unload();

// Remove from local cache entirely
model->RemoveFromCache();
```

### Chat Completions

```cpp
OpenAIChatClient chat(*model);

std::vector<ChatMessage> messages = {
    {"system", "You are a helpful assistant."},
    {"user", "Explain async/await in C#."}
};
ChatSettings settings;

auto response = chat.CompleteChat(messages, settings);
if (!response.choices.empty() && response.choices[0].message) {
    std::cout << response.choices[0].message->content << "\n";
}
```

### Streaming

Use a callback for token-by-token output:

```cpp
chat.CompleteChatStreaming(messages, settings, [](const ChatCompletionCreateResponse& chunk) {
    if (!chunk.choices.empty() && chunk.choices[0].delta) {
        std::cout << chunk.choices[0].delta->content << std::flush;
    }
});
```

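If you also need the complete reply once streaming finishes, accumulate the deltas inside the callback. A small sketch, assuming `delta->content` is a plain `std::string` as the example above suggests:

```cpp
std::string full_reply;
chat.CompleteChatStreaming(messages, settings, [&](const ChatCompletionCreateResponse& chunk) {
    if (!chunk.choices.empty() && chunk.choices[0].delta) {
        full_reply += chunk.choices[0].delta->content;  // collect each streamed token
        std::cout << chunk.choices[0].delta->content << std::flush;
    }
});
std::cout << "\nReceived " << full_reply.size() << " characters\n";
```
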
### Chat Settings

Tune generation parameters per request:

```cpp
ChatSettings settings;
settings.temperature = 0.7f;
settings.max_tokens = 256;
settings.top_p = 0.9f;
settings.frequency_penalty = 0.5f;
```

### Audio Transcription

```cpp
OpenAIAudioClient audio(*model);

// One-shot transcription
auto result = audio.TranscribeAudio(R"(C:\path\to\audio.wav)");
std::cout << result.text << "\n";

// Streaming transcription
audio.TranscribeAudioStreaming(R"(C:\path\to\audio.wav)", [](const AudioCreateTranscriptionResponse& chunk) {
    std::cout << chunk.text;
});
```

### Tool Calling

See `sample/main.cpp` (Example 5) for a full tool-calling walkthrough; a rough sketch of the flow follows below.

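The type and member names in this sketch (`ChatTool`, `settings.tools`, `message->tool_calls`) are assumptions made for illustration, not the verified API; treat `openai/openai_tool_types.h` and the sample as authoritative:

```cpp
// Hypothetical sketch of the tool-calling flow; names are illustrative.
ChatTool weather_tool;
weather_tool.name = "get_weather";
weather_tool.description = "Get the current weather for a city";
weather_tool.parameters = R"({"type":"object","properties":{"city":{"type":"string"}}})";

ChatSettings settings;
settings.tools = {weather_tool};

auto response = chat.CompleteChat(messages, settings);
if (!response.choices.empty() && response.choices[0].message) {
    for (const auto& call : response.choices[0].message->tool_calls) {
        // Dispatch to your own implementation, then send the result back
        // as a "tool" message in a follow-up CompleteChat call.
        std::cout << "Model requested tool: " << call.name << "\n";
    }
}
```
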
### Web Service

Start an OpenAI-compatible REST endpoint for use by external tools or processes:

```cpp
Configuration config{"MyApp"};
config.web = WebServiceConfig{ "http://127.0.0.1:5000" };
Manager::Create(std::move(config));

Manager::Instance().StartWebService();
auto urls = Manager::Instance().GetUrls();
for (const auto& url : urls)
    std::cout << "Listening on: " << url << "\n";

// ... use the service ...

Manager::Instance().StopWebService();
```

### Execution Providers

Ensure execution providers (EPs) are downloaded and registered for hardware acceleration:

```cpp
Manager::Instance().EnsureEpsDownloaded();
```

## Configuration

| Property | Type | Default | Description |
|---|---|---|---|
| `app_name` | `std::string` | (required) | Your application name |
| `app_data_dir` | `optional<path>` | `~/.{app_name}` | Application data directory |
| `model_cache_dir` | `optional<path>` | `{app_data_dir}/cache/models` | Where models are stored locally |
| `logs_dir` | `optional<path>` | `{app_data_dir}/logs` | Log output directory |
| `log_level` | `LogLevel` | `Warning` | One of `Verbose`, `Debug`, `Information`, `Warning`, `Error`, `Fatal` |
| `web` | `optional<WebServiceConfig>` | `nullopt` | Web service configuration (see below) |
| `additional_settings` | `optional<unordered_map>` | `nullopt` | Extra key-value settings passed to Core |

**WebServiceConfig**

| Property | Type | Default | Description |
|---|---|---|---|
| `urls` | `optional<string>` | `127.0.0.1:0` | Bind address; semicolon-separated for multiple addresses |
| `external_url` | `optional<string>` | `nullopt` | URI for reaching the web service from a separate process |

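As a worked example, the sketch below sets several of these properties. Only `app_name` is required; everything else falls back to the defaults listed above. Direct member assignment is assumed here, matching how the Web Service example sets `config.web`, and the cache path is a placeholder:

```cpp
#include "foundry_local.h"

using namespace foundry_local;

int main() {
    Configuration config{"MyApp"};
    config.model_cache_dir = R"(D:\foundry\models)";         // hypothetical override of the cache location
    config.log_level = LogLevel::Debug;                      // default is LogLevel::Warning
    config.web = WebServiceConfig{"http://127.0.0.1:5000"};  // optional OpenAI-compatible endpoint

    Manager::Create(std::move(config));
    // ... work with Manager::Instance() ...
    Manager::Destroy();
}
```
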
## API Reference

Key types:

| Type | Description |
|---|---|
| `Manager` | Singleton entry point — create, catalog, web service |
| `Configuration` | Initialization settings |
| `Catalog` | Model catalog — list, search, filter |
| `IModel` | Model interface — identity, metadata, lifecycle |
| `Model` | Model with variant selection (implements `IModel`) |
| `ModelVariant` | A specific variant of a model (implements `IModel`) |
| `OpenAIChatClient` | Chat completions (sync + streaming) |
| `OpenAIAudioClient` | Audio transcription (sync + streaming) |
| `ChatSettings` | Chat generation parameters |
| `ModelInfo` | Full model metadata record |

## Tests

```bash
ctest --preset x64-debug
```

Or run the test executable directly:

```bash
.\out\build\x64-debug\CppSdkTests.exe
```

## Project Structure

```
sdk/cpp/
├── include/                      # Public headers
│   ├── foundry_local.h           # Umbrella header (include this)
│   ├── configuration.h           # Configuration struct
│   ├── foundry_local_manager.h   # Manager singleton
│   ├── catalog.h                 # Model catalog
│   ├── model.h                   # Model & ModelVariant
│   ├── logger.h                  # ILogger interface
│   └── openai/
│       ├── openai_chat_client.h  # Chat completion client
│       ├── openai_audio_client.h # Audio transcription client
│       └── openai_tool_types.h   # Tool calling types
├── src/                          # Private implementation
├── sample/
│   └── main.cpp                  # Sample application
├── test/                         # Unit & E2E tests (GTest)
├── CMakeLists.txt
├── CMakePresets.json
├── vcpkg.json                    # vcpkg dependencies
└── vcpkg-configuration.json
```

## Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| `DML provider requested, but GenAI has not been built with DML support` | GPU variant selected but ONNX Runtime GenAI lacks DML support | Select a CPU variant or update Foundry Local |
| `OgaGenerator_TokenCount not found in onnxruntime-genai` | Version mismatch between Foundry Local components | Update Foundry Local: `winget upgrade Microsoft.AIToolkit` |
| `API version [N] is not available` | ONNX Runtime version too old for the Foundry Local service | Update Foundry Local: `winget upgrade Microsoft.AIToolkit` |

## License

Licensed under the MIT License.