Oelachqar/refresh docs #2372
# Deploying Models

Oumi provides a top-level `oumi deploy` command for taking a trained or downloaded model and standing it up as a managed inference endpoint on a third-party provider. Today it supports **Fireworks AI** and **Parasail.io**.

```{admonition} Related
:class: note
- To *launch training* on remote clusters, see {doc}`/user_guides/launch/launch`.
- To *call* a deployed endpoint, see {doc}`/user_guides/infer/inference_engines`.
```

## Overview

The deploy workflow has three stages, each exposed as a sub-command:

1. **Upload** — push the model (full weights or a LoRA adapter) to the provider.
2. **Create endpoint** — provision hardware and start serving the uploaded model.
3. **Test / use** — smoke-test the endpoint and then call it with any inference engine.

For the common case, `oumi deploy up` runs all three stages end-to-end from a single YAML config.
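The three stages can be pictured as a chain of calls. The function names and return values below are purely illustrative stand-ins, not Oumi's actual internals:

```python
# Hypothetical sketch of the three deploy stages chained by `oumi deploy up`.
# All names and URL shapes here are illustrative, not Oumi's real API.

def upload_model(model_source: str, provider: str) -> str:
    """Stage 1: push weights (or a LoRA adapter) to the provider."""
    # A real implementation would call the provider's upload API here.
    return f"{provider}/models/{model_source.rstrip('/').split('/')[-1]}"

def create_endpoint(model_id: str, accelerator: str, count: int) -> str:
    """Stage 2: provision hardware and start serving the model."""
    return f"https://api.example.com/endpoints/{model_id.split('/')[-1]}"

def smoke_test(endpoint_url: str, prompts: list[str]) -> bool:
    """Stage 3: stand-in for sending test prompts to the endpoint."""
    return all(isinstance(p, str) and p for p in prompts)

# Chain the stages end to end:
model_id = upload_model("/path/to/my-finetuned-model/", "fireworks")
url = create_endpoint(model_id, accelerator="nvidia_h100_80gb", count=2)
assert smoke_test(url, ["Hello, how are you?"])
```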

## Prerequisites

- A provider account and API key exported in your shell:
  - Fireworks: `FIREWORKS_API_KEY`
  - Parasail: `PARASAIL_API_KEY`
- For Fireworks, the model must exist on your local disk (a HuggingFace download or an Oumi training output).
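A quick pre-flight check for the environment variables above can save a failed deploy midway. The helper below is a small sketch, not part of Oumi; only the variable names come from the prerequisites list:

```python
import os

# Map each supported provider to the environment variable it needs
# (taken from the prerequisites above).
REQUIRED_KEYS = {
    "fireworks": "FIREWORKS_API_KEY",
    "parasail": "PARASAIL_API_KEY",
}

def check_api_key(provider: str) -> str:
    """Return the API key for `provider`, or raise a clear error."""
    var = REQUIRED_KEYS[provider]
    key = os.environ.get(var)
    if not key:
        raise EnvironmentError(f"Set {var} before running `oumi deploy`.")
    return key
```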

## Quick Start: End-to-End Deploy

```bash
oumi deploy up --config configs/examples/deploy/fireworks_deploy.yaml
```

The `--config` YAML matches the {py:class}`~oumi.deploy.deploy_config.DeploymentConfig` schema:

```yaml
# configs/examples/deploy/fireworks_deploy.yaml
model_source: /path/to/my-finetuned-model/  # local directory
provider: fireworks                         # fireworks | parasail
model_name: my-finetuned-model-v1           # display name on the provider
model_type: full                            # full | adapter
# base_model: accounts/fireworks/models/llama-v3p1-8b-instruct  # required if adapter

hardware:
  accelerator: nvidia_h100_80gb  # see `oumi deploy list-hardware`
  count: 2

autoscaling:
  min_replicas: 1
  max_replicas: 4

test_prompts:
  - "Hello, how are you?"
```

Any of `model_source`, `provider`, and `hardware` can be overridden on the CLI, e.g.:

```bash
oumi deploy up \
  --config fireworks_deploy.yaml \
  --model-path /tmp/llama3-8b \
  --hardware nvidia_a100_80gb
```
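The layering of CLI flags over the YAML file can be pictured as a simple merge where an explicitly passed flag wins. This is an assumption about how the override behaves, not `oumi deploy`'s actual implementation:

```python
# Illustrative merge rule: CLI overrides win over YAML values, and unset
# flags leave the file's value alone. An assumption, not Oumi's real code.
def merge_config(yaml_cfg: dict, cli_overrides: dict) -> dict:
    merged = dict(yaml_cfg)
    for key, value in cli_overrides.items():
        if value is not None:
            merged[key] = value
    return merged

yaml_cfg = {
    "model_source": "/path/to/my-finetuned-model/",
    "provider": "fireworks",
    "hardware": {"accelerator": "nvidia_h100_80gb", "count": 2},
}
cli = {
    "model_source": "/tmp/llama3-8b",
    "hardware": {"accelerator": "nvidia_a100_80gb", "count": 2},
    "provider": None,  # flag not passed: keep the YAML value
}
merged = merge_config(yaml_cfg, cli)
```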

`oumi deploy up` will upload the model, wait for it to be ready, create an endpoint, optionally run any `test_prompts`, and print the endpoint URL.
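The "wait for it to be ready" step amounts to a polling loop. In the sketch below, `get_status` stands in for a provider status call, and the state names (`PENDING`, `RUNNING`, `FAILED`) are illustrative rather than Oumi's exact enum:

```python
import time

def wait_until_ready(get_status, timeout_s: float = 600.0, poll_s: float = 1.0) -> str:
    """Poll `get_status()` until it reports RUNNING or a terminal failure."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "RUNNING":
            return status
        if status == "FAILED":
            raise RuntimeError("Deployment failed")
        time.sleep(poll_s)
    raise TimeoutError("Endpoint did not become ready in time")

# Example with a fake status source that becomes ready on the third poll:
states = iter(["PENDING", "PENDING", "RUNNING"])
print(wait_until_ready(lambda: next(states), poll_s=0.01))  # RUNNING
```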

## Sub-Commands

| Command                       | What it does                                            |
|-------------------------------|---------------------------------------------------------|
| `oumi deploy up`              | Full pipeline: upload → create endpoint → test          |
| `oumi deploy upload`          | Upload a model only                                     |
| `oumi deploy create-endpoint` | Create an endpoint for a previously uploaded model      |
| `oumi deploy list`            | List all deployments on the provider                    |
| `oumi deploy list-models`     | List uploaded models                                    |
| `oumi deploy list-hardware`   | List hardware options available for a provider          |
| `oumi deploy status`          | Show endpoint state, replica counts, and URL            |
| `oumi deploy start` / `stop`  | Start or stop an existing endpoint (pause to save cost) |
| `oumi deploy delete`          | Delete an endpoint                                      |
| `oumi deploy delete-model`    | Delete an uploaded model                                |
| `oumi deploy test`            | Send a sample request to an endpoint                    |

Add `--help` to any sub-command for the exact flags it accepts, or see {doc}`/cli/commands`.
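The `start`/`stop`/`delete` commands from the table imply a small endpoint lifecycle. The toy state machine below is only a mental model; the real providers' state names and allowed transitions may differ:

```python
# Toy model of the endpoint lifecycle behind `start`/`stop`/`delete`.
# States and transitions are illustrative assumptions, not provider behavior.
TRANSITIONS = {
    ("RUNNING", "stop"): "STOPPED",
    ("STOPPED", "start"): "RUNNING",
    ("RUNNING", "delete"): "DELETED",
    ("STOPPED", "delete"): "DELETED",
}

def apply_command(state: str, command: str) -> str:
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"Cannot `{command}` an endpoint in state {state}")
```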

## Using a Deployed Endpoint

Once `oumi deploy up` reports `RUNNING`, point any Oumi inference engine at the returned URL. For Fireworks:

```python
from oumi.inference import FireworksInferenceEngine
from oumi.core.configs import ModelParams

engine = FireworksInferenceEngine(
    model_params=ModelParams(model_name="my-finetuned-model-v1")
)
```

For Parasail:

```python
from oumi.inference import ParasailInferenceEngine
from oumi.core.configs import ModelParams

engine = ParasailInferenceEngine(
    model_params=ModelParams(model_name="my-finetuned-model-v1")
)
```

Both engines are documented in {doc}`/user_guides/infer/inference_engines`.
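If you prefer raw HTTP over the engine classes, you can build a chat-completions request body yourself. This assumes the provider exposes an OpenAI-compatible chat route (Fireworks does; check your provider's docs), and the endpoint URL below is a placeholder:

```python
import json

def build_chat_request(model_name: str, prompt: str, max_tokens: int = 256) -> bytes:
    """Build an OpenAI-style chat-completions request body."""
    payload = {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("my-finetuned-model-v1", "Hello, how are you?")
# Then POST `body` to your endpoint, e.g. (placeholder URL):
# urllib.request.Request("https://<endpoint-url>/v1/chat/completions",
#                        data=body, headers={"Content-Type": "application/json"})
```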

## Tips

- **Cost control.** Use `oumi deploy stop <endpoint>` to pause an endpoint without deleting it; `start` brings it back online. Set `autoscaling.min_replicas: 0` if the provider supports scale-to-zero.
- **LoRA adapters.** Set `model_type: adapter` and a matching `base_model` to deploy a LoRA adapter on top of a hosted base model. This is usually cheaper than deploying a full model.
- **Smoke tests.** The `test_prompts` at the bottom of the YAML run automatically after `oumi deploy up` finishes — a quick sanity check before sending real traffic.
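The adapter rule in the tips (`model_type: adapter` requires `base_model`) can be expressed as a small validation check. Field names follow the YAML example above; the check itself is an illustrative sketch, not `DeploymentConfig`'s actual code:

```python
# Sketch of the rule: an adapter deployment must name its base model.
# Field names come from the YAML example; logic is illustrative only.
def validate_adapter_config(cfg: dict) -> None:
    if cfg.get("model_type") == "adapter" and not cfg.get("base_model"):
        raise ValueError("model_type: adapter requires base_model to be set")

validate_adapter_config({
    "model_type": "adapter",
    "base_model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
})  # passes: base model is named
```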

## See Also

- {doc}`/user_guides/infer/inference_engines` — calling the deployed endpoint
- {doc}`/user_guides/launch/launch` — launching training jobs on remote clusters
- {doc}`/cli/commands` — CLI reference
Review comment: Not sure this statement is 100% accurate given PR https://github.com/oumi-ai/oumi/pull/2360/changes. On the other hand, we have not exposed that functionality through the CLI, so I think we are OK.