feat: Move Native Embedder and Reranker Into Isolated Workers w/ Job Queues | Add Document Embedding Status Events by angelplusultra · Pull Request #5192 · Mintplex-Labs/anything-llm

angelplusultra · 2026-03-11T17:26:55Z

Pull Request Type

✨ feat (New feature)
🐛 fix (Bug fix)
♻️ refactor (Code refactoring without changing behavior)
💄 style (UI style changes)
🔨 chore (Build, CI, maintenance)
📝 docs (Documentation updates)

Relevant Issues

resolves #

Description

Moves the native embedder and reranker out of the main process and into isolated child processes created with child_process.fork(), each fed by a serial job queue. An OOM crash caused by a large document batch now kills only the worker instead of taking down the entire server. The PR also adds real-time, document-level progress reporting to the UI via SSE.

Architecture:

WorkerQueue class manages a forked child process with serial FIFO job processing, auto-fork on first job, and configurable idle timeout that kills the worker when inactive
EmbeddingProgressBus (singleton EventEmitter) acts as the central hub for progress events between Document.addDocuments and SSE endpoint listeners, with event buffering for late-joining clients after page reloads
Separate queue instances for embedding and reranking workers — each gets its own forked process
Query embedding (lightweight, single text) still runs in-process; only bulk document embedding is routed through the worker queue
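
The queue behavior listed above (fork on first job, serial FIFO processing, kill after an idle period) might look roughly like the following. This is a minimal sketch and not the PR's WorkerQueue: the class name SerialWorkerQueue, the { result } / { error } reply shape, and the injectable spawn option are assumptions made for illustration.

```javascript
const { fork } = require("node:child_process");

// Hypothetical sketch: one forked worker, one job in flight at a time.
class SerialWorkerQueue {
  #worker = null;
  #jobs = [];
  #busy = false;
  #idleTimer = null;

  // `spawn` is injectable so the queue can be exercised without a real script.
  constructor({ workerPath, idleMs = 300_000, spawn = () => fork(workerPath) }) {
    this.idleMs = idleMs;
    this.spawn = spawn;
  }

  // Returns a promise that settles when the worker replies for this job.
  enqueue(payload) {
    return new Promise((resolve, reject) => {
      this.#jobs.push({ payload, resolve, reject });
      this.#next();
    });
  }

  #next() {
    if (this.#busy || this.#jobs.length === 0) return;
    clearTimeout(this.#idleTimer);
    this.#busy = true;
    const job = this.#jobs.shift();
    if (!this.#worker) this.#worker = this.spawn(); // fork lazily on first job
    const worker = this.#worker;

    const finish = (settle, value) => {
      worker.off("message", onMessage);
      worker.off("exit", onExit);
      this.#busy = false;
      settle(value);
      if (this.#jobs.length > 0) return this.#next();
      // Queue drained: schedule worker shutdown after the idle period.
      this.#idleTimer = setTimeout(() => this.#kill(), this.idleMs);
      this.#idleTimer.unref();
    };
    const onMessage = (msg) =>
      msg.error
        ? finish(job.reject, new Error(msg.error))
        : finish(job.resolve, msg.result);
    const onExit = (code) => {
      // A crashed worker fails only the current job; the next job re-forks.
      this.#worker = null;
      finish(job.reject, new Error(`worker exited with code ${code}`));
    };

    worker.on("message", onMessage);
    worker.on("exit", onExit);
    worker.send(job.payload);
  }

  #kill() {
    this.#worker?.kill();
    this.#worker = null;
  }
}
```

Because the exit handler rejects only the in-flight job, a worker that dies from memory pressure leaves the parent process and the rest of the queue intact, which is the isolation property the description is after.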

Progress Reporting:

SSE endpoint at /workspace/:slug/embed-progress streams document-level events (batch_starting, doc_starting, doc_complete, doc_failed, all_complete)
EmbeddingProgressContext (React Context) manages progress state globally and connects SSE on component mount — server-side event replay catches up on any in-progress jobs without needing client-side persistence
Progress is scoped per-user — each user only sees their own embedding jobs, even when multiple users embed into the same workspace concurrently
Progress UI shows in the document management modal with per-file status (Queued → Embedding → Complete/Failed) and auto-clears 5 seconds after completion
When no embedding is in progress, the SSE connection stays open silently and waits — no premature signals are sent, avoiding race conditions between the SSE connection and the embed API call
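
On the client, the per-file status transitions (Queued → Embedding → Complete/Failed) can be modeled as a pure reducer over the streamed events. The sketch below is illustrative only: the event names come from the list above, but the payload fields type, filename, and filenames, and the function name applyProgressEvent, are assumptions rather than the PR's actual shapes.

```javascript
// Hypothetical reducer: folds one SSE progress event into a
// { [filename]: status } map. Field names are assumed for illustration.
function applyProgressEvent(files, event) {
  switch (event.type) {
    case "batch_starting":
      // Seed every file in the batch as pending ("Queued" in the UI).
      return Object.fromEntries(event.filenames.map((name) => [name, "pending"]));
    case "doc_starting":
      return { ...files, [event.filename]: "embedding" };
    case "doc_complete":
      return { ...files, [event.filename]: "complete" };
    case "doc_failed":
      return { ...files, [event.filename]: "failed" };
    default:
      // all_complete and unknown events leave per-file state untouched;
      // clearing the UI after a delay would be handled by the caller.
      return files;
  }
}

const replayed = [
  { type: "batch_starting", filenames: ["a.pdf", "b.pdf"] },
  { type: "doc_starting", filename: "a.pdf" },
  { type: "doc_complete", filename: "a.pdf" },
  { type: "doc_starting", filename: "b.pdf" },
].reduce(applyProgressEvent, {});
console.log(replayed); // { 'a.pdf': 'complete', 'b.pdf': 'embedding' }
```

Keeping the reducer pure means a server-side replay of buffered events and a live stream produce the same state, so a page reload needs no client-side persistence.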

Worker Timeouts:

Configurable via environment variables (NATIVE_EMBEDDING_WORKER_TIMEOUT, NATIVE_RERANKING_WORKER_TIMEOUT) with defaults of 300s and 900s respectively
Embedding timeout is configurable in the native embedder settings UI
Reranking timeout is configurable in the vector database settings when LanceDB is selected
Timeouts are re-read from env before each job so UI changes take effect without server restart
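
Re-reading the value before each job could be as simple as a small helper that consults process.env on every call. The sketch below is an assumption-laden illustration: the helper name readWorkerTimeoutMs is hypothetical, and it assumes the variables hold a number of seconds (consistent with the 300s and 900s defaults stated above).

```javascript
// Hypothetical helper: resolve the worker idle timeout in milliseconds.
// Assumes the env value is expressed in seconds; falls back to the default
// when the variable is unset, empty, non-numeric, or negative.
function readWorkerTimeoutMs(envKey, defaultSeconds, env = process.env) {
  const raw = env[envKey];
  if (raw === undefined || raw.trim() === "") return defaultSeconds * 1000;
  const seconds = Number(raw);
  if (!Number.isFinite(seconds) || seconds < 0) return defaultSeconds * 1000;
  return seconds * 1000; // 0 is valid: shut the worker down right after the job
}

// Called just before each job so a changed value applies without a restart.
const embeddingIdleMs = readWorkerTimeoutMs("NATIVE_EMBEDDING_WORKER_TIMEOUT", 300);
const rerankingIdleMs = readWorkerTimeoutMs("NATIVE_RERANKING_WORKER_TIMEOUT", 900);
```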

Other Changes:

NativeEmbedder.embedChunks() detects whether it's running in the main process or worker via process.send and routes through the worker queue automatically — no changes needed in vector DB providers or non-native embedding engines
NativeEmbeddingReranker.rerankViaWorker() added as static method to route through the queue from LanceDB, using the same process.send detection pattern
batch_starting event emitted at the start of a batch with the full file list, so SSE history replay can seed all files as "pending" for late-joining clients
doc_failed event emitted when fileData() returns null or when the database write fails (previously files were silently skipped, stuck as "pending" in UI)
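
The process.send detection mentioned above relies on a documented Node.js behavior: process.send is only defined when the process was spawned with an IPC channel, as is the case for a child created by child_process.fork(). A minimal sketch of the routing follows; the function names and the injected embedInProcess / queueEmbedding callbacks are hypothetical stand-ins, not the PR's actual signatures.

```javascript
// True when this process has an IPC channel to a parent (e.g. it was forked).
function isForkedWorker(proc = process) {
  return typeof proc.send === "function";
}

// Hypothetical router: embed directly inside the worker, otherwise hand the
// chunks to the queue so the heavy work happens in the child process.
async function routeEmbedChunks(chunks, { embedInProcess, queueEmbedding, proc = process }) {
  if (isForkedWorker(proc)) return embedInProcess(chunks);
  return queueEmbedding(chunks);
}
```

One trade-off of this pattern is that the check is implicit: any other code path that happens to fork the server would also be treated as a worker, which is a reason to prefer explicit in-process methods where possible.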

Visuals (if applicable)

Embedding Status Events w/ Persistence Across Renders and Reloads

output.mp4

Worker Timeouts Are Configurable

The document management modal now shows real-time embedding progress when documents are being embedded into a workspace, with per-file status indicators (Queued, Embedding, Complete, Failed).

Additional Information

Worker idle timeouts can be set to 0 for immediate shutdown after work completes
The reranking worker has a longer default timeout (900s) since it runs on every chat query when accuracy-optimized search is enabled with LanceDB, and frequent cold starts would add overhead
Event replay on SSE reconnect ensures no stale "Queued" states after page refresh — the server buffers all document-level events and replays them to new subscribers
Multi-user: progress is scoped per-user via SSE userId filtering. No cross-user visibility or document locking.

Developer Validations

I ran yarn lint from the root of the repo & committed changes
Relevant documentation has been updated (if applicable)
I have tested my code functionality
Docker build succeeds locally

…r embedding progress

The SSE connection opens before the embedding API call fires, so the server sees no buffered history and immediately sends all_complete. Firefox dispatches this eagerly enough that it closes the EventSource before real progress events arrive, causing the progress UI to clear and fall back to the loading spinner. Chrome's EventSource timing masks the race. Track slugs where startEmbedding was called but no real progress event has arrived yet via awaitingProgressRef. Ignore the first all_complete for those slugs and keep the connection open for the real events.

Removed unnecessary tracking of slugs for premature all_complete events in the EmbeddingProgressProvider. Updated the server-side logic to avoid sending all_complete when no embedding is in progress, allowing the connection to remain open for real events. Adjusted the embedding initiation flow to ensure the server processes the job before the SSE connection opens, improving the reliability of progress updates.
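
The server-side behavior described here (hold the connection open and write nothing until a real event exists) can be sketched as below. This is an illustration under assumptions, not the PR's endpoint: streamProgress and toSseFrame are hypothetical names, and subscribe stands in for any function that returns { unsubscribe } and invokes its callback once per progress event.

```javascript
// Serialize one payload as a Server-Sent Events frame.
function toSseFrame(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical handler body: open the stream, forward events as they arrive,
// and send nothing (in particular no all_complete) while the job is idle.
function streamProgress({ subscribe, filter, request, response }) {
  response.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  const { unsubscribe } = subscribe(filter, (event) => {
    response.write(toSseFrame(event));
  });
  // Stop listening once the client disconnects.
  request.on("close", unsubscribe);
}
```

Since no synthetic completion frame is written on connect, the order in which the SSE connection and the embed API call reach the server no longer matters to the client.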

…component Extracted the Reranking Worker Idle Timeout input from GeneralEmbeddingPreference and integrated it into the LanceDBOptions component. This change enhances modularity and maintains a cleaner structure for the settings interface.

timothycarambat

Timeout ≠ TTL. This looks like we are talking about how long we should keep workers alive after doing some work, just to keep the worker hot. That is a TTL, so let's rename it.
Let's remove the UI components for specifying the TTL for now. Most people will never want to touch or change these. Node worker start-up time is small enough to shoulder here.
Let's then also remove the associated systemSettings key entries, since we won't be sending them to the UI. We should keep the protectedKeys in dumpENV just in case people DO want to set them manually.

Clarifying questions:

How are embedding jobs segmented by user? If user A is embedding 10 docs and user B is embedding 1, then when A is queued and B's job finishes and gets all_complete, doesn't A get that event back at the same time? Or is this simply based on the job ref in their renderer processes?

frontend/src/components/Modals/ManageWorkspace/Documents/WorkspaceDirectory/index.jsx

frontend/src/components/Modals/ManageWorkspace/Documents/index.jsx

frontend/src/App.jsx

frontend/src/EmbeddingProgressContext.jsx

server/endpoints/workspaces.js

server/utils/EmbeddingEngines/native/index.js

server/utils/EmbeddingRerankers/native/index.js

server/utils/vectorDbProviders/lance/index.js

timothycarambat · 2026-03-13T20:45:23Z

server/utils/WorkerQueue/index.js

+module.exports = {
+  queueEmbedding,
+  queueReranking,
+  embeddingProgressBus,
+};


For this whole file, let's find a better way to break this up. The file is pretty messy, with lots of different things going on in one place.


angelplusultra · 2026-03-13T21:41:07Z

Timeout ≠ TTL. This looks like we are talking about how long we should keep workers alive after doing some work, just to keep the worker hot. That is a TTL, so let's rename it.

Let's remove the UI components for specifying the TTL for now. Most people will never want to touch or change these. Node worker start-up time is small enough to shoulder here.

Let's then also remove the associated systemSettings key entries, since we won't be sending them to the UI. We should keep the protectedKeys in dumpENV just in case people DO want to set them manually.

Clarifying questions:

How are embedding jobs segmented by user? If user A is embedding 10 docs and user B is embedding 1, then when A is queued and B's job finishes and gets all_complete, doesn't A get that event back at the same time? Or is this simply based on the job ref in their renderer processes?

Each user's "Save" triggers its own addDocuments call, which runs its own loop and emits its own batch_starting → doc_starting → doc_complete → all_complete lifecycle. User B's all_complete only signals the end of User B's batch — it doesn't touch User A's. On the SSE side, every event is tagged with userId, and the subscriber filters on it. So User A's SSE connection only receives events where event.userId matches their own. User B's all_complete is simply never delivered to User A. The two batches are independent event streams that happen to flow through the same bus. They don't interfere with each other.

You can see this in action in the EmbeddingProgressBus:

  /**
   * Register an SSE listener filtered by workspace and user.
   * Replays any buffered events for the workspace before subscribing to live events.
   * @param {{ workspaceSlug: string, userId?: number }} filter
   * @param {function} callback - receives the progress event payload
   * @returns {{ unsubscribe: function }}
   */
  subscribe(filter, callback) {
    // Replay buffered events so reconnecting clients catch up.
    if (filter.workspaceSlug && this.#history.has(filter.workspaceSlug)) {
      for (const event of this.#history.get(filter.workspaceSlug)) {
        if (filter.userId && event.userId && event.userId !== filter.userId)
          continue;
        callback(event);
      }
    }

    const handler = (event) => {
      if (filter.workspaceSlug && event.workspaceSlug !== filter.workspaceSlug)
        return;
      if (filter.userId && event.userId && event.userId !== filter.userId)
        return;
      callback(event);
    };
    this.on("progress", handler);
    return {
      unsubscribe: () => this.off("progress", handler),
    };
  }
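
To make the filtering concrete, here is a small self-contained sketch that wraps the same subscribe logic in a runnable class and exercises the user A / user B scenario. MiniProgressBus and its publish method are hypothetical stand-ins for illustration; the real EmbeddingProgressBus may buffer, emit, and clear history differently (this sketch never clears its buffer).

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical minimal bus: buffers every event per workspace and re-emits it.
class MiniProgressBus extends EventEmitter {
  #history = new Map();

  publish(event) {
    const buffered = this.#history.get(event.workspaceSlug) ?? [];
    buffered.push(event);
    this.#history.set(event.workspaceSlug, buffered);
    this.emit("progress", event);
  }

  subscribe(filter, callback) {
    // Same user filter for replayed and live events.
    const matches = (event) =>
      event.workspaceSlug === filter.workspaceSlug &&
      !(filter.userId && event.userId && event.userId !== filter.userId);

    for (const event of this.#history.get(filter.workspaceSlug) ?? []) {
      if (matches(event)) callback(event);
    }
    const handler = (event) => {
      if (matches(event)) callback(event);
    };
    this.on("progress", handler);
    return { unsubscribe: () => this.off("progress", handler) };
  }
}

const bus = new MiniProgressBus();
const seenByA = [];
bus.subscribe({ workspaceSlug: "docs", userId: 1 }, (e) => seenByA.push(e.type));

bus.publish({ workspaceSlug: "docs", userId: 1, type: "batch_starting" });
bus.publish({ workspaceSlug: "docs", userId: 2, type: "all_complete" });

console.log(seenByA); // [ 'batch_starting' ]
```

User B's all_complete flows through the same emitter but is dropped by user A's handler, so the two batches never interfere.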


angelplusultra added 17 commits March 10, 2026 15:29

implement native embedder job queue

d2540a2

persist embedding progress across renders

e941b17

add development worker timeouts

a0cd78b

change to static method

573856a

native reranker

c80107e

remove useless return

0149020

lint

acd1e3d

simplify

f4f6b78

make embedding worker timeout value configurable by admin

af3c6e0

add event emission for missing data

592905c

lint

73ce274

remove onProgress callback argument

f0e5117

make rerank to rerankDirect

8c21420

persists progress state across app reloads

37ddfaa

remove chunk level progress reporting

0bee3ec

remove unuse dvariable

59d3727

make NATIVE_RERANKING_WORKER_TIMEOUT user configurable

6ff2015

angelplusultra marked this pull request as draft March 11, 2026 17:27

angelplusultra changed the title ~~feat: Move native embedder & reranker into isolated worker processes~~ feat: Native Embedder and Reranker Job Queue & Document Embedding Status Events Mar 11, 2026

angelplusultra changed the title ~~feat: Native Embedder and Reranker Job Queue & Document Embedding Status Events~~ feat:Move Native Embedder and Reranker Into Isolated Workers w/ Job Queues & Add Document Embedding Status Events Mar 11, 2026

angelplusultra changed the title ~~feat:Move Native Embedder and Reranker Into Isolated Workers w/ Job Queues & Add Document Embedding Status Events~~ feat: Move Native Embedder and Reranker Into Isolated Workers w/ Job Queues & Add Document Embedding Status Events Mar 11, 2026

angelplusultra changed the title ~~feat: Move Native Embedder and Reranker Into Isolated Workers w/ Job Queues & Add Document Embedding Status Events~~ feat: Move Native Embedder and Reranker Into Isolated Workers w/ Job Queues | Add Document Embedding Status Events Mar 11, 2026

angelplusultra added 3 commits March 11, 2026 11:29

remove dead code

21c5784

scope embedding progress per-user and clear stale state on SSE reconnect

addab25

lint

7a80d1e

angelplusultra requested a review from shatfield4 March 11, 2026 21:50

angelplusultra marked this pull request as ready for review March 11, 2026 21:50

angelplusultra assigned shatfield4 and unassigned shatfield4 Mar 11, 2026

angelplusultra marked this pull request as draft March 11, 2026 22:32

angelplusultra requested a review from shatfield4 March 12, 2026 17:13

angelplusultra assigned shatfield4 Mar 12, 2026

angelplusultra marked this pull request as draft March 12, 2026 19:02

angelplusultra added 12 commits March 12, 2026 12:29

replace sessionStorage persistence with server-side history replay fo…

3fbc217

…r embedding progress

fix old comment

9eff983

reduce duplication with progress emissions

3c6d12a

remove dead code

2877dc9

fix stale comment

dce2bf8

remove unused function

0722754

fix event emissions for document creation failure

7df5535

lint

c88fac3

remove unused hadHistory vars

8b99aea

angelplusultra marked this pull request as ready for review March 13, 2026 19:17

timothycarambat reviewed Mar 13, 2026

View reviewed changes

timothycarambat requested changes Mar 13, 2026

View reviewed changes

timothycarambat assigned angelplusultra and unassigned shatfield4 Mar 13, 2026

refactor workspace directory by hoisting component and converting int…

eafa12c

…o functions

angelplusultra added 8 commits March 13, 2026 15:29

moved EmbeddingProgressProvider to wrap Document Manager Modal

f335e2b

refactor embed progress SSE connection to use fetchEventSource instea…

2b7bce8

…d of native EventSource API.

refactor message handlng into a function and reduce duplication

4ed2304

refactor: utilize writeResponseChunk for event emissions in document …

904ed16

…embedding progress SSE

refactor: explicit in-proc embedding and rerank methods that are call…

b1d2dd6

…ed by workers instead of process.send checks

Abstract EmbeddingProgressBus and Worker Queue into modules

9a63f98

remove error and toast messages on embed process result

cf8c681

use safeJsonParse

fd6dc2b

angelplusultra wants to merge 45 commits into master from feat-native-embedder-job-queue


3 participants
