Skip to content

Lazy-load storage plugins to avoid side effects on import#773

Open
whaaaley wants to merge 1 commit intoNaturalNode:masterfrom
whaaaley:lazy-load-storage-plugins
Open

Lazy-load storage plugins to avoid side effects on import#773
whaaaley wants to merge 1 commit intoNaturalNode:masterfrom
whaaaley:lazy-load-storage-plugins

Conversation

@whaaaley
Copy link
Copy Markdown

Problem

Importing natural for NLP features (tokenizers, stemmers, TF-IDF, etc.) eagerly loads all five storage plugins (Postgres, MongoDB, Redis, Memcached, File) at module scope in StorageBackend.js. Each plugin calls require('dotenv').config(), which reads .env files from disk, mutates process.env, and as of dotenv v17, logs promotional messages to stdout on every call.

This means anyone who imports natural for basic NLP gets unwanted console output, filesystem reads, and environment mutation -- even if they never use storage. Since dotenv is only used by the storage plugins, lazy-loading them eliminates these side effects entirely for the majority of users.

Here is what our production logs look like on every server restart (Fly.io, ord region). We only use natural for tokenizing and stemming:

[dotenv@17.3.1] injecting env (0) from .env -- tip: ⚡️ secrets for agents: https://dotenvx.com/as2
[dotenv@17.3.1] injecting env (0) from .env -- tip: ⚙️  load multiple .env files with { path: ['.env.local', '.env'] }
[dotenv@17.3.1] injecting env (0) from .env -- tip: 🛡️ auth for agents: https://vestauth.com
[dotenv@17.3.1] injecting env (0) from .env -- tip: ⚙️  write to custom object with { processEnv: myObject }
[dotenv@17.3.1] injecting env (0) from .env -- tip: ⚙️  specify custom .env file path with { path: '/custom/path/.env' }

Five log lines, one per storage plugin, injecting 0 env vars, with ads for a product we don't use. There's no way to silence these because the quiet option only works when you call dotenv.config() yourself -- not when a transitive dependency calls it.

For context, dotenv v17's default logging behavior has been a widespread issue:

Solution

Move the require() calls from the top of StorageBackend.js into the switch inside setStorageType(). Storage plugins are now loaded on demand only when a user explicitly creates a storage backend.

This is a non-breaking change. Users of the storage feature see no difference in behavior.

Testing

All 858 existing specs pass.

@whaaaley
Copy link
Copy Markdown
Author

This also addresses #754. The punycode deprecation warning comes from mongoose, which is pulled in by the MongoDB storage plugin. With lazy loading, users who don't call setStorageType('mongodb') never load mongoose and never see the warning.

If you're open to it, another option would be moving mongoose, pg, redis, and memjs to peerDependencies or optionalDependencies so they're not installed at all unless the user explicitly adds them. Or the storage backends could be a separate package entirely. Happy to help with either approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant