feat(pdf): add MinerU Cloud API as PDF parsing provider #438
Merged
Add MinerU Cloud (v4 API) as a new PDF provider alongside unpdf (built-in) and MinerU (self-hosted). Users can now parse PDFs via the cloud API without deploying a self-hosted MinerU instance.

- New provider `mineru-cloud` with API key auth and optional base URL
- Cloud flow: batch create → presigned upload → poll → ZIP download → parse
- Extract shared `mineru-parser.ts` from inline code (used by both paths)
- Settings UI adapted for cloud (API key required) vs self-hosted (base URL required)
- Server-side env var support: PDF_MINERU_CLOUD_API_KEY / PDF_MINERU_CLOUD_BASE_URL
- SSRF protection on cloud verification endpoint
- i18n translations for en-US, zh-CN, ja-JP, ru-RU

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
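For reviewers who want a mental model of the cloud flow listed in this commit (batch create, presigned upload, poll, ZIP download, parse), here is a minimal sketch. It is not the code in `lib/pdf/mineru-cloud.ts`: the `CloudSteps` interface, the function name `parsePdfViaCloud`, the state names, and the polling interval and attempt limit are all assumptions made for illustration. The network calls are injected so the orchestration can be read and exercised without touching the real API.

```typescript
// Hypothetical step interface. Every name here is illustrative,
// not the actual MinerU Cloud client API from this PR.
interface CloudSteps {
  createBatch(fileName: string): Promise<{ batchId: string; uploadUrl: string }>;
  upload(uploadUrl: string, pdf: Uint8Array): Promise<void>;
  getState(batchId: string): Promise<{ state: 'pending' | 'done' | 'failed'; zipUrl?: string }>;
  downloadZip(zipUrl: string): Promise<Uint8Array>;
  parseZip(zip: Uint8Array): Promise<string>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function parsePdfViaCloud(
  steps: CloudSteps,
  pdf: Uint8Array,
  fileName = 'document.pdf', // same fallback name the review summary mentions
  opts = { intervalMs: 2000, maxAttempts: 150 }, // assumed values, not from the PR
): Promise<string> {
  // 1. Create a batch and receive a presigned upload URL.
  const { batchId, uploadUrl } = await steps.createBatch(fileName);

  // 2. Upload the PDF bytes to the presigned URL.
  await steps.upload(uploadUrl, pdf);

  // 3. Poll until the batch finishes, fails, or the attempt budget runs out.
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const result = await steps.getState(batchId);
    if (result.state === 'failed') {
      throw new Error(`Batch ${batchId} failed`);
    }
    if (result.state === 'done') {
      if (!result.zipUrl) throw new Error('Finished batch has no ZIP URL');
      // 4. Download the result ZIP, then 5. hand it to the shared parser.
      const zip = await steps.downloadZip(result.zipUrl);
      return steps.parseZip(zip);
    }
    await sleep(opts.intervalMs);
  }
  throw new Error(`Timed out waiting for batch ${batchId}`);
}
```

Injecting the steps keeps the polling loop testable with fakes; a real client would supply implementations backed by `fetch` and the shared parser.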
Reverts the PDF section from `keylessProviders` back to `requiresBaseUrl: true` to fix a failing test: `mineru` with only `apiKey` (no `baseUrl`) should not activate. The `mineru-cloud` API key can be configured via the browser settings UI instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
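The activation rule this commit restores can be pictured roughly as below. This is a sketch under assumptions: `ProviderEnvConfig`, `SectionRules`, and `isProviderActive` are hypothetical names, and the behaviour of the branch without `requiresBaseUrl` is a guess; only the `requiresBaseUrl: true` case (an API key alone does not activate the provider) comes from the commit message.

```typescript
// Hypothetical shapes; the real lib/server/provider-config.ts may differ.
interface ProviderEnvConfig {
  apiKey?: string;
  baseUrl?: string;
}

interface SectionRules {
  requiresBaseUrl: boolean;
}

function isProviderActive(config: ProviderEnvConfig, rules: SectionRules): boolean {
  if (rules.requiresBaseUrl) {
    // PDF section after this revert: a base URL is mandatory,
    // so an API key on its own is not enough.
    return Boolean(config.baseUrl);
  }
  // Assumed behaviour for sections without the flag: either value activates.
  return Boolean(config.apiKey || config.baseUrl);
}

// The case covered by the failing test named in the commit message:
const mineruWithKeyOnly = isProviderActive({ apiKey: 'secret' }, { requiresBaseUrl: true });
console.log(mineruWithKeyOnly); // false
```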
The test connection button was disabled when the API key was configured only server-side (not entered by the user in the browser). It now checks `isServerConfigured` in addition to user-entered values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
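A minimal sketch of the enablement check this fix describes, assuming a state shape that is not taken from `pdf-settings.tsx`: `PdfSettingsState` and `canTestConnection` are hypothetical names, and treating `unpdf` as never testable is an assumption. The cloud-needs-API-key and self-hosted-needs-base-URL split follows the first commit message.

```typescript
// Hypothetical state shape for illustration only.
interface PdfSettingsState {
  providerId: 'unpdf' | 'mineru' | 'mineru-cloud';
  apiKey?: string;
  baseUrl?: string;
  isServerConfigured?: boolean;
}

function canTestConnection(s: PdfSettingsState): boolean {
  // Assumption: the built-in parser has no remote endpoint to verify.
  if (s.providerId === 'unpdf') return false;

  // The fix: credentials configured on the server count,
  // even when the browser fields are empty.
  if (s.isServerConfigured) return true;

  // Otherwise fall back to user-entered values:
  // cloud needs an API key, self-hosted needs a base URL.
  return s.providerId === 'mineru-cloud'
    ? Boolean(s.apiKey?.trim())
    : Boolean(s.baseUrl?.trim());
}
```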
Summary
- Add MinerU Cloud as a PDF provider alongside `unpdf` (built-in) and `mineru` (self-hosted)
- Extract shared `mineru-parser.ts` for code reuse between self-hosted and cloud paths

Changed Files
- `lib/pdf/mineru-parser.ts`
- `lib/pdf/mineru-cloud.ts`
- `lib/pdf/types.ts`: add `'mineru-cloud'` to `PDFProviderId` union
- `lib/pdf/constants.ts`: `MINERU_CLOUD_DEFAULT_BASE` constant
- `lib/pdf/pdf-providers.ts`
- `lib/server/provider-config.ts`: `PDF_MINERU_CLOUD` env var mapping, fix PDF activation logic
- `lib/store/settings.ts`
- `components/settings/pdf-settings.tsx`
- `app/api/verify-pdf-provider/route.ts`
- `lib/i18n/locales/*.json`

Design Decisions
- Changed `provider-config.ts` from `requiresBaseUrl: true` to a `keylessProviders` set: `mineru` activates on base URL alone, `mineru-cloud` activates on API key.
- `extractMinerUResult` extracted to `mineru-parser.ts`: pure refactor, no behavior change for the existing self-hosted path.
- Uses `vlm` (recommended by MinerU docs) instead of `pipeline` (default).

Code Review Summary
Two rounds of automated code review were performed:
Round 1 found the following issues, which were fixed:
- Run `validateUrlForSSRF` before fetch
- Changed `file_names: [string]` to `files: [{name: string}]` per official docs

Round 2 (final review) result: Ready to merge
- `sourceFileName` not threaded to cloud client (falls back to `document.pdf`, works correctly)
- `MINERU_CLOUD_DEFAULT_BASE` constant: extracted to `constants.ts`
- `JSON.parse` in shared parser: added try-catch
- `language: 'ch'` hardcoded: deferred to follow-up (make configurable)
- `Blob` construction: cosmetic, not blocking

CI checks:
`pnpm check` ✅ | `pnpm lint` ✅ (0 errors) | `npx tsc --noEmit` ✅

Test Plan
https://mineru.net/api/v4)

🤖 Generated with Claude Code
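For context on the Round 1 SSRF finding, the sketch below shows the kind of check a server route might run on a user-supplied base URL before fetching it. It is not the project's `validateUrlForSSRF`, whose implementation is not shown in this PR description: `isUrlSafeForServerFetch` is a hypothetical name, and the rule set is a deliberately conservative assumption. It only inspects the URL string; it does not resolve DNS, so a hostname that points at a private address would still pass.

```typescript
// Hypothetical helper, not the project's validateUrlForSSRF.
function isUrlSafeForServerFetch(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }

  // Only plain web schemes are allowed.
  if (url.protocol !== 'https:' && url.protocol !== 'http:') return false;

  const host = url.hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return false;

  // Conservative assumption: reject every IPv6 literal (hostname keeps the brackets).
  if (host.startsWith('[')) return false;

  // Reject loopback, private, and link-local IPv4 literals.
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (m) {
    const a = Number(m[1]);
    const b = Number(m[2]);
    if (a === 0 || a === 10 || a === 127) return false;
    if (a === 169 && b === 254) return false;
    if (a === 172 && b >= 16 && b <= 31) return false;
    if (a === 192 && b === 168) return false;
  }
  return true;
}
```

A production check would typically also resolve the hostname and validate the resulting addresses, and guard against redirects to internal hosts.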