fix: correctly parse --urls tokens containing colons that are not label separators #53
Open
aneesh-spec wants to merge 5 commits into langchain-ai:main from
…el separators
file: URLs and Windows drive paths (e.g. C:/...) were incorrectly split on the first colon, treating the scheme or drive letter as a label name.
Tests cover file: URLs and Windows drive paths being incorrectly parsed as label:url pairs. All 5 bug-specific tests fail on main and pass on this branch.
Problem
Arguments to `--urls` are sometimes split on the first `:`, which breaks valid inputs where the colon is part of a Windows drive path or a `file:` URL, not a `Label:` prefix.

Expected
A `--urls` token that is only a path or `file:` URL should register that full value, the same as when the same value is given in YAML/JSON config.

Actual
The registered path/URL is wrong; startup may fail with "file not found" for a file that exists, or behavior diverges from config-based setup.

Examples of broken inputs (before fix)

| Input | Parsed label | Parsed URL |
| --- | --- | --- |
| `file:///path/to/llms.txt` | `file` | `///path/to/llms.txt` |
| `C:/docs/llms.txt` | `C` | `/docs/llms.txt` |

Root cause
In `mcpdoc/cli.py`, the condition that detects the `name:url` format only excluded the `http:` and `https:` schemes, missing `file:` URLs and single-letter Windows drive letters.

Fix
- … `file:` to the scheme exclusion list
- … `:`)

Tests
Added 9 F2P (fail-to-pass) tests in `tests/unit_tests/test_cli.py`:
- … `main`, pass on this branch
- … `Label:url` behaviour is unaffected
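
For reviewers who want to see the rule in isolation, here is a minimal sketch of the parsing behaviour described under Root cause and Fix. It is not the code in `mcpdoc/cli.py`: the helper name `split_label_and_url`, the regular expression, and the example URLs are assumptions made for illustration only.

```python
import re
from typing import Optional, Tuple

# Hypothetical illustration only; the real logic lives in mcpdoc/cli.py
# and may be structured differently.

# Schemes whose colon must never be read as a label separator.
URL_SCHEMES = ("http:", "https:", "file:")

# A single letter, a colon, then a slash or backslash: a Windows drive path
# such as C:/docs/llms.txt or C:\docs\llms.txt.
WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def split_label_and_url(token: str) -> Tuple[Optional[str], str]:
    """Return (label, url); label is None when the token has no label prefix."""
    if token.lower().startswith(URL_SCHEMES) or WINDOWS_DRIVE.match(token):
        # The whole token is a URL or path; keep it intact.
        return None, token
    if ":" in token:
        # Everything before the first colon is the label.
        label, url = token.split(":", 1)
        return label, url
    # No colon at all: a bare relative path.
    return None, token


print(split_label_and_url("file:///path/to/llms.txt"))
print(split_label_and_url("C:/docs/llms.txt"))
print(split_label_and_url("LangGraph:https://example.com/llms.txt"))
```

Under these assumptions the first two calls return the token unchanged with no label, while the third still splits into the label `LangGraph` and the URL. Requiring a slash or backslash after the drive letter means a one-letter label followed directly by a URL (for example `C:https://...`) is still treated as a label.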