
Implement WebSearch and WebOpen with Playwright and DDGS integration#15

Open
anakori wants to merge 4 commits into main from
feature/webSearchAndWebOpen

Member

@anakori anakori commented Jan 15, 2026

This PR adds new web searching and improved web browsing capabilities for Golem:

  1. Web Search via DDGS
  2. Web browsing with JavaScript rendering via Playwright, with jina.ai as lightweight fallback
  3. Sessions for authenticated workflows (login, cookies, localStorage persistence)

Web searching

Minimal Python/FastAPI wrapper for DDGS (Dux Distributed Global Search) metasearch library

  • search backends: bing, brave, duckduckgo, google, wikipedia and more
  • grokipedia is filtered out. We are NOT poisoning Golem with Elon Musk garbage
  • configurable parameters: region, safesearch, time filters, etc.
  • gradle handles venv creation and dependency management

Endpoints:

  • GET /health - health check
  • GET /search - web search with full parameter control

Gradle tasks:

  • ./gradlew createVenv - create Python venv
  • ./gradlew installDdgsDeps - install Python dependencies
  • ./gradlew runDdgsSearch - start the service on localhost:8001; runs createVenv and installDdgsDeps first if they have not been run yet

Web browsing

  1. Playwright is initialized with a graceful fallback in case the browser cannot be launched:
    • try bundled Playwright chromium first
    • fallback to system chromium at known paths
    • supports specifying non-standard chromium binary path via --chromium-path CLI option
  2. DefaultWebBrowser is created if Playwright succeeds
  3. Inject DefaultWeb with optional browser into GolemScriptDependencyProvider
  4. Clean up resources on shutdown
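
The fallback order described above could be sketched roughly as follows. This is only an illustration, not the PR's code: `ChromiumSource`, `resolveChromium`, the `knownChromiumPaths` list, and the choice to let `--chromium-path` win over the bundled browser are all assumptions made for the sketch.

```kotlin
import java.io.File

// Hypothetical result type: where the Chromium binary should come from.
sealed interface ChromiumSource {
    object Bundled : ChromiumSource
    data class Executable(val path: String) : ChromiumSource
}

// Hypothetical list of well-known locations; the paths used by the PR are not shown in this thread.
val knownChromiumPaths = listOf(
    "/usr/bin/chromium",
    "/usr/bin/chromium-browser",
    "/snap/bin/chromium",
)

fun resolveChromium(
    explicitPath: String?,                        // value of --chromium-path, if given
    bundledAvailable: () -> Boolean,              // probe for Playwright's bundled Chromium
    candidates: List<String> = knownChromiumPaths,
    exists: (String) -> Boolean = { File(it).canExecute() },
): ChromiumSource? {
    // Assumption: an explicit path takes priority over everything else.
    if (explicitPath != null) return ChromiumSource.Executable(explicitPath)
    if (bundledAvailable()) return ChromiumSource.Bundled
    // Fall back to the first system binary that exists; null means "no browser".
    return candidates.firstOrNull(exists)?.let { ChromiumSource.Executable(it) }
}
```

A `null` result would correspond to the case where no browser is available and only the jina.ai fallback remains.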

HTML to markdown conversion:

  • custom HTML to markdown converter using jsoup (replaces Flexmark dependency). I am planning to extend it in the near future.
  • handles tables, nested lists, code blocks, images, and links
  • two operation modes:
    • keepPagesOpen=false: default, fresh page per request, closed after use
    • keepPagesOpen=true: reuses page for debugging with --show-browser

Web Interface

interface Web {
    suspend fun open(url: String): String
    suspend fun openInSession(sessionId: String, url: String): String
    suspend fun closeSession(sessionId: String)
    fun listSessions(): Set<String>
    suspend fun search(
        query: String,
        provider: String? = null,  // "ddgs" (default, works well) or "anthropic" (not yet implemented, expensive)
        page: Int = 1,
        pageSize: Int = 10,
        region: String = "us-en",
        safesearch: String = "moderate",
        timelimit: String? = null  // "d", "w", "m", "y"
    ): String
}

WebBrowser Interface

interface WebBrowser {
    suspend fun open(url: String): String
    suspend fun openInSession(sessionId: String, url: String): String
    suspend fun closeSession(sessionId: String)
    fun listSessions(): Set<String>
}

CLI options:

  • --show-browser: run chromium in non-headless mode (window visible)
  • --chromium-path=/path/to/chromium: use chromium installed in non-standard path

Usage in GolemScript

// Simple web search
val results = web.search("xemantic AI")

// Filtered search
val recentResults = web.search(
    query = "Kotlin coroutines",
    timelimit = "w",  // Last week
    pageSize = 20
)

// Get webpage content
val content = web.open("https://example.com")

// TODO/WIP: Authenticated browsing session
// I plan on adding support for interacting with websites with Golem
web.openInSession("github", "https://github.com/login")
val privateData = web.openInSession("github", "https://github.com/settings/profile")
web.closeSession("github")

Tests

Unit tests (mocked)

DefaultWebTest.kt:

  • open() with Playwright success/failure scenarios
  • open() fallback to jina.ai
  • openInSession() behavior
  • search() with DDGS service
  • error handling for various failure modes

DefaultWebBrowserTest.kt:

  • HTML to Markdown conversion
  • headings, paragraphs, links, images, lists, tables
  • code blocks and blockquotes

Integration tests

DefaultWebIntegrationTest.kt:

  • real DDGS service integration
  • verifies search result format and content

DefaultWebBrowserIntegrationTest.kt:

  • real Playwright browser integration
  • tests actual web page fetching
  • tests session creation and management

Integration tests are tagged with @Tag("integration") and skip gracefully when required services are unavailable

Dependencies

Python dependencies

  • fastapi
  • uvicorn
  • ddgs

JVM dependencies

  • jsoup (for HTML parsing in Playwright module)

How to run it

Open 4 terminals

Terminal #1 - start Neo4j

./gradlew runNeo4j

Terminal #2 - start DDGS service

./gradlew runDdgsSearch

Terminal #3 - start Golem

export ANTHROPIC_API_KEY=your_key
./gradlew run

Optionally, if you want to see the web browser used by Playwright:

export ANTHROPIC_API_KEY=your_key
./gradlew run --args="--show-browser"

If you want to specify chromium binary located in non-standard path:

export ANTHROPIC_API_KEY=your_key
./gradlew run --args="--show-browser --chromium-path=/snap/bin/chromium"

Terminal #4 - web UI

./gradlew jsBrowserDevelopmentRun --continuous

Running tests

# Unit tests only
./gradlew :golem-xiv-core:test --tests "*DefaultWebTest*"
./gradlew :golem-xiv-playwright:test --tests "*DefaultWebBrowserTest*"

# Integration tests (start DDGS service first!)
./gradlew :golem-xiv-core:test --tests "*Integration*"
./gradlew :golem-xiv-playwright:test --tests "*Integration*"

More

I would like to take it further and let Golem see the web by taking screenshots, clicking on elements, logging in to websites, filling in forms, etc. I am considering extending/improving markanywhere to use it as the HTML to Markdown converter, but I am not sure about that yet.

Member

@morisil morisil left a comment


I left my initial comments, there is much more, but I think we should start by drafting the architecture together first. Let's have a meeting focused on that, and then we can proceed with the implementation.

Comment thread golem-xiv-api-backend/src/main/kotlin/script/GolemScriptApi.kt Outdated
Comment thread golem-xiv-api-backend/src/main/kotlin/script/GolemScriptApi.kt Outdated
Comment thread golem-xiv-api-backend/src/main/kotlin/script/GolemScriptApi.kt Outdated
Comment thread golem-xiv-api-backend/src/main/kotlin/script/GolemScriptApi.kt Outdated
Comment thread golem-xiv-api-backend/src/main/kotlin/script/GolemScriptApi.kt Outdated
Comment thread golem-xiv-api-backend/src/main/kotlin/script/GolemScriptApi.kt
Comment thread golem-xiv-core/src/main/kotlin/script/service/DefaultWeb.kt
Comment thread golem-xiv-core/src/main/kotlin/script/GolemScriptDependencyProvider.kt Outdated
Comment thread golem-xiv-core/src/main/resources/constitution/GolemXivConstitution.md Outdated
@anakori anakori force-pushed the feature/webSearchAndWebOpen branch from e62486d to 7111ead Compare February 9, 2026 14:00
Member Author

anakori commented Feb 9, 2026

Change 1: New SearchProvider Interface

New file: golem-xiv-api-backend/src/main/kotlin/Web.kt:

interface SearchProvider {
    suspend fun search(
        query: String,
        page: Int = 1,
        pageSize: Int = 10,
        region: String = "us-en",
        safeSearch: String = "moderate",
        timeLimit: String? = null
    ): String
}

Original Web interface combined two distinct responsibilities:

  1. content fetching (open, openInSession)
  2. web searching

By enforcing interface segregation:

  • each search provider (DDGS, Anthropic) can implement just the search contract
  • Web interface stays focused on content fetching
  • new providers can be added without modifying Web interface
  • different search implementations can be swapped at runtime

Change 2: renamed open() to fetch() with content negotiation

What changed:

Before:

suspend fun open(url: String): String

After:

val MarkdownContentType = ContentType("text", "markdown")
suspend fun fetch(url: String, accept: ContentType = MarkdownContentType): String

Removed from Web interface:

suspend fun openInSession(sessionId: String, url: String): String
suspend fun closeSession(sessionId: String)
  1. For now we want to simplify web.open() by dropping session management and leaving just basic content fetching. The name web.fetch() is more accurate: it suggests we are only fetching content, whereas web.open() might suggest persistent browser state
  2. The ContentType parameter enables future support for different output formats (HTML, plain text, JSON)

Change 3: search provider map injection

Before:

override suspend fun search(..., provider: String?, ...): String {
    return when (provider) {
        "anthropic" -> throw UnsupportedOperationException(...)
        "ddgs", null -> searchWithDdgs(...)
        else -> throw IllegalArgumentException(...)
    }
}

After:

class DefaultWeb(
    private val searchProviders: Map<String?, SearchProvider>,
    private val httpClient: HttpClient,
    ...
) : Web {
    override suspend fun search(..., provider: String?, ...): String =
        searchProviders[provider]?.search(...)
            ?: throw IllegalArgumentException("Unknown search provider: $provider")
}

By removing hardcoded web search provider logic, we remove the need to modify DefaultWeb when adding new web search providers. Providers are now injected from outside and DefaultWeb class doesn't need to know how any provider works.
This also means we can inject mocked web search providers for testing without needing the actual DDGS service running or Anthropic API key. DefaultWeb is now open for extension (new providers), but closed for modification.
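
As a rough illustration of the testing benefit, here is a simplified, self-contained sketch. It is not the PR's code: the `SearchProvider` below is a non-suspending, single-parameter stand-in (so it runs without kotlinx.coroutines), and `ProviderRegistry` and `FakeSearchProvider` are hypothetical names.

```kotlin
// Simplified stand-in for the PR's SearchProvider (non-suspending, query only).
fun interface SearchProvider {
    fun search(query: String): String
}

// Hypothetical miniature of DefaultWeb's lookup: providers are injected as a map.
class ProviderRegistry(private val providers: Map<String?, SearchProvider>) {
    // A null key stands for the default provider, mirroring searchProviders[null].
    fun search(query: String, provider: String? = null): String {
        val selected = providers[provider]
            ?: throw IllegalArgumentException("Unknown search provider: $provider")
        return selected.search(query)
    }
}

// Fake provider for tests: records queries and returns a canned response.
class FakeSearchProvider(private val response: String) : SearchProvider {
    val queries = mutableListOf<String>()

    override fun search(query: String): String {
        queries += query
        return response
    }
}
```

With `ProviderRegistry(mapOf(null to fake, "ddgs" to fake))`, a test can assert on `fake.queries` without a running DDGS service, and an unknown provider name still fails with `IllegalArgumentException`.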


Change 4: new golem-xiv-ddgs module

New golem-xiv-ddgs module contains DdgsSearchProvider implementing SearchProvider:

class DdgsSearchProvider(
    httpClient: HttpClient,
    ddgsServiceUrl: String = "http://localhost:8001"
) : SearchProvider {
    override suspend fun search(...): String { ... }
}

Before:

golem-xiv-core
  └── DefaultWeb (contains DDGS logic embedded)

After:

golem-xiv-ddgs
  └── DdgsSearchProvider (isolated)
golem-xiv-cognizer-anthropic
  └── AnthropicSearchProvider (isolated)
golem-xiv-core
  └── DefaultWeb (delegates to injected providers)

Golem is supposed to have a modular architecture, allowing for easy composition of custom Golem distributions tailored to the needs of specific clients. DDGS-specific code now lives in its own module with its own dependencies: projects that don't need DDGS can exclude this module entirely, DDGS integration tests can run in isolation, and the module clearly encapsulates everything DDGS-related.


Change 5: AnthropicSearchProvider in golem-xiv-cognizer-anthropic module

New file: golem-xiv-cognizer-anthropic/src/main/kotlin/AnthropicSearchProvider.kt

class AnthropicSearchProvider : SearchProvider {
    override suspend fun search(...): String {
        TODO("Not yet implemented")
    }
}

Why this is needed:

Anthropic-specific code belongs in the Anthropic cognizer module, since this module already depends on anthropic-sdk-kotlin and can properly integrate with the Anthropic API.


Other minor changes:

  • correct camelCase naming convention (safeSearch, timeLimit)
  • removed implementation details from Golem constitution (only API usage shown)
  • removed misleading "backward compatibility" comment on http service

@anakori anakori requested a review from morisil February 9, 2026 14:02

claude bot commented Feb 9, 2026

Pull Request Review: Web Search and Web Browsing Integration

Summary

This is a well-structured PR that adds comprehensive web search and browsing capabilities to Golem XIV. The implementation demonstrates good architectural decisions, thorough testing, and attention to operational concerns.


🟢 Strengths

Architecture & Design

  • Clean separation of concerns: Modular design with separate packages follows SOLID principles well
  • Graceful degradation: Excellent fallback chain (bundled Chromium → system Chromium → jina.ai)
  • Provider pattern for search: Pluggable SearchProvider interface enables easy extensibility
  • Session management: Clean separation between stateless fetching and stateful browsing
  • Resource management: Proper cleanup on shutdown (GolemServer.kt:269-276)

Code Quality

  • Excellent error handling: Informative messages with clear remediation steps
  • Comprehensive logging: Debug logs with content previews aid troubleshooting
  • Thread-safe sessions: Proper use of ConcurrentHashMap, Mutex, and fast-path optimization
  • Well-tested: Both unit tests (mocked) and integration tests with graceful skipping

Documentation

  • Clear CLAUDE.md updates: Accurate feature descriptions and setup requirements
  • Helpful KDoc comments: Clear interface documentation with examples
  • Good inline comments: E.g., grokipedia filtering rationale

🟡 Areas for Improvement

Security Concerns ⚠️

HIGH PRIORITY

  1. Missing URL validation in DefaultWeb.fetch() (DefaultWeb.kt:44)

    • Risk: SSRF vulnerability - could fetch internal URLs like http://localhost:8001/admin, file:///etc/passwd, or cloud metadata (http://169.254.169.254/)
    • Fix: Add URL validation to allow only http/https and block localhost/private IPs
  2. Exception stack traces in Python service (ddgs_service.py:88)

    • exc_info=True could leak sensitive info in production
    • Consider sanitizing error messages

MEDIUM PRIORITY

  1. No resource limits beyond navigationTimeoutMs
    • keepPagesOpen mode could accumulate pages
    • Consider page-level resource limits

Performance Considerations

  1. HTML to Markdown is synchronous (WebBrowsing.kt:276-517)

    • Large HTML could block dispatcher
    • Consider withContext(Dispatchers.Default) for CPU-intensive parsing
  2. No explicit connection pooling

    • Add HttpTimeout and pool config to webHttpClient (GolemServer.kt:222)
  3. Session cleanup is manual-only

    • Abandoned sessions could leak memory
    • Add TTL or LRU eviction in DefaultWebBrowser
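
One possible shape for the TTL suggestion, as a minimal sketch only. `TtlSessions`, its parameters, and the injectable clock are hypothetical and not part of `DefaultWebBrowser`; a real version would also need something to call `evictExpired()` periodically.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical TTL-tracked session store; S would be whatever a session holds (e.g. a browser context).
class TtlSessions<S : Any>(
    private val ttlMillis: Long,
    private val now: () -> Long = System::currentTimeMillis,  // injectable clock for tests
    private val onEvict: (S) -> Unit = {},                     // e.g. close the browser context
) {
    private class Entry<S>(val session: S, @Volatile var lastUsed: Long)

    private val entries = ConcurrentHashMap<String, Entry<S>>()

    // Returns the existing session or creates one, refreshing its last-used timestamp.
    fun getOrCreate(id: String, create: () -> S): S {
        val entry = entries.computeIfAbsent(id) { Entry(create(), now()) }
        entry.lastUsed = now()
        return entry.session
    }

    // Removes sessions idle for longer than the TTL and returns their ids.
    fun evictExpired(): Set<String> {
        val cutoff = now() - ttlMillis
        val evicted = mutableSetOf<String>()
        for ((id, entry) in entries) {
            if (entry.lastUsed < cutoff && entries.remove(id, entry)) {
                onEvict(entry.session)
                evicted += id
            }
        }
        return evicted
    }
}
```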

Code Quality Issues

MEDIUM PRIORITY

  1. Magic numbers in HTML converter (WebBrowsing.kt:304)

    • Extract listOf('\n', ' ', '[', '(') to named constants
  2. Duplicate code in DefaultWebBrowser

    • openWithFreshPage() and openKeepingPageOpen() have ~70% duplication
    • Extract common navigation/conversion logic
  3. Inconsistent null handling

    • searchProviders map allows null keys but filters them in error messages
    • Use null consistently or explicit "default" key
  4. Session API partially exposed

    • DefaultWeb has openInSession(), closeSession(), listSessions() (lines 75-102) but not in Web interface
    • Either remove if unfinished or add to interface - current state is confusing

LOW PRIORITY

  1. Python service binds to 0.0.0.0 (ddgs_service.py:101)

    • Use 127.0.0.1 for better local security
  2. Verbose debug logging

    • Full content previews (DefaultWeb.kt:49-60, 114-127) could impact performance
    • Make configurable or use TRACE level

Test Coverage Gaps

  1. No concurrent session tests - Add test with multiple coroutines accessing same session
  2. No Playwright fallback integration test - Verify Playwright → jina.ai fallback
  3. No malformed HTML edge cases - Test deeply nested, malformed, and very large HTML
  4. Missing negative test cases - Test invalid providers, negative page numbers, etc.

🔵 Additional Observations

Build System

  • Python venv management: Well-done with OS-specific handling and onlyIf optimization
  • Dependencies: Current and appropriate (jsoup is a solid choice)

API Design

  1. Web.fetch() ignores ContentType parameter (GolemScriptApi.kt:100)

    • Either implement content negotiation or remove parameter
  2. String-based enums for safeSearch, timeLimit

    • Acceptable for flexibility but consider validation/documentation

🟢 Minor Suggestions

  1. Add health check endpoint to Golem server (like DDGS /health)
  2. Make jina.ai URL configurable (currently hardcoded at DefaultWeb.kt:105)
  3. Add metrics: search counts, Playwright vs jina.ai ratio, response times
  4. Consider retry logic with exponential backoff for network failures
  5. Add privacy-focused browser args when launching Chromium
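
For suggestion 4, a retry helper might look roughly like the sketch below. It is an illustration only: `retryWithBackoff` and its defaults are hypothetical, it is blocking rather than suspending (to stay free of kotlinx.coroutines), and it retries only on `IOException`.

```kotlin
import java.io.IOException

// Hypothetical helper: retries block() on IOException, multiplying the delay after each failure.
fun <T> retryWithBackoff(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 200,
    factor: Double = 2.0,
    sleep: (Long) -> Unit = { Thread.sleep(it) },  // injectable so tests do not actually wait
    block: () -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    var attempt = 1
    while (true) {
        try {
            return block()
        } catch (e: IOException) {
            // Give up after the last allowed attempt and surface the original error.
            if (attempt == maxAttempts) throw e
        }
        sleep(delayMs)
        delayMs = (delayMs * factor).toLong()
        attempt++
    }
}
```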

🎯 Recommendations

Must Fix Before Merge

  1. SSRF protection: Add URL validation
  2. Session API: Remove unfinished methods or add to interface

Should Fix Before Merge

  1. ContentType parameter: Implement or remove
  2. Code duplication: Extract common logic in DefaultWebBrowser
  3. Session cleanup: Add TTL mechanism

Can Address Later

  1. Performance optimizations (dispatcher, pooling)
  2. Additional test coverage
  3. Metrics and observability

✅ Conclusion

This is a high-quality PR that adds significant value. The implementation is well-architected, thoroughly tested, and demonstrates excellent engineering practices. The fallback mechanisms and error handling are particularly well done.

Recommendation: Approve with minor changes. Address the security concerns (SSRF) and API consistency issues, and this will be ready to merge.

Great work! This feature will significantly enhance Golem's capabilities.


Reviewed: 28 files (+3435/-53 lines) | Date: 2026-02-09

Member Author

anakori commented Feb 9, 2026

Ad security concerns

Ad 1. Missing URL validation in DefaultWeb.fetch():

I believe Claude Bot's SSRF concern here is a false positive, at least for now

Traditional SSRF applies when:

  • An external, untrusted user supplies a URL to a web-facing service
  • The server fetches that URL with its own privileges, exposing internal infrastructure

Golem's architecture is fundamentally different:

  • The entity calling web.fetch() is Golem itself, the AI agent running GolemScript
  • The human operator runs Golem locally and grants it access to their machine
  • There is no external attacker injecting URLs, the AI autonomously decides what to fetch based on its reasoning
  • Restricting localhost/LAN/file access would break legitimate use cases like fetching web search results from the DDGS service at localhost:8001 or accessing internal development servers

However, for a future deployment that is publicly accessible on the web, we should implement SandboxedWeb, which wraps DefaultWeb with URL filtering, rate limits, etc., without changing DefaultWeb:

class SandboxedWeb(
  private val delegate: Web,
  private val urlPolicy: UrlPolicy  // configurable per deployment
) : Web {
  override suspend fun fetch(url: String, accept: ContentType): String {
      urlPolicy.validate(url)  // throws on disallowed URLs
      return delegate.fetch(url, accept)
  }
  // ...
}

@morisil do you want me to implement SandboxedWeb now, in this PR?
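
The `UrlPolicy` type is not defined anywhere in this thread. One possible shape, as a sketch under stated assumptions: the interface below and `PublicOnlyUrlPolicy` are hypothetical, and the policy only inspects the scheme, the literal name `localhost`, and IP literals. Hostnames that resolve to private addresses are not caught here and would need a check at connection time.

```kotlin
import java.net.InetAddress
import java.net.URI
import java.net.URISyntaxException

// Hypothetical contract matching the urlPolicy.validate(url) call: throws on disallowed URLs.
interface UrlPolicy {
    fun validate(url: String)
}

// Hypothetical policy: http/https only, no localhost, no loopback/private/link-local IP literals.
class PublicOnlyUrlPolicy : UrlPolicy {
    private val ipv4Literal = Regex("""\d{1,3}(\.\d{1,3}){3}""")

    override fun validate(url: String) {
        val uri = try {
            URI(url)
        } catch (e: URISyntaxException) {
            throw IllegalArgumentException("Malformed URL: $url", e)
        }
        val scheme = uri.scheme?.lowercase()
        require(scheme == "http" || scheme == "https") { "Scheme not allowed: $scheme" }
        val host = uri.host ?: throw IllegalArgumentException("URL has no host: $url")
        require(!host.equals("localhost", ignoreCase = true)) { "Host not allowed: $host" }
        // Only IP literals are parsed, so no DNS lookup happens in this method.
        if (ipv4Literal.matches(host) || host.startsWith("[")) {
            val address = InetAddress.getByName(host)
            val internal = address.isLoopbackAddress || address.isSiteLocalAddress ||
                address.isLinkLocalAddress || address.isAnyLocalAddress
            require(!internal) { "Address not allowed: $host" }
        }
    }
}
```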

Ad 2. Exception stack traces in Python service (ddgs_service.py:88):

  1. exc_info=True on line 88 writes the stack trace to the server-side log (stdout/stderr). It does NOT send it to the client.
  2. I changed the DDGS service to bind to 127.0.0.1, so it's only reachable from localhost. The only consumer is Golem's process on the same machine. There's no external party to "leak" to.

Ad 3. No resource limits beyond navigationTimeoutMs

keepPagesOpen reuses a single page:

val page = statelessPage ?: browser.newPage().also {
  statelessPage = it
}

It creates one page and navigates it to different URLs. Pages don't accumulate. The statelessPageMutex ensures serialized access. In keepPagesOpen=false mode (the default), each page is created and closed in a try/finally block.
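
The reuse pattern can be shown in isolation with a small sketch. It is not the `DefaultWebBrowser` code: `ReusablePageHolder` is a hypothetical name, the page type is generic, and a `ReentrantLock` is swapped in for the coroutine `Mutex` so the example runs on the standard library alone.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical holder: creates the page once, then hands the same instance to every caller.
class ReusablePageHolder<P : Any>(private val newPage: () -> P) {
    private val lock = ReentrantLock()  // stand-in for statelessPageMutex
    private var page: P? = null

    // Serializes access; the page is created lazily on first use and never duplicated.
    fun <R> withPage(action: (P) -> R): R = lock.withLock {
        val current = page ?: newPage().also { page = it }
        action(current)
    }
}
```

However many times `withPage` is called, `newPage` runs at most once, which is why pages do not accumulate in this mode.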


Ad performance considerations

Ad 1. HTML to Markdown is synchronous

We will migrate HTML to Markdown conversion to markanywhere later, so this is not an issue for now.

Ad 2. No explicit connection pooling

Java's built-in HttpClient has connection pooling by default

Ad 3. Session cleanup is manual-only

Sessions aren't exposed in the Web interface yet


Ad code quality issues

Ad 1. Magic numbers in HTML converter

We will migrate HTML to Markdown conversion to markanywhere later, so this is not an issue for now.

Ad 2. Duplicate code in DefaultWebBrowser

We could extract navigateAndConvert(page, url): String helper, but since we're planning to migrate to markanywhere anyway, that refactoring would be thrown away

Ad 3. Inconsistent null handling

The null key is intentional as it represents the default provider. When Web.search() is called without specifying a provider, the map lookup searchProviders[null] resolves to DDGS

Ad 4. Session API partially exposed

That's how it's supposed to be, we're planning on developing it further in the future

Ad 5. Python service binds to 0.0.0.0

Changed to 127.0.0.1

Ad 6. Verbose debug logging

These are all inside logger.debug { ... } lambda blocks. When the log level is INFO or higher (which it will be in normal operation), the lambdas are never executed, so there is no performance overhead in production
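
The lazy-evaluation argument can be demonstrated with a tiny stand-in logger. This is only an illustration of the mechanism: `TinyLogger` and `Level` are hypothetical and not the logging library used by Golem.

```kotlin
enum class Level { DEBUG, INFO }

// Hypothetical minimal logger: the message is passed as a lambda, not as a String.
class TinyLogger(
    private val threshold: Level,
    private val sink: (String) -> Unit = { println(it) },
) {
    // The lambda is invoked only when DEBUG is enabled, so costly previews are skipped otherwise.
    fun debug(message: () -> String) {
        if (threshold <= Level.DEBUG) sink(message())
    }
}
```

With `TinyLogger(Level.INFO)`, a call such as `debug { content.take(500) }` never evaluates `content.take(500)`.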

Ad Test Coverage Gaps

Will be developed further once markanywhere is ready as the HTML to Markdown converter

Ad Additional Observations

API design

Ad 1. Web.fetch() ignores ContentType parameter

Will be developed further

Member Author

anakori commented Feb 9, 2026

[image] Build failing because GitHub is throwing HTTP 500 errors every few requests 🫠
