fix(polardb): escape user-supplied values in edge_exists() to prevent Cypher injection (CWE-89)#1637

Open
sebastiondev wants to merge 2 commits into MemTensor:main from sebastiondev:security/fix-cwe89-polardb-edge-exists
@sebastiondev
Summary

Fixes a Cypher injection vulnerability in PolarDBGraphDB.edge_exists() (CWE-89). User-supplied values (source_id, target_id, user_name, type) were interpolated directly into a Cypher query string with no escaping, allowing an attacker to break out of string literals and modify the executed query.

  • File: src/memos/graph_dbs/polardb.py
  • Function: PolarDBGraphDB.edge_exists()
  • CWE: CWE-89 (Improper Neutralization of Special Elements used in an SQL Command — applies here to Cypher embedded inside Postgres/AGE)
  • Severity: High

The vulnerability

The pre-fix query was constructed like this:

query = f"SELECT * FROM cypher('{self.db_name}_graph', $$"
query += f"\nMATCH {pattern}"
query += f"\nWHERE a.user_name = '{user_name}' AND b.user_name = '{user_name}'"
query += f"\nAND a.id = '{source_id}' AND b.id = '{target_id}'"
if type != "ANY":
    query += f"\n AND type(r) = '{type}'"

The Cypher body is passed to Postgres/AGE inside a $$ ... $$ dollar-quoted string. Because of the dollar quoting, psycopg2 placeholders cannot reach the Cypher string literals: parameter binding happens at the Postgres layer, not inside the Cypher payload. Without restructuring how the query is issued, the mitigation is therefore to escape single quotes in the values before they are embedded.

A value like x' OR 1=1 // supplied as source_id closes the Cypher string, injects an additional clause, and comments out the rest. The same applies to target_id, user_name, and type.
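To make the breakout concrete, here is a minimal sketch that rebuilds only the WHERE fragment of the pre-fix code with plain f-strings. build_where_unescaped is a hypothetical helper written for this illustration, not a function in polardb.py, and the identifiers node-1, node-2 and alice are made-up values.

```python
def build_where_unescaped(source_id: str, target_id: str, user_name: str) -> str:
    """Rebuild the pre-fix WHERE fragment (hypothetical helper, illustration only)."""
    return (
        f"WHERE a.user_name = '{user_name}' AND b.user_name = '{user_name}'"
        f"\nAND a.id = '{source_id}' AND b.id = '{target_id}'"
    )


benign = build_where_unescaped("node-1", "node-2", "alice")
hostile = build_where_unescaped("x' OR 1=1 //", "node-2", "alice")

# In the hostile case the string literal for a.id closes right after "x";
# everything that follows is parsed as Cypher instead of as data.
print(hostile.splitlines()[1])
```

The benign fragment contains an even number of single quotes, while the hostile one contains an odd number, which is a quick way to see that a literal has been closed early.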

Reaching the sink with attacker input

user_name is taken from the X-User-Name HTTP header via the request-context middleware (src/memos/api/middleware/request_context.py) and consumed downstream by graph operations. source_id / target_id flow in from API memory/edge endpoints. In a default deployment the header is trusted with no strong authentication binding it to the caller, so the value reaches edge_exists() directly under attacker control.

The fix

Escape all user-supplied string values with the existing escape_sql_string() helper (which doubles single quotes — the standard Cypher/SQL string-literal escape) before embedding them in the dollar-quoted Cypher block. direction was already validated against a whitelist; type, when not "ANY", is now also escaped.

src_esc = escape_sql_string(source_id)
tgt_esc = escape_sql_string(target_id)
uname_esc = escape_sql_string(user_name or "")

query = f"SELECT * FROM cypher('{self.db_name}_graph', $$"
query += f"\nMATCH {pattern}"
query += f"\nWHERE a.user_name = '{uname_esc}' AND b.user_name = '{uname_esc}'"
query += f"\nAND a.id = '{src_esc}' AND b.id = '{tgt_esc}'"
if type != "ANY":
    type_esc = escape_sql_string(type)
    query += f"\n AND type(r) = '{type_esc}'"

The change is minimal and matches the escaping pattern already used elsewhere in polardb.py.
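The body of escape_sql_string() is not shown in this PR. The sketch below uses a stand-in, escape_single_quotes, which assumes the helper does nothing more than double single quotes, as described above. It illustrates the effect on the payload and is not a copy of the repository's implementation.

```python
def escape_single_quotes(value: str) -> str:
    """Stand-in for escape_sql_string(); assumed to double every single quote."""
    return value.replace("'", "''")


payload = "x' OR 1=1 //"
escaped = escape_single_quotes(payload)

# The embedded quote is doubled, so the payload no longer ends the literal
# early; the only unpaired quotes are the two delimiters added by the f-string.
clause = f"AND a.id = '{escaped}'"
print(clause)
```

Values without single quotes, such as ordinary node IDs, come back unchanged, so legitimate lookups should build the same query text as before the fix.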

Tests

A new test module tests/graph_dbs/test_polardb_edge_exists_cwe89.py was added with 11 cases covering:

  • Single-quote injection in source_id is doubled, raw payload not present in query
  • Single-quote injection in target_id is doubled
  • Single-quote injection in user_name is doubled (both occurrences)
  • type value is escaped when supplied
  • direction whitelist still rejects unknown values with ValueError
  • Legitimate identifiers pass through unchanged
  • ANY direction and ANY type still produce a valid query
  • user_name falls back to self.config.user_name correctly
  • The full Cypher body remains structurally valid

Run:

pytest tests/graph_dbs/test_polardb_edge_exists_cwe89.py -q
# 11 passed

A small tests/graph_dbs/conftest.py was added so the tests can import memos without the full env.
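The test module itself is not reproduced in this description. As a rough sketch of what a few of the listed cases might look like, the example below assumes a hypothetical free function, build_edge_exists_query, that mirrors the post-fix query construction shown earlier. The real tests exercise PolarDBGraphDB.edge_exists() and may be structured differently.

```python
def escape_single_quotes(value: str) -> str:
    """Stand-in for escape_sql_string(); assumed to double single quotes."""
    return value.replace("'", "''")


def build_edge_exists_query(db_name, pattern, source_id, target_id,
                            user_name, edge_type="ANY"):
    """Hypothetical function mirroring the post-fix query construction."""
    src_esc = escape_single_quotes(source_id)
    tgt_esc = escape_single_quotes(target_id)
    uname_esc = escape_single_quotes(user_name or "")
    query = f"SELECT * FROM cypher('{db_name}_graph', $$"
    query += f"\nMATCH {pattern}"
    query += f"\nWHERE a.user_name = '{uname_esc}' AND b.user_name = '{uname_esc}'"
    query += f"\nAND a.id = '{src_esc}' AND b.id = '{tgt_esc}'"
    if edge_type != "ANY":
        query += f"\n AND type(r) = '{escape_single_quotes(edge_type)}'"
    return query


def test_source_id_quote_is_doubled():
    payload = "x' OR 1=1 //"
    query = build_edge_exists_query("memos", "(a)-[r]->(b)", payload, "n2", "alice")
    assert "a.id = 'x'' OR 1=1 //'" in query   # escaped form is present
    assert "a.id = 'x' OR" not in query        # raw breakout is absent


def test_user_name_quote_is_doubled_in_both_occurrences():
    query = build_edge_exists_query("memos", "(a)-[r]->(b)", "n1", "n2", "o'brien")
    assert query.count("'o''brien'") == 2


def test_legitimate_identifiers_pass_through_unchanged():
    query = build_edge_exists_query("memos", "(a)-[r]->(b)", "n1", "n2",
                                    "alice", "RELATED")
    assert "a.id = 'n1' AND b.id = 'n2'" in query
    assert "type(r) = 'RELATED'" in query
```

Asserting on the generated query text keeps tests like these independent of a live database, which fits the lightweight conftest.py described above.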

Adversarial review

Before submitting, we tried to disprove this. We checked whether psycopg2 parameterization could safely cover the values — it can't, because the Cypher payload sits inside a $$ ... $$ dollar-quoted Postgres string, so the inner literals are opaque to the driver. We checked whether upstream callers sanitize the IDs or user_name — they don't; user_name in particular comes from the X-User-Name request header and is forwarded as-is. We checked whether an alternate code path (edge_exists_old) makes the fix redundant — it's parameterized but isn't the one in use. The exploit precondition is simply "caller can set request headers and the PolarDB graph backend is configured", which is the normal operating mode.

Scope

Only edge_exists() is changed. Other functions in the same file have separate handling and are out of scope for this PR.

cc @lewiswigmore

…njection (CWE-89)

The edge_exists() method in the PolarDB graph database module
interpolated source_id, target_id, user_name, and type directly
into Cypher query strings via f-strings without any sanitization.

An attacker providing a crafted value such as source_id="x' OR 1=1 --"
could manipulate the Cypher WHERE clause to bypass authorization
checks or extract unauthorized data.

Fix: escape all four user-supplied values via the existing
escape_sql_string() function (which doubles single quotes) before
interpolating them into the Cypher query, consistent with how other
methods in the same file (get_edges, _build_user_name_and_kb_ids_conditions_cypher)
already handle escaping.