docs: Cognitive Security manifesto and threat taxonomy v1#23657
BrianCLong merged 1 commit into main
Core category documents for Summit Cognitive's GTM positioning. Extracted from #23637 (docs-only, skipping lockfile/workflow changes that had conflicts).

- docs/cognitive-security/manifesto.md: Category thesis, core sentence, paradigm shift argument (admissibility over alignment), CSP/quarantine/subsumption pillars. v1.0.0 draft.
- docs/cognitive-security/threat-taxonomy-v1.md: Structured threat taxonomy for cognitive security failures (stochastic drift, prompt injection, epistemic poisoning, context hijacking, etc.)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Caution: Review failed. The pull request is closed.

Walkthrough
Two new documentation files introduce cognitive security concepts, including a manifesto reframing AI safety around structural integrity of machine reasoning and a threat taxonomy classifying four canonical failure modes. No functional code is modified.

Estimated code review effort: 1 (Trivial), ~3 minutes
Code Review
This pull request introduces the Cognitive Security Manifesto and the initial version of the Cognitive Security Threat Taxonomy. The review feedback suggests refining terminology for consistency: replacing 'intentionally hallucinated' with 'maliciously crafted' to better reflect malicious intent, and standardizing the terminology around 'quarantine' instead of 'rejection' to align with the framework's core pillars. Additionally, the review recommends updating internal references to 'Admissibility Gate' for better documentation cohesion.
* An attacker embeds a hidden prompt injection payload inside a seemingly benign PDF resume, causing the HR parsing AI to output a recommendation for hire regardless of qualifications.
* A user adds invisible text to a webpage that instructs a summarization agent to exfiltrate private session tokens via markdown image links.
* A third-party API dependency returns intentionally hallucinated JSON that exploits the parser's loose schema, causing downstream logic errors in an autonomous agent.
The phrase "intentionally hallucinated" is conceptually contradictory. Hallucinations in the context of AI are typically stochastic, unintentional errors in reasoning or grounding. If a third-party API is providing deceptive data specifically to exploit a system, it is more accurately described as "maliciously crafted" or "poisoned" data, which aligns better with the definition of "Corrupted Cognition" provided in this section.
Suggested change:
- * A third-party API dependency returns intentionally hallucinated JSON that exploits the parser's loose schema, causing downstream logic errors in an autonomous agent.
+ * A third-party API dependency returns maliciously crafted JSON that exploits the parser's loose schema, causing downstream logic errors in an autonomous agent.
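To make the "loose schema" risk in this example concrete, here is a minimal sketch of a strict parser for a third-party response. It is an illustration only: the field names in `EXPECTED_FIELDS`, the `SchemaViolation` exception, and the `parse_strict` helper are all hypothetical and do not appear in the taxonomy or the manifesto.

```python
import json

# Hypothetical response shape, for illustration only: field name -> required type.
EXPECTED_FIELDS = {"id": str, "status": str, "items": list}


class SchemaViolation(ValueError):
    """Raised when a payload does not match the expected shape exactly."""


def parse_strict(raw: str) -> dict:
    """Parse a JSON object and refuse anything outside the expected schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaViolation(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise SchemaViolation("top-level value must be a JSON object")

    # A loose parser would silently ignore extra keys; a strict one treats
    # them as a signal that the payload was not produced by the expected API.
    unknown = sorted(set(data) - set(EXPECTED_FIELDS))
    missing = sorted(set(EXPECTED_FIELDS) - set(data))
    if unknown or missing:
        raise SchemaViolation(f"unknown fields: {unknown}, missing fields: {missing}")

    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data[name], expected_type):
            raise SchemaViolation(f"field {name!r} must be {expected_type.__name__}")

    return data
```

Rejecting unknown keys and wrong types at the boundary narrows what a crafted payload can smuggle into downstream agent logic, although it does not validate the meaning of the values themselves.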
## 4. Non-Admissible Cognition

**Definition:**
Occurs when the output fails structural, schema, or relational validation checks defined by the Cognitive Security Protocol (CSP), resulting in the rejection of the data packet by the Admissibility Gates.
To maintain consistency with the "Quarantine and Subsumption" pillar defined in the Cognitive Security Manifesto (line 41), the result of an admissibility failure should be described as "quarantine" rather than "rejection". This reinforces the framework's emphasis on isolating and analyzing failures rather than simply dropping them.
Suggested change:
- Occurs when the output fails structural, schema, or relational validation checks defined by the Cognitive Security Protocol (CSP), resulting in the rejection of the data packet by the Admissibility Gates.
+ Occurs when the output fails structural, schema, or relational validation checks defined by the Cognitive Security Protocol (CSP), resulting in the quarantine of the data packet by the Admissibility Gates.
**Examples:**

* An agent outputs a JSON response missing a required evidence ID field, causing the data to be rejected by the WriteSet firewall.
The term "WriteSet firewall" is introduced here without being defined or mentioned in the Manifesto. For terminological consistency across the documentation, this should refer to the "Admissibility Gate". Additionally, using "quarantined" instead of "rejected" aligns with the core pillars of the Cognitive Security posture.
Suggested change:
- * An agent outputs a JSON response missing a required evidence ID field, causing the data to be rejected by the WriteSet firewall.
+ * An agent outputs a JSON response missing a required evidence ID field, causing the data to be quarantined by the Admissibility Gate.
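The distinction between quarantining and rejecting can be sketched in a few lines. The following is a hypothetical illustration, not the CSP's actual design: the `AdmissibilityGate` class, its `submit` method, and the `evidence_id` field name are assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the real CSP schema is not shown in this PR.
REQUIRED_FIELDS = ("evidence_id",)


@dataclass
class AdmissibilityGate:
    """Toy gate that isolates failing packets instead of dropping them."""

    admitted: list = field(default_factory=list)
    quarantine: list = field(default_factory=list)

    def submit(self, packet: dict) -> bool:
        """Admit the packet if it passes validation, otherwise quarantine it."""
        missing = [name for name in REQUIRED_FIELDS if not packet.get(name)]
        if missing:
            # Keep the packet and the reason so the failure can be analyzed later.
            self.quarantine.append(
                {
                    "packet": packet,
                    "reason": "missing required field(s): " + ", ".join(missing),
                }
            )
            return False
        self.admitted.append(packet)
        return True
```

A possible usage: `gate.submit({"claim": "y"})` returns `False` and leaves the packet in `gate.quarantine` with its reason attached, which is the behavior the word "quarantined" implies and "rejected" does not.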
Docs-only extraction from #23637. Skipped lockfile/workflow changes that conflicted.
Adds the foundational category documents for Summit Cognitive GTM positioning.
🤖 Generated with Claude Code