Merged
47 changes: 47 additions & 0 deletions docs/cognitive-security/manifesto.md
@@ -0,0 +1,47 @@
---
version: 1.0.0
date: 2026-03-31
status: draft
---

# Cognitive Security Manifesto

## The Category Thesis

Artificial intelligence has fundamentally altered the security landscape. Traditional cybersecurity focused on protecting systems from unauthorized access, but it is ill-equipped to handle systems that can reason, synthesize, and act autonomously. AI safety is broken because it relies on probabilistic alignment rather than structural guarantees. We do not need better vibes; we need an epistemic immune system. Cognitive security is not just about stopping prompt injection; it is about ensuring the structural integrity of machine reasoning.

## Core Category Sentence

We secure the structural integrity of AI reasoning to prevent cognitive failure, enforcing admissibility and determinism over probabilistic alignment.

## The Paradigm Shift: Admissibility Over Alignment

The prevailing paradigm of AI safety attempts to teach models to behave well through reinforcement learning from human feedback (RLHF). This is fundamentally flawed. It treats symptoms rather than causes, attempting to patch a leaky boat with polite suggestions.

We advocate a shift from probabilistic alignment to deterministic admissibility.

1. **Verification Before Execution:** AI outputs must not be trusted implicitly. They must be validated against deterministic cognitive structures.

2. **Epistemic Integrity:** The provenance, lineage, and structural soundness of information entering and leaving an AI system must be unbroken.

3. **Graph-Based Grounding:** Reality is relational. Our security models must enforce relational invariants through Reality Graphs, Belief Graphs, and Narrative Graphs.
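
The second principle is concrete enough to sketch. One common way to make lineage tamper-evident is a hash chain, in which each provenance record commits to the digest of its predecessor. This is a minimal illustration, assuming a hypothetical `ProvenanceRecord` with `source` and `content` fields; the manifesto does not prescribe a lineage format.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical record: one hop in the lineage of a piece of information.
    source: str
    content: str
    prev_hash: str  # digest of the preceding record; "" for the first record

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so identical records always hash identically.
        payload = json.dumps(
            {"source": self.source, "content": self.content, "prev": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: list, source: str, content: str) -> list:
    """Return a new chain with a record that commits to the current head."""
    prev = chain[-1].digest() if chain else ""
    return chain + [ProvenanceRecord(source, content, prev)]


def lineage_is_unbroken(chain: list) -> bool:
    """True if every record points at the digest of the record before it."""
    expected_prev = ""
    for record in chain:
        if record.prev_hash != expected_prev:
            return False
        expected_prev = record.digest()
    return True
```

Editing any record changes its digest and breaks the link from its successor. The final record has no successor, so detecting tampering there requires storing or signing the head digest separately.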

## The Messaging Hierarchy

To build a true Cognitive Security posture, organizations must adopt three core pillars:

### 1. Structural Admissibility Gates

AI systems must operate behind Admissibility Gates that enforce cryptographic and structural validation of all context and outputs. If a cognitive packet cannot be verified against the Reality Graph, it is quarantined.
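
A minimal sketch of such a gate, assuming an HMAC-SHA256 tag over the packet payload for the cryptographic check and a plain set of entity IDs standing in for the Reality Graph. `CognitivePacket`, `AdmissibilityGate`, and their fields are hypothetical names; no packet format or signing scheme is specified here.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class CognitivePacket:
    # Hypothetical packet: a payload, the Reality Graph entities it refers to, and an HMAC tag.
    payload: bytes
    entities: frozenset
    tag: str


@dataclass
class AdmissibilityGate:
    key: bytes                # shared HMAC-SHA256 key (assumption)
    reality_graph_nodes: set  # stand-in for a real Reality Graph: known entity IDs
    quarantine: list = field(default_factory=list)  # (packet, reason) pairs

    def sign(self, payload: bytes) -> str:
        return hmac.new(self.key, payload, hashlib.sha256).hexdigest()

    def admit(self, packet: CognitivePacket) -> bool:
        # Cryptographic check: constant-time comparison of expected and supplied tags.
        if not hmac.compare_digest(self.sign(packet.payload), packet.tag):
            self.quarantine.append((packet, "bad signature"))
            return False
        # Structural check: every referenced entity must already exist in the graph.
        unknown = packet.entities - self.reality_graph_nodes
        if unknown:
            self.quarantine.append((packet, f"unknown entities: {sorted(unknown)}"))
            return False
        return True
```

Failing packets are appended to `quarantine` rather than dropped, matching the behavior described above. A deployment where verifiers should not be able to forge tags would more likely use asymmetric signatures instead of a shared HMAC key.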

### 2. The Cognitive Security Protocol (CSP)

The CSP is a standardized, machine-readable schema that defines what constitutes valid cognition within an enterprise. It acts as the foundational constitution to which all AI agents must strictly adhere.
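
To illustrate what "machine-readable" might mean in practice, the sketch below treats a CSP fragment as plain data (field name mapped to expected type and whether it is required) and checks packets against it. The field names and the closed-schema rule are assumptions for illustration, not part of a published CSP.

```python
from typing import Any

# Hypothetical CSP fragment: field name -> (expected type, required?).
CSP_SCHEMA: dict = {
    "agent_id": (str, True),
    "claim": (str, True),
    "evidence_ids": (list, True),
    "confidence": (float, False),
}


def csp_violations(packet: dict, schema: dict) -> list:
    """Return human-readable violations; an empty list means the packet conforms."""
    violations = []
    for name, (expected, required) in schema.items():
        if name not in packet:
            if required:
                violations.append(f"missing required field '{name}'")
            continue
        value: Any = packet[name]
        if not isinstance(value, expected):
            violations.append(
                f"field '{name}' should be {expected.__name__}, got {type(value).__name__}"
            )
    # The schema is closed: fields it does not declare are violations too.
    for name in sorted(packet.keys() - schema.keys()):
        violations.append(f"undeclared field '{name}'")
    return violations
```

A real CSP would more plausibly be written in an existing, language-neutral schema language such as JSON Schema; the dictionary form only keeps the sketch dependency-free.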

### 3. Quarantine and Subsumption

When cognitive failure occurs—whether through external attack or internal stochastic drift—the system must isolate the anomaly in a Quarantine Graph. From there, human and automated operators analyze, subsume, and integrate the failure to inoculate the broader Epistemic Immune System.
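
One possible reading of this pillar, sketched under heavy assumptions: the Quarantine Graph links failure classes to the packets isolated under them, and subsumption records a learned signature that a gate could consult afterwards. `QuarantineGraph`, `isolate`, `subsume`, and the notion of a string "signature" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuarantineGraph:
    # failure class -> IDs of packets isolated under it (a minimal two-level graph)
    isolated: dict = field(default_factory=dict)
    # signatures learned from analyzed failures; a gate could consult these later
    immunities: set = field(default_factory=set)

    def isolate(self, packet_id: str, failure_class: str) -> None:
        self.isolated.setdefault(failure_class, []).append(packet_id)

    def subsume(self, failure_class: str, signature: str) -> list:
        """Record what operators learned from a failure class and release its packets."""
        analyzed = self.isolated.pop(failure_class, [])
        self.immunities.add(signature)
        return analyzed

    def is_immune(self, signature: str) -> bool:
        return signature in self.immunities
```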

## The Path Forward

We stand at the precipice of cognitive automation. The systems we build today will define the epistemics of tomorrow. We must stop relying on the black-box promises of model providers and start engineering deterministic cognitive constraints. Security is no longer just about the network; it is about the mind of the machine.
77 changes: 77 additions & 0 deletions docs/cognitive-security/threat-taxonomy-v1.md
@@ -0,0 +1,77 @@
---
version: 1.0.0
date: 2026-03-31
status: draft
---

# Cognitive Security Threat Taxonomy v1

This taxonomy defines the four canonical failure classes of Cognitive Security.

## 1. Corrupted Cognition

**Definition:**
Occurs when the input context, prompt structures, or reasoning chains are maliciously manipulated or poisoned, leading the AI system to process invalid or hijacked epistemics.

**Sub-types:**

* Context Poisoning
* Prompt Injection
* Instruction Hijacking

**Examples:**

* An attacker embeds a hidden prompt injection payload inside a seemingly benign PDF resume, causing the HR parsing AI to output a recommendation for hire regardless of qualifications.
* A user adds invisible text to a webpage that instructs a summarization agent to exfiltrate private session tokens via markdown image links.
* A third-party API dependency returns intentionally hallucinated JSON that exploits the parser's loose schema, causing downstream logic errors in an autonomous agent.
Contributor

medium

The phrase "intentionally hallucinated" is conceptually contradictory. Hallucinations in the context of AI are typically stochastic, unintentional errors in reasoning or grounding. If a third-party API is providing deceptive data specifically to exploit a system, it is more accurately described as "maliciously crafted" or "poisoned" data, which aligns better with the definition of "Corrupted Cognition" provided in this section.

Suggested change
* A third-party API dependency returns intentionally hallucinated JSON that exploits the parser's loose schema, causing downstream logic errors in an autonomous agent.
* A third-party API dependency returns maliciously crafted JSON that exploits the parser's loose schema, causing downstream logic errors in an autonomous agent.
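
Two of the Corrupted Cognition examples suggest concrete checks: scanning untrusted input for invisible characters, and scanning agent output for markdown image links to unapproved hosts. This is a minimal sketch; the function names and the allowlist approach are assumptions. Invisible text also takes forms these checks will not see (for example CSS-hidden HTML or white-on-white text), so they complement rather than replace structural validation.

```python
import re
import unicodedata
from urllib.parse import urlparse

# Markdown image syntax ![alt](url ...). Clients that render markdown fetch the URL
# automatically, so data smuggled into a query string reaches the remote host.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\((?P<url>[^)\s]+)[^)]*\)")


def hidden_characters(text: str) -> list:
    """Names of invisible format characters (Unicode category Cf) found in text.

    Cf also covers legitimate characters such as the zero-width joiner used in
    emoji sequences, so a hit is a signal for review, not a verdict.
    """
    return [
        unicodedata.name(ch, hex(ord(ch)))
        for ch in text
        if unicodedata.category(ch) == "Cf"
    ]


def external_image_urls(output: str, allowed_hosts: set) -> list:
    """Markdown image URLs in agent output whose host is not allowlisted."""
    flagged = []
    for match in MD_IMAGE.finditer(output):
        url = match.group("url")
        host = urlparse(url).hostname  # None for relative URLs
        if host is not None and host not in allowed_hosts:
            flagged.append(url)
    return flagged
```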


## 2. Non-Compliant Cognition

**Definition:**
Occurs when the AI system generates outputs or takes actions that violate established enterprise policies, guardrails, or regulatory frameworks, despite operating on non-corrupted inputs.

**Sub-types:**

* Guardrail Evasion
* Policy Bypass
* Regulatory Infraction

**Examples:**

* A financial advisory agent provides specific, actionable stock trading advice despite a strict system prompt forbidding financial recommendations, due to a highly persuasive conversational turn.
* A customer service bot reveals a hidden discount code meant only for internal employees because the user asked it to roleplay as a developer testing the system.
* An AI tool generates code that includes a hardcoded secret or vulnerability, violating the organization's secure coding standards.
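
The last example lends itself to a deterministic policy check: scan agent-generated code for likely hardcoded secrets before it is accepted. The rule names and patterns below are illustrative assumptions; dedicated scanners such as gitleaks or detect-secrets ship far larger rule sets and entropy heuristics.

```python
import re

# Illustrative rules only (assumption): rule name -> compiled pattern.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"""(?i)\b(?:password|secret|api_key|token)\s*=\s*["'][^"']{8,}["']"""
    ),
}


def policy_findings(generated_code: str) -> list:
    """Return (line number, rule name) pairs for lines that look like hardcoded secrets."""
    findings = []
    for lineno, line in enumerate(generated_code.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

Reading the key from the environment (`os.environ["API_KEY"]`) produces no finding, which is the behavior a secure coding standard usually asks for.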

## 3. Non-Reproducible Cognition

**Definition:**
Occurs when an AI system produces non-deterministic, heavily drifted, or hallucinated outputs that cannot be reliably reproduced or traced back to grounded facts.

**Sub-types:**

* Stochastic Drift
* Hallucination Cascades
* Contextual Amnesia

**Examples:**

* A legal analysis bot invents a non-existent legal precedent (hallucination) and uses it as the foundational argument for all subsequent case analysis in the session.
* An agent running the same evaluation task on the same data returns three wildly different summarization metrics across three separate runs.
* A multi-agent system progressively loses track of its original objective over a long context window, leading to an emergent, off-topic loop.
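
The second example suggests a simple harness: run the same evaluation on the same data several times and compare the spread of the resulting metric. `task` stands in for any zero-argument callable returning a score, and the 5% relative tolerance is an assumed policy value, not one the taxonomy defines.

```python
import statistics
from typing import Callable


def reproducibility_check(
    task: Callable[[], float], runs: int = 3, rel_tolerance: float = 0.05
) -> tuple:
    """Run the same task several times; pass only if the relative spread of the
    metric, (max - min) / |mean|, stays within rel_tolerance."""
    scores = [task() for _ in range(runs)]
    spread = max(scores) - min(scores)
    mean = statistics.fmean(scores)
    if mean == 0:
        # Avoid dividing by zero: only identical results count as reproducible.
        return spread == 0, scores
    return spread / abs(mean) <= rel_tolerance, scores
```

Pinning sampling settings (for example temperature 0 and a fixed seed) narrows the spread but, for many hosted models, does not guarantee identical outputs, which is why the check measures drift rather than assuming determinism.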

## 4. Non-Admissible Cognition

**Definition:**
Occurs when the output fails structural, schema, or relational validation checks defined by the Cognitive Security Protocol (CSP), resulting in the rejection of the data packet by the Admissibility Gates.
Contributor

medium

To maintain consistency with the "Quarantine and Subsumption" pillar defined in the Cognitive Security Manifesto (line 41), the result of an admissibility failure should be described as "quarantine" rather than "rejection". This reinforces the framework's emphasis on isolating and analyzing failures rather than simply dropping them.

Suggested change
Occurs when the output fails structural, schema, or relational validation checks defined by the Cognitive Security Protocol (CSP), resulting in the rejection of the data packet by the Admissibility Gates.
Occurs when the output fails structural, schema, or relational validation checks defined by the Cognitive Security Protocol (CSP), resulting in the quarantine of the data packet by the Admissibility Gates.


**Sub-types:**

* Schema Violation
* Relational Inconsistency
* Unverified Epistemics

**Examples:**

* An agent outputs a JSON response missing a required evidence ID field, causing the data to be rejected by the WriteSet firewall.
Contributor

medium

The term "WriteSet firewall" is introduced here without being defined or mentioned in the Manifesto. For terminological consistency across the documentation, this should refer to the "Admissibility Gate". Additionally, using "quarantined" instead of "rejected" aligns with the core pillars of the Cognitive Security posture.

Suggested change
* An agent outputs a JSON response missing a required evidence ID field, causing the data to be rejected by the WriteSet firewall.
* An agent outputs a JSON response missing a required evidence ID field, causing the data to be quarantined by the Admissibility Gate.

* A knowledge graph generator asserts a relationship between two entities that explicitly contradicts a verified invariant in the central Reality Graph.
* A data extraction pipeline submits an event with a timestamp that predates the creation of the system, violating temporal constraints in the Admissibility Gate.
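
The examples in this section map onto deterministic checks an Admissibility Gate could run before writing an assertion into a graph. This sketch assumes a hypothetical `Assertion` record, a Reality Graph represented as a set of `(subject, relation, object)` triples, one illustrative invariant, and an arbitrary system creation time; none of these are defined by the CSP in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    # Hypothetical packet: one typed edge proposed for the knowledge graph.
    subject: str
    relation: str
    obj: str
    evidence_id: Optional[str]
    timestamp: datetime


# Hypothetical invariants and constants; a real CSP would define these.
MUTUALLY_EXCLUSIVE = [("parent_of", "child_of")]  # cannot both hold for one ordered pair
SYSTEM_EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)  # assumed system creation time


def admissibility_errors(a: Assertion, reality_graph: set) -> list:
    """Check one assertion against the example rules; an empty list means admissible."""
    errors = []
    # Schema violation: every assertion must cite evidence.
    if not a.evidence_id:
        errors.append("schema: missing evidence_id")
    # Relational inconsistency: the new edge contradicts a verified triple.
    for r1, r2 in MUTUALLY_EXCLUSIVE:
        for new, existing in ((r1, r2), (r2, r1)):
            if a.relation == new and (a.subject, existing, a.obj) in reality_graph:
                errors.append(f"relational: {new} contradicts verified {existing}")
    # Temporal constraint: no event may predate the system itself.
    if a.timestamp < SYSTEM_EPOCH:
        errors.append("temporal: timestamp predates system creation")
    return errors
```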