Summary
Implement selective context compression to reduce token usage while preserving important information for generation.
Background
The Python REFRAG implementation uses a compress-select-expand pipeline: it chunks passages into fixed-size segments, scores each chunk's importance by its similarity to the query, expands only the top fraction p of chunks, and compresses the rest with LLM summarization.
Reference: refrag_ollama.py:610-625 (REFRAGOllama.compress_and_select)
Features
Chunking
- Split passages into k-token chunks (default: k=64)
- Use the GPT-2 tokenizer for chunking (lightweight and consistent)
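A minimal Go sketch of the chunking step follows. It assumes the passage has already been converted to GPT-2 token IDs by an external tokenizer (the Go standard library has none), and `chunkTokens` is a hypothetical helper name, not existing code.

```go
package main

import "fmt"

// chunkTokens splits token IDs into consecutive chunks of at most k tokens.
// The last chunk may be shorter than k. Chunks are sub-slices that share the
// backing array of tokens, so no token data is copied.
// Assumption: tokens were produced by a GPT-2 tokenizer elsewhere.
func chunkTokens(tokens []int, k int) [][]int {
	if k <= 0 {
		k = 64 // default chunk size from the proposal
	}
	chunks := make([][]int, 0, (len(tokens)+k-1)/k)
	for start := 0; start < len(tokens); start += k {
		end := start + k
		if end > len(tokens) {
			end = len(tokens)
		}
		chunks = append(chunks, tokens[start:end])
	}
	return chunks
}

func main() {
	tokens := make([]int, 150) // stand-in for a tokenized passage
	for i := range tokens {
		tokens[i] = i
	}
	for i, c := range chunkTokens(tokens, 64) {
		fmt.Printf("chunk %d: %d tokens\n", i, len(c))
	}
}
```

With 150 tokens and k=64 this yields chunks of 64, 64, and 22 tokens.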
Importance Scoring
- Encode each chunk and the query as embedding vectors
- Compute the cosine similarity between each chunk embedding and the query embedding
- Rank chunks by this similarity score (their importance)
Selective Expansion
- Expand the top fraction p of chunks (default: p = 0.25, i.e., 25%)
- Compress low-importance chunks with an LLM (Claude Haiku or similar)
- Assemble the expanded and compressed chunks into the final context string
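A sketch of selective expansion under several assumptions: `Summarizer` is a hypothetical interface standing in for the LLM call (Claude Haiku or similar), `ranked` is the output of the importance-scoring step, the number of expanded chunks is the ceiling of p times the chunk count, and chunks are reassembled in their original order. `truncSummarizer` is a toy stand-in so the example runs without an LLM.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Summarizer abstracts the LLM used for compression. Hypothetical interface.
type Summarizer interface {
	Summarize(text string) (string, error)
}

// buildContext keeps the top `ratio` fraction of chunks (by rank) verbatim and
// replaces every other chunk with its summary, preserving original order.
func buildContext(chunks []string, ranked []int, ratio float64, s Summarizer) (string, error) {
	// ceil(ratio * n); the epsilon guards against floating-point error pushing
	// an exact product just above an integer. Clamped to [0, len(ranked)].
	nExpand := int(math.Ceil(ratio*float64(len(chunks)) - 1e-9))
	if nExpand < 0 {
		nExpand = 0
	}
	if nExpand > len(ranked) {
		nExpand = len(ranked)
	}
	expand := make(map[int]bool, nExpand)
	for _, i := range ranked[:nExpand] {
		expand[i] = true
	}
	var b strings.Builder
	for i, c := range chunks {
		if i > 0 {
			b.WriteString("\n")
		}
		if expand[i] {
			b.WriteString(c) // high-importance chunk: kept verbatim
			continue
		}
		sum, err := s.Summarize(c) // low-importance chunk: compressed
		if err != nil {
			return "", fmt.Errorf("summarize chunk %d: %w", i, err)
		}
		b.WriteString(sum)
	}
	return b.String(), nil
}

// truncSummarizer is a toy stand-in for an LLM: it keeps only the first word.
type truncSummarizer struct{}

func (truncSummarizer) Summarize(text string) (string, error) {
	if f := strings.Fields(text); len(f) > 0 {
		return f[0] + "...", nil
	}
	return "", nil
}

func main() {
	chunks := []string{"alpha beta gamma", "delta epsilon", "zeta eta theta", "iota kappa"}
	ranked := []int{2, 0, 1, 3} // e.g. output of the importance-scoring step
	out, err := buildContext(chunks, ranked, 0.25, truncSummarizer{})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With four chunks and p = 0.25, only chunk 2 stays verbatim; the other three are replaced by their summaries. Emitting chunks in ranked order instead would be an equally plausible design.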
Performance
- Reduces context from ~10k tokens to ~2k tokens
- Preserves the most important (query-relevant) information
- Enables longer generation within limited context windows
Implementation Tasks
API Design
```go
type CompressionOptions struct {
	ChunkSize      int     // Chunk size in tokens (default: 64)
	SelectionRatio float64 // Fraction of chunks to expand (default: 0.25)
	CompressModel  string  // LLM model for compression (e.g., "claude-haiku")
}

type HybridIndex struct {
	// ... existing fields ...
}

func (h *HybridIndex) CompressContext(results []*SearchResult, query string, opts *CompressionOptions) (string, error)
```
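One way to apply the defaults documented in the struct comments is a helper like the hypothetical `withDefaults` below. It treats Go zero values as "unset", so an explicit `SelectionRatio` of 0 cannot be expressed, and it also replaces ratios above 1. Falling back to `"claude-haiku"` when `CompressModel` is empty is an assumption based on the example in the comment.

```go
package main

import "fmt"

// CompressionOptions mirrors the struct in the API design above.
type CompressionOptions struct {
	ChunkSize      int     // Chunk size in tokens (default: 64)
	SelectionRatio float64 // Fraction of chunks to expand (default: 0.25)
	CompressModel  string  // LLM model for compression (e.g., "claude-haiku")
}

// withDefaults returns a copy of opts with unset or out-of-range fields
// replaced by the documented defaults. A nil opts yields all defaults.
func withDefaults(opts *CompressionOptions) CompressionOptions {
	var o CompressionOptions
	if opts != nil {
		o = *opts
	}
	if o.ChunkSize <= 0 {
		o.ChunkSize = 64
	}
	if o.SelectionRatio <= 0 || o.SelectionRatio > 1 {
		o.SelectionRatio = 0.25
	}
	if o.CompressModel == "" {
		o.CompressModel = "claude-haiku" // assumed fallback model name
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", withDefaults(nil))
	fmt.Printf("%+v\n", withDefaults(&CompressionOptions{SelectionRatio: 0.5}))
}
```

`CompressContext` could call this once at the top so callers may pass `nil` for `opts`.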
Benefits
- 5x reduction in context tokens (10k → 2k)
- Enables longer generation with limited context
- Preserves query-relevant information
- Proven effective in Python REFRAG