Add selective context compression for RAG generation

## Summary
Implement selective context compression to reduce token usage while preserving important information for generation.

## Background
The Python REFRAG implementation uses a compress-select-expand pipeline: it chunks passages into fixed-size segments, scores each chunk's importance by its similarity to the query, expands only the top fraction p of chunks, and compresses the rest with LLM summarization.

Reference: `refrag_ollama.py:610-625` (REFRAGOllama.compress_and_select)

## Features

### Chunking
- Split passages into k-token chunks (default: k=64)
- Use the GPT-2 tokenizer for chunking (lightweight, consistent)
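
The chunking step can be sketched as follows. This is a minimal illustration, not the REFRAG code: `chunkTokens` is a hypothetical helper, and whitespace splitting stands in for a real GPT-2 tokenizer so the example has no dependencies.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkTokens splits tokens into consecutive chunks of at most k tokens.
// The final chunk may be shorter than k. It returns nil when k <= 0.
func chunkTokens(tokens []string, k int) [][]string {
	if k <= 0 {
		return nil
	}
	chunks := make([][]string, 0, (len(tokens)+k-1)/k)
	for start := 0; start < len(tokens); start += k {
		end := start + k
		if end > len(tokens) {
			end = len(tokens)
		}
		chunks = append(chunks, tokens[start:end])
	}
	return chunks
}

func main() {
	// Assumption: strings.Fields is a placeholder for a GPT-2 tokenizer.
	tokens := strings.Fields("retrieval augmented generation packs many passages into one prompt")
	for i, c := range chunkTokens(tokens, 4) {
		fmt.Printf("chunk %d: %s\n", i, strings.Join(c, " "))
	}
}
```

With the default k=64 the same function would be called as `chunkTokens(tokens, 64)`.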

### Importance Scoring
- Encode chunks with query context
- Compute cosine similarity between chunk and query
- Rank chunks by importance
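
A rough sketch of the scoring and ranking steps, assuming chunk and query embeddings are already available as `[]float64` vectors (the embedding model itself is out of scope here). The names `cosine` and `rankChunks` are illustrative and not part of any existing API.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rankChunks returns chunk indices ordered from most to least similar
// to the query. Ties keep their original order.
func rankChunks(query []float64, chunks [][]float64) []int {
	scores := make([]float64, len(chunks))
	order := make([]int, len(chunks))
	for i, c := range chunks {
		scores[i] = cosine(query, c)
		order[i] = i
	}
	sort.SliceStable(order, func(i, j int) bool {
		return scores[order[i]] > scores[order[j]]
	})
	return order
}

func main() {
	// Toy 2-dimensional embeddings, used only for illustration.
	query := []float64{1, 0}
	chunks := [][]float64{{0, 1}, {1, 0}, {1, 1}}
	fmt.Println(rankChunks(query, chunks)) // prints [1 2 0]
}
```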

### Selective Expansion
- Expand (keep in full) the top fraction p of chunks (default: p=0.25, i.e., 25%)
- Compress low-importance chunks with LLM (Claude Haiku or similar)
- Build compressed context string
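
The selection step might look like the sketch below. It makes several assumptions: `Compressor` and `buildContext` are hypothetical names, the number of expanded chunks is rounded up, chunks are emitted in their original order, and the LLM call is replaced by a stub that truncates text.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Compressor shortens one chunk. In a real system this would wrap an LLM
// call; here it is only a function type (an assumption of this sketch).
type Compressor func(chunk string) (string, error)

// buildContext keeps the highest-scoring fraction of chunks in full,
// compresses the rest, and joins everything in original chunk order.
func buildContext(chunks []string, scores []float64, ratio float64, compress Compressor) (string, error) {
	if len(chunks) != len(scores) {
		return "", fmt.Errorf("got %d chunks but %d scores", len(chunks), len(scores))
	}
	order := make([]int, len(chunks))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(i, j int) bool {
		return scores[order[i]] > scores[order[j]]
	})

	// Assumption: round up, so any ratio > 0 expands at least one chunk.
	keep := int(math.Ceil(ratio * float64(len(chunks))))
	if keep > len(chunks) {
		keep = len(chunks)
	}
	if keep < 0 {
		keep = 0
	}
	expanded := make(map[int]bool, keep)
	for _, idx := range order[:keep] {
		expanded[idx] = true
	}

	parts := make([]string, 0, len(chunks))
	for i, c := range chunks {
		if expanded[i] {
			parts = append(parts, c)
			continue
		}
		short, err := compress(c)
		if err != nil {
			return "", fmt.Errorf("compress chunk %d: %w", i, err)
		}
		parts = append(parts, short)
	}
	return strings.Join(parts, "\n"), nil
}

func main() {
	// Stub compressor: keeps the first three words instead of calling an LLM.
	firstWords := func(chunk string) (string, error) {
		w := strings.Fields(chunk)
		if len(w) > 3 {
			w = w[:3]
		}
		return strings.Join(w, " ") + " ...", nil
	}
	chunks := []string{
		"background material on the project history",
		"the query asks about compression of retrieved context",
		"related notes on tokenizer behaviour and limits",
		"unrelated boilerplate text from the page footer",
	}
	scores := []float64{0.2, 0.9, 0.4, 0.1}
	ctx, err := buildContext(chunks, scores, 0.25, firstWords)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ctx)
}
```

Keeping the original order is a design choice of this sketch. Emitting chunks in rank order would be an equally plausible reading of the feature list.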

### Performance
- Reduces context from ~10k tokens to ~2k tokens
- Preserves the most important information
- Enables longer generation with limited context windows

## Implementation Tasks
- [ ] Add a GPT-2-compatible tokenizer for chunking (tiktoken's `gpt2` encoding or similar)
- [ ] Implement chunk importance scoring
- [ ] Add LLM compression for low-importance chunks
- [ ] Integrate with HybridIndex API
- [ ] Add compression metrics/logging
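
For the metrics task, one possible shape is a small stats struct logged after each compression call. `CompressionStats` and its fields are hypothetical, and the token counts below simply reuse the approximate 10k and 2k figures from this issue.

```go
package main

import "log"

// CompressionStats is a hypothetical record of one compression call.
type CompressionStats struct {
	InputTokens  int // tokens in the retrieved context before compression
	OutputTokens int // tokens in the final compressed context
}

// Ratio reports how many times smaller the output is than the input.
// It returns 0 when OutputTokens is 0 to avoid dividing by zero.
func (s CompressionStats) Ratio() float64 {
	if s.OutputTokens == 0 {
		return 0
	}
	return float64(s.InputTokens) / float64(s.OutputTokens)
}

func main() {
	// Illustrative numbers only, taken from the ~10k to ~2k estimate.
	s := CompressionStats{InputTokens: 10000, OutputTokens: 2000}
	log.Printf("compressed %d -> %d tokens (%.1fx)", s.InputTokens, s.OutputTokens, s.Ratio())
}
```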

## API Design
```go
type CompressionOptions struct {
    ChunkSize      int     // Chunk size in tokens (default: 64)
    SelectionRatio float64 // Fraction of chunks to expand (default: 0.25)
    CompressModel  string  // LLM model for compression (e.g., "claude-haiku")
}

type HybridIndex struct {
    // ... existing fields ...
}

func (h *HybridIndex) CompressContext(results []*SearchResult, query string, opts *CompressionOptions) (string, error)
```
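
Since `opts` is a pointer, callers may pass `nil` or leave fields unset. A minimal sketch of filling in the documented defaults is shown below. `withDefaults` is a hypothetical helper, and treating zero values as "unset" is an assumption of this sketch (it means a caller cannot request a ratio of exactly 0).

```go
package main

import "fmt"

// CompressionOptions mirrors the struct proposed in the API design.
type CompressionOptions struct {
	ChunkSize      int     // Chunk size in tokens (default: 64)
	SelectionRatio float64 // Fraction of chunks to expand (default: 0.25)
	CompressModel  string  // LLM model for compression
}

// withDefaults returns a copy of opts in which missing or out-of-range
// fields are replaced by the documented defaults. A nil opts yields
// all defaults.
func withDefaults(opts *CompressionOptions) CompressionOptions {
	out := CompressionOptions{ChunkSize: 64, SelectionRatio: 0.25}
	if opts == nil {
		return out
	}
	if opts.ChunkSize > 0 {
		out.ChunkSize = opts.ChunkSize
	}
	if opts.SelectionRatio > 0 && opts.SelectionRatio <= 1 {
		out.SelectionRatio = opts.SelectionRatio
	}
	out.CompressModel = opts.CompressModel
	return out
}

func main() {
	fmt.Printf("%+v\n", withDefaults(nil))
	fmt.Printf("%+v\n", withDefaults(&CompressionOptions{ChunkSize: 128, CompressModel: "claude-haiku"}))
}
```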

## Benefits
- 5x reduction in context tokens (10k → 2k)
- Enables longer generation with limited context
- Preserves query-relevant information
- Proven effective in Python REFRAG
