Merge Wikipedia articles across languages into one comprehensive, source-attributed page.
Wikipedia articles vary dramatically across languages. A politician's English page might have 3 references while the French version has 25. A scientist's Hindi page might cover their early life in detail while English focuses on achievements. wikifuse merges these perspectives into a single, richer article with full source attribution.
pip install wikifuse
# Compare English-only vs merged English+French for Rachida Dati
wikifuse diff --qid Q27182 --base en --compare en,fr --out ./rachida_dati/ --no-llm

Example output:
$ wikifuse diff --qid Q27182 --base en --compare en,fr --out ./rachida_dati/ --no-llm
Base (en only): 3,245 words, 12 references
Merged (en+fr): 5,891 words, 47 references
Gain: +81% words, +292% references
See example diff output comparing Rachida Dati's English vs English+French articles.
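The gain figures in the example output are relative increases over the base article. The sketch below recomputes them from the word and reference counts shown above; `percent_gain` is a hypothetical helper written for illustration, not part of wikifuse, and the tool's own rounding may differ slightly.

```python
def percent_gain(base: int, merged: int) -> float:
    """Relative increase of `merged` over `base`, in percent."""
    if base <= 0:
        raise ValueError("base must be positive")
    return (merged - base) / base * 100.0


# Counts taken from the example output above.
words = percent_gain(3245, 5891)
refs = percent_gain(12, 47)

print(f"Gain: {words:+.1f}% words, {refs:+.1f}% references")
# prints "Gain: +81.5% words, +291.7% references"
```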
The diff command shows what you gain by merging across languages:
wikifuse diff --qid Q27182 --base en --compare en,fr --out ./output/

wikifuse fetch --qid Q1058 --languages en,hi --out ./out/Q1058
wikifuse merge --qid Q1058 --languages en,hi --out ./out/Q1058
wikifuse render --ir ./out/Q1058/wikifuse.ir.json --out ./out/Q1058/wikifuse.wikitext
wikifuse preview --ir ./out/Q1058/wikifuse.ir.json --out ./out/Q1058/preview.html

- Fetch: Download articles from multiple language Wikipedias using Wikidata QID
- Translate: Non-English text translated to English for alignment
- Align: Sentence embeddings cluster semantically similar claims
- Merge: Deduplicate while preserving unique content and references
- Render: Output wikitext or HTML with full provenance
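The Align and Merge steps above can be pictured with a small sketch. It assumes each claim already has an English translation and an embedding vector (toy three-dimensional values here), groups claims by cosine similarity with a simple greedy pass, and keeps one wording per group while pooling references. The `Claim` class, the `merge_claims` function, the 0.9 threshold, and the greedy clustering are all illustrative assumptions, not wikifuse's actual implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str                  # sentence, already translated to English
    lang: str                  # language edition the claim came from
    vector: list[float]        # sentence embedding (toy values in this sketch)
    refs: list[str] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_claims(claims, base_lang="en", threshold=0.9, max_refs=3):
    """Greedily cluster similar claims, then emit one merged claim per cluster."""
    clusters: list[list[Claim]] = []
    for claim in claims:
        for cluster in clusters:
            # Compare against the first claim of each existing cluster.
            if cosine(claim.vector, cluster[0].vector) >= threshold:
                cluster.append(claim)
                break
        else:
            clusters.append([claim])  # no match: unique content, keep it

    merged = []
    for cluster in clusters:
        # Prefer the base-language wording when the cluster has one.
        rep = next((c for c in cluster if c.lang == base_lang), cluster[0])
        refs: list[str] = []
        for c in cluster:
            for r in c.refs:
                if r not in refs:  # deduplicate, preserving order
                    refs.append(r)
        merged.append({
            "text": rep.text,
            "sources": sorted({c.lang for c in cluster}),
            "refs": refs[:max_refs],  # cap, in the spirit of max_refs_per_claim
        })
    return merged


claims = [
    Claim("Claim A, English wording.", "en", [1.0, 0.0, 0.1], ["en-ref-1"]),
    Claim("Claim A, translated from French.", "fr", [0.98, 0.05, 0.12],
          ["fr-ref-1", "fr-ref-2"]),
    Claim("Claim B, found only in French.", "fr", [0.0, 1.0, 0.2], ["fr-ref-3"]),
]
merged = merge_claims(claims)
for m in merged:
    print(m["sources"], m["refs"], m["text"])
```

In this toy run the two versions of claim A collapse into one entry attributed to both `en` and `fr` with three pooled references, while claim B survives as unique French-only content.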
- wikifuse.ir.json: Intermediate Representation with sections, claims, and attribution
- wikifuse.wikitext: MediaWiki wikitext ready for review
- preview.html: HTML preview
- diff.html: Side-by-side comparison (from diff command)
# wikifuse.yaml
qid: Q1058
languages: [en, hi]
base_language: en
max_refs_per_claim: 3
emit: [ir, wikitext, html]

pip install wikifuse

For LLM-powered merging (uses OpenAI):
pip install wikifuse
export OPENAI_API_KEY=your-key
wikifuse merge --qid Q1058 --languages en,hi --out ./output/

Without LLM (basic text merge):
wikifuse merge --qid Q1058 --languages en,hi --out ./output/ --no-llm

- Wikipedia text is CC BY-SA 4.0; remixes must include attribution
- Generated ATTRIBUTION.md includes source language and revision IDs
- Wikidata statements are under compatible open licenses
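As a rough illustration of what a per-source attribution entry could contain, the sketch below builds one line from a language code, an article title, and a revision ID. The line format and the `attribution_line` helper are assumptions made for this example (the actual layout of ATTRIBUTION.md is not specified here), and the title and revision IDs are placeholders. The `index.php?oldid=` form is the standard MediaWiki permanent link to a specific revision.

```python
def attribution_line(lang: str, title: str, revision_id: int) -> str:
    """Build one hypothetical attribution entry for a source revision."""
    # Permanent link to the exact revision that was used.
    url = f"https://{lang}.wikipedia.org/w/index.php?oldid={revision_id}"
    return (
        f"- [{title}]({url}) "
        f"({lang}.wikipedia.org, revision {revision_id}), CC BY-SA 4.0"
    )


# Placeholder title and revision IDs, for illustration only.
sources = [("en", "Example article", 123456789),
           ("fr", "Article d'exemple", 987654321)]
lines = [attribution_line(lang, title, rev) for lang, title, rev in sources]
print("\n".join(lines))
```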
Issues and PRs welcome. Focus areas:
- Enhanced translation service integration
- Better cross-lingual alignment models
- Performance optimization for large articles
MIT