These notebooks are from CodeCut, which features open-source Python data science tools explained in clear, digestible tutorials. Subscribe to get:
- Weekly articles with step-by-step guides
- Newsletters 3x per week (2-minute digests)
This repository contains 45+ comprehensive technical articles covering data science, MLOps, and AI tools.
Here are some examples of what you'll find in this repository:
Data Engineering
- PySpark SQL - DataFrames, window functions, aggregations
- DuckDB - Fast analytical queries for data scientists
- DVC - Data versioning and experiment tracking
- Delta Lake - Production lakehouses with delta-rs
Machine Learning
- Bayesian Optimization - Efficient hyperparameter tuning
- MLflow - RAG evaluation and experiment tracking
- pytest for Data Scientists - Testing ML pipelines
LLM Applications
- LangChain + Ollama - Private AI workflows
- Pydantic AI - Type-safe LLM applications
- RAG Pipelines - Intelligent QA systems
- pgvector - Vector search for embeddings
Data Visualization
- Python Visualization Libraries - Matplotlib, Plotly, Seaborn comparison
- Manim - Mathematical animations like 3Blue1Brown
Data Utilities
- Faker - Generate realistic test data
- PRegEx - Human-readable regex patterns
- Loguru - Simplified Python logging
- Hydra - Configuration management
Prerequisites: Python 3.9+
Quick Start:
# Clone repository
git clone https://github.com/khuyentran1401/codecut-blog.git
cd codecut-blog
# Install dependencies (listed at top of each notebook)
pip install package1 package2
Use uv for faster installs:
uv pip install package1 package2
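Since dependencies are listed at the top of each notebook, it can help to collect them before installing. The following is a minimal sketch, not part of the repository. It assumes the dependencies appear as `pip install ...` lines (optionally prefixed with `!` or `%`) somewhere in the notebook's cells; the notebooks may list them differently. The helper name `find_pip_packages` and the sample notebook are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumption: dependencies are written as `pip install ...` lines,
# optionally prefixed with `!` or `%`, or run through `uv pip install`.
PIP_LINE = re.compile(r"^\s*[!%]?\s*(?:uv\s+)?pip\s+install\s+(.+)$")


def find_pip_packages(notebook: dict) -> list[str]:
    """Return package names found on `pip install` lines of a parsed .ipynb."""
    packages: list[str] = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as a list of lines or one string.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            match = PIP_LINE.match(line)
            if not match:
                continue
            for token in match.group(1).split():
                # Skip flags such as -q or --upgrade and avoid duplicates.
                if not token.startswith("-") and token not in packages:
                    packages.append(token)
    return packages


def load_notebook(path: str) -> dict:
    """Read a notebook file from disk as a plain dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical in-memory notebook used only for illustration.
sample = {
    "cells": [
        {"cell_type": "code", "source": ["!pip install -q duckdb polars\n"]},
        {"cell_type": "markdown", "source": "pip install duckdb loguru"},
    ]
}
print(" ".join(find_pip_packages(sample)))
```

The printed names can be passed to `pip install` or `uv pip install`. To scan a real file, call `find_pip_packages(load_notebook(path))` with the path of a notebook from the cloned repository.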
All articles are copyright © Khuyen Tran. Code examples within articles are MIT licensed for reuse.