@catalyst-cooperative

Catalyst Cooperative

Catalyst is a small data engineering cooperative working on electricity regulation and climate change.

Catalyst Cooperative is a data engineering and analysis consultancy, specializing in energy system and utility financial data. Our current focus is on the US electricity and natural gas sectors. We primarily serve non-profit organizations, academic researchers, journalists, climate policy advocates, public policymakers, and occasionally smaller business users.

We believe public data should be freely available and easy to use by those working in the public interest. Whenever possible, we release our software under the MIT License, and our data products under the Creative Commons Attribution 4.0 License.

If you're interested in hiring us, email hello@catalyst.coop. We can often make accommodations for smaller/grassroots organizations and frequently collaborate with open source contributors.

Contact Us 💌

Services We Provide

  • Programmatic acquisition, cleaning, and integration of public data sources.
  • Data-oriented software development.
  • Compilation of new machine-readable data sources from regulatory filings, legislation, and other public information.
  • Data warehousing and dashboard development.
  • Both ad-hoc and replicable production data analysis.
  • Translation of existing ad-hoc data wrangling workflows into replicable data pipelines written in Python.
  • Reproducible data pipeline design, implementation, and ongoing maintenance.

Tools We Use 🔨 🔧

  • Python is our primary language for everything.
  • Pandas, the Swiss Army knife of tabular data manipulation in Python.
  • Dagster for orchestrating and parallelizing our data pipelines.
  • DuckDB as a performant, columnar, analysis-oriented embedded database. The SQLite of analytical databases.
  • Flask for building web apps like the PUDL Data Viewer.
  • Pixi, a fast, ergonomic conda package management command-line tool.
  • Marimo Notebooks for interactive dashboards and data exploration.
  • Polars DataFrames for working with larger data tables that don't fit into memory or are computationally intensive.
  • Apache Parquet to persist larger data tables to disk.
  • Pydantic for managing and validating settings and our collection of metadata.
  • Pandera to specify dataframe schemas and data validations in conjunction with Dagster.
  • Pyodide to let users access and play with our data in-browser.
  • SQLite for local storage and distribution of tabular, relational data.
  • JupyterLab for interactive data wrangling, exploration, and visualizations.
  • Scikit Learn to construct machine learning pipelines.
  • Splink for fast, generalized entity matching / record linkage.
  • MLFlow for ML experiment and artifact tracking, mostly in the context of our entity matching / record linkage work.
  • Google Batch to minimize the infrastructure we need to manage for our nightly builds.
  • Hypothesis for more robust data-oriented unit testing.
  • Zenodo provides long-term, programmatically accessible, versioned archives of all our raw inputs.
  • Sphinx for building our documentation, incorporating much of our structured metadata directly using Jinja templates.
  • The Frictionless Framework as a standard interchange model for tabular data.
  • VS Code is our primary code editor, ever more deeply integrated with GitHub.
  • pre-commit to enforce code formatting and style standards.
  • GitHub Actions to run our continuous integration and coordinate our nightly builds and data scraping jobs.
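As a rough illustration of the kind of replicable cleaning step these tools support, here is a minimal sketch that uses only Python's standard library (`csv` and `sqlite3`) in place of Pandas, Pandera, or Dagster. The table name `plant_capacity`, its columns, and the sample rows are hypothetical and are not taken from PUDL.

```python
from __future__ import annotations

import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent casing, stray whitespace, and a
# missing value, as often found in public regulatory data.
RAW_CSV = """plant_id,plant_name,capacity_mw
1, Alpha Station ,120.5
2,BETA STATION,
3,gamma station,75
"""


def clean_row(row: dict[str, str]) -> tuple[int, str, float | None]:
    """Normalize one raw record into typed, consistently formatted values."""
    plant_id = int(row["plant_id"])
    plant_name = row["plant_name"].strip().title()
    capacity = row["capacity_mw"].strip()
    # Keep a missing capacity as NULL rather than guessing a value.
    capacity_mw = float(capacity) if capacity else None
    return plant_id, plant_name, capacity_mw


def load(raw_csv: str, conn: sqlite3.Connection) -> None:
    """(Re)create the hypothetical table and load the cleaned rows."""
    conn.execute("DROP TABLE IF EXISTS plant_capacity")
    conn.execute(
        "CREATE TABLE plant_capacity ("
        "plant_id INTEGER PRIMARY KEY, plant_name TEXT NOT NULL, capacity_mw REAL)"
    )
    rows = [clean_row(r) for r in csv.DictReader(io.StringIO(raw_csv))]
    conn.executemany("INSERT INTO plant_capacity VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(RAW_CSV, conn)
# SUM() skips NULLs, so the missing capacity does not distort the total.
total = conn.execute("SELECT SUM(capacity_mw) FROM plant_capacity").fetchone()[0]
print(total)  # 195.5
```

Dropping and recreating the table makes the step idempotent. Rerunning it on the same input yields the same database, which is the property that makes a pipeline replicable. At real scale, the same steps would more likely be expressed with the dataframe, validation, and orchestration tools listed above.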

Tools We're Studying 🚧

  • Agent Skills to give LLM-based coding agents dynamic, specialized context.
  • Zensical, a beautiful, blazing fast static site generator written in Rust.
  • OpenSearch for processing, indexing, and programmatically managing large troves of unstructured documents.
  • HuggingFace Hub as another platform for distributing larger datasets and pre-trained machine-learning models specific to energy system data.

Adjacent Projects 🧠

Organizational Friends & Allies 💞

Funders & Clients 💰 💵

Business & Employment 🌲 🌲

Catalyst is a democratic workplace and a member of the US Federation of Worker Cooperatives. We exist to help our members earn a decent living while working for a more just, livable, and sustainable world. Our income comes from a mix of grant funding and client work. We only work with mission-aligned clients.

We are an entirely remote organization, and have been since well before the coronavirus pandemic. Our members are scattered all across North America from Alaska to Mexico. We enjoy a great deal of autonomy and flexibility in determining our own work-life balance and schedules. Membership entails working a minimum of 1000 hours each year for the co-op.

As a small 100% employee-owned cooperative, we are able to compensate members through an unusual mix of wages and profit sharing, including:

  • An hourly wage (currently $36.75/hr)
  • Tax-deferred employer retirement plan contributions (proportional to wages, up to 25% of wages)
  • Tax-advantaged patronage dividends (proportional to hours worked, unlimited but subject to profitability)
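To make the mix concrete, here is a back-of-the-envelope sketch for a member working the 1,000-hour annual minimum at the current $36.75/hr wage, with retirement contributions at the 25% cap. The dividend pool and the co-op's total hours are hypothetical inputs, since real dividends depend on profitability, and the pro rata split is an assumption based on "proportional to hours worked." This is not a description of how Catalyst actually computes payouts.

```python
from __future__ import annotations

HOURLY_WAGE = 36.75         # current hourly wage, $/hr
MAX_RETIREMENT_RATE = 0.25  # retirement contributions are capped at 25% of wages
MIN_ANNUAL_HOURS = 1000     # minimum annual hours for membership


def annual_compensation(
    hours: float,
    retirement_rate: float,
    dividend_pool: float,      # hypothetical profit pool set aside for dividends
    coop_total_hours: float,   # hypothetical total hours worked by all members
) -> dict[str, float]:
    """Estimate one member's annual wages, retirement, and dividend (sketch)."""
    if hours < 0 or coop_total_hours <= 0:
        raise ValueError("hours must be >= 0 and coop_total_hours must be > 0")
    wages = hours * HOURLY_WAGE
    # Retirement contributions are proportional to wages, but never above the cap.
    retirement = wages * min(retirement_rate, MAX_RETIREMENT_RATE)
    # Assumed pro rata split of the pool by share of hours worked.
    dividend = dividend_pool * hours / coop_total_hours
    return {
        "wages": wages,
        "retirement": retirement,
        "dividend": dividend,
        "total": wages + retirement + dividend,
    }


print(annual_compensation(MIN_ANNUAL_HOURS, 0.25, 50_000, 10_000))
# {'wages': 36750.0, 'retirement': 9187.5, 'dividend': 5000.0, 'total': 50937.5}
```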

We also reimburse ourselves for expenses related to maintaining a home office, and provide a monthly health insurance stipend.

Candidates must do at least 500 hours of contract work for the cooperative over at least six months, at which point they will be considered for membership.

Check our website to see if we're recruiting new members.

Pinned

  1. pudl (Public)

    The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.

    Python · 579 stars · 133 forks

  2. ferc-xbrl-extractor (Public)

    A tool for converting FERC filings published in XBRL into SQLite databases.

    Python · 16 stars · 3 forks

  3. pudl-archiver (Public)

    A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.

    Python · 14 stars · 6 forks

  4. pudl-examples (Public)

    Example Jupyter notebooks hosted on Kaggle that demonstrate how to work with US energy data from PUDL.

    Jupyter Notebook · 20 stars · 5 forks

  5. catalystcoop-handbook (Public)

    A readthedocs site containing Catalyst Cooperative policies.

    Python · 2 stars · 2 forks
