diff --git a/.changeset/config.json b/.changeset/config.json index 18d149766d..9a36f6a6c4 100644 --- a/.changeset/config.json +++ b/.changeset/config.json @@ -6,6 +6,7 @@ ["myst-common", "myst-config", "myst-frontmatter", "myst-spec-ext"], ["myst-to-jats", "jats-to-myst"], ["myst-to-tex", "tex-to-myst"], + ["myst-to-md", "myst-to-ipynb"], ["myst-parser", "myst-roles", "myst-directives", "myst-to-html"], ["mystmd", "myst-cli", "myst-migrate"] ], diff --git a/.changeset/witty-tigers-hunt.md b/.changeset/witty-tigers-hunt.md new file mode 100644 index 0000000000..c0b0a78a0d --- /dev/null +++ b/.changeset/witty-tigers-hunt.md @@ -0,0 +1,7 @@ +--- +"myst-frontmatter": patch +"myst-to-ipynb": patch +"myst-cli": patch +--- + +Add ipynb as export format diff --git a/docs/creating-notebooks.md b/docs/creating-notebooks.md new file mode 100644 index 0000000000..19573ec181 --- /dev/null +++ b/docs/creating-notebooks.md @@ -0,0 +1,117 @@ +--- +title: Creating Jupyter Notebooks +description: Export MyST documents to Jupyter Notebook (.ipynb) format with optional CommonMark markdown and embedded images. +--- + +You can export MyST documents to Jupyter Notebook (`.ipynb`) format using `myst build`. The exported notebooks can use either MyST markdown (for use with [jupyterlab-myst](https://github.com/jupyter-book/jupyterlab-myst)) or plain CommonMark markdown compatible with vanilla Jupyter Notebook, JupyterLab, and Google Colab. + +## Basic usage + +Add an `exports` entry with `format: ipynb` to your page frontmatter: + +```{code-block} yaml +:filename: my-document.md +--- +exports: + - format: ipynb + output: exports/my-document.ipynb +--- +``` + +Build the notebook with: + +```bash +myst build my-document.md --ipynb +``` + +Or build all ipynb exports in the project: + +```bash +myst build --ipynb +``` + +## CommonMark markdown + +By default, exported notebooks use MyST markdown in their cells. 
If you need compatibility with environments that don't support MyST (vanilla Jupyter, Colab, etc.), set `markdown: commonmark`: + +```{code-block} yaml +:filename: my-document.md +--- +exports: + - format: ipynb + markdown: commonmark + output: exports/my-document.ipynb +--- +``` + +With `markdown: commonmark`, MyST-specific syntax is converted to plain CommonMark equivalents: + +```{list-table} CommonMark conversions +:header-rows: 1 +- * MyST syntax + * CommonMark output +- * `:::{note}` admonitions + * `> **Note**` blockquotes +- * `` {math}`E=mc^2` `` roles + * `$E=mc^2$` dollar math +- * `$$` math blocks + * `$$...$$` (preserved) +- * `:::{exercise}` directives + * **Exercise N** bold headers +- * `:::{proof:theorem}` directives + * **Theorem N** bold headers +- * Figures with captions + * `![alt](url)` with italic caption +- * Tab sets + * Bold tab titles with content +- * `{image}` directives + * `![alt](url)` images +- * `(label)=` targets + * Dropped (no CommonMark equivalent) +- * `% comments` + * Dropped +``` + +## Embedding images as cell attachments + +By default, images in exported notebooks reference external files. To create fully self-contained notebooks with images embedded as base64 cell attachments, set `images: attachment`: + +```{code-block} yaml +:filename: my-document.md +--- +exports: + - format: ipynb + markdown: commonmark + images: attachment + output: exports/my-document.ipynb +--- +``` + +With `images: attachment`: +- Local images are read from disk and base64-encoded +- Image references become `![alt](attachment:filename.png)` +- Each cell includes an `attachments` field with the image data +- Remote images (http/https URLs) are left as references + +This is useful for distributing notebooks, uploading to Google Colab, or sharing via email where external image files may not be available. 
+ +## Export options + +```{list-table} ipynb export options +:header-rows: 1 +- * Option + * Values + * Description +- * `format` + * `ipynb` + * Required — specifies notebook export +- * `output` + * string + * Output filename or folder +- * `markdown` + * `myst` (default), `commonmark` + * Markdown format for notebook cells +- * `images` + * `reference` (default), `attachment` + * How to handle images — references or embedded attachments +``` diff --git a/docs/documents-exports.md b/docs/documents-exports.md index dcffdd55a6..2dacf30152 100644 --- a/docs/documents-exports.md +++ b/docs/documents-exports.md @@ -1,6 +1,6 @@ --- title: Exporting overview -description: Create an export for PDF, LaTeX, Typst, Docx, JATS, or CITATION.cff in your page or project frontmatter, and use `myst build` to build the export. +description: Create an export for PDF, LaTeX, Typst, Docx, JATS, Jupyter Notebook (ipynb), or CITATION.cff in your page or project frontmatter, and use `myst build` to build the export. --- You can export MyST content into one or more static documents, and optionally bundle them with a MyST website. This section gives an overview of the Exporting process and major configuration options. 
@@ -29,6 +29,8 @@ Below are supported export types and links to documentation for further reading: * [](./creating-citation-cff.md) - * `MyST Markdown` * [](#export:myst) +- * `Jupyter Notebook` + * [](./creating-notebooks.md) ``` ## Where to configure options for exports @@ -127,6 +129,9 @@ You can configure the CLI command in a number of ways: `myst build --pdf --docx` : Build `pdf` (LaTeX or Typst) exports and `docx` in the project +`myst build --ipynb` +: Build `ipynb` (Jupyter Notebook) exports in the project + `myst build my-paper.md` : Build all exports in a specific page diff --git a/docs/frontmatter.md b/docs/frontmatter.md index b5d359dd3f..1066fcb8e5 100644 --- a/docs/frontmatter.md +++ b/docs/frontmatter.md @@ -438,7 +438,7 @@ For usage information, see [](./documents-exports.md). * - `id` - a string - a local identifier that can be used to reference the export * - `format` - - one of `pdf` (built with $\LaTeX$ or Typst, depending on the template), `tex` (raw $\LaTeX$ files), `pdf+tex` (both PDF and raw $\LaTeX$ files) `typst` (raw Typst files and built PDF file), `docx`, `md`, `jats`, or `meca` + - one of `pdf` (built with $\LaTeX$ or Typst, depending on the template), `tex` (raw $\LaTeX$ files), `pdf+tex` (both PDF and raw $\LaTeX$ files) `typst` (raw Typst files and built PDF file), `docx`, `md`, `jats`, `meca`, or `ipynb` * - `template` - a string - name of an existing [MyST template](https://github.com/myst-templates) or a local path to a template folder. Templates are only available for `pdf`, `tex`, `typst`, and `docx` formats. 
* - `output` diff --git a/docs/myst.yml b/docs/myst.yml index d89e1de0fe..c02e17de15 100644 --- a/docs/myst.yml +++ b/docs/myst.yml @@ -122,6 +122,7 @@ project: - file: creating-word-documents.md - file: creating-jats-xml.md - file: creating-citation-cff.md + - file: creating-notebooks.md - file: plugins.md children: - file: javascript-plugins.md diff --git a/packages/myst-cli/package.json b/packages/myst-cli/package.json index 7ea330b5e6..40b781e9b2 100644 --- a/packages/myst-cli/package.json +++ b/packages/myst-cli/package.json @@ -88,6 +88,7 @@ "myst-spec-ext": "^1.9.5", "myst-templates": "^1.0.27", "myst-to-docx": "^1.0.16", + "myst-to-ipynb": "^1.0.15", "myst-to-jats": "^1.0.35", "myst-to-md": "^1.0.16", "myst-to-tex": "^1.0.45", diff --git a/packages/myst-cli/src/build/build.spec.ts b/packages/myst-cli/src/build/build.spec.ts index d3151a0812..7ce7bdd3c7 100644 --- a/packages/myst-cli/src/build/build.spec.ts +++ b/packages/myst-cli/src/build/build.spec.ts @@ -36,6 +36,7 @@ describe('get export formats', () => { ExportFormats.tex, ExportFormats.xml, ExportFormats.md, + ExportFormats.ipynb, ExportFormats.meca, ExportFormats.cff, ]); diff --git a/packages/myst-cli/src/build/build.ts b/packages/myst-cli/src/build/build.ts index d8a5154cbc..f699126a24 100644 --- a/packages/myst-cli/src/build/build.ts +++ b/packages/myst-cli/src/build/build.ts @@ -26,6 +26,7 @@ type FormatBuildOpts = { typst?: boolean; xml?: boolean; md?: boolean; + ipynb?: boolean; meca?: boolean; cff?: boolean; html?: boolean; @@ -37,8 +38,8 @@ type FormatBuildOpts = { export type BuildOpts = FormatBuildOpts & CollectionOptions & RunExportOptions & StartOptions; export function hasAnyExplicitExportFormat(opts: BuildOpts): boolean { - const { docx, pdf, tex, typst, xml, md, meca, cff } = opts; - return docx || pdf || tex || typst || xml || md || meca || cff || false; + const { docx, pdf, tex, typst, xml, md, ipynb, meca, cff } = opts; + return docx || pdf || tex || typst || xml || md || ipynb || 
meca || cff || false; } /** @@ -50,12 +51,13 @@ export function hasAnyExplicitExportFormat(opts: BuildOpts): boolean { * @param opts.typst * @param opts.xml * @param opts.md + * @param opts.ipynb * @param opts.meca * @param opts.all all exports requested with --all option * @param opts.explicit explicit input file was provided */ export function getAllowedExportFormats(opts: FormatBuildOpts & { explicit?: boolean }) { - const { docx, pdf, tex, typst, xml, md, meca, cff, all, explicit } = opts; + const { docx, pdf, tex, typst, xml, md, ipynb, meca, cff, all, explicit } = opts; const formats = []; const any = hasAnyExplicitExportFormat(opts); const override = all || (!any && explicit); @@ -69,6 +71,7 @@ export function getAllowedExportFormats(opts: FormatBuildOpts & { explicit?: boo if (typst || override) formats.push(ExportFormats.typst); if (xml || override) formats.push(ExportFormats.xml); if (md || override) formats.push(ExportFormats.md); + if (ipynb || override) formats.push(ExportFormats.ipynb); if (meca || override) formats.push(ExportFormats.meca); if (cff || override) formats.push(ExportFormats.cff); return [...new Set(formats)]; @@ -78,7 +81,7 @@ export function getAllowedExportFormats(opts: FormatBuildOpts & { explicit?: boo * Return requested formats from CLI options */ export function getRequestedExportFormats(opts: FormatBuildOpts) { - const { docx, pdf, tex, typst, xml, md, meca, cff } = opts; + const { docx, pdf, tex, typst, xml, md, ipynb, meca, cff } = opts; const formats = []; if (docx) formats.push(ExportFormats.docx); if (pdf) formats.push(ExportFormats.pdf); @@ -86,6 +89,7 @@ export function getRequestedExportFormats(opts: FormatBuildOpts) { if (typst) formats.push(ExportFormats.typst); if (xml) formats.push(ExportFormats.xml); if (md) formats.push(ExportFormats.md); + if (ipynb) formats.push(ExportFormats.ipynb); if (meca) formats.push(ExportFormats.meca); if (cff) formats.push(ExportFormats.cff); return formats; @@ -239,7 +243,8 @@ export 
async function build(session: ISession, files: string[], opts: BuildOpts) // Print out the kinds that are filtered const kinds = Object.entries(opts) .filter( - ([k, v]) => ['docx', 'pdf', 'tex', 'typst', 'xml', 'md', 'meca', 'cff'].includes(k) && v, + ([k, v]) => + ['docx', 'pdf', 'tex', 'typst', 'xml', 'md', 'ipynb', 'meca', 'cff'].includes(k) && v, ) .map(([k]) => k); session.log.info( diff --git a/packages/myst-cli/src/build/ipynb/index.ts b/packages/myst-cli/src/build/ipynb/index.ts new file mode 100644 index 0000000000..dc675ef934 --- /dev/null +++ b/packages/myst-cli/src/build/ipynb/index.ts @@ -0,0 +1,123 @@ +import fs from 'node:fs'; +import path from 'node:path'; +import mime from 'mime-types'; +import { tic, writeFileToFolder } from 'myst-cli-utils'; +import { FRONTMATTER_ALIASES, PAGE_FRONTMATTER_KEYS } from 'myst-frontmatter'; +import { writeIpynb } from 'myst-to-ipynb'; +import type { IpynbOptions, ImageData } from 'myst-to-ipynb'; +import { filterKeys } from 'simple-validators'; +import { selectAll } from 'unist-util-select'; +import { VFile } from 'vfile'; +import { finalizeMdast } from '../../process/mdast.js'; +import type { ISession } from '../../session/types.js'; +import { logMessagesFromVFile } from '../../utils/logging.js'; +import { KNOWN_IMAGE_EXTENSIONS } from '../../utils/resolveExtension.js'; +import type { ExportWithOutput, ExportFnOptions } from '../types.js'; +import { cleanOutput } from '../utils/cleanOutput.js'; +import { getFileContent } from '../utils/getFileContent.js'; +import { getSourceFolder } from '../../transforms/links.js'; + +export async function runIpynbExport( + session: ISession, + sourceFile: string, + exportOptions: ExportWithOutput, + opts?: ExportFnOptions, +) { + const toc = tic(); + const { output, articles } = exportOptions; + const { clean, projectPath, extraLinkTransformers, execute } = opts ?? 
{}; + // At this point, export options are resolved to contain one-and-only-one article + const article = articles[0]; + if (!article?.file) return { tempFolders: [] }; + if (clean) cleanOutput(session, output); + const [{ mdast, frontmatter }] = await getFileContent(session, [article.file], { + projectPath, + imageExtensions: KNOWN_IMAGE_EXTENSIONS, + extraLinkTransformers, + preFrontmatters: [ + filterKeys(article, [...PAGE_FRONTMATTER_KEYS, ...Object.keys(FRONTMATTER_ALIASES)]), + ], + execute, + }); + await finalizeMdast(session, mdast, frontmatter, article.file, { + imageWriteFolder: path.join(path.dirname(output), 'files'), + imageAltOutputFolder: 'files/', + imageExtensions: KNOWN_IMAGE_EXTENSIONS, + simplifyFigures: false, + useExistingImages: true, + }); + const vfile = new VFile(); + vfile.path = output; + // Build ipynb options from export config + const ipynbOpts: IpynbOptions = {}; + if ((exportOptions as any).markdown === 'commonmark') { + ipynbOpts.markdown = 'commonmark'; + } + if ((exportOptions as any).images === 'attachment') { + ipynbOpts.images = 'attachment'; + // Collect image data from the AST — read files and base64-encode + ipynbOpts.imageData = collectImageData(session, mdast, article.file); + } + const mdOut = writeIpynb(vfile, mdast as any, frontmatter, ipynbOpts); + logMessagesFromVFile(session, mdOut); + session.log.info(toc(`📓 Exported IPYNB in %s, copying to ${output}`)); + writeFileToFolder(output, mdOut.result as string); + return { tempFolders: [] }; +} + +/** + * Collect base64-encoded image data from the mdast tree (Phase 1 of attachment embedding). + * + * Walks all image nodes via `selectAll('image', mdast)`, resolves their + * filesystem paths using `getSourceFolder` (handles both absolute `/_static/...` + * and relative paths), reads the files, and base64-encodes them into a map. + * + * The returned `Record` is passed to `writeIpynb` as + * `options.imageData`. 
Phase 2 (in `embedImagesAsAttachments`) then rewrites + * the serialized markdown to use `attachment:` references. + * + * Remote URLs (http/https) and data URIs are skipped — only local files are embedded. + */ +function collectImageData( + session: ISession, + mdast: any, + sourceFile: string, +): Record<string, ImageData> { + const imageData: Record<string, ImageData> = {}; + const imageNodes = selectAll('image', mdast) as any[]; + const sourcePath = session.sourcePath(); + + for (const img of imageNodes) { + const url = img.url ?? img.urlSource; + if ( + !url || + url.startsWith('http://') || + url.startsWith('https://') || + url.startsWith('data:') + ) { + continue; + } + if (imageData[url]) continue; // already processed + + const sourceFolder = getSourceFolder(url, sourceFile, sourcePath); + const relativeUrl = url.replace(/^[/\\]+/, ''); + const filePath = path.join(sourceFolder, relativeUrl); + + try { + if (!fs.existsSync(filePath)) { + session.log.debug(`Image not found for attachment embedding: ${filePath}`); + continue; + } + const buffer = fs.readFileSync(filePath); + const mimeType = (mime.lookup(filePath) || 'application/octet-stream') as string; + imageData[url] = { + mime: mimeType, + data: buffer.toString('base64'), + }; + } catch (err) { + session.log.debug(`Failed to read image for attachment: ${filePath}`); + } + } + + return imageData; +} diff --git a/packages/myst-cli/src/build/utils/collectExportOptions.ts b/packages/myst-cli/src/build/utils/collectExportOptions.ts index da0393fcd6..596d2baefe 100644 --- a/packages/myst-cli/src/build/utils/collectExportOptions.ts +++ b/packages/myst-cli/src/build/utils/collectExportOptions.ts @@ -271,6 +271,7 @@ export function resolveArticles( export const ALLOWED_EXTENSIONS: Record = { [ExportFormats.docx]: ['.doc', '.docx'], [ExportFormats.md]: ['.md'], + [ExportFormats.ipynb]: ['.ipynb'], [ExportFormats.meca]: ['.zip', '.meca'], [ExportFormats.pdf]: ['.pdf'], [ExportFormats.pdftex]: ['.pdf', '.tex', '.zip'], diff --git 
a/packages/myst-cli/src/build/utils/localArticleExport.ts b/packages/myst-cli/src/build/utils/localArticleExport.ts index ebdd25996c..d062c6465b 100644 --- a/packages/myst-cli/src/build/utils/localArticleExport.ts +++ b/packages/myst-cli/src/build/utils/localArticleExport.ts @@ -20,6 +20,7 @@ import { texExportOptionsFromPdf } from '../pdf/single.js'; import { createPdfGivenTexExport } from '../pdf/create.js'; import { runMecaExport } from '../meca/index.js'; import { runMdExport } from '../md/index.js'; +import { runIpynbExport } from '../ipynb/index.js'; import { selectors, watch as watchReducer } from '../../store/index.js'; import { runCffExport } from '../cff.js'; @@ -113,6 +114,8 @@ async function _localArticleExport( exportFn = runJatsExport; } else if (format === ExportFormats.md) { exportFn = runMdExport; + } else if (format === ExportFormats.ipynb) { + exportFn = runIpynbExport; } else if (format === ExportFormats.meca) { exportFn = runMecaExport; } else if (format === ExportFormats.cff) { diff --git a/packages/myst-cli/src/cli/build.ts b/packages/myst-cli/src/cli/build.ts index d2e2295b7d..720c6fd41a 100644 --- a/packages/myst-cli/src/cli/build.ts +++ b/packages/myst-cli/src/cli/build.ts @@ -23,6 +23,7 @@ import { makeCffOption, makeKeepHostOption, makePortOption, + makeIpynbOption, } from './options.js'; import { readableName } from '../utils/whiteLabelling.js'; @@ -37,6 +38,7 @@ export function makeBuildCommand() { .addOption(makeTypstOption('Build Typst outputs')) .addOption(makeDocxOption('Build Docx output')) .addOption(makeMdOption('Build MD output')) + .addOption(makeIpynbOption('Build IPYNB output')) .addOption(makeJatsOption('Build JATS xml output')) .addOption(makeMecaOptions('Build MECA zip output')) .addOption(makeCffOption('Build CFF output')) diff --git a/packages/myst-cli/src/cli/options.ts b/packages/myst-cli/src/cli/options.ts index abc45716a0..2c1e8d4a8c 100644 --- a/packages/myst-cli/src/cli/options.ts +++ 
b/packages/myst-cli/src/cli/options.ts @@ -29,6 +29,10 @@ export function makeMdOption(description: string) { return new Option('--md', description).default(false); } +export function makeIpynbOption(description: string) { + return new Option('--ipynb', description).default(false); +} + export function makeJatsOption(description: string) { return new Option('--jats, --xml', description).default(false); } diff --git a/packages/myst-frontmatter/src/exports/types.ts b/packages/myst-frontmatter/src/exports/types.ts index 0cd118a79d..b562abb4dd 100644 --- a/packages/myst-frontmatter/src/exports/types.ts +++ b/packages/myst-frontmatter/src/exports/types.ts @@ -8,6 +8,7 @@ export enum ExportFormats { docx = 'docx', xml = 'xml', md = 'md', + ipynb = 'ipynb', meca = 'meca', cff = 'cff', } diff --git a/packages/myst-frontmatter/src/exports/validators.ts b/packages/myst-frontmatter/src/exports/validators.ts index 43a58cd29b..deb8fc8eaa 100644 --- a/packages/myst-frontmatter/src/exports/validators.ts +++ b/packages/myst-frontmatter/src/exports/validators.ts @@ -61,6 +61,7 @@ export const EXT_TO_FORMAT: Record = { '.typ': ExportFormats.typst, '.typst': ExportFormats.typst, '.cff': ExportFormats.cff, + '.ipynb': ExportFormats.ipynb, }; export const RESERVED_EXPORT_KEYS = [ diff --git a/packages/myst-to-ipynb/.eslintrc.cjs b/packages/myst-to-ipynb/.eslintrc.cjs new file mode 100644 index 0000000000..76787609ad --- /dev/null +++ b/packages/myst-to-ipynb/.eslintrc.cjs @@ -0,0 +1,4 @@ +module.exports = { + root: true, + extends: ['curvenote'], +}; diff --git a/packages/myst-to-ipynb/CHANGELOG.md b/packages/myst-to-ipynb/CHANGELOG.md new file mode 100644 index 0000000000..5ec5958361 --- /dev/null +++ b/packages/myst-to-ipynb/CHANGELOG.md @@ -0,0 +1 @@ +# myst-to-ipynb diff --git a/packages/myst-to-ipynb/README.md b/packages/myst-to-ipynb/README.md new file mode 100644 index 0000000000..8d90241dd1 --- /dev/null +++ b/packages/myst-to-ipynb/README.md @@ -0,0 +1,31 @@ +# myst-to-ipynb + 
+Convert a MyST AST to Jupyter Notebook (`.ipynb`) format. + +Part of the [mystmd](https://github.com/jupyter-book/mystmd) monorepo. + +## Features + +- **MyST markdown** (default) — preserves MyST syntax for use with [jupyterlab-myst](https://github.com/jupyter-book/jupyterlab-myst) +- **CommonMark markdown** (`markdown: commonmark`) — converts MyST directives/roles to plain CommonMark for vanilla Jupyter, JupyterLab, and Google Colab +- **Image attachments** (`images: attachment`) — embeds local images as base64 cell attachments for self-contained notebooks + +## Usage + +Configure exports in your page frontmatter: + +```yaml +exports: + - format: ipynb + markdown: commonmark + images: attachment + output: exports/my-document.ipynb +``` + +Build with: + +```bash +myst build --ipynb +``` + +See the [Creating Jupyter Notebooks](https://mystmd.org/guide/creating-notebooks) documentation for full details. diff --git a/packages/myst-to-ipynb/package.json b/packages/myst-to-ipynb/package.json new file mode 100644 index 0000000000..329175ee7a --- /dev/null +++ b/packages/myst-to-ipynb/package.json @@ -0,0 +1,50 @@ +{ + "name": "myst-to-ipynb", + "version": "1.0.15", + "description": "Export from MyST mdast to ipynb", + "author": "Rowan Cockett ", + "homepage": "https://github.com/jupyter-book/mystmd/tree/main/packages/myst-to-ipynb", + "license": "MIT", + "type": "module", + "exports": "./dist/index.js", + "types": "./dist/index.d.ts", + "files": [ + "src", + "dist" + ], + "keywords": [ + "myst-plugin", + "markdown" + ], + "publishConfig": { + "access": "public" + }, + "repository": { + "type": "git", + "url": "git+https://github.com/jupyter-book/mystmd.git" + }, + "scripts": { + "clean": "rimraf dist", + "lint": "eslint \"src/**/*.ts\" -c .eslintrc.cjs --max-warnings 1", + "lint:format": "prettier --check src/*.ts src/**/*.ts", + "test": "vitest run", + "test:watch": "vitest watch", + "build:esm": "tsc", + "build": "npm-run-all -l clean -p build:esm" + }, + "bugs": { + 
"url": "https://github.com/jupyter-book/mystmd/issues" + }, + "dependencies": { + "js-yaml": "^4.1.0", + "mdast-util-gfm-footnote": "^1.0.2", + "mdast-util-gfm-table": "^1.0.7", + "mdast-util-to-markdown": "^1.5.0", + "myst-common": "^1.7.6", + "myst-frontmatter": "^1.7.6", + "myst-to-md": "^1.0.15", + "unist-util-select": "^4.0.3", + "vfile": "^5.3.7", + "vfile-reporter": "^7.0.4" + } +} diff --git a/packages/myst-to-ipynb/src/attachments.ts b/packages/myst-to-ipynb/src/attachments.ts new file mode 100644 index 0000000000..7e5149bb72 --- /dev/null +++ b/packages/myst-to-ipynb/src/attachments.ts @@ -0,0 +1,97 @@ +/** + * Image attachment embedding for ipynb export. + * + * Converts markdown image references `![alt](url)` into Jupyter cell + * attachments `![alt](attachment:name)` with base64-encoded image data + * stored in the cell's `attachments` field. + * + * This enables self-contained notebooks that don't depend on external + * image files — useful for distribution, Colab uploads, etc. + * + * Architecture (two-phase hybrid): + * + * Phase 1 — AST-driven collection (myst-cli, build/ipynb/index.ts): + * `collectImageData()` walks AST image nodes via `selectAll('image', mdast)`, + * resolves filesystem paths, reads files, and base64-encodes them into a + * `Record` map passed to `writeIpynb` as `options.imageData`. + * + * Phase 2 — Post-serialization rewriting (this module): + * `embedImagesAsAttachments()` runs AFTER `writeMd` has serialized the AST + * to a markdown string. It regex-matches `![alt](url)` patterns, looks up + * URLs in the `imageData` map, and rewrites them to `![alt](attachment:name)`. + * + * Why regex instead of AST rewriting? + * By the time we build cell attachments, `writeMd` has already consumed the AST + * and produced a markdown string. Rewriting at the AST level would require the + * transform phase to return per-cell attachment metadata alongside the tree, + * coupling the pure AST transform to notebook cell structure. 
The current split + * keeps `myst-to-ipynb` (pure, no filesystem) separate from `myst-cli` + * (filesystem-aware). + */ + +import type { ImageData } from './types.js'; + +/** + * Extract the basename (filename) from a URL or path. + */ +function basename(url: string): string { + // Strip query string and fragment + const clean = url.split('?')[0].split('#')[0]; + const parts = clean.split('/'); + return parts[parts.length - 1] || 'image'; +} + +/** + * Scan markdown text for image references, replace matching URLs with + * `attachment:` references, and build the cell attachments object. + * + * @param md - The markdown string to process + * @param imageData - Map of image URL → { mime, data } with base64-encoded content + * @returns Object with rewritten markdown and optional attachments dict + */ +export function embedImagesAsAttachments( + md: string, + imageData: Record<string, ImageData>, +): { md: string; attachments?: Record<string, Record<string, string>> } { + if (!imageData || Object.keys(imageData).length === 0) return { md }; + + const attachments: Record<string, Record<string, string>> = {}; + const usedNames = new Set<string>(); + + // Match markdown image syntax: ![alt](url) and ![alt](url "title") + // Handles escaped brackets in alt text and escaped parentheses in URLs. + // The escaped sequences (\] and \)) must appear BEFORE the single-char + // alternatives so the regex engine matches them as pairs first. 
+ const imgRegex = /!\[((?:\\\]|[^\]])*)\]\(((?:\\\)|[^)\s])+)(?:\s+"[^"]*")?\)/g; + + const updatedMd = md.replace(imgRegex, (fullMatch, alt, url) => { + // Unescape markdown characters that mdast-util-to-markdown might have added + const unescapedUrl = url.replace(/\\([()[\]])/g, '$1'); + + const data = imageData[unescapedUrl]; + if (!data) return fullMatch; + + // Generate a unique attachment name from the basename + const base = basename(unescapedUrl); + let name = base; + let counter = 1; + while (usedNames.has(name)) { + const dot = base.lastIndexOf('.'); + if (dot >= 0) { + name = `${base.slice(0, dot)}_${counter}${base.slice(dot)}`; + } else { + name = `${base}_${counter}`; + } + counter++; + } + usedNames.add(name); + + attachments[name] = { [data.mime]: data.data }; + return `![${alt}](attachment:${name})`; + }); + + if (Object.keys(attachments).length > 0) { + return { md: updatedMd, attachments }; + } + return { md }; +} diff --git a/packages/myst-to-ipynb/src/commonmark.ts b/packages/myst-to-ipynb/src/commonmark.ts new file mode 100644 index 0000000000..84f3a81d03 --- /dev/null +++ b/packages/myst-to-ipynb/src/commonmark.ts @@ -0,0 +1,500 @@ +/** + * CommonMark AST pre-transform for myst-to-ipynb + * + * Converts MyST-specific AST nodes into their CommonMark-equivalent AST nodes + * so that `writeMd` from `myst-to-md` produces plain CommonMark output + * compatible with vanilla Jupyter Notebook, JupyterLab, and Google Colab. + * + * This transform is applied before `writeMd` is called for each markdown cell. + * It walks the AST tree and replaces MyST directive/role nodes with standard + * mdast nodes that `writeMd` already handles natively. + */ + +import type { GenericNode } from 'myst-common'; +import { toText } from 'myst-common'; +import { selectAll, select } from 'unist-util-select'; + +/** + * Capitalize the first letter of a string. 
+ */ +function capitalize(s: string): string { + return s.charAt(0).toUpperCase() + s.slice(1); +} + +/** + * Convert an admonition node to a blockquote with bold title. + * + * Input: { type: 'admonition', kind: 'note', children: [admonitionTitle, ...content] } + * Output: { type: 'blockquote', children: [paragraph(bold(title)), ...content] } + */ +function transformAdmonition(node: GenericNode): GenericNode { + const kind = node.kind ?? 'note'; + const titleNode = node.children?.find((c: GenericNode) => c.type === 'admonitionTitle'); + const titleText = titleNode ? toText(titleNode) : capitalize(kind); + const contentChildren = + node.children?.filter((c: GenericNode) => c.type !== 'admonitionTitle') ?? []; + return { + type: 'blockquote', + children: [ + { + type: 'paragraph', + children: [{ type: 'strong', children: [{ type: 'text', value: titleText }] }], + }, + ...contentChildren, + ], + }; +} + +/** + * Convert a math block directive to a raw html node containing `$$...$$`. + * + * We use an `html` type node because mdast serializers output its `value` + * as-is, without escaping underscores or other special characters that + * commonly appear in LaTeX expressions. + * + * Input: { type: 'math', value: 'E=mc^2', label: '...' } + * Output: { type: 'html', value: '$$\nE=mc^2\n$$' } + */ +function transformMathBlock(node: GenericNode): GenericNode { + const value = node.value ?? ''; + const labelComment = node.label ? ` (${node.label})` : ''; + return { + type: 'html', + value: `$$\n${value}\n$$${labelComment}`, + }; +} + +/** + * Convert an inline math role to a raw html node with `$...$` delimiters. + * + * Input: { type: 'inlineMath', value: 'E=mc^2' } + * Output: { type: 'html', value: '$E=mc^2$' } + * + * We use an `html` type node so the markdown serializer outputs the value + * as-is, preventing underscore/backslash escaping in LaTeX expressions. + * Jupyter's markdown renderer supports `$...$` for inline math natively. 
+ */ +function transformInlineMath(node: GenericNode): GenericNode { + return { type: 'html', value: `$${node.value ?? ''}$` }; +} + +/** + * Convert a figure container to an image with caption text. + * + * Input: { type: 'container', kind: 'figure', children: [image, caption, legend] } + * Output: { type: 'image', url: '...', alt: 'caption text' } + * followed by caption paragraph if present + */ +function transformFigure(node: GenericNode): GenericNode { + const imageNode: GenericNode | null = select('image', node); + const captionNode: GenericNode | null = select('caption', node); + const legendNode: GenericNode | null = select('legend', node); + + const url = imageNode?.urlSource ?? imageNode?.url ?? ''; + const alt = imageNode?.alt ?? (captionNode ? toText(captionNode) : ''); + + const children: GenericNode[] = [{ type: 'image', url, alt, title: imageNode?.title }]; + + // Add caption as a paragraph below the image if present + if (captionNode?.children?.length) { + children.push({ + type: 'paragraph', + children: [{ type: 'emphasis', children: captionNode.children }], + }); + } + + // Add legend content as-is + if (legendNode?.children?.length) { + children.push(...legendNode.children); + } + + return { type: 'root', children }; +} + +/** + * Convert a table container to its inner table node. + * The table node is already handled by myst-to-md's GFM table extension. 
+ */ +function transformTableContainer(node: GenericNode): GenericNode { + const captionNode: GenericNode | null = select('caption', node); + const tableNode: GenericNode | null = select('table', node); + + const children: GenericNode[] = []; + + // Add caption as bold paragraph above the table + if (captionNode?.children?.length) { + children.push({ + type: 'paragraph', + children: [{ type: 'strong', children: captionNode.children }], + }); + } + + if (tableNode) { + children.push(tableNode); + } + + return { type: 'root', children }; +} + +/** + * Convert an exercise node to a bold header with content. + * + * Input: { type: 'exercise', children: [...] } + * Output: { type: 'root', children: [paragraph(**Exercise N**), ...content] } + */ +function transformExercise(node: GenericNode): GenericNode { + const titleNode = node.children?.find((c: GenericNode) => c.type === 'admonitionTitle'); + const titleText = titleNode ? toText(titleNode) : 'Exercise'; + const enumerator = node.enumerator ? ` ${node.enumerator}` : ''; + const contentChildren = + node.children?.filter((c: GenericNode) => c.type !== 'admonitionTitle') ?? []; + + return { + type: 'root', + children: [ + { + type: 'paragraph', + children: [ + { + type: 'strong', + children: [{ type: 'text', value: `${titleText}${enumerator}` }], + }, + ], + }, + ...contentChildren, + ], + }; +} + +/** + * Convert a solution node to a bold header with content. + * Solutions are kept by default but can be configured to be dropped. + */ +function transformSolution(node: GenericNode, dropSolutions: boolean): GenericNode | null { + if (dropSolutions) return null; + + const titleNode = node.children?.find((c: GenericNode) => c.type === 'admonitionTitle'); + const titleText = titleNode ? toText(titleNode) : 'Solution'; + const contentChildren = + node.children?.filter((c: GenericNode) => c.type !== 'admonitionTitle') ?? 
[]; + + return { + type: 'root', + children: [ + { + type: 'paragraph', + children: [ + { + type: 'strong', + children: [{ type: 'text', value: titleText }], + }, + ], + }, + ...contentChildren, + ], + }; +} + +/** + * Convert a proof-type node (theorem, lemma, definition, etc.) to a bold header. + * + * Input: { type: 'proof', kind: 'theorem', children: [...] } + * Output: { type: 'root', children: [paragraph(**Theorem N (Title)**), ...content] } + */ +function transformProof(node: GenericNode): GenericNode { + const kind = node.kind ?? 'proof'; + const titleNode = node.children?.find((c: GenericNode) => c.type === 'admonitionTitle'); + const titleText = titleNode ? ` (${toText(titleNode)})` : ''; + const enumerator = node.enumerator ? ` ${node.enumerator}` : ''; + const contentChildren = + node.children?.filter((c: GenericNode) => c.type !== 'admonitionTitle') ?? []; + + return { + type: 'root', + children: [ + { + type: 'paragraph', + children: [ + { + type: 'strong', + children: [{ type: 'text', value: `${capitalize(kind)}${enumerator}${titleText}` }], + }, + ], + }, + ...contentChildren, + ], + }; +} + +/** + * Convert a tab-set to just the content of each tab, with tab titles as bold paragraphs. + */ +function transformTabSet(node: GenericNode): GenericNode { + const children: GenericNode[] = []; + + for (const tabItem of node.children ?? []) { + if (tabItem.type === 'tabItem' || tabItem.kind === 'tabItem') { + // Add tab title as bold paragraph + if (tabItem.title) { + children.push({ + type: 'paragraph', + children: [{ type: 'strong', children: [{ type: 'text', value: tabItem.title }] }], + }); + } + // Add tab content + if (tabItem.children) { + children.push(...tabItem.children); + } + } + } + + return { type: 'root', children }; +} + +/** + * Convert a card to its content with optional title. 
+ */ +function transformCard(node: GenericNode): GenericNode { + const titleNode = node.children?.find((c: GenericNode) => c.type === 'cardTitle'); + const contentChildren = + node.children?.filter( + (c: GenericNode) => !['cardTitle', 'header', 'footer'].includes(c.type), + ) ?? []; + + const children: GenericNode[] = []; + + if (titleNode) { + children.push({ + type: 'paragraph', + children: [ + { + type: 'strong', + children: titleNode.children ?? [{ type: 'text', value: toText(titleNode) }], + }, + ], + }); + } + + children.push(...contentChildren); + + return { type: 'root', children }; +} + +/** + * Convert a grid to its card children (which will be individually transformed). + */ +function transformGrid(node: GenericNode): GenericNode { + return { type: 'root', children: node.children ?? [] }; +} + +/** + * Convert a details/dropdown to a blockquote with summary as title. + */ +function transformDetails(node: GenericNode): GenericNode { + const summaryNode = node.children?.find((c: GenericNode) => c.type === 'summary'); + const contentChildren = node.children?.filter((c: GenericNode) => c.type !== 'summary') ?? []; + + const titleText = summaryNode ? toText(summaryNode) : 'Details'; + + return { + type: 'blockquote', + children: [ + { + type: 'paragraph', + children: [{ type: 'strong', children: [{ type: 'text', value: titleText }] }], + }, + ...contentChildren, + ], + }; +} + +/** + * Convert an aside/sidebar/margin to a blockquote. + */ +function transformAside(node: GenericNode): GenericNode { + const titleNode = node.children?.find((c: GenericNode) => c.type === 'admonitionTitle'); + const contentChildren = + node.children?.filter((c: GenericNode) => c.type !== 'admonitionTitle') ?? []; + + const children: GenericNode[] = []; + + if (titleNode) { + children.push({ + type: 'paragraph', + children: [{ type: 'strong', children: titleNode.children ?? 
[] }], + }); + } + + children.push(...contentChildren); + + return { type: 'blockquote', children }; +} + +/** + * Convert a code-block directive to a standard fenced code block. + * (Remove MyST-specific options like label, emphasize-lines, etc.) + */ +function transformCodeBlock(node: GenericNode): GenericNode { + return { + type: 'code', + lang: node.lang, + value: node.value ?? '', + }; +} + +/** + * Convert an image node to a plain markdown image by stripping + * directive-specific properties (class, width, align) that cause + * myst-to-md to render it as a ```{image} directive. + */ +function transformImage(node: GenericNode): GenericNode { + return { + type: 'image', + url: node.url ?? node.urlSource ?? '', + alt: node.alt ?? '', + title: node.title, + }; +} + +/** + * Convert a mystDirective node to plain content or remove it. + */ +function transformMystDirective(node: GenericNode): GenericNode | null { + // If it has children, keep the content + if (node.children?.length) { + return { type: 'root', children: node.children }; + } + // If it has a value, render as a code block + if (node.value) { + return { type: 'code', lang: node.lang ?? '', value: node.value }; + } + return null; +} + +/** + * Convert a mystRole node to plain text. + */ +function transformMystRole(node: GenericNode): GenericNode { + if (node.children?.length) { + return { type: 'root', children: node.children }; + } + return { type: 'text', value: node.value ?? '' }; +} + +export interface CommonMarkOptions { + /** Drop solution blocks from output (default: false) */ + dropSolutions?: boolean; +} + +/** + * Walk an AST tree and replace MyST-specific nodes with CommonMark equivalents. + * + * This modifies the tree in-place by replacing children arrays. + * Returns the (possibly replaced) root node. + */ +export function transformToCommonMark(tree: GenericNode, opts?: CommonMarkOptions): GenericNode { + const dropSolutions = opts?.dropSolutions ?? 
false; + + // Process children recursively (bottom-up so nested directives are handled first) + if (tree.children) { + // First, recurse into children + tree.children = tree.children.map((child: GenericNode) => transformToCommonMark(child, opts)); + + // Then, transform this node's children — replacing nodes that need conversion + const newChildren: GenericNode[] = []; + for (const child of tree.children) { + const transformed = transformNode(child, dropSolutions); + if (transformed === null) { + // Node should be dropped (e.g., solution with dropSolutions=true) + continue; + } + if (transformed.type === 'root' && transformed.children) { + // Flatten: a root wrapper means multiple replacement nodes + newChildren.push(...transformed.children); + } else { + newChildren.push(transformed); + } + } + tree.children = newChildren; + + // Strip identifier/label from all transformed children to prevent + // myst-to-md's labelWrapper from adding `(identifier)=\n` prefixes + // to headings, paragraphs, blockquotes, lists, etc. + // This runs AFTER transformNode so transforms can still use label/identifier. + for (const child of tree.children) { + delete child.identifier; + delete child.label; + } + } + + return tree; +} + +/** + * Transform a single node if it's a MyST-specific type. + * Returns the node unchanged if no transformation is needed. + * Returns null if the node should be removed. + */ +function transformNode(node: GenericNode, dropSolutions: boolean): GenericNode | null { + switch (node.type) { + case 'admonition': + return transformAdmonition(node); + case 'math': + return transformMathBlock(node); + case 'inlineMath': + return transformInlineMath(node); + case 'container': + if (node.kind === 'figure') return transformFigure(node); + if (node.kind === 'table') return transformTableContainer(node); + // code containers — extract the code node + if (node.kind === 'code') { + const codeNode = select('code', node); + return codeNode ? 
transformCodeBlock(codeNode as GenericNode) : node; + } + return node; + case 'exercise': + return transformExercise(node); + case 'solution': + return transformSolution(node, dropSolutions); + case 'proof': + return transformProof(node); + case 'tabSet': + return transformTabSet(node); + case 'card': + return transformCard(node); + case 'grid': + return transformGrid(node); + case 'details': + return transformDetails(node); + case 'aside': + return transformAside(node); + case 'include': + // Include directives are resolved during transformMdast — their children + // contain the fully-parsed content from the included file. Unwrap them + // so the resolved content is emitted instead of the directive syntax. + if (node.children?.length) { + return { type: 'root', children: node.children }; + } + return null; + case 'mystDirective': + return transformMystDirective(node); + case 'mystRole': + return transformMystRole(node); + case 'mystTarget': + // Drop MyST target labels — they have no CommonMark equivalent + return null; + case 'comment': + // Drop MyST comments (% comment syntax) — not valid in CommonMark + return null; + case 'code': + // Strip extra MyST attributes (class, emphasize-lines, etc.) 
so myst-to-md + // renders this as a plain fenced code block instead of a ```{code-block} directive + return transformCodeBlock(node); + case 'image': + // Strip directive-specific properties (class, width, align) so myst-to-md + // renders this as ![alt](url) instead of a ```{image} directive + return transformImage(node); + default: + return node; + } +} diff --git a/packages/myst-to-ipynb/src/index.ts b/packages/myst-to-ipynb/src/index.ts new file mode 100644 index 0000000000..585da7eb72 --- /dev/null +++ b/packages/myst-to-ipynb/src/index.ts @@ -0,0 +1,322 @@ +import type { Root, Node } from 'myst-spec'; +import type { Block, Code } from 'myst-spec-ext'; +import type { Plugin } from 'unified'; +import type { VFile } from 'vfile'; +import type { GenericNode } from 'myst-common'; +import type { PageFrontmatter } from 'myst-frontmatter'; +import { writeMd } from 'myst-to-md'; +import { select } from 'unist-util-select'; +import { transformToCommonMark } from './commonmark.js'; +import type { CommonMarkOptions } from './commonmark.js'; +import { embedImagesAsAttachments } from './attachments.js'; +export type { ImageData } from './types.js'; +import type { ImageData } from './types.js'; + +function sourceToStringList(src: string): string[] { + const lines = src.split('\n').map((s) => `${s}\n`); + lines[lines.length - 1] = lines[lines.length - 1].trimEnd(); + return lines; +} + +/** + * Strip leading `+++` cell break markers from markdown content. + * These are MyST-specific block separators that have no meaning in notebooks. 
+ */ +function stripBlockMarkers(md: string): string { + return md.replace(/^\+\+\+[^\n]*(\n|$)/gm, ''); +} + +export interface IpynbOptions { + /** Markdown format: 'myst' preserves MyST syntax, 'commonmark' converts to plain CommonMark */ + markdown?: 'myst' | 'commonmark'; + /** Options for CommonMark conversion */ + commonmark?: CommonMarkOptions; + /** + * How to handle images: 'reference' keeps URL references (default), + * 'attachment' embeds as base64 cell attachments for self-contained notebooks. + * + * When 'attachment', image data is read from disk by `collectImageData()` + * in myst-cli (Phase 1), then post-serialization regex rewriting in + * `embedImagesAsAttachments()` converts `![alt](url)` → `![alt](attachment:name)` + * and adds the `attachments` field to each cell (Phase 2). + */ + images?: 'reference' | 'attachment'; + /** + * Map of image URL → { mime, data } for attachment embedding. + * Only used when `images` is 'attachment'. Populated by `collectImageData()` + * in myst-cli which walks AST image nodes and reads files from disk. + * Keys must match the image URLs as they appear in the serialized markdown + * (e.g. '/_static/img/foo.png'). + */ + imageData?: Record<string, ImageData>; +} + +/** + * Check whether a node is a code-cell block (i.e. a `{code-cell}` directive + * that should become a notebook code cell). + */ +function isCodeCellBlock(node: GenericNode): boolean { + return node.type === 'block' && node.kind === 'notebook-code'; +} + +/** + * Check whether a node is an exercise or solution that contains code-cell blocks. + */ +function isGatedNodeWithCodeCells(node: GenericNode, opts?: CommonMarkOptions): boolean { + if (node.type !== 'exercise' && node.type !== 'solution') return false; + // Skip solutions that should be dropped — leave intact for transformToCommonMark + if (node.type === 'solution' && opts?.dropSolutions) return false; + return node.children?.some(isCodeCellBlock) ?? 
false; +} + +/** + * Lift code-cell blocks out of exercise/solution nodes that used gated syntax. + * + * When gated syntax (`{exercise-start}`/`{exercise-end}`) is used, the + * `joinGatesTransform` nests all content between the gates — including + * `{code-cell}` blocks — as children of the exercise/solution node. Then + * `blockNestingTransform` groups the exercise/solution with neighboring + * non-block siblings into a single wrapper block. The real AST structure is: + * + * root > block { para, exercise { para, block{code} }, solution { ... }, para } + * + * This means code-cell blocks inside exercise/solution never appear as + * top-level notebook cells; they are absorbed into a single markdown cell. + * + * This function walks each block's children, finds exercise/solution nodes + * that contain code-cell blocks, and splits the block so code cells are + * emitted as top-level notebook code cells: + * + * BEFORE: block { para, solution { title, para, block{code}, para } } + * AFTER: block { para, solution { title, para } } + * block{code} + * block { para } + * + * When `dropSolutions` is true, solution nodes are left intact so that + * `transformToCommonMark` can drop them entirely (including their code cells). + */ +function liftCodeCellsFromGatedNodes(root: Root, opts?: CommonMarkOptions): Root { + const newChildren: Node[] = []; + let modified = false; + + for (const child of root.children) { + const c = child as GenericNode; + + // Case 1: exercise/solution directly as root child (e.g. 
in tests) + if (isGatedNodeWithCodeCells(c, opts)) { + modified = true; + liftFromExerciseSolution(c, newChildren, false); + continue; + } + + // Case 2: block containing exercise/solution among its children + if ( + c.type === 'block' && + c.children?.some((ch: GenericNode) => isGatedNodeWithCodeCells(ch, opts)) + ) { + modified = true; + splitBlockWithGatedNodes(c, newChildren, opts); + continue; + } + + // No gated nodes — keep as-is + newChildren.push(child); + } + + if (!modified) return root; + return { ...root, children: newChildren } as Root; +} + +/** + * Split a single exercise/solution node's children into alternating + * markdown content and top-level code cells. + * + * The first group of markdown content retains the exercise/solution wrapper + * (for title/enumerator rendering). Subsequent groups become plain content. + * + * @param wrapInBlock If true, wraps output groups in block nodes. + */ +function liftFromExerciseSolution(node: GenericNode, output: Node[], wrapInBlock: boolean): void { + const mdContent: GenericNode[] = []; + let isFirstGroup = true; + + const flushMarkdown = () => { + if (mdContent.length === 0) return; + const content = [...mdContent]; + mdContent.length = 0; + + if (isFirstGroup) { + // Preserve the exercise/solution wrapper for title rendering + const wrapper: GenericNode = { ...node, children: content }; + if (wrapInBlock) { + output.push({ type: 'block', children: [wrapper] } as unknown as Node); + } else { + output.push(wrapper as unknown as Node); + } + isFirstGroup = false; + } else { + if (wrapInBlock) { + output.push({ type: 'block', children: content } as unknown as Node); + } else { + for (const n of content) { + output.push(n as unknown as Node); + } + } + } + }; + + for (const gatedChild of node.children ?? 
[]) { + if (isCodeCellBlock(gatedChild)) { + flushMarkdown(); + output.push(gatedChild as unknown as Node); + } else { + mdContent.push(gatedChild); + } + } + flushMarkdown(); +} + +/** + * Process a block that contains one or more exercise/solution nodes with + * embedded code cells, along with other child nodes. Splits the block into + * multiple top-level blocks and code cells as needed. + * + * For non-exercise/solution children, they accumulate in a markdown block. + * When an exercise/solution with code cells is encountered, the accumulated + * block is flushed, then the exercise/solution is expanded via + * liftFromExerciseSolution. + */ +function splitBlockWithGatedNodes( + block: GenericNode, + output: Node[], + opts?: CommonMarkOptions, +): void { + const pending: GenericNode[] = []; + + const flushPending = () => { + if (pending.length === 0) return; + output.push({ type: 'block', children: [...pending] } as unknown as Node); + pending.length = 0; + }; + + for (const child of block.children ?? []) { + if (isGatedNodeWithCodeCells(child, opts)) { + flushPending(); + liftFromExerciseSolution(child, output, true); + } else { + pending.push(child); + } + } + flushPending(); +} + +export function writeIpynb( + file: VFile, + node: Root, + frontmatter?: PageFrontmatter, + options?: IpynbOptions, +) { + const markdownFormat = options?.markdown ?? 'myst'; + + // Lift code-cell blocks out of gated exercise/solution nodes + // so they become proper notebook code cells instead of being + // absorbed into markdown cells. 
+ node = liftCodeCellsFromGatedNodes(node, options?.commonmark); + + const cells = (node.children as Block[]) + .map((block: Block) => { + if (block.type === 'block' && block.kind === 'notebook-code') { + const code = select('code', block) as Code; + return { + cell_type: 'code' as const, + execution_count: null, + metadata: {}, + outputs: [], + source: sourceToStringList(code.value), + }; + } + // Build the sub-tree for this markdown cell + let blockTree: any = { type: 'root', children: [block] }; + if (markdownFormat === 'commonmark') { + blockTree = transformToCommonMark( + JSON.parse(JSON.stringify(blockTree)), + options?.commonmark, + ); + } + const md = writeMd(file, blockTree).result as string; + const cleanMd = stripBlockMarkers(md); + // Embed images as cell attachments if requested + if (options?.images === 'attachment' && options?.imageData) { + const { md: attachedMd, attachments } = embedImagesAsAttachments( + cleanMd, + options.imageData, + ); + const cell: Record<string, any> = { + cell_type: 'markdown' as const, + metadata: {}, + source: sourceToStringList(attachedMd), + }; + if (attachments) { + cell.attachments = attachments; + } + return cell; + } + return { + cell_type: 'markdown' as const, + metadata: {}, + source: sourceToStringList(cleanMd), + }; + }) + .filter((cell) => { + // Remove empty markdown cells (e.g., from dropped mystTarget/comment nodes) + if (cell.cell_type === 'markdown') { + const content = cell.source.join('').trim(); + return content.length > 0; + } + return true; + }); + + // Build notebook metadata from frontmatter kernelspec when available + const languageName = + frontmatter?.kernelspec?.language ?? frontmatter?.kernelspec?.name ?? 
'python'; + const metadata: Record<string, any> = { + language_info: { + name: languageName, + }, + }; + if (frontmatter?.kernelspec) { + metadata.kernelspec = { + name: frontmatter.kernelspec.name, + display_name: frontmatter.kernelspec.display_name, + language: languageName, + }; + } + + const ipynb = { + cells, + metadata, + nbformat: 4, + nbformat_minor: 2, + }; + + file.result = JSON.stringify(ipynb, null, 2); + return file; +} + +const plugin: Plugin<[PageFrontmatter?, IpynbOptions?], Root, VFile> = function ( + frontmatter?, + options?, +) { + this.Compiler = (node, file) => { + return writeIpynb(file, node, frontmatter, options); + }; + + return (node: Root) => { + // Preprocess + return node; + }; +}; + +export default plugin; +export type { CommonMarkOptions } from './commonmark.js'; +export { embedImagesAsAttachments } from './attachments.js'; diff --git a/packages/myst-to-ipynb/src/types.ts b/packages/myst-to-ipynb/src/types.ts new file mode 100644 index 0000000000..12b7e90f8a --- /dev/null +++ b/packages/myst-to-ipynb/src/types.ts @@ -0,0 +1,7 @@ +/** Image data for embedding as cell attachments */ +export interface ImageData { + /** MIME type (e.g. 
'image/png') */ + mime: string; + /** Base64-encoded image data */ + data: string; +} diff --git a/packages/myst-to-ipynb/tests/attachments.spec.ts b/packages/myst-to-ipynb/tests/attachments.spec.ts new file mode 100644 index 0000000000..5e889f1b48 --- /dev/null +++ b/packages/myst-to-ipynb/tests/attachments.spec.ts @@ -0,0 +1,109 @@ +import { describe, expect, test } from 'vitest'; +import { embedImagesAsAttachments } from '../src/attachments'; + +describe('embedImagesAsAttachments', () => { + test('replaces image URL with attachment reference', () => { + const md = '![Chart](/_static/img/chart.png)'; + const imageData = { + '/_static/img/chart.png': { mime: 'image/png', data: 'base64data' }, + }; + const result = embedImagesAsAttachments(md, imageData); + expect(result.md).toBe('![Chart](attachment:chart.png)'); + expect(result.attachments).toEqual({ + 'chart.png': { 'image/png': 'base64data' }, + }); + }); + + test('handles multiple images', () => { + const md = '![A](/_static/a.png)\n\n![B](/_static/b.jpg)'; + const imageData = { + '/_static/a.png': { mime: 'image/png', data: 'AAAA' }, + '/_static/b.jpg': { mime: 'image/jpeg', data: 'BBBB' }, + }; + const result = embedImagesAsAttachments(md, imageData); + expect(result.md).toBe('![A](attachment:a.png)\n\n![B](attachment:b.jpg)'); + expect(result.attachments).toEqual({ + 'a.png': { 'image/png': 'AAAA' }, + 'b.jpg': { 'image/jpeg': 'BBBB' }, + }); + }); + + test('deduplicates same-basename images with counter suffix', () => { + const md = '![A](/dir1/img.png)\n\n![B](/dir2/img.png)'; + const imageData = { + '/dir1/img.png': { mime: 'image/png', data: 'AAAA' }, + '/dir2/img.png': { mime: 'image/png', data: 'BBBB' }, + }; + const result = embedImagesAsAttachments(md, imageData); + expect(result.md).toBe('![A](attachment:img.png)\n\n![B](attachment:img_1.png)'); + expect(result.attachments).toEqual({ + 'img.png': { 'image/png': 'AAAA' }, + 'img_1.png': { 'image/png': 'BBBB' }, + }); + }); + + test('skips images not 
in imageData', () => { + const md = '![A](/a.png)\n\n![B](/b.png)'; + const imageData = { + '/a.png': { mime: 'image/png', data: 'AAAA' }, + }; + const result = embedImagesAsAttachments(md, imageData); + expect(result.md).toBe('![A](attachment:a.png)\n\n![B](/b.png)'); + expect(result.attachments).toEqual({ + 'a.png': { 'image/png': 'AAAA' }, + }); + }); + + test('returns no attachments when imageData is empty', () => { + const md = '![A](/a.png)'; + const result = embedImagesAsAttachments(md, {}); + expect(result.md).toBe('![A](/a.png)'); + expect(result.attachments).toBeUndefined(); + }); + + test('returns no attachments when no images match', () => { + const md = '![A](/a.png)'; + const imageData = { + '/other.png': { mime: 'image/png', data: 'XXXX' }, + }; + const result = embedImagesAsAttachments(md, imageData); + expect(result.md).toBe('![A](/a.png)'); + expect(result.attachments).toBeUndefined(); + }); + + test('handles image with no alt text', () => { + const md = '![](/_static/chart.png)'; + const imageData = { + '/_static/chart.png': { mime: 'image/png', data: 'DATA' }, + }; + const result = embedImagesAsAttachments(md, imageData); + expect(result.md).toBe('![](attachment:chart.png)'); + expect(result.attachments).toEqual({ + 'chart.png': { 'image/png': 'DATA' }, + }); + }); + + test('handles escaped parentheses in URL', () => { + const md = '![Chart](/_static/img\\(1\\).png)'; + const imageData = { + '/_static/img(1).png': { mime: 'image/png', data: 'base64data' }, + }; + const result = embedImagesAsAttachments(md, imageData); + expect(result.md).toBe('![Chart](attachment:img(1).png)'); + expect(result.attachments).toEqual({ + 'img(1).png': { 'image/png': 'base64data' }, + }); + }); + + test('handles escaped brackets in alt text', () => { + const md = '![alt \\[text\\]](/_static/chart.png)'; + const imageData = { + '/_static/chart.png': { mime: 'image/png', data: 'DATA' }, + }; + const result = embedImagesAsAttachments(md, imageData); + 
expect(result.md).toBe('![alt \\[text\\]](attachment:chart.png)'); + expect(result.attachments).toEqual({ + 'chart.png': { 'image/png': 'DATA' }, + }); + }); +}); diff --git a/packages/myst-to-ipynb/tests/attachments.yml b/packages/myst-to-ipynb/tests/attachments.yml new file mode 100644 index 0000000000..a8e31e1eb5 --- /dev/null +++ b/packages/myst-to-ipynb/tests/attachments.yml @@ -0,0 +1,176 @@ +title: Image Attachments +cases: + - title: single image with attachment embedding + options: + images: attachment + imageData: + /_static/img/chart.png: + mime: image/png + data: iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg== + mdast: + type: root + children: + - type: block + children: + - type: paragraph + children: + - type: text + value: "Here is a chart:" + - type: image + url: /_static/img/chart.png + alt: A chart + ipynb: + cells: + - cell_type: markdown + metadata: {} + attachments: + chart.png: + image/png: iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg== + source: + - "Here is a chart:\n" + - "\n" + - "![A chart](attachment:chart.png)" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: multiple images in one cell with attachment embedding + options: + images: attachment + imageData: + /_static/img/alpha.png: + mime: image/png + data: AAAA + /_static/img/beta.jpg: + mime: image/jpeg + data: BBBB + mdast: + type: root + children: + - type: block + children: + - type: paragraph + children: + - type: image + url: /_static/img/alpha.png + alt: Alpha + - type: paragraph + children: + - type: image + url: /_static/img/beta.jpg + alt: Beta + ipynb: + cells: + - cell_type: markdown + metadata: {} + attachments: + alpha.png: + image/png: AAAA + beta.jpg: + image/jpeg: BBBB + source: + - "![Alpha](attachment:alpha.png)\n" + - "\n" + - "![Beta](attachment:beta.jpg)" + metadata: + language_info: + name: python + nbformat: 4 + 
nbformat_minor: 2 + + - title: image without matching data stays as reference + options: + images: attachment + imageData: + /_static/img/other.png: + mime: image/png + data: CCCC + mdast: + type: root + children: + - type: block + children: + - type: paragraph + children: + - type: image + url: /_static/img/missing.png + alt: Missing + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "![Missing](/_static/img/missing.png)" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: images as reference (default) keeps URLs + options: + imageData: + /_static/img/chart.png: + mime: image/png + data: DDDD + mdast: + type: root + children: + - type: block + children: + - type: paragraph + children: + - type: image + url: /_static/img/chart.png + alt: Chart + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "![Chart](/_static/img/chart.png)" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: attachment embedding with commonmark conversion + options: + markdown: commonmark + images: attachment + imageData: + /_static/img/plot.png: + mime: image/png + data: EEEE + mdast: + type: root + children: + - type: block + children: + - type: paragraph + children: + - type: text + value: A plot + - type: image + url: /_static/img/plot.png + alt: My plot + class: width-80 + ipynb: + cells: + - cell_type: markdown + metadata: {} + attachments: + plot.png: + image/png: EEEE + source: + - "A plot\n" + - "\n" + - "![My plot](attachment:plot.png)" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 diff --git a/packages/myst-to-ipynb/tests/basic.yml b/packages/myst-to-ipynb/tests/basic.yml new file mode 100644 index 0000000000..626758e500 --- /dev/null +++ b/packages/myst-to-ipynb/tests/basic.yml @@ -0,0 +1,350 @@ +title: myst-to-ipynb basic features +cases: + - title: styles in paragraph + mdast: + type: root + children: + - type: paragraph + children: 
+ - type: text + value: 'Some % ' + - type: emphasis + children: + - type: text + value: markdown + - type: text + value: ' with ' + - type: strong + children: + - type: text + value: different + - type: text + value: ' ' + - type: inlineCode + value: style`s + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Some % *markdown* with **different** ``style`s``" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: headings + mdast: + type: root + children: + - type: heading + depth: 1 + children: + - type: text + value: first + - type: paragraph + children: + - type: text + value: 'Some % ' + - type: emphasis + children: + - type: text + value: markdown + - type: heading + depth: 4 + children: + - type: text + value: fourth + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "# first" + - cell_type: markdown + metadata: {} + source: + - "Some % *markdown*" + - cell_type: markdown + metadata: {} + source: + - "#### fourth" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: thematic break + mdast: + type: root + children: + - type: thematicBreak + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "---" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: block quote + mdast: + type: root + children: + - type: blockquote + children: + - type: paragraph + children: + - type: text + value: 'Some % ' + - type: emphasis + children: + - type: text + value: markdown + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "> Some % *markdown*" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: unordered list + mdast: + type: root + children: + - type: list + ordered: false + children: + - type: listItem + children: + - type: paragraph + children: + - type: text + value: Item one + - type: listItem + children: + - type: paragraph + children: + - type: 
text + value: Item two + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "* Item one\n" + - "\n" + - "* Item two" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: html + mdast: + type: root + children: + - type: html + value: '
*Not markdown*
' + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "
*Not markdown*
" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: code - plain fenced + mdast: + type: root + children: + - type: code + value: "5+5\nprint(\"hello world\\n\")" + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "```\n" + - "5+5\n" + - "print(\"hello world\\n\")\n" + - "```" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: code cell in block + mdast: + type: root + children: + - type: block + kind: notebook-code + children: + - type: code + lang: python + executable: true + value: 'print("hello")' + - type: output + id: test-output + ipynb: + cells: + - cell_type: code + execution_count: null + metadata: {} + outputs: [] + source: + - 'print("hello")' + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: mixed markdown and code cells + mdast: + type: root + children: + - type: heading + depth: 1 + children: + - type: text + value: Title + - type: block + kind: notebook-code + children: + - type: code + lang: python + executable: true + value: x = 1 + - type: paragraph + children: + - type: text + value: After code + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "# Title" + - cell_type: code + execution_count: null + metadata: {} + outputs: [] + source: + - "x = 1" + - cell_type: markdown + metadata: {} + source: + - "After code" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: block markers stripped + mdast: + type: root + children: + - type: block + children: + - type: paragraph + children: + - type: text + value: Content after marker + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Content after marker" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: link with title + mdast: + type: root + children: + - type: paragraph + children: + - type: link + url: https://example.com + title: my link + 
children: + - type: text + value: Click here + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - '[Click here](https://example.com "my link")' + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: image + mdast: + type: root + children: + - type: image + url: fig.png + alt: A figure + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "![A figure](fig.png)" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: break in paragraph + mdast: + type: root + children: + - type: paragraph + children: + - type: text + value: Some markdown + - type: break + - type: text + value: Some more markdown + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Some markdown\\\n" + - "Some more markdown" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 diff --git a/packages/myst-to-ipynb/tests/commonmark.yml b/packages/myst-to-ipynb/tests/commonmark.yml new file mode 100644 index 0000000000..e39d95df90 --- /dev/null +++ b/packages/myst-to-ipynb/tests/commonmark.yml @@ -0,0 +1,908 @@ +title: myst-to-ipynb CommonMark mode +cases: + - title: inline math converted to dollar signs + options: + markdown: commonmark + mdast: + type: root + children: + - type: paragraph + children: + - type: text + value: 'The value ' + - type: inlineMath + value: 'E = mc^2' + - type: text + value: ' is famous.' + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "The value $E = mc^2$ is famous." 
+ metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: math block converted to dollar-dollar + options: + markdown: commonmark + mdast: + type: root + children: + - type: math + value: "\\int_0^\\infty e^{-x^2} dx" + label: eq-gauss + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "$$\n" + - "\\int_0^\\infty e^{-x^2} dx\n" + - "$$ (eq-gauss)" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: math block without label + options: + markdown: commonmark + mdast: + type: root + children: + - type: math + value: 'a^2 + b^2 = c^2' + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "$$\n" + - "a^2 + b^2 = c^2\n" + - "$$" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: admonition converted to blockquote + options: + markdown: commonmark + mdast: + type: root + children: + - type: admonition + kind: note + children: + - type: admonitionTitle + children: + - type: text + value: Important + - type: paragraph + children: + - type: text + value: This is a note. + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "> **Important**\n" + - ">\n" + - "> This is a note." + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: admonition preserved in myst mode + mdast: + type: root + children: + - type: admonition + kind: note + children: + - type: admonitionTitle + children: + - type: text + value: Important + - type: paragraph + children: + - type: text + value: This is a note. 
+ ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - ":::{note} Important\n" + - "This is a note.\n" + - ":::" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: exercise with enumerator + options: + markdown: commonmark + mdast: + type: root + children: + - type: exercise + enumerator: '1' + children: + - type: admonitionTitle + children: + - type: text + value: Exercise + - type: paragraph + children: + - type: text + value: 'Solve ' + - type: inlineMath + value: 'x^2 = 1' + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "**Exercise 1**\n" + - "\n" + - "Solve $x^2 = 1$" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: theorem with title + options: + markdown: commonmark + mdast: + type: root + children: + - type: proof + kind: theorem + enumerator: '1' + children: + - type: admonitionTitle + children: + - type: text + value: Pythagorean + - type: paragraph + children: + - type: inlineMath + value: 'a^2 + b^2 = c^2' + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "**Theorem 1 (Pythagorean)**\n" + - "\n" + - "$a^2 + b^2 = c^2$" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: tabSet flattened to bold titles + options: + markdown: commonmark + mdast: + type: root + children: + - type: tabSet + children: + - type: tabItem + title: Python + children: + - type: code + lang: python + value: 'print("hi")' + - type: tabItem + title: Julia + children: + - type: code + lang: julia + value: 'println("hi")' + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "**Python**\n" + - "\n" + - "```python\n" + - "print(\"hi\")\n" + - "```\n" + - "\n" + - "**Julia**\n" + - "\n" + - "```julia\n" + - "println(\"hi\")\n" + - "```" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: solution dropped when configured + options: + markdown: commonmark 
+ commonmark: + dropSolutions: true + mdast: + type: root + children: + - type: solution + children: + - type: admonitionTitle + children: + - type: text + value: Solution + - type: paragraph + children: + - type: text + value: The answer is 42. + ipynb: + cells: [] + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: solution kept by default in commonmark mode + options: + markdown: commonmark + mdast: + type: root + children: + - type: solution + children: + - type: admonitionTitle + children: + - type: text + value: Solution to Exercise 1 + - type: paragraph + children: + - type: text + value: The answer is 42. + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "**Solution to Exercise 1**\n" + - "\n" + - "The answer is 42." + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: gated solution with code cell lifted to top level + options: + markdown: commonmark + mdast: + type: root + children: + - type: block + children: + - type: solution + children: + - type: admonitionTitle + children: + - type: text + value: Solution to Exercise 1 + - type: paragraph + children: + - type: text + value: "Here's one solution:" + - type: block + kind: notebook-code + children: + - type: code + lang: python3 + executable: true + value: "def factorial(n):\n k = 1\n for i in range(n):\n k = k * (i + 1)\n return k\n\nfactorial(4)" + - type: outputs + children: [] + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "**Solution to Exercise 1**\n" + - "\n" + - "Here's one solution:" + - cell_type: code + execution_count: null + metadata: {} + outputs: [] + source: + - "def factorial(n):\n" + - " k = 1\n" + - " for i in range(n):\n" + - " k = k * (i + 1)\n" + - " return k\n" + - "\n" + - "factorial(4)" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: gated solution with multiple code cells interleaved with markdown + options: + 
markdown: commonmark + mdast: + type: root + children: + - type: block + children: + - type: solution + children: + - type: admonitionTitle + children: + - type: text + value: Solution + - type: paragraph + children: + - type: text + value: First approach + - type: block + kind: notebook-code + children: + - type: code + lang: python3 + executable: true + value: "x = 1" + - type: outputs + children: [] + - type: paragraph + children: + - type: text + value: Second approach + - type: block + kind: notebook-code + children: + - type: code + lang: python3 + executable: true + value: "x = 2" + - type: outputs + children: [] + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "**Solution**\n" + - "\n" + - "First approach" + - cell_type: code + execution_count: null + metadata: {} + outputs: [] + source: + - "x = 1" + - cell_type: markdown + metadata: {} + source: + - "Second approach" + - cell_type: code + execution_count: null + metadata: {} + outputs: [] + source: + - "x = 2" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: gated exercise with code cell + options: + markdown: commonmark + mdast: + type: root + children: + - type: block + children: + - type: exercise + enumerator: '1' + children: + - type: admonitionTitle + children: + - type: text + value: Exercise + - type: paragraph + children: + - type: text + value: Write a factorial function. + - type: block + kind: notebook-code + children: + - type: code + lang: python3 + executable: true + value: "# your code here" + - type: outputs + children: [] + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "**Exercise 1**\n" + - "\n" + - "Write a factorial function." 
+ - cell_type: code + execution_count: null + metadata: {} + outputs: [] + source: + - "# your code here" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: gated solution dropped when configured + options: + markdown: commonmark + commonmark: + dropSolutions: true + mdast: + type: root + children: + - type: block + children: + - type: solution + children: + - type: admonitionTitle + children: + - type: text + value: Solution + - type: paragraph + children: + - type: text + value: "Here's the answer:" + - type: block + kind: notebook-code + children: + - type: code + lang: python3 + executable: true + value: "x = 42" + - type: outputs + children: [] + ipynb: + cells: [] + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: gated exercise and solution sharing a block with other content + options: + markdown: commonmark + mdast: + type: root + children: + - type: block + children: + - type: paragraph + children: + - type: text + value: Text before exercise. + - type: exercise + enumerator: '1' + children: + - type: admonitionTitle + children: + - type: text + value: Exercise + - type: paragraph + children: + - type: text + value: Write a factorial function. + - type: solution + children: + - type: admonitionTitle + children: + - type: text + value: Solution to Exercise 1 + - type: paragraph + children: + - type: text + value: "Here's one solution:" + - type: block + kind: notebook-code + children: + - type: code + lang: python3 + executable: true + value: "factorial(4)" + - type: outputs + children: [] + - type: paragraph + children: + - type: text + value: Text after solution. + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Text before exercise.\n" + - "\n" + - "**Exercise 1**\n" + - "\n" + - "Write a factorial function." 
+ - cell_type: markdown + metadata: {} + source: + - "**Solution to Exercise 1**\n" + - "\n" + - "Here's one solution:" + - cell_type: code + execution_count: null + metadata: {} + outputs: [] + source: + - "factorial(4)" + - cell_type: markdown + metadata: {} + source: + - "Text after solution." + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: gated solution dropped from shared block when configured + options: + markdown: commonmark + commonmark: + dropSolutions: true + mdast: + type: root + children: + - type: block + children: + - type: exercise + enumerator: '1' + children: + - type: admonitionTitle + children: + - type: text + value: Exercise + - type: paragraph + children: + - type: text + value: Solve this problem. + - type: solution + children: + - type: admonitionTitle + children: + - type: text + value: Solution + - type: paragraph + children: + - type: text + value: "The answer:" + - type: block + kind: notebook-code + children: + - type: code + lang: python3 + executable: true + value: "x = 42" + - type: outputs + children: [] + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "**Exercise 1**\n" + - "\n" + - "Solve this problem." 
+ metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: inline math with underscores not escaped + options: + markdown: commonmark + mdast: + type: root + children: + - type: paragraph + children: + - type: inlineMath + value: 'x_1 + x_2' + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "$x_1 + x_2$" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: math block with underscores not escaped + options: + markdown: commonmark + mdast: + type: root + children: + - type: math + value: "\\int_0^\\infty e^{-x^2} dx = \\frac{\\sqrt{\\pi}}{2}" + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "$$\n" + - "\\int_0^\\infty e^{-x^2} dx = \\frac{\\sqrt{\\pi}}{2}\n" + - "$$" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: CommonMark with frontmatter kernelspec + frontmatter: + kernelspec: + name: python3 + display_name: Python 3 + language: python + options: + markdown: commonmark + mdast: + type: root + children: + - type: paragraph + children: + - type: text + value: 'Value: ' + - type: inlineMath + value: 'x = 1' + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Value: $x = 1$" + metadata: + language_info: + name: python + kernelspec: + name: python3 + display_name: Python 3 + language: python + nbformat: 4 + nbformat_minor: 2 + + - title: heading identifier stripped in commonmark mode + options: + markdown: commonmark + mdast: + type: root + children: + - type: heading + depth: 2 + identifier: my-section + label: my-section + children: + - type: text + value: My Section + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "## My Section" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: paragraph identifier stripped in commonmark mode + options: + markdown: commonmark + mdast: + type: root + children: + - type: paragraph + identifier: 
labeled-para + children: + - type: text + value: A labeled paragraph. + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "A labeled paragraph." + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: mystTarget nodes dropped in commonmark mode + options: + markdown: commonmark + mdast: + type: root + children: + - type: mystTarget + label: my-target + - type: paragraph + children: + - type: text + value: After target. + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "After target." + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: comment nodes dropped in commonmark mode + options: + markdown: commonmark + mdast: + type: root + children: + - type: paragraph + children: + - type: text + value: Before comment. + - type: comment + value: This is a MyST comment + - type: paragraph + children: + - type: text + value: After comment. + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Before comment." + - cell_type: markdown + metadata: {} + source: + - "After comment." 
+ metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: code block with extra attributes stripped in commonmark mode + options: + markdown: commonmark + mdast: + type: root + children: + - type: code + lang: python + class: no-execute + value: "print('hello')" + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "```python\n" + - "print('hello')\n" + - "```" + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: resolved include directive unwrapped to content + options: + markdown: commonmark + mdast: + type: root + children: + - type: block + children: + - type: include + file: _admonition/gpu.md + children: + - type: admonition + kind: note + children: + - type: admonitionTitle + children: + - type: text + value: GPU + - type: paragraph + children: + - type: text + value: This lecture requires a GPU-enabled machine. + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "> **GPU**\n" + - ">\n" + - "> This lecture requires a GPU-enabled machine." + metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: unresolved include directive with no children is dropped + options: + markdown: commonmark + mdast: + type: root + children: + - type: block + children: + - type: paragraph + children: + - type: text + value: Before include. + - type: include + file: _admonition/missing.md + - type: paragraph + children: + - type: text + value: After include. + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Before include.\n" + - "\n" + - "After include." 
+ metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 diff --git a/packages/myst-to-ipynb/tests/example.ipynb b/packages/myst-to-ipynb/tests/example.ipynb new file mode 100644 index 0000000000..8e16dcd584 --- /dev/null +++ b/packages/myst-to-ipynb/tests/example.ipynb @@ -0,0 +1,31 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "hello\n", + "\n", + "world" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = 1\n", + "\n", + "hello = 2" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/packages/myst-to-ipynb/tests/frontmatter.yml b/packages/myst-to-ipynb/tests/frontmatter.yml new file mode 100644 index 0000000000..9d48cacdc8 --- /dev/null +++ b/packages/myst-to-ipynb/tests/frontmatter.yml @@ -0,0 +1,108 @@ +title: myst-to-ipynb frontmatter and metadata +cases: + - title: empty frontmatter defaults to python + mdast: + type: root + children: + - type: paragraph + children: + - type: text + value: Hello world! + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Hello world!" 
+ metadata: + language_info: + name: python + nbformat: 4 + nbformat_minor: 2 + + - title: kernelspec from frontmatter + frontmatter: + kernelspec: + name: julia-1.10 + display_name: Julia 1.10 + language: julia + mdast: + type: root + children: + - type: paragraph + children: + - type: text + value: Hello + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Hello" + metadata: + language_info: + name: julia + kernelspec: + name: julia-1.10 + display_name: Julia 1.10 + language: julia + nbformat: 4 + nbformat_minor: 2 + + - title: kernelspec python3 + frontmatter: + kernelspec: + name: python3 + display_name: Python 3 + language: python + mdast: + type: root + children: + - type: paragraph + children: + - type: text + value: Test + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "Test" + metadata: + language_info: + name: python + kernelspec: + name: python3 + display_name: Python 3 + language: python + nbformat: 4 + nbformat_minor: 2 + + - title: kernelspec R language + frontmatter: + kernelspec: + name: ir + display_name: R + language: R + mdast: + type: root + children: + - type: paragraph + children: + - type: text + value: R notebook + ipynb: + cells: + - cell_type: markdown + metadata: {} + source: + - "R notebook" + metadata: + language_info: + name: R + kernelspec: + name: ir + display_name: R + language: R + nbformat: 4 + nbformat_minor: 2 diff --git a/packages/myst-to-ipynb/tests/run.spec.ts b/packages/myst-to-ipynb/tests/run.spec.ts new file mode 100644 index 0000000000..9df4d579d5 --- /dev/null +++ b/packages/myst-to-ipynb/tests/run.spec.ts @@ -0,0 +1,43 @@ +import { describe, expect, test } from 'vitest'; +import fs from 'node:fs'; +import path from 'node:path'; +import yaml from 'js-yaml'; +import { unified } from 'unified'; +import writeIpynb from '../src'; +import type { PageFrontmatter } from 'myst-frontmatter'; +import type { IpynbOptions } from '../src'; + +type TestCase = { + title: string; + ipynb: Record<string, any>;
+ mdast: Record<string, any>; + frontmatter?: PageFrontmatter; + options?: IpynbOptions; +}; + +type TestCases = { + title: string; + cases: TestCase[]; +}; + +const casesList: TestCases[] = fs + .readdirSync(__dirname) + .filter((file) => file.endsWith('.yml')) + .map((file) => { + const content = fs.readFileSync(path.join(__dirname, file), { encoding: 'utf-8' }); + return yaml.load(content) as TestCases; + }); + +casesList.forEach(({ title, cases }) => { + describe(title, () => { + test.each(cases.map((c): [string, TestCase] => [c.title, c]))( + '%s', + (_, { ipynb, mdast, frontmatter, options }) => { + const pipe = unified().use(writeIpynb, frontmatter, options); + pipe.runSync(mdast as any); + const file = pipe.stringify(mdast as any); + expect(JSON.parse(file.result)).toEqual(ipynb); + }, + ); + }); +}); diff --git a/packages/myst-to-ipynb/tsconfig.json b/packages/myst-to-ipynb/tsconfig.json new file mode 100644 index 0000000000..1c5c0f1c4b --- /dev/null +++ b/packages/myst-to-ipynb/tsconfig.json @@ -0,0 +1,8 @@ +{ + "extends": "../tsconfig/base.json", + "compilerOptions": { + "outDir": "dist" + }, + "include": ["."], + "exclude": ["dist", "build", "node_modules", "src/**/*.spec.ts", "tests"] +} diff --git a/packages/myst-to-md/src/directives.ts b/packages/myst-to-md/src/directives.ts index 93558289b1..dda85f7c27 100644 --- a/packages/myst-to-md/src/directives.ts +++ b/packages/myst-to-md/src/directives.ts @@ -181,7 +181,7 @@ function containerValidator(node: any, file: VFile) { ruleId: RuleId.mdRenders, }); } - if (kind !== 'figure' && kind !== 'table' && kind !== 'code') { + if (kind !== 'figure' && kind !== 'table' && kind !== 'code' && kind !== 'quote') { fileError(file, `Unknown kind on container node: ${kind}`, { node, source: 'myst-to-md', @@ -203,7 +203,20 @@ function container(node: any, _: Parent, state: NestedState, info: Info): string const captionNode: GenericNode | null = select('caption', node); const legendNode: GenericNode | null = select('legend', node);
const children = [...(captionNode?.children || []), ...(legendNode?.children || [])]; - if (node.kind === 'figure') { + if (node.kind === 'quote') { + const blockquoteNode: GenericNode | null = select('blockquote', node); + if (!blockquoteNode) return ''; + // Serialize the blockquote content using the default blockquote handler + let result = defaultHandlers.blockquote(blockquoteNode as any, _ as any, state, info); + // Append attribution (caption) as a blockquote line with em-dash prefix + if (captionNode) { + const attribution = state.containerPhrasing(captionNode as any, info); + if (attribution) { + result += '\n>\n> \u2014 ' + attribution; + } + } + return result; + } else if (node.kind === 'figure') { const imageNodes: GenericNode[] = selectAll('image', node); const imageNode = imageNodes.find((img) => !img.placeholder); if (imageNode?.data?.altTextIsAutoGenerated) { diff --git a/packages/myst-to-md/src/references.ts b/packages/myst-to-md/src/references.ts index a044ef2a4d..ebeb689d59 100644 --- a/packages/myst-to-md/src/references.ts +++ b/packages/myst-to-md/src/references.ts @@ -11,10 +11,13 @@ function labelWrapper(handler: Handle) { } function crossReference(node: any, _: Parent, state: NestedState, info: Info): string { - const { urlSource, label, identifier } = node; + const { urlSource, label, identifier, url, html_id } = node; + const resolvedUrl = + urlSource ?? + (label ? `#${label}` : identifier ? `#${identifier}` : html_id ? `#${html_id}` : (url ?? '')); const nodeCopy = { ...node, - url: urlSource ?? (label ? `#${label}` : identifier ? 
`#${identifier}` : ''), + url: resolvedUrl, }; return defaultHandlers.link(nodeCopy, _, state, info); } diff --git a/packages/myst-to-md/tests/directives.yml b/packages/myst-to-md/tests/directives.yml index f56124b645..623e1edf84 100644 --- a/packages/myst-to-md/tests/directives.yml +++ b/packages/myst-to-md/tests/directives.yml @@ -960,3 +960,59 @@ cases: :::{topic} Topic content ::: + - title: epigraph blockquote + mdast: + type: root + children: + - type: container + kind: quote + class: epigraph + children: + - type: blockquote + children: + - type: paragraph + children: + - type: text + value: 'Python has gotten sufficiently weapons grade that we don''t descend into R anymore.' + markdown: |- + > Python has gotten sufficiently weapons grade that we don't descend into R anymore. + - title: epigraph blockquote with attribution + mdast: + type: root + children: + - type: container + kind: quote + class: epigraph + children: + - type: blockquote + children: + - type: paragraph + children: + - type: text + value: 'Debugging is twice as hard as writing the code in the first place.' + - type: caption + children: + - type: paragraph + children: + - type: text + value: Brian Kernighan + markdown: |- + > Debugging is twice as hard as writing the code in the first place. + > + > — Brian Kernighan + - title: pull-quote blockquote + mdast: + type: root + children: + - type: container + kind: quote + class: pull-quote + children: + - type: blockquote + children: + - type: paragraph + children: + - type: text + value: An important quote. + markdown: |- + > An important quote. 
diff --git a/packages/myst-to-md/tests/references.yml b/packages/myst-to-md/tests/references.yml index c8499a3d6a..4bde3188d7 100644 --- a/packages/myst-to-md/tests/references.yml +++ b/packages/myst-to-md/tests/references.yml @@ -181,3 +181,41 @@ cases: value: markdown markdown: |- [Some % *markdown*](#example) + - title: crossReference - url fallback for remote refs + mdast: + type: root + children: + - type: crossReference + url: /other-page#section + children: + - type: text + value: Section 7 + markdown: |- + [Section 7](/other-page#section) + - title: crossReference - html_id fallback for resolved refs + mdast: + type: root + children: + - type: crossReference + kind: heading + resolved: true + html_id: oop-solow-growth + children: + - type: text + value: the next section + markdown: |- + [the next section](#oop-solow-growth) + - title: crossReference - html_id fallback for equation + mdast: + type: root + children: + - type: crossReference + kind: equation + resolved: true + html_id: solow-lom + enumerator: '1' + children: + - type: text + value: (1) + markdown: |- + [(1)](#solow-lom)