forked from jupyter-book/mystmd
CommonMark ipynb export + image attachment embedding #1
Open: mmcky wants to merge 29 commits into main from myst-to-ipynb
0b8268a 📓 Add `ipynb` as export format (agoose77)
3d20828 fix some tests and comment the failing ones to fix in upcoming commits (kp992)
09f1a54 fix tests (kp992)
ab8f6ef fix failure in myst-cli (kp992)
137f9a8 update the tests and keep the split lines logic (kp992)
d889e7a Revert merging md blocks and update 2 sample test cases (kp992)
e72ff5c Add ipynb in validators (#2159) (kp992)
4a989de fix: myst-to-ipynb bug fixes for kernelspec, markers, and metadata (mmcky)
44571dc feat(myst-to-ipynb): add CommonMark serialization mode (mmcky)
925a194 test: expand myst-to-ipynb test suite (30 cases) (mmcky)
51d6524 fix: strip identifier/label from nodes, drop mystTarget/comment, filt… (mmcky)
f6a8586 feat: add image node handler to CommonMark transform (mmcky)
85b6fcc feat: add image attachment embedding option for ipynb export (mmcky)
34a59d6 fix: lint formatting + add ipynb export documentation (mmcky)
fe2c295 fix: break circular dependency between attachments.ts and index.ts (mmcky)
27c2c15 fix: strip leading slash from image URLs + fix misleading comment (mmcky)
c5eea1c fix: unwrap resolved include directives in CommonMark transform (mmcky)
6837df2 fix: lift code-cell blocks from gated exercise/solution nodes in ipyn… (mmcky)
c0c4e33 fix: handle real AST structure where exercise/solution share a block (mmcky)
033cd77 fix: serialize epigraph/pull-quote/blockquote containers in CommonMar… (mmcky)
6d6ba98 debug: add crossReference empty-URL instrumentation and node.url fall… (mmcky)
bff99f8 fix: use html_id as fallback for crossReference URLs in CommonMark ex… (mmcky)
89a57e5 refactor(myst-to-md): remove MYST_DEBUG_XREF instrumentation (mmcky)
59267a8 style: fix prettier formatting in myst-to-ipynb and myst-to-md (mmcky)
7ec2d37 Fix image attachment regex for escaped markdown characters (mmcky)
65e71ce Fix prettier formatting (mmcky)
43edd9e Fix lint errors: remove shadowed variable and useless escape (mmcky)
0a5563d Merge branch 'main' into myst-to-ipynb (mmcky)
b76866e Merge branch 'main' into myst-to-ipynb (mmcky)
@@ -0,0 +1,7 @@
---
"myst-frontmatter": patch
"myst-to-ipynb": patch
"myst-cli": patch
---

Add ipynb as export format
@@ -0,0 +1,117 @@
---
title: Creating Jupyter Notebooks
description: Export MyST documents to Jupyter Notebook (.ipynb) format with optional CommonMark markdown and embedded images.
---

You can export MyST documents to Jupyter Notebook (`.ipynb`) format using `myst build`. The exported notebooks can use either MyST markdown (for use with [jupyterlab-myst](https://github.com/jupyter-book/jupyterlab-myst)) or plain CommonMark markdown compatible with vanilla Jupyter Notebook, JupyterLab, and Google Colab.

## Basic usage

Add an `exports` entry with `format: ipynb` to your page frontmatter:

```{code-block} yaml
:filename: my-document.md
---
exports:
  - format: ipynb
    output: exports/my-document.ipynb
---
```

Build the notebook with:

```bash
myst build my-document.md --ipynb
```

Or build all ipynb exports in the project:

```bash
myst build --ipynb
```

## CommonMark markdown

By default, exported notebooks use MyST markdown in their cells. If you need compatibility with environments that don't support MyST (vanilla Jupyter, Colab, etc.), set `markdown: commonmark`:

```{code-block} yaml
:filename: my-document.md
---
exports:
  - format: ipynb
    markdown: commonmark
    output: exports/my-document.ipynb
---
```

With `markdown: commonmark`, MyST-specific syntax is converted to plain CommonMark equivalents:

```{list-table} CommonMark conversions
:header-rows: 1
- * MyST syntax
  * CommonMark output
- * `:::{note}` admonitions
  * `> **Note**` blockquotes
- * `` {math}`E=mc^2` `` roles
  * `$E=mc^2$` dollar math
- * `$$` math blocks
  * `$$...$$` (preserved)
- * `:::{exercise}` directives
  * **Exercise N** bold headers
- * `:::{proof:theorem}` directives
  * **Theorem N** bold headers
- * Figures with captions
  * `![alt](url)` with italic caption
- * Tab sets
  * Bold tab titles with content
- * `{image}` directives
  * `![alt](url)` images
- * `(label)=` targets
  * Dropped (no CommonMark equivalent)
- * `% comments`
  * Dropped
```
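
To make two rows of the table concrete, here is a minimal sketch of the admonition and math-role conversions. It is not the `myst-to-md` implementation, which works on the AST rather than on strings. The `Admonition` shape, both function names, and the blank `>` line after the title are assumptions made for illustration.

```typescript
// Hypothetical, simplified node shape; the real MyST AST carries more fields.
type Admonition = { kind: string; lines: string[] };

// Render an admonition as a blockquote with a bold title line.
function admonitionToBlockquote(node: Admonition): string {
  const title = node.kind.charAt(0).toUpperCase() + node.kind.slice(1);
  const body = node.lines.map((line) => (line ? `> ${line}` : '>'));
  return [`> **${title}**`, '>', ...body].join('\n');
}

// Rewrite inline {math}`...` roles as dollar math (string-level approximation).
function mathRoleToDollar(text: string): string {
  return text.replace(/\{math\}`([^`]+)`/g, (_match, expr: string) => `$${expr}$`);
}

const noteBlock = admonitionToBlockquote({ kind: 'note', lines: ['Remember to save.'] });
console.log(noteBlock);
console.log(mathRoleToDollar('Energy {math}`E=mc^2` is conserved'));
```

A string-level rewrite like `mathRoleToDollar` would also touch roles inside code spans, which is one reason a real transform operates on parsed nodes.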

## Embedding images as cell attachments

By default, images in exported notebooks reference external files. To create fully self-contained notebooks with images embedded as base64 cell attachments, set `images: attachment`:

```{code-block} yaml
:filename: my-document.md
---
exports:
  - format: ipynb
    markdown: commonmark
    images: attachment
    output: exports/my-document.ipynb
---
```

With `images: attachment`:
- Local images are read from disk and base64-encoded
- Image references become `![alt](attachment:name)`
- Each cell includes an `attachments` field with the image data
- Remote images (http/https URLs) are left as references

This is useful for distributing notebooks, uploading to Google Colab, or sharing via email where external image files may not be available.
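
For readers unfamiliar with cell attachments, the sketch below shows roughly what such a markdown cell looks like in the notebook JSON (nbformat 4): `attachments` maps an attachment name to a MIME type to a base64 payload. The `MarkdownCell` interface and `makeCellWithAttachment` helper are illustrative names, not part of `myst-to-ipynb`.

```typescript
import { Buffer } from 'node:buffer';

// Minimal shape of an nbformat 4 markdown cell with attachments.
interface MarkdownCell {
  cell_type: 'markdown';
  metadata: Record<string, unknown>;
  source: string;
  attachments?: Record<string, Record<string, string>>;
}

// Hypothetical helper: build a cell whose only content is one embedded image.
function makeCellWithAttachment(
  alt: string,
  attachmentName: string,
  mimeType: string,
  bytes: Uint8Array,
): MarkdownCell {
  return {
    cell_type: 'markdown',
    metadata: {},
    source: `![${alt}](attachment:${attachmentName})`,
    attachments: {
      [attachmentName]: { [mimeType]: Buffer.from(bytes).toString('base64') },
    },
  };
}

// The first four bytes of the PNG signature stand in for real file contents.
const cell = makeCellWithAttachment(
  'Plot',
  'plot.png',
  'image/png',
  new Uint8Array([0x89, 0x50, 0x4e, 0x47]),
);
console.log(JSON.stringify(cell, null, 2));
```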

## Export options

```{list-table} ipynb export options
:header-rows: 1
- * Option
  * Values
  * Description
- * `format`
  * `ipynb`
  * Required; specifies notebook export
- * `output`
  * string
  * Output filename or folder
- * `markdown`
  * `myst` (default), `commonmark`
  * Markdown format for notebook cells
- * `images`
  * `reference` (default), `attachment`
  * How to handle images: references or embedded attachments
```
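
The options table can also be read as a type with defaults. The sketch below is one way to model it, assuming only what the table states; `IpynbExportEntry`, `ResolvedIpynbExport`, and `resolveIpynbExport` are hypothetical names and not the actual `myst-frontmatter` types or validators.

```typescript
type MarkdownFlavor = 'myst' | 'commonmark';
type ImageMode = 'reference' | 'attachment';

// One `exports` entry as written in frontmatter (hypothetical type).
interface IpynbExportEntry {
  format: 'ipynb';
  output?: string;
  markdown?: MarkdownFlavor;
  images?: ImageMode;
}

// The same entry after the documented defaults are filled in.
interface ResolvedIpynbExport {
  format: 'ipynb';
  output: string | undefined;
  markdown: MarkdownFlavor;
  images: ImageMode;
}

function resolveIpynbExport(entry: IpynbExportEntry): ResolvedIpynbExport {
  return {
    format: entry.format,
    output: entry.output,
    markdown: entry.markdown ?? 'myst', // documented default
    images: entry.images ?? 'reference', // documented default
  };
}

const resolved = resolveIpynbExport({ format: 'ipynb', output: 'exports/my-document.ipynb' });
console.log(resolved.markdown, resolved.images);
```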
@@ -0,0 +1,123 @@
import fs from 'node:fs';
import path from 'node:path';
import mime from 'mime-types';
import { tic, writeFileToFolder } from 'myst-cli-utils';
import { FRONTMATTER_ALIASES, PAGE_FRONTMATTER_KEYS } from 'myst-frontmatter';
import { writeIpynb } from 'myst-to-ipynb';
import type { IpynbOptions, ImageData } from 'myst-to-ipynb';
import { filterKeys } from 'simple-validators';
import { selectAll } from 'unist-util-select';
import { VFile } from 'vfile';
import { finalizeMdast } from '../../process/mdast.js';
import type { ISession } from '../../session/types.js';
import { logMessagesFromVFile } from '../../utils/logging.js';
import { KNOWN_IMAGE_EXTENSIONS } from '../../utils/resolveExtension.js';
import type { ExportWithOutput, ExportFnOptions } from '../types.js';
import { cleanOutput } from '../utils/cleanOutput.js';
import { getFileContent } from '../utils/getFileContent.js';
import { getSourceFolder } from '../../transforms/links.js';

export async function runIpynbExport(
  session: ISession,
  sourceFile: string,
  exportOptions: ExportWithOutput,
  opts?: ExportFnOptions,
) {
  const toc = tic();
  const { output, articles } = exportOptions;
  const { clean, projectPath, extraLinkTransformers, execute } = opts ?? {};
  // At this point, export options are resolved to contain one-and-only-one article
  const article = articles[0];
  if (!article?.file) return { tempFolders: [] };
  if (clean) cleanOutput(session, output);
  const [{ mdast, frontmatter }] = await getFileContent(session, [article.file], {
    projectPath,
    imageExtensions: KNOWN_IMAGE_EXTENSIONS,
    extraLinkTransformers,
    preFrontmatters: [
      filterKeys(article, [...PAGE_FRONTMATTER_KEYS, ...Object.keys(FRONTMATTER_ALIASES)]),
    ],
    execute,
  });
  await finalizeMdast(session, mdast, frontmatter, article.file, {
    imageWriteFolder: path.join(path.dirname(output), 'files'),
    imageAltOutputFolder: 'files/',
    imageExtensions: KNOWN_IMAGE_EXTENSIONS,
    simplifyFigures: false,
    useExistingImages: true,
  });
  const vfile = new VFile();
  vfile.path = output;
  // Build ipynb options from export config
  const ipynbOpts: IpynbOptions = {};
  if ((exportOptions as any).markdown === 'commonmark') {
    ipynbOpts.markdown = 'commonmark';
  }
  if ((exportOptions as any).images === 'attachment') {
    ipynbOpts.images = 'attachment';
    // Collect image data from the AST: read files and base64-encode
    ipynbOpts.imageData = collectImageData(session, mdast, article.file);
  }
  const mdOut = writeIpynb(vfile, mdast as any, frontmatter, ipynbOpts);
  logMessagesFromVFile(session, mdOut);
  session.log.info(toc(`📓 Exported IPYNB in %s, copying to ${output}`));
  writeFileToFolder(output, mdOut.result as string);
  return { tempFolders: [] };
}

/**
 * Collect base64-encoded image data from the mdast tree (Phase 1 of attachment embedding).
 *
 * Walks all image nodes via `selectAll('image', mdast)`, resolves their
 * filesystem paths using `getSourceFolder` (handles both absolute `/_static/...`
 * and relative paths), reads the files, and base64-encodes them into a map.
 *
 * The returned `Record<url, ImageData>` is passed to `writeIpynb` as
 * `options.imageData`. Phase 2 (in `embedImagesAsAttachments`) then rewrites
 * the serialized markdown to use `attachment:` references.
 *
 * Remote URLs (http/https) and data URIs are skipped; only local files are embedded.
 */
function collectImageData(
  session: ISession,
  mdast: any,
  sourceFile: string,
): Record<string, ImageData> {
  const imageData: Record<string, ImageData> = {};
  const imageNodes = selectAll('image', mdast) as any[];
  const sourcePath = session.sourcePath();

  for (const img of imageNodes) {
    const url = img.url ?? img.urlSource;
    if (
      !url ||
      url.startsWith('http://') ||
      url.startsWith('https://') ||
      url.startsWith('data:')
    ) {
      continue;
    }
    if (imageData[url]) continue; // already processed

    const sourceFolder = getSourceFolder(url, sourceFile, sourcePath);
    const relativeUrl = url.replace(/^[/\\]+/, '');
    const filePath = path.join(sourceFolder, relativeUrl);

    try {
      if (!fs.existsSync(filePath)) {
        session.log.debug(`Image not found for attachment embedding: ${filePath}`);
        continue;
      }
      const buffer = fs.readFileSync(filePath);
      const mimeType = (mime.lookup(filePath) || 'application/octet-stream') as string;
      imageData[url] = {
        mime: mimeType,
        data: buffer.toString('base64'),
      };
    } catch (err) {
      session.log.debug(`Failed to read image for attachment: ${filePath}`);
    }
  }

  return imageData;
}
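
The doc comment above mentions a Phase 2 that rewrites serialized markdown to `attachment:` references, but `embedImagesAsAttachments` itself is not shown in this diff. The following is a rough sketch of what such a step might look like, under stated assumptions: the `EncodedImage` type mirrors the `{ mime, data }` objects built in `collectImageData`, while `embedImages`, `EmbedResult`, the regex, and the basename-as-attachment-name rule are all hypothetical.

```typescript
// Mirrors the { mime, data } entries produced in Phase 1 (name is hypothetical).
interface EncodedImage {
  mime: string;
  data: string; // base64 payload
}

interface EmbedResult {
  markdown: string;
  attachments: Record<string, Record<string, string>>;
}

// Rewrite ![alt](url) references whose url has collected data; leave the rest alone.
function embedImages(markdown: string, imageData: Record<string, EncodedImage>): EmbedResult {
  const attachments: Record<string, Record<string, string>> = {};
  const rewritten = markdown.replace(
    /!\[([^\]]*)\]\(([^)\s]+)\)/g,
    (match, alt: string, url: string) => {
      const image = imageData[url];
      if (!image) return match; // remote or unreadable images stay as references
      const attachmentName = url.split('/').pop() || url;
      attachments[attachmentName] = { [image.mime]: image.data };
      return `![${alt}](attachment:${attachmentName})`;
    },
  );
  return { markdown: rewritten, attachments };
}

const result = embedImages(
  'See ![Plot](/_static/plot.png) and ![Remote](https://example.com/a.png)',
  { '/_static/plot.png': { mime: 'image/png', data: 'iVBORw==' } },
);
console.log(result.markdown);
console.log(JSON.stringify(result.attachments));
```

This sketch ignores image titles, escaped markdown characters, and two URLs that share a basename; the commit "Fix image attachment regex for escaped markdown characters" suggests the real implementation handles at least the escaping case.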