
# Literate Documentation


## When documentation is the problem

When a software tool piques my curiosity, I want to try it. My path to a working example could be short and sweet: install, run, see it work.

Not infrequently, though, I’m met with inconsistent terms, missing prerequisites, outdated command flags, or examples that no longer match the current release. Nothing catastrophic. Just a little drift over here, a little rot over there. Individually, these are minor issues, but collectively, they interrupt my momentum. Instead of validating the tool, I’m debugging the documentation.

I’m patient to a degree, but if the initial experience becomes an exercise in reconciling documentation with reality, my attention shifts elsewhere and rarely returns.

This is an avoidable loss. A team may spend months building the software, launching it, and attracting a motivated user, only to discourage that user at the point of adoption.

## How good documentation goes bad

No serious project sets out to produce poor documentation. The issue is not disinterest or incompetence; it is entropy. Poor documentation is not a moral failing, but a systems failure.

Software features are gated by tests. Engineers update unit and integration tests alongside their changes, and those tests must pass in continuous integration (CI) before anything ships.

Documentation rarely benefits from equivalent enforcement. A new feature might be documented, but what ensures it hasn’t broken an existing “hello, world” flow or another end-to-end example? Will the documenters rerun those examples? Will they rerun them in a clean environment?

In practice, gaps emerge and users compensate by burning credits, time, and attention to bridge them.

## A hopeful interlude

A few years ago I needed to document a computation so others could verify the results. Remembering my school days with Maple, I gave Google Colab a try.

Colab allowed me to interleave prose with the executable code behind my computation in a single document. More importantly, Colab hosted the document and its environment. As a result, executing the document created the same materialized view of the computation for me and my readers.

This raised a broader question. Could we create materialized views not just for a computation, but for any procedure described by documentation?

I loved this idea; our users and engineers did not. Users were confused by all the Python needed to make a Colab notebook run. They just wanted a clean rendering of correct instructions. Our engineers were reluctant to integrate a hosted notebook into development, testing, and release pipelines.

Still, the core insight remained. If explanation, code, and environment are fixed and validated during the release process, the resulting artifact can enforce documentation consistency and integrity. It becomes a materialized view of the system, aligning the realities of engineers, documenters, and users alike.

## Literate documentation

In leading documentation efforts at Blocky and evaluating our competitors’ software through their documentation, I noticed that most projects attempt to help users:

- Download, install, and configure the software
- Run a “hello, world” example
- Work through specific workflows showcasing advanced features
- Understand the system’s operational envelope of inputs, options, and outputs

These are not merely descriptions, but procedures with claims about system behavior. We can look at documentation as a shared contract between users and engineers over system behavior. That contract must maintain consistency (of terminology) and integrity (of functionality) as the software evolves.

What is needed is a method for developing and verifying these procedures for consistency and integrity, not occasionally but continuously, with the same rigor as the software whose behavior these procedures describe.

In thinking about how to achieve this at my company, we started to group the following concepts under the umbrella term literate documentation.

Under the authority of continuous integration, literate documentation is executed and verified as part of the build process. Failures block release, which makes documentation subject to the same release safeguards as production code.

Literate documentation sits alongside several established practices, such as docs-as-code, behavior-driven development (BDD), and acceptance testing, and extends them in a specific direction.

Docs-as-code brings documentation into version control and modern review workflows. Literate documentation adds build-time execution and verification, ensuring that examples and procedures remain mechanically aligned with the released system.

BDD and end-to-end testing validate system behavior from a product perspective. Literate documentation validates the accuracy of the published instructions that guide users in invoking that behavior.

Tools that generate documentation from comments, type annotations, or OpenAPI specifications describe interfaces and structural contracts. Literate documentation verifies procedures, ensuring that installation steps, commands, configurations, and multi-step workflows execute and produce the declared results.

Acceptance tests may confirm that a system works. Literate documentation ensures that the documented path to making it work remains correct, reproducible, and release-gated.

In sum, literate documentation complements existing practices by extending enforcement from code and interfaces to the user-facing procedures that describe how the system is actually used.

## What is possible today?

When we explored these ideas at Blocky, much of what we built was custom, which left me wondering whether literate documentation is achievable today with open source tooling. So I set out to piece together an open source literate documentation stack. I found that while various projects support much of the needed functionality, in some cases very well, important gaps remain in the core literate documentation concepts.

### Co-located prose and executable artifacts

While Markdown lacks native support for literate programming, it provides two primitives and one property that I think make it well-suited for literate documentation. First, fenced code blocks

````markdown
```lang
code
```
````

allow interspersing of text and code. Although code blocks alone do not implement literate programming, their info strings allow language-aware code rendering. Second, HTML blocks in Markdown allow for multiline HTML comments

```html
<!--
multiline
comment
-->
```

that can “hide” information from a Markdown renderer.

Finally, Markdown is simple. Unlike AsciiDoc, reStructuredText, or LaTeX, which may be familiar to technical writers, Markdown remains accessible to engineers and readable in its raw form without rendering. As a result, it remains the lingua franca for engineering and non-engineering documenters across a wide range of workflows.

### Build-time execution and output capture

With Markdown as the cornerstone markup layer, the next question is execution. While a read–eval–print loop (REPL) is sufficient for literate programming, literate documentation requires shell access to describe installation and configuration procedures, as well as software invocation examples.

#### Executable shell

Several projects (PanPipe, PanHandle, ExecMD, RunDOC, runbook, rundoc, Codedown, gfm-code-blocks, run-code-inline) support executable shell blocks in Markdown. The one that I think solves many problems well is mdsh. For example, to flag a bash “hello, world” echo command for mdsh evaluation, we can put it in a fenced code block

````markdown
```bash > text $
echo "hello, world"
```
````

The info string `bash > text $` tells a Markdown renderer to render the code block with bash syntax highlighting. A renderer, however, will ignore the `> text $` portion of the info string, which tells mdsh to execute the commands after `$` with bash and wrap the output in a `text` fenced code block

````markdown
<!-- BEGIN mdsh -->
```text
hello, world
```
<!-- END mdsh -->
````

The BEGIN/END mdsh markers in HTML comments are invisible in the rendered page, but act as structural fences to facilitate mdsh idempotency. The reader sees the rendered bash code and text blocks

```bash
echo "hello, world"
```

```text
hello, world
```

The output capture model of mdsh is also nice for developers, who can see both the command and its output in the same file as they work on it, or view it as part of a pull request diff.

ExecMD can also run in place to add the output to the same file, but does so within the same code block

```bash
$ echo "hello, world"
hello, world
```

which doesn’t let the writer control the formatting language of the output, for example, when the bash command returns a JSON string. Other tools generate an output file either by replacing the bash code block (PanPipe) or by adding a separate output code block.

#### Transitional text

To facilitate a continuous narrative, Blocky’s documentation allowed developers to define transitional text between the command and its output. I extended mdsh with optional transitional text specified after `::`, as in

````markdown
```bash > text $ :: outputs
echo "hello, world"
```

<!-- BEGIN mdsh -->
outputs

```text
hello, world
```
<!-- END mdsh -->
````

RunDOC allows writers to add text before or inside the output code block, though I find its syntax for doing so convoluted.

#### Suppressing command rendering

In some cases, only the output should be rendered. mdsh supports this as well by executing commands embedded in an HTML comment

```html
<!-- > text $
echo "only output"
-->
```

PanHandle can also suppress command rendering, though with much more effort.

#### Suppressing output rendering

An important piece of functionality missing from mdsh is the ability to run a command but suppress its output. For example, I may want to direct the user to download a file with curl and have mdsh execute the command to make sure it works, but not render the output of the command as a separate fenced code block.

Of course, I can suppress curl output with a flag

```bash
curl -s ...
```

or redirect it

```bash
curl ... > /dev/null 2>&1
```

though in both cases this extra syntax is distracting to the reader.

In developing Blocky’s documentation, we created a custom Markdoc tag to run bash commands

```
{% bash code=true output=false %}
curl ...
{% /bash %}
```

with explicit options to render bash code and its captured output. RunDOC syntax also allows users to control the rendering of code and output independently, while PanPipe and run-code-inline have the means to suppress output.

#### Failures and debugging

Occasionally, we want to document that a particular command fails. ExecMD allows writers to start bash commands with `$?` and RunDOC with `fail.$` to indicate that they will return a non-zero exit code.
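In plain shell, the same idea can be sketched without either tool by asserting the failure explicitly, so that an unexpectedly succeeding command is itself an error (the path below is hypothetical):

```shell
# Run a command that is documented to fail; treat success as the anomaly.
if ! ls /nonexistent-path 2>/dev/null; then
  echo "failed as documented"
else
  echo "unexpectedly succeeded" >&2
  exit 1
fi
```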

If a command fails unexpectedly, it is useful to be able to re-run it selectively for a faster edit-compile-run cycle. rundoc supports this by allowing writers to tag their bash commands

````markdown
```bash#custom-branch#v2#test
echo "custom-tagged code block"
```
````

and then invoking the tool only for particular tags

```bash
rundoc run -t bash#test input.md
```

#### Stateful procedures

When I document a stateful procedure, such as software installation steps, I want to be able to replicate locally the side effects created by the documented commands on the user’s machine. For example, I may direct the user to clone a repo

```bash
git clone https://github.com/mwittie/go-hello-world > /dev/null 2>&1
```

and then execute commands like

```bash
cd go-hello-world && go build
```

and

```bash
cd go-hello-world && ./go-hello-world
```

to check that it outputs

```text
hello, world
```

Again, the `cd go-hello-world` prefix for each command is distracting to the reader. It would be nice to tell the user to `cd go-hello-world` once and then run the subsequent commands without the prefix. mdsh allows us to specify a runtime directory at invocation, but not per command within a Markdown file. At Blocky, we solved this by allowing writers to specify a `dir` attribute on the bash tag

```
{% bash id="file.md:394" dir="go-hello-world" code=true output=false %}
go build
{% /bash %}
```

to execute the command from within the specified directory.

ExecMD has a more general solution

````markdown
```bash INIT
$ CMD
```
````

where the INIT command is executed before the CMD, but only the CMD and the captured output of CMD are visible to the Markdown rendering engine.

### File inclusions and extraction

Some documentation platforms support file inclusion and content extraction. For example, to pull out the main function from go-hello-world/main.go in Redocly, we can use the code-snippet tag

```
{% code-snippet file="go-hello-world/main.go" from=5 to=9 language="go" /%}
```

The problem with this approach is that it requires documenters to manually update extraction line numbers, or at least verify them, when code in main.go changes.

AsciiDoc solves this problem by allowing content extraction by tags

```asciidoc
[source,go]
----
include::go-hello-world/main.go[tag=example]
----
```

if they are added to the source code

```go
// tag::example[]
func main() {
	fmt.Println("hello, world")
}
// end::example[]
```

This approach is more stable from the documenter’s perspective, but it pollutes the source code.

With executable bash, however, we can simply use ast-grep to pull out the relevant code

```bash
ast-grep --pattern 'func main() { $$$ }' --json=stream go-hello-world/main.go \
| jq -r '.text'
```

which extracts

```go
func main() {
	fmt.Println("hello, world")
}
```

At Blocky, we developed pluck to make the process a bit easier, especially when we want to also extract the preceding doc comment for a type.

For example

```bash
go run github.com/blocky/pluck/cmd/pluck@v0.1.1 \
--input go-hello-world/main.go --pick function:main
```

extracts

```go
// main prints "hello, world" to standard output.
func main() {
	fmt.Println("hello, world")
}
```

### Other runtimes

While mdsh supports bash, literate documentation would ideally support other scripting languages, such as Python. mdsh gives a nod to this idea and allows the execution of Python scripts, but requires them to include a shebang line. For example

```bash
printf "#!/usr/bin/env python\nprint('hello, world')\n" > test.py
chmod u+x test.py
./test.py
```

outputs

```text
hello, world
```

Clearly, this is more cumbersome than simply invoking a Python code snippet, as in Jupyter, with

````markdown
```python > text
print("hello, world")
```
````

though multiple commands would require recycling the runtime and managing per-document runtime state correctly. Outside of mdsh, runbook and rundoc already support several other runtimes, including Bash, PowerShell, JavaScript, TypeScript, Python, and Go.

While we are on other runtimes, I would find it helpful if mdsh supported declarative diagramming runtimes, like Mermaid or D2, out of the box. For example

```html
<!-- mermaid > svg
graph LR;
   A[Lemons]
   B[Lemonade]
   C[Profit]
   A --> B
   B --> C
-->
```

could produce

```markdown
<!-- BEGIN mdsh -->
![diagram](diagram.svg)
<!-- END mdsh -->
```

Yes, it is possible to render a Mermaid diagram on many Markdown platforms

```mermaid
graph LR;
   A[Lemons]
   B[Lemonade]
   C[Profit]
   A --> B
   B --> C
```

but for D2, or even Mermaid with custom rendering options, command-line rendering would be preferable.

### Shared variables across text and execution contexts

One of the things I really like about writing papers in LaTeX is how easy it is to keep terminology throughout a document consistent by using macros. Macros can be (ab)used to do a lot of things, but one of their most useful features is the support for variables. I can declare

```latex
\newcommand{\fn}{foo()}
```

so that every time I want to refer to the function in my paper I can just write \fn, which will be replaced with foo(). The powerful thing about the LaTeX macro system is that replacement doesn’t just happen in text, but also in other environments.

Markdown lacks native support for variables; however, static site generators like Hugo (through shortcodes) and content frameworks like Markdoc (through tags) can replicate that functionality for replacement in text. runbook allows templating with the handlebars syntax `{{ }}`, but the replacements are loaded only once at command invocation.
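For example, here is a minimal sketch using Hugo’s built-in `param` shortcode, which reads values from a page’s front matter (the `fn` parameter and the surrounding prose are hypothetical):

```markdown
---
title: "API guide"
fn: "foo()"
---

Call {{< param "fn" >}} with no arguments. Because every mention of
{{< param "fn" >}} is read from front matter, renaming the function
requires editing only one line.
```

Every occurrence of the shortcode renders as `foo()` in the generated text.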

Where these approaches fall short is in replacing variable values in executable environments, since tag substitution happens later in the rendering pipeline.

mdsh supports setting variables, but only for its bash runtime.

````markdown
```! foo=bar
```

```bash > text $
echo $foo
```

<!-- BEGIN mdsh -->
```text
bar
```
<!-- END mdsh -->
````

mdsh can also load variables from a .env file

````markdown
```!< .env
```
````

as can RunDOC. This creates the possibility of writing to the .env file in one command and then loading it in another for subsequent access. PanHandle uses a similar approach, saving variables in Haskell files

````markdown
```{.haskell pipe="tee -a tangled.hs"}
foo = "Hello"

```

```{.haskell pipe="tee -a tangled.hs"}
bar = "World"

```

```{.haskell pipe="ghci -v0"}
:load tangled.hs
print (foo ++ " " ++ bar)
```
````
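On the mdsh side, the write-then-load .env pattern might look like the following sketch (the `GREETING` variable is hypothetical):

````markdown
```bash > text $
echo 'GREETING="hello, world"' > .env
```

```!< .env
```

```bash > text $
echo $GREETING
```
````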

Still, these approaches don’t come close to the wide scope of LaTeX macros across text and embedded environments. For documentation to maintain internal consistency, literate documentation needs more mature macro definition and rendering support.

### Assertion of expected outputs

Now that we have a way to document procedures via integrated narrative, executable commands, and captured output, we can treat these as claims about system behavior. In other words, the documentation can become a contract between the user and the system and a test suite of that contract.

The idea of linking implementation and expected behavior comes from Python doctests. The documentation block for a hello_world function

```python
def hello_world():
    """Print a greeting.

    >>> hello_world()
    hello, world
    """

    print("hello, world")
```

contains an invocation test case after `>>>` and the expected output on the next line. Invoking

```bash
python3 -m doctest hello.py -v
```

runs the file’s test cases and verifies their output:

```text
Trying:
    hello_world()
Expecting:
    hello, world
ok
1 item had no tests:
    hello
1 item passed all tests:
   1 test in hello.hello_world
1 test in 2 items.
1 passed.
Test passed.
```

The Go testing package offers a similar approach and allows examples that combine invocations and their expected output. Here, example_test.go contains a package-level Example() test case

```go
func Example() {
	main()
	// Output: hello, world
}
```

which specifies that our earlier main() function should output hello, world. Running

```bash
cd go-hello-world && go test --count=1 -run Example .
```

outputs

```text
ok  	github.com/mwittie/go-hello-world	0.002s
```

The challenge is that neither of these approaches works well for shell-based procedures. At Blocky, our documentation described the use of our CLI tool, bky-as, and so we wanted to define our examples using shell commands, not code invocations.

We used the testscript package to process .txtar files, which define input files, shell commands, and assertions over stdout. For example

```text
# [execute] call the bky-as CLI to attest a function call
stdin fn-call.json
exec bky-as attest-fn-call
cp stdout out.json

# [check] assert stderr and stdout expected values
exec jq -r '.transitive_attested_function_call.claims.output | @base64d' out.json
stdout 'hello, world'
```

specifies that invocations of the bky-as attest-fn-call command to run a function specified in fn-call.json should produce stdout output matching hello, world.

We used these tests as a contract between engineers and documenters. The engineers ensured the test suite was passing for every new release and the documenters used only the tested invocations in documentation examples. While this approach got us closer to making sure our documentation was functional in that it was based on tested procedures, two gaps remained. First, if the documenters needed to change the example, they needed to dive into the test suite, which wasn’t as straightforward as simply updating Markdown bash code blocks and their output. Second, the documentation itself wasn’t verified to match the test suite. A changed test suite example would not necessarily propagate to the documentation. As a result, the documentation itself wasn’t yet an authoritative contract between us and our users.

Fortunately, mdsh offers a glimpse of a better approach. Invoking it with the `--frozen` flag generates a `File modified` error if the newly captured output does not match the output already in the Markdown file. This check is useful in a CI pipeline to make sure that documented commands behave consistently across pull requests. A major shortcoming of mdsh is that the error does not tell me which bash command has the output mismatch, or what the mismatch is.
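As a sketch, this check can gate merges in CI. The workflow below is hypothetical; it assumes mdsh is already installed on the runner and that the documentation lives in README.md:

```yaml
# Hypothetical GitHub Actions job: the build fails when captured
# outputs in the Markdown drift from what the commands now produce.
name: docs
on: [pull_request]
jobs:
  verify-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check documentation is frozen
        run: mdsh --frozen README.md
```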

Another improvement I would like to see is the ability to pass captured command output as input to another command. In txtar I can copy stdout to a file and then validate it through separate commands. Redirecting command output would give developers more side-by-side visibility into discrepancies than the mdsh model of reporting a mismatch against a file that no longer exists. Redirection also allows developers to check not just equality but equivalence, when the output of the system under test is not deterministic (e.g., timestamps, LLM output).
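For instance, an equivalence check can be sketched in plain shell by normalizing the volatile portion of the output, here a hypothetical leading timestamp, before comparing it to the documented value:

```shell
# Simulated command output with a non-deterministic timestamp prefix.
actual="$(printf '%s hello, world' "$(date -u +%Y-%m-%dT%H:%M:%SZ)")"

# Strip the timestamp so only the stable part is compared.
normalized="$(printf '%s\n' "$actual" | sed -E 's/^[0-9TZ:-]+ //')"

# Assert equivalence with the documented output.
[ "$normalized" = "hello, world" ] && echo "outputs equivalent"
```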

Complete assertion support in a literate documentation system would capture command output and compare it against declared validations with precise diagnostics when they fail. At that point, the documentation itself becomes a test suite with every test failure a contract violation that blocks release.

### Reproducible, pinned execution environments

For documentation to function as a contract between developer and user, its behavior must be consistent on both of their machines.

Without control over the execution environment, maintaining consistent behavior is a Sisyphean task. Something as minor as different curl versions between Linux and macOS can silently break a documented procedure.

Attempting to account for these differences within the documentation itself, through setup or configuration steps, does not scale. It increases complexity for the reader and delays the path to a working example, while still leaving gaps in coverage.

As discussed in the hopeful interlude, systems such as hosted notebooks achieve execution consistency by coupling the document with its runtime. The same principle can be applied to general documentation workflows through reproducible environment tooling.

At Blocky, we used Nix to define the execution environment declaratively. Nix Flakes allow all required dependencies, including system packages, language runtimes, and tooling, to be specified in a single artifact with all versions pinned.
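A minimal sketch of such a flake (the pinned channel and package list are illustrative, not Blocky’s actual configuration):

```nix
{
  description = "Pinned environment for executing documentation";

  # Pin the package set so every machine resolves identical versions.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }:
    let pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      devShells.x86_64-linux.default = pkgs.mkShell {
        # Every tool the documented commands rely on.
        packages = [ pkgs.curl pkgs.jq pkgs.go ];
      };
    };
}
```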

Documentation commands are executed inside this environment

```bash
nix develop --ignore-environment --command make build
```

where the `--ignore-environment` flag prevents inheritance of host environment variables and locally installed packages, enforcing isolation.

Using Nix allowed us to enforce the same execution environment for developers, documenters, the CI pipeline, and users. In continuous integration, we integrated Nix using GitHub Actions with cachix/install-nix-action and cachix/cachix-action to ensure consistency and enable caching of dependencies to speed up subsequent builds.

For users, the workflow reduces to entering the defined environment

```bash
nix develop --ignore-environment
```

where all documented commands execute as written, without additional setup, against a configuration that is tested and versioned alongside the documentation.

Alternative approaches such as Docker containers and virtual machines can provide isolation, but they differ in tradeoffs. Containers are widely used but can introduce variability through base images and layering semantics. Virtual machines provide stronger isolation but incur higher operational cost. Nix provides strict reproducibility with a comparatively lightweight model.

By distributing the environment definition, for example, through a flake.nix file used for the commands in this post, the documentation includes not only instructions but also the context required to execute them. This removes a major source of drift.

The result is that a user entering the defined environment can follow the documented procedures exactly as written, against the same configuration that was tested when the documentation was released.

## Conclusion

Literate documentation treats documentation as an executable specification of system behavior rather than a descriptive artifact.

By co-locating prose and executable commands, running them during the build, asserting outputs, and constraining execution to a reproducible environment, documentation becomes subject to the same verification discipline as code.

This turns documentation into a mechanically enforced contract. Documented procedures are continuously executed and validated, and drift is surfaced as a build failure rather than discovered by users.

The result is alignment between documentation and actual system behavior. The path described is the path that has been executed and verified.

Literate documentation extends practices such as docs-as-code and testing by enforcing correctness at the level of user-facing procedures.