The Lab Notebook: Best Practices for Reproducible Research
Six months from now you will need to know exactly which seed produced Figure 3, which commit ran the ablation, and why you stopped using the v2 dataset. A working lab notebook is not paperwork — it is the cheapest insurance against re-running months of work. Here is how to keep one that actually pays off.
1. What a Lab Notebook Is Actually For
A lab notebook is not a diary, not a status report for your advisor, and
not a place to write polished prose. It is the answer to one question
asked six months from now: "What exactly did past-me do, and can I redo
it?" The realistic audiences, in order: (1) future-you trying to
reproduce a result for a rebuttal, (2) the labmate inheriting your code
after you graduate, (3) the reviewer who asks "what happens if you
change X," (4) the journal demanding a data-availability statement,
(5) the IRB or funding agency during an audit, (6) you again, eight
years later, when a citing paper claims your result does not replicate.
Every entry should serve at least one of these readers. If it serves
none, do not write it; if it serves all of them, you wrote it right.
Three Questions a Good Entry Answers
- What did you do today, in enough detail that a careful labmate could rerun it?
- What did you observe, including the result you did not expect?
- What will you do next, and why this and not something else?
2. Paper vs Electronic vs Hybrid: Pick the One You Will Actually Use
The notebook you keep daily beats the perfect one you set up and abandon
in three weeks. Three stable patterns: (1) Pure paper — bound, numbered
pages, signed and dated; still common in wet labs and required by some
institutions for IP reasons; offline, distraction-free, but impossible
to grep. (2) Pure electronic — Obsidian, Notion, LabArchives, Benchling,
or plain Markdown in a Git repo; searchable, syncable, embeds figures
and code, but easy to fragment across tools. (3) Hybrid — a paper
notebook for raw observations at the bench or whiteboard, scanned and
indexed into an electronic one within 24 hours; gives you the speed of
handwriting and the searchability of text. Whichever you pick, commit
for at least a full semester before switching; the cost of migration is
higher than the cost of a slightly worse tool.
Choosing the Right Format for Your Field
- Wet lab, regulated environment: paper or institutional ELN (LabArchives, Benchling)
- Computational/ML: Markdown in Git + experiment tracker (MLflow, W&B, Aim)
- Theory or math: paper for derivations, LaTeX or Obsidian for typed notes
- Mixed methods: hybrid, scanning pages within 24 hours and tidying the index in a weekly 30-minute session
- Required by IP/regulatory: whatever your office of research says — non-negotiable
3. The One-Entry-Per-Experiment Rule
Resist the urge to keep one rolling "today" log. Create one entry per
experiment, indexed by date and a short stable ID (`2026-05-15-ablation-dropout`,
`EXP-0042`). Each entry has the same skeleton: hypothesis in one sentence,
method as a short bulleted protocol or a link to the exact commit hash,
what you actually ran (commands, parameters, seeds), what you observed
(numbers, plots, anomalies), and what you concluded. Keep failed
experiments — they are the most valuable entries six months later when
a reviewer asks "did you try X?" A logged negative result can save
weeks of re-running; one you only vaguely remember is worth
nothing.
Entry Template That Survives Contact With Reality
- Title + date + stable ID (use the same ID in filenames, commits, run names)
- Hypothesis: one sentence — what you expect and why
- Method: link to commit, list parameter changes from the previous run
- Run: exact command, seed, hardware, wall-clock time
- Result: numbers in a table, screenshot of the key plot, raw log path
- Interpretation: one paragraph, including what surprised you
- Next step: the single concrete action this entry implies
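The template above can be stamped out by a small script so every entry starts with the same skeleton. This is a minimal sketch, assuming entries are Markdown files; the function name `new_entry` and the `TODO` placeholders are illustrative and not part of any tool named here.

```python
from datetime import date
from typing import Optional, Tuple

# Section names mirror the template in this article.
ENTRY_FIELDS = ["Hypothesis", "Method", "Run", "Result", "Interpretation", "Next step"]


def new_entry(slug: str, title: str, day: Optional[date] = None) -> Tuple[str, str]:
    """Return (stable_id, markdown_text) for a blank notebook entry.

    The stable ID is the ISO date plus a short slug, so it sorts by date
    and can be reused in filenames, commits, and run names.
    """
    day = day or date.today()
    stable_id = f"{day.isoformat()}-{slug}"
    lines = [f"# {title}", f"Date: {day.isoformat()}", f"ID: {stable_id}", ""]
    for field in ENTRY_FIELDS:
        # Each section starts as a visible placeholder so gaps are obvious.
        lines += [f"## {field}", "", "TODO", ""]
    return stable_id, "\n".join(lines)
```

Calling `new_entry("ablation-dropout", "Dropout ablation")` returns the ID and the text to write into the experiment's `README.md`. Searching for leftover `TODO` markers is a quick way to find unfinished entries.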
4. Versioning, Seeds, and the Provenance Trail
Three pieces of information rescue most "I cannot reproduce my own
result" emergencies: the exact code commit, the exact data version, and
the exact random seed. Make all three impossible to lose. Record the commit
hash in the entry (paste the output of `git log -1 --format=%H`). Pin the dataset
with a content hash or a versioned reference (DVC, Git-LFS, S3 versioning,
or a frozen tarball with a SHA-256 in the entry). Set the seed
explicitly in code and write it down — "default" is not a seed. For
ML experiments, also log the CUDA version, library versions
(`pip freeze > env.txt`), and hardware (`nvidia-smi`); some results
that look irreproducible reproduce once the CUDA minor version
matches. The provenance triple (commit, data version, seed) takes
about 30 seconds to capture and saves weeks.
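Capturing the triple is easier to keep up when it is one function call at the start of a run. The sketch below is one possible shape, assuming the script runs inside a Git working tree and the dataset is a single file. The helper names (`git_commit`, `sha256_of`, `capture_provenance`) are hypothetical.

```python
import hashlib
import platform
import random
import subprocess
import sys
from pathlib import Path


def git_commit() -> str:
    """Full hash of the checked-out commit, or 'unknown' outside a Git repo."""
    try:
        out = subprocess.run(
            ["git", "log", "-1", "--format=%H"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip() or "unknown"
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_provenance(data_file: Path, seed: int) -> dict:
    """Seed the stdlib RNG and return a JSON-serializable provenance record."""
    # Only Python's built-in generator is seeded here; other libraries'
    # generators need their own seeding calls.
    random.seed(seed)
    return {
        "commit": git_commit(),
        "data_sha256": sha256_of(data_file),
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
```

The returned dictionary can be dumped with `json.dumps` and pasted into the entry or saved next to the run's logs. It does not replace `pip freeze` or `nvidia-smi` output, which still need to be captured separately.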
5. Plots, Tables, and Raw Data Stay Together
A finding in your notebook should let you click through to the raw data
and the script that made the figure, without hunting. Standard layout:
one directory per experiment ID, containing `run.sh` (the exact command),
`config.yaml` (parameters), `logs/` (stdout/stderr), `outputs/` (model
checkpoints, raw measurements), `figures/` (plot images and the
notebook that generated them), and a `README.md` (the notebook entry
itself). The notebook entry links into this directory. Never copy a
number from a plot into prose without also linking the source file —
next year, when the plot looks wrong, you need to find the data in
under one minute, not one afternoon. For wet-lab notebooks, the
equivalent is: tape the printed gel/blot/sequence trace into the
notebook page, with the file path of the raw scan written next to it.
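For computational work, the layout described above can be created the same way every time with a few lines of code. This is a sketch rather than a prescribed tool: the function name `scaffold_experiment` is an assumption, and the files are created empty for you to fill in.

```python
from pathlib import Path

# Names taken from the layout described in this section.
SUBDIRS = ["logs", "outputs", "figures"]
FILES = ["run.sh", "config.yaml", "README.md"]


def scaffold_experiment(project_root: Path, exp_id: str) -> Path:
    """Create the per-experiment directory skeleton and return its path.

    Safe to call again on an existing experiment: directories are reused
    and existing files are left untouched.
    """
    exp_dir = project_root / exp_id
    for name in SUBDIRS:
        (exp_dir / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates the file if missing and never truncates it.
        (exp_dir / name).touch(exist_ok=True)
    return exp_dir
```

For example, `scaffold_experiment(Path("my-project"), "EXP-0042")` gives every experiment the same shape, so links from the notebook entry always resolve to predictable paths.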
Directory Layout That Stays Manageable Through Year Five
- One top-level folder per project, one subfolder per experiment ID
- Use ISO dates (2026-05-15) and zero-padded numbers (EXP-0042) for sortable filenames
- Never put data inside the code repo — symlink or reference by path
- Keep raw data read-only at the OS level (`chmod -w`) so you cannot accidentally overwrite it
- Back up the project folder weekly to a second location, not just iCloud
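The read-only rule can be applied to a whole raw-data folder in one pass. The sketch below assumes a POSIX-style permission model, and the function name `make_read_only` is hypothetical. It clears the write bits on files only, so a user with write access to the directory can still delete or replace a file; treat it as a guard against accidents, not as access control.

```python
import os
import stat
from pathlib import Path

# Write permission for owner, group, and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(raw_dir: Path) -> int:
    """Clear write bits on every regular file under raw_dir.

    Returns the number of files whose permissions were changed.
    Symlinks are skipped so files outside raw_dir are never modified.
    """
    changed = 0
    for path in raw_dir.rglob("*"):
        if path.is_file() and not path.is_symlink():
            mode = stat.S_IMODE(path.stat().st_mode)
            if mode & WRITE_BITS:
                os.chmod(path, mode & ~WRITE_BITS)
                changed += 1
    return changed
```

Running it again on the same folder changes nothing and returns 0, so it is safe to call after every data import.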
6. Weekly Index, Monthly Review, Quarterly Audit
A notebook with 400 entries and no index is write-only. Three review
cadences fix this. Weekly (15 minutes, end of Friday): write a 5-line
summary of the week's experiments at the top of a `weekly/` page, link
to the entries, and flag any results that need follow-up. Monthly
(30 minutes): re-read the weekly summaries, write a one-paragraph
"state of the project" — what is the current best result, what is the
next blocker, what experiments are dead branches. Quarterly (1 hour):
pick three random old entries and try to reproduce them from the
notebook alone, without relying on memory; the gaps you find tell you what the
template is missing. The reproduction audit is the single highest-value
hour in your research quarter; it catches drift before it costs you a
paper.
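Picking the three entries by hand tends to favor the ones you remember best, so it can help to let a script choose. This is a minimal sketch, assuming one subfolder per experiment under the project folder and a `weekly/` folder that should be skipped. The function name `pick_audit_entries` and its parameters are illustrative.

```python
import random
from pathlib import Path
from typing import List, Optional, Sequence


def pick_audit_entries(
    project_root: Path,
    k: int = 3,
    seed: Optional[int] = None,
    exclude: Sequence[str] = ("weekly",),
) -> List[Path]:
    """Return up to k randomly chosen experiment folders for the audit.

    Candidates are sorted first so that the same seed always yields the
    same selection, regardless of filesystem listing order.
    """
    candidates = sorted(
        p for p in project_root.iterdir()
        if p.is_dir() and p.name not in exclude
    )
    rng = random.Random(seed)  # a private generator; global state is untouched
    return rng.sample(candidates, min(k, len(candidates)))
```

Writing the seed into the audit note makes the selection itself reproducible, in keeping with the rest of the notebook.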
7. Mistakes to Stop Making This Week
Six anti-patterns that ruin otherwise-good notebooks. (1) "I will write
it up later" — by the time later arrives, the numbers and the rationale
are gone; write the entry the day you run the experiment. (2) Editing
old entries to make them look smarter — instead, add a dated
"addendum" so the history is preserved. (3) Storing only the final
plot: keep the intermediate ones too; a plot you rejected can turn out
to be the one the paper needs. (4) Sharing the notebook only via Slack
screenshots — your advisor cannot search Slack a year later; link the
entry. (5) Writing for an audience that does not exist (a hypothetical
tenure committee) — write for the reviewer who will ask one specific
question. (6) Skipping the entry on a "small" experiment — most papers
are won or lost by which small experiments you remember a year later.
PhD graduate who spent years tracking conference deadlines across computer science and engineering. Built ScholarDue after missing a submission window in the final year of candidacy and realizing no single tool tracked CFPs, extensions, and notification dates in one place.