How to make a step-by-step tutorial without recording video

Most tutorials don't need to be videos. They became videos because Loom made it easy and YouTube tutorials normalised the shape. The default isn't the best fit for the job.

For a how-to that someone will skim, search, copy from, and return to six months later, video is one of the worst formats. It's the equivalent of explaining a phone number by singing it. Beautiful. Memorable. Useless when the recipient just wants to dial.

This post argues for the text-and-screenshot tutorial — a numbered sequence with one screenshot per step, the URL of every page, and a short description of what to do. It's faster to make, faster to consume, easier to update, and easier to translate. It loses on warmth and on the “here's a friendly face” effect, and that loss is sometimes worth taking and sometimes not. We'll get to when.

The four costs of a video tutorial

Before the alternative, here are the costs of video that people rarely talk about.

Production cost. A 5-minute video usually takes 30–60 minutes of recording, retakes, and trimming. A text tutorial of equivalent content takes 10–15 minutes if the screenshots are auto-captured.

Consumption cost. A 5-minute video takes 5 minutes to consume. A text tutorial of equivalent content takes 1–2 minutes to skim, and the reader can stop where they have what they need.

Search cost. Video isn't indexable the way text is. A reader who knows the answer is in the tutorial somewhere has to scrub. A reader of a text tutorial Ctrl-Fs.

Update cost. The biggest one. When the UI changes, the video is dead. You re-record from scratch or annotate “the button is now in a different place.” A text tutorial gets one screenshot replaced and a couple of words edited.

The aggregate effect: video tutorials get made once, consumed by a few people, and then go stale. Text tutorials get made, consumed, edited, consumed, edited, consumed.
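
The production and consumption figures above can be combined into a rough total-time estimate. The sketch below uses the midpoints of the ranges quoted in this post, which is an assumption for illustration rather than measured data, and it leaves out search and update costs entirely.

```python
def total_minutes(production, consumption, readers):
    """Time to make the tutorial once, plus the time every reader spends on it."""
    return production + consumption * readers


# Midpoints of the ranges quoted above (an assumption, not a measurement).
VIDEO = {"production": 45.0, "consumption": 5.0}  # 30-60 min to make, 5 min to watch
TEXT = {"production": 12.5, "consumption": 1.5}   # 10-15 min to make, 1-2 min to skim

for readers in (1, 20, 100):
    video = total_minutes(VIDEO["production"], VIDEO["consumption"], readers)
    text = total_minutes(TEXT["production"], TEXT["consumption"], readers)
    print(f"{readers} readers: video {video:.1f} min, text {text:.1f} min")
```

Under these assumptions the gap widens with every additional reader, because the per-reader cost differs as well as the one-off production cost.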

The shape of a tutorial that gets used

A tutorial is not a manual. It teaches one specific outcome — “set up a custom domain,” “export your contacts to CSV,” “invite a teammate to the workspace.” Three structural rules.

One outcome per tutorial. If you're writing “how to use the platform,” you're writing a manual, not a tutorial. Pick the outcome and cut everything else.

One action per step. “Click Settings, then Custom Domains” is two steps, not one. Splitting them lets the reader confirm progress at each click. Combining them obscures the failure point when something doesn't match the screenshot.

One screenshot per step. The screenshot shows the state before the action, with the relevant element clearly visible (and ideally highlighted). After-state screenshots are sometimes useful for the “done” signal but usually redundant.

What goes in each step

Three things, in this order:

The screenshot. Cropped to the relevant area. Annotated with an arrow or callout if the target element isn't obvious. Not the whole desktop unless the whole desktop is the point.
A short instruction. “Click the ‘Add domain’ button at the top right.” Six to twelve words. The reader needs to be able to read it in two seconds.
The URL of the page. Either in the screenshot (if the address bar is visible) or rendered as a clickable link below the screenshot. Lets the reader paste the URL into their browser and land where the tutorial expects them to be.

That's the body of the tutorial. The header has the title and a one-sentence description of the outcome. The footer has “you're done when...” and a link to a related tutorial if there's a logical next step.
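
The header, steps, and footer described above could be modelled as a small data structure. This is a minimal sketch: the names `Step`, `Tutorial`, and `check_step` are hypothetical and not part of any tool's API, and the word-count check simply encodes the six-to-twelve-word guideline.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    screenshot: str   # path to the cropped screenshot (hypothetical field)
    instruction: str  # short instruction shown with the screenshot
    url: str          # URL of the page the step happens on


@dataclass
class Tutorial:
    title: str
    outcome: str      # one-sentence description of the outcome
    done_when: str    # the "you're done when..." signal for the footer
    steps: list[Step] = field(default_factory=list)


def check_step(step: Step) -> list[str]:
    """Return warnings for a step that breaks the guidelines above."""
    warnings = []
    words = len(step.instruction.split())
    if not 6 <= words <= 12:
        warnings.append(f"instruction has {words} words, aim for 6 to 12")
    if not step.url:
        warnings.append("missing page URL")
    return warnings


step = Step(
    screenshot="add-domain.png",
    instruction="Click the 'Add domain' button at the top right.",
    url="https://example.com/settings/domains",  # placeholder URL
)
print(check_step(step))  # []
```

A check like this will not catch a step that hides two actions in one sentence, but it flags the most common drift: instructions that grow into paragraphs.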

The fastest way to produce one

The thirty-minute method:

Minute 0–5. Write the title, the one-sentence outcome, and the “you're done when” signal. Don't click anything yet. The frame helps you stay on scope.

Minute 5–20. Hit record. Do the procedure. Each click and navigation is captured as a step with a screenshot, a URL, and the clicked element with its visible text. Don't pause to annotate. Just do the work.

Minute 20–30. Walk through the recording. Edit each step:

Fix the auto-generated title if it's wrong. “Click button” becomes “Click ‘Add domain’ in the Custom Domains section.”
Drop a redaction box on anything sensitive — your personal email, internal IDs in URLs, anything you'd rather not share publicly. The original screenshot is preserved, so if you change your mind tomorrow you can remove the box without re-recording.
Delete the steps that turned out to be detours. If you opened a second tab to copy a value and came back, the detour shouldn't be in the tutorial.
Add an annotation arrow on the steps where the target button is small or hard to spot.

Publish. Get a public URL like go.uihike.com/published/{UUID} with comments enabled. Or export to Markdown and paste into your help center, your blog, or your wiki.
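
To give a feel for what a Markdown export might contain, here is a minimal sketch that renders a title, an outcome, a list of steps, and the "done" signal. The layout and the function name `to_markdown` are assumptions for illustration; the format UIHike actually emits is not specified in this post.

```python
def to_markdown(title, outcome, steps, done_when):
    """Render a tutorial as Markdown.

    `steps` is a list of (instruction, screenshot_path, url) tuples.
    """
    lines = [f"# {title}", "", outcome, ""]
    for number, (instruction, screenshot, url) in enumerate(steps, start=1):
        lines.append(f"## Step {number}")
        lines.append("")
        lines.append(f"![Step {number}]({screenshot})")  # screenshot first
        lines.append("")
        lines.append(instruction)                        # then the instruction
        lines.append("")
        lines.append(f"Page: <{url}>")                   # then the page URL
        lines.append("")
    lines.append(f"You're done when {done_when}")
    return "\n".join(lines)


markdown = to_markdown(
    title="Add a custom domain",
    outcome="Point your own domain at the workspace.",
    steps=[
        ("Click Settings in the left sidebar of the page.",
         "step-1.png", "https://example.com/app"),
        ("Click the 'Add domain' button at the top right.",
         "step-2.png", "https://example.com/settings/domains"),
    ],
    done_when="the domain shows as verified.",
)
print(markdown)
```

The output is plain text, so it can be pasted into any tool that accepts Markdown and compared line by line in version control when the procedure changes.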

Why a tool that captures structure beats one that captures pixels

Snagit, CleanShot, and the macOS screenshot key all capture pixels. The picture is what they save. UIHike captures the screenshot too, but each step also stores the URL, the CSS selector of the clicked element, the visible text on that element, and the associated label. The screenshot is the visible artifact; the structure is what makes the tutorial searchable, translatable, and editable.

Concretely:

Searchable. Your help center can index the visible text of clicked elements as well as the step descriptions. A reader searching for “Add domain” finds the right step, not a fuzzy match on the closest paragraph.
Translatable. The step descriptions are text. A translator translates strings, not pixels. The screenshots stay; the language adapts.
Editable. If the “Add domain” button gets renamed to “New domain,” you re-record the one step or fix the description in place. You don't re-cut a video.
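
To make the "searchable" point concrete, here is a minimal sketch of a lookup over captured steps. The class `CapturedStep`, its field names, and the sample data are hypothetical; they mirror the per-step fields listed above but are not UIHike's actual schema.

```python
from dataclasses import dataclass


@dataclass
class CapturedStep:
    description: str   # the human-written instruction
    element_text: str  # visible text of the clicked element
    selector: str      # CSS selector of the clicked element
    url: str           # page the click happened on


def find_steps(steps, query):
    """Return 1-based numbers of steps whose element text or description contains the query."""
    needle = query.casefold()
    return [
        number
        for number, step in enumerate(steps, start=1)
        if needle in step.element_text.casefold()
        or needle in step.description.casefold()
    ]


steps = [
    CapturedStep("Open the workspace settings page.", "Settings",
                 "nav a.settings", "https://example.com/settings"),
    CapturedStep("Click the button at the top right.", "Add domain",
                 "button.add-domain", "https://example.com/settings/domains"),
]
print(find_steps(steps, "add domain"))  # [2]
```

The second step matches even though its description never mentions the button by name, because the clicked element's visible text was stored alongside the screenshot.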

When video is the better choice

Three honest cases:

Tone matters more than precision. A welcome video, a launch announcement, a community tutorial where the personality of the presenter is the point. Text doesn't carry that.
The procedure depends on timing. Drag-and-drop interactions, animations, audio cues, anything where “wait for the spinner” is hard to describe. A 10-second video of the gesture beats a paragraph.
The tutorial is the marketing. A YouTube how-to whose primary job is SEO and discovery, not internal use. Different goals, different format.

For most other cases — the help center article, the internal SOP, the customer success enablement doc, the support reply that links to a how-to — text-and-screenshots wins.

Hosting and distribution

Once you have the tutorial, the remaining question is where it lives. The same recording exports to several formats:

Public hosted page. A URL like go.uihike.com/published/{UUID} with comments and reactions on every step. No login wall.
Markdown. Drop into Notion, Confluence, GitHub README, Substack post, your blog's static site generator.
HTML. A standalone file you can drop on any static host or attach to an email.
PDF. For training packets, printed binders, or anywhere the recipient wants a single file.

The same tutorial can ship in several places at once, from one source. When the procedure changes, you re-record once and re-export. The video equivalent of that workflow doesn't exist.

What you give up

Two honest gaps.

The first is presence. A friendly face on camera builds rapport in a way text doesn't. For a tutorial that's also doing a sales or relationship job, that's a real loss.

The second is the very first impression. A polished 5-minute video looks more impressive on the landing page than a clean text tutorial. People who don't actually need the tutorial will rate the video more highly. People who do need the tutorial prefer the text by a wide margin.

If your tutorial is on the marketing surface and doing a marketing job, accept the trade-off. If it's in the help center or the internal wiki, skip the video.

The next tutorial

Pick a procedure your team explains over and over in Slack or in support tickets. The next time you go to explain it, hit record instead. You'll be done in thirty minutes, and the next person who asks gets a link.

Try UIHike on the how-to question your team gets asked most often. Once you've replied to it with a link three times, you'll feel the difference between writing a tutorial and recording one.

— The UIHike team