HTML escape vs URL encode vs Base64 — which one when?

Compare three encodings web developers reach for daily, by purpose (safe embedding / transport compatibility / binary-to-text), scope of application, and character set. Includes common misuse patterns.

“Escaping” is a single word for three different jobs

In web work the phrase “escape that string” gets used for a wide range of conversions, but HTML escaping, URL encoding (percent-encoding), and Base64 are three distinct operations. They run in different contexts, target different character sets, exist for different reasons, and grow the byte count by very different amounts. Conflating them is how XSS slips past a “sanitised” template, how a Base64 string pasted into a URL silently breaks, and how parameters built with encodeURI get cut in half by an unescaped &.

Four axes guide the choice. Where does the output land (HTML body / URL / binary transport)? What is being neutralised (parser-special characters / URL reserved characters / non-ASCII bytes)? Is the result meant for humans or machines? How much bigger does it get? The right mental model is not “escape everything to be safe” — it is “pick the encoding whose context matches the output sink”.
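To make the "three different jobs" concrete, the sketch below pushes one input string through all three conversions. The helpers escapeHtml and toBase64 are hypothetical names introduced for illustration, not part of any standard API; only encodeURIComponent, btoa, and TextEncoder are built in.

```javascript
// Hypothetical helper: replaces the five characters HTML parses specially.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical helper: UTF-8 encode the text, then Base64 the raw bytes.
function toBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

const input = "A&B <c>";

const forHtml = escapeHtml(input);        // sink: HTML body or attribute
const forUrl = encodeURIComponent(input); // sink: URL query value
const forBytes = toBase64(input);         // sink: text-only byte transport

console.log(forHtml);  // A&amp;B &lt;c&gt;
console.log(forUrl);   // A%26B%20%3Cc%3E
console.log(forBytes); // QSZCIDxjPg==
```

The three outputs share nothing beyond the input, which is the point: each one is only correct in its own sink.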

Side-by-side comparison

| Property | HTML escape | URL encode (percent-encoding) | Base64 |
| --- | --- | --- | --- |
| Purpose | Neutralise characters parsed specially by HTML / XML | Make reserved characters and non-ASCII safe inside URLs | Carry arbitrary bytes as ASCII text |
| Target characters | < > & " ' | Reserved chars (? & = / # …) and non-ASCII | Every byte in the input |
| Example (input → output) | <a> → &lt;a&gt; | nihongo → nihongo (ASCII unchanged); ?q=hi there → ?q=hi%20there | nosend → bm9zZW5k |
| Size overhead | Each escaped character becomes 4-6 characters; whole document is roughly unchanged | Non-ASCII roughly 3x (each 3-byte UTF-8 character becomes 9 chars) | 4/3x (about 133%) |
| Reversible | Yes | Yes | Yes |
| Spec | HTML Living Standard / XML 1.0 | RFC 3986 | RFC 4648 |
| Standard APIs | Template engines auto-escape; DOM textContent is safe by construction | encodeURIComponent / encodeURI / decodeURIComponent | btoa / atob (Latin-1 code points only); use TextEncoder for real bytes |
| Primary use | Embedding strings into HTML/XML documents (XSS prevention) | Embedding values into URL path / query / fragment | MIME email attachments, data URIs, JWT, binary fields in JSON |

The size overhead matters in practice. Inlining a 100 KB image into HTML as a data URI costs roughly 133 KB once Base64-encoded. Placing it inside an img src attribute adds nothing further, because the Base64 alphabet (A-Z, a-z, 0-9, +, /, =) contains no character that HTML escaping touches, so the document grows by about 133 KB pre-gzip. URL encoding leaves unreserved ASCII alone, so an English-only path is essentially free, but a Japanese filename swells by roughly 3x as each 3-byte UTF-8 character becomes three %XX triples (9 characters).
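The arithmetic behind those figures can be checked with a few lines. This is a minimal sketch: utf8Length, base64Length, and expansion are hypothetical helper names, and the katakana string ファイル ("file") is an illustrative filename chosen because it starts with the %E3%83%95 sequence.

```javascript
// Byte length of a string once it is UTF-8 encoded.
const utf8Length = (text) => new TextEncoder().encode(text).length;

// Padded Base64 emits 4 output characters per 3 input bytes, rounded up.
const base64Length = (byteCount) => 4 * Math.ceil(byteCount / 3);

// Ratio of percent-encoded length to the original UTF-8 byte length.
const expansion = (text) => encodeURIComponent(text).length / utf8Length(text);

console.log(base64Length(100 * 1024)); // 136536 characters, about 133 KB
console.log(expansion("report.pdf"));  // 1 (unreserved ASCII is left alone)
console.log(expansion("ファイル"));     // 3 (each 3-byte character becomes 9 characters)
```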

Use case → the right encoding

User input rendered inside a template: HTML escape. Mandatory anywhere user-controlled text reaches a value="" attribute or a <div> body. React, Vue, Astro and similar modern engines auto-escape interpolated strings; the failure modes are escape hatches like dangerouslySetInnerHTML and v-html, where the responsibility falls back on you.
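For code paths that build HTML strings by hand (outside a template engine), a minimal escaping sketch looks like this. escapeHtml is a hypothetical helper, not a built-in, and it covers only HTML body text and quoted attribute values; script and style contexts need different handling.

```javascript
// Hypothetical helper: escapes the five characters from the comparison table.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// A classic payload that tries to close the attribute and open a script tag.
const userInput = '"><script>alert(1)</script>';

// The attribute value is always wrapped in double quotes and escaped.
const html = `<input value="${escapeHtml(userInput)}">`;

console.log(html);
// <input value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```

In a browser, assigning the raw string to element.textContent or input.value achieves the same result without any manual escaping, because the value is never parsed as markup.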

User-controlled text inside a URL: URL encoding. Search terms in ?q=... and slugs in /users/... need encodeURIComponent; download names in Content-Disposition headers use the same percent-encoding through the filename*=UTF-8''... form (RFC 6266). The string A&B has to become A%26B, otherwise the literal & is interpreted as a parameter delimiter and the value collapses to A. Non-ASCII filenames follow the same rule and turn into %E3%83%95... style sequences.
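The A&B collapse is easy to reproduce with the standard URL parser. In this sketch the host example.com and the /search path are placeholders, not endpoints from the article.

```javascript
const value = "A&B";

// Concatenating the raw value lets the literal & split the query string.
const broken = "https://example.com/search?q=" + value;

// Encoding the value first keeps it as one parameter.
const correct = "https://example.com/search?q=" + encodeURIComponent(value);

console.log(correct);                                 // https://example.com/search?q=A%26B
console.log(new URL(broken).searchParams.get("q"));   // A
console.log(new URL(correct).searchParams.get("q"));  // A&B
```

URLSearchParams performs the same encoding when values are added with set or append, which avoids manual string concatenation altogether.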

Inlining a small image into HTML: Base64 inside a data URI (data:image/png;base64,iVBO...), then placed in an img src. Useful to remove an HTTP request for icons, but counterproductive once the image grows past a few KB, because the image can no longer be cached separately from the CSS or HTML that embeds it.

Binary payload in email or JSON: Base64. SMTP assumes 7-bit ASCII, so MIME (RFC 2045) Base64-encodes attachments; JSON is text-only, so a binary field is Base64-encoded into a string such as "image": "iVBORw0KG..." before JSON.stringify runs.
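A minimal round trip for a binary field in JSON might look like the following. bytesToBase64 and base64ToBytes are hypothetical helpers built on btoa and atob; the sample bytes are the 8-byte PNG signature, which is why the output begins with the familiar iVBORw0KG prefix.

```javascript
// Hypothetical helper: map each byte to one Latin-1 character so btoa accepts it.
function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Hypothetical helper: reverse the mapping to recover the original bytes.
function base64ToBytes(base64) {
  return Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
}

// The fixed signature that starts every PNG file.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

const json = JSON.stringify({ image: bytesToBase64(pngSignature) });
const restored = base64ToBytes(JSON.parse(json).image);

console.log(json);     // {"image":"iVBORw0KGgo="}
console.log(restored); // the same 8 bytes as pngSignature
```

The per-byte string concatenation is fine for small payloads; for large files a chunked approach or a platform-specific API is likely a better fit.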

Base64 carried inside a URL (JWT, share links with payload): use Base64URL, not standard Base64. It swaps + for -, / for _, and usually drops the trailing = padding, so nothing gets eaten by URL parsing rules. Each of the three segments in JWT’s header.payload.signature is encoded with exactly this variant.
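Converting between the two alphabets is a pair of string substitutions plus padding bookkeeping. The function names toBase64Url and fromBase64Url below are hypothetical; the sample bytes 0xFB 0xFF were picked only because they produce both + and / in standard Base64.

```javascript
// Standard Base64 -> Base64URL: swap the two unsafe characters, drop padding.
function toBase64Url(base64) {
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Base64URL -> standard Base64: swap back and restore padding to a multiple of 4.
function fromBase64Url(base64url) {
  const base64 = base64url.replace(/-/g, "+").replace(/_/g, "/");
  const padding = (4 - (base64.length % 4)) % 4;
  return base64 + "=".repeat(padding);
}

const standard = btoa(String.fromCharCode(0xfb, 0xff));
const urlSafe = toBase64Url(standard);

console.log(standard);               // +/8=
console.log(urlSafe);                // -_8
console.log(fromBase64Url(urlSafe)); // +/8=
```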

In-browser conversions and the traps to avoid

Day-to-day work brings constant conversions: reading a %E3%81%82-laden server log, decoding a Base64 image to inspect it in a text editor, converting between Base64 and Base64URL. For URL encoding and decoding the shortest path is url-codec; for Base64 alone, base64; for switching between Base16 / Base32 / Base58 / Base64URL while inspecting the same payload, base-codec. Each runs entirely in your browser, so pasting a URL with secrets or an API token does not ship anything off the device. The implementation is auditable on GitHub, and the DevTools Network tab confirms there is no upload. We do not ship a standalone HTML-escape tool because the DOM (textContent is never parsed as markup, innerHTML is) and template engines cover that case already.

Four traps worth memorising.

First, HTML escaping alone does not stop XSS. Inside an a href attribute, inside a script body, or inside a style block the rules change. Attribute values must be quoted and need at least " and & escaped, and href values benefit from URL-scheme validation to refuse javascript: URLs.

Second, encodeURI vs encodeURIComponent. encodeURI is meant for whole URLs, so it deliberately leaves ? & = / alone; feed it a parameter value and the value will be split at its own &. Always reach for encodeURIComponent on parameter values.

Third, never put raw Base64 into a URL. In a query string + is decoded as a space, / reads as a path delimiter, and = collides with the key=value separator. Either generate Base64URL from the start, or wrap the standard Base64 in encodeURIComponent (the former is more readable).

Fourth, values inside HTML links need two layers of encoding. A snippet like <a href="?name=..."> needs URL encoding on the parameter value first and HTML escaping on the whole attribute value second; missing either lets the wrong character class through. Template engines apply HTML escaping for you, but they do not apply URL encoding. That step has to come from your own encodeURIComponent call.
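The second and fourth traps, plus the scheme check from the first, can be sketched together. escapeHtml and isSafeHref are hypothetical helpers, the lang=en parameter is made up for the example, and https://example.com/ is only a placeholder base so that relative links can be parsed.

```javascript
// Hypothetical helper: escapes the five HTML-special characters.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical helper: accept only http(s) links, which rejects javascript: URLs.
function isSafeHref(href) {
  try {
    const url = new URL(href, "https://example.com/");
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // unparseable input is treated as unsafe
  }
}

const name = 'Tom & "Jerry"';

// Trap 2: encodeURI leaves & alone, encodeURIComponent does not.
console.log(encodeURI(name));          // Tom%20&%20%22Jerry%22
console.log(encodeURIComponent(name)); // Tom%20%26%20%22Jerry%22

// Trap 4: URL-encode the value first, then HTML-escape the whole attribute.
const href = "?name=" + encodeURIComponent(name) + "&lang=en";
const link = isSafeHref(href) ? `<a href="${escapeHtml(href)}">profile</a>` : "";

console.log(link);
// <a href="?name=Tom%20%26%20%22Jerry%22&amp;lang=en">profile</a>
```

Note the two different ampersands in the result: the one inside the value became %26 at the URL layer, while the parameter separator became &amp; at the HTML layer.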