How to decode mojibake and recover the original Japanese text
Recover the original Japanese behind mojibake such as 繧ｨ繝�, ã®ã, or ???. Walks through how Shift_JIS / EUC-JP / UTF-8 / ISO-2022-JP encoding mismatches are inverted by brute force, all inside the browser.
Where strings like 繧ｨ繝� and ã®ã come from
Mojibake (garbled Japanese) appears when the encoding used to write a string differs from the encoding the reader assumes. Three patterns cover the vast majority of real cases. First, Japanese saved as UTF-8 but read as Shift_JIS comes out as uncommon kanji such as 繧 and 繝 interleaved with half-width katakana, for example 繧ｳ繝ｼ繝ｳ (originally コーン). Second, UTF-8 read as Latin-1 (ISO-8859-1) produces strings full of European accented letters such as ã®ã or éè. Third, a Shift_JIS CSV opened as UTF-8 turns into a row of ? or � placeholders, because most Shift_JIS byte sequences are not valid UTF-8.
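The three patterns can be reproduced in a few lines. This is a minimal Python sketch using the standard library codecs rather than the tool's own JavaScript; `cp932` (the Windows dialect of Shift_JIS) stands in for Shift_JIS, and the sample words are arbitrary.

```python
# Reproduce the three mismatches with Python's standard codecs.
original = "コーン"

# 1. Written as UTF-8, read as Shift_JIS (cp932 is the Windows variant).
#    Lead-byte pairs E3 82 / E3 83 become 繧 / 繝; lone bytes become half-width katakana.
pattern1 = original.encode("utf-8").decode("cp932")

# 2. Written as UTF-8, read as Latin-1: every byte becomes exactly one character.
pattern2 = "の".encode("utf-8").decode("latin-1")

# 3. Written as Shift_JIS, read as UTF-8: invalid bytes are replaced by U+FFFD.
pattern3 = "日本".encode("cp932").decode("utf-8", errors="replace")

print(pattern1)        # 繧ｳ繝ｼ繝ｳ
print(repr(pattern2))  # 'ã\x81®'
print(pattern3)        # ���{
```

The Latin-1 case keeps byte 0x81 as an invisible C1 control character; if a copy-paste strips it, the round trip back to UTF-8 no longer works.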
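In the wild these turn up in legacy CSV exports (Japanese spreadsheets saved from old Excel installs), email Subject headers, API responses from legacy systems, files moved between Mac and Windows, and old Japanese HTML pages that modern browsers decode as UTF-8. In the first two patterns the text is not actually destroyed, only decoded with the wrong table: re-encoding the visible characters with the encoding the reader wrongly assumed gives back the original bytes, and decoding those bytes with the encoding they were written in recovers the original. The ? pattern is the hard case: once a decoder has replaced invalid bytes with ? or U+FFFD, those bytes are no longer in the string, so the reliable fix is to reopen the source file with the right encoding. Trying every plausible pair is exactly what mojibake-fix automates.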
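Inverting a known mismatch is the same round trip run backwards. A minimal sketch, assuming you already know which pair went wrong; `undo` is a hypothetical helper, not part of mojibake-fix. The second half shows why the ? pattern is the hard case.

```python
def undo(garbled: str, written_as: str, read_as: str) -> str:
    """Hypothetical helper: recover bytes with read_as, then decode them with written_as."""
    return garbled.encode(read_as).decode(written_as)


# Lossless case: every original byte survived as one visible character.
garbled = "コーン".encode("utf-8").decode("cp932")
print(undo(garbled, written_as="utf-8", read_as="cp932"))  # コーン

# Lossy case: the UTF-8 decoder replaced the Shift_JIS bytes with U+FFFD,
# which records only "some invalid byte was here", not which byte it was.
lost = "日本".encode("cp932").decode("utf-8", errors="replace")
attempt = lost.encode("utf-8").decode("cp932", errors="replace")
print(attempt == "日本")  # False
```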
Paste, run, pick a candidate
Open mojibake-fix and paste the garbled string straight into the input box. Internally, the tool tries every (source → misread) pair: UTF-8 → SJIS, UTF-8 → EUC-JP, UTF-8 → Latin-1, SJIS → UTF-8, SJIS → EUC-JP, EUC-JP → UTF-8, EUC-JP → SJIS, and Latin-1 → UTF-8. Each candidate gets a “Japanese likeness” score, and results are listed highest score first.
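The pair list above can be sketched as a loop that tries each inversion and keeps only the ones that decode cleanly. This is an illustrative Python version, not the tool's source: codec names are Python's, `cp932` stands in for SJIS, and `PAIRS` / `candidates` are hypothetical names.

```python
# (source, misread): the text was written in `source` but read as `misread`.
PAIRS = [
    ("utf-8", "cp932"),
    ("utf-8", "euc_jp"),
    ("utf-8", "latin-1"),
    ("cp932", "utf-8"),
    ("cp932", "euc_jp"),
    ("euc_jp", "utf-8"),
    ("euc_jp", "cp932"),
    ("latin-1", "utf-8"),
]


def candidates(garbled: str) -> dict:
    """Undo each mismatch: re-encode with the misread encoding, decode with the source."""
    out = {}
    for source, misread in PAIRS:
        try:
            out[(source, misread)] = garbled.encode(misread).decode(source)
        except UnicodeError:
            # Characters not encodable, or bytes not decodable, for this pair: drop it.
            continue
    return out


garbled = "コーン".encode("utf-8").decode("cp932")
for pair, text in candidates(garbled).items():
    print(pair, text)
```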
The score combines several signals: how many of the output characters are hiragana, katakana, or common kanji; how few control characters or Private Use Area code points show up; and whether telltale mojibake characters (繧, 繝, 邊, which a UTF-8 → SJIS misread typically produces) are gone. In most cases the top one or two candidates are the answer. Copy whichever fits the context to the clipboard, and for whole-file CSV repairs feed the result into csv-encoding-convert to re-emit the file in the correct encoding.
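A scoring function along those lines might look like the sketch below. The weights, the use of the CJK Unified Ideographs block as a stand-in for "common kanji", and the names `japanese_score` / `rank` are all assumptions for illustration; the article does not publish the tool's actual formula.

```python
import unicodedata

# Characters a UTF-8 -> Shift_JIS misread tends to leave behind (from the text above).
TELLTALE = set("繧繝邊")


def japanese_score(text: str) -> float:
    """Hypothetical 'Japanese likeness' score in roughly [-3, 1]; weights are illustrative."""
    if not text:
        return 0.0
    good = bad = 0
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:  # hiragana and full-width katakana blocks
            good += 1
        elif 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs as a proxy for kanji
            good += 1
        # Control characters (excluding ordinary whitespace) and Private Use Area.
        if unicodedata.category(ch) in ("Cc", "Co") and not ch.isspace():
            bad += 1
        # Telltale characters are CJK too, so they net out negative overall.
        if ch in TELLTALE:
            bad += 2
    return (good - bad) / len(text)


def rank(cands: list) -> list:
    """Order candidate strings from most to least Japanese-looking."""
    return sorted(cands, key=japanese_score, reverse=True)


print(rank(["繧ｳ繝ｼ繝ｳ", "コーン", "ã\x81®"])[0])  # コーン
```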
Why brute force works and what the byte structures look like
Brute force is cheap because only a handful of encodings are worth checking for Japanese text: UTF-8 (and UTF-16), Shift_JIS (including CP932), EUC-JP, ISO-2022-JP (JIS), plus Latin-1 (ISO-8859-1) for the Western misread. That leaves only a few dozen ordered pairs, and trying them covers practically all real-world mojibake. Each attempt is one round trip: encode the string as bytes under encoding A (the one the reader wrongly assumed), reinterpret those bytes as encoding B (the one the writer actually used), and check whether the output reads as Japanese.
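The "encode as A, reinterpret as B" step generalizes to every ordered pair of the encodings listed above. A minimal Python sketch: UTF-16 is left out to keep the pair count at 5 × 4 = 20, and `ZOO`, `reinterpret`, and `all_reinterpretations` are hypothetical names. The usage check feeds it ISO-2022-JP text that was read as Latin-1, which produces the familiar `$B...(B` pattern.

```python
from itertools import permutations
from typing import Optional

# Python codec names for the encodings discussed above (UTF-16 omitted in this sketch).
ZOO = ["utf-8", "cp932", "euc_jp", "iso2022_jp", "latin-1"]


def reinterpret(text: str, a: str, b: str) -> Optional[str]:
    """Encode text as bytes under encoding a, then read those bytes as encoding b."""
    try:
        return text.encode(a).decode(b)
    except UnicodeError:
        return None  # this pair cannot produce a clean result


def all_reinterpretations(text: str) -> dict:
    results = {}
    for a, b in permutations(ZOO, 2):  # 20 ordered pairs
        out = reinterpret(text, a, b)
        if out is not None:
            results[(a, b)] = out
    return results


# ISO-2022-JP bytes shown as Latin-1: escape codes plus ASCII such as "$BF|K\8l(B".
jis_garbled = "日本語".encode("iso2022_jp").decode("latin-1")
print(all_reinterpretations(jis_garbled)[("latin-1", "iso2022_jp")])  # 日本語
```

Several pairs may yield the same answer here because ISO-2022-JP is 7-bit, so any ASCII-compatible encoding gives back the same bytes; deduplicating by output before scoring keeps the candidate list short.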
Implementation-wise, the tool relies on encoding-japanese for the Unicode ↔ Shift_JIS / EUC-JP conversions and handles Latin-1 with a direct byte ↔ code point loop, which works because Latin-1 maps byte 0xNN to U+00NN one-to-one. Each encoding's byte structure has characteristic ranges: Shift_JIS uses 0x81–0x9F and 0xE0–0xFC as lead bytes of 2-byte characters (strict Shift_JIS stops at 0xEF; CP932 extends to 0xFC), EUC-JP uses 0xA1–0xFE for both bytes of a 2-byte character, and UTF-8 uses 0xC2–0xF4 as multibyte starts. When the candidate bytes do not respect the target encoding’s pattern, the conversion returns null and that candidate is dropped from the list.
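The byte-range rules and the Latin-1 loop can be written as small helpers. This sketch is hypothetical: the function names are made up, the validators are stricter than real decoders (they reject any single byte outside the ranges listed above, which some decoders accept), EUC-JP's SS2 (0x8E) and SS3 (0x8F) prefixes are added for completeness, and UTF-8 is left to a strict `bytes.decode("utf-8")`.

```python
def latin1_to_bytes(text: str) -> bytes:
    """Latin-1 is one byte per code point; raises ValueError if any code point > 0xFF."""
    return bytes(ord(ch) for ch in text)


def bytes_to_latin1(data: bytes) -> str:
    return "".join(chr(b) for b in data)


def sjis_well_formed(data: bytes) -> bool:
    """Strict lead/trail check using the CP932 lead-byte ranges."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80 or 0xA1 <= b <= 0xDF:  # ASCII or half-width katakana
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:  # lead byte of a 2-byte char
            if i + 1 >= len(data):
                return False  # truncated character
            t = data[i + 1]
            if not (0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC):
                return False
            i += 2
        else:
            return False
    return True


def eucjp_well_formed(data: bytes) -> bool:
    """Strict check: ASCII, SS2 + kana, SS3 + two bytes, or two bytes in 0xA1-0xFE."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            i += 1
        elif b == 0x8E:  # SS2: one half-width katakana byte follows
            if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xDF):
                return False
            i += 2
        elif b == 0x8F:  # SS3: JIS X 0212, two more bytes follow
            if i + 2 >= len(data) or not all(0xA1 <= x <= 0xFE for x in data[i + 1:i + 3]):
                return False
            i += 3
        elif 0xA1 <= b <= 0xFE:
            if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
                return False
            i += 2
        else:
            return False
    return True


print(sjis_well_formed("日本".encode("cp932")), eucjp_well_formed(b"\x81\x40"))  # True False
```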
Why you should not paste garbled text into a random site
Many “fix mojibake” sites on the open web send the pasted text to their server for processing. The text being decoded was usually meant to be a name, address, email body, contract number, or customer order, all of which are sensitive once readable. The garbled form has not lost that meaning; recovering it produces the original PII or business record. Sending it through someone else’s backend just to read it back is the same kind of leak as uploading the original file.
mojibake-fix runs the brute-force search inside the browser using encoding-japanese and plain JavaScript. There is no outbound request tied to the input; keep the DevTools Network tab open through the conversion and you will see only the initial asset load. The source is auditable on GitHub. For full-file CSV encoding conversion see csv-encoding-convert, for Unicode normalization (NFC / NFKC) see unicode-normalize, and for inspecting individual code points see unicode-inspect — all in the same in-browser design.