MD5 vs CRC32 vs SHA-256 — picking a file integrity check

Compare three checksums for download corruption detection, tamper detection, and duplicate file matching by speed, output length, collision resistance, and use case. Why CRC32 is not a tamper check.

“File integrity” hides three different jobs

The phrase “file integrity check” covers three distinct jobs that need different tools. Catching bit-rot during a download is noise detection on a transport. Finding duplicate files is equality testing across a large set. Confirming nobody has swapped the file is tamper detection against a motivated adversary.

Four axes carry the decision. Throughput dominates when you hash millions of objects in a backup or dedup pipeline. Output length trades storage and comparison cost against collision probability. Type of collision resistance splits into “accidental collisions” (false positive rate) and “adversarial collisions” (attacks), which are not the same property. Standard adoption decides whether a checksum is already embedded in ZIP / Ethernet / PNG or whether you have to compute it yourself.

This article focuses on MD5, CRC32, and SHA-256. SHA-1 sits close to MD5 in the cryptographic landscape and is covered separately in SHA-256 vs MD5 vs SHA-1; here we put CRC32 on stage as the often-overlooked non-cryptographic option and compare it to the two best-known cryptographic peers in the “is this file OK?” workflow.

Side-by-side comparison

| Property | CRC32 | MD5 | SHA-256 |
| Output length | 32 bits (4 bytes) | 128 bits (16 bytes) | 256 bits (32 bytes) |
| Hex length | 8 chars | 32 chars | 64 chars |
| Throughput (single core) | 10+ GB/s with hardware instructions | 1-2 GB/s | 300-500 MB/s; 1.5-2 GB/s with SHA-NI |
| Category | Non-cryptographic, error detection only | Cryptographic, broken | Cryptographic |
| Collisions found | By design, 1 in 2^32 chance | 2004 (Wang), practical attacks since | None to date |
| Tamper detection | No (a matching CRC can be forged for any edit) | No (deliberate collisions are practical) | Yes |
| Standard adoption | ZIP / Ethernet / PNG / Gzip | RFC 1321 (security-deprecated) | NIST FIPS 180-4, current |
| Year introduced | 1961 (Peterson) | 1991 (Rivest) | 2001 (NIST) |

CRC32 is an error-detection code, not a cryptographic hash. With only 2^32 (~4.3 billion) possible values, two unrelated files share a CRC with probability 1 in 2^32, and by the birthday problem a set of roughly 77,000 files already has a 50 % chance of containing at least one accidental collision. It earns its place anyway thanks to raw speed and standardised embedding. Ethernet frames, ZIP entries, PNG chunks, and Gzip members all carry a CRC32 already, so most of the time you benefit from it without writing any code. MD5 has been collision-broken since 2004 and is unsuitable for tamper detection, but accidental collisions are astronomically rare, which keeps it relevant for deduplication. SHA-256 has no known practical collision as of 2026 and is the default for release checksums and TLS certificate signatures; Git supports it as an alternative to its original SHA-1 object format.
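The birthday estimate for a 32-bit checksum can be checked with a few lines of arithmetic. The sketch below uses the standard approximation p ≈ 1 − exp(−n(n−1)/2N) for n values drawn uniformly from N possibilities. The function names are illustrative, and the model assumes checksums behave like uniform random values, which only holds for non-adversarial inputs.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance of at least one accidental collision among
    n_items uniformly random `bits`-bit checksums (birthday approximation)."""
    space = 2 ** bits
    exponent = n_items * (n_items - 1) / (2 * space)
    return -math.expm1(-exponent)  # 1 - exp(-x), numerically stable for tiny x


def items_for_probability(p: float, bits: int) -> int:
    """Approximate number of items at which the collision chance reaches p."""
    space = 2 ** bits
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))


if __name__ == "__main__":
    print(items_for_probability(0.5, 32))     # CRC32: about 77 thousand files
    print(collision_probability(10**9, 128))  # MD5 over a billion files: negligible
```

The same two functions make the contrast between a 32-bit and a 128-bit output concrete: the set size that saturates CRC32 leaves a 128-bit digest with a collision chance far below any hardware error rate.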

Choosing by use case

Download integrity (ISO images, installers): SHA-256. If the publisher ships a *.sha256 next to the file, recompute and compare. Treat publishers who only offer MD5 as a risk: MD5 collisions are practical, and a checksum served over the same channel as the file can be swapped by the same man-in-the-middle who swaps the file. When possible, also verify the publisher’s GnuPG signature.
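The recompute-and-compare step can be sketched in Python's standard library. This is a minimal sketch, not a complete verifier: the function names are hypothetical, and it assumes the checksum file follows the common `sha256sum` layout of a hex digest followed by the file name. It does not check any signature.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large images never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_expected(checksum_file: Path) -> str:
    """Take the first whitespace-separated token as the hex digest.
    Assumes the layout '<hex digest>  <file name>'."""
    return checksum_file.read_text(encoding="ascii").split()[0].lower()


def verify_download(path: Path, checksum_file: Path) -> bool:
    """True if the file's SHA-256 matches the digest in checksum_file."""
    expected = read_expected(checksum_file)
    return hmac.compare_digest(sha256_file(path), expected)
```

Lower-casing the expected value avoids a false mismatch when the publisher prints the digest in upper-case hex.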

Duplicate detection (photo libraries, backup dedup): MD5. It runs at 1-2 GB/s even on terabytes of data, and the chance of an accidental collision is effectively zero for non-adversarial inputs. SHA-256 is overkill here. If you need even more speed, BLAKE3 is several times faster than SHA-256 with comparable security.
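A duplicate finder along these lines might look like the following sketch. The helper names are hypothetical, and grouping by file size before hashing is an added optimisation rather than something the workflow requires: files with a unique size cannot have a duplicate, so they are never read.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 (equality testing only, not security)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents hash identically."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[md5_file(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

If the file set can contain adversarially crafted inputs, swapping `hashlib.md5` for `hashlib.sha256` in `md5_file` is the only change needed.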

Built-in error detection in transports and container formats: CRC32. In practice you rarely compute it yourself, because ZIP, Ethernet, and PNG already check it for you. The cases where you reach for an explicit CRC32 are typically firmware ROM checks or custom serial protocols.
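For the custom-protocol case, an explicit CRC32 is a few lines with Python's `zlib.crc32`. The frame layout below (payload followed by a 4-byte little-endian CRC) is a hypothetical example, not a standard; real protocols define their own byte order and framing.

```python
import struct
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append the CRC32 of payload as 4 little-endian bytes (hypothetical layout)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def check_crc(frame: bytes) -> bytes:
    """Return the payload if the trailing CRC32 matches, else raise ValueError."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a CRC")
    payload, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(payload) != expected:
        raise ValueError("CRC mismatch: frame corrupted in transit")
    return payload
```

This catches accidental corruption such as a flipped bit on a noisy line; it offers nothing against someone who edits the payload and recomputes the trailer.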

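Tamper detection for backups and distributed artefacts: SHA-256 or stronger. MD5 and CRC32 both fail, for different structural reasons. CRC32 is linear, so given any tampered file an attacker can adjust four trailing bytes to restore the original checksum. MD5 still resists that kind of forgery against an arbitrary existing file, but an attacker who prepares both versions can craft a benign file and a malicious file that share one MD5 (a chosen-prefix collision). And in supply-chain scenarios where the publisher’s server is compromised, the checksum file itself is often rewritten, so the check provides false confidence.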
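The trailing-bytes forgery against CRC32 can be demonstrated directly. The sketch below is one possible approach, not a reference attack tool: it relies on the fact that, for a fixed prefix, the CRC is an affine function over GF(2) of the four appended bytes, measures the effect of each suffix bit, and solves the resulting 32×32 linear system by Gaussian elimination. The function name `forge_suffix` is hypothetical.

```python
import zlib


def forge_suffix(tampered: bytes, target_crc: int) -> bytes:
    """Return 4 bytes that, appended to `tampered`, make its CRC32 equal target_crc."""
    base = zlib.crc32(tampered + b"\x00\x00\x00\x00")
    # Effect on the CRC of setting each single suffix bit; effects combine by XOR.
    cols = [
        zlib.crc32(tampered + (1 << bit).to_bytes(4, "little")) ^ base
        for bit in range(32)
    ]
    # Build an XOR basis, remembering which suffix bits produce each basis vector.
    basis = {}  # pivot bit -> (vector, mask of suffix bits)
    for bit, col in enumerate(cols):
        vec, mask = col, 1 << bit
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, mask)
                break
            bvec, bmask = basis[pivot]
            vec ^= bvec
            mask ^= bmask
    # Express the required CRC difference as a combination of suffix bits.
    need, suffix_bits = target_crc ^ base, 0
    while need:
        pivot = need.bit_length() - 1
        if pivot not in basis:
            raise ValueError("no 4-byte suffix reaches the target CRC")
        bvec, bmask = basis[pivot]
        need ^= bvec
        suffix_bits ^= bmask
    return suffix_bits.to_bytes(4, "little")


original = b"pay alice 10 USD"
tampered = b"pay mallory 9999 USD"
forged = tampered + forge_suffix(tampered, zlib.crc32(original))
print(zlib.crc32(forged) == zlib.crc32(original))
```

The whole attack costs 33 CRC computations and a small matrix solve, which is why no amount of CRC width tuning turns an error-detection code into a tamper check.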

Short identifiers (cache keys, ETags): CRC32 or MD5 if collision resistance is not in the threat model — 4 or 16 bytes is meaningfully cheaper than 32. The risk is that the identifier later gets repurposed for tamper detection without anyone revisiting the algorithm; defaulting to SHA-256 from the start avoids that quiet migration.

Computing checksums in the browser

When you need an ad-hoc hash of a downloaded file or a local document, hash-generate computes MD5, SHA-1, SHA-256, SHA-512, SHA-3, and BLAKE2 in a single pass: paste the text in or drop the file, then compare the result against the publisher’s abc123... character by character. For error-detection codes, crc-calc computes CRC32 / CRC16 / CRC8 individually, which is useful for firmware verification and embedded serial protocols.

Three things to keep in mind. (1) Do not trust an MD5 checksum at face value. If the publisher’s site is compromised, the MD5 line on the download page is compromised too. Verify the publisher’s GnuPG signature (*.asc) when one exists. (2) Line endings and BOMs shift text hashes silently. Normalise CRLF / LF and BOM presence before computing. (3) Never repurpose CRC32 for tamper detection. CRC is a linear operation, so an attacker can compute a delta that restores the original CRC after editing the payload.

The Web Crypto API-based implementation is published on GitHub, and the DevTools Network tab confirms that file bodies never leave the page during hashing.
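The line-ending and BOM point can be made concrete with a small normalising hash. This is a sketch under stated assumptions: the input is UTF-8 text, both parties agree to normalise the same way, and the function name is hypothetical. Binary files should be hashed as-is, and a published checksum only matches if it was computed over the same normalised bytes.

```python
import codecs
import hashlib


def normalized_sha256(raw: bytes) -> str:
    """Hash text bytes after removing a UTF-8 BOM and unifying line endings to LF."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    raw = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha256(raw).hexdigest()


unix_text = b"alpha\nbeta\n"
windows_text = codecs.BOM_UTF8 + b"alpha\r\nbeta\r\n"

# Raw hashes differ even though an editor shows the same two lines.
print(hashlib.sha256(unix_text).hexdigest() == hashlib.sha256(windows_text).hexdigest())
# After normalisation the two inputs hash identically.
print(normalized_sha256(unix_text) == normalized_sha256(windows_text))
```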