ZIP vs TAR vs 7z — which archive format should you pick?
Compare ZIP, TAR.gz, and 7z for distribution packaging, backups, and Linux server transfer by compression ratio, cross-platform support, metadata preservation, and random-access reads.
Four axes that drive the decision
Picking an archive format is not just about which one compresses best. Four axes matter. Compression ratio drives download bandwidth and backup storage cost. Compatibility decides whether the recipient (Windows user, Linux server, macOS Finder) can open the file with the tools they already have. Metadata preservation — Unix permissions, owner, symlinks, timestamps — is decisive for server-to-server transfer and backup restore. Random access matters whenever you only want a listing or a single file out of a large archive.
“7z is the smallest”, “ZIP is the most compatible”, “TAR is the Linux thing” — all roughly true, all individually insufficient. Distribution to end users hinges on compatibility, backups on ratio plus metadata, server transfers on metadata plus compatibility.
Side-by-side comparison
| Property | ZIP | TAR (.tar.gz / .tar.xz) | 7z |
|---|---|---|---|
| Year introduced | 1989 | 1979 (tar) / 1992 (gzip) | 1999 |
| Compression algorithm | Deflate (also LZMA / Bzip2) | Gzip / Bzip2 / XZ (LZMA2) | LZMA2 (default) |
| Ratio for source / text (baseline ZIP=100%) | 100% | .tar.gz: 90-100% / .tar.xz: 55-70% | 50-65% |
| Compression unit | Per-file | Whole stream | Solid (across files) |
| Random access | Yes (central directory) | No (no index; a compressed tar must be decompressed from the start) | Limited (solid blocks hurt this) |
| Unix permissions | Limited (via extensions) | Full | Limited (via extensions) |
| Symlinks | Via extensions | Full | Via extensions |
| Encryption | ZipCrypto (weak) / AES-256 | None built in (pair with gpg) | AES-256 (standard) |
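| Windows built-in support | Yes (Explorer) | tar.exe on the command line since Windows 10; Explorer extraction only in recent Windows 11 builds | Explorer extraction only in recent Windows 11 builds; otherwise needs 7-Zip |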
| macOS built-in support | Yes (Archive Utility) | Yes (tar) | No |
| Linux built-in support | unzip package | Yes (tar / gzip) | p7zip package |
| Streaming | Limited | Yes (pipe-friendly) | Limited |
Ratios are rough numbers for a payload of source code, text, and executables. Already-compressed media — photos, video, modern Office files — barely shrinks in any format.
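The table's percentages are rough, and your own payload is what counts. Below is a minimal Python sketch, assuming the files sit under one directory, that packs them in memory as ZIP (Deflate), .tar.gz and .tar.xz using only the standard library. 7z is left out because Python ships no 7z writer; the function name archive_sizes and the demo payload are illustrative.

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path


def archive_sizes(root: str) -> dict:
    """Pack every file under `root` in memory and return the byte count per format."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    names = [p.relative_to(root).as_posix() for p in files]
    sizes = {"raw": sum(p.stat().st_size for p in files)}

    # ZIP with Deflate: every file is compressed independently.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p, name in zip(files, names):
            zf.write(str(p), arcname=name)
    sizes["zip"] = len(buf.getvalue())

    # tar first, then a single compression pass over the whole stream.
    for mode, label in (("w:gz", "tar.gz"), ("w:xz", "tar.xz")):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode=mode) as tf:
            for p, name in zip(files, names):
                tf.add(str(p), arcname=name)
        sizes[label] = len(buf.getvalue())
    return sizes


if __name__ == "__main__":
    # Throwaway demo payload: 50 similar text files (replace with your own directory).
    with tempfile.TemporaryDirectory() as d:
        for i in range(50):
            Path(d, f"log{i:02d}.txt").write_bytes(f"line {i}: request ok\n".encode() * 100)
        sizes = archive_sizes(d)
        for label, n in sizes.items():
            print(f"{label:7} {n:8d} bytes  {100 * n / sizes['zip']:6.1f}% of zip")
```

Point it at a real project directory and divide each value by sizes["zip"] to get figures comparable to the table. Per-entry overhead (tar's 512-byte headers, ZIP's local and central headers) is included, as it would be on disk.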
ZIP’s standout strength is random access. The central directory at the end of the file lists every entry with its offset, so extracting one file is fast. The flip side is that each file is compressed independently, so archives full of similar small files (logs, many tiny texts) achieve much worse ratios than solid formats.
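To see the central directory at work, the sketch below uses Python's zipfile module (an illustrative choice; other ZIP readers work the same way) to pull one entry out of a 1,000-entry archive. The helper name read_one and the generated logs/ entries are hypothetical.

```python
import io
import zipfile


def read_one(zip_bytes: bytes, member: str) -> bytes:
    """Extract a single entry, leaving every other entry untouched."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # getinfo() is a lookup in the central directory parsed from the file's tail;
        # header_offset says where this entry's local header starts.
        info = zf.getinfo(member)
        print(f"{member}: header at byte {info.header_offset}, "
              f"{info.compress_size} -> {info.file_size} bytes")
        return zf.read(info)  # seeks straight to header_offset and inflates one entry


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(1000):
            zf.writestr(f"logs/{i:04d}.txt", f"entry {i}\n" * 50)
    print(read_one(buf.getvalue(), "logs/0999.txt")[:10])  # b'entry 999\n'
```

The last entry costs the same to read as the first, because nothing before its offset is decompressed.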
TAR’s standout strengths are streaming and full Unix metadata. A pipeline such as tar cf - dir | ssh remote 'tar xf - -C /dest' packs on one side and extracts on the other at the same time, and permissions, symlinks, and ownership all survive (ownership is restored only when the extracting side runs as root). Compression is layered on separately, with .tar.gz (speed) and .tar.xz (ratio) as the two main flavours. The trade-off is that tar has no index, so pulling one file from a huge compressed archive means decompressing everything stored before it.
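The following Python sketch shows the metadata side of such a stream. It uses tarfile's pipe-style modes ("w|gz" and "r|gz"), which never seek, and records a 0755 script plus a symlink. The member names, uid/gid 1000 and the "deploy" account are made-up values for illustration. Like tar itself, tarfile only applies the recorded owner on extraction when running as root.

```python
import io
import tarfile


def make_stream() -> bytes:
    """Build a tar.gz in streaming mode with a 0755 script and a symlink."""
    out = io.BytesIO()
    # "w|gz" is tarfile's stream mode: it only ever writes forward and never seeks,
    # so `out` could equally be sys.stdout.buffer feeding a pipe into ssh.
    with tarfile.open(fileobj=out, mode="w|gz") as tf:
        script = b"#!/bin/sh\necho deployed\n"
        info = tarfile.TarInfo("app/run.sh")
        info.size = len(script)
        info.mode = 0o755
        info.uid = info.gid = 1000
        info.uname = info.gname = "deploy"  # hypothetical account name
        tf.addfile(info, io.BytesIO(script))

        link = tarfile.TarInfo("app/current")
        link.type = tarfile.SYMTYPE
        link.linkname = "run.sh"
        tf.addfile(link)
    return out.getvalue()


def read_stream(data: bytes) -> list:
    """Walk members in order, as a receiving `tar x` would: no index, no seeking back."""
    rows = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|gz") as tf:
        for m in tf:
            kind = "symlink" if m.issym() else "file"
            rows.append((m.name, kind, oct(m.mode), m.uname, m.linkname))
    return rows


if __name__ == "__main__":
    for row in read_stream(make_stream()):
        print(row)
```

Mode bits, owner names and the link target all travel inside the stream itself, which is why nothing has to be re-applied by hand on the receiving side.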
7z combines solid compression with LZMA2 to top the ratio charts. A source-code backup at 50–65 % of the ZIP size is routine. AES-256 is built in. The cost is compatibility — the recipient needs 7-Zip or p7zip installed.
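Solid compression is easiest to see in isolation. This sketch is an illustration, not a 7z implementation: it compresses the same set of small, similar files three ways with Python's zlib and lzma modules, namely raw Deflate per file (what ZIP stores), raw LZMA2 per file, and raw LZMA2 over all files concatenated (the solid idea). Container headers are ignored, the helper names are hypothetical, and the 1 MiB dictionary is chosen only to keep the run quick.

```python
import lzma
import zlib

# LZMA2 without the .xz container; the 1 MiB dictionary only keeps the demo fast.
LZMA2 = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": 1 << 20}]


def deflate_size(data: bytes) -> int:
    c = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw Deflate, the method ZIP stores
    return len(c.compress(data) + c.flush())


def lzma2_size(data: bytes) -> int:
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=LZMA2))


def compare(files: list) -> dict:
    """Compressed bytes for the same payload, per file versus solid (headers ignored)."""
    return {
        "deflate, per file (ZIP-like)": sum(deflate_size(f) for f in files),
        "lzma2, per file": sum(lzma2_size(f) for f in files),
        "lzma2, solid (7z-like)": lzma2_size(b"".join(files)),
    }


if __name__ == "__main__":
    # 300 small, similar log files: the case where solid compression shines.
    logs = [f"2024-05-01 INFO worker-{i % 8} id={i} status=200 path=/api/items\n".encode() * 4
            for i in range(300)]
    for label, n in compare(logs).items():
        print(f"{label:30} {n:7d} bytes")
```

Comparing the two LZMA2 rows separates the solid effect from the algorithm change. On many similar small files the solid figure is typically far smaller; on a handful of large, unrelated files the gap shrinks.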
Use case → recommended format
Distribution to Windows users, email attachments: ZIP. Both Windows Explorer and macOS Finder extract it on double-click out of the box, so support cost stays minimal. If you need a password, choose AES-256 explicitly: the legacy ZipCrypto scheme from the 1990s, still the default in many tools, falls to known-plaintext attacks and offers little real protection.
Linux server-to-server transfers, Docker image layers, deploy bundles: .tar.gz or .tar.xz. Permissions, ownership, and symlinks survive untouched, which matters for /etc, web-app deploys, and config-management trees. Use .tar.xz when bandwidth is precious and extraction speed less so, .tar.gz when the archive is fetched often and extraction needs to be quick.
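Personal backups, long-term storage, storage-bound workloads: 7z (LZMA2 + solid). Source and document bundles routinely land at 50–65 % of the ZIP size, a meaningful win on cloud-storage bills. Pair with AES-256 for sensitive backups, and enable header encryption (-mhe=on) so file names are hidden too. Because 7z does not record Unix owner and group, back up Linux directories as a tar first and compress that (tar cf - dir | 7z a -si dir.tar.7z) whenever ownership must survive a restore. The cost is encoding time and full CPU usage during compression.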
Streaming downloads in the browser, CI artifacts: .tar.gz or ZIP. .tar.gz is the natural fit for HTTP responses piped straight into tar (for example curl -sL URL | tar xz), so CI can extract while the download is still running. ZIP is awkward to stream end-to-end because the central directory lives at the tail, but its OS and tooling compatibility is unmatched.
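Why ZIP resists streaming can be checked directly: a reader learns where the entries are only from the End of Central Directory record at the very end of the file. Below is a minimal Python sketch using struct and the record layout from PKWARE's APPNOTE; it assumes no ZIP64 and no multi-disk archives, and the helper name central_directory is hypothetical.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4s4H2LH")  # End of Central Directory record: 22 bytes + comment
EOCD_SIG = b"PK\x05\x06"


def central_directory(data: bytes) -> tuple:
    """Return (entry count, directory offset, directory size) read from the archive's tail."""
    # Simplification: real readers scan only the last 64 KiB + 22 bytes and validate the hit.
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no End of Central Directory record")
    _sig, _disk, _cd_disk, _n_here, total, cd_size, cd_offset, _comment_len = EOCD.unpack_from(data, pos)
    return total, cd_offset, cd_size


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(3):
            zf.writestr(f"part{i}.bin", bytes(range(256)) * 40)
    data = buf.getvalue()
    total, offset, size = central_directory(data)
    # The directory sits after all file data, right before the final 22-byte record.
    print(f"{total} entries; directory at {offset}..{offset + size} of {len(data)} bytes")
```

A streaming consumer therefore has to either buffer the whole download or fall back to parsing local headers one by one, which is less reliable than trusting the directory.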
Distributing sensitive material: 7z with AES-256, or .tar.gz encrypted with GPG. ZIP’s AES-256 extension is unevenly supported and tools quietly fall back to ZipCrypto more often than you would like. Always verify the encryption scheme used, not just the “encrypted” flag.
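Checking the scheme does not require the password, because ZIP records it in each entry's metadata: bit 0 of the general-purpose flags marks encryption, WinZip-style AES sets the compression method to 99 and adds a 0x9901 extra field holding the key strength, and bit 6 marks PKWARE's strong encryption. The sketch below assumes the archive's directory opens with Python's zipfile module, which reads this metadata without decrypting anything; the function names are illustrative.

```python
import struct
import zipfile

AES_METHOD = 99          # WinZip AE-x puts 99 in the compression-method field
AES_EXTRA_ID = 0x9901    # ...and records the key strength in this extra field
AES_BITS = {1: 128, 2: 192, 3: 256}


def encryption_scheme(info: zipfile.ZipInfo) -> str:
    """Classify how one entry is encrypted, based only on its metadata."""
    if not info.flag_bits & 0x1:          # general-purpose bit 0: encrypted
        return "none"
    if info.compress_type == AES_METHOD:
        extra, i = info.extra, 0
        while i + 4 <= len(extra):        # walk (id, size, data) records
            header_id, size = struct.unpack_from("<HH", extra, i)
            if header_id == AES_EXTRA_ID and size >= 7 and i + 11 <= len(extra):
                # data: vendor version (2 bytes), "AE" (2 bytes), strength (1 byte), method (2)
                return f"AES-{AES_BITS.get(extra[i + 8], '?')}"
            i += 4 + size
        return "AES (strength unknown)"
    if info.flag_bits & 0x40:             # bit 6: PKWARE strong encryption
        return "PKWARE strong encryption"
    return "ZipCrypto (weak)"


def report(path: str) -> None:
    with zipfile.ZipFile(path) as zf:     # metadata only; nothing is decrypted
        for info in zf.infolist():
            print(f"{encryption_scheme(info):24} {info.filename}")
```

Calling report("bundle.zip") prints one line per entry. An archive your tool called "encrypted" that shows ZipCrypto here is exactly the silent fallback described above.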
Browser-only workflows and the foot-guns to remember
When an archive arrives from an unknown source and you are not sure what’s inside, archive-info lists every entry (name, size, compression method) without extracting anything. Once you trust it, archive-extract decompresses ZIP, TAR, TAR.GZ, and GZIP entirely in the browser. Neither tool uploads your file.
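For a local equivalent on your own machine, the same kind of listing takes a few lines of Python. This is a standard-library sketch, not the code behind archive-info; list_entries is a hypothetical name.

```python
import tarfile
import zipfile

ZIP_METHODS = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflate",
               zipfile.ZIP_BZIP2: "bzip2", zipfile.ZIP_LZMA: "lzma"}


def list_entries(path: str) -> list:
    """Return (name, uncompressed size, method) per entry without extracting anything."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:  # reads only the central directory
            return [(i.filename, i.file_size,
                     ZIP_METHODS.get(i.compress_type, f"method {i.compress_type}"))
                    for i in zf.infolist()]
    if tarfile.is_tarfile(path):
        # "r:*" auto-detects gzip/bzip2/xz. Compression covers the whole stream, so
        # there is no per-entry method, and listing means reading the archive through.
        with tarfile.open(path, "r:*") as tf:
            return [(m.name, m.size, "whole-stream") for m in tf.getmembers()]
    raise ValueError(f"{path}: neither ZIP nor TAR")
```

ZIP listings come straight from the central directory, whereas a compressed tar has to be decompressed once from start to end, mirroring the random-access row of the table.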
The reason to do this in the browser is straightforward: sensitive archives — personal backups, internal design docs, bundled contract PDFs — should never reach an upload-style “online unzipper”. Many such services reserve a right to analyse or retain inputs for “service improvement”, and once data has left, you cannot recall it. The source is on GitHub for review, and the DevTools Network tab confirms that nothing leaves the page during extraction.
Three operational foot-guns are worth remembering.
Zip Slip: malicious archive entries with names like ../../etc/passwd (or absolute paths) can escape the target directory during extraction. Any server-side extraction code must normalise each path and reject anything that resolves outside the target root.
Multi-byte file-name mojibake: older ZIP encoders stored names in the local code page (Shift_JIS, CP949 and so on) without setting the UTF-8 flag, so Japanese or Korean file names often render as garbage on the recipient side. Encoders that write UTF-8 names and set that flag are now the safer default.
Already-compressed inputs do not shrink: photos, video, modern PDFs, and Office files gain almost nothing from another compression pass, so the headline ratios only apply to text-heavy or uncompressed-binary payloads.
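The Zip Slip rule can be sketched in Python as follows. Python's own ZipFile.extractall already strips ".." and absolute components, and tarfile offers extraction filters such as filter="data" (Python 3.12+), so a check like this matters mainly for hand-rolled loops that write entries themselves. The names safe_target and safe_extract are hypothetical, and the sketch covers ZIP only.

```python
import shutil
import zipfile
from pathlib import Path


def safe_target(root: Path, member: str) -> Path:
    """Map an archive entry name to a path under `root`, or refuse."""
    base = root.resolve()
    target = (base / member).resolve()  # collapses "..", follows symlinks that already exist
    # An absolute member name replaces `base` entirely, so it fails this check too.
    if not target.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"blocked path traversal: {member!r}")
    return target


def safe_extract(zip_path: str, dest: str) -> None:
    root = Path(dest)
    with zipfile.ZipFile(zip_path) as zf:
        infos = zf.infolist()
        # Validate every name before writing anything, so a bad entry leaves no partial output.
        targets = [safe_target(root, i.filename) for i in infos]
        for info, target in zip(infos, targets):
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
```

Resolving both paths before comparing them is the key design choice: comparing raw strings would miss names like docs/../../x that only escape after normalisation.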