# zpdf

A PDF text extraction library written in Zig.

## Features

- Memory-mapped file reading for efficient large file handling
- Streaming text extraction (no intermediate allocations)
- Multiple decompression filters: FlateDecode, ASCII85, ASCIIHex, LZW, RunLength
- Font encoding support: WinAnsi, MacRoman, ToUnicode CMap
- XRef table and stream parsing (PDF 1.5+)
- Configurable error handling (strict or permissive)
- Multi-threaded parallel page extraction

## Benchmark

Text extraction performance vs MuPDF 1.26 (`mutool convert -F text`):

### Sequential

| Document | Pages | Size | zpdf | MuPDF | Speedup |
|---|---|---|---|---|---|
| Adobe Acrobat Reference | 651 | 19 MB | 137 ms | 530 ms | 3.9x |
| C++ Standard Draft | 2,134 | 8 MB | 276 ms | 1,038 ms | 3.8x |
| Pandas Documentation | 3,743 | 15 MB | 447 ms | 1,216 ms | 2.7x |
| Intel SDM | 5,252 | 25 MB | 508 ms | 2,250 ms | 4.4x |

### Parallel (multi-threaded)

| Document | Pages | Size | zpdf | MuPDF | Speedup |
|---|---|---|---|---|---|
| Adobe Acrobat Reference | 651 | 19 MB | 60 ms | 512 ms | 8.5x |
| C++ Standard Draft | 2,134 | 8 MB | 142 ms | 1,020 ms | 7.2x |
| Pandas Documentation | 3,743 | 15 MB | 233 ms | 1,204 ms | 5.2x |
| Intel SDM | 5,252 | 25 MB | 127 ms | 2,260 ms | 18x |

Peak throughput: 41,000 pages/sec (Intel SDM, parallel)

Build with `zig build -Doptimize=ReleaseFast` for these results.

Note: MuPDF's threading (`-T` flag) is for rendering/rasterization only. Text extraction via `mutool convert -F text` is single-threaded by design. zpdf parallelizes text extraction across pages.

## Requirements

- Zig 0.15.2 or later

## Building

```sh
zig build        # Build library and CLI
zig build test   # Run tests
```

## Usage

### Library

```zig
const std = @import("std");
const zpdf = @import("zpdf");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const doc = try zpdf.Document.open(allocator, "file.pdf");
    defer doc.close();

    // Buffered stdout writer; flush on exit so no output is lost.
    var buf: [4096]u8 = undefined;
    var writer = std.fs.File.stdout().writer(&buf);
    defer writer.interface.flush() catch {};

    // Extract the text of every page, in order.
    for (0..doc.pages.items.len) |page_num| {
        try doc.extractText(page_num, &writer.interface);
    }
}
```

### CLI

zp...
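As a cross-check on the benchmark figures above, the speedup column and the peak throughput can be recomputed from the reported timings. The sketch below is illustrative only: `pagesPerSecond` and `speedup` are hypothetical helper names, not part of zpdf. For the parallel Intel SDM run, 5,252 pages in 127 ms works out to roughly 41,354 pages/sec, which the README rounds to 41,000.

```zig
const std = @import("std");

/// Pages processed per second, given a page count and a wall time in milliseconds.
/// Hypothetical helper for checking the benchmark table; not a zpdf API.
fn pagesPerSecond(pages: u64, millis: u64) f64 {
    return @as(f64, @floatFromInt(pages)) * 1000.0 / @as(f64, @floatFromInt(millis));
}

/// Ratio of the baseline time to the zpdf time (both in milliseconds).
/// Hypothetical helper for checking the benchmark table; not a zpdf API.
fn speedup(baseline_ms: u64, zpdf_ms: u64) f64 {
    return @as(f64, @floatFromInt(baseline_ms)) / @as(f64, @floatFromInt(zpdf_ms));
}

pub fn main() void {
    // Intel SDM, parallel run: 5,252 pages, zpdf 127 ms, MuPDF 2,260 ms.
    std.debug.print("throughput: {d:.0} pages/sec\n", .{pagesPerSecond(5252, 127)});
    std.debug.print("speedup: {d:.1}x\n", .{speedup(2260, 127)});
}
```

The same two functions reproduce the other rows, for example 530 ms / 137 ms for the sequential Adobe Acrobat Reference run.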