BG Beter Geregeld ICT
PDF redaction · 2 min read · 13 November 2025

OCR redaction: making scanned PDFs editable for redaction

A scanned PDF is a series of images, not text. Searching and redacting won't work without OCR. Here's the workflow.

Many documents arrive as scanned PDFs — signed contracts, old archive files, letters from doctors. The text is readable to people, but to software it's just images. OCR converts it into something workable.

The difference

  • Text PDF: the file contains underlying text objects. Searching works, copy-paste works, redaction works.
  • Image PDF: only raster images. Searching doesn't work, text isn't selectable. It must go through OCR first.
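
You can check which kind you have before you start. Inside a PDF, text is drawn with the content-stream operators BT/ET (begin and end a text object) and Tj/TJ (show text). A bare scan typically only places an image. Below is a rough heuristic sketch using only the Python standard library. It assumes an unencrypted file whose content streams are either uncompressed or Flate-compressed, and has_text_layer is a hypothetical helper name. A proper PDF library will give more reliable answers.

```python
import re
import zlib

# Naive stream extraction: ignores /Length and /Filter, which a real parser would use.
_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
# A text object: BT ... (Tj or TJ) ... ET.
_TEXT_OBJECT = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.S)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream in the file shows text with Tj/TJ."""
    for match in _STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj ignores trailing bytes such as the EOL before "endstream".
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate data: inspect the bytes as they are
        if _TEXT_OBJECT.search(data):
            return True
    return False
```

Call it on the raw bytes of a file. A False result suggests the file needs OCR first. A file that has already been through OCR returns True, because the invisible OCR text layer is drawn with the same operators.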

OCR quality varies

  • Well-scanned paper (300+ DPI): 98–99% character accuracy with modern OCR (Tesseract 5, Azure Read, Google Vision).
  • Poorly scanned, crumpled or dirty copies: 70–90%. Manual verification is required.
  • Handwriting: needs a separate model; expect 60–85% for clearly legible handwriting.
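
To see where a particular scan lands on this scale, OCR a page or two, type the same text by hand, and compare the two. One common measure is character accuracy: 1 minus the character error rate (CER). CER is the edit distance between the reference and the OCR output, divided by the length of the reference. Below is a minimal Python sketch. The function names are illustrative, and it assumes you have a manually typed reference sample to compare against.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy = 1 - CER, with CER = edit distance / len(reference).

    Clamped at 0.0, because many inserted characters can push CER above 1.
    """
    if not reference:
        return 1.0 if not ocr_output else 0.0
    cer = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - cer)
```

For example, two wrong characters in a 100-character sample give 98% accuracy, which is the well-scanned range above.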

Workflow for redaction with OCR

  1. Apply OCR to the PDF (Acrobat has this built in; open-source Tesseract can do the same, for example via OCRmyPDF).
  2. Search for sensitive patterns (national ID numbers, email addresses, phone numbers).
  3. Redact the identified regions.
  4. Export as image plus text layer, or as image only, depending on your goal.
  5. Verification: run OCR over the output again and check that the sensitive text is no longer findable.
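
The search and redact steps boil down to two things. First, run regular expressions over the OCR text. Second, map each match back to the word boxes the OCR engine reported, so you know which rectangles to black out. Below is a minimal Python sketch. It assumes the engine gives you words with pixel boxes, grouped per line; Tesseract's TSV output, for example, has left, top, width, height and text columns. The Word type, the function names and the pattern set are all illustrative. Dutch phone numbers and the Dutch BSN (with its 11-check) stand in for whatever identifiers your documents contain. The leaked helper covers the verification step: feed it the text from a second OCR pass over the redacted output, and it should return an empty list.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its pixel box (hypothetical shape, modelled on Tesseract TSV)."""
    text: str
    left: int
    top: int
    width: int
    height: int


# Illustrative patterns; tune them for your own documents.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Dutch numbers: 0 or +31 followed by 9 digits, spaces/hyphens allowed.
    "phone": re.compile(r"(?<![\d+])(?:\+31|0)(?:[ -]?\d){9}(?!\d)"),
    # Exactly 9 digits; candidates are filtered with the BSN check below.
    "bsn": re.compile(r"(?<!\d)\d{9}(?!\d)"),
}


def is_valid_bsn(digits: str) -> bool:
    """Dutch BSN check: 9*d1 + 8*d2 + ... + 2*d8 - d9 must be divisible by 11."""
    if len(digits) != 9 or not digits.isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(digits, range(9, 1, -1)))
    return (total - int(digits[8])) % 11 == 0


def _matches(text: str):
    """Yield (label, match) for every hit; BSN candidates must pass the check."""
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if label == "bsn" and not is_valid_bsn(m.group()):
                continue
            yield label, m


def find_sensitive(line: list[Word]) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Match patterns across one OCR line and return (label, box) pairs to redact.

    Words are joined with single spaces, so a number split over several OCR
    words is still found. Every word the match touches is redacted whole.
    """
    spans, pos = [], 0
    for w in line:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the joining space
    joined = " ".join(w.text for w in line)

    hits = []
    for label, m in _matches(joined):
        covered = [w for w, (s, e) in zip(line, spans) if s < m.end() and e > m.start()]
        box = (
            min(w.left for w in covered),
            min(w.top for w in covered),
            max(w.left + w.width for w in covered),
            max(w.top + w.height for w in covered),
        )
        hits.append((label, box))
    return hits


def leaked(ocr_text: str) -> list[str]:
    """Verification: labels of patterns still findable when the output is OCR'd again."""
    return list(dict.fromkeys(label for label, _ in _matches(ocr_text)))
```

Joining the words of a line before matching matters. OCR often splits "06 12345678" into two words, and matching word by word would miss the number.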

A specific pitfall

If you run OCR and then redact only the visible page image, the sensitive data can still be lurking in the hidden OCR text layer behind it, where search and copy-paste will find it. Make sure you redact the text layer as well, not just the visual layer.
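
In data terms, redacting an OCR'd page means two separate operations. The redaction rectangles are painted black on the page image, and every word in the text layer whose box overlaps a rectangle is deleted. The Python sketch below covers the second half, the part that is easy to forget. It assumes the text layer is available as words with pixel boxes. TextLayerWord, overlaps and redact_text_layer are hypothetical names, and writing the result back into the PDF is left to whichever PDF library you use.

```python
from dataclasses import dataclass

Box = tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels


@dataclass
class TextLayerWord:
    """A word in the (usually invisible) OCR text layer, with its box on the page."""
    text: str
    box: Box


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share some area; boxes that only touch do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def redact_text_layer(words: list[TextLayerWord], rects: list[Box]) -> list[TextLayerWord]:
    """Drop every text-layer word whose box overlaps any redaction rectangle.

    The same rectangles still have to be painted over the page image; doing
    only one of the two leaves the data recoverable.
    """
    return [w for w in words if not any(overlaps(w.box, r) for r in rects)]
```

Dropping whole words rather than trimming individual characters errs on the side of removing too much, which is usually the right trade-off for redaction.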

See also: redaction guide, pattern mode.

Topics

#redactie #ocr #gescande-pdf

Full guide: PDF redaction for SMEs: the complete guide

This article is part of our comprehensive PDF redaction guide. Read the pillar article for the complete picture.
