Stripping PDF metadata: why it matters and what's hiding inside
A PDF often carries far more than what you see on the page: author name, software version, modification dates, earlier revisions, comments. Here's how to clean it up.
Beyond the visible content, a PDF contains metadata and "hidden" content that you may not want to share when distributing the file.
What's actually in there?
- Document properties: author, title, subject, keywords.
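- Creation/modification dates: when the document was created and when it was last modified.
- Creator and producer software: which application authored the document and which tool converted it to PDF (e.g. "Microsoft Word 2019" or "Adobe Acrobat Pro DC 2023").
- Embedded fonts: usually not an issue, but font names and, occasionally, path information can reveal details about the authoring system.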
- Comments and annotations: even if they're visually hidden.
- Form field data: values that were filled in on forms.
- Attachments: files embedded in the PDF.
- JavaScript: PDFs can contain scripts.
- Digital signature info.
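As a rough way to see which of the items above a given file might contain, the raw bytes can be scanned for the PDF name tokens that usually accompany them. This is a minimal sketch, not a parser: the function name `scan_pdf_markers` and the choice of tokens are assumptions made for illustration, and tokens stored inside compressed object streams will not be found, so a zero count does not prove the file is clean.

```python
import re

# PDF name tokens that hint at each kind of hidden content listed above.
# The mapping is an illustrative selection, not an exhaustive catalogue.
MARKERS = {
    "author": b"/Author",
    "creator": b"/Creator",
    "producer": b"/Producer",
    "creation_date": b"/CreationDate",
    "mod_date": b"/ModDate",
    "xmp_packet": b"<x:xmpmeta",
    "annotations": b"/Annots",
    "form": b"/AcroForm",
    "attachments": b"/EmbeddedFiles",
    "javascript": b"/JavaScript",
    "signature": b"/ByteRange",
}


def scan_pdf_markers(data: bytes) -> dict[str, int]:
    """Count how often each marker token appears in the raw bytes of a PDF."""
    counts = {}
    for label, token in MARKERS.items():
        # The lookahead keeps /Creator from matching inside /CreatorTool.
        pattern = re.escape(token) + rb"(?![A-Za-z0-9])"
        counts[label] = len(re.findall(pattern, data))
    return counts


# Tiny inline sample instead of a real file; for a real PDF, read it with
# open(path, "rb").read() and pass the bytes in.
sample = b"<< /Author (Jane Doe) /Producer (Word) /CreatorTool (x) >>"
print(scan_pdf_markers(sample))
```

Any non-zero count is a prompt to look closer with a proper PDF tool rather than a definitive finding.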
Why you should care
- An author name in the metadata can reveal your employer on supposedly "anonymous" documents.
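- Modification dates can show when you made last-minute changes.
- Earlier revisions may still be embedded in the PDF: tools that save incrementally append changes to the end of the file instead of rewriting it, so older content can remain recoverable. Tracked changes from a Word file can also end up in the PDF if the markup was visible when it was exported.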
- Attachments can accidentally be included when you distribute the file.
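The point about earlier revisions can be checked with a simple heuristic: each incremental save normally appends a new section that ends in its own `%%EOF` marker. The sketch below counts those markers; the function names are hypothetical, and linearized ("fast web view") PDFs can legitimately contain two markers without any edit history, so treat the result as a hint only.

```python
def count_eof_markers(data: bytes) -> int:
    """Return the number of %%EOF markers in the raw PDF bytes."""
    return data.count(b"%%EOF")


def may_contain_earlier_revisions(data: bytes) -> bool:
    """Heuristic: more than one %%EOF marker suggests appended updates."""
    # Linearized files can also have two markers, so this is only a
    # reason to inspect the file further, not proof of old content.
    return count_eof_markers(data) > 1
```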
How to strip it
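- Use a redaction or sanitizing tool that explicitly supports removing metadata and hidden content, such as the "Remove Hidden Information" feature in Adobe Acrobat Pro.
- Alternative: use "Print to PDF" from a PDF viewer. This produces a new PDF without the original's document properties, attachments, or scripts. Note that the print driver writes its own producer and date fields, and that links and form fields are lost.
- Verify: open the cleaned PDF, go to File → Properties, and check the fields. The Properties dialog only covers document properties, so also check the attachments and comments panels.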
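The File → Properties check can be complemented by a quick look at the raw file for document property values that survived cleaning. The following is a sketch under narrow assumptions: `leftover_info_fields` is a hypothetical helper, it only recognizes simple literal strings such as `/Author (Jane Doe)`, and it will miss hex-encoded or UTF-16 strings, values inside compressed object streams, and XMP metadata. It can also pick up `/Title` entries that belong to bookmarks rather than to the document properties.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator", "Producer")


def leftover_info_fields(data: bytes) -> dict[str, str]:
    """Return non-empty property values found as simple literal strings."""
    found = {}
    for key in INFO_KEYS:
        # Matches "/Key (value)" where value has no nested or escaped parens.
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^()\\]*)\)"
        match = re.search(pattern, data)
        if match and match.group(1).strip():
            found[key] = match.group(1).decode("latin-1")
    return found
```

An empty result from this helper is not proof that the file is clean; it only means none of the easy-to-spot values remain.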
A word of caution
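If the document is digitally signed, keep in mind that the signature covers the file's contents: rewriting the file to remove metadata invalidates the signature, and most tools drop it altogether. Make sure that's intentional before you proceed.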
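Before cleaning a file, it can help to check whether it appears to be signed at all. Signature dictionaries carry a `/ByteRange` array describing which bytes the signature covers, so its presence is a reasonable hint. This is a minimal sketch with a hypothetical function name; a signature stored in a way this pattern does not catch would be missed.

```python
import re


def looks_signed(data: bytes) -> bool:
    """Heuristic: a /ByteRange array suggests a digital signature."""
    return re.search(rb"/ByteRange\s*\[", data) is not None
```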
See also: redaction guide, why black bars don't work.