Black boxes make clinical records unusable for research, ROI workflows, and AI training. Re-Doc replaces all 18 HIPAA identifiers with consistent synthetic data. The diagnosis, the drug dosages, the clinical narrative remain intact. Only the patient identity changes.
Dx: Hypertension, Stage 2. Continue metoprolol 50mg QD.
Healthcare data breaches cost an average of $10.93 million each, the highest of any industry. These are the documented failure modes behind that number: technical, operational, and clinical.
Medical imaging files often have patient name, DOB, and MRN burned directly into image pixels — not in a text layer. Adobe cannot detect it. Standard NLP tools skip it entirely. It looks like part of the scan.
Per DICOM Standard PS3.15 Annex E and IHE Radiology guidance, PHI embedded as human-readable text in pixel data requires pixel-level destruction — not a text overlay. Re-Doc's visual processing reads the pixel layer pixel-by-pixel, finds the PHI, and permanently replaces those pixels in the output file. Nothing left to copy. Nothing left to extract.
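The difference between destroying pixels and covering them can be shown with a minimal sketch. It is an illustration only, not Re-Doc's implementation: the image is a plain grid of grayscale values, the PHI regions are assumed to have been located by an upstream detector, and the names `destroy_regions`, `Image`, and `Box` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixels, 0 (black) to 255 (white)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def destroy_regions(image: Image, boxes: List[Box], fill: int = 0) -> Image:
    """Return a copy of `image` with every pixel inside `boxes` overwritten."""
    out = [row[:] for row in image]  # copy rows so the input is not mutated
    height = len(out)
    width = len(out[0]) if out else 0
    for top, left, bottom, right in boxes:
        # Clamp each box to the image bounds before writing.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                out[y][x] = fill  # the original value does not exist in the output
    return out


# Hypothetical 6x4 scan with one detected PHI region (rows 1-2, columns 2-4).
scan = [[200] * 6 for _ in range(4)]
redacted = destroy_regions(scan, [(1, 2, 3, 5)])
```

Because the output grid holds only the fill value inside each box, there is no hidden layer that could be peeled back, which is the property an overlay does not have.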
IRB-approved studies need readable patient data with identifiers removed — not clinical context destroyed. A discharge summary with every detail blacked out tells researchers nothing. The treatment course, medication names, diagnostic codes, and physician assessment are what matter. Black-box redaction removes identity and utility together.
Text anonymization keeps the clinical narrative intact. Only the patient identity changes, not the medical facts.
Diagnoses, medications, and dosages pass through unchanged. Once all 18 Safe Harbor identifiers are removed, the remaining clinical content is no longer PHI under HIPAA.
The physician's name in a narrative note. The MRN embedded in a table footer. The date buried in a page header. Pattern-matching tools catch structured fields — they miss the identifiers woven into clinical prose.
Re-Doc's context-aware model reads the entire clinical document — not just pattern-matched field labels. It understands that “Dr. Patel ordered” in a narrative is a physician identifier, not just a proper noun. The same entity is caught on every page, in every form it appears.
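The gap between labeled fields and identifiers in prose can be illustrated with a short sketch. The patterns below are hypothetical examples of field-label matching, not the rules of any particular tool, and `find_labeled_fields` is an illustrative name.

```python
import re

# Hypothetical field-label patterns of the kind a pattern-matching tool might use.
FIELD_PATTERNS = {
    "mrn": re.compile(r"MRN:\s*(\S+)"),
    "dob": re.compile(r"DOB:\s*(\d{2}/\d{2}/\d{4})"),
    "attending": re.compile(r"Attending:\s*(Dr\. [A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
}


def find_labeled_fields(text: str) -> dict:
    """Return the identifiers that appear after a known field label."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


header = "MRN: MRN-004821\nDOB: 02/14/1965\nAttending: Dr. Rajesh Patel"
narrative = "Dr. Patel ordered a repeat panel after the follow-up visit."

structured_hits = find_labeled_fields(header)    # all three labeled fields are found
narrative_hits = find_labeled_fields(narrative)  # nothing is found: no field label precedes the name
```

The narrative sentence contains a physician name, yet label-driven matching returns nothing for it. Catching it requires reading the sentence itself rather than looking for a label.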
A Business Associate Agreement defines who is responsible. It does not verify that PHI was actually removed from the document before it was shared. Unauthorized disclosure incidents in the HHS breach portal regularly involve records shared with vendors after incomplete de-identification.
When PHI is properly removed from the document itself, there is nothing left to breach. The BAA covers the relationship. Re-Doc covers the technical reality.
Re-Doc sits between your EHR export and distribution. Upload via API or drag-and-drop. The clinical narrative stays intact. Processing logs provide an entity-level audit trail per document.
EHR exports, scanned faxes, discharge summaries, operative notes. Any format your clinical workflows produce.
A context-aware model reads every entity across all 18 HIPAA identifier categories and replaces each with a consistent synthetic equivalent matched to type and format.
Same document structure, same clinical narrative, same layout. Patient identity replaced. Diagnoses, medications, and treatment notes preserved exactly.
Send to research teams, push to AI training pipelines, or share with auditors. Processed to support Safe Harbor and Expert Determination compliance strategies.
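One way to picture a replacement "matched to type and format" is a format-preserving swap for identifier strings such as an MRN. This is a minimal sketch under stated assumptions, not Re-Doc's method: `synthetic_like` and the `salt` parameter are hypothetical, and the approach suits opaque IDs only (dates and names need type-aware generation so the result stays plausible).

```python
import hashlib
import random


def synthetic_like(value: str, salt: str = "demo-salt") -> str:
    """Swap every digit for a pseudo-random digit; keep letters, punctuation, and length."""
    # Seed from the input so the same original always yields the same synthetic value.
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice("0123456789"))
        else:
            out.append(ch)  # prefix letters and separators keep the original format
    return "".join(out)


synthetic_mrn = synthetic_like("MRN-004821")
```

Seeding from the original value keeps the mapping repeatable within a run, so every occurrence of the same MRN receives the same synthetic ID. A production system would also need to keep the salt secret, since a known salt would allow candidate values to be tested against the output.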
Healthcare documents come in two fundamentally different forms: scanned images and native digital files. The correct de-identification approach depends entirely on which one you have.
True pixel-level destruction — no text layer to extract
Visual processing reads the scanned document pixel-by-pixel, finds PHI by region, and burns permanent black boxes over those areas in the output PDF. The original pixel data is destroyed — not covered. No text layer exists to extract from a scanned document, because scanned documents are images.
Synthetic data swap. Clinical narrative stays usable.
Finds every PHI entity in a native PDF or DOCX and replaces it with demographically consistent synthetic data. “Maria Elena Gonzalez” becomes “Sarah Ann Thompson” consistently across every page, every reference. Clinical content stays untouched: diagnoses, medications, dosages, treatment timelines.
Every PHI identifier replaced. Medication names, dosages, and clinical observations untouched. The document is immediately usable for research or audit.
Original document:
Patient: Maria Elena Gonzalez
MRN: MRN-004821
DOB: 02/14/1965
Attending: Dr. Rajesh Patel
Facility: St. Luke's Medical Chicago
Dx: Hypertension, Stage 2. Continue metoprolol 50mg QD. Follow up in 90 days.
Anonymized output:
Patient: Sarah Ann Thompson
MRN: MRN-007643
DOB: 09/03/1968
Attending: Dr. Kevin Harmon
Facility: Riverside General Columbus
Dx: Hypertension, Stage 2. Continue metoprolol 50mg QD. Follow up in 90 days.
All 18 Safe Harbor identifiers replaced with consistent synthetic data. Clinical content preserved. Processed to support HIPAA Safe Harbor de-identification requirements under 45 CFR §164.514(b)(2).
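The consistency shown in the example above, where one original value maps to one synthetic value everywhere it appears, can be sketched with a simple lookup table. This is an assumption-laden illustration rather than Re-Doc's implementation: entity detection is taken as already done, and `ConsistentReplacer` and its methods are hypothetical names.

```python
from typing import Dict, List


class ConsistentReplacer:
    """Map each original entity to one synthetic value and reuse it on every page."""

    def __init__(self, synthetic_pool: List[str]) -> None:
        self._pool = list(synthetic_pool)
        self._mapping: Dict[str, str] = {}

    def replacement_for(self, original: str) -> str:
        # Assign a synthetic value on first sight, then always return the same one.
        if original not in self._mapping:
            if not self._pool:
                raise ValueError("synthetic pool exhausted")
            self._mapping[original] = self._pool.pop(0)
        return self._mapping[original]

    def apply(self, text: str, entities: List[str]) -> str:
        # Longer entities first, so a full name is replaced before any shorter entity inside it.
        for entity in sorted(entities, key=len, reverse=True):
            text = text.replace(entity, self.replacement_for(entity))
        return text


replacer = ConsistentReplacer(["Sarah Ann Thompson"])
pages = [
    "Patient: Maria Elena Gonzalez",
    "Maria Elena Gonzalez was seen for follow-up. Continue metoprolol 50mg QD.",
]
anonymized = [replacer.apply(page, ["Maria Elena Gonzalez"]) for page in pages]
```

Keeping the mapping in one object for the whole document is what makes the second page agree with the first. The clinical text around the name is never touched because only detected entities are passed in.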
These are the high-volume, compliance-critical workflows where black-box redaction and manual review consistently fall short.
A Marketing Authorisation Application (MAA) to the EMA requires Clinical Study Reports anonymized for publication under Policy 0070. A single oncology submission may include 50,000 to 100,000+ pages of clinical trial data. Manual de-identification at specialized firms costs $100,000-$500,000 per submission and takes 6-12 months. Re-Doc processes the same volume through the Batch API in days. Every patient identifier across every appendix, table, and narrative section is replaced with consistent synthetic data. Re-Doc is designed to support preparation of your EMA Policy 0070 submission package.
IRB-approved studies commonly rely on HIPAA Safe Harbor de-identification: removal of all 18 identifier categories before records are shared with researchers. Traditional approaches use Expert Determination (expensive, slow) or manual review (error-prone, misses contextual PHI). Re-Doc applies LLM-based entity detection across all 18 categories simultaneously, preserving the clinical narrative researchers actually need: diagnoses, lab values, medication histories, and treatment responses. Output is processed to support Safe Harbor requirements without destroying study utility.
Health Information Management (HIM) departments process large volumes of patient record requests, each requiring de-identification of third-party PHI before release. The 30-day HIPAA response window is strict. The manual de-identification step is the bottleneck: a clinician reviews every redaction placement on every page, for every request. Re-Doc processes each request through the API in seconds. HIM teams upload the chart, receive a de-identified output, and fulfill the request on deadline, without a physician reviewing every black-box placement.
Most de-identification tools are built for structured database exports. Re-Doc handles unstructured documents: scanned faxes, discharge summaries, narrative notes. That is where PHI actually lives.
Typical tools in the market vs. Re-Doc (purpose-built for clinical docs)
Typical tools: Structured data only. Cannot open a PDF, DOCX, or scanned clinical document.
Re-Doc: Processes PDFs, DOCX, scanned faxes, and image-based medical records.
Typical tools: Redaction removes text. Clinical narrative breaks down for downstream use.
Re-Doc: Text anonymization replaces PHI with synthetic equivalents. Context preserved.
Typical tools: No scanned document support. Misses fax-originated records and DICOM pixel PHI.
Re-Doc: Visual processing handles scanned faxes, DICOM-adjacent documents, and burned-in pixel PHI.
Typical tools: Manual, per-file processing. Unusable for high-volume ROI and research workflows.
Re-Doc: Batch API processes hundreds of authorization requests in parallel. Audit trail included.
Typical tools: No audit trail aligned with HIPAA minimum necessary standard.
Re-Doc: Processing logs per document with entity-level detection records for compliance review.
Redaction when you need permanent pixel destruction. Text anonymization when the document still needs to work. Both pipelines, one platform.
References & Sources