
Do you remember the Vesuvius Challenge? Ancient scrolls, carbonized by a volcano and unreadable for centuries, were decoded using ML. This is the same idea, but with screenshots of files that have been (poorly) redacted. One of the flawed redactions involves screenshots of emails whose attachments are in MIME format, which means the file sits in text form (base64) at the bottom of the email. It's possible to recreate the attachment by copying that text, decoding the base64, and saving the result with the right extension (pdf, png, or other).
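To make the recovery step concrete, here is a minimal Python sketch, assuming the base64 text has already been transcribed correctly. The `decode_attachment` helper and the small `MAGIC` table are hypothetical names made up for this illustration, and the sample payload is a toy, not a real attachment.

```python
import base64

# Hypothetical lookup table: leading "magic" bytes for a few common file types.
MAGIC = {
    b"%PDF": "pdf",
    b"\x89PNG": "png",
    b"\xff\xd8\xff": "jpg",
}

def decode_attachment(b64_text: str) -> tuple[bytes, str]:
    """Decode a MIME base64 body and guess a file extension from its first bytes."""
    # MIME wraps base64 across many lines, so strip all whitespace first.
    cleaned = "".join(b64_text.split())
    # validate=True raises an error on characters outside the base64 alphabet.
    data = base64.b64decode(cleaned, validate=True)
    for magic, ext in MAGIC.items():
        if data.startswith(magic):
            return data, ext
    return data, "bin"  # unknown type: keep the bytes, use a neutral extension

# Toy stand-in for the text block at the bottom of an email.
sample = base64.encodebytes(b"%PDF-1.4\n% toy payload\n").decode("ascii")
data, ext = decode_attachment(sample)
print(ext, len(data))
```

The decoded bytes can then be written out with `open(f"attachment.{ext}", "wb")`. Guessing the extension from the leading bytes is just a convenience for this sketch; the MIME headers in the email usually name the file type as well.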
The challenge is that the emails are set in Courier New, where 1 and l (lowercase L) look nearly the same, so a single misread character corrupts the decoded file. One engineer tackled this first with OCR and then by training a CNN.
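Why the lookalike characters matter can be shown with a small Python sketch. This is only an illustration of the ambiguity, not the engineer's OCR or CNN approach: both `1` and `l` are valid base64 characters, so a misread string still decodes without error and simply yields the wrong bytes. The `candidates` helper and the three-byte toy payload are assumptions made up for this example.

```python
import base64
from itertools import product

AMBIGUOUS = "1l"  # glyphs assumed to be indistinguishable in the screenshot

def candidates(text: str):
    """Yield every string obtained by trying each ambiguous glyph at each ambiguous position."""
    slots = [AMBIGUOUS if ch in AMBIGUOUS else ch for ch in text]
    for combo in product(*slots):
        yield "".join(combo)

# Toy payload chosen so that its base64 form consists only of ambiguous glyphs.
payload = bytes([0x97, 0x59, 0x75])
truth = base64.b64encode(payload).decode("ascii")
misread = truth.replace("l", "1")  # what a naive transcription might produce

options = list(candidates(misread))
# Every candidate decodes cleanly, so base64 alone cannot flag the misread.
decoded = [base64.b64decode(c, validate=True) for c in options]
matches = [c for c, d in zip(options, decoded) if d == payload]
print(len(options), matches)
```

The number of candidates doubles with every ambiguous character, so brute force like this stops being practical for a real attachment, which is presumably why reading the glyphs correctly in the first place is the more promising route.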
Sources: tweet, blog, Vesuvius Challenge