By now you may have heard of the hacker who says she scraped 99 percent of posts from Parler, the Twitter wannabe site that Trump supporters used to help stage Wednesday’s violent uprising on Capitol Hill. What you may not know about is the terrible coding and security that made the scraping so easy.
To summarize, the scraping was done by a hacker who goes by the handle donk_enby. She originally set out to archive content posted to Parler last Wednesday, in hopes of preserving self-incriminating material before account holders came to their senses and deleted it. On Sunday, donk_enby said she had collected about 80 terabytes of posts, including more than 1 million videos, many of which contained GPS metadata identifying the exact locations where they were recorded.
“For the journalists who DM me to ask, in non-technical terms, I would describe Parler’s current archival situation as ‘a bunch of people storming into a burning building and trying to grab as many things as possible,’” donk_enby wrote on Twitter on Sunday. “Things will be available in a more accessible form later on.”
The reason for the urgency: Amazon, Apple, and Google had all informed Parler that its lack of content moderation violated their terms of service. The archivists wanted to grab the posts while the site remained online. As it turned out, donk_enby was able to retrieve posts even after users had deleted them.
A major reason for the project’s success: Parler’s site was a mess. Its public API used no authentication. When users deleted their posts, the site failed to remove the content and instead just marked it with a delete flag. Oh, and each post carried a numerical ID that was incremented from the ID of the most recently published post.
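To see why those three flaws compound one another, consider a toy, in-memory model. This is only a sketch of the general pattern: the names `ToyPostStore`, `public_get`, and `enumerate_posts` are hypothetical, nothing here touches a network, and it represents neither Parler’s actual code nor the archive team’s script.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    post_id: int
    body: str
    deleted: bool = False  # soft-delete flag; the body itself is never erased


class ToyPostStore:
    """Hypothetical stand-in for a site with the three flaws described above."""

    def __init__(self) -> None:
        self._posts: Dict[int, Post] = {}
        self._next_id = 1

    def publish(self, body: str) -> int:
        # Flaw: IDs are sequential, so every valid ID is trivially guessable.
        post_id = self._next_id
        self._next_id += 1
        self._posts[post_id] = Post(post_id, body)
        return post_id

    def delete(self, post_id: int) -> None:
        # Flaw: "deleting" only sets a flag; the content stays in storage.
        self._posts[post_id].deleted = True

    def public_get(self, post_id: int) -> Optional[Post]:
        # Flaw: no credentials are checked and the delete flag is ignored.
        return self._posts.get(post_id)


def enumerate_posts(store: ToyPostStore, max_misses: int = 3) -> List[Post]:
    """Walk IDs upward from 1 until several consecutive lookups come back empty."""
    found: List[Post] = []
    post_id, misses = 1, 0
    while misses < max_misses:
        post = store.public_get(post_id)
        if post is None:
            misses += 1
        else:
            misses = 0
            found.append(post)
        post_id += 1
    return found
```

In this model, a post that its author has “deleted” still comes back from `enumerate_posts` with its body intact. Closing any one of the three gaps (requiring authentication, actually erasing content, or using unguessable IDs) would make the walk far less productive.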
The rookie code made it easy to automate the scraping, as this script used by the donk_enby archive team demonstrates. As a result, vast numbers of posts discussing the uprising before, during and after it was carried out will be preserved indefinitely, making them available to investigators, journalists, prosecutors and others.
Another amateur error was Parler’s failure to strip geolocation data from images and videos posted online. Sites like Twitter and Google routinely remove such metadata from content posted by their users. In contrast, the video files hosted on Parler were “raw,” meaning they still contained this information.
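For a sense of what stripping metadata involves, here is a minimal sketch for one file type only, JPEG images. In a JPEG, Exif data (including any GPS tags) is stored in an APP1 marker segment ahead of the compressed image data, so dropping those segments removes it. The function name `strip_app1_segments` is hypothetical, the code does not handle every variant of the format, and it is not a description of how any particular site does the job. Video containers store metadata differently and would need their own handling.

```python
import struct

SOI = b"\xff\xd8"   # start-of-image marker that opens every JPEG stream
APP1 = 0xE1         # marker code of the segment that carries Exif (and XMP)
SOS = 0xDA          # start-of-scan: compressed image data follows


def strip_app1_segments(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string with all APP1 segments removed."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG stream")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker != APP1:
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)
```

Because the sketch drops every APP1 segment, it removes all Exif fields rather than just the GPS tags, which is the blunt but safe choice for user uploads.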
Parler’s moderation policies, even more lax than those of Twitter, Facebook and YouTube, had already made the site popular with far-right users looking for a forum to discuss debunked conspiracy theories. After Twitter permanently banned Trump, the president’s supporters embraced the site even more enthusiastically.
Prosecutors already have more than 150 suspects in Wednesday’s riots. The preservation of about 80 terabytes of Parler posts, including more than 1 million raw video files, could lead to more people being charged.