DMCA Experiment, Part 3: Searching For The Delete Triggers

The Main Guru DMCA Experiment, Usenet

So far, so good. In the first two steps, we established that anti-piracy firms use bots that check for certain parameters, the most important being keywords or keyphrases. If an exact release name is used as a title (for example Game.Of.Thrones.S05E01.FRENCH.720p.HDTV.x264-Scaph), it seems to trigger a sure takedown, even if the rest of the subject line clearly shows the file to be completely harmless, or even something else entirely.

Of course, content doesn’t matter at all: There doesn’t seem to be even basic checking of the actual uploaded files, at least not with your basic takedown. Even if the file is named “ubuntu.rar” and the RAR contains nothing but “ubuntu.iso” (a Linux image), it gets deleted, period.

But what exactly in the title triggers the takedown? Does it have to be a complete release name, or is something like “Game.Of.Thrones.S04E12” enough? And what about just “Game.Of.Thrones”? What if the subject line is something unrelated, but the uploaded files have the keyword in them? And what if file sizes vary so much from the real thing that any self-respecting bot would have to ignore the post?

Questions, Questions: Resuming An Old Project

It has been over a year since the last test took place. A long time has passed, but the quest for (DMCA) truth still burns in my heart. Or something like that. Either way, we’re up and ready to tackle the issue again. It certainly helped that a nice fellow named Ben sent us a message on Reddit almost 6 months ago, in which he thanked us for the information and asked for more. As a law graduate, he will soon release a report on DMCA law.

The Load Guru helped the law! All of a sudden, we got this familiar warm, fuzzy feeling in our stomachs… if that’s not a good motivation to continue this little project, then we don’t know what is.

Test 1: Confirming Old Findings

Armed with our sweet little VPS, we first began to check if something had changed in the last 13 months. Will DMCA bots still delete everything with a clear release title?

The TV show we emulated is still Game Of Thrones, because it produced fast and reliable takedowns in our older tests. Because we think that content doesn’t really matter, the file is almost always named “ubuntu.rar”, with no attempt to disguise the fact that the archive contains a Linux image.

Header: Game.Of.Thrones.S06E02.FRENCH.AHDTV.x264

Newsgroup: alt.binaries.boneless

Still online / DMCA’d: Taken down (within less than a day)

NZB Link: Here

No surprise here: All of our uploads got targeted. 4 of 5 were deleted completely (in less than 24 hours!), while the last one lost only part of its data and still shows a completion of 98.83%. Sloppy work indeed, dear bot…

Test 1.5: Just To Be Sure…

And here is almost the same thing again, this time with “gameofthrones.rar” as the name of the file, and slightly different titles. And no surprise:

Header: Game.Of.Thrones.6×10.I.Venti.Dell.Inverno.ITA.DLMux.x264-NovaRip

Newsgroup: alt.binaries.boneless

Still online / DMCA’d: Taken down (in about a day)

NZB Link: Here

Release name in the subject: Almost a 100% chance of a takedown, period.

Test 2: Do Bots Consider File Names?

These are two experiments, merged into one test set. The first question: How obscure can the subject line get, and what will pass the filter? One upload is titled “Game.Of.Thrones.6×10”, the other just “Game.Of.Thrones”. Which will be deleted: neither, just one, or maybe both?

We have established that the subject line is a very strong indicator for DMCA scripts. But what about file names? If, for example, the subject line is totally irrelevant, what will happen if the file name is a spot-on match? This time, we uploaded two files with the subjects “DO NOT DWNLOAD” and “THIS IS A FAKE”, but each with an exact release name as its file name.

As a control sample for both experiments, one file has a subject that has already been shown to get deleted very quickly. This will serve as a control case to make sure the bots are still actively looking for Game Of Thrones episodes and target the exact name we used as the file name in the second experiment.

Header: Game.Of.Thrones.6×10.I.Venti.Dell.Inverno.ITA-ENG.720p.DLMux.DD5.1.h26

Newsgroup: alt.binaries.boneless

Still online / DMCA’d: Taken down (in about a day)

NZB Link: Here

The results: Our control upload was taken down within a day, as was “Game.Of.Thrones.6×10”; however, both uploads with exact-match file names are still online. It seems the bots only check the subject line, and ignore the actual attached data. Interesting!
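To make that behaviour a bit more concrete, here is a minimal sketch of a bot that would produce exactly these results. It is pure guesswork on our side: we have no idea how the real scripts are written, and the pattern and the function name `would_flag` are made up for illustration. The pattern requires the show name plus an episode tag, so it says nothing meaningful about a bare “Game.Of.Thrones” subject.

```python
import re

# Hypothetical rule: show name followed by an episode tag such as S06E02 or 6×10.
# The real bots' rules are unknown; this only mirrors what our tests suggest.
EPISODE_PATTERN = re.compile(
    r"game[ ._]of[ ._]thrones[ ._](s\d{1,2}e\d{1,2}|\d{1,2}[x×]\d{1,2})",
    re.IGNORECASE,
)


def would_flag(subject: str, filename: str) -> bool:
    """Return True if a subject-only bot would flag this post.

    The filename is accepted but deliberately ignored, which is the
    behaviour we seem to observe in Test 2.
    """
    return EPISODE_PATTERN.search(subject) is not None


# Subject matches, file name is harmless: flagged.
print(would_flag("Game.Of.Thrones.6×10", "ubuntu.rar"))
# Subject is unrelated, file name is a release name: not flagged.
print(would_flag("THIS IS A FAKE", "Game.Of.Thrones.S06E02.FRENCH.AHDTV.x264.rar"))
```

A bot that also looked at file names would only need to run the same pattern against the second argument, which makes it all the more surprising that nobody seems to bother.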

Test 3: Is File Size Or File Ending A Factor?

But maybe file size is something the bots look at, perhaps to filter out irrelevant results? For this test, we post an unpacked music file under an exact-match subject line, once with the ending “.MP3” and once renamed to “.AVI”.
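For the curious: renaming a file changes nothing about its content, and a bot that wanted to see through the trick could simply look at the first few bytes. The sketch below shows the idea. The function names are our own invention, the list of signatures is far from complete, and whether any DMCA bot does something like this is exactly what we don’t know.

```python
def sniff_type(header: bytes) -> str:
    """Guess a file type from its first bytes (a few common signatures only)."""
    # AVI is a RIFF container: "RIFF", 4 size bytes, then "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    # MP3 files often start with an ID3v2 tag...
    if header[:3] == b"ID3":
        return "mp3"
    # ...or directly with an MPEG audio frame sync (11 set bits).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    # RAR archives start with "Rar!".
    if header[:4] == b"Rar!":
        return "rar"
    return "unknown"


def extension_matches_content(filename: str, header: bytes) -> bool:
    """Return True if the file ending agrees with the sniffed content type."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext == sniff_type(header)


mp3_start = b"ID3\x03\x00" + bytes(11)  # fake start of an MP3 with an ID3v2 tag
print(extension_matches_content("song.mp3", mp3_start))
print(extension_matches_content("song.avi", mp3_start))
```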

(coming soon)

Test 4: Is File Hash / Signature Checked?

For those who don’t know: A file hash (an MD5 checksum, for example) is a kind of signature used to make sure a file has exactly the same content as the reference. It can be found on download sites with more security-relevant data to detect tampering, but has also found widespread use with file hosters and similar services to block duplicate file uploads, or to prevent the reuploading of files that have already been taken down. They keep a kind of blacklist for that purpose.

So, are the bots clever enough to use one of the most widespread file detection methods? This time, we uploaded one file with a sure-to-get-deleted subject line, waited for a day after the takedown, and then uploaded the exact same file again under a different name.
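Here is a minimal sketch of how such a hash blacklist could work, using MD5 from Python’s standard library. The function names and the blacklist itself are hypothetical, and we are not claiming that any bot or provider works this way. The point is only that the hash depends on the content, not on the file name.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 checksum of a file as a hex string, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_blacklisted(path: str, blacklist: set) -> bool:
    """Return True if the file's checksum is on the (hypothetical) blacklist."""
    return md5_of_file(path) in blacklist
```

A service would add the checksum of every taken-down file to `blacklist` and call `is_blacklisted` on each new upload. Since renaming a file does not change a single byte of its content, the second upload in our test would produce the same checksum as the first and get caught, if anyone is checking.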

(coming soon)