The idea is simple. Basically I've created a program which allows you to verify a static content on a hidden service. I assume that most people when using Tor are using the Tor Browser. This is the recommended way to stay anonymous, by not standing out you are part of a larger anonymity set. Compare this to the normal internet, where there are multiple browsers with versions and operating systems to keep track of. Because most people on Tor use the same client, we can emulate that client programmatically to create requests that are indistinguishable from those that would come from a real user. Since the verifier is using the same client, it will display the same behaviour and send the same HTTP headers like that of a real user. On the internet this would not work, because these things are so diverse that they can uniquely identify a user. On Tor however, the Tor Browser has been specifically engineered to not fingerprint individual users, meaning that all users mostly look the same.
For a verifier this is useful because it will be difficult for a server to tell apart a verification request from a real request. This allows it to monitor the server without the server being able to alter its behaviour by knowing when its being watched.
I implemented a prototype to test out this idea. Here's a link to the repository.
Run it once to stamp the "state" of a hidden service...
tamperfree stamp <url.onion>
...then run it again later to verify that the state has not changed.
tamperfree verify <url.onion>
When stamping a website tamperfree tries to identify and save a secure hash of the raw content received from the server for each path visited. It does this by working as a proxy between the Tor Browser and the Tor Socks proxy that it uses to connect to the network. Then it opens the targeted url in a selenium controlled instance of the Tor Browser. When verifying it captures the same raw content, computes the hashes etc and then it compares against the saved hashes.
There are some major caveats to this tool:
- Can't use it on HTTPS. The proxy can't see the plaintext data as it passes directly to the browser and the encrypted content is not static since I'm guessing different IVs or somesuch are used.
- Can't use it on dynamic websites (i.e. websites that change the content).
- Using it on non-Tor websites is kind of pointless since the server would easily identify it by the user agent.
- Sites requiring user interactions or multiple page loads are not supported. This would require that I spoof user behaviour, adding further complexity to the tool. I think it's doable for smaller sites to build a tree of possible user interactions and generate fake traffic based on every sub-tree. However, I am certain that most users will likely navigate a site the same way, meaning that certain patterns will emerge and make real users stand out from the fake.
So the use case for my tamperfree tool is pretty slim. You're limited to having a single page webapp which loads everything it needs on the first given url. Oh, and it has to run as a tor hidden service.
You want to run this tool often. So often that a malicious server has a higher probability of encountering a verification request than a real request if attempting to tamper by picking a request at random. To do this we simply want the tool to send out more requests to the server than the real requests. For my use-case, a whistleblowing site, the number of real visitors expected is actually pretty low (few visitors per day), so this is something we can do.
Timing is also important. If the verifier runs on a regular schedule then it's trivial for the server to figure out that incoming requests in that specified schedule are verification requests. Therefore it's important that the verifier runs on a random schedule that the server can not determine. E.g. running the verifier every 5 minutes would make it easy for the server to figure out the pattern and only serve it's malicious responces to requests outside that schedule. However, just running a ´sleep(rand())´ would also lead to scenarios where if there are multiple requests happening in quick succession, then the server can see that it's unlikely to be the verifier. I'm currently trying to figure this part out, and I think I will dedicate my next blogpost to it.
That's it for now. This post is a bit rambling and I apologize for that. When I get the time I will go back and try to clean it up, maybe add some pictures to make it easier to read. I'm publishing this now to try to get into the groove of posting again. If you have any feedback or ideas on this, feel free to contact me.