Blocking ads and trackers using HOSTS
2015-04-05
If you've stumbled across this post, you're probably familiar with ad-blocking extensions such as Adblock and uBlock (I seriously recommend the latter, for a handful of reasons). Most likely you want to take back your network and system resources, and you want less clutter and more privacy in your daily web ventures. However, blocking ads only at the browser level tends to be inefficient and fairly limited. Wouldn't it be cool to also have ads and trackers blocked at the system level, covering applications like Skype, uTorrent, IE (seriously?), other browsers, and the many shareware/freeware apps that track your usage via mechanisms like Google Analytics (some use exactly that for tracking)?
The solution is fairly simple: we're going to use a hostname-based block list that maps undesirable domain names to either 0.0.0.0 or 127.0.0.1. In my testing on OS X, I found that 0.0.0.0 works best; that might not be the case on other operating systems. The blocking is done via the hosts(5) file, an age-old Unix mechanism that is still very useful for easy static IP-to-name mappings at the host level.
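To make the mapping concrete, here is a minimal sketch of what such entries look like and how you might check whether a name is sinkholed in a given hosts file. The domain names are placeholders and is_sinkholed is a hypothetical helper for illustration, not something the script provides.

```shell
# Build a small sample hosts file; the domains are placeholders for illustration.
sample_hosts=$(mktemp "${TMPDIR:-/tmp}/hosts-sample.XXXXXX")
cat > "$sample_hosts" <<'EOF'
127.0.0.1 localhost
# blocked names are mapped to 0.0.0.0
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.net  # trailing comments are allowed
EOF

# is_sinkholed FILE HOST (hypothetical helper):
# succeeds if FILE maps HOST to 0.0.0.0 or 127.0.0.1.
is_sinkholed() {
    awk -v host="$2" '
        /^[[:space:]]*#/ { next }               # skip comment lines
        $1 == "0.0.0.0" || $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break            # rest of the line is a comment
                if ($i == host) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

is_sinkholed "$sample_hosts" ads.example.com && echo "ads.example.com is blocked"
```

Note that the check only inspects the file; it says nothing about how a particular resolver or application treats the entry. It also reports localhost as matched, since that name legitimately maps to 127.0.0.1.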
The current block list that I use is hosted at hosts.neocities.org. I'm not affiliated with that site and don't know who is providing it; that being said, I use git to track and review changes between updates. The list is quite exhaustive, combining lists from several other sources cited in its header. I'd like to see a few more lists merged in (mainly the ones used by uBlock), but you can add extra lists fairly easily by modifying the script.
The script itself is hosted on Github. Please read the entire script and what I've written below before running it on your system.
Before you go on and use the script on OS X, I really encourage you to start using git in your /etc/ directory. The script won't even work without a git repo in /etc/, unless you know what you're doing and modify it to bypass that. Having a git repo in /etc gives you revision history, rollback, beta-testing, review and scrutiny for whatever you do to your configuration. I do this on the workstations, laptops and servers that I manage. The overhead git adds to your daily /etc routines is insignificant compared to the benefits you get when you most need them.
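If you have never put a directory like /etc under git, a minimal sketch could look like the following. track_dir is a hypothetical helper, not part of the script; it assumes a git recent enough to support -C and a configured commit identity, and on the real /etc it has to run as root.

```shell
# track_dir DIR (hypothetical helper): put DIR under git and snapshot it.
# DIR defaults to /etc, where this needs root (for example via sudo).
track_dir() {
    dir=${1:-/etc}
    # Create the repository only if there isn't one already.
    [ -d "$dir/.git" ] || git -C "$dir" init -q || return 1
    git -C "$dir" add -A . || return 1
    # Commit only when something is staged, so re-running is harmless.
    git -C "$dir" diff --cached --quiet ||
        git -C "$dir" commit -q -m "Snapshot of $dir"
}
```

Since /etc can hold sensitive files, it may be worth restricting access to the resulting .git directory to root.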
The script is careful not to break your current system. As part of its first-run initialization, it copies your current /etc/hosts to /etc/hosts.d/hosts.1.head. All your existing localhost rules and custom rules are preserved there. The ad-blocking rules go into /etc/hosts.d/hosts.3.adblock. You can add custom mapping rules (for staging servers, local network mappings) to hosts.2.custom.
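The first-run step can be sketched roughly as below. This is a hypothetical re-creation for illustration, not the script's actual code: init_hosts_d is an invented name, and creating an empty hosts.2.custom up front is an assumption.

```shell
# init_hosts_d ETC_DIR (hypothetical helper): set up ETC_DIR/hosts.d once.
# ETC_DIR defaults to /etc; pass a scratch directory to try it out safely.
init_hosts_d() {
    etc=${1:-/etc}
    mkdir -p "$etc/hosts.d" || return 1
    # Keep the existing hosts file as the head fragment, never overwriting it.
    [ -e "$etc/hosts.d/hosts.1.head" ] ||
        cp "$etc/hosts" "$etc/hosts.d/hosts.1.head" || return 1
    # Assumption: start with an empty placeholder for custom mappings.
    [ -e "$etc/hosts.d/hosts.2.custom" ] || : > "$etc/hosts.d/hosts.2.custom"
}
```

Because an existing hosts.1.head is left alone, running this again after /etc/hosts has been regenerated does not pull block-list entries into the head fragment.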
Then, each time the script runs an update, it does the following:
- Updates hosts.3.adblock with the latest rules from upstream;
- Concatenates the files in /etc/hosts.d, in numeric order, into your /etc/hosts;
- Shows you a git diff of the changes and offers to commit them; if you decline, you can review, undo or commit them yourself using git;
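The concatenation step above can be sketched as follows. rebuild_hosts is a hypothetical helper, not the script's own code, and it assumes the fragments use single-digit prefixes so that the shell's sorted glob expansion matches the numeric order.

```shell
# rebuild_hosts ETC_DIR (hypothetical helper): concatenate the fragments in
# ETC_DIR/hosts.d into ETC_DIR/hosts. ETC_DIR defaults to /etc.
rebuild_hosts() {
    etc=${1:-/etc}
    tmp=$(mktemp "${TMPDIR:-/tmp}/hosts.XXXXXX") || return 1
    # The shell expands the glob in sorted order, so hosts.1.head comes before
    # hosts.2.custom and hosts.3.adblock (this assumes single-digit prefixes).
    if cat "$etc"/hosts.d/hosts.[0-9].* > "$tmp"; then
        # Write through cat rather than mv so the hosts file keeps its owner and mode.
        cat "$tmp" > "$etc/hosts"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Possible usage on a real system (as root), followed by a review of the change:
#   rebuild_hosts /etc && git -C /etc diff --stat -- hosts
```

Building the result in a temporary file first means a failed concatenation leaves the current hosts file untouched.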
The script can also load some pf (packet filter) blocking rules from www.emergingthreats.net, plus some custom IP blocking, via /etc/pf.rules/ip-block.pf. This is disabled by default; you can enable it by setting the PFSENSE var to "true" or by passing -f as an argument. If you know of other worthy and fresh ad/malware IP lists, let me know.
Although my script is OS X only, it's fairly easy to port to any other UNIX system (I welcome patches to the main script via Github). Having such a solution for the Windows platform would be cool too. Maybe someone reading this can weigh in with their solution or insight: would it work well enough, and is cygwin the only way to automate it? Nonetheless, stay tuned, since I have a similar router solution (AsusWRT, DD-WRT) coming up soon that steps the game up a notch and provides blocking for your entire network. It certainly won't make this host-level solution obsolete, e.g. on a laptop that frequently switches networks.
Pros for this setup:
- Easy setup and update (when compared to a firewall or a custom dns);
- Cross-platform and cross-application solution;
- Faster and less intrusive (also no https mitm) than proxy solutions (such as Privoxy);
- Easy to temporarily disable: run cp /etc/hosts.d/hosts.1.head /etc/hosts to disable, and git checkout /etc/hosts to restore;
Caveats:
- On some operating systems, hosts files with tens of thousands of rules might slow name resolution down to some degree. In my usage with over 50000 rules, OS X and Linux are quite fine in that regard. If you do run into this, a DNS server or firewall rules may be a better fit for you;
- Some blank spaces, containers, divs or unresolved-error messages will take the place of the ads in sites and apps that don't handle failure very well. You can at least get rid of the browser-related blanks by using the uBlock extension with just the cosmetic rules enabled (in the extension's Settings);
- Related to the previous one, you might experience failures in certain web functionality (fairly limited though). Most of them will be on social or news sites that show ad nag pages before redirecting you to the article itself. Personally I don't care about them: as soon as I hit such a roadblock I close it and move on. The benefit of more resources and network bandwidth for my system, as well as the increased privacy and reduced clutter in general, totally trumps any minor drawback like this;
- The script relies on the links(1) (or elinks) tool to parse the HTML page at hosts.neocities.org and extract only the text. On OS X I use homebrew to install the additional tools that I need. If you have a better, solid solution that relies only on coreutils or other commonly installed shell utilities, let me know;
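As one possible answer to that last point, here is a rough sketch that uses only tr, sed and awk. html_to_hosts is a hypothetical helper; it assumes each block entry sits on its own line and that no tag spans several lines, which may not hold for the actual page, and it is not a general HTML-to-text converter.

```shell
# html_to_hosts (hypothetical helper): read HTML or plain text on stdin and
# print only the lines that look like "ADDRESS HOSTNAME" block entries.
# Steps: drop carriage returns, strip tags that open and close on the same
# line, then keep entries whose address is 0.0.0.0 or 127.0.0.1.
html_to_hosts() {
    tr -d '\r' |
    sed 's/<[^>]*>//g' |
    awk '($1 == "0.0.0.0" || $1 == "127.0.0.1") &&
         $2 ~ /^[A-Za-z0-9._-]+$/ { print $1, $2 }'
}

# Possible usage, assuming curl is installed and the page is served over http:
#   curl -fsSL http://hosts.neocities.org/ | html_to_hosts > hosts.3.adblock
```

The filter also discards comments, including the header that cites the upstream sources, so keep a copy of the raw download if you want to review those.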