In the first part of this series, we discussed whois and DNS/reverse DNS, the two most fundamental tools in tracking down and stopping a plagiarist or a site ripper.
Though they are powerful and important tools, they are not perfect. Definitively tracking down the host, the goal of almost any plagiarist/scraper hunt, often requires delving into more advanced tools, including IP whois and traceroute, both of which we will discuss here.
As with the first article, we will continue to use Domain Tools as our primary resource for these tools. However, even though Domain Tools automates many of these functions, it is important to know how they work, both so we can use them elsewhere if needed and so we understand their limitations.
So, without further ado, here is how to definitively track down the host of a Web site and determine where to send a notice.
In the first article, we learned how to find the IP address of a plagiarist site using DNS. However, that IP address is more valuable than just a number. That IP address “belongs” to someone. Though some large organizations own very large IP blocks, most IP addresses are distributed to ISPs by registries such as the American Registry for Internet Numbers (ARIN).
An IP whois lookup simply identifies the owner of that particular IP address. Separate from the traditional whois lookup, which tells you who registered the domain name, the IP whois lookup lets you know who owns the IP address itself and, thus, who is responsible for the server located there.
To perform an IP whois lookup on Domain Tools, simply click the red “W” next to the IP address. It will take you to a page like this one that displays information about the company or organization responsible for the IP number.
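For those who want to run the same lookup without Domain Tools, here is a minimal Python sketch. It assumes the address falls under ARIN, so it queries whois.arin.net on the standard whois port (43) and looks for ARIN-style field names such as OrgName and OrgAbuseEmail. Other registries label their fields differently, and the function names here are my own, so treat this as an illustration rather than a finished tool.

```python
import socket


def ip_whois(ip, server="whois.arin.net", port=43, timeout=10.0):
    """Send a plain whois query for an IP and return the raw text reply.

    The server name is an assumption: it is only the right place to ask
    for addresses that ARIN manages.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # A whois query is just the search term followed by CRLF.
        sock.sendall((ip + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def summarize(whois_text,
              wanted=("NetRange", "CIDR", "OrgName", "OrgAbuseEmail")):
    """Pick a few 'Key: value' fields out of a whois reply.

    Only the first occurrence of each field is kept. The default field
    names follow ARIN's layout and will not match every registry.
    """
    found = {}
    for line in whois_text.splitlines():
        # Skip comment lines and anything that is not a key/value pair.
        if line.startswith(("#", "%")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key in wanted and key not in found:
            found[key] = value.strip()
    return found


# Example use (needs network access):
#   print(summarize(ip_whois("192.0.2.1")))
```

The summary is only a convenience. When a reply looks odd, read the full text that ip_whois returns, since the abuse contact and the organization name are not always in the fields listed above.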
Well over 99% of the time, an IP whois will tell you who the host of a site is definitively and give you the information you need to track down the host. In the vast majority of cases, this is the tool that I, and other plagiarism hunters, use to determine who the host is and find out where to send a notice.
The only problem with an IP whois is that it cannot account for situations where an ISP or other organization obtains a block of addresses from the registry but then sells or leases some of them to another company. Though the second company is responsible for the server at the IP, the information in the lookup will be for the first company.
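Sometimes a whois reply does list more than one block for the same address, for example a large block held by an ISP and a smaller one carved out of it for a customer. When that happens, the smallest block that still contains the IP is usually the one closest to whoever runs the server. The sketch below, using Python's standard ipaddress module, picks that block out of a list. The function name and the sample blocks are hypothetical, and it cannot help when the lease was never recorded with the registry at all.

```python
import ipaddress


def most_specific_block(ip, cidr_blocks):
    """Return the smallest listed block that contains ip, or None.

    cidr_blocks is a list of strings such as "192.0.2.0/24", as they
    might be copied out of a whois reply.
    """
    addr = ipaddress.ip_address(ip)
    networks = [ipaddress.ip_network(block) for block in cidr_blocks]
    matches = [net for net in networks if addr in net]
    # A longer prefix means a smaller, more specific block.
    return max(matches, key=lambda net: net.prefixlen, default=None)


# Example with documentation-only address ranges:
#   most_specific_block("192.0.2.77", ["192.0.2.0/24", "192.0.2.64/26"])
#   picks the /26, the smaller of the two blocks holding the address.
```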
Though this happens very rarely since most ISPs and hosts want to own the IPs they use, it does happen occasionally. I recently noticed a situation similar to this with some sites hosted on Verio. Resolving that situation and finding the host requires using a different tool known as a traceroute.
A traceroute simply tracks Internet traffic as it goes to the server. Since all Internet traffic goes through a series of routers and servers, it is possible to track the “hops” and look at where the site is actually hosted.
When you perform a traceroute on an IP, such as this one, you see a series of connections. The ones at the top are the ones closest to where the traceroute started and the ones closest to the bottom are closest to the destination, or site you are trying to look up.
By looking at the domains and information for the last few hops, you can usually determine who is hosting the server. This information can be useful in two different ways. First, you can use it to detect when a smaller host is leasing servers from a larger one, such as when a small foreign company is reselling space from a larger, more local, Web host. Second, it can also detect a situation where the IP whois information is not completely accurate.
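To show what reading the last few hops looks like in practice, here is a rough Python sketch that pulls them out of traceroute output. It assumes the text layout printed by the common Unix traceroute command (hop number, hostname, IP in parentheses) and that the command is installed; Windows tracert prints a different layout that this will not parse. The function names are my own.

```python
import re
import subprocess

# Matches lines such as:
#   " 4  edge2.bighost.example.com (203.0.113.9)  20.5 ms  20.4 ms"
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)")


def parse_hops(traceroute_text):
    """Return (hop_number, hostname, ip) for each hop that answered.

    Hops shown as "* * *" are skipped, as are hops whose first probe
    timed out. Hops with no reverse DNS show the IP as the hostname.
    """
    hops = []
    for line in traceroute_text.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), match.group(3)))
    return hops


def last_hops(traceroute_text, count=3):
    """Return the final few answering hops, the ones nearest the server."""
    return parse_hops(traceroute_text)[-count:]


def run_traceroute(target):
    """Run the system traceroute command (assumed to be on the PATH)."""
    result = subprocess.run(["traceroute", target],
                            capture_output=True, text=True, timeout=120)
    return result.stdout


# Example use (needs network access and the traceroute command):
#   for hop in last_hops(run_traceroute("192.0.2.10")):
#       print(hop)
```

The hostnames in those final hops are the ones worth researching, since they often carry the domain of the network the server actually sits on.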
To perform a traceroute on Domain Tools, simply click the purple “T” to the far right of the IP address on the main whois page. It will feed you the traceroute results automatically, but bear in mind that it will begin at their traceroute server, not your home machine.
Though using a traceroute is almost always definitive, performing and interpreting one is complicated and time-consuming. Often there are several domains to research and the results, depending upon the network structure, can be confusing to read. The relative simplicity of IP whois makes it a much better first choice for tracking down a host, leaving the traceroute as a necessary backup.
Using the four tools discussed so far, one can easily and definitively track down nearly any host on the planet. To escape detection using these tools, a host would most likely have to be engaging in either unethical or outright illegal behavior. Any reputable host should be easily identified using these methods.
However, in this day and age of automated scraping and spam blog networks, sometimes finding the host isn’t enough. Sometimes the operation stretches out much further and it is important to track down as much of it as possible.
Doing that requires the use of still other tools, which we will discuss next week in the third and final part of this series.