Plagiarism-Fighting Network Tools: Part Two

Filed as Features, Guides on July 30, 2007 12:30 pm

In the first part of this series, we discussed whois and DNS/reverse DNS, the two most fundamental tools in tracking down and stopping a plagiarist or a site ripper.

Though they are powerful and important tools, they are not perfect. Definitively tracking down the host, the goal of almost any plagiarist/scraper hunt, often requires delving into more advanced tools, including IP whois and traceroute, both of which we will discuss here.

As with the first article, we will continue to use Domain Tools as our primary resource for these tools. However, even though Domain Tools automates many of these functions, it is important to know how they work, both so we can use them elsewhere if needed and so we understand their limitations.

So, without further ado, here is how to definitively track down the host of a Web site and determine who to send notification to.

IP Whois

In the first article, we learned how to find the IP address of a plagiarist site using DNS. However, that IP address is more valuable than just a number. That IP address “belongs” to someone. Though some large organizations own very large IP blocks, most IP addresses are distributed to ISPs by registries such as the American Registry for Internet Numbers (ARIN).
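To make the idea of an IP "block" concrete, here is a minimal Python sketch that checks whether an address falls inside a given block. The block and addresses are placeholders taken from ranges reserved for documentation, not real hosts, and the helper name block_contains is our own invention for illustration.

```python
import ipaddress

def block_contains(block: str, address: str) -> bool:
    """Return True if `address` falls inside the CIDR `block`."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(block)

# 192.0.2.0/24 is reserved for documentation; it stands in for a block
# that a registry might have allocated to an ISP.
block = "192.0.2.0/24"

print(block_contains(block, "192.0.2.45"))        # True
print(block_contains(block, "198.51.100.7"))      # False
print(ipaddress.ip_network(block).num_addresses)  # 256
```

An IP whois reply typically reports a range like this one, which is why a single lookup usually covers every address a host operates in that block.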

An IP whois lookup simply identifies who the owner of that particular IP address is. Separate from the traditional whois lookup, which tells you who registered the domain name, the IP whois lookup lets you know who owns the IP address itself and, thus, who is responsible for the server located there.

To perform an IP whois lookup on Domain Tools, simply click the red “W” next to the IP address. It will take you to a page like this one that displays information about the company or organization responsible for the IP number.

You can perform an IP whois on ARIN’s site as well as the sites of the other IP number registries.
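For those who want to see what happens behind the Web forms, here is a rough Python sketch of an IP whois query. Whois is a plain-text protocol on TCP port 43: you send the query and read the reply. The function names and the sample reply are assumptions made for illustration. Field names such as OrgName and OrgAbuseEmail follow the general shape of ARIN records, and other registries label their fields differently, so treat the parser as a starting point only.

```python
import socket

# Whois servers of the five regional registries (plain text, TCP port 43).
REGISTRY_WHOIS = {
    "ARIN": "whois.arin.net",
    "RIPE": "whois.ripe.net",
    "APNIC": "whois.apnic.net",
    "LACNIC": "whois.lacnic.net",
    "AFRINIC": "whois.afrinic.net",
}

def ip_whois(ip: str, server: str = REGISTRY_WHOIS["ARIN"],
             timeout: float = 10.0) -> str:
    """Send `ip` to a whois server and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(ip.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_fields(reply: str) -> dict:
    """Collect 'Key: value' lines, skipping comment lines.

    Each key maps to a list, since a reply may repeat a field.
    """
    fields = {}
    for line in reply.splitlines():
        if line.startswith(("#", "%")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields

# Made-up reply in the general shape of an ARIN record (not real data).
SAMPLE_REPLY = """\
# hypothetical whois reply for illustration
NetRange:       192.0.2.0 - 192.0.2.255
CIDR:           192.0.2.0/24
OrgName:        Example Hosting, Inc.
OrgAbuseEmail:  abuse@example.com
"""

fields = parse_fields(SAMPLE_REPLY)
print(fields["OrgName"][0])        # Example Hosting, Inc.
print(fields["OrgAbuseEmail"][0])  # abuse@example.com
```

To query a live address, you would call parse_fields(ip_whois("...")) with the IP you found through DNS. If ARIN's reply points to another registry, repeat the query against that registry's server.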

Well over 99% of the time, an IP whois will tell you who the host of a site is definitively and give you the information you need to track down the host. In the vast majority of cases, this is the tool that I, and other plagiarism hunters, use to determine who the host is and find out where to send a notice.

Traceroute

The only problem with an IP whois is that it cannot account for situations where an ISP or other organization obtains a block from the registry but then sells or leases part of it to another company. Though that second company is responsible for the server at the IP, the information in the lookup will be for the first company.

Though this happens very rarely since most ISPs and hosts want to own the IPs they use, it does happen occasionally. I recently noticed a situation similar to this with some sites hosted on Verio. Resolving that situation and finding the host requires using a different tool known as a traceroute.

A traceroute simply tracks Internet traffic as it goes to the server. Since all Internet traffic goes through a series of routers and servers, it is possible to track the “hops” and look at where the site is actually hosted.

When you perform a traceroute on an IP, such as this one, you see a series of connections. The ones at the top are closest to where the traceroute started and the ones at the bottom are closest to the destination, the site you are trying to look up.

By looking at the domains and information for the last few hops, you can usually determine who is hosting the server. This information can be useful in two different ways. First, you can use it to detect when a smaller host is leasing servers from a larger one, such as when a small foreign company is reselling space from a larger, more local, Web host. Second, it can also detect a situation where the IP whois information is not completely accurate.
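Reading the last few hops can be partly automated. The Python sketch below pulls the hop number, hostname and IP out of traceroute output and prints the domains of the final hops. The sample output is invented (the hostnames use the reserved .example ending), the line format assumed is the one printed by the common Unix traceroute command, and naive_domain is a deliberately crude helper that simply keeps the last two labels of a hostname.

```python
import re

# Matches lines such as " 4  edge3.bighost.example (203.0.113.9)  31.0 ms ..."
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)")

def parse_hops(output: str) -> list:
    """Return (hop_number, hostname, ip) for every hop that answered."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if match:  # header lines and "* * *" timeouts do not match
            hops.append((int(match.group(1)), match.group(2), match.group(3)))
    return hops

def naive_domain(hostname: str) -> str:
    """Keep the last two labels. Crude: wrong for endings like .co.uk,
    and meaningless when the hop has no hostname and shows only an IP."""
    return ".".join(hostname.split(".")[-2:])

# Invented output for illustration only.
SAMPLE = """\
traceroute to 203.0.113.45 (203.0.113.45), 30 hops max, 60 byte packets
 1  gw.home.example (192.0.2.1)  1.1 ms  1.0 ms  0.9 ms
 2  core1.isp.example (198.51.100.1)  9.8 ms  9.9 ms  10.2 ms
 3  * * *
 4  edge3.bighost.example (203.0.113.9)  31.0 ms  30.8 ms  31.3 ms
 5  server12.bighost.example (203.0.113.45)  32.4 ms  32.1 ms  32.6 ms
"""

hops = parse_hops(SAMPLE)
for hop, host, ip in hops[-2:]:
    print(hop, naive_domain(host), ip)
# 4 bighost.example 203.0.113.9
# 5 bighost.example 203.0.113.45
```

In this made-up case both of the final hops sit on the same domain, which is the one you would then research and compare against the name in the IP whois record.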

To perform a traceroute on Domain Tools, simply click the purple “T” to the far right of the IP address on the main whois page. It will feed you the traceroute results automatically, but bear in mind that it will begin at their traceroute server, not your home machine.
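If you would rather start the trace from your own machine, most systems ship a command-line version: traceroute on Unix-like systems and tracert on Windows. Here is a small Python sketch that picks the right command and runs it. The function names are our own, and the sketch assumes the command is installed and on your path. Note that the two commands print their results in different layouts.

```python
import platform
import subprocess

def traceroute_command(target: str, max_hops: int = 30) -> list:
    """Build the traceroute command line for the current platform."""
    if platform.system() == "Windows":
        return ["tracert", "-h", str(max_hops), target]
    return ["traceroute", "-m", str(max_hops), target]

def run_traceroute(target: str, max_hops: int = 30,
                   timeout: float = 120.0) -> str:
    """Run the trace from this machine and return its text output."""
    result = subprocess.run(
        traceroute_command(target, max_hops),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout

# Show the command without running it (192.0.2.45 is a placeholder IP).
print(traceroute_command("192.0.2.45"))
```

Because the trace begins wherever it is run, the first hops will differ from what Domain Tools shows, but the last few hops, the ones that matter for identifying the host, should generally agree.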

Though using a traceroute is almost always definitive, performing and interpreting one is complicated and time-consuming. Often there are several domains to research and the results, depending upon the network structure, can be confusing to read. The relative simplicity of IP whois makes it a much better choice for tracking down a host, but leaves the traceroute tool as a necessary backup.

Conclusions

Using the four tools discussed so far, one can easily and definitively track down nearly any host on the planet. To escape detection using these tools, a host would most likely have to be engaging in either unethical or outright illegal behavior. Any reputable host should be easily identified using these methods.

However, in this day and age of automated scraping and spam blog networks, sometimes finding the host isn’t enough. Sometimes the operation stretches out much farther and it is important to track down as much of the operation as possible.

Doing that requires the use of still other tools, which we will discuss next week in the third and final part of this series.




  2. By Lucian posted on July 30, 2007 at 6:14 pm

    There are some useful tools which show a map with the exact location of an IP address, such as IP address lookup.


  3. By Jonathan Bailey posted on July 30, 2007 at 10:20 pm

    Lucian,

    The tool you’re talking about is sometimes known as a geographic traceroute or geolocation. It’s an interesting service. I’ve used one provided by Visualware, called VisualRoute, a few times and it is very impressive.

    The problem is that the information is not always accurate. AOL addresses always show up as being from Virginia, my wireless card always shows up as Chicago and the information in these results is often misleading.

    Furthermore, it doesn’t usually provide any necessary information. Knowing where a host is geographically can be better determined by looking at the host’s Web site and only would affect what law you use to report them.

    It’s a neat trick but one I haven’t found a lot of use for.

    However, the site you link to is very neat. It is extremely simple to use. I’ll have to keep it in mind should it become necessary.

    Thank you for the tip!
