Digital Fingerprints For Images: Detecting Image Theft for Free

Photobloggers, typically, have a much more difficult time detecting misuse of their content than writers.

This is because the Internet was built first and foremost for sharing text, and nowhere is this clearer than when we search for something on the Web. Whether we’re doing an image or a video search, we’re using text to describe what we want and locating it based upon descriptions and tags.

Though this is very effective for delivering the type of content we ask for, it doesn’t work as well for finding duplicates. Once a photo is copied, content creators have little control over what text is attached to it or appears around it.

In short, barring some kind of sophisticated image-matching technology, copied photos tend to disappear into the ether unless encountered by accident.

Fortunately, if photographers take a few moments to prepare their photos for easy tracking, they can greatly increase their chances of spotting and stopping a plagiarist.

Attaching Text

The problem with searching for images is that image search engines only understand text. Where a writer can simply punch a sentence of their work into Google or use a tool such as Copyscape to detect copies, photographers can’t simply upload a copy of their image and tell the search engines to find duplicates.

While there are powerful tools that automatically search for such duplication, they tend to be expensive and beyond the reach of your average photoblogger.

Photographers and artists, however, can use the free search engines to detect infringements if they can find a way to attach text to their images that will be carried with them wherever they are copied. Fortunately, there are several opportunities to do just that and ways to leverage those opportunities into easy copy detection.

Opportunity Knocks

Though we typically think of a jpg file as being purely an image, there are actually several text elements to it.

The first, and most obvious, is the file name itself. The filename is a text field in and of itself and is searchable by nearly all image search engines. If you do a search for nearly any potential file name followed by a “.jpg” in Google Image Search, you’ll find hundreds of matches.

Since many, if not most, cases of image theft do not involve modifying the file name, this is a natural place to hide identifying text. If done properly, it may not even appear to be a detection scheme, but just another file name used to properly mark the image.

Best of all, the process can be done quickly and easily using a bulk renaming tool such as Rename It to automatically modify large numbers of images.
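For those who would rather script the renaming than use a dedicated tool, the idea can be sketched in a few lines of Python. This is only a rough illustration, not a replacement for a proper renaming utility: the function names and the sample identifying string are made up for this example, and it assumes the images sit in one folder and end in .jpg or .jpeg.

```python
from pathlib import Path

# Hypothetical identifying string, used only for illustration.
MARKER = "qx7rd29kmz4p"


def marked_name(filename: str, marker: str = MARKER) -> str:
    """Return the file name with the marker inserted right before the extension."""
    path = Path(filename)
    if marker in path.stem:
        return path.name  # already marked, leave it alone
    return f"{path.stem}-{marker}{path.suffix}"


def rename_folder(folder: str, marker: str = MARKER) -> list[tuple[str, str]]:
    """Rename every .jpg/.jpeg file in the folder; return (old, new) name pairs."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in {".jpg", ".jpeg"}:
            new_name = marked_name(path.name, marker)
            if new_name != path.name:
                path.rename(path.with_name(new_name))
                renamed.append((path.name, new_name))
    return renamed
```

Calling marked_name("sunset.jpg", "abc123") gives "sunset-abc123.jpg". It is worth running a script like this on a copy of the folder first, since it renames files in place.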

However, since file names can be so easily modified, it makes sense to use a slightly more hidden method of detection. Fortunately, all jpg images have the ability to hide metadata within themselves. This data, known as Exif data, can provide a wealth of information including details about the photographer, the date the image was taken, the camera that was used and even comments about the image.

The greatest thing about Exif data is that it is invisible to the ordinary user as well as the ordinary plagiarist. Exif data is rarely erased or modified when copying images and even clever thieves forget to change the information when reusing an image without permission.

The problem with Exif data is twofold. First, it is not always indexed properly by search engines. Second, it is not available with all image types. Indeed, only jpg images seem to display the metadata correctly.

Still, Exif data can provide an extra “red flag” to search tools and can be easily manipulated through a free image editor such as XnView. Because of this, it makes sense to take a few moments to sneak in identifying text into the Exif data, along with author and copyright information, before posting it to the Web.
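After editing the metadata, it can be worth confirming that the identifying text actually made it into the file. The sketch below is a crude check written for this example, not a real Exif parser: it simply looks for the string in the raw bytes of the file. It assumes the editor stored the text as plain ASCII, which is typical for fields such as author and copyright but may not hold for every comment field. The function names are hypothetical.

```python
from pathlib import Path


def contains_marker(image_path, marker: str) -> bool:
    """Return True if the marker appears as plain ASCII bytes anywhere in the file.

    Assumption: the metadata editor wrote the text as ASCII. This does not
    parse Exif; it only scans the raw bytes.
    """
    data = Path(image_path).read_bytes()
    return marker.encode("ascii") in data


def files_missing_marker(folder: str, marker: str) -> list[str]:
    """List the .jpg files in the folder whose bytes do not contain the marker."""
    # Note: the "*.jpg" pattern is case-sensitive on most non-Windows systems.
    return [
        path.name
        for path in sorted(Path(folder).glob("*.jpg"))
        if not contains_marker(path, marker)
    ]
```

An empty list from files_missing_marker suggests every image in the folder carries the text somewhere in its bytes, though it does not say which field holds it.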

However, once you have these tools at your disposal, the question becomes what to do with them. The answer, interestingly enough, lies in a technique that text bloggers have been using for years.

Digital Fingerprints

When MaxPower introduced his Digital Fingerprint Plugin last year, he gave bloggers a new means of detecting RSS scraping.

The idea is simple. You take a semi-random but unique string of characters and numbers, embed that into the footer of each post and then search for the string using a regular search engine. Since the string is unique and doesn’t appear anywhere on the site itself, just the feed, only sites that have scraped the feed should be listed.

That same technique can, in turn, be applied to photographs. All you have to do is the following:

1. Develop a string of 10-12 characters that is unique to you and can identify your works on the Web.
2. Do a test search for that string to ensure that no other sites or images contain it.
3. Embed the string into the filename of every image you post, ideally right before the extension.
4. Embed the string into the Exif data as a backup measure.
5. Perform regular searches for the fingerprint. An ideal search string might look something like this: fingerprint -site:yoursite.com
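The first and last steps are easy to sketch in Python. What follows is one possible approach rather than a prescribed method: the function names are invented for this example, and the choice of lowercase letters and digits is an assumption. A randomly generated string is very unlikely to appear elsewhere already, but the test search in the second step is still needed to confirm it.

```python
import secrets
import string

# Assumption: lowercase letters and digits keep the string safe for file names.
ALPHABET = string.ascii_lowercase + string.digits


def make_fingerprint(length: int = 12) -> str:
    """Generate a random fingerprint of 10 to 12 characters."""
    if not 10 <= length <= 12:
        raise ValueError("length should be between 10 and 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def search_query(fingerprint: str, own_site: str) -> str:
    """Build the search string that excludes results from your own site."""
    return f"{fingerprint} -site:{own_site}"
```

Generate the fingerprint once, write it down and reuse it for every image. Calling search_query with that fingerprint and your own domain then produces the text to paste into a search engine.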

The end result is that anyone who steals your image and leaves the fingerprint intact will be detected. Fortunately, that describes most plagiarists, especially those that are in a “grab and run” mode when looking for new images.

The system is completely free, can be used by anyone seeking to learn more about how their images are copied on the Web and doesn’t require a great deal of time to implement, just some forward thinking.

However, this isn’t to say that the system is perfect. There are some very severe limitations to it, which are well worth highlighting.

Limitations

The most obvious problem with this system is that the fingerprints themselves can be erased. File names can be changed, metadata can be scrubbed and all of the detection avenues can be wiped. Even a plagiarist not looking to cover their tracks can easily negate this by simply reformatting the image and changing the file name to their taste.

The other problem is that support for Exif data in search engines is very spotty at best. It may be important to try several different image search engines, including specialized ones such as Picsearch, to see which generate the best results.

In general, it can be assumed that this system is not going to catch all of the plagiarists and perhaps not even most. But it is a far better approach than simply leaving the detection of image theft up to chance, which is about as effective as doing nothing at all.

Conclusions

Security is not an absolute; it is a matter of being as secure as practical. This system, though free, easy and fast, is far from perfect and should only be used as part of a larger system, preferably one involving theft prevention tools such as watermarking and overlays.

Fortunately, new tools are on the horizon that will make these hodgepodge systems obsolete and will enable photographers and artists to track their work not just by name, but by content. These tools can track an image even if it is cropped, modified or even reformatted.

In the meantime, those tools remain out of reach for most small photographers. For those of us without the budgets to invest in high-end detection technology, we have to make do with the technology that is available to us.

However, as limited as the techniques may be, they can prove very effective, especially considering that the very nature of the plagiarist is to do as little as possible to get the job done.

Though it may seem strange banking on the laziness of a plagiarist, it stands to reason that if they weren’t lazy or weren’t in a rush, they wouldn’t be stealing other people’s content.

Jonathan Bailey

Jonathan Bailey writes at Plagiarism Today, a site about plagiarism, content theft and copyright issues on the Web. Jonathan is not a lawyer and none of the information he provides should be taken as legal advice.