Archiving Blogs and the Blogosphere

As blogs become a more mature medium, research into the history of blogs becomes even more relevant. Earlier this year an article in the Wall Street Journal celebrated the 10th anniversary of blogs, taking Jorn Barger’s Robot Wisdom of December 23, 1997 as the starting point. Not only was the author of the article accused of getting the history wrong and rewriting it, the article also heated up the debate on what the first blog was. (Note: the site is not up anymore, but here’s a useful resource.)

Rex Hammock points out that there is no single history of blogs and argues that “everyone should write their own version of the history of blogging.” As blogging is a practice that has shaped itself over time, it is nearly impossible to point to one single blog as “the first blog” in retrospect. Blogs evolved out of a practice that is still developing and shaping itself. The debate surrounding the article also showed how poorly the blogosphere is archived and how difficult it is to conduct research on the history of blogs.

As much as the blogosphere is focused on time, the web is oblivious to time.

The format of the blog revolves around time, with timestamped entries displayed in reverse chronological order. The front page shows the most recent posts and, as the number of posts grows, old entries might be moved to the blog’s archive. A blog archives itself, but who is archiving the blogosphere?

An early attempt at indexing the blogosphere was made by Brigitte Eaton, who started to compile a list of blogs she knew of on Eatonweb* in 1999. She kept the list up to date by hand and manually judged inclusion requests based on her loose criterion of blogs as “something that’s organized chronologically and updated fairly regularly.” (Internet Archive)

Eaton also checked how long a blog had been around so as to “weed out the one’s where people give up after a few weeks.” She saw blogging as a process and was aware of constant changes over time. Eatonweb created a graveyard for dead blogs by providing a “[dead?] link” which “is a way of notifying me that a weblog no longer exists or its url has changed. it won’t delete it from the database, it just gets flagged.” (Internet Archive)

Eatonweb became an established source and index of blogs, and its archive provides a rich history. Indexing happens at a specific moment in time, and by connecting these snapshots in time an archive is created. The Internet Archive is a good resource for dead blogs and blogs with ever-changing links because it provides such snapshots in time. Blog indexes such as Eatonweb and Technorati are constantly updating their indexes, but they don’t provide snapshots in time of their own index.

Popular blog indexes and search engines such as Technorati and Google Blog Search are very much focused on time, with the ability to sort by freshness. On the other hand, they are completely oblivious to time as they don’t allow searching over time. As Dave Winer and Doc Searls noted, it is impossible to search for a word in time: “Find me the first use of a word on the Web. It seems they could do that.” (Winer)

Google has been experimenting with the Timeline feature (view:timeline) for a while now, but it looks at dates mentioned in the text instead of the date of creation. A search by query + timestamp would be a very useful feature in Google Blog Search. As pretty much all blog posts are timestamped, it should be possible to search for a query in time.
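To make the idea of "query + timestamp" a bit more concrete, here is a minimal sketch in Python. It is not how Google Blog Search or Technorati work. The sample posts, the function names `search_in_time` and `first_use`, and the simple case-insensitive substring matching are all assumptions made for illustration. The sketch only shows that once every post carries a creation timestamp, searching within a date range and finding the earliest use of a word are straightforward.

```python
from datetime import datetime

# Hypothetical sample data: (creation timestamp, text) pairs standing in for indexed blog posts.
posts = [
    (datetime(2003, 5, 2), "Trying out a new podcast about weblogs"),
    (datetime(1999, 4, 1), "A list of every weblog I know of"),
    (datetime(2001, 9, 14), "Why my weblog has an archive"),
]

def search_in_time(posts, query, start=None, end=None):
    """Return posts containing `query`, oldest first, optionally limited to a date range."""
    q = query.lower()
    hits = [
        (ts, text)
        for ts, text in posts
        if q in text.lower()  # naive substring match, an assumption of this sketch
        and (start is None or ts >= start)
        and (end is None or ts <= end)
    ]
    # Sort on the creation timestamp, not on any date mentioned inside the text.
    return sorted(hits, key=lambda hit: hit[0])

def first_use(posts, query):
    """Return the earliest post containing `query`, or None if there is none."""
    hits = search_in_time(posts, query)
    return hits[0] if hits else None
```

Calling `first_use(posts, "weblog")` on this sample returns the 1999 post, and passing `start=datetime(2000, 1, 1)` to `search_in_time` limits the results to posts created from 2000 onwards.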

Not everyone seems to be convinced that time works both ways, as Technorati recently dropped content from its index that is over six months old. Time is a very important aspect of blogs and the blogosphere, and the focus should not only be on freshness but also on history. A blog is not only as good as its latest post but as good as its whole archive.

* Disclaimer: Eatonweb is currently owned and run by Splashpress Media (also the owner of the Blog Herald).

View Comments (5)
  • Interesting perspective on the “history of blogs”. My personal site was started in 1994, but became what is known as a blog in 1996, including more personal stories and news than just articles.

    The chronological method of displaying content I started a year before, but it was all done manually. Any “archiving”, a term which needs more definition, was done by the creation of a static page for each article/blog post. The front and category pages were created manually with excerpts, linking to the physical page. I guess that’s an archive, right?

    The dynamic aspect of what came to be known alternately as CMS and blog platforms removed the manual creation of these pages, making the process SO much easier.

    Does that mean my site was one of the first? No. Among the first, sure. However, it took a very long time for me to call that website a blog. I still call it a website. I have blogs, and that site is run on WordPress, but it’s not a “blog” in my mind, even after all these years.

    Finding the “first blog” might make things more complicated as some of us were blogging before we knew what blogging was, and even now, have issues with the label. Interesting twist, huh? :D

  • Once I get home I am going to dig up the CD-ROM with a backup of my first website on it. I sure hope it still works though as I recently noticed some of my old backups don’t work anymore. I hardly remember what my first site looked like, I think I updated it with a “new” reference.

    It is impossible to “name” a phenomenon in retrospect and then decide who was doing “it” and who wasn’t. I also think that what we now consider to be a blog is different than years ago when terms such as weblog and blog originated. Media and practices evolve, which some people like and some people don’t (“this is not a blog”). A whole book could/should be written on the subject ;)
