Everyone knows that you can use an Internet directory to find Websites. What may come as a surprise is the array of other uses for ODP data. Never underestimate human ingenuity.

The Open Directory Project (ODP) was conceived in 1998 as a rival to Yahoo! At the time, the Yahoo! directory was the world's most popular Internet search tool, but its small, paid staff could not keep pace with the explosive growth of the World Wide Web. The newly founded ODP, on the other hand, invited all Net-citizens to join and help catalogue the Web. To convince the Net community that they would be working for the whole Web and not only for the directory owner, Netscape took a revolutionary step: it made the complete directory available for download under a free use license, open to everybody, including its own competitors.

Seven years later... what is ODP data used for today?

In the early days of the free use license, cloning the directory, or parts of it, was the most frequent form of use, and it remains popular today. The number of sites that host copies can only be estimated, but it runs into many thousands. A common approach is to add advertising and site previews to the directory and give it a branded design. However, the more creative users have not been content simply to copy. They reorganise. While the ODP lists sites within each category in alphabetical order, several downstream users apply their own ranking systems. The best known are the Google Directory, which orders sites by its proprietary PageRank, and Alexa, which orders them by its own popularity statistics. Other users mix and mingle ODP listings into new categories.
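
The re-ordering idea can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Listing` record, the example sites and the `SCORES` table are hypothetical, and the score merely stands in for whatever measure a downstream user prefers. It does not show how Google or Alexa actually compute their rankings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One site entry in a category (hypothetical structure)."""
    title: str
    url: str


def alphabetical(listings):
    """Order listings by title, as the directory itself does."""
    return sorted(listings, key=lambda item: item.title.lower())


def ranked(listings, scores):
    """Order listings by an external score, highest first.

    Sites without a score count as 0.0; ties fall back to title order.
    """
    return sorted(
        listings,
        key=lambda item: (-scores.get(item.url, 0.0), item.title.lower()),
    )


# Invented example data: three sites in one category.
LISTINGS = [
    Listing("Gamma Sky Guide", "http://gamma.example/"),
    Listing("Alpha Observatory", "http://alpha.example/"),
    Listing("Beta Star Maps", "http://beta.example/"),
]

# Invented popularity scores; alpha.example has none.
SCORES = {
    "http://beta.example/": 0.9,
    "http://gamma.example/": 0.4,
}

if __name__ == "__main__":
    for item in ranked(LISTINGS, SCORES):
        print(item.title, item.url)
```

The same category content is kept; only the sort key changes, which is why a clone can present a very different face from the original directory.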

At the next level, the massive Open Directory provides a happy hunting ground for search engine robots. So many search engines built their first index from the ODP that Brett Tabke of Webmaster World declared it "literally the mother of all search engines" (WMW thread). The ODP also provides a convenient source of site titles and descriptions, which can be used in the display of search engine results. Google, Yahoo, MSN Search and Ask all fall back on them when other sources fail.

The true cross-breeds between search engine and directory go much further. Experimenters in this field have turned again and again to the vast database of the ODP. It is used, for example, by the clustering search engine Exalead. A vertical search engine can also be created by indexing only the sites listed in a specific ODP category; Gigablast demonstrates this on a grand scale, providing a search for every category. ODP data can also feed into the 'related links' displayed by search tools such as the UCMore toolbar, or into the training of focused crawlers, as used by Data Fountains. The ODP may even contribute to search engine algorithms: Baoning Wu, Vinay Goel and Brian D. Davison recently used a subset of ODP data to devise Topical TrustRank.
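
The vertical search idea, restricting an index to the sites of one category, can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of how Gigablast or any other engine works: the `(url, category, text)` tuples, the category paths and the example sites are all hypothetical, and a real engine would crawl the listed sites rather than index a short description.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def in_category(listing_category, category):
    """True if a listing sits in `category` or in one of its subcategories."""
    category = category.rstrip("/")
    return listing_category == category or listing_category.startswith(category + "/")


def build_index(listings, category):
    """Build an inverted index (word -> set of URLs) for one category subtree."""
    index = defaultdict(set)
    for url, listing_category, text in listings:
        if in_category(listing_category, category):
            for word in tokenize(text):
                index[word].add(url)
    return index


def search(index, query):
    """Return the URLs whose indexed text contains every query word."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented example data: (url, category path, indexed text).
LISTINGS = [
    ("http://astro.example/", "Science/Astronomy", "Telescope reviews and star charts"),
    ("http://planets.example/", "Science/Astronomy/Planets", "Guide to planets and star systems"),
    ("http://movies.example/", "Arts/Movies", "Film star interviews"),
]

if __name__ == "__main__":
    astronomy_index = build_index(LISTINGS, "Science/Astronomy")
    print(sorted(search(astronomy_index, "star")))
```

Because the film site is never indexed, a query for "star" in the astronomy search returns only the two astronomy sites. The human-edited category does the topical filtering before any ranking takes place.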

But the story of creativity doesn't end here. Nearly 5 million categorised URLs and a huge human-created ontology are a magnet for researchers, not only in search but in many other fields. Over 100 scholarly papers have been published using ODP data. The sample listed in ODP Research Papers illustrates the range of topics, from search applications to text classification, from semantic analysis to testing software.

The enthusiasm of volunteers has created the world's biggest directory. That achievement has fed a fountain of ideas in the world of search. Who knows where they may one day lead?

- jeanmanco & chris2001

This article is based on the results of the Investigating Data Use project. Editors can find the complete study in the DmozWiki.

 

Please send all comments, questions or suggestions to the newsletter editor.