Web 3.0: Chicken Farms on the Semantic Web

January 2008

Jim Hendler
Rensselaer Polytechnic Institute

The explosive growth of blogs, wikis, social networking sites, and other online communities has transformed the Web in recent years. The mainstream media has taken notice of the so-called Web 2.0 revolution—stories abound about events such as Facebook’s huge valuation and trends like the growing Hulu-YouTube rivalry and Flickr’s role in the current digital camera sales boom.

However, a new set of technologies is emerging in the background, and even the Web 2.0 crowd is starting to take notice.

The Semantic Edge

One of the best-attended sessions at the 2007 Web 2.0 Summit (www.web2summit.com) was called “The Semantic Edge.” Its theme was the use of semantic technologies to bring new functionality to such Web staples as search, social networking, and multimedia file sharing.

The session included beta demos by Metaweb Technologies (www.metaweb.com), which bills itself as “an open, shared database of the world’s knowledge”; Powerset (www.powerset.com), a company building intelligent search tools with natural-language technology; and Radar Networks (www.radarnetworks.com), whose Twine tool, shown in Figure 1, aims to “leverage and contribute to the collective intelligence of your friends, colleagues, groups and teams.”

Figure 1. Twine, a beta tool released by Web 3.0 start-up Radar Networks, uses Semantic Web technologies to help users organize, find, and share online information.

Not included in the panel but working in the same space are other newcomers such as Garlik (www.garlik.com), a UK company creating tools to control personal information on the Web; online TV provider Joost (www.joost.com); Talis (www.talis.com), a vendor of software that makes data “available to share, remix and reuse”; and TopQuadrant (www.topquadrant.com), which offers consulting, teaching, and tool development in this space.

More established companies exploring semantic technologies for the Web include Mondeca (www.mondeca.com), a European enterprise information integration company, and Ontoprise (www.ontoprise.de), a German vendor of ontology-related tools. Big industry players like Oracle, Microsoft, and IBM are also getting into the game.

The Web and Web 2.0

All of this activity suggests that a new set of Web technologies is transitioning from toys and demos to tools and applications. Of course, this isn’t the first time this has happened.

In the mid-1990s, the Web seemed to bloom overnight: Companies started putting Web addresses on their products, personal home pages began springing up, and Marc Andreessen’s Mosaic browser got millions of downloads as more people discovered the World Wide Web.

The technology had actually been around for some time—Tim Berners-Lee created the Web in 1989—but it wasn’t until the mid-1990s that it turned the knee in its growth curve and became one of the most important applications in history.

Another wave of technologies, dubbed Web 2.0 by Tim O’Reilly, began to emerge a few years later. Newspapers began losing subscribers to news blogs, encyclopedia companies woke up to discover Wikipedia was forcing them to change the way they work, and “google” became a verb on everybody’s lips. Even those who weren’t computer geeks began to talk about Flickr, YouTube, and Facebook.

Again, these technologies required time to mature, catch on virally, and turn that knee in the curve before they enjoyed widespread adoption.

Toward Web 3.0

A new generation of Web applications, which technology journalist John Markoff called “Web 3.0” (“Entrepreneurs See a Web Guided by Common Sense,” The New York Times, 12 Nov. 2006), is now starting to come to the public’s attention. Companies like those showcased at the Web 2.0 Summit’s “Semantic Edge” session are exploiting years of behind-the-scenes development, and there is growing excitement in the commercialization of what, until now, has been a slowly expanding wave of activity.

Although semantic technologies have been around for a while, activity under the name “Semantic Web” really began to take off around 2000. Development of the Resource Description Framework (RDF) was under way at the World Wide Web Consortium (W3C), which produced a first specification in 1999. However, the W3C metadata activity that had spawned RDF had gone inactive, and some of RDF’s original supporters were shifting their investment to other areas, such as XML and Web services, making it hard for RDF adherents to find resources for further development.

The change came with an investment in the technology by the US Defense Advanced Research Projects Agency (DARPA), which saw extending RDF as a way to deal with the numerous interoperability problems plaguing the US Department of Defense, particularly with respect to sharing information across organizational boundaries. DARPA joined with the European Union’s Information Society Technologies program, which was interested in similar issues, to form an ad hoc research group to explore how ideas from the field of AI could be applied to meet these needs.

This research investment brought together a curious mixture of Web gurus looking to bring data to the Web, AI practitioners starting to appreciate the power that scaling small amounts of semantics to Web size could provide, and visionary government data providers with interoperability problems that increasingly demanded solutions. These funds also supported development of early Semantic Web demos and tools that came to the attention of industrial researchers.

In 2001, the W3C renewed work in this area under the banner of the Semantic Web Activity (www.w3.org/2001/sw), and within a couple of years, new working groups were looking at improving the RDF standard; completing the standardization of RDF Schema (RDFS), a vocabulary definition language on top of RDF; and beginning work on OWL, an ontology language for the Web. In February 2004, new versions of RDF and RDFS, and the first version of OWL, became W3C Recommendations—standards for the Web.
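
To give a flavor of how these languages layer on one another, here is a minimal sketch using the open source Python rdflib library (the example.org namespace and the farm vocabulary are invented for illustration): RDFS supplies simple subclass structure, while OWL adds richer constructs such as inverse properties.

    # Illustrative sketch: RDFS and OWL terms layered over RDF, built
    # with the open source rdflib library. The farm terms are invented.
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EX = Namespace("http://example.org/farm#")

    g = Graph()
    g.bind("ex", EX)

    # RDFS lets us say that one class specializes another...
    g.add((EX.ChickenFarm, RDFS.subClassOf, EX.Farm))

    # ...while OWL can express richer relationships, such as declaring
    # that "houses" and "livesAt" are inverse properties.
    g.add((EX.houses, RDF.type, OWL.ObjectProperty))
    g.add((EX.livesAt, RDF.type, OWL.ObjectProperty))
    g.add((EX.houses, OWL.inverseOf, EX.livesAt))

    print(g.serialize(format="turtle"))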

Chicken-and-egg problems

With any new technology, the transition from research to practice and from standards to deployment imposes a time delay. This delay can sometimes be quite long, as a real chicken-and-egg problem arises: Tool vendors and manufacturers are reluctant to implement products until they see a market forming, but the market doesn’t tend to form until the tools are available. The length of the delay thus typically depends on how soon vendors hear the demand from users and can get prototypes and tools to them.

However, the Semantic Web involves several other chicken-and-egg problems.

First, these applications require, in part or whole, data that is available for sharing either within or across an enterprise. Represented in RDF, this data can be generated from a standard database, mined from existing Web sources, or produced as markup of document content.
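
As an illustration of what such shared data can look like in practice, here is a similar minimal sketch with rdflib; the example.org namespace and the farm terms are again invented:

    # Illustrative sketch: a few RDF triples built and serialized with
    # the rdflib library. The namespace and terms are invented.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/farm#")

    g = Graph()
    g.bind("ex", EX)

    # Each statement is a (subject, predicate, object) triple.
    g.add((EX.sunnybrook, RDF.type, EX.ChickenFarm))
    g.add((EX.sunnybrook, EX.locatedIn, Literal("Troy, New York")))
    g.add((EX.henrietta, EX.livesAt, EX.sunnybrook))

    # Serialize as Turtle so other applications can share the data.
    print(g.serialize(format="turtle"))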

Machine-readable vocabularies for describing these data sets or documents are likewise required. The core of many Semantic Web applications is an ontology, a machine-readable domain description, defined in RDFS or OWL. These vocabularies can range from a simple “thesaurus of terms” to an elaborate expression of the complex relationships among the terms or rule sets for recognizing patterns within the data.

(While the Semantic Web community has long recognized that these different vocabulary levels fill different niches in the Web ecology, some critics mistakenly assume all Web ontologies are of the latter type. Overcoming this misunderstanding continues to be a challenge to the community.)
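
To illustrate the two ends of that spectrum, here is a minimal sketch, again with rdflib and invented terms: the first statements merely attach a human-readable label to a term, thesaurus-style, while the later ones declare a property’s domain and range, a small step toward richer machine-readable structure.

    # Illustrative sketch: two points on the vocabulary spectrum,
    # built with rdflib. All terms are invented for the example.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/farm#")

    g = Graph()

    # Thesaurus-style: a named term with a human-readable label.
    g.add((EX.Hen, RDF.type, RDFS.Class))
    g.add((EX.Hen, RDFS.label, Literal("hen")))

    # Richer: a property whose subjects must be hens and whose values
    # must be farms, a machine-readable statement about relationships.
    g.add((EX.livesAt, RDF.type, RDF.Property))
    g.add((EX.livesAt, RDFS.domain, EX.Hen))
    g.add((EX.livesAt, RDFS.range, EX.Farm))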

Finally, Web 3.0 applications require extensions to browsers or other Web tools that are enhanced by Semantic Web data. Just as in the early days of the Web, when people created HTML pages without being quite sure what to do with them, people have long been creating and exchanging Semantic Web documents and data sets without knowing exactly how Web applications would access and use them.

The advent of RDF query languages, particularly SPARQL (currently a W3C Candidate Recommendation), made it possible to create three-tiered Semantic Web applications similar to standard Web applications. These in turn can present Semantic Web data in a usable form to end users or to other applications, eliciting more obvious value from the emerging Web of data and documents.
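
As a minimal sketch of that query tier, here is a SPARQL SELECT query run with rdflib over a couple of invented triples; in a deployed application, the query would typically go to a remote SPARQL endpoint rather than an in-memory graph.

    # Illustrative sketch: a SPARQL SELECT query over a small RDF
    # graph, using rdflib. Data and vocabulary are invented.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/farm#")

    g = Graph()
    g.add((EX.sunnybrook, RDF.type, EX.ChickenFarm))
    g.add((EX.sunnybrook, EX.locatedIn, Literal("Troy, New York")))

    # Find every chicken farm and where it is located.
    results = g.query("""
        PREFIX ex: <http://example.org/farm#>
        SELECT ?farm ?place
        WHERE {
            ?farm a ex:ChickenFarm .
            ?farm ex:locatedIn ?place .
        }
    """)

    for farm, place in results:
        print(farm, place)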

However, these pieces all hinge on one another: companies and governments must be motivated to release data, ontology designers to build and share domain descriptions, and Web application developers to explore Semantic-Web-based applications. Getting all three moving at once has sometimes been a daunting proposition.

Recent trends

Despite these challenges, the pace of semantic technology development has accelerated recently. In the early days of the technology, small companies tried—sometimes unsuccessfully—to create Semantic Web tools. During the past couple of years, however, larger companies have begun providing tools and technologies, both in product sets and open source offerings, and some of the biggest names in the data and software sectors have been testing the water.

Government data sets are being shared, small Semantic Web domain descriptions like the Friend of a Friend (FOAF) ontology are seeing great uptake (FOAF files currently number in the tens of millions), and SPARQL endpoints have motivated many Web application developers to take a serious look at this technology. This in turn has led new start-ups to focus less on the tool market and more on user-facing applications.
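
Part of FOAF’s appeal is how little it takes to publish a description. Here is a minimal sketch with rdflib; the FOAF vocabulary is real, but the people are invented:

    # Illustrative sketch: a tiny Friend of a Friend (FOAF) description
    # built with rdflib. The FOAF terms are real; the people are not.
    from rdflib import BNode, Graph, Literal, Namespace
    from rdflib.namespace import RDF

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    g = Graph()
    g.bind("foaf", FOAF)

    alice, bob = BNode(), BNode()
    g.add((alice, RDF.type, FOAF.Person))
    g.add((alice, FOAF.name, Literal("Alice Example")))
    g.add((alice, FOAF.knows, bob))
    g.add((bob, FOAF.name, Literal("Bob Example")))

    print(g.serialize(format="turtle"))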

Emerging Web 3.0 companies are combining Web data resources, standard languages, ever-improving tools, and (mostly simple) ontologies into applications that take advantage of the power of this new breed of semantic technologies. The entrepreneurs behind these efforts are exploiting the convergence of Semantic Web capabilities to embed small amounts of reasoning into large-scale Web applications, with tremendous potential.

It’s an exciting time for those of us who have been evangelists, early adopters, and language designers for Semantic Web technology. What we see in Web 3.0 is the Semantic Web community moving from arguing over chickens and eggs to creating its first real chicken farms. The technology might not yet be mature, but we’ve come a long way, and the progress promises to continue for a long time to come.


Jim Hendler
is the Tetherless World Senior Constellation Professor at Rensselaer Polytechnic Institute. Contact him at hendler@cs.rpi.edu.