Building Web 2.0

May 2007

University of California, Irvine

The global buildup of Internet connectivity and growing availability of inexpensive computing and communication devices have made the World Wide Web a virtual continent that is borderless. Anyone in the world with a computer and Internet access can now explore, join, build, or abandon any Web community at any time.

This new freedom is often attributed to the “Web 2.0 era” of services and applications that let webizens easily share opinions and resources. Consequently, users can collectively contribute to a Web presence and generate massive content behind their virtual collaboration.


Tim O’Reilly was among the first to evangelize the concept of Web 2.0, coining the phrase in 2004. He reflected a year later that “One of the key lessons of the Web 2.0 era is this: Users add value…. Therefore, Web 2.0 companies set inclusive defaults for aggregating user data and building value as a side-effect of ordinary use of the application.”

Following O’Reilly’s definition, Web 2.0 technologies provide rich and lightweight online tools that let users contribute new data they can aggregate to harness a community’s “collective intelligence.” However, Web 2.0 should not be equated with such technologies.

In his Internet Alchemy blog, Ian Davis asserts that “Web 2.0 is an attitude not a technology.” “It’s about enabling and encouraging participation through open applications and services,” he adds. “By open I mean technically open with appropriate APIs but also, more importantly, socially open, with rights granted to use the content in new and exciting contexts.”

Web 2.0 thus represents a paradigm shift in how people use the Web. While most users were once limited to passively viewing Web sites created by a small number of providers with markup and programming skills, now nearly everyone can actively contribute content online. Technologies are important tools, but they are secondary to achieving the greater goal of promoting free and open access to knowledge.

Toward that end, Web 2.0 systems should be simple, scalable, and sensible.

Not all users are technically savvy. A Web 2.0 system should provide a simple interface so that even the least sophisticated webizen can contribute input. Simplicity is important so that common people, not just experts, can build and use the Web.

All webizens should have an equal opportunity to participate in Web 2.0 systems. Popular systems must employ fair and widely accepted protocols to accommodate numerous users without discrimination. Scalability is especially important on the Web given its global reach.

A Web 2.0 system should be able to digest all legible input, regardless of the source, and produce sensible conclusions. This could be as simple as using visitor counts to identify the most popular pages or materials, or as sophisticated as doing trend analysis similar to that used by program trading in stock markets.
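A sensible conclusion of the simplest kind can be computed in a few lines. The sketch below (the page names and visit log are invented for illustration) tallies raw page views and surfaces the most popular pages:

```python
from collections import Counter

def most_popular(visits, top_n=3):
    """Tally raw visit events and return the top_n most-visited pages."""
    counts = Counter(visits)  # page -> number of visits
    return [page for page, _ in counts.most_common(top_n)]

# Hypothetical visit log: one entry per page view.
log = ["home", "blog", "home", "faq", "blog", "home"]
print(most_popular(log, top_n=2))  # most-visited pages first
```

Trend analysis of the program-trading sort would replace the counter with time-windowed statistics, but the input and output shapes stay the same.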


There is no one set of technologies that every Web 2.0 system uses. Any Web-based software that lets users create and update content is arguably a Web 2.0 technology. However, several families of technologies that encourage user participation and social networking are associated with the Web 2.0 era.

Many new technologies make the Web interface smooth and intuitive. Ajax, JavaScript, Cascading Style Sheets (CSS), Document Object Model (DOM), Extensible HTML (XHTML), XSL Transformations (XSLT)/XML, and Adobe Flash provide users with a rich and fun interactive experience without the drawbacks of most old Web applications. These technologies display and deliver Web services just like desktop software, making distributed processing difficulties invisible.

Other new technologies make it easy for Web services to connect to multiple data and information sources. XML-RPC, Representational State Transfer (REST), RSS, Atom, mashups, and similar technologies facilitate the subscription, propagation, reuse, and intermixing of Web content.
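As an illustration of how lightweight these syndication formats are, the following sketch extracts item titles from an RSS 2.0 feed using only Python’s standard library (the feed itself is a made-up fragment):

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"><channel>
  <title>Example Blog</title>
  <item><title>Hello Web 2.0</title><link>http://example.org/1</link></item>
  <item><title>Mashups 101</title><link>http://example.org/2</link></item>
</channel></rss>"""

def item_titles(rss_xml):
    """Return the title of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title") for item in root.iter("item")]

print(item_titles(FEED))  # ['Hello Web 2.0', 'Mashups 101']
```

A real aggregator would fetch the feed over HTTP and handle namespaces and encodings, but subscription and reuse boil down to this kind of parsing.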

Perhaps the most important resource for Web 2.0 is the user. Providing friendly tools for user participation in content creation, consumption, and distribution has been the key to success (and failure) for many startups in the Web 2.0 era. Technologies such as blogs, wikis, podcasts, and vodcasts foster the growth of new Web communities.

Technologies are also in place to make Web sites more scalable. For example, Google and Yahoo! process most requests in less than a second, and connections to popular user-based Web sites such as YouTube and Flickr are nearly effortless.


Compared to technologies that make Web sites simpler to use and more scalable, those designed to produce and manage collective intelligence are relatively immature. Implementing scalability can indeed be challenging, but sensibility comes at variable sophistication levels.

User feedback

Hit counters roughly indicate Web sites’ relative popularity, while the volume of user comments provides a measure of user participation. However, these and other such simple metrics do not necessarily communicate the value of online content.

Some Web sites drill further down by asking users to indicate whether certain information is helpful or even to rate it on, say, a scale of 1 to 10 and then indexing the results. Nevertheless, the widespread reluctance of many people to provide feedback severely limits the effectiveness of such mechanisms.
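One common remedy for sparse feedback is a damped average that pulls low-vote items toward a prior score, so a handful of enthusiastic raters cannot dominate. In the sketch below, the prior of 5.0, the weight of 10, and the sample ratings are all arbitrary illustrative choices:

```python
def damped_rating(ratings, prior=5.0, weight=10):
    """Average a list of 1-10 ratings, damped toward a prior mean so that
    items with only a handful of votes are not over-trusted."""
    n = len(ratings)
    return (prior * weight + sum(ratings)) / (weight + n)

few_votes = [10, 10]   # two enthusiastic raters
many_votes = [8] * 50  # broad agreement

print(round(damped_rating(few_votes), 2))   # stays close to the prior (5.83)
print(round(damped_rating(many_votes), 2))  # converges toward the votes (7.5)
```

The damping weight effectively says how many “average” votes an item starts with, which is one way to keep a two-vote item from outranking a well-reviewed one.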

Recommender systems

Most current Web 2.0 sites were originally designed to be either user data repositories (such as YouTube and Flickr) or social networks (like MySpace and Xanga). They thus lack structured intelligence and present popular results in an ad hoc manner. Finding meaningful information can be almost impossible; most of the time, bumping into something interesting is pure luck.

To address this problem, some Web sites feature recommender systems that employ filtering technologies to point users to objects of interest. Collaborative filtering provides personalized recommendations based on individual user preferences as well as those of other users with similar interests, while content-based filtering analyzes and rates the content of information sources to create profiles of users’ interests.
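A toy sketch of the collaborative-filtering idea (the users, video names, and ratings are all hypothetical): find the most similar user by cosine similarity over rating vectors, then recommend items that user liked but the target user has not yet seen.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> score)."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(target, others):
    """Recommend unseen items from the most similar user's ratings."""
    best = max(others, key=lambda u: cosine(target, u))
    unseen = {i: r for i, r in best.items() if i not in target}
    return sorted(unseen, key=unseen.get, reverse=True)

alice = {"cats.mpg": 5, "surf.mpg": 4}
bob = {"cats.mpg": 5, "surf.mpg": 5, "dogs.mpg": 4}
carol = {"news.mpg": 2}
print(recommend(alice, [bob, carol]))  # the suggestion comes from the closest user
```

Production recommenders average over many neighbors and handle cold-start users, but the neighbor-and-unseen-items core is the same.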

Search engines

Most Web 2.0 sites include search engines to help users locate content others have created. These systems retrieve information by inspecting keyword metatags embedded by the author. However, such tags might be created randomly and not correlate with the actual content.

Newer versions of search engines use a combination of data content (term frequency and density), data context (file name and domain name), and the number of incoming links (PageRank data). Web 2.0 site developers must continue developing better techniques to provide more effective search capability, especially for multimedia content.
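The combination of signals can be sketched as a weighted blend of a content score (term frequency) and a context score (normalized incoming links). The pages, link counts, and 0.7/0.3 weights below are invented for illustration and are not any search engine’s actual formula:

```python
def score(page_text, query, inlinks, max_inlinks, w_tf=0.7, w_link=0.3):
    """Blend term frequency (content signal) with incoming links (context signal)."""
    words = page_text.lower().split()
    tf = words.count(query.lower()) / len(words) if words else 0.0
    link = inlinks / max_inlinks if max_inlinks else 0.0
    return w_tf * tf + w_link * link

pages = {
    "a.html": ("web 2.0 tools for the web", 3),  # (text, incoming links)
    "b.html": ("cooking recipes", 8),
}
max_links = max(n for _, n in pages.values())
ranked = sorted(pages,
                key=lambda p: score(pages[p][0], "web", pages[p][1], max_links),
                reverse=True)
print(ranked)  # the page that matches the query outranks the better-linked one
```

Note how the blend lets a relevant but lightly linked page beat a popular but irrelevant one; tuning that trade-off is much of the art.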


Mashups

Mashups are a simple and powerful Web 2.0 content creation/reuse technology that lets users integrate information from multiple sources to provide an enriched experience. For example, it’s possible to build a Web site that shows application-specific data next to photos selected from Flickr at run time or atop locations displayed on a Google map. The content origins of newly created pages can be explicitly acknowledged or embedded in the production process.
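At its core, a mashup is a join across services. The following sketch, with invented records standing in for photo and map services, merges two result sets on a shared location key:

```python
def mashup(photos, places):
    """Join photo records to location records on a shared place_id."""
    by_id = {p["place_id"]: p for p in places}
    return [
        {**photo, "lat": loc["lat"], "lon": loc["lon"]}
        for photo in photos
        if (loc := by_id.get(photo["place_id"])) is not None
    ]

photos = [{"url": "http://example.org/p1.jpg", "place_id": "SF"}]
places = [{"place_id": "SF", "lat": 37.77, "lon": -122.42}]
print(mashup(photos, places))  # photo enriched with its map coordinates
```

In a live mashup the two inputs would come from separate HTTP APIs at run time; the join itself is this simple.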

A mashup’s quality of service depends on its component services: low-quality output from one service can degrade the quality of its successors. Thus, when a mashup contains many service providers, determining individual services’ accountability is necessary to properly attribute credit, identify the root of a problem, or improve the complete process (Y. Zhang, K.-J. Lin, and J.Y.J. Hsu, “Accountability Monitoring and Reasoning in Service-Oriented Architectures,” Service-Oriented Computing and Applications, vol. 1, no. 1, 2007, pp. 35-50).

Web 2.0 has the democratic goal of allowing—in fact, encouraging—all webizens to create, share, distribute, and enjoy ideas and information. To reach this goal, Web-based systems must be simple to use, highly scalable, and rich in sensible content. Among these qualities, sensibility is the hardest to master and will experience the most technological breakthroughs.

Only when this goal is accomplished will it be possible to identify the common set of Web 2.0 capabilities requiring support in all “webfront” devices, much as PC desktops now offer standard Web connection and browser features.

Kwei-Jay Lin is a professor in the Department of Electrical Engineering and Computer Science at the University of California, Irvine. Contact him at

Web 3.0: Chicken Farms on the Semantic Web

January 2008

Rensselaer Polytechnic Institute

The explosive growth of blogs, wikis, social networking sites, and other online communities has transformed the Web in recent years. The mainstream media has taken notice of the so-called Web 2.0 revolution—stories abound about events such as Facebook’s huge valuation and trends like the growing Hulu-YouTube rivalry and Flickr’s role in the current digital camera sales boom.

However, a new set of technologies is emerging in the background, and even the Web 2.0 crowd is starting to take notice.

The Semantic Edge

One of the best-attended sessions at the 2007 Web 2.0 Summit was called “The Semantic Edge.” Its theme was the use of semantic technologies to bring new functionality to such Web staples as search, social networking, and multimedia file sharing.

The session included beta demos by Metaweb Technologies, which bills itself as “an open, shared database of the world’s knowledge”; Powerset, a company building intelligent search tools with natural-language technology; and Radar Networks, whose Twine tool, shown in Figure 1, aims to “leverage and contribute to the collective intelligence of your friends, colleagues, groups and teams.”


Figure 1. Twine, a beta tool released by Web 3.0 start-up Radar Networks, uses Semantic Web technologies to help users organize, find, and share online information.

Not included in the panel but working in the same space are other newcomers such as Garlik, a UK company creating tools to control personal information on the Web; online TV provider Joost; Talis, a vendor of software that makes data “available to share, remix and reuse”; and TopQuadrant, which offers consulting, teaching, and tool development in this space.

More established companies exploring semantic technologies for the Web include Mondeca, a European enterprise information integration company, and Ontoprise, a German vendor of ontology-related tools. Big industry players like Oracle, Microsoft, and IBM are also getting into the game.

The WEB and Web 2.0

All of this activity suggests that a new set of Web technologies is transitioning from toys and demos to tools and applications. Of course, this isn’t the first time this has happened.

In the mid-1990s, the Web seemed to bloom overnight: Companies started putting Web addresses on their products, personal home pages began springing up, and Marc Andreessen’s Mosaic browser got millions of downloads as more people discovered the World Wide Web.

The technology had actually been around for some time—Tim Berners-Lee created the Web in 1989—but it wasn’t until this later time that it turned a knee in the growth curve and became one of the most important applications in history.

Another wave of technologies, dubbed Web 2.0 by Tim O’Reilly, began to emerge a few years later. Newspapers began losing subscribers to news blogs, encyclopedia companies woke up to discover Wikipedia was forcing them to change the way they work, and “google” became a verb on everybody’s lips. Even those who weren’t computer geeks began to talk about Flickr, YouTube, and Facebook.

Again, these technologies required time to mature, catch on virally, and turn that knee in the curve before they enjoyed widespread adoption.

Toward Web 3.0

A new generation of Web applications, which technology journalist John Markoff called “Web 3.0” (“Entrepreneurs See a Web Guided by Common Sense,” The New York Times, 12 Nov. 2006), is now starting to come to the public’s attention. Companies like those showcased at the Web 2.0 Summit’s “Semantic Edge” session are exploiting years of behind-the-scenes development, and there is growing excitement in the commercialization of what, until now, has been a slowly expanding wave of activity.

Although semantic technologies have been around for a while, activity under the name “Semantic Web” really began to take off around 2000. Development of the Resource Description Framework was under way at the World Wide Web Consortium, which produced a first specification in 1999. However, the W3C metadata activity that had spawned it was inactive, and some original RDF supporters were shifting investment to other areas, such as XML and Web services, making it hard for the RDF adherents to find resources for further development.

The change came with an investment in the technology by the US Defense Advanced Research Projects Agency, which saw extending RDF as a way to deal with numerous interoperability problems plaguing the US Department of Defense, particularly with respect to sharing information across organizational boundaries. DARPA joined with the European Union’s Information Society Technologies project, interested in similar issues, to form an ad hoc research group to explore how to apply some ideas from the field of AI to meet these needs.

This research investment brought together a curious mixture of Web gurus looking to bring data to the Web, AI practitioners starting to appreciate the power that scaling small amounts of semantics to Web size could provide, and visionary government data providers with interoperability problems that increasingly demanded solutions. These funds also supported development of early Semantic Web demos and tools that came to the attention of industrial researchers.

In 2001, the W3C renewed work in this area under the banner of the Semantic Web Activity, and within a couple of years, new working groups were looking at improving the RDF standard; completing the standardization of RDF Schema (RDFS), a vocabulary definition language on top of RDF; and beginning work on OWL, an ontology language for the Web. In February 2004, new versions of RDF and RDFS, and the first version of OWL, became W3C Recommendations—standards for the Web.

Chicken-and-egg problems

With any new technology, the transition from research to practice and from standards to deployment imposes a time delay. This delay can sometimes be quite long, as a real chicken-and-egg problem arises: Tool vendors and manufacturers are reluctant to implement products until they see a market forming, but the market doesn’t tend to form until the tools are available. The length of the delay thus typically depends on how soon vendors hear the demand from users and can get prototypes and tools to them.

However, the Semantic Web involves several other chicken-and-egg problems.

First, these applications require, in part or whole, data that is available for sharing either within or across an enterprise. Represented in RDF, this data can be generated from a standard database, mined from existing Web sources, or produced as markup of document content.
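For example, rows from a relational table can be serialized into RDF triples almost mechanically. In this sketch the table name, columns, and namespace are invented, and the output is simplified N-Triples (real RDF would type its literals and escape values):

```python
def row_to_ntriples(table, row_id, row, ns="http://example.org/"):
    """Serialize one database row (a dict of column -> value) as N-Triples."""
    subject = f"<{ns}{table}/{row_id}>"
    return "\n".join(
        f'{subject} <{ns}schema/{col}> "{val}" .'
        for col, val in row.items()
    )

print(row_to_ntriples("person", 7, {"name": "Ada", "city": "London"}))
```

Each row becomes a subject URI, each column a predicate, and each cell an object, which is why database-backed RDF generation is one of the easiest ways to seed the Web of data.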

Machine-readable vocabularies for describing these data sets or documents are likewise required. The core of many Semantic Web applications is an ontology, a machine-readable domain description, defined in RDFS or OWL. These vocabularies can range from a simple “thesaurus of terms” to an elaborate expression of the complex relationships among the terms or rule sets for recognizing patterns within the data.

(While the Semantic Web community has long recognized that these different vocabulary levels fill different niches in the Web ecology, some critics mistakenly assume all Web ontologies are of the latter type. Overcoming this misunderstanding continues to be a challenge to the community.)

Finally, Web 3.0 applications require extensions to browsers, or other Web tools, enhanced by Semantic Web data. As in the early days of the Web when we were creating HTML pages without being quite sure what to do with them, for a long time people have been creating and exchanging Semantic Web documents and data sets without knowing exactly how Web applications would access and use them.

The advent of RDF query languages, particularly SPARQL (currently a W3C Candidate Recommendation), made it possible to create three-tiered Semantic Web applications similar to standard Web applications. These in turn can present Semantic Web data in a usable form to end users or to other applications, eliciting more obvious value from the emerging Web of data and documents.
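The query idea can be illustrated without any RDF library: store triples as (subject, predicate, object) tuples and match a SPARQL-like pattern in which None plays the role of a variable. The FOAF-style data here is made up:

```python
TRIPLES = [
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:name", "Alice"),
    ("bob", "foaf:name", "Bob"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None behaves like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Roughly: SELECT ?o WHERE { alice foaf:knows ?o }
print([o for _, _, o in match(TRIPLES, s="alice", p="foaf:knows")])  # ['bob']
```

A SPARQL engine adds joins across patterns, filters, and an HTTP endpoint, but triple-pattern matching like this is the primitive underneath.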

However, motivating companies or governments to release data, ontology designers to build and share domain descriptions, and Web application developers to explore Semantic-Web-based applications all hinge on one another. Accomplishing this has sometimes been a daunting proposition.

Recent trends

Despite these challenges, the pace of semantic technology development has accelerated recently. In the early days of the technology, small companies tried—sometimes unsuccessfully—to create Semantic Web tools. During the past couple of years, however, larger companies have begun providing tools and technologies, both in product sets and open source offerings, and some of the biggest names in the data and software sectors have been testing the water.

Government data sets are being shared, small Semantic Web domain descriptions like the Friend of a Friend ontology are seeing great uptake (FOAF files currently number in the tens of millions), and SPARQL end points have motivated many Web application developers to seriously look at this technology. This in turn has led new start-ups to focus less on the tool market and more on user-facing applications.

Emerging Web 3.0 companies are combining the Web data resources, standard languages, ever-better tools, and (mostly simple) ontologies into applications that take advantage of the power of this new breed of semantic technologies. The entrepreneurs behind these efforts are exploiting the convergence of Semantic Web capabilities to embed small amounts of reasoning into large-scale Web applications, with tremendous potential.

It’s an exciting time for those of us who have been evangelists, early adopters, and language designers for Semantic Web technology. What we see in Web 3.0 is the Semantic Web community moving from arguing over chickens and eggs to creating its first real chicken farms. The technology might not yet be mature, but we’ve come a long way, and the progress promises to continue for a long time to come.

Jim Hendler
is the Tetherless World Senior Constellation Professor at Rensselaer Polytechnic Institute. Contact him at


Corpus-Based Language Studies: An Advanced Resource Book, by Tony McEnery, Richard Xiao, and Yukio Tono. London; New York: Routledge, 2006.

Analyzing Linguistic Data: A Practical Introduction to Statistics Using R, by Harald Baayen. Cambridge: Cambridge University Press, 2007.

Corpus Linguistics: Critical Concepts in Linguistics, by Wolfgang Teubert and Ramesh Krishnamurthy. London; New York: Routledge, 2007.

Computational Linguistics and Intelligent Text Processing: 8th International Conference, CICLing 2007, Mexico City, Mexico, February 18-24, 2007: Proceedings, edited by Alexander Gelbukh. Berlin; New York: Springer, 2007.

New Media in Politics: A Comparison of Attitudes in Liberal and Conservative Web Logs, by Caitlin Clark Fahey. Norton, MA: Wheaton College, 2007.

Blogs from the Liberal Standpoint: 2004-2005, by Lawrence R. Velvel. Andover, MA: Doukathsan Press, 2006.

The Semantic Web: Research and Applications: 4th European Semantic Web Conference, ESWC 2007, Innsbruck, Austria, June 3-7, 2007: Proceedings, edited by Enrico Franconi, Michael Kifer, and Wolfgang May. Lecture Notes in Computer Science, vol. 4519 (ISSN 0302-9743). Berlin; New York: Springer, 2007. Contents: invited talks; best papers; Semantic Web services; ontology learning, inference and mapping; case studies; social Semantic Web; ontologies: requirements and analysis; personalization; foundations of the Semantic Web; natural languages and ontologies; applications; querying and Web data models; system descriptions.

Surfing the Library 2.0 Wave


There’s an ocean full of metaphors we can use to help us grasp the opportunities arising from “everything 2.0,” but I have a favorite: surfing. Surfing sounds great, but it requires skills that “hodads” may not have thought about. There are wipeouts galore, courage is required, and most of all, staying in sync with a moving target requires the ability to focus. So in thinking about 2.0 stuff, I’ll stick with my surfing metaphor, knowing that all of us have plenty of career experience ending up in the white water.

Coining “Library 2.0” out of the ubiquitous “Web 2.0” was a marketing stroke of genius: Not only is it true, but “unbelievers” can really “get” what you’re trying to say with a few terse and well-chosen words. Most of our great thinkers have adopted Library 2.0 rhetoric in outreach, marketing strategies, and budgetary ploys, and you know, I think we’re on top of the rhetoric. But when it comes to actually implementing a bold new Library 2.0 step, where to begin, and what to do? I’m going to offer some starting points in this column, knowing many of us have already started paddling into the “2.0” break, and are well past the point of taking 2.0 surfing lessons at a virtual Waikiki.

The Old Library 1.0 Hot Doggers Are Hanging 10

Personally, I think there are a few things that need to be said up front, even at the risk of repeating myself. What we’ve always done–the Library 1.0 part–is not atavistic, but cutting-edge. I wrote at length about how to use core library skills to break into new organizational roles in a recent article that appeared in ONLINE (September/October 2006, p. 21), so I won’t restate everything here. I’ll just leave it at this: Those of us who can provide strategic reference services, articulate a meaningful digital preservation policy, and collect the knowledge our users need are right on track. That’s Library 1.0, pure and simple, but it’s a terrific tidal chart for surfing into the 2.0 point break. There’s an added benefit: 1.0 strategies work best if we take an activist stance. Library 1.0 services must be pushed forward (via blogs, podcasts, wikis, and more); marketed (one-on-one, to the media, to our users); broadcast (relentlessly, using the deep and powerful rhetoric about knowledge management at our disposal); and sustained (in other words, get out of your office and go talk to people).

Info pros who can analyze their career situations using Library 1.0 principles are very well positioned to make bold moves with new technology. Simple, right? No, not really–like real surfing, it takes focus, a certain degree of courage, and a plan. Here are two zones of opportunity I’ve identified recently, while avoiding wipeouts.

Know Your ‘E-Roles’

One way of analyzing the new flexibility we enjoy is to identify our “e-roles,” as Marydee Ojala does in her editorial remarks in the September/October issue of ONLINE. Nowadays, we can take multiple roles within organizations, as well as in helping our users. The key analytical task we must employ is to evaluate 2.0 technologies, such as social networking software, and determine where we can add value. Here again, a little 1.0 savvy has its benefits. Even as social networking software (think Facebook and MySpace) is flourishing, the mainstream media is already beginning to report on burnout with it. College students are “rediscovering” the value of a small circle of friendships with people they see often. Facebook, meet face time. The successful 2.0 librarian is a trend spotter, and that one was a no-brainer for tech-watchers.

Our e-roles, both the known and the yet-to-emerge, have never been more diverse. One reason for this is the growing awareness among management thinkers that “cross-functional” work roles can boost creativity and productivity. So info pros who can combine library skill, IT know-how, even tutoring and teaching, can add substantial value to organizations. The new zones of collaboration help to reposition our collections and services, and present us with daily opportunities to innovate. For example, what would you do if you worked in an organization where IT staff did not address any content issues, yet the CIO had de facto control over networked content? Such places are not hard to find. Strategies abound, and here’s one: Talk to IT staff, talk to management, talk to everyone–and take over the content management role. Likewise, if you work in a community of practice where communications aren’t moderated or shepherded, would you sense an opportunity? The 2.0 info pro definitely would. It could be a fertile space for a wiki, a multiuser blog, or archived podcasting.

Grasping all of your potential e-roles can unlock doors which seemed forever shut, but in these times, the new flexibility is infectious, memetic, and pervasive. It used to be that “marketing the library” was a daring, guerilla sort of thing to do, best performed by the natural extroverts among us. But Library 2.0 mainstreams marketing, socializing, networking, jumping in without permission, finding links and connections others can’t see, and so on. Library 2.0, with its emphasis on empowering communication in all directions, has handed us a golden opportunity to help management sort out the “E” in the “E-Organization.”

Library 2.0 Boldly Faces Space Usage

Remember, digital libraries are a collection of both services and media. Hence my second 2.0 field of opportunity–our legacy of large amounts of physical space. It’s difficult to generalize about library space, because a public library system’s needs differ from research university needs, and special libraries tend to be unique. But there is a unifying reality that spans most types of physical space: We can now accomplish much more with digital resources than ever before, and we have a chance to reconfigure our space. And we are not the only ones who know it.

It can be a little scary to reassess real estate, since it’s “location, location, location,” and if library collections disappear from immediate sight and go into remote locations, they may be at risk. But society at large now accepts digital media, even as it continues to love buying and borrowing books; we can’t hide from that. Instead, we should embrace the moment.

My view is that it’s better to be bold and address things directly. In corporate firms, virtual libraries with remote staff are pretty common, and many info pros are thriving in this environment. Other organizations, like historical societies, need print–but often back up their treasures with dark archives. Universities face space demands of every sort. Where does Library 2.0 end up in the equation?

The answer comes in two parts, and the first is more important. Library 2.0, as I argue above, is about people communicating. Think first of functional space for staff, and how it interfaces with the public. Are people mixing enough? Second, think of print collections, with a cold and objective heart. It’s time to take a hard look at the balance between high-use print material housed locally and off-site print or dark archives “running in the background.” It’s a good idea to have a daring space plan ready at all times. It should fully preserve the local print collection that is most needed, yet also allow for storing other material off-site. A bold approach might define you, the information professional, as an avatar for 21st-century information management.

I never recommend action I wouldn’t try myself. In 2004, faced with a faculty boss who wanted to either update or close my library, I presented my ready-made plan in detail. It included weeding more than 10,000 items and moving staff, and I knew it would cause pain to our senior emeriti, who had lovingly supported our library since 1945. But opportunity abounded: My faculty boss had only general ideas of what he wanted, so I was able to drive the design process, advancing the principles of the “learning commons.” It was like launching into a 30-foot wave in Waimea Bay, because it could’ve all gone down in white water (i.e., I’d be running a conference room, not a library). But it didn’t: My collection plan saved our unique materials, extended Wi-Fi service into a full Information Gateway, and added digital projection capabilities. Now we are custodians of a truly beautiful Library Commons in a historic landmark building. In fact, we’ve grown in net space if you count the new downstairs storage area we gained. It won’t work in every environment, but the question I’m asking is, “What are we holding onto, and why?” If you can answer that in your own situation, you may already hold the keys to transforming physical space that is lying fallow into a more vibrant zone for interaction.

Going from the Known to the Unknown

My forecast for Library 2.0 is that it will be a moving target, just like successive “sets” of waves that come from the open ocean. Here are some examples of why I believe that.

Just 2 years ago, social bookmarking was a new animal; today, it’s “folksonomy” and it’s studied in graduate school. Also, during the 2005-2006 academic year, the single most popular way to view faculty lectures at UC Berkeley was by podcast, and viewing can be verifiably linked to downloads to Apple iPods. Webcasting is a distant second by downloads. Just this week, Tim Berners-Lee, the Web’s creator, called for the academic study of “Web Science.”

It’s a fast world these days. Happily, really big research library systems, like the one I work in, have given up on seeing themselves as static institutions with eternal charges writ in stone. Instead, survival depends on “continuous planning.”

What holds true for august institutions also holds true for individuals. We are all continuous planners now, and have been for some time. Hardly a month goes by when I fail to see a new device or application reviewed in The New York Times or the San Francisco Chronicle that has a direct, immediate impact on how I perform reference and Web administration right now. Sometimes the lapse between news and implementation drops to mere days. But my Library 1.0 skills have been a great preparation for fast change. Just last week, I assisted a professor with an op-ed piece he was writing while he was away in Washington, D.C. The answers he needed lay in more than one place–books, reference databases, and in Wikipedia. He got his verified answers by email, and the end product appeared on the editorial page of The Sacramento Bee.

Hey, it’s great catching the Library 2.0 wave–even with my 1.0 longboard.


By Terence K. Huwe


Terence K. Huwe is director of library and information resources at the University of California-Berkeley’s Institute of Industrial Relations. His responsibilities include library administration, reference, and overseeing Web services for several departments at campuses throughout the University of California. His email address is

YouTube Everywhere

March 11, 2008  |  Posted by: The YouTube Team

We try really hard to make YouTube as open as possible. Anyone can upload and view videos, which can be embedded anywhere and viewed on all kinds of different devices. And, of course, anyone can participate in our community by commenting on videos, rating them, and sharing them with friends.

Nevertheless, we worried that we weren’t open enough. So, we pulled some all-nighters and added some powerful new ways to integrate YouTube content and community into other websites, desktop applications, video games, mobile devices, televisions, cameras, and lots more.

For users, the exciting news is that they will be able to actively participate in the YouTube community from just about anywhere, including the online destinations and web communities they already love and visit regularly.

For partners and developers, YouTube has grown into much more than a website. It has become an open, general-purpose video services platform, available for use by just about any third-party website, desktop application, or consumer device. We now provide a complete set of CRUD (create, read, update, delete) capabilities for uploading, managing, searching, and playing back user videos and metadata from the YouTube “cloud,” managed by us. We do all of the hard work of transcoding, hosting, streaming, and thumbnailing your videos, and we provide open access to our sizable global audience, enabling you to generate traffic for your site, visibility for your brand, or support for your cause. Meanwhile, we provide full access to our substantial video library, enabling you to attract users and enhance the experience on your site. It’s all free, and it’s available to everyone, starting now.

Technically, we have introduced some new APIs. (This is just a geeky acronym for Awesomely Powerful Interactions, which is what users are now capable of performing from just about anywhere.) Building upon our existing APIs for querying the YouTube library and playing embedded YouTube videos, we have added the following new API services for external developers and partners:

  • Upload videos and video responses to YouTube
  • Add/Edit user and video metadata (titles, descriptions, ratings, comments, favorites, contacts, etc.)
  • Fetch localized standard feeds (most viewed, top rated, etc.) for 18 international locales
  • Perform custom queries optimized for 18 international locales
  • Customize player UI and control video playback (pause, play, stop, etc.) through software
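By way of illustration, the localized standard feeds described above were addressable as plain URLs in the GData-style API. The sketch below builds such request URLs; the endpoint and path scheme are my assumption based on the historical `gdata.youtube.com` interface, not code taken from this post:

```python
# Sketch: building request URLs for YouTube's localized standard feeds.
# The base URL and path layout are illustrative assumptions modeled on
# the historical GData API, not official YouTube sample code.

BASE = "https://gdata.youtube.com/feeds/api/standard_feeds"

def standard_feed_url(feed_id, locale=None):
    """Return the URL for a standard feed, optionally localized.

    feed_id: e.g. "most_viewed" or "top_rated"
    locale:  a region code such as "JP" or "DE" (one of the 18 locales)
    """
    if locale:
        return f"{BASE}/{locale}/{feed_id}"
    return f"{BASE}/{feed_id}"

print(standard_feed_url("most_viewed"))
print(standard_feed_url("top_rated", locale="JP"))
```

A client would fetch these URLs over HTTP and parse the returned feed; the same pattern covers each of the 18 locales mentioned above.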

The number of possible new applications is endless. Electronic Arts has enabled gamers to capture videos of fantastical user-generated creatures from their upcoming game, Spore, and publish them directly to YouTube. The University of California, Berkeley is bringing free educational content to the world, enhancing its open source lecture capture and delivery system to publish videos automatically to YouTube. Animoto enables its users to create personalized, professional-quality music videos from their own photos and upload them directly to YouTube. TiVo is providing its users a rich and highly participative YouTube viewing experience on the television. For more details about the innovative ways these partners are using YouTube APIs, see our case studies.

What will you build with our new APIs? Learn more about the exciting new features in this release from our engineers, or visit Google Code to get started now. We can’t wait to see what you create.

Pay per Click – PPC

Pay per click (PPC) is an advertising model used on search engines, advertising networks, and content websites/blogs, in which advertisers pay only when a user actually clicks on an ad to visit the advertiser’s website. Advertisers bid on keywords they predict their target market will use as search terms when looking for a product or service. When a user types a keyword query matching the advertiser’s keyword list, or views a page with relevant content, the advertiser’s ad may be shown. These ads are called “sponsored links” or “sponsored ads” and appear next to or above the “natural” or organic results on search engine results pages, or wherever a webmaster or blogger chooses to place them on a content page.

Pay per click ads may also appear on content network websites. In this case, ad networks such as Google AdSense and Yahoo! Publisher Network attempt to provide ads that are relevant to the content of the page where they appear, and no search function is involved.

While many companies exist in this space, Google AdWords, Yahoo! Search Marketing, and Microsoft adCenter were the largest network operators as of 2007. Minimum prices per click, often referred to as costs per click (CPC), vary depending on the search engine, with some as low as $0.01. Very popular search terms can cost much more on popular engines. This advertising model is arguably open to abuse through click fraud.
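The mechanics described above can be sketched in a few lines. This is a deliberately minimal model of PPC billing, not any real network’s actual auction (real engines weight bids by ad quality, among other factors); the advertiser names and bid amounts are invented for illustration:

```python
# Minimal sketch of the pay-per-click model: advertisers bid on a
# keyword, the highest bidder's ad is shown, and the advertiser is
# charged only for actual clicks, not for impressions.
# All names and numbers are illustrative.

def winning_ad(bids):
    """bids: {advertiser: cpc_bid}. Return the (advertiser, bid) pair
    with the highest bid for the keyword."""
    return max(bids.items(), key=lambda kv: kv[1])

def campaign_cost(cpc, clicks):
    """Total charge: the advertiser pays its CPC only per click."""
    return cpc * clicks

bids = {"acme_travel": 0.45, "budget_tours": 0.30}
advertiser, cpc = winning_ad(bids)
print(advertiser, cpc)                 # highest bidder wins the slot
print(campaign_cost(cpc, clicks=200))  # charge for 200 actual clicks
```

Note that impressions without clicks cost nothing in this model, which is precisely what distinguishes PPC from impression-based (CPM) advertising.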

Wikipedia Should Go For-Profit, Give Profits Away

By Dan Malven

Congratulations to the Wikimedia Foundation for being named a Technology Pioneer 2008 by the World Economic Forum. It’s a fantastic accomplishment. But as Wikimedia becomes a pioneering global citizen, we should challenge its leadership to harness the collective intelligence of its millions of collaborators to do more than just educate, inform, and entertain the world for free.

According to my calculations, the collective intelligence captured in Wikimedia’s properties could (and should!) be used to generate over US$1 billion for charitable causes in less than four years, and by year five should be contributing over US$1 billion per year to the world’s charities!

Sounds crazy? Yes. Is it? No! Read on.


Newman’s Own is a U.S.-based company that is a role model for what Wikimedia should become. Newman’s Own, owned by famous actor Paul Newman, operates as a regular profit-oriented company with competitive product offerings and market-rate salaries, marketing, distribution, R&D, administration, and other general corporate expenses. The big difference is that all net profits generated by the business are donated to charity. If you live in the U.S. you have probably seen Paul Newman’s image clothed in garish costumes on the packaging of salad dressings, potato chips, salsa, and other food products. It is shameless exploitation of Paul Newman’s image and reputation. But that exploitation has reportedly generated over US$200 million in charitable contributions since the company’s formation in 1982. In fact, its corporate motto is ‘Shameless exploitation in pursuit of the common good.’ A book with that as its title describes the story and its charitable beneficiaries.

If anyone reading this is unfamiliar with Wikimedia properties, they are community-created web sites with enormous consumer traffic. The largest property, Wikipedia, is a ‘crowd-sourced’ online encyclopedia. According to my friends at the leading online media measurement company comScore Media Metrix, the Wikimedia family of sites, when viewed as a single property, is the 6th most popular online media property in the world in terms of monthly unique visitors (UVs), and a Top 25 global property in terms of monthly page views (PVs). And every page view is somebody researching something.

It is an iron-clad truth that revenue can be generated from online consumer traffic, particularly traffic where people are searching for something (see the revenue and earnings of GOOG). Some percentage of the inquiries at Wikimedia properties has commercial value, and advertisers would pay a lot of money to be in front of those people as they do their research. Wikimedia would not even need its own sales force to generate revenue, because middle-man companies called advertising networks have already aggregated advertisements from lots of different advertisers and would fall over themselves to serve those ads on Wikimedia sites (the largest ad network is Google’s AdSense). All Wikimedia would have to do is paste a small bit of code on every page and, voilà, a money machine is created.

Now here is where the math comes in and the numbers get mind-boggling. First, a term of art in the interactive advertising business is RPM, which stands for revenue per thousand page views (the M is the Roman numeral for one thousand). A really low-quality site, in terms of its traffic’s commercial value (think chat and some social networks), can generate about US$1 RPM just by signing up for various ad networks, without its own sales force. Sites whose traffic has more commercial value can get up to US$5 RPM using the ad networks. A site with high-value traffic that hires its own sales force and works closely with advertisers and their agencies can generate US$20 RPM and up.

A venture capitalist recently posted his views on expected RPMs for different types of businesses. If you follow that link, you will see that Wikimedia would be classified as a business with many different “endemic” advertising opportunities, which he puts in the US$20 RPM range and up (examples of endemic advertising opportunities in Wikipedia: entries on cities and countries would attract travel-related advertisements, entries on information technology would attract IT-related advertisements, and so on). Certainly not all of Wikimedia’s traffic would have endemic advertising opportunities or high commercial value, but a lot of it would, and the share would probably match or exceed the percentage of Google searches that have commercial value.

According to comScore Media Metrix, the Wikimedia family of sites generated 40 billion page views from October 2006 to October 2007! And the number of monthly page views in October 2007 was 66% higher than in October 2006. Let’s use that to make some conservative estimates:

  • Let’s cut the growth rate in half, to 33%, and estimate 53 billion total page views for 2008.
  • Let’s assume that on average Wikimedia’s traffic can generate only US$2 RPM.

That is still US$106 million in revenue! And operating expenses are very low, and this effort would not add much to them, so almost all of it would go to charity (Wikimedia’s financial statements are available here, and they show that the year ending June 30, 2006, had total operating expenses of less than US$1 million). Now maybe there isn’t enough ad inventory immediately available in the ad networks to service all these page views, but Wikimedia could flex its traffic-volume muscles, and its social mission, and pretty quickly take inventory allocation away from other properties to satisfy its page-volume appetite.
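The arithmetic here is easy to check. A small sketch of the RPM revenue formula (revenue = page views ÷ 1,000 × RPM), applied to the essay’s own conservative 2008 estimate of 53 billion page views at US$2 RPM:

```python
# RPM revenue model: revenue = (page_views / 1000) * rpm, where RPM is
# revenue per thousand page views.

def annual_revenue(page_views, rpm):
    """Revenue in US dollars for a given page-view total and RPM."""
    return page_views / 1000 * rpm

# The essay's conservative 2008 estimate.
pv_2008 = 53_000_000_000          # 53 billion page views
print(annual_revenue(pv_2008, rpm=2))  # 106000000.0 -> US$106 million
```

This confirms the US$106 million figure: 53 million “thousands” of page views, each earning US$2.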

Looking out a bit into the future, the numbers get really nutty. But it’s not unreasonable, since this is an online property that operates at very high usage scale (just a notch below Google and Yahoo!) yet has NONE of the product development and R&D costs that other for-profit companies must incur: its whole product is created by collaborators contributing their time for free. Let’s look five years out, at 2012:

  • Assume an average annual page-view growth rate of 20% over that period, yielding roughly 100 billion page views in 2012.
  • Assume that Wikimedia has built out its corporate and technical infrastructure and now has a direct sales force focused on its high-value vertical content areas, world-class tools and programs for advertisers, and the ability to serve video and animated content and ads, and can generate an average of US$10 RPM. (There is even a chance that some governments could be convinced to treat buying ads on Wikimedia sites as tax-deductible, because it would in effect be a donation to charity.)

That is US$1 billion in revenue, with maybe US$50 million of expenses and all the rest donated to charity! And every year the amount gets larger and larger as traffic on the Wikimedia properties continues to grow and the internet advertising industry continues to grow.
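The five-year projection is simple compound growth. A sketch checking the essay’s own numbers (40 billion page views in the year to October 2007, 20% annual growth through 2012, US$10 RPM):

```python
# Compound-growth check of the 2012 projection in the essay.

def project_page_views(base_pv, growth_rate, years):
    """Page views after `years` of compound annual growth."""
    return base_pv * (1 + growth_rate) ** years

def annual_revenue(page_views, rpm):
    """Revenue in US dollars: page views / 1000 * RPM."""
    return page_views / 1000 * rpm

pv_2012 = project_page_views(40_000_000_000, growth_rate=0.20, years=5)
print(round(pv_2012 / 1e9, 1))   # ~99.5, i.e. roughly 100 billion page views
print(annual_revenue(pv_2012, rpm=10))  # ~US$1 billion in revenue
```

At 20% compound growth the 40 billion base grows by a factor of 1.2^5 ≈ 2.49, which is where the roughly 100 billion page views, and thus the US$1 billion figure, come from.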

Let’s put some context around these numbers. Rick Reilly wrote a great article in Sports Illustrated on how the United Nations estimates that over 1 million people in Africa die every year from malaria, and that 650,000 of those people could be saved if they just had a mosquito net to sleep under. It costs about US$20 in total to get a mosquito net manufactured, delivered, and installed in Africa. Based on that article, Sports Illustrated started a program in partnership with the U.N. Foundation called Nothing But Nets. US$13 million donated to that program could save the 650,000 preventable malaria deaths in Africa each year. Even with the lowest assumptions from its first year of operations, Wikimedia has the earnings potential to generate US$13 million for charity in about 6 weeks! This is but one of many examples of how this kind of money could help improve the world.
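These figures also check out; a quick sketch reproducing them (US$20 per net, 650,000 lives, and the essay’s US$106 million low-end annual estimate):

```python
# Checking the mosquito-net figures quoted in the essay.

net_cost = 20                  # US$ per net: manufactured, delivered, installed
nets_needed = 650_000          # preventable malaria deaths per year
total = net_cost * nets_needed
print(total)                   # 13000000 -> US$13 million

annual = 106_000_000           # the essay's low-end revenue estimate
weeks = total / (annual / 52)  # weeks of revenue needed to fund the nets
print(round(weeks, 1))         # ~6.4 weeks, i.e. "about 6 weeks"
```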

Even if generating profits caused Wikimedia to vaporize itself after 6 weeks and cease to exist, it would still be the right thing to do if it could save 650,000 lives.

But generating profits would not cause Wikimedia to vaporize itself. It would do the opposite. It would grow stronger every day. Instead of the mythical perpetual motion machine, it would become a real-world perpetual money machine.

The contributors who create the Wikimedia properties would contribute even more passionately because it’s for the common good. Consumers would consume Wikimedia content even more than they do now because it’s for the common good. Advertisers would advertise on it at or above market rates because it’s for the common good. A portion of the money generated would be reinvested to hire smart people at competitive salaries in ad sales, technology, and administration; to improve its infrastructure to serve video and rich-media content; and to create state-of-the-art tools for content contributors and advertisers. Wikimedia would have potential employees, corporate partners, and advertisers banging its door down to be part of a company with this kind of social mission combined with its innovation pedigree.

What would be a greater testament to the triumph of free markets than for everyone to supply what they are capable of, as content contributors, consumers or advertisers, in an effort to create a sustainable source of money for those less fortunate? It would be a free market based, global perpetual motion money machine for the common good.

I recently watched an interview with Jimmy Wales, the founder of the Wikimedia Foundation, in which he said that most of the world’s ills are caused by a lack of education and information, and that if you look at the social value of Wikimedia through a utopian lens, you can believe that by educating and informing people it can have a positive social impact. I agree with all that. But I don’t think anyone can disagree that it still takes ‘filthy lucre’ to make things happen a lot faster.

There is a presidential election cycle here in the U.S. right now. Maybe we can get a candidate to adopt this as a cause and use their bully pulpit to make it happen. Or maybe this effort needs its own version of Paul Newman to be its face. It’s a global operation, so it needs someone with global recognition. Oprah Winfrey, global icon and Queen of All Media? Oprah has announced her retirement from her TV show in 2011; maybe she’s looking for a new effort?

Deploying billions of dollars to charitable causes is no small task. It takes a well-run, global organization to do it effectively. Someone should step up and offer their leadership skills. Maybe Wikimedia’s coming out party as a pioneering global citizen at the World Economic Forum in January 2008 can start the ball rolling.

To paraphrase Paul Newman:

To the leadership of the Wikimedia Foundation: If you are unwilling to shamelessly exploit yourselves for the common good, then shame on you.

To all of us as contributors to, consumers of and potential advertisers on Wikimedia properties: If we are unable to compel the Wikimedia leadership to shamelessly exploit themselves for the common good, then shame on us.

Dan Malven is an entrepreneur, venture capital investor, husband and father of three. He runs Drumcott Capital and writes the blog Startup Conversations, where this essay originally appeared.