It delivers results that are closer to "live" than Google's previous system, the company said.
Previously, Google would crawl a fraction of the Web each night, index it and push it out in its results. With Caffeine, as Google crawls the Web and finds new information, it indexes it immediately. "We process it immediately so we can serve it seconds later," said Matt Cutts, the head of Google's webspam team. He unveiled the news at the Search Marketing Expo in Seattle.
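The difference between the two approaches can be illustrated with a toy inverted index. This is only a minimal sketch of the general idea, not Google's implementation: the class names `BatchIndex` and `IncrementalIndex`, the `rebuild()` step and the example URL are all hypothetical.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index: a page is searchable as soon as it is added."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs containing it

    def add(self, url, text):
        # Index every whitespace-separated term immediately.
        for term in text.lower().split():
            self.postings[term].add(url)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class BatchIndex(IncrementalIndex):
    """Toy index that queues crawled pages until rebuild() publishes them."""

    def __init__(self):
        super().__init__()
        self.pending = []  # crawled but not yet searchable

    def add(self, url, text):
        # Hypothetical batch behavior: only queue the page for the next rebuild.
        self.pending.append((url, text))

    def rebuild(self):
        # Publish everything crawled since the last rebuild in one pass.
        for url, text in self.pending:
            super().add(url, text)
        self.pending.clear()
```

In this sketch, a page added to `IncrementalIndex` shows up in `search()` right away, while the same page added to `BatchIndex` stays invisible until `rebuild()` runs.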
When Google started, it would update its index only every four months, he said. Around 2000, it started indexing every month in a process that took a week to 10 days. "The funny thing is, we didn't have enough capacity to update all our data centers at once," he said. That meant that people might get different results when searching for the same term if they were hitting different Google data centers.
Caffeine went live in the last week or so and is now being used in all Google data centers.
Caffeine provides 50 percent fresher results for Web searches than Google's previous index, and it is the largest collection of Web content the company has ever offered, according to Google. Whether it's a news story, a blog or a forum post, users can now find links to relevant content much sooner after it is published than was previously possible.
In addition to serving "fresher" results, Caffeine "massively increases our ability to scale up," Cutts said. The company will be able to index many more documents -- "on the order of 100 petabytes," he said.
Caffeine adds new information at a rate of hundreds of thousands of gigabytes per day, Google said in a blog post.
The progression in how Google does its indexing mirrors how people increasingly expect to find the very latest information online. Google noticed that expectation after the Sept. 11 attacks on the U.S., when people were looking for the most up-to-the-minute information possible, Cutts said.