Technologies behind Google ranking
In my previous post, I introduced the philosophies behind Google ranking. As part of our effort to discuss search quality, I want to tell you more about the technologies behind our ranking. The core technology in our ranking system comes from the academic field of Information Retrieval (IR). The IR community has studied search for almost 50 years. IR ranks pages using statistical signals of word salience, such as word frequency. (See "Modern Information Retrieval: A Brief Overview" for a quick overview of IR technology.) IR gave us a solid foundation, and we have built a tremendous system on top of it using links, page structure, and many other innovations.
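To make "statistical signals of word salience, such as word frequency" concrete, here is a minimal sketch of TF-IDF, a classic textbook IR scoring scheme. It is only an illustration of the general idea, not Google's ranking method: the function name `rank_tfidf`, the whitespace tokenizer, the length-normalized term frequency, and the plain `log(N / df)` inverse document frequency are all assumptions chosen for brevity.

```python
from __future__ import annotations

import math
from collections import Counter


def rank_tfidf(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by the summed TF-IDF weight of the query terms.

    Hypothetical illustration of classic IR word-frequency scoring;
    it is not Google's ranking function.
    """
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = []
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        length = len(tokens) or 1  # avoid dividing by zero for empty docs
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears in no document, so it adds nothing
            # Frequent in this document -> more salient here.
            tf_weight = tf[term] / length
            # Rare across the collection -> more salient overall.
            idf = math.log(n_docs / df[term])
            score += tf_weight * idf
        scores.append((name, score))

    # Highest score first; ties broken by name so the order is deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


docs = {
    "a": "information retrieval ranks pages by word frequency",
    "b": "links and page structure help ranking",
    "c": "word frequency word salience",
}
print(rank_tfidf("word frequency", docs))
```

With these toy documents, "c" ranks first because it repeats "word" in a short text, "a" comes second, and "b" scores 0 because it contains neither query term. Real systems layer many more signals (the post mentions links and page structure) on top of term statistics like these.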
In the last decade, search has moved from "give me what I said" to "give me what I want." User expectations of search have rightly increased. We work hard to fulfill the expectations of each and every user, and to do that we need to better understand the pages, the queries, and our users. Over the last decade we have pushed the technologies for understanding these three components of the search process into completely new dimensions.
When we talk about queries at Google, we use square brackets [ ] to mark the beginning and end of queries (see "How to write queries" by Matt Cutts), a notation I will use throughout this post. (Pages and search results change frequently, so in time, some examples used here may not behave as explained.)
- Understanding pages: Over the years we have invested heavily in our crawl and indexing system. As a result, we have a very large and very fresh index. Beyond size and freshness, we have improved our index in other ways. One of the key technologies we have developed to understand pages associates important concepts with a page even when they are not obvious on the page. For example, for the Italian query [galleria sprovieri londra] we find the official homepage of the Sprovieri Gallery in London, even though that page contains neither "London" nor "Londra." In the U.S., a user searching for [cool tech pc vancouver, wa] finds the homepage www.cooltechpc.com even though the page never mentions that the company is in Vancouver, WA. Other technologies we have developed distinguish between important and less important words on a page and assess the freshness of the information on it.
- Understanding queries: It is critical that we understand what our users are looking for, beyond just the few words in their query. We have made several notable advances in this area, including a best-in-class spelling suggestion system, an advanced synonyms system, and a very strong concept analysis system.
- Understanding users: Our work on interpreting user intent aims to return the results people really want, not just what they said in their query. This work starts with a world-class localization system and builds on it with our advanced personalization technology and several other great strides we have made in interpreting user intent, such as Universal Search.
Finally, let me briefly mention the latest advance we have made in search: Cross Language Information Retrieval (CLIR). CLIR lets users discover information that is not in their language, and Google's translation technology then makes that information accessible to them. I call this advance "give me what I want in any language." A user in Russia looking for Tony Blair's biography who types the Russian query [Тони Блэр биография] is prompted at the bottom of our results to search the English web as well.
Similarly, a user in Egypt searching for Disney movie songs with the Arabic query [أغاني أفلام ديزني] is prompted to search the English web. We are very excited about CLIR because it truly brings us closer to our mission to organize the world's information and make it universally accessible and useful.
I hope my two posts about Google ranking have made it clear that we live and breathe search, and that we are more passionate about it than ever. Our fervor for serving all our users worldwide is unprecedented. We pride ourselves on running a very good ranking system, and we work incredibly hard every day to make it even better.
I could go on and on showing examples of state-of-the-art technology that we have developed to make our ranking system as good as it is, but the fact is that search is nowhere close to being a solved problem. Many queries still don't get satisfactory results from Google, and each such query is an opportunity to improve our ranking system. I am confident that with numerous techniques under development in our group, we will make large improvements to our ranking algorithms in the near future.
12:50 PM | Labels: search quality