2.1 Preliminary Analysis of Some Personalized Search Engines

Eurekster launched the concept of a personalized search engine in 2004. The service came not from Google or Yahoo but from Eurekster, which opened it to the general public on 21 January 2004. Before the launch, the Eurekster site had run a beta test involving only a few hundred people for a couple of months. Eurekster provides many options, including searches filtered by friends, SearchMates, shared searches, other sites, and the use of any search engine in combination with Eurekster (Sullivan, 2004).
In recent years, the three big search providers, Google, Yahoo and Microsoft, have come to dominate personalized search. Each of the three relies on its own algorithm: Google uses the Google Hilltop algorithm; Yahoo uses a PageRank algorithm, even though it does not always satisfy customers; and MSN (Windows Live Search) uses a keyword-based search algorithm. These algorithms are explained in detail below. On PageRank, Jeh and Widom (2003, 272) explain:

The fundamental motivation underlying PageRank is the recursive notion that important pages are those linked-to by many important pages.
A page with only two in-links, for example, may seem unlikely to be an important page, but it may be important if the two referencing pages are Yahoo! and Netscape, which themselves are important pages because they have numerous in-links. One way to formalize this recursive notion is to use the "random surfer" model introduced in [11]. Imagine that trillions of random surfers are browsing the web: if at a certain time step a surfer is looking at page P, at the next time step he looks at a random out-neighbor of P.
As time goes on, the expected percentage of surfers at each page P converges (under certain conditions) to a limit that is independent of the distribution of starting points. Intuitively, this limit is the PageRank of P, and is taken to be an importance score for P, since it reflects the number of people expected to be looking at P at any one time.

Here P denotes a page and r its rank, so the PageRank of P is written r(P). This recursive notion is the basis of most algorithms that use PageRank.

2.1.1 Google Personalized Search

On March 29, 2004, Google introduced a new tool to help people personalize their web searches.
This tool, Google Personalized Search, allowed people to create a profile of their interests, which then influenced the website links shown when they conducted a search; it was available for testing at Google Labs (Bazeley, 2004). In November 2005, Google Personalized Search left Google Labs and was made available to users on 38 domains in addition to google.com. Google's personalized search gives more weight to topics that interest the user and keeps a history of searches on Google, so that users can revisit previously visited pages simply by scanning their search history.
People who use this service have to sign up for a Google account, such as one used for Gmail, AdSense, and other Google services, like AOL (Sherman, 2005). On February 2, 2007, Google enhanced its personalized search. Anyone who signs up for any Google service using a Google Account (such as Gmail, AdSense, or Google Analytics, among others) is now automatically enrolled in three additional Google products: Search History, Personalized Search, and Personalized Homepage (Sullivan, 2007). Screenshots 1, 2 and 3 show the search history of Google Personalized Search (Appendix).
The Google Hilltop algorithm powers personalized searches on Google. Thibodeau (2003) defines it as follows: the Google Hilltop algorithm determines the relevance and importance of a specific web page determined by the search query or keyword used in the search box. In its basic, simplest form, instead of relying only on the PageRank™ value to find “authoritative pages”, it would be more useful if that “PR value” (PageRank™ value) would be more relevant by the topic or subject of that same page.
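
To make the random surfer model from Section 2.1 and the idea of a topic-relevant rank more concrete, the following Python sketch estimates page ranks by repeated redistribution of surfers over a small link graph. It is an illustration only and is not the Hilltop algorithm or any search engine's actual implementation. The function name pagerank, the damping value of 0.85, the teleport parameter, and the five-page graph are all assumptions introduced here; none of them comes from the sources cited above. The teleport weights are one simple way to bias the rank toward pages on a topic that interests a user.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=100):
    """Estimate the random-surfer limit for every page by power iteration.

    links:    dict mapping each page to the list of pages it links to
    teleport: optional dict of page -> weight describing where a surfer
              jumps when not following a link (uniform if omitted)
    damping:  probability of following a link (0.85 is an assumed value)
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    if teleport is None:
        teleport = {p: 1.0 for p in pages}
    total = sum(teleport.get(p, 0.0) for p in pages)
    jump = {p: teleport.get(p, 0.0) / total for p in pages}

    # Start with the surfers spread evenly over all pages.
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Surfers who do not follow a link land according to `jump`.
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Surfers on p move to a random out-neighbor of p.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # A page without out-links sends its surfers to `jump`.
                for q in pages:
                    new_rank[q] += damping * rank[p] * jump[q]
        rank = new_rank
    return rank


# Hypothetical toy graph: a portal linking to a cooking side and a car side.
links = {
    "portal": ["recipes", "garage"],
    "recipes": ["bread", "portal"],
    "garage": ["engines", "portal"],
    "bread": ["recipes"],
    "engines": ["garage"],
}

uniform = pagerank(links)
biased = pagerank(links, teleport={"recipes": 1.0, "bread": 1.0})

for page in sorted(links):
    print(page, round(uniform[page], 4), round(biased[page], 4))
```

In this toy graph the two sides mirror each other, so the unbiased ranks of "recipes" and "garage" coincide, while weighting the teleport step toward the cooking pages lifts "recipes" above "garage". This mirrors, in a very reduced form, the idea that a rank value becomes more useful when it reflects the topic a user cares about.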