TY - JOUR
AB - In this paper, we describe six algorithmic problems that arise in web search engines and that are unsolved or only partially solved: (1) uniform sampling of web pages; (2) modeling the web graph; (3) finding duplicate hosts; (4) finding top gainers and losers in data streams; (5) finding large dense bipartite graphs; and (6) understanding how eigenvectors partition the web.
AU - Henzinger, Monika H
ID - 11762
IS - 1
JF - Internet Mathematics
SN - 1542-7951
TI - Algorithmic challenges in web search engines
VL - 1
ER -

TY - CONF
AB - Web search engines have emerged as one of the central applications on the Internet. In fact, search has become one of the most important activities that people engage in on the Internet. Even beyond becoming the number one source of information, a growing number of businesses are depending on web search engines for customer acquisition. In this talk I will briefly review the history of web search engines: The first generation of web search engines used text-only retrieval techniques. Google revolutionized the field by deploying the PageRank technology – an eigenvector-based analysis of the hyperlink structure – to analyze the web in order to produce relevant results. Moving forward, our goal is to achieve a better understanding of a page with a view towards producing even more relevant results. Google is powered by a large number of PCs. Using this infrastructure and striving to be as efficient as possible poses challenging systems problems but also various algorithmic challenges. I will discuss some of them in my talk.
AU - Henzinger, Monika H
ID - 11801
SN - 0302-9743
T2 - 12th Annual European Symposium on Algorithms
TI - Algorithmic aspects of web search engines
VL - 3221
ER -

TY - CONF
AB - Web search engines have emerged as one of the central applications on the Internet. In fact, search has become one of the most important activities that people engage in on the Internet.
Even beyond becoming the number one source of information, a growing number of businesses are depending on web search engines for customer acquisition. The first generation of web search engines used text-only retrieval techniques. Google revolutionized the field by deploying the PageRank technology – an eigenvector-based analysis of the hyperlink structure – to analyze the web in order to produce relevant results. Moving forward, our goal is to achieve a better understanding of a page with a view towards producing even more relevant results.
AU - Henzinger, Monika H
ID - 11800
SN - 0302-9743
T2 - 31st International Colloquium on Automata, Languages and Programming
TI - The past, present, and future of web search engines
VL - 3142
ER -

TY - CONF
AB - In this article we describe the approach taken by the first web search engines, discuss the state of the art, and present some of the challenges for the future.
AU - Henzinger, Monika H
ID - 11859
SN - 0277-786X
T2 - SPIE Proceedings
TI - The past, present, and future of web information retrieval
VL - 5296
ER -

TY - JOUR
AB - The World Wide Web provides an unprecedented opportunity to automatically analyze a large sample of interests and activity in the world. We discuss methods for extracting knowledge from the web by randomly sampling and analyzing hosts and pages, and by analyzing the link structure of the web and how links accumulate over time. A variety of interesting and valuable information can be extracted, such as the distribution of web pages over domains, the distribution of interest in different areas, communities related to different topics, the nature of competition in different categories of sites, and the degree of communication between different communities or countries.
AU - Henzinger, Monika H
AU - Lawrence, Steve
ID - 11877
IS - suppl_1
JF - Proceedings of the National Academy of Sciences
SN - 0027-8424
TI - Extracting knowledge from the World Wide Web
VL - 101
ER -