Toward the Analysis of over 10 billion Web pages and Reliability Verification of Search Engines’ Hit Counts: Current Status and Future Directions

Dr. Hayato Yamana
Waseda University, Japan


Abstract

Two topics will be presented during this colloquium.

Part 1: Toward the Analysis of over 10 billion Web pages, by Hayato YAMANA (Prof. of Waseda Univ.)

In the Japanese project called e-Society, over 10 billion Web pages have been gathered. This data opens up many research challenges. In this talk, I will show some statistical results obtained by analyzing the data, and I will discuss future research challenges that the data makes possible.

Part 2: Reliability Verification of Search Engines’ Hit Counts: Current Status and Future Directions, by Koh Satoh (Bachelor Student at Yamana Laboratory) and Hayato YAMANA

In this talk, we provide a scientific basis for adopting search engines’ hit counts, i.e., the numbers returned as search result counts. Since many studies use hit counts to estimate the popularity of a particular query, the reliability of hit counts is indispensable for achieving trustworthy studies. However, hit counts are unreliable because they "dance", i.e., change, when a user clicks the "Search" button more than once, clicks the "Next" button on the search results page, or queries the same term on another day. We have analyzed the characteristics of hit count transitions by gathering various types of hit counts over two months using 10,000 queries. We then evaluated the basis we defined. Finally, we will present our future plans.