CONFERENCE: Future of Web Search
PRESENTED BY: Ricardo Baeza-Yates
DATE: May 19-20, 2004
LOCATION: Universitat Pompeu Fabra (UPF), Barcelona

PAOLO BOLDI Graph Fibrations, graph isomorphism, and PageRank (Università degli Studi di Milano)
This is theoretical work, lacking practical applications. The starting point is covering projections in algebraic topology: a graph can be turned into a topological space by considering its geometric realization. A graph fibration is a graph morphism onto a base graph H such that every arc of H can be uniquely lifted once the target of its lifting is chosen. It is a kind of local isomorphism: the arcs entering each node correspond one-to-one to the arcs entering its image in H.

PANAYIOTIS TSAPARAS Theoretical analysis of Link Analysis Ranking (LAR) algorithms
A link denotes an endorsement. There is a plethora of LAR algorithms, and this study tries to find a way to compare them. The presenter shows different ways to calculate the distance between rankings. SVD (Singular Value Decomposition) discovers the linear trends in data.

LUCA BECCHETTI Using Rank Propagation and Probabilistic Counting for Link-based
This talk is about Web spam. Link farms are fake pages that link to a spam page to increase its ranking. The research goal is statistical analysis for Web spam detection. They put together a test collection of UK hosts and calculated some statistical indicators on their incoming and outgoing links. PageRank can also be used for spam detection: if the PageRank values of the pages linking to a page follow the expected statistical distribution, the page is probably fine; otherwise, the incoming links and pages may be machine-generated. TrustRank was also mentioned. Other interesting approaches use explicit path detection together with PageRank.
RELATED BLOG POST -> http://www.seobythesea.com/?p=185

ANDRAS BENCZUR Searching the Web with Low Space Approximations
The presenter proposes using sketching projections instead of reductions.

DANIEL GRUHL Blog, Friendship and Geography
The blogspace, also called the blogosphere, is the collection of weblogs. Blogs are quirky and highly personal in look and feel, and each is read by a small number of regular visitors who come back repeatedly.
Each blogger has a profile. The speaker proposes a reformulation of Stanley Milgram's study to discover a greedy routing algorithm for the blogger population. An exploration of the experimental data from the LiveJournal population shows that the data does not behave as expected, so he proposes a new definition of distance based on the number of nodes that are closer.
Q: instead of guessing the rule, is it possible to learn it?

RONNY LEMPEL Compact indexing of versioned data
The work builds on earlier work on indexing shared content. These methods do not work if the quoted text is changed or interleaved with the replies.

ALESSANDRO PANCONESI The Music of the (p) Spheres (Università La Sapienza)

DEVDATT DUBHASHI The Wisdom of Crowds (who surf)
A crowd is better than an expert; Wikipedia is an example. The application examples are collaborative filtering and simultaneous categorization of people and products.

SEBASTIANO VIGNA Minimal-Interval semantics
A minimal interval is a span of a document that satisfies the query and contains no smaller span that also satisfies it. This semantics can be used for phrasal queries. MG4J is the engine they built using this technique. See also twease.org.

GERHARD WEIKUM Efficient Top-k Queries for XML Information Retrieval (Max Planck Institute for Informatics)
He shows some example queries that Google does not help to answer. The proposed solution is semantic search and the Semantic Web. The structure provided by XML on the Semantic Web can help combine the content of different Web pages that are linked according to different criteria. The core of this study is automatic query expansion using the information contained in the XML descriptors. There is not much semantic XML on the net, but Wikipedia is a good example. The presenter shows how to extract this information and use it in a completely different setting.

JUAN HUETE Application of Information Diagrams
Structured information retrieval uses the metadata available for a document to improve retrieval precision.
They tried to extend retrieval based on Bayesian networks to structured IR.

HUGO ZARAGOZA Tuning Error Optimisation in Ad-Hoc Retrieval (Yahoo! Research Barcelona)
Machine learning is useful for deciding how a new feature should be combined with a classical information retrieval engine. The aim is to explore the space of possible combinations.

DAVID LOSADA The importance of Sentence Recognition

JOSIANE XAVIER PARREIRA Efficient and Decentralized PageRank Approximation in a P2P Web Search Network
Peer-to-peer search. In this research field every node holds a part of the index, and the goal is to put these parts together in a meaningful and efficient way.

NICK CRASWELL Image Search Live (Microsoft Research Cambridge)
Studying the query logs of the live service, they started looking at more obscure, less frequent queries. Their ranking system, RankNet, uses more than 200 features. One result shows that people go deeper into the result list when looking for images. They studied different interfaces and realized that users often scrolled down and clicked "next"; to solve this, they considered implementing an "infinite scroll". They used blog search to gather opinions on their image search implementation. They used Graphviz to show how the images could be grouped, and also used this technique to understand how users browsed the collection. One of the measures they used is Normalised Discounted Cumulative Gain (NDCG).

PAUL-ALEXANDRE CHIRITA Current Approaches to Personalized Web Search
User profiling is a core component of search systems. There is a gap in the current literature on personalized search: existing work focuses either on where to gather personal information or on how to represent it. A simple idea is to compute a personalized PageRank. The presenter shows a couple of other techniques for computing personalized PageRank.
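Chirita's first idea, a personalized PageRank, can be illustrated with a minimal sketch. It assumes the common formulation in which PageRank's uniform random jump is replaced by a user-specific teleport vector; the graph format, the function name and the damping value 0.85 are illustrative assumptions, not details from the talk.

```python
def personalized_pagerank(out_links, teleport, alpha=0.85, iterations=100):
    """Personalized PageRank by power iteration.

    out_links: dict mapping every node to the list of nodes it links to.
    teleport:  dict mapping nodes to jump probabilities (should sum to 1);
               a uniform vector gives ordinary PageRank.
    alpha:     probability of following a link instead of jumping.
    """
    nodes = list(out_links)
    rank = {n: teleport.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        # Every node first receives its share of the random jump.
        new_rank = {n: (1 - alpha) * teleport.get(n, 0.0) for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                # Spread the followed-link mass evenly over the out-links.
                share = alpha * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: send its mass back through the teleport vector.
                for m in nodes:
                    new_rank[m] += alpha * rank[n] * teleport.get(m, 0.0)
        rank = new_rank
    return rank


# Toy example: a 3-cycle where the user's preferences point only at "a".
graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
scores = personalized_pagerank(graph, teleport={"a": 1.0})
print(scores)
```

Because all the jump mass lands on "a", the nodes closest to it along the links ("a", then "b", then "c") get the highest scores; with a uniform teleport vector the same function returns plain PageRank.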
Another research area is filtering the results according to different criteria. Another idea is to apply re-ranking techniques. Another approach uses LSA in personalized search (Sun et al.: CubeSVD, WWW 2005). Future directions include more complex profiling that takes data from the user's desktop.

RICARDO BAEZA-YATES Applications of Mining Web Queries: Web Mining and Web Search
There is a huge collection of observed data: query logs and user logs. We also have users' actions, which can tell us a lot about the search experience. [1] mindset.research.yahoo.com
The idea is to personalize not for the user but for the task. Another objective of the study was topic extraction through query analysis.

PAOLO ROSSO Web mining for Natural Language Engineering Tasks

ANDREI BRODER From Query Based Information Retrieval to context driven Information Supply
An interesting presentation on the history of IR. Pew Foundation reports were mentioned. Third-generation engines manage to recognize the keywords in the query in order to tailor the information proposed to the user. The main move was from syntactic matching to trivial semantic matching, which is, IMO, quite effective. According to him, there should be no need to formulate queries: the information should come to you directly. Finger was a protocol for checking whether a user was online; now Skype does the same much more accessibly. Car navigation systems are another good example. How does this work in the IR context? The idea is that there should be a constant feed of current conditions to the search system. Web advertising is an example of information supply.
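Boldi's definition of a graph fibration can be checked mechanically on small graphs. The sketch below assumes graphs stored as dictionaries from arc names to (source, target) pairs; the representation and the function name are hypothetical, chosen only to make the unique-lifting condition concrete.

```python
def is_fibration(g_arcs, b_arcs, node_map, arc_map):
    """Check whether (node_map, arc_map) is a graph fibration from G to base B.

    g_arcs, b_arcs: dict arc_id -> (source_node, target_node) for G and for B.
    node_map: every node of G -> node of B.  arc_map: every arc of G -> arc of B.
    """
    # 1. It must be a graph morphism: arc endpoints are mapped consistently.
    for arc, (s, t) in g_arcs.items():
        bs, bt = b_arcs[arc_map[arc]]
        if node_map[s] != bs or node_map[t] != bt:
            return False
    # 2. Unique lifting: for every arc a of B and every node x of G lying over
    #    the target of a, exactly one arc of G ends at x and is mapped to a.
    for a, (_, a_target) in b_arcs.items():
        for x, image in node_map.items():
            if image != a_target:
                continue
            lifts = [e for e, (_, t) in g_arcs.items() if t == x and arc_map[e] == a]
            if len(lifts) != 1:
                return False
    return True


# A directed 2-cycle x <-> y folds onto a single node v carrying one loop l.
base = {"l": ("v", "v")}
cycle = {"e1": ("x", "y"), "e2": ("y", "x")}
folding = {"x": "v", "y": "v"}
print(is_fibration(cycle, base, folding, {"e1": "l", "e2": "l"}))  # True
```

Dropping the arc y -> x breaks the property: node x then lies over v but has no arc to lift the loop l onto.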
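Tsaparas compared LAR algorithms through distances between the rankings they produce. One standard choice is the normalised Kendall tau distance, the fraction of node pairs that two rankings order in opposite ways. It is used here as an assumption: the measures shown in the talk may differ, for example in how they treat ties.

```python
from itertools import combinations


def kendall_tau_distance(scores_a, scores_b):
    """Fraction of node pairs that two rankings order in opposite ways.

    scores_a, scores_b: dict node -> score from two LAR algorithms, over the
    same nodes. Pairs tied in either ranking are not counted as disagreements.
    """
    pairs = list(combinations(sorted(scores_a), 2))
    discordant = 0
    for u, v in pairs:
        # Opposite signs mean the two algorithms disagree on this pair.
        if (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v]) < 0:
            discordant += 1
    return discordant / len(pairs) if pairs else 0.0


hubs = {"x": 3, "y": 2, "z": 1}
other = {"x": 3, "y": 1, "z": 2}
print(kendall_tau_distance(hubs, other))  # only the pair (y, z) disagrees
```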
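Becchetti's talk pairs rank propagation with probabilistic counting. The counting half can be sketched assuming the classic Flajolet-Martin estimator: each node gets random bitmasks, the masks are ORed along incoming links once per step, and the position of the lowest unset bit estimates how many distinct nodes reach a page within d links. The mask width, the number of masks and the function names are illustrative assumptions, not details of the presented system.

```python
import random

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def random_fm_mask(rng, bits=32):
    """A one-bit mask where bit i is chosen with probability 2^-(i+1)."""
    i = 0
    while i < bits - 1 and rng.random() < 0.5:
        i += 1
    return 1 << i


def lowest_zero_bit(mask):
    """Index of the lowest bit that is not set."""
    r = 0
    while mask & (1 << r):
        r += 1
    return r


def estimate_supporters(in_links, max_distance, num_masks=64, seed=0):
    """Estimate, for each node, how many distinct nodes (itself included)
    reach it through at most d links, for d = 1..max_distance.

    in_links: dict mapping every node to the list of nodes that link to it.
    """
    rng = random.Random(seed)
    nodes = list(in_links)
    # Distance 0: each node's masks describe only the node itself.
    reach = {n: [random_fm_mask(rng) for _ in range(num_masks)] for n in nodes}
    estimates = {n: [] for n in nodes}
    for _ in range(max_distance):
        # One propagation step: OR in the masks of every in-neighbour.
        new_reach = {}
        for n in nodes:
            masks = list(reach[n])
            for src in in_links[n]:
                masks = [m | s for m, s in zip(masks, reach[src])]
            new_reach[n] = masks
        reach = new_reach
        for n in nodes:
            mean_r = sum(lowest_zero_bit(m) for m in reach[n]) / num_masks
            estimates[n].append(2 ** mean_r / PHI)
    return estimates


# Chain a -> b -> c: c is reached by b at distance 1 and by a at distance 2.
links_in = {"a": [], "b": ["a"], "c": ["b"]}
est = estimate_supporters(links_in, max_distance=2)
print(est["c"])  # rough estimates; the exact counts are 2 and 3
```

Each step only adds bits to the masks, so a node's estimates never decrease with distance; memory stays at a fixed number of masks per node, which is what makes the approach attractive on a full host graph.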
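Gruhl's reformulation of Milgram's experiment relies on greedy routing. A minimal sketch, assuming Euclidean coordinates stand in for bloggers' geographic locations; `rank_distance` is one possible reading of a distance "based on the number of closer nodes", and all names here are hypothetical.

```python
import math


def rank_distance(u, v, locations):
    """How many people are strictly closer to u than v is."""
    d_uv = math.dist(locations[u], locations[v])
    return sum(1 for w in locations
               if w != u and math.dist(locations[u], locations[w]) < d_uv)


def greedy_route(source, target, friends, locations, max_steps=100):
    """Milgram-style greedy routing: each holder forwards the message to the
    friend geographically closest to the target. Returns the path followed,
    which ends early if no friend is closer than the current holder."""
    def to_target(p):
        return math.dist(locations[p], locations[target])

    path = [source]
    current = source
    for _ in range(max_steps):
        if current == target:
            break
        best = min(friends[current], key=to_target, default=None)
        if best is None or to_target(best) >= to_target(current):
            break  # stuck: greedy routing fails from here
        current = best
        path.append(current)
    return path


where = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
knows = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(greedy_route("a", "d", knows, where))  # ['a', 'b', 'c', 'd']
print(rank_distance("a", "d", where))        # 2 (b and c are closer to a)
```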
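For Vigna's minimal-interval semantics, the simplest case is a conjunctive query: given the sorted positions of each query term in a document, the minimal intervals can be enumerated by repeatedly advancing the term at the left end of the current window. This is a sketch under the assumption that no two terms share a position; it is not MG4J's implementation, which is not reproduced here.

```python
def minimal_intervals(positions):
    """Minimal intervals (left, right) containing an occurrence of every term.

    positions: one sorted list of positions per query term.
    An interval is minimal if no proper sub-interval also contains every term.
    """
    if not positions or any(not p for p in positions):
        return []
    idx = [0] * len(positions)
    result = []
    while True:
        current = [positions[i][idx[i]] for i in range(len(positions))]
        left, right = min(current), max(current)
        # Candidates have increasing left ends, so a previous candidate with
        # the same right end strictly contains this one and is not minimal.
        while result and result[-1][1] >= right:
            result.pop()
        result.append((left, right))
        # Advance the term sitting at the left end of the window.
        i = current.index(left)
        idx[i] += 1
        if idx[i] == len(positions[i]):
            return result


print(minimal_intervals([[1, 7], [3, 9]]))  # [(1, 3), (3, 7), (7, 9)]
```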
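Craswell's evaluation used NDCG. A minimal sketch of one common variant, assuming graded relevance judgements, gain 2^g - 1 and a log2 position discount, and normalising by the ideal reordering of the same judged list (many evaluations normalise by all judged documents for the query instead):

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k results."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 is undiscounted.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg(gains, k):
    """DCG normalised by the DCG of the ideal (descending-gain) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


print(round(ndcg([1, 3], 2), 3))  # 0.71: the better image is ranked second
```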