TITLE OF PAPER: Keynote and User Experience papers
PRESENTED BY:
CONFERENCE: SIGIR
DATE: August 7, 2006
LOCATION: University of Washington, Seattle, USA
--
REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:

Jamie Callan introduces Keith van Rijsbergen, who published the book Information Retrieval. Fabio Crestani was one of his students.

KEITH VAN RIJSBERGEN
KEYNOTE: QUANTUM HAYSTACKS
Keith worked with Gerry Salton. He mentions two books that highly influenced his work: G. Salton, Automatic Information Organization and Retrieval, 1968, and N. Jardine & R. Sibson, Mathematical Taxonomy, 1971.
We need theories to combine logic and probability. Top-down versus bottom-up (principled theories). Science of the Artificial. There is a need for test collections.
Relevance is not a static notion. How do we handle relevance when it is dynamic? The probability of replication is the ability to port the result obtained in a certain test situation to another context. Relevance can be considered as an event or a property.
Work on clustering has gone backwards. You can start thinking about clustering without introducing algorithms. Similarity measurements are still an open question. Can I measure the effectiveness of the clustering? Can we describe it mathematically?
The cluster hypothesis: cluster-based retrieval has as its foundation a hypothesis, which states that closely associated documents tend to be relevant to the same request.
What are you trying to measure? Underlying conjoint structure mapped to a numerical representation. The E and F measures serve to compare point results and curve results -> interpolation.
The IR demon is the equivalent of Maxwell's demon for probabilistic information retrieval.

EUGENE AGICHTEIN
The goal is to harness rich user interactions with search results to improve the quality of search. Linking implicit interactions and explicit judgments [Fox et al, TOIS 2005]. The browsing behavior of individual users is influenced by many factors. Rich user interaction space.
One of the goals is to predict user preference. RankNet is a neural net trained specifically for ranking. One of the goals of the work was to understand which presentation features were helpful for retrieval.

ANDREW TURPIN
User Performance versus Precision Measures for Simple Search Tasks
Do metrics match user experience? We can calculate the Mean Average Precision (MAP) for a set of queries and then do some statistics on top of those values.
Assumption: more relevant documents high in the list is good.
Do users want more than one relevant document?
Do users read lists top to bottom?
Who determines relevance? Binary? Conditional or state-based?
MAP is tractable but it does not reflect the user's experience. In the experiment in 2000 they showed that even when MAP differed, user experience was the same. In this year's experiment, they artificially constructed lists of results with different MAP values. They found that users needed the same time to find the relevant results.
The conclusion is that the metrics in use, like MAP, P@1 and P@5, do not allow us to compare IR systems: the assumption that an increase in MAP translates into an increase in user performance or satisfaction is not true.

EUGENE AGICHTEIN
Web Search Ranking -> users can help by indicating which results are more relevant.
User behavior in the wild is not reliable. How can we integrate user interaction into ranking?
Personalization -> rerank
Collaborative filtering -> DirectHit
General ranking
They used RankNet [Burges et al]. Their idea was to integrate user behavior directly into the ranker. Incorporating user behavior into the ranking algorithm dramatically improves relevance.

GUANG FENG
AggregateRank: Bringing Order to Websites
PageRank
HostRank is a reformulation of PageRank which, instead of the links between pages, uses the addresses of the machines on which the pages are hosted. According to this research we only need to rank websites. It is not a good idea to base the ranking on the pages.
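
As a rough illustration of ranking at the host level rather than the page level (the idea noted for HostRank above), the sketch below collapses page-to-page links into host-to-host links and runs a plain power-iteration PageRank on the result. It is a generic illustration, not the AggregateRank algorithm presented in the talk; the function names, the example URLs, the damping factor of 0.85 and the iteration count are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_graph(page_links):
    """Collapse page-level links into host-level links, dropping intra-host links."""
    graph = defaultdict(set)
    for src, dst in page_links:
        src_host = urlparse(src).netloc
        dst_host = urlparse(dst).netloc
        graph[src_host]  # touch the key so every host appears as a node
        graph[dst_host]
        if src_host != dst_host:
            graph[src_host].add(dst_host)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict {node: set of out-neighbours}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            for target in graph[node]:
                new_rank[target] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


# Hypothetical page-level links (made-up URLs, for illustration only).
links = [
    ("http://a.example/1", "http://b.example/1"),
    ("http://a.example/2", "http://b.example/2"),
    ("http://a.example/1", "http://a.example/2"),
    ("http://b.example/1", "http://c.example/1"),
    ("http://c.example/1", "http://b.example/1"),
]
host_rank = pagerank(host_graph(links))
for host, score in sorted(host_rank.items(), key=lambda item: -item[1]):
    print(host, round(score, 3))
```

In this toy graph the host that receives links from both other hosts ends up with the highest score, and the host with no incoming links ends up with the lowest. Collapsing to hosts throws away intra-site link structure, which is one reason a host-level score can differ from a score aggregated over the pages of a site.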
Structural re-ranking
Given an initially retrieved list in response to a query, the goal of re-ranking is to obtain high precision at the top ranks using similarity between the documents.

NIE
Topical link analysis
Rottentomatoes does not answer a query about tomatoes. Traditional link analysis does not help because the web site is famous for entertainment but not for food. The idea is to include some topic information in the ranking.

SERGEI VASSILVITSKII
Relevance Feedback in Web Search
Web search is a non-interactive system. Exceptions are spell checking and query suggestions. The idea is to collect feedback from the user in a very soft way: using smileys.
Hypothesis: relevant pages tend to point to other relevant pages. Irrelevant pages tend to be pointed to by other irrelevant pages.
The algorithm uses existing information about the relevance of some pages to the initial query. From these initial pages, it computes the graph of connected pages. If any of these are returned by the query, it adjusts the ranking to match this knowledge.

FERNANDO DIAZ
Improving the Estimation of Relevance Models Using Large External Corpora
The Mixture of Relevance Models is a technique that combines an external corpus with a target corpus to improve the ranking on the target corpus.

TAO TAO
Regularized Estimation of Mixture Models for Robust Pseudo-Relevance Feedback
Queries are usually very short. Pseudo feedback is a way of expanding the query. Parameter sensitivity is a major challenge of pseudo feedback. How can we make pseudo feedback more robust? Can we set the parameters automatically?

--
REFERENCES:
{as documents / sites are referenced add them below}
...
--
NOTES ON / KEY TO THIS TEMPLATE:
A headline (like a field in a database) will be CAPITALISED
This differentiates from the text that follows
A variable that you can change will be surrounded by _underscores_
Spaces in variables are also replaced with under_scores
This allows people to select the whole variable with a simple double-click
A tool-tip is lower case and surrounded by {curly brackets / parentheses}
These supply helpful contextual information.
--