Monthly Archive for April, 2006

Sequence matrices of the visual information retrieval experiment

I completed the script that extracts sequences from the usage patterns in the visual information retrieval experiment. The first matrix is the cumulative representation for the LSI method across the different tasks; the second is the same for CNG. To read the cells, the following key is needed: the rows are the events at time T, and the columns are the events at time T+1. In both directions the first element is case A, followed by B, C and D. The states are summarised as follows:

A: The user reads a document and then selects a document in the relative neighborhood of the first document;

B: The user reads a document and then moves away from the cluster of the first document;

C: The user selects and stays in the cluster;

D: The user selects and moves away from the cluster.

LSI:

[19, 132,  0,  0]
[83, 588, 10, 90]
[ 4,  49,  0,  0]
[14, 104,  0,  2]

CNG:

[ 5,  88,  0,  1]
[48, 537, 17, 98]
[ 2,  49,  0,  0]
[ 6, 109,  1,  2]
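As a minimal sketch of how such a matrix can be tallied (this is not the script used for the experiment; the function name and the toy sequence are hypothetical), each pair of consecutive events adds one to the cell whose row is the state at time T and whose column is the state at time T+1:

```python
STATES = ["A", "B", "C", "D"]

def transition_matrix(events, states=STATES):
    """Count transitions: rows are the state at time T, columns at T+1."""
    index = {s: i for i, s in enumerate(states)}
    matrix = [[0] * len(states) for _ in states]
    # Pair every event with the one that immediately follows it.
    for current, following in zip(events, events[1:]):
        matrix[index[current]][index[following]] += 1
    return matrix

# Toy sequence, invented for illustration only.
toy = ["A", "B", "B", "D", "B", "A"]
counts = transition_matrix(toy)
for row in counts:
    print(row)
```

A sequence of n events always yields n - 1 transitions, which is a quick sanity check on the totals.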

The next step is to verify whether these frequencies differ from the expected ones. At first sight, the most frequent move seems to be ‘read + long jump’ (case B). We now need to check whether this difference is significant.
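One possible way to run this check, sketched here under the assumption that a Pearson chi-square test of independence is an acceptable first pass (the post does not say which test was eventually used, and the function names are hypothetical), is to derive the expected counts from the row and column totals and compare them with the observed ones:

```python
def expected_counts(observed):
    """Expected cell counts if rows (time T) and columns (T+1) were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
        if e > 0  # skip cells whose row or column total is zero
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

# First matrix from the post (LSI).
lsi = [
    [19, 132, 0, 0],
    [83, 588, 10, 90],
    [4, 49, 0, 0],
    [14, 104, 0, 2],
]
stat, dof = chi_square(lsi)
print(f"chi-square = {stat:.2f}, dof = {dof}")
```

With 9 degrees of freedom the statistic would be compared against a critical value of about 16.92 at the 5% level. Several expected counts in the C column are well below 5, so the chi-square approximation is shaky here and an exact or permutation test may be a safer choice.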

P.S. To decide whether a move stays within the cluster we used the sequence determined by the Minimum Spanning Tree, which is extremely selective in defining “in-cluster” movements.

More sense from audit trails: exploratory sequential data analysis

T. Judd and G. E. Kennedy. More sense from audit trails: exploratory sequential data analysis. In R. Atkinson, C. McBeath, D. Jonas-Dwyer, and R. Phillips, editors, Beyond the comfort zone: Proceedings of the 21st ASCILITE Conference, pages 476–484, Perth, Australia, December 5-8, 2004. [pdf]

——————

This paper introduces the use of exploratory sequential data analysis (ESDA) to detect, quantify and correlate patterns within audit trail data. We describe four sequence analysis techniques and use them to analyse data from 34 students’ attempts at an interactive drag and drop task. Using a model sequence of events based on the task’s underlying educational design as reference, we employed these techniques to: (i) calculate an ‘average’ sequence of events based on individual user sequences, (ii) characterise individual sequences in terms of their similarity to the design model, (iii) identify common partial sequences within individual sequences, and (iv) characterise transitions between two disparate actions within the task. We then used the results of these analyses to explore why most students failed to complete all components of the task.

We suggest that it was not because the task was too long or that it lacked challenge but that students intentionally and selectively ignored certain non-key steps in the task. It is our contention that ESDA techniques, in conjunction with judiciously collected audit trail data, represent a powerful and compelling tool for educational designers and researchers.

State Chart


Found some effects in the information retrieval experiment

Today I spent some more time exploring the Visual Information Retrieval experiment. Apparently, the users scored better using CNG for the second task (see the picture below). This task was much more complex than the first one because of the smaller number of correct results in the dataset (5 out of 1000). Surprisingly, I also found another effect, on the time required to select the relevant results: users seem to have spent less time selecting the relevant results in the first task when using CNG.

Cng Task2

Discussing these results with my advisor, we agreed that they are interesting but not sufficient to explain the effects or to decide which method gives better results.

Some further exploration is needed. First, I have to use the Wilcoxon test for repeated measures to tie the results of the first task to those of the second. Second, it would be interesting to compute the average distance between consecutively read items for each algorithm. This might be a good parameter to represent the ability of the algorithm to group together relevant or irrelevant results.
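The second measure could be computed along these lines. This is only a sketch: it assumes that each read item has (x, y) coordinates on the map and that Euclidean distance is the relevant metric, and both the function name and the sample trail are hypothetical.

```python
import math

def mean_consecutive_distance(points):
    """Average Euclidean distance between each read item and the next one."""
    if len(points) < 2:
        return 0.0
    steps = [math.dist(p, q) for p, q in zip(points, points[1:])]
    return sum(steps) / len(steps)

# Hypothetical map coordinates of items, in the order they were read.
read_trail = [(0, 0), (3, 4), (3, 10), (11, 10)]
print(mean_consecutive_distance(read_trail))
```

Computing this value per user and per algorithm would give paired numbers that can then be fed to the repeated-measures test.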

Google Maps now covers all Europe: Yipeeh!

The technological choice I made some time ago finally starts to make full sense. It is now possible to use STAMPS everywhere in Europe with a decent level of detail. The picture below shows Neuchatel, in Switzerland. So, please join the first trial: shoutspace [at] gmail.com !

Google Maps Eu

Copyright notice: 2006 Google-Map data; 2006 TeleAtlas.


Further analysis of the Visual Information Retrieval experiment

Ok, so far the analysis of the map interaction shows that there is no huge difference in the way the users interact with the map. LSI seems to spread the results all over the available space. CNG, on the contrary, returns rankings which are much wider than those of LSI, with the effect of pushing the points towards the top part of the quadrant.

The interesting fact which emerges is that CNG offers the relevant points much closer to the query origin, in the central lower part of the map. The subsequent analysis of the items selected shows exactly this point: while LSI mixes the most relevant items in the jungle of the other results, CNG keeps them much closer to the query origin in a space that is less populated by other results (see the map below).

Lsivscng Map Selected

So, where to go next? Two ideas so far: (a) map the final selected points to see whether they are more dispersed with CNG or LSI; (b) trace the interaction trails to assess whether there are common patterns of exploration of the points in the map.
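For idea (a), one simple way to put a number on "dispersed" is the mean distance of the selected points from their centroid. The sketch below assumes this definition (others, such as the variance or the convex hull area, would also work) and uses invented coordinates; the function name is hypothetical.

```python
import math

def dispersion(points):
    """Mean Euclidean distance of the points from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum(math.dist((x, y), (cx, cy)) for x, y in points) / n

# Invented coordinates, for illustration only.
tight = [(0, 0), (2, 0), (2, 2), (0, 2)]
spread = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(dispersion(tight), dispersion(spread))
```

A larger value for one method would indicate that its selected items are scattered over a wider area of the map.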

Apart from this extra analysis on the map, I am more concerned with the numerical results from the experiment. Our goal is still to check whether there is a difference between CNG and LSI in the user experience. Does one of the two methods outperform the other with respect to: (1) relevant results selected; (2) time needed to select the relevant results; (3) user perception of appropriateness; (4) number of queries composed; (5) average time spent on each query; …


heatmap of the visual information retrieval experiment – part 3

I finally corrected a couple of bugs in the script that I was using for the visualization. Then I spent a considerable amount of time trying to overlay the resulting heatmap on the original results map I obtained before. PIL is a great resource for this. I found a couple of hacks that pointed in that direction [1], [2]. Unfortunately, they did not work in my case.

In the end I decided to try the built-in PIL function called blend. It worked well for my situation.

So what the script does is:

1. divide the map into a certain number of cells;

2. count how many items pile up in each cell;

3. export the resulting matrix as CSV (comma separated values);

4. open this file with R via RPy from within the script;

5. process this file to obtain the desired heatmap;

6. save the heatmap in an external file;

7. process the heatmap with PIL to make it match the other images;

8. blend the heatmap over the other existing images.
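Steps 1 to 3 can be sketched with the standard library alone. This is an illustration, not the original script: the function names, the map size and the item positions are assumptions, and steps 4 to 8 (R via RPy, then PIL) are not reproduced here.

```python
import csv
import io

def cell_frequencies(points, width, height, n_cols, n_rows):
    """Count how many points fall into each cell of an n_rows x n_cols grid."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        # Clamp so that points on the right/bottom edge land in the last cell.
        col = min(int(x * n_cols / width), n_cols - 1)
        row = min(int(y * n_rows / height), n_rows - 1)
        grid[row][col] += 1
    return grid

def to_csv(grid):
    """Serialise the frequency matrix as comma separated values."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()

# Invented item positions on a 300 x 300 pixel map.
items = [(10, 10), (20, 30), (150, 150), (299, 299), (300, 0)]
grid = cell_frequencies(items, width=300, height=300, n_cols=3, n_rows=3)
print(to_csv(grid))
```

The sketch assumes non-negative coordinates with the origin in the top-left corner, as is usual for image coordinates; a different origin would flip the rows.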

The part of the code that does the “interesting” work is reported below. The result is shown in the picture below.

Lsivscng Map Freq




Supporting Language Learning Communities Using Mobile Blogs

S. A. Petersen, G. Chabert, and M. Divitini. Supporting language learning communities using mobile blogs. In Proceedings of IADIS Mobile Learning’06, Dublin, Ireland, 14-16 July 2006.

———–


Communities are important for language learners to learn and practice the language. A classroom provides a sense of community for a group of students learning a language. We aim to extend the learning arena outside of the classroom and maintain the sense of community while the students are mobile. We propose the use of a mobile community blog to encourage collaboration among the students and to bridge the disconnection that is caused when some of the students travel abroad to improve their language. We discuss considerations that have been taken into account in adapting standard blog functionalities to support a community of mobile language learners.


heatmap of visual information retrieval experiment

After a couple of hacks to make my Python installation work with R, I finally managed to export a heatmap of the cereals task. The map is upside down, but it essentially shows that the most populated part of the map is the upper-central one.

The plan is to do the same for the number of items selected in relation to the total in each spot. This might show a different usage trend. Maybe the parts of the map that were “explored” less were the most important in terms of the number of correct items retrieved in that spot. We will see…

Useful links:

[1] A nice tutorial on how to draw a heatmap with Python and R;

[2] Draw a Heat Map, the R manual page.

Heatmap Cc


STAMPS in the press

The newspaper “La Tribune de Genève” published today an article on STAMPS and the field trial we are organising. The article covers the theme of location based services and why it is so interesting to know where we are. The journalist, Emmanuel Grandjean, quotes a few things I said during the interview. One of the main claims I made is that communication is more and more fluid and that LBSs add a new dimension to this flexibility.

Another claim is that communication tends to be maximally efficient. Knowing where our partner is located already rules out many inferences about what can be communicated, and how, given the partner's location and our own.

Stamps Tribune De Geneve-1


building a heatmap of the map usage

Back from the Easter holidays, I discussed this cumulative map representation with Patrick. The goal is to make visible the parts of the map that attracted most of the users’ attention or time. For this reason I was trying to count the number of items that fall into each sub-area of the map, how many of these items were read by the users, and how many were selected by the users.

One of the possible visualization strategies that we discussed is to use ‘heatmaps’, a kind of two-dimensional gradient that can highlight hot spots on the map (they are widely used in cognitive science to represent eye tracking data).

Another idea that we discussed briefly is to show the sequence of actions on the map. In this way it will eventually be possible to detect whether there are common patterns of usage among the users (i.e., how many moved from square 1 to square 2?, and so on).

For the moment I am stuck with a minor visualization bug …

Lsivscng Map


Cumulative density map of the Visual Retrieval experiment

I have been poking around to generate a visualization that can help understand how the users interacted with the map in our visual retrieval experiment. It was fun to discover that some zones of the map were used more than others, namely the central upper quadrant and the central left and right quadrants. The image below shows how densely the results were displayed.

The map has been subdivided into 9 quadrants. In each quadrant it is possible to see three big numbers. The upper one is the number of results that fall into the sector (even if you see few points, results of similar queries may be superimposed). The lower-left big number in blue is the number of items that have been read by the users. The red one is the number of items that have been selected by the users in that quadrant.
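The three numbers per sector could be tallied as in the following sketch. The record layout (position plus read/selected flags), the map size and the function names are assumptions made for illustration; the actual log format of the experiment is not described in the post.

```python
def sector_of(x, y, width, height, n=3):
    """Return (row, col) of the sector containing the point (x, y)."""
    col = min(int(x * n / width), n - 1)
    row = min(int(y * n / height), n - 1)
    return row, col

def sector_counts(items, width, height, n=3):
    """Per sector: number of results shown, read, and selected."""
    counts = [[{"shown": 0, "read": 0, "selected": 0} for _ in range(n)]
              for _ in range(n)]
    for item in items:
        row, col = sector_of(item["x"], item["y"], width, height, n)
        cell = counts[row][col]
        cell["shown"] += 1
        cell["read"] += item["read"]          # True counts as 1
        cell["selected"] += item["selected"]  # True counts as 1
    return counts

# Hypothetical log records: position plus read/selected flags.
log = [
    {"x": 150, "y": 20, "read": True, "selected": True},
    {"x": 160, "y": 40, "read": True, "selected": False},
    {"x": 20, "y": 150, "read": False, "selected": False},
]
counts = sector_counts(log, width=300, height=300)
print(counts[0][1])
```

Dividing the selected count by the shown count in each sector would give the ratio of selected items to the total in that spot.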

Map Lsi Cer


Visual Retrieval Experiment

Dear Reader,

for the development of my thesis work I prepared, together with Lorenzo Viscanti, an experiment on Visual Information Retrieval. I ask for your participation in this experiment, which will last no more than 15-20 minutes.

If you agree, you will need to visit the following web page: http://www.noosfactory.com/visual%5Fir/ . There you will find some more information on the assignment: you will be presented with two tasks, each of which will last 5 minutes. Each task will require you to run some queries on a collection of news articles compiled by Reuters during the ’80s. Finally, you will be asked 4 brief questions.

Please note that there will be an extra 5 minutes to download the application that you will use to participate in the task. This time may vary depending on your internet connection speed.

Visual Ir


The Lausanne newsroom of 24 Heures

Redaction 24H Lausanne

Minimum Spanning Tree of Urban Tapestries messages

Recently I managed to calculate the Minimum Spanning Tree (MST) of the Urban Tapestries dataset that was collected during the trial of 2004. From Wikipedia:

Given a connected, undirected graph, a spanning tree of that graph is a subgraph which is a tree and connects all the vertices together. A single graph can have many different spanning trees. We can also assign a weight to each edge, which is a number representing how unfavorable it is, and use this to assign a weight to a spanning tree by computing the sum of the weights of the edges in that spanning tree. A minimum spanning tree or minimum weight spanning tree is then a spanning tree with weight less than or equal to the weight of every other spanning tree. More generally, any undirected graph has a minimum spanning forest.

300Px-Minimum Spanning Tree.Svg

Copyright notice: the present content was taken from the following URL, the copyrights are reserved by the respective author/s.

For the Urban Tapestries dataset, I had to adapt a version of Kruskal’s algorithm for minimum spanning trees that was nicely implemented by Prof. David Eppstein (thanks!). The final result is visible below. The next step is to calculate an MST over the semantic distribution of the points and then find a way to compare the two trees, to measure the distortion between the two layers.

Ut Mst V1 Small

Copyright notice: Google 2005, The GeoInformation Group 2006, the copyrights are reserved by the respective author/s.
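For readers who want to see the idea behind Kruskal’s algorithm, here is a generic sketch. It is not Prof. Eppstein’s implementation nor the adapted version used for the dataset: it assumes that the messages are plain 2-D points, that every pair of points is connected, and that edge weights are Euclidean distances. The function name and the sample points are hypothetical.

```python
import math
from itertools import combinations

def kruskal_mst(points):
    """Minimum spanning tree of the complete graph over 2-D points.

    Edge weights are Euclidean distances. Returns a list of
    (weight, i, j) tuples, where i and j index into points.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components: keep it
            parent[root_i] = root_j
            tree.append((weight, i, j))
    return tree

# Invented message locations, for illustration only.
messages = [(0, 0), (0, 1), (3, 1), (3, 5)]
mst = kruskal_mst(messages)
print(mst)
```

Building all pairwise edges is quadratic in the number of points, which is fine for a small trial dataset but would need a sparser candidate graph for much larger ones.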


Using Visualizations to Analyze Workspace Activity and Discern Software Project Evolution

R. M. Ripley, A. Sarma, and A. van der Hoek. Using visualizations to analyze workspace activity and discern software project evolution. ISR Technical Report UCI-ISR-06-1, University of California, Irvine, California, USA, 2006. [pdf]

———————–

In this paper, the authors present a prototype of a 3D visualization for workspace activity. This visualization shows only active workspaces and artifacts and depicts activities as changes to artifacts. Each change to an artifact is denoted as a cylinder, and stacks of such cylinders represent activities in a particular workspace (developer-centric) or activities carried out on a specific artifact (artifact-centric). They applied the visualization to several open-source projects and demonstrated project evolution and other interesting situations.

The visualization has two primary modes: developer-centric and artifact-centric, which we will discuss shortly. Common to both modes, the visualization shows only active artifacts and workspaces, thereby eliminating the clutter of inactive entities. Stacks of cylinders represent workspace activities (changes to artifacts), with each cylinder corresponding to a particular artifact in a particular workspace with the dimensions representing the size of the change (the bigger the change, the larger the cylinder).

In the developer-centric view (see the figure, inset), a stack of cylinders represents a developer’s workspace with each cylinder representing an artifact being changed in that workspace. Workspaces with many activities correspond to tall stacks of cylinders. The stacks of cylinders with the most recent changes are placed in the front of the view and, as time elapses, stacks for workspaces with “older” activity slowly start moving to the back, representing dormancy. Thereby a user is able to quickly discern the loci of activities from the height of the stacks and recency of these activities from their position. The artifact-centric view (see the figure, bottom) behaves similarly to the developer view, but instead each stack of cylinders represents a particular artifact and each cylinder in the stack represents changes to that artifact made by a particular workspace.

Palantir3D

Copyright notice: the present content was taken from the following URL, the copyrights are reserved by the respective author/s.
