Monthly Archive for August, 2008

Time as essence for photo browsing through personal digital libraries

A. Graham, H. Garcia-Molina, A. Paepcke, and T. Winograd. Time as essence for photo browsing through personal digital libraries. In JCDL ’02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, pages 326–335, New York, NY, USA, 2002. ACM. [PDF]

———-

This paper describes PhotoBrowser, a prototype system for personal digital pictures that organizes the content using the timestamp of each photo. The authors’ starting point is that text annotations are valuable because they are accessible to a variety of search and processing algorithms. Many systems exploit this possibility but require time-intensive manual annotation (e.g., FotoFile [Kuchinsky et al., 1999] and PhotoFinder [Kang and Shneiderman, 2000]).

The authors propose instead to exploit the timestamps at which the pictures were taken. They propose an organizational and visual clustering that divides the photos wherever a longer period of inactivity separates one picture from the next.
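The idea of splitting a collection at long pauses can be sketched in a few lines. This is only an illustration, not the algorithm from the paper: the function name `cluster_by_time_gap` and the fixed six-hour threshold `max_gap` are assumptions made for the example.

```python
from datetime import datetime, timedelta


def cluster_by_time_gap(timestamps, max_gap=timedelta(hours=6)):
    """Group timestamps into clusters, starting a new cluster whenever
    the pause since the previous photo exceeds max_gap (hypothetical threshold)."""
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= max_gap:
            # Short pause: the photo belongs to the current cluster.
            clusters[-1].append(ts)
        else:
            # First photo, or a long pause: open a new cluster.
            clusters.append([ts])
    return clusters


photos = [
    datetime(2008, 8, 1, 10, 0),
    datetime(2008, 8, 1, 10, 5),
    datetime(2008, 8, 1, 11, 30),
    datetime(2008, 8, 3, 18, 0),
    datetime(2008, 8, 3, 18, 20),
]
clusters = cluster_by_time_gap(photos)
print([len(c) for c in clusters])  # [3, 2]
```

A fixed threshold is the simplest possible choice; a real system would probably need to adapt it to how densely each user takes pictures.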

To verify their assumption, the authors tested PhotoBrowser against two other applications. The first was a Hierarchical Browser, which organized the pictures at different levels corresponding to time at different granularities. The second, a Scrollable Browser, organized the pictures into a single time-ordered list that could be scrolled.

The authors asked 12 subjects to perform 6 retrieval tasks following different criteria. Their main result was that the browser that organized the pictures into temporal clusters enabled significantly faster completion times, at an average of about 50 seconds.
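To make the difference between the two baseline organizations concrete, here is a rough sketch. The year/month/day levels and the function names are assumptions for the example; the notes above do not say which granularities the Hierarchical Browser actually used.

```python
from collections import defaultdict
from datetime import datetime


def build_time_hierarchy(timestamps):
    """Nest photos as year -> month -> day (assumed levels of granularity)."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for ts in sorted(timestamps):
        tree[ts.year][ts.month][ts.day].append(ts)
    return tree


def build_scrollable_list(timestamps):
    """A single flat, time-ordered list of photos."""
    return sorted(timestamps)


shots = [
    datetime(2008, 8, 3, 18, 0),
    datetime(2008, 8, 1, 10, 0),
    datetime(2008, 8, 1, 10, 5),
    datetime(2007, 12, 25, 9, 0),
]
tree = build_time_hierarchy(shots)
flat = build_scrollable_list(shots)
print(sorted(tree.keys()))    # [2007, 2008]
print(len(tree[2008][8][1]))  # 2
print(flat[0].year)           # 2007
```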

Graham_PhotoBrowsing_completion-times.jpg

Multimodal and Mobile Personal Image Retrieval: A User Study

Over the last few months, I have been collaborating on a research project on Multimodal Information Retrieval of digital pictures collected through camera phones. Recently, one of the papers summarizing the results of the research was presented at the International Workshop on Mobile Information Retrieval, held in conjunction with SIGIR in Singapore. Here are the abstract and the URL to download the paper.

X. Anguera, N. Oliver, and M. Cherubini. Multimodal and mobile personal image retrieval: A user study. In K. L. Chan, editor, Proceedings of the International Workshop on Mobile Information Retrieval (MobIR’08), pages 17–23, Singapore, 20-24 July 2008. [PDF]

Mobile phones have become multimedia devices. Therefore it is not uncommon to observe users capturing photos and videos on their mobile phones. As the amount of digital multimedia content expands, it becomes increasingly difficult to find specific images in the device. In this paper, we present our experience with MAMI, a mobile phone prototype that allows users to annotate and search for digital photos on their camera phone via speech input. MAMI is implemented as a mobile application that runs in real-time on the phone. Users can add speech annotations at the time of capturing photos or at a later time. Additional metadata is also stored with the photos, such as location, user identification, date and time of capture and image-based features. Users can search for photos in their personal repository by means of speech without the need of connectivity to a server. In this paper, we focus on our findings from a user study aimed at comparing the efficacy of the search and the ease-of-use and desirability of the MAMI prototype when compared to the standard image browser available on mobile phones today.

Smartalbum: a multi-modal photo annotation system

T. Tan, J. Chen, P. Mulhem, and M. Kankanhalli. Smartalbum: a multi-modal photo annotation system. In MULTIMEDIA ’02: Proceedings of the tenth ACM international conference on Multimedia, pages 87–88, New York, NY, USA, 2002. ACM. [PDF]

——-

Several applications support annotating pictures with voice. SmartAlbum (Tan et al., 2002) unifies two indexing approaches, namely content-based and speech. Chen et al. (2001) proposed the use of a structural speech syntax to annotate photographs in four different fields, namely event, location, people, and date/time. Show&Tell (Srihari et al., 1999) uses speech annotations to index and retrieve both personal and medical images, and FotoFile (Kuchinsky et al., 1999) extends annotation to a more general multimedia object.

This demonstration presents a novel application (called SmartAlbum) for photo indexing and retrieval that unifies two different image indexing approaches. The system uses two modalities to extract information about a digital photograph; i.e. content-based and speech annotation for image description. The result is a powerful image retrieval tool that has capabilities beyond what current single-mode retrieval systems can offer. We show on a corpus of 1200 images the interest of our approach.

Simplifying the management of large photo collections

A. Girgensohn, J. Adcock, M. Cooper, J. Foote, and L. Wilcox. Simplifying the management of large photo collections. In Proceedings of INTERACT’03, pages 196–203. IOS Press, 2003. [PDF]

——–

This paper contains useful references showing that event segmentation was observed to be a valid criterion for organizing pictures. The paper reports an algorithm that was compared with those of Graham et al. (2002), Platt et al. (2002), and Loui and Savakis (2000).

With digital still cameras, users can easily collect thousands of photos. Our goal is to make organizing and browsing photos simple and quick, while retaining scalability to large collections. To that end, we created a photo management application concentrating on areas that improve the overall experience without neglecting the mundane components of such an application. Our application automatically divides photos into meaningful events such as birthdays or trips. Several user interaction mechanisms enhance the user experience when organizing photos. Our application combines a light table for showing thumbnails of the entire photo collection with a tree view that supports navigating, sorting, and filtering photos by categories such as dates, events, people, and locations. A calendar view visualizes photos over time and allows for the quick assignment of dates to scanned photos. We fine-tuned our application by using it with large personal photo collections provided by several users.

Modality preference – learning from users

R. Wasinger and A. Krüger. Modality preference – learning from users. In Proceedings of the User Experience Design for Pervasive Computing workshop, Munich, Germany, 11 May 2005. [PDF]

——-

This paper describes a qualitative comparison of input modalities for mobile applications. The authors conducted a user study with about 50 users who were asked to fill in questionnaires on a PDA. The questions targeted products that the users could find in a shop and therefore had to be answered on the spot. The users were divided into groups and each group was exposed to a different input modality: speech, handwriting, intra- and extra-gestures.

Interestingly, speech was reported as being more comfortable to use than handwriting, but also as the modality most likely to compromise privacy. Users reported being uncomfortable speaking sensitive information to their devices. Handwriting was seen as conflicting with other manual activities that needed to be carried out while shopping.

Motivating annotation for personal digital photo libraries: Lowering barriers while raising incentives

J. Kustanowitz and B. Shneiderman. Motivating annotation for personal digital photo libraries: Lowering barriers while raising incentives. Technical Report HCIL-2004-18, University of Maryland, College Park, MD, USA, 2005. [PDF]

——–

The main argument of this paper is that annotation is a tedious activity. Few users take the time to annotate their collection carefully and systematically, and therefore thousands of pictures lie unorganized in folders, much as printed pictures lie in shoeboxes. The authors highlight that part of the problem is that users themselves do not understand what they could do differently once pictures are annotated.

To raise motivation for tagging pictures, this paper describes several applications for visualizing collections of annotated pictures. For instance, Birthday Collage is an application that automatically pulls together pictures of the same person taken at successive birthdays.

The value of personal digital photo libraries grows immensely when users invest effort to annotate their photos. Frameworks for understanding annotation requirements could guide improved strategies that would motivate more users to invest the necessary effort. We propose one framework for annotation techniques along with the strengths and weaknesses of each one, and a second framework for target user groups and their motivations. Several applications are described that provide useful and information-rich representations, but which require good annotations, in the hope of providing incentives for high quality annotation. We describe how annotations make possible four novel presentations of personal photo collections: (1) Birthday Collage to show growth of a child over several years, (2) FamiliarFace to show family trees of photos, (3) Kaleidoscope to show photos of related people in an appealing tableau, and (4) TripPics to show photos from a sequential story such as a vacation trip.

Kustaniwitz_BirthdayCollage.jpg

Direct annotation: A drag-and-drop strategy for labeling photos

B. Shneiderman and H. Kang. Direct annotation: A drag-and-drop strategy for labeling photos. In IV ’00: Proceedings of the International Conference on Information Visualisation, pages 88–96, Washington, DC, USA, 2000. IEEE Computer Society. [PDF]

———–

The basic argument of the authors of this paper is that large picture collections need annotation in order to support efficient retrieval. However, annotation is a tedious and time-consuming task. The authors present a basic HCI technique that allows the user to annotate without much typing. Keywords are entered once in a list and then become available to be dragged and dropped onto pictures.

The system has the advantage of supporting labeling of regions of the pictures (see figure below) but has the disadvantage of reduced scalability.

Annotating photos is such a time-consuming, tedious and error-prone data entry task that it discourages most owners of personal photo libraries. By allowing users to drag labels such as personal names from a scrolling list and drop them on a photo, we believe we can make the task faster, easier and more appealing. Since the names are entered in a database, searching for all photos of a friend or family member is dramatically simplified. We describe the user interface design and the database schema to support direct annotation, as implemented in our PhotoFinder prototype.

Shneiderman_Direct-annotation.png

The Genetic Map of Europe

Biologists have constructed a genetic map of Europe showing the degree of relatedness between its various populations. This genetic map has many similarities to the geographical map. The major genetic differences are between the populations of northern and southern Europe.

Researchers say that the pattern of genetic differences reflects the colonization history of Europe, which dates back 45,000 years. The map also identifies two genetically isolated groups: the Finns and the Italians.

The former arose because the Finnish population was at one time very small and then expanded, bearing the atypical genetics of its few founders. The latter reflects a separation between Italians (yellow, bottom center) and the rest. This may reflect the role of the Alps in impeding the free flow of people between Italy and the rest of Europe.

[more]

genetic-map-of-europe.jpg

Google “Dream” Phone

This week the long-awaited Google Phone produced by HTC was approved by the U.S. Federal Communications Commission (FCC) for use in the United States. This means that commercialization might start as early as October this year.

I thought it was a good idea to write a short post about it because we have reached a critical mass of demos and applications circulating on the Net. Google's philosophy is clear and it is the opposite of that of Apple or Nokia: using open-source solutions and leveraging open-source movements to create the critical ecology around their product.

They also pulled together a nice team of designers, who released a clever set of features for the interface. This phone will nicely integrate the Google products that we use and love, like gMaps, gMail and so on. However, it will also provide new experience opportunities for the users. For instance, the HTC device should incorporate a compass that should allow an augmented reality experience of gMaps’ Street View. Turning the device around should adapt the view of the map so that the virtual experience closely resembles the real one. Plus, the application will overlay the names of the streets on the view, giving useful information to the user.

The gPhone is also full of clever UI features that are going to make the user experience both richer and simpler. For instance, access to notifications has been aggregated in a single part of the interface: a pull-down tile on the menu bar.

In sum, I am looking forward to getting my hands on the first commercial model and writing a full review of the system. For now, I celebrate what appear to be clever design decisions.

android-home-screen.jpg

people tend to stick to their own size

One reason Helena and I would never be close friends is that I am about half as tall as she. People tend to stick to their own size group because it’s easier on the neck. Unless they are romantically involved, in which case the size difference is sexy. It means: I am willing to go the distance for you.

Miranda July. No one belongs here more than you. Canongate, Edinburgh, UK, 2007.

[web site]

Photo annotation on a camera phone

A. Wilhelm, Y. Takhteyev, R. Sarvas, N. Van House, and M. Davis. Photo annotation on a camera phone. In CHI ’04: CHI ’04 extended abstracts on Human factors in computing systems, pages 1403–1406, New York, NY, USA, 2004. ACM. [PDF]

——

This paper argues for using the multimedia capabilities of phones to help index and retrieve pictures. The authors built a system capable of attaching metadata, like the user’s position, and user-defined keywords to each picture. This information was stored on a remote server and then used during retrieval.

To evaluate their design they conducted a user study in which they deployed the system to a group of 55 participants. They found that one of the major problems was the unpredictability of the network connection, which made interaction with the remote server lengthy. From the study they also learned that participants used the camera of their phone extensively but attributed very little value to many of the pictures. Instead, the users took advantage of the networking capabilities of the phone to share events of their life with their friends. The capture of these moments would not have been possible without the continuous availability of the mobile phone.

Interviewed participants expressed more interest in sharing and browsing the captured images than in retrieving specific pictures. They were generally not interested in fully annotating pictures using keywords.

The ubiquitous camera: An in-depth study of camera phone use

T. Kindberg, M. Spasojevic, R. Fleck, and A. Sellen. The ubiquitous camera: An in-depth study of camera phone use. IEEE Pervasive Computing, 4(2):42–50, 2005. [PDF]

——-

This paper describes a study of how people use camera phones. The authors interviewed 34 subjects, trying to understand what they photographed and why. They developed a number of categories for the subjects of pictures, dividing them into people and scenes + things. The authors found that people took pictures with their mobile phone for affective reasons, to enrich a mutual experience by sharing an image with those who were present at the time of capture. Also, they took pictures to make an absent friend or family member aware of an event.

Additionally, the authors found that many pictures were taken for personal reflection or for functional reasons: supporting a mutual task with co-located friends, a remote task for a friend, or a personal task.

Two thirds of the images examined were captured to share, mainly for affective reasons, and the sharing mainly took place when people were face-to-face.

The authors highlight that the key value of a camera phone is the ability to spontaneously show images. They suggest that finding and browsing images should be possible and simple. The portability and pervasiveness of capture of this content has tremendous value for interconnection with digital and public displays, and for sharing with absent people.

The interviewed participants reacted positively to the idea of enriching this content with contextual information. Finally, the authors found that browsing the images on the subjects’ phones was ineffective for retrieval when many pictures were present.

Kindberg_Camera-Phone.jpg

Ht06, tagging paper, taxonomy, flickr, academic article, to read

C. Marlow, M. Naaman, D. Boyd, and M. Davis. Ht06, tagging paper, taxonomy, flickr, academic article, to read. In HYPERTEXT ’06: Proceedings of the seventeenth conference on Hypertext and hypermedia, pages 31–40, New York, NY, USA, 2006. ACM. [PDF]

———

This seminal paper presents a taxonomy of tagging systems based on incentives and contribution models to inform potential evaluative frameworks. First, the authors explain why tagging systems are used and the different strands of research in this area, namely social bookmarking, social tagging systems, and folksonomies. They explain how these systems are affected by the vocabulary problem, where different users use different terms to describe the same thing.

The authors model a prototypical tagging system as having three components: a number of resources, tags associated with these resources, and the users who generated the tags. Some systems can have links among the resources and/or links among the users.

The taxonomy they define contains seven categories. The first is tagging rights, or the system’s restriction on group tagging. The second is tagging support, or the mechanism of tag entry, being either blind (the user cannot view the tags assigned by others to the same resource), viewable or suggestive. The third is the form of aggregation of tags around a given resource: either collecting repeated tags generated by different users (bag-model) or not allowing any repetition of tags (set-model). The fourth is the type of object being tagged. Related to this is the fifth, the source of the material: the resources can be supplied by the participants, supplied by the system, or be publicly available. Sixth, the resources can be linked, grouped or separated. Finally, the users can be linked, grouped or kept separated.
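The three-component model maps naturally onto a small data structure. The sketch below is only one possible reading of it: the class names `TagAssignment` and `TaggingSystem`, their methods, and the sample data are all invented for the example and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TagAssignment:
    """One user attaching one tag to one resource."""
    user: str
    resource: str
    tag: str


@dataclass
class TaggingSystem:
    """Hypothetical container for users, resources, tags and optional links."""
    assignments: list = field(default_factory=list)
    user_links: set = field(default_factory=set)      # e.g. contacts
    resource_links: set = field(default_factory=set)  # e.g. hyperlinks

    def tag(self, user, resource, tag):
        self.assignments.append(TagAssignment(user, resource, tag))

    def tags_for(self, resource):
        """Distinct tags attached to a resource, by any user."""
        return sorted({a.tag for a in self.assignments if a.resource == resource})

    def users_of(self, tag):
        """Users who applied a given tag to any resource."""
        return sorted({a.user for a in self.assignments if a.tag == tag})


system = TaggingSystem()
system.tag("alice", "photo-1", "beach")
system.tag("bob", "photo-1", "sea")
system.tag("bob", "photo-2", "beach")
system.user_links.add(("alice", "bob"))

print(system.tags_for("photo-1"))  # ['beach', 'sea']
print(system.users_of("beach"))    # ['alice', 'bob']
```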

Concerning the user incentives, tagging is performed for different reasons: for future retrieval; for contribution and sharing; to attract attention; to play or compete; for self presentation; and finally for expressing opinions.

Finally, the paper presents an interesting analysis of a Flickr dataset. The authors studied the growth of distinct tags for 10 randomly selected users as they uploaded photos. They found a number of different behaviors. In some cases new tags are added consistently as photos are uploaded. Sometimes only a few tags are used at the beginning and more later on. For many users distinct tag growth declines over time. Finally, they found the vocabulary inside a community of users to overlap more than the vocabulary of a random sample of unconnected users.
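The growth curve in this analysis is simply the running count of distinct tags a user has employed after each upload. A minimal sketch, with an invented function name and invented sample data, could look like this:

```python
def distinct_tag_growth(photo_tags):
    """Return, for each uploaded photo in order, the number of distinct
    tags the user has used so far."""
    seen = set()
    growth = []
    for tags in photo_tags:
        seen.update(tags)
        growth.append(len(seen))
    return growth


# Hypothetical upload history of one user: one list of tags per photo.
uploads = [
    ["zurich", "lake"],
    ["zurich", "friends"],
    ["lake"],
    ["party", "friends", "night"],
]
growth = distinct_tag_growth(uploads)
print(growth)  # [2, 3, 3, 5]
```

A curve that keeps climbing corresponds to a user who keeps introducing new tags, while a curve that flattens corresponds to declining growth of the vocabulary.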

Marlow_distinct-tags.jpg

Interview viz: visualization-assisted photo elicitation

N. A. Van House. Interview viz: visualization-assisted photo elicitation. In CHI ’06: CHI ’06 extended abstracts on Human factors in computing systems, pages 1463–1468, New York, NY, USA, 2006. ACM. [PDF]

———-

This paper describes a visualization-assisted photo elicitation technique. The author’s main argument is that photo elicitation interviews can be helped by visualization because people tend to forget the context in which a picture was taken.

They developed MMM2, a multimedia system that helps organize a collection of pictures and produces different visualizations, organized by time, or by sharing recipient or source.

First, using the visualizations, interviewees could better recall the contextual information at the time the picture was taken. Second, the evidence sometimes refreshed or contradicted the participants’ memories. Third, patterns of use of which the participants were unaware became apparent, revealing important events or lacunae of activity.

vanhouse_viz-photo-elicitation.jpg

Why we tag: motivations for annotation in mobile and online media

M. Ames and M. Naaman. Why we tag: motivations for annotation in mobile and online media. In CHI ’07: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 971–980, New York, NY, USA, 2007. ACM. [PDF]

——-

This paper presents a thorough study of people using a mobile application for tagging pictures. The authors conducted photo elicitation interviews with Flickr users who were active in uploading pictures from their mobile phone using ZoneTag, an ad-hoc application for mobile picture tagging and photo sharing.

The authors defined eight intertwined reasons why people use tags. These can be described along two dimensions: function, being either organization or communication, and sociality, directed to self or to others.

Interestingly, the authors found that while adding context is the traditional motivation for annotating printed photographs, this was not the primary concern of interviewed participants: “while it is likely that tags will provide this function as a currently-unanticipated benefit in the future, adding context to facilitate remembering details about photographs was not the primary motivation for tagging in the present.”

As the pictures taken with ZoneTag were automatically uploaded to Flickr, a community portal for photo sharing, other interesting behaviors emerged. For instance, some participants reported coordinating tags with others to form a distributed photo “pool”.

The authors conclude the article by suggesting a number of implications for design. In particular, they suggest that the annotation functionality should be pervasive and multi-functional, covering the different aspects of their suggested taxonomy. They also suggest allowing flexibility in the annotation mechanism, leaving tagging as an option rather than making it mandatory. Finally, they suggest having both mobile and desktop components, to leverage the fast entry and the processing power of the desktop application.

200808120944.jpg