K. Rodden and K. R. Wood. How do people manage their digital photographs? In Proceedings of CHI 2003, Fort Lauderdale, Florida, USA, April 5-10 2003. [PDF] [PDF2]
———–
This paper describes a longitudinal study of how people manage their collections of digital photographs. The authors asked 13 subjects to use a prototype software tool, named Shoebox, for 6 months to catalogue their pictures. The main feature of the software was support for text and voice tagging of pictures. They found that after 6 months these features went unused, because participants could rely on their memory and on the temporal sequence of the pictures to retrieve them efficiently.
The paper also contains an interesting argument: text-based queries can still be reasonably effective even if spoken material is inaccurately transcribed (Brown et al., 1996).
The paper contains an interesting comparison between digital and printed photos. Notably, the study was conducted in 2003, when digital pictures were still relatively new. The paper reports qualitative findings on how people used digital collections. Participants adopted Shoebox's archival feature, which organized pictures into rolls and by timestamp. However, they did not increase the number of annotations they made on individual pictures. They found annotation uninteresting because it did not help them retrieve pictures more efficiently.
The availability of text-based indexing and retrieval did not give participants extra motivation to invest effort in annotating their pictures.
T. J. Hazen, B. Sherry, and M. Adler. Speech-based annotation and retrieval of digital photographs. In Proceedings of INTERSPEECH 2007, the 8th Annual Conference of the International Speech Communication Association, pages 2165–2168, Antwerp, Belgium, August 27-31 2007. [PDF]
———
This paper presents an application that supports picture retrieval on mobile phones using voice annotations. The authors’ basic assumption is that speech is more efficient than text for operating a mobile device and, in general, more efficient for conveying complex properties.
The core of the application they proposed is a mixed grammar recognition approach which allows the speech recognition system to construct a single finite-state network combining context-free grammars.
The paper presents an evaluation of the application that combines a field deployment with a lab study, in which participants were asked to retrieve a set of pictures captured either by themselves or by other participants. Retrieval was measured as the number of successful attempts to retrieve a picture with the first query and within 5 queries.
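The two success measures described above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name `success_rate_within`, the input format (the index of the first successful query per task, or `None` for a failure), and the sample outcomes are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def success_rate_within(first_hit_ranks: Sequence[Optional[int]], max_queries: int) -> float:
    """Fraction of retrieval tasks solved within `max_queries` attempts.

    Each entry is the 1-based index of the query that first retrieved the
    target photo, or None if the participant never retrieved it.
    (Hypothetical input format, assumed for this sketch.)
    """
    if not first_hit_ranks:
        raise ValueError("need at least one retrieval task")
    solved = sum(1 for rank in first_hit_ranks if rank is not None and rank <= max_queries)
    return solved / len(first_hit_ranks)


# Made-up outcomes for eight retrieval tasks (illustration only, not data from the paper).
outcomes = [1, 3, None, 1, 2, 6, 1, 5]

first_query_rate = success_rate_within(outcomes, 1)   # solved with the first query
within_five_rate = success_rate_within(outcomes, 5)   # solved within five queries
print(first_query_rate, within_five_rate)
```

Reporting both figures separates queries that work immediately from those that only succeed after the user rephrases them.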
Results indicated that users’ knowledge of the subject matter of the photographs did not play a role in the retrieval process.
B. L. Chalfonte, R. S. Fish, and R. E. Kraut. Expressive richness: a comparison of speech and text as media for revision. In CHI ’91: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 21–26, New York, NY, USA, 1991. ACM. [PDF]
——–
This paper presents an experimental comparison of two modalities for sharing annotations on a shared document. The authors designed the experiment under the assumption, drawn from previous research, that richer, more informal, and more interactive media should be better suited for handling collaborative tasks.
The experiment compared subjects reviewing a paper using text and speech annotations. The authors found that participants were more likely to make local annotations in the written modality (spelling and grammar changes) and more likely to make global annotations in the speech modality (structure, missing a fundamental point, project status, etc.).
They defined a nice metric: the index of self-correction, the number of times an annotation was corrected before the final submission. They found that speech annotations were more likely to be corrected than written annotations. They also found that speech allowed for non-verbal communication that made communication richer.
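As a rough illustration of how an index of self-correction could be tabulated, the sketch below averages the number of corrections per annotation within each modality. Averaging per modality, the function name `self_correction_index`, and the sample records are assumptions made for this example; the paper's exact computation may differ.

```python
from collections import defaultdict
from statistics import mean


def self_correction_index(annotations):
    """Mean number of corrections per annotation, grouped by modality.

    `annotations` is an iterable of (modality, corrections) pairs, where
    `corrections` counts how often the annotation was revised before the
    final submission. (Hypothetical record format, assumed for this sketch.)
    """
    by_modality = defaultdict(list)
    for modality, corrections in annotations:
        by_modality[modality].append(corrections)
    return {modality: mean(counts) for modality, counts in by_modality.items()}


# Made-up records (illustration only, not data from the paper).
sample = [("speech", 2), ("speech", 1), ("text", 0), ("text", 1), ("speech", 3)]
index = self_correction_index(sample)
print(index)
```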
From the analysis, it appeared that speech was superior because it was more expressive and because it placed fewer cognitive demands on the communicator. However, this study did not focus on revealing the advantages or trade-offs of annotations in different modalities.
A. G. Hauptmann and A. I. Rudnicky. A comparison of speech versus typed input. In Proceedings of the Third DARPA Speech and Natural Language Workshop, pages 219–224, San Mateo, 1990. Morgan Kaufmann. [pdf]
——-
This paper describes a controlled experiment in which the authors compared two input modalities: typing and speech. They designed a task where subjects had to enter a series of numeric strings into the computer using their voice, the keyboard, or a combination of the two. They used a number of custom-made metrics, such as the transaction error rate (the number of transactions that were absolutely necessary divided by the total number of transactions) and the aggregate cycle time (the total time a subject needed to enter a number correctly).
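The two metrics, as defined in the summary above, can be sketched for a single numeric string as follows. The `Transaction` record, its fields, the function names, and the sample attempts are hypothetical and chosen only for illustration; they are not the authors' instrumentation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Transaction:
    duration_s: float  # time spent on this transaction, in seconds
    necessary: bool    # False for repeats or repairs caused by an input error


def transaction_ratio(transactions: Sequence[Transaction]) -> float:
    """Necessary transactions divided by all transactions, as defined above."""
    if not transactions:
        raise ValueError("need at least one transaction")
    necessary = sum(1 for t in transactions if t.necessary)
    return necessary / len(transactions)


def aggregate_cycle_time(transactions: Sequence[Transaction]) -> float:
    """Total time needed until the number was entered correctly."""
    return sum(t.duration_s for t in transactions)


# Made-up log for entering one number (illustration only, not data from the paper).
attempts = [
    Transaction(2.0, True),    # initial entry
    Transaction(1.5, False),   # repeat after a recognition error
    Transaction(2.5, False),   # second repeat
    Transaction(2.0, True),    # final confirmation
]
ratio = transaction_ratio(attempts)
cycle_time = aggregate_cycle_time(attempts)
print(ratio, cycle_time)
```

With this definition, a ratio of 1.0 means no transaction was wasted on error recovery, and lower values mean more repair work per entered number.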
The utterance accuracy results showed that speech required many more interactions to complete the task than typing. According to the metrics defined, typing performed better than speech. The authors discussed several reasons why this was the case.
The authors showed how speech compares with typing for digit-string entry tasks. However, they cautioned that real-world tasks, which require more keystrokes per syllable, would demonstrate the effectiveness of speech much better.
The authors concluded that, depending on the task, speech can have a tremendous advantage for casual users. The more a task requires visual monitoring of input, the more preferable speech becomes as an input medium. For skilled typists this relationship might be reversed.