SEARCHING FOR MULTIMEDIA: AN ANALYSIS OF AUDIO, VIDEO, AND IMAGE WEB QUERIES
Bernard J. Jansen
Computer Science Program
University of Maryland (Asian Division)
Seoul, 140-022 Korea
E-mail: jjansen@acm.org
Abby Goodrum
College of Information Science and Technology
Drexel University
3141 Chestnut St.
Philadelphia PA 19104
E-mail: goodruaa@drexel.edu
Amanda Spink
School of Information Sciences and Technology
The Pennsylvania State University
University Park PA 16801
E-mail: spink@ist.psu.edu
Please Cite: Jansen, B. J., Goodrum, A., and Spink, A. (2000). Searching for multimedia: video, audio, and image Web queries. World Wide Web Journal, 3(4), 249-254. [http://manta.cs.vt.edu/www/vol3no4Contents.html].
ABSTRACT
The development of digital libraries has led to the integration of textual and multimedia information in many document collections. The World Wide Web provides the necessary connectivity for many users of these digital libraries. Studies exploring the searching characteristics of Web users are an important and growing area of research. Most Web user studies have focused on Web searching in general, regardless of subject matter or format. Little research has examined how Web users search specifically for multimedia information. This study examines users' multimedia searching on a major Web information retrieval system. The data set examined consisted of 1,025,908 queries from 211,058 users of Excite®, a major Web search engine. From this data set, terms were used to identify audio, image, and video queries. These queries were isolated and examined at various levels of analysis. Our findings were compared to data from previous, more general, Web searching studies. Implications for the design of Web information retrieval systems and interfaces are discussed.
INTRODUCTION
The World Wide Web (Web) is an immense repository of multimedia information (Angelides & Dustdar, 1997; Lesk, 1997). Multimedia information may include combinations of text, image, video, film, or audio artifacts. Many museums, themselves repositories of multimedia information, are going online (Takahashi, Kushida, Hong, Sugita, Kurita, Rieger, Martin, Gay, Reeve & Loverance, 1998). One can now visit world-famous art collections via the Web, such as Monet's work at (http://sunsite.unc.edu/wm/paint/auth/monet/first/). As of 22 August 1999, Alta Vista (http://www.altavista.digital.com) had indexed approximately 9,983,032 images on the Web (Jansen, 1999). Lawrence and Giles (1999) estimated that there are 180 million images on the publicly indexed Web, amounting to about 3 terabytes of image data, not including other types of multimedia files such as audio and video. The hypertext transfer protocol (HTTP) lends itself to the easy transfer of audio, video, and image formats integrated with textual information.
In general, Web users must search for multimedia information as they would search for textual information (Schauble, 1997). The simplest image search algorithm used by information retrieval (IR) systems locates multimedia files by searching for file extensions and matching the filename to terms in the query (Witten, Moffat & Bell, 1994). Some Web IR systems may also retrieve online documents that are primarily textual but contain embedded multimedia files. In that case, the multimedia filename may not match the query terms, but the Web document may contain text that does.
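The filename-matching approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular IR system: it assumes the system already holds a list of file URLs, uses a small, illustrative set of extensions drawn from formats named in this paper, and treats a file as matching when any query term occurs as a substring of its filename. The names `MEDIA_EXTENSIONS` and `match_media_files` are hypothetical.

```python
import os
from urllib.parse import urlparse

# Illustrative (not exhaustive) extension sets; an assumption for this sketch.
MEDIA_EXTENSIONS = {
    "audio": {".au", ".wav"},
    "image": {".gif", ".jpg", ".jpeg", ".bmp", ".png", ".tif", ".tiff", ".pcx"},
    "video": {".avi", ".mpg", ".mpeg", ".mov"},
}

def match_media_files(urls, query, media):
    """Return URLs whose file extension belongs to `media` and whose
    filename (minus the extension) contains at least one query term."""
    terms = [t.lower() for t in query.split()]
    hits = []
    for url in urls:
        filename = os.path.basename(urlparse(url).path)
        stem, ext = os.path.splitext(filename)
        if ext.lower() not in MEDIA_EXTENSIONS[media]:
            continue  # wrong media type, e.g. an HTML page
        if any(term in stem.lower() for term in terms):
            hits.append(url)
    return hits

urls = [
    "http://example.com/art/monet_waterlilies.jpg",
    "http://example.com/art/monet.html",
    "http://example.com/snd/monet_talk.wav",
]
print(match_media_files(urls, "Monet", "image"))
# ['http://example.com/art/monet_waterlilies.jpg']
```

The HTML page is skipped even though its name matches, which is exactly the limitation noted above: a primarily textual page with embedded media is invisible to pure filename matching.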
Many Web IR systems provide no special mechanism for multimedia searching. Excite (http://www.excite.com) and Yahoo (http://www.yahoo.com) are two such Web IR systems. The advantage of this approach is that multimedia searching is performed in the same manner as text searching, so no additional burden is placed on the searcher. If the searcher desires a multimedia document, the searcher enters the query and specifies some multimedia attribute. For example, a user searching for recordings of Jimmy Buffett songs could enter "Jimmy Buffett songs" or "audio of Jimmy Buffett songs." This query might very well retrieve lyric sheets of Jimmy Buffett's songs rather than the actual audio files. The searcher could also use audio file extensions, such as au or wav. The same procedure would be used for video or image retrieval, with the appropriate terms and file extensions for each medium. The disadvantage of this approach is that it places a greater contextual knowledge burden on searchers, who may not be familiar with multimedia formats. Cognitive load is further increased by requiring users to translate a non-semantic information need into a textual query. This creates what some authors refer to as a lack of representational congruity (Goodrum, in review) or as a semantic gap (Gudivada & Raghavan, 1995). The problem is usually exacerbated by the presentation of retrieved items as text-only entries in a list rather than as thumbnail images, sound bites, or video keyframes.
Some Web IR systems provide mechanisms for users searching for multimedia, e.g., radio buttons or media-specific search syntax. Alta Vista (http://www.altavista.com) searchers can narrow a query to search specifically for images. Lycos (http://www.lycos.com/) searchers can search for pictures, and for audio files in MP3 format only. HotBot (http://www.hotbot.com) supports searching for images, video, and the MP3 audio format. Some Web IR systems specialize in multimedia collections. Webseek (http://www.ctr.columbia.edu/webseek/) allows users to search by term or to select from general categories of images and video. Both Webseek and Alta Vista return thumbnail images and file names in the result list. Webseek also provides tools for content-based searching of images and videos using color histograms generated from the visual scenes.
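To make the idea of color-histogram searching concrete, here is a minimal sketch. It is not Webseek's implementation, whose details are not described here. It assumes images are given as lists of (r, g, b) pixels with 0-255 per channel, quantizes each pixel into a coarse bin, and scores similarity by histogram intersection, a common measure for comparing color histograms. The function names and the default of 4 bins per channel are hypothetical choices.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels (0-255 per channel) into coarse bins and
    return a normalized histogram: bin -> fraction of pixels.
    Assumes a non-empty pixel list."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the color mass the two histograms share."""
    return sum(min(mass, h2.get(bin_, 0.0)) for bin_, mass in h1.items())

def rank_by_color(query_pixels, collection):
    """Rank (name, pixels) items by color similarity to the query image."""
    q = color_histogram(query_pixels)
    scored = [(histogram_intersection(q, color_histogram(px)), name)
              for name, px in collection]
    return [name for _, name in sorted(scored, reverse=True)]

red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
mixed = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
print(rank_by_color(red, [("blue", blue), ("mixed", mixed), ("red", red)]))
# ['red', 'mixed', 'blue']
```

Histograms discard spatial layout, which makes them cheap to index but means two very different scenes with similar color proportions will score as similar.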
The next section of the paper discusses research related to our study.
RELATED STUDIES
There is a growing body of research analyzing users' general Web searching characteristics, with fewer studies specifically examining queries by users seeking multimedia information. Jansen and Pooch (1999) provide an in-depth review of Web user searching studies in general (i.e., without regard to textual or multimedia content). Spink, Bateman and Jansen (1999) present research concerning the intent of Web searchers on a Web IR system.
Multimedia searching research has typically focused on the retrieval of images from indexed image collections (Enser, 1995; Goodrum & Kim, 1998; Hastings, 1995; O'Connor, O'Connor & Abbas, 1999; Turner, 1990). Some image research has focused on the design of multimedia IR systems (Aslandogan, Thier, Yu, Zou & Rishe, 1997). Other researchers have investigated audio and video retrieval (Brown, Foote, Jones, Spärck Jones & Young, 1996). Smith, Ruocco and Jansen (1998) analyzed the demand for video when designing a multimedia classroom.
Goodrum and Spink (1999) specifically analyzed users' image queries, terms and sessions using the same data set used for our study. In Goodrum and Spink (1999), twenty-eight (28) terms were used to identify queries for both still and moving images, resulting in a subset of 33,149 image queries by 9,855 users. They provided data on: (1) image queries -- the number of search terms, and the use of visual modifiers, (2) image search sessions -- the number of queries per user, modifications made to subsequent queries in a session, and (3) image terms -- their rank/frequency distribution and the most highly used search terms. They found a mean of 2.64 image queries per user containing a mean of 3.74 terms per query. Image queries contained a large number of unique terms. The most frequently occurring image related terms appeared less than 10 percent of the time, with most terms occurring only once. This analysis contrasted with earlier work by Enser (1995), who examined written queries for pictorial information in a non-digital environment.
In this research, we focus on a large set of Web multimedia queries from Excite, including image, audio, and video queries. We sought to investigate the searching characteristics of Web users as they search for multimedia information and to draw implications for Web IR system design. The design of this study generally adheres to the format and definitions for Web studies outlined by Jansen and Pooch (1999). This analysis is part of a larger ongoing study of Web searching behavior by Jansen, Spink and Saracevic (1998, 1999a, in press) using transaction logs of searches conducted by Excite users.
The next section of the paper discusses the research questions addressed by this study.
RESEARCH QUESTIONS
This study addresses the following research questions.
The next section of the paper describes the research design used in our study.
RESEARCH DESIGN
Excite Data Set
Founded in 1994, Excite, Inc. is a major Internet media public company that offers free Web searching and a variety of other services. The company and its services are described in more detail at its Web site (http://www.excite.com). Excite searches are based on the exact terms that a user enters in the query; however, capitalization is disregarded, with the exception of logical commands AND, OR, and AND NOT. There is no stemming. An online thesaurus and concept linking method called Intelligent Concept Extraction (ICE) is used to find related terms in addition to terms entered. Some of the advanced search features are:
For a complete explanation of Excite’s searching capabilities see (http://www.excite.com).
The transaction log data set consisted of 1,025,908 records. Each action record contained three fields, which were:
Our analysis focused on the user's sessions, queries, and terms. A session is the entire sequence of queries submitted by a particular user. A query is the one or more terms entered into the Web IR system. A term is any string of characters bounded by white space.
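The three units of analysis map naturally onto simple data structures. The sketch below assumes each log record has already been reduced to a (user identifier, query text) pair; the actual field layout of the Excite log is not reproduced here, and the sample records are invented for illustration. `build_sessions` and `terms_of` are hypothetical helper names.

```python
from collections import defaultdict

def build_sessions(records):
    """Group queries by user identifier: a session is the entire
    sequence of queries submitted under one user ID."""
    sessions = defaultdict(list)
    for user_id, query in records:
        sessions[user_id].append(query)
    return dict(sessions)

def terms_of(query):
    """A term is any string of characters bounded by white space."""
    return query.split()

# Invented sample records (user ID, query text), in log order.
records = [
    ("u1", "beatles lyrics"),
    ("u2", "monet pictures"),
    ("u1", "beatles yellow submarine wav"),
]
sessions = build_sessions(records)
queries_per_user = {user: len(qs) for user, qs in sessions.items()}
terms_per_query = [len(terms_of(q)) for qs in sessions.values() for q in qs]
print(queries_per_user)  # {'u1': 2, 'u2': 1}
print(terms_per_query)   # [2, 4, 2]
```

Because sessions are keyed on the user ID field, a shared computer would appear as one long session, which is the caveat the paper raises about lengthy sessions.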
The next section of the paper discusses the data analysis techniques used in our study.
Data Analysis
The data set was loaded into a database management application, in which database queries were developed to identify records containing multimedia terms. Specifically, the queries and the number of terms used in the queries were:
Figure 1 shows the specific terms used in each query. The queries were case insensitive.
Figure 1: List of terms used to identify queries.
Audio Terms | Image Terms | Video Terms
au | art | .avi
.au | bitmap | .mjpeg
audio | bmp | .mov
av | .bitmap | .mov8
.av | .bmp | .mpeg
band | camera | .mpg
cd | cartoon | animated
concerts | gallery | clip
lyrics | gif | clips
mpz | .gif | drivers
multimedia | image | mjpeg
music | images | mov
noise | jpeg | movie
song | jpg | movies
songs | pcx | mpeg
sonic | .jpeg | mpg
sonics | .jpg | plugins
sound | .pcx | quicktime
sound card | photo | video
sound cards | photographs | viewers
soundblaster | photograph | avi
sounds | photos |
soundwave | pic |
speakers | pics |
track | .pic |
vocals | .pics |
wav | picture |
.wav | pictures |
| png |
| .png |
| tif |
| tiff |
| .tif |
| .tiff |
These queries were executed against the database of 1,025,908 Web queries. If a user session contained a query that did not use any of these terms, that query did not appear in the analysis. Because it is difficult to determine a user's information need from a single term, the result lists were reviewed and queries that were obviously not multimedia related were removed. When in doubt, the query was retained. We are therefore confident that the majority of the queries in this analysis relate to multimedia searching.
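The term-based selection can be approximated in Python as follows. This is a sketch, not the database queries actually used: it includes only an abbreviated subset of the Figure 1 terms, assumes whole-term, case-insensitive matching (the paper states only that the queries were case insensitive), and lets a query fall into more than one category, which the paper does not specify. The manual review that removed obviously non-multimedia queries cannot be reproduced by code like this. `CATEGORY_TERMS`, `contains_phrase`, and `categorize` are hypothetical names.

```python
# Abbreviated subset of the Figure 1 term lists, for illustration only.
CATEGORY_TERMS = {
    "audio": ["audio", "music", "song", "songs", "sound card", "sound cards", "wav", ".wav"],
    "image": ["image", "images", "picture", "pictures", "photo", "photos", "gif", ".gif", "jpg", ".jpg"],
    "video": ["video", "movie", "movies", "clip", "clips", "mpeg", "avi", ".avi"],
}

def contains_phrase(words, phrase):
    """True if the words of `phrase` appear consecutively in `words`."""
    parts = phrase.split()
    n = len(parts)
    return any(words[i:i + n] == parts for i in range(len(words) - n + 1))

def categorize(query):
    """Return the set of categories with at least one matching term.
    Matching is case-insensitive and on whole terms, so multi-word
    entries such as 'sound card' must occur as consecutive terms."""
    words = query.lower().split()
    return {
        category
        for category, terms in CATEGORY_TERMS.items()
        if any(contains_phrase(words, term) for term in terms)
    }

print(categorize("Monet PICTURES"))  # {'image'}
print(categorize("stock quotes"))    # set()
```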
Generally, the queries were not altered in any way. Research by Jansen, Spink and Saracevic (in press) shows that cleaning the query terms (i.e., removing non-alphanumeric characters such as +, -, and :) results in only minor changes to the overall results. We did remove leading and trailing + and " characters in the term analysis. Also, as discussed by Jansen and Pooch (1999) concerning Web transaction logs, we assume in this analysis that the user identification field denotes a searcher, while technically it denotes a computer. This assumption affects the analysis, especially of lengthy sessions, which may indicate that the machine is a common-use computer.
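The one cleaning step applied in the term analysis, removing leading and trailing + and " characters, is easy to state precisely. A minimal sketch, with hypothetical function names, assuming terms are split on white space as defined earlier:

```python
def clean_term(term):
    """Remove leading and trailing '+' and '"' characters from a term,
    leaving interior characters untouched."""
    return term.strip('+"')

def clean_terms(query):
    """Split a query into terms, clean each one, and drop any term that
    becomes empty (for example, a lone quotation mark)."""
    return [t for t in (clean_term(raw) for raw in query.split()) if t]

print(clean_terms('+monet +"water lilies"'))  # ['monet', 'water', 'lilies']
```

Note that a literal term such as c++ would also lose its trailing plus signs under this rule, one reason such cleaning can shift term counts slightly.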
In the next section of the paper, we present the audio, video, and image results in separate sections, followed by a more in-depth comparison between multimedia and general Web searching characteristics.
RESULTS
Table 1 provides an overview of the results of the data set analysis.
Table 1: Comparison of statistics from the three multimedia categories.
Statistic | Audio Queries | Video Queries | Image Queries
Total queries (% of all 1,025,908 queries) | 3,810 (0.37%) | 7,630 (0.74%) | 27,144 (2.65%)

Statistic | Audio: Queries/User | Audio: Terms/Query | Video: Queries/User | Video: Terms/Query | Image: Queries/User | Image: Terms/Query
Median | 2 | 4 | 2 | 3 | 2 | 3
Mean | 2.44 | 4.11 | 2.91 | 3.32 | 3.27 | 3.46
Std Dev | 2.95 | 2.67 | 3.85 | 1.96 | 5.49 | 2.2
Max | 51 | 37 | 70 | 44 | 267 | 33
Min | 1 | 1 | 1 | 1 | 1 | 1
Table 1 presents the median, mean, standard deviation, maximum, and minimum for session length (queries per user) and query length (terms per query) in each of the three multimedia categories. The findings are discussed below.
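The statistics in Table 1 can be computed from per-user query counts and per-query term counts with Python's standard `statistics` module. This sketch assumes the sample standard deviation (`statistics.stdev`), since the paper does not say which form was used; the helper names are hypothetical. The percentage helper reproduces the Total row from the 1,025,908 logged queries.

```python
import statistics

def summarize(values):
    """Median, mean, standard deviation, maximum, and minimum of a list
    of counts (queries per user or terms per query), as in Table 1.
    Uses the sample standard deviation, an assumption."""
    return {
        "median": statistics.median(values),
        "mean": round(statistics.mean(values), 2),
        "std_dev": round(statistics.stdev(values), 2),
        "max": max(values),
        "min": min(values),
    }

def percent_of_log(category_queries, total_queries=1_025_908):
    """Share of all logged queries that fall in a category, in percent."""
    return round(100 * category_queries / total_queries, 2)

print(percent_of_log(3810), percent_of_log(7630), percent_of_log(27144))
# 0.37 0.74 2.65
```

With long-tailed data such as these (a maximum of 267 image queries from one user ID), the median is the more robust summary, which is why both median and mean are reported.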
Audio Queries
Findings related to audio queries were:
Video Queries
Findings related to video queries were:
Image Queries
Findings related to image queries were:
When Excite users were searching for multimedia, they were more likely to search for images than for audio or video. Audio queries were the smallest proportion of multimedia queries, but they were longer on average than video or image queries.
The next section of the paper examines the terms most frequently used to find multimedia information on the Web.
Term Analysis
Table 2 lists the top ten terms used for multimedia searching.
Table 2: Top 10 multimedia terms in each category.
Rank | Audio Term | Number | % | Video Term | Number | % | Image Term | Number | %
1 | music | 1,365 | 8.72 | movies | 1,707 | 6.96 | pictures | 10,571 | 11.26
2 | sound | 485 | 3.10 | video | 1,696 | 6.92 | photos | 3,507 | 3.74
3 | audio | 457 | 2.92 | movie | 1,289 | 5.26 | pictures | 1,508 | 1.61
4 | lyrics | 340 | 2.17 | videos | 860 | 3.51 | pics | 1,500 | 1.60
5 | cd | 333 | 2.13 | clips | 428 | 1.75 | photo | 1,241 | 1.32
6 | song | 227 | 1.45 | clipart | 219 | 0.89 | gallery | 950 | 1.01
7 | songs | 225 | 1.44 | pictures | 204 | 0.83 | images | 875 | 0.91
8 | wav | 211 | 1.35 | mpeg | 133 | 0.54 | art | 809 | 0.86
9 | band | 204 | 1.30 | animated | 117 | 0.48 | camera | 679 | 0.72
10 | sounds | 117 | 0.90 | avi | 117 | 0.48 | photography | 579 | 0.62
The terms are listed from the top-ranked term to the tenth-ranked term by frequency of occurrence in queries from that category. Number is the frequency of occurrence; e.g., the top-ranked audio term (i.e., music) occurred 1,365 times in the audio queries. The % column gives the percentage that this number represents of all terms from all queries in that multimedia query category.
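The ranking and percentages in Table 2 follow from a straightforward frequency count. The sketch below is an illustration, not the authors' code: it assumes terms are counted exactly as entered (no case folding or stemming) and that the percentage denominator is the total number of term occurrences across the category's queries, as described above. `top_terms` is a hypothetical name and the sample queries are invented.

```python
from collections import Counter

def top_terms(queries, k=10):
    """Rank terms by frequency across one category's queries.
    Returns (rank, term, count, percent) tuples, where percent is the
    share of all term occurrences in the category."""
    counts = Counter(term for q in queries for term in q.split())
    total = sum(counts.values())
    return [
        (rank, term, n, round(100 * n / total, 2))
        for rank, (term, n) in enumerate(counts.most_common(k), start=1)
    ]

queries = ["monet pictures", "pictures of dogs", "dog photos"]
print(top_terms(queries, k=1))  # [(1, 'pictures', 2, 28.57)]
```

Because no stemming is applied, variants such as movie and movies are ranked separately, as they are in Table 2.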
As shown in Table 2, there is surprisingly little overlap between categories, with 'pictures' being the only top term to appear in more than one category. This is surprising because multimedia formats are typically used in combination, especially audio and image files, so one would expect some overlap among the categories. Across all three categories, a handful of terms dominate: music, movies, video, movie, and pictures (four terms if the stemming variants movie and movies are counted as one). For Web site design, these terms should be included in the metadata of appropriate Web sites. For Web IR systems that aim to cater to multimedia searchers, these terms should probably be made available via some interface mechanism.
The next section of the paper discusses the major findings of our study.
DISCUSSION
In analyzing trends in multimedia queries, one can compare and contrast searching characteristics, such as those listed in Table 1, across the three categories: video, audio, and image. From Table 1, we note that the median session in all cases was 2 queries, and the mean varied from 2.44 queries for audio to 3.27 queries for images. In general Web searching, the median session was 1 query and the mean was 2.84 queries. The multimedia medians are therefore higher, while the means are comparable: slightly lower for audio and higher for video and image searching.
At the query level of analysis, the median query length was 3 terms for video and image queries and 4 terms for audio queries. The mean query length ranged from 3.32 terms for video queries to 4.11 terms for audio queries. We compared these statistics to general Web searching characteristics, using data from Jansen, Spink and Saracevic (1999a). As might be expected, these figures are higher than for general Web searching, where the reported median was 1 term and the mean was 2.21 terms. The higher figures are expected because the searcher must add a multimedia term to the query. However, these findings also suggest that multimedia searching may place additional cognitive load on the searcher by requiring that non-semantic information needs be represented textually. This lack of representational congruity at the query level is an issue that Web IR systems should address to better assist users during the search process.
Our findings also highlight several key aspects of multimedia searching. First, the number of users searching for multimedia documents, especially images, suggests a need to provide Web mechanisms that facilitate this searching and possibly the viewing of results. Second, multimedia sessions and queries are still short compared to searching on traditional IR systems, but longer relative to general Web sessions and queries. There is little query reformulation for the majority of users. This may indicate either a problem with the Web IR system or that the results returned by the Web IR system satisfied the searcher's information need. Third, there appear to be a small number of multimedia terms that occur frequently and a large number of terms that occur very infrequently. Web IR systems should capitalize on the frequently occurring terms and offer thesaurus-type assistance for infrequently occurring query terms.
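One way thesaurus-type assistance might work is sketched below, purely as an illustration of the suggestion above. It assumes the system keeps a table of term frequencies from its own logs and a hand-built thesaurus; both, the threshold of 5 occurrences, and the function name `suggest_terms` are hypothetical. The counts for pictures, photos, and pics are taken from Table 2; the count for snapshots is invented.

```python
def suggest_terms(query, term_counts, thesaurus, min_count=5):
    """For each query term seen fewer than `min_count` times in the log,
    offer related thesaurus terms, most frequently used first."""
    suggestions = {}
    for term in query.lower().split():
        if term_counts.get(term, 0) < min_count and term in thesaurus:
            suggestions[term] = sorted(
                thesaurus[term],
                key=lambda t: term_counts.get(t, 0),
                reverse=True,
            )
    return suggestions

# Counts for pictures, photos, pics from Table 2; 'snapshots' is invented.
term_counts = {"pictures": 10571, "photos": 3507, "pics": 1500, "snapshots": 2}
thesaurus = {"snapshots": ["pics", "pictures", "photos"]}
print(suggest_terms("beach snapshots", term_counts, thesaurus))
# {'snapshots': ['pictures', 'photos', 'pics']}
```

Ordering suggestions by log frequency is one way to "capitalize on the frequently occurring terms": a rare term is steered toward the vocabulary most searchers, and presumably most indexed pages, actually use.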
CONCLUSION & FURTHER RESEARCH
Our analysis of these Web multimedia queries indicates that users engaged in multimedia searching may be challenged by a lack of representational congruity. There are four areas that affect the outcome of IR system interaction with respect to representational congruity:
Problems arise when either documents or information needs cannot be expressed in a manner that provides congruence between the representation and its referent. In the case of multimedia searching, there are problems in representing audio, video, and image information needs with textual queries, and in representing retrieved multimedia documents as short textual abstracts. The use of text-bound systems for the retrieval of multimedia increases the contextual load placed on the user, as evidenced by the number of terms and the number of queries needed to retrieve multimedia objects on the Web. To express a non-textual information need in purely textual terms, the user takes on an additional cognitive load. To make relevance judgments, the user must often visually inspect the full record to know whether the retrieved document contains the requested multimedia information.
Although it may not be possible at this time to provide users with non-textual mechanisms for querying a Web IR system's database, tools can be provided to assist users in specifying a multimedia information need and retrieving information with media file extensions. More challenging at this time is the provision of multimedia surrogates in the retrieved item list. The provision of extracted thumbnails and sound bites from Web pages for relevance judgments and query reformulation is an area of potential benefit for future research.
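A tool of the kind described above could let a user pick a media type and then add the matching file extensions to the query automatically. The sketch assumes a Boolean query string in the style of the AND and OR operators mentioned for Excite; whether a given engine accepts parentheses for grouping is an assumption, as are the extension lists and the name `expand_media_query`.

```python
# Illustrative extension lists per media type; an assumption for this sketch.
MEDIA_EXTENSIONS = {
    "audio": ["au", "wav"],
    "image": ["gif", "jpg", "jpeg", "png"],
    "video": ["avi", "mpg", "mpeg", "mov"],
}

def expand_media_query(user_terms, media):
    """Combine the user's own terms with an OR-group of file extensions
    for the chosen media type, grouping both sides explicitly."""
    extensions = " OR ".join(MEDIA_EXTENSIONS[media])
    return f"({user_terms}) AND ({extensions})"

print(expand_media_query("monet", "image"))
# (monet) AND (gif OR jpg OR jpeg OR png)
```

The searcher only chooses "image"; the system supplies the format knowledge, which addresses the contextual burden of knowing file extensions discussed in the introduction.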
ACKNOWLEDGMENTS
The authors gratefully acknowledge the assistance of Excite, Inc. in providing the data for this research. Without the generous data sharing by Excite Inc. this research would not be possible. We also acknowledge the generous support of our institutions for this research.
REFERENCES
Angelides, M., & Dustdar, S. (1997). Multimedia information systems. Kluwer: Boston.
Aslandogan, Y., Thier, C., Yu, C., Zou, J., & Rishe, N. (1997). Using semantic contents and WordNet in image retrieval. Proceedings of the 20th annual international ACM SIGIR conference on research and development in information retrieval (pp. 286 - 295).
Brown, M., Foote, J., Jones, G., Spärck Jones, K., & Young, S. (1996). Open-vocabulary speech indexing for voice and video mail retrieval. Proceedings of the Fourth ACM International Conference on Multimedia (ACM Multimedia 96) (pp. 307-316).
Enser, P. G. B. (1995). Progress in documentation: Pictorial information retrieval. Journal of Documentation, 51(2), 126-170.
Goodrum, A., & Spink, A. (1999). Visual information seeking: A study of image queries on the World Wide Web. Proceedings of the 1999 Annual Meeting of the American Society for Information Science, Washington, DC, November 1999.
Goodrum, A. (in review). Multidimensional scaling of video surrogates. Journal of the American Society for Information Science.
Goodrum, A., & Kim, (1998). Visualizing the history of chemistry: Queries to the CHF Pictorial Collection. Report to the Chemical Heritage Foundation Pictorial Collection. http://www.chemheritage.org/Publications/ChemHeritage/Goodrum/goodrum.htm
Gordon, M., & Pathak, P. (1999). Finding information on the World Wide Web: The retrieval effectiveness of search engines. Information Processing and Management, 35(2), 141 – 180.
Gudivada, V. V., & Raghavan, V. V. (1995). Content-based image retrieval systems. IEEE Computer, 28(9), 18-22.
Hastings, S. K. (1995). Query categories in a study of intellectual access to digitized art images. Proceedings of the 1995 Annual Meeting of the American Society for Information Science, 32, 3-8.
Jansen, B. J., & Pooch, U. (under review). Web user studies: A review of current and framework for future research. Submitted to Journal of the American Society for Information Science.
Jansen, B. J. (1999). Note on retrieval of number of images indexed at Alta Vista. With Alta Vista, one can select the image radio box and enter a '*' (i.e., the wildcard character) into the search box. This will return the number of images in the Alta Vista inverted file index.
Jansen, B. J., Spink, A., & Saracevic, T. (1998). Failure analysis in query construction: Data and analysis from a large sample of Web queries. Proceedings of the Third ACM Conference on Digital Libraries, Pittsburgh, PA. (pp. 289-290).
Jansen, B. J., Spink, A., & Saracevic, T. (in press). Real life, real users and real needs: A study and analysis of user queries on the Web. Information Processing and Management.
Jansen, B., Spink, A., & Saracevic, T. (1999). The Use of Relevance Feedback on the Web: Implications for Web IR System Design. Proceedings of WebNet 99: The World Conference of the World Wide Web, Internet, and Intranet, October, 1999, Hawaii.
Kirsch, S. (1998). Everything you need to know about the Internet. Retrieved from the World Wide Web on 23 August 1999 at http://topgun.infoseek.com/stk/presentations/sigir.ppt.
Lawrence, S., & Giles, C. L. (1999). Accessibility of information on the Web. Nature, 400, 107-109.
Lesk, M. (1997a). Going digital. Scientific American, 276(3), 58-60.
Lesk, M. (1997b). Practical digital libraries: Books, bytes, and bucks. Morgan Kaufmann: San Francisco.
Nielsen/NetRating. (1999). Retrieved from the World Wide Web on 24 August 1999 from http://www.nielsen-netratings.com/.
O’Connor, B., O'Connor, M., & Abbas, J. (1999). Functional descriptors of image documents: User-generated captions and response statements. Journal of the American Society for Information Science, 50(8), 681-697.
Schauble, P. (1997). Multimedia information retrieval. Kluwer: Boston.
Smith, T., Ruocco, A., & Jansen, B. (1998). Digital Video in Education. Proceedings of the Thirtieth SIGCSE Technical Symposium on Computer Science Education, 122 – 126.
Spink, A., Bateman, J., & Jansen, B. (1999). Searching the Web: A Survey of Excite Users. Internet Research: Electronic Networking Applications and Policy.
Takahashi, J., Kushida, T., Hong, J., Sugita, S., Kurita, Y., Rieger, R., Martin, W., Gay, G., Reeve, J., & Loverance, R. (1998). Global digital museum: Multimedia information access and creation on the Internet. Proceedings of the Third ACM Conference on Digital Libraries (pp. 244-253).
Turner, J. (1990). Representing and accessing information in the stockshot database of the National Film Board of Canada. The Canadian Journal of Information Science, 15, 1-22.
Witten, I. H., Moffat, A., & Bell, T. C. (1994). Managing gigabytes: Compressing and indexing documents and images. Van Nostrand Reinhold: New York.