Important Semantics of the Image Retrieval Essay

What’s the Big Picture? The Semantic Gap in Image Retrieval

Semantics has been a major focus of research in the development of multimedia systems. This paper examines the research issues involved in the effort to bridge the semantic gap, specifically in relation to image retrieval. Section 1 outlines the background and problem of the semantic gap. Section 2 discusses the common traits of top-down approaches and the difficulties they face. Section 3 discusses the opposite end of the argument: bottom-up approaches. The final section concludes that a combination of top-down and bottom-up approaches is important for future work in the field.

Keywords
Semantic Gap, Ontologies, Automatic Annotation, Image Retrieval

1. Introduction

With the rapid growth in the volume and availability of multimedia information, it has become increasingly vital to develop techniques that allow multimedia information to be searched effectively. The process of semantically categorising multimedia information inevitably generates more information. This information about information is known as metadata, and the introduction of metadata gives rise to what is known as the semantic gap.

In a survey conducted in the early years of content-based image retrieval, Smeulders et al. [1] described the semantic gap as “the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation”. In other words, the semantic gap in the context of multimedia image retrieval systems is the inconsistency between the users’ and the computers’ representations of images. At the end of their survey, the authors concluded that: “A critical point in the advancement of content-based retrieval is the semantic gap, where the meaning of an image is rarely self-evident.
The aim of content-based retrieval systems must be to provide maximum support in bridging the semantic gap between the simplicity of available visual features and the richness of the user semantics.”

The driving force of research in bridging the semantic gap is to provide ways in which multimedia information can be represented consistently amongst users and computers, and thus stored and retrieved effectively in context. The aim of this paper is to provide an overview of the various methodologies and approaches adopted in the past by developers of multimedia image retrieval systems, and to outline the direction of future research in bridging the semantic gap.

Figure 1 shows the hierarchy of levels in the semantic gap between raw media and full semantic representation. At the bottom is raw media: the raw data without any semantic representation. The next level up is the descriptors level, which holds basic representations of the image such as colour composition histograms, texture and salient regions (regions of interest or focus). The objects level is where objects are distinguished. Note that at this level what the objects represent is not identified; only regions belonging to different objects are distinguished. At the object labels level, the objects are given symbolic names to identify them, yet the relationships between these objects are not represented. Finally, at the top of the hierarchy sits the full semantics level. This is the representation of the image from the viewer’s perspective. Relationships between objects are represented with respect to context. This may include information that is not apparent or derivable from the image itself, such as the date and time.

Figure 1 – The hierarchy of levels in the semantic gap between raw media and full semantic representation.

2. Top-down Approaches

Top-down approaches use ontologies to describe multimedia information, with the aim of creating a well-formed structure of information that improves the effectiveness of image search and retrieval. An ontology is a data model which represents a domain. Although this paper’s focus is on image retrieval, ontologies are not limited to representing images; they can also represent other kinds of multimedia information such as audio and video. Ontologies typically consist of object classes, their attributes, their relationships with each other, and instance information showing how they are populated within the domain. As well as providing information about the content of the multimedia, ontologies can also describe the multimedia object itself, such as its author(s) and date of creation.

An example of an early attempt to tackle the semantic categorisation of multimedia information is the MPEG-7 [2] format. Formally named Multimedia Content Description Interface, it uses XML [3] to store metadata and supports limited interpretation of the informational meaning. Efforts have since been made to shift the MPEG-7 content description notation towards ontology languages such as the Resource Description Framework (RDF) [4] and the Web Ontology Language (OWL) [5].

The Semantic Web is perhaps the best-known project in the development of ontologies. Its aim is to provide a common data format and framework which allows all data to be shared amongst various applications, thus accomplishing the “ultimate compatibility” [6]. RDF and OWL became World Wide Web Consortium (W3C) Recommendations for the Semantic Web in February 2004. RDF is a standard for representing information, allowing such information to be exchanged over the web. OWL is a standard used to publish ontologies, thus allowing ontologies to be managed, shared and integrated amongst applications.
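As a rough illustration of how an ontology of this kind combines classes, relationships, instance data and metadata about the media object itself, the sketch below stores RDF-style subject-predicate-object statements in plain Python. The namespace, the resource names (Photo1, Bird1) and the `depicts` property are hypothetical and chosen only for illustration; a real system would normally use an RDF library and published vocabularies rather than hand-built tuples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace used only for this example.
EX = "http://example.org/"

triples = [
    # Metadata about the media object itself (author, date of creation).
    Triple(EX + "Photo1", "rdf:type", EX + "Image"),
    Triple(EX + "Photo1", "dc:creator", "A. Photographer"),
    Triple(EX + "Photo1", "dc:date", "2006-10-01"),
    # Content of the image: an instance and its class.
    Triple(EX + "Photo1", EX + "depicts", EX + "Bird1"),
    Triple(EX + "Bird1", "rdf:type", EX + "Bird"),
    # Class hierarchy of the (toy) domain ontology.
    Triple(EX + "Bird", "rdfs:subClassOf", EX + "Animal"),
]


def types_of(resource, graph):
    """Return the classes of a resource, following rdfs:subClassOf upwards."""
    found = set()
    frontier = [t.obj for t in graph
                if t.subject == resource and t.predicate == "rdf:type"]
    while frontier:
        cls = frontier.pop()
        if cls in found:
            continue
        found.add(cls)
        frontier.extend(t.obj for t in graph
                        if t.subject == cls and t.predicate == "rdfs:subClassOf")
    return found


def images_depicting(cls, graph):
    """Find images that depict any instance of the given class."""
    return sorted(t.subject for t in graph
                  if t.predicate == EX + "depicts"
                  and cls in types_of(t.obj, graph))


print(images_depicting(EX + "Animal", triples))
```

A query for images of an Animal finds Photo1 even though the photo was only annotated with a Bird instance, which is the kind of inference over a class hierarchy that flat keyword metadata cannot provide.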
2.1 Ranking

The key advantage advertised for ontologies is their potential for reuse. Ontologies that are publicly available, such as those on the Semantic Web, are intended to be reused, modified, extended and trimmed in order to save development time, and hence the community benefits as a whole [7].

In order for ontologies to be effectively reused, they must be effectively searched and ranked. This gives rise to specialised ontology search engines, the best-known example of which is Swoogle. The Swoogle search engine has four components: discovery, indexing, analysis and services. The discovery component seeks URLs to collect Semantic Web Documents (SWDs) from the Internet and the Semantic Web. The indexing component then analyses the SWDs and generates metadata about the Semantic Web. The analysis component analyses the generated metadata so that the SWDs can be ranked. The services component is a search service allowing agents to access and navigate the Semantic Web [8]. See Figure 2 for the Swoogle architecture.

Figure 2 – The Swoogle architecture.

Swoogle uses algorithms to rank Semantic Web objects at three levels. At the document level, Semantic Web ontologies are analysed based on content and link structures. At the object level, knowledge is analysed based on objects in databases and the Web. At the sub-graph level, knowledge is analysed based on ontology content and query results in the Semantic Web [8].

2.2 Scope

A common problem associated with the use of ontologies to model a domain extensively is that they tend to grow larger than is necessary for their intended purposes [9]. This occurs either because developers adapt larger ontologies to their specific uses without winnowing extraneous material, or because developers attempt to provide for every imaginable possibility in the future of the ontology’s use.
This problem is expected only to become greater as more and more developers become involved in the field. To begin with, the number of formal ontologies developed and in current use continues to grow. Field-specific examples of ontologies include UMLS, CS Research, Military Coalition Ontology and Gene Ontology, while commercial enterprises have developed the Common Business Library (CBL), Open Catalog Format (OCF), Commerce XML (cXML), RosettaNet, Real Estate Transaction Markup Language (RETML), Open Applications Group Integration Specification (OAGIS), Open Financial Exchange (OFX), Standardized Material and Service Classification, United Nations Standard Products and Services Code (UN/SPSC) and the Universal Content Extended Classifications System (UCEC). Each of these has its own set of terms whose meanings are largely implicit within its field and which apply to different usages depending upon the task or role requested. To achieve the necessary interaction among these and other ontologies, further definition must be added to each term to ensure that it can be applied across them.

Throughout its short history, the Semantic Web has continued to break barriers between disciplinary fields as well as across international borders, increasing the amount of information to be stored and shared. This, in turn, increases the size of the ontologies used to ensure appropriate communication among applications. The increased complexity of these ontologies, together with the growing number of fields recognised as relevant, requires increased automation in the creation and annotation of image metadata, while increased automation in turn requires yet more complexity [10]. Using the analogy of the ontology as the fundamental building block of the Semantic Web, this amounts to building a castle in order to house a mouse.
An example of the large amount of space required to adequately integrate these various ontologies within a specific application process is described in the MIAKT project [11]. In this project, application ontologies, as distinct from domain ontologies, are necessary to ensure proper communication between resources, requiring a complicated system of interlinked ontologies as the fundamental base upon which the rest of the system is constructed.

Figure 3 – The MIAKT framework.

When it comes to generating ontological descriptors specifically for images, the problems become even greater, as user-defined input must be automatically integrated into the machine-generated flow of segment processing and identification.

2.3 Ontologies Change

The issue of changing ontologies is illustrated in studies conducted with the Amateur Fiction Online Community, a virtual community in which amateur writers have developed a tremendous library of amateur writing and other communication tools that are user searchable, but not equipped with any form of metadata [12]. The need for metadata is evident not only in the way members communicate with each other, constantly searching for information that may or may not be available in a constantly shifting environment, but also in the very nature of the community, in which standards, expectations and vocabularies can differ extensively. For virtual communities such as this one, a mechanical ability to search the available library could provide a great service to daily operation and ease of use, yet the study also points out the problem of constantly changing ontologies as the members of these communities shift.

This becomes an even greater difficulty when trying to annotate images, as the shifting nature of ontologies exists even within the same field. The developers of the OntoMedia ontology discovered this problem early in their process [13].
Definitions used to describe images were often found to be quite vague in meaning, and to take on a special significance in one community that was not necessarily shared by another. When the metadata that has been generated to describe images is introduced, with its potentially large files full of colour, histogram and other multi-linguistic descriptions, the problem of ontology shifting becomes even greater.

2.4 Integration

To bring these various ontological systems within a single framework, analogies have been made to the process of linking differing schemas. As reported in [14], there are three basic approaches to integrating ontologies within a given application or domain, referred to generally as alignment, partial compatibility and unification. Alignment refers to the mapping of concepts and relations to indicate equivalence. An alignment that supports equivalent inferences and computations on equivalent concepts and relations is called partial compatibility. Finally, an alignment that creates a one-to-one mapping of all concepts and relations in both ontologies, so that any inference or computation expressed in one can also be expressed in the other, is considered unification.

The OntoMedia ontology is a good example of how ontologies integrate with manually entered, top-down metadata related to images. The program uses several sub-ontologies to provide appropriate support for the various types of media that might be annotated with it, without overburdening the end user with extraneous knowledge that does not apply. The interface reduces the number of manual entries while maximising the inter-related connections made automatically, and reduces the demand on the local drive by pulling the ontology from a publicly shared server, with only the completed file saved locally [15].
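The three integration levels named in this section (alignment, partial compatibility and unification) can be made more concrete with a small sketch. It is a deliberate simplification rather than the procedure described in [14]: each ontology is reduced to a set of class names plus a set of subclass pairs, and "equivalent inferences" is approximated by checking that subclass relations between mapped concepts are preserved in both directions. All class names and the function name are hypothetical.

```python
def classify_integration(a_classes, a_subclass, b_classes, b_subclass, mapping):
    """Classify a concept mapping between two toy ontologies.

    a_subclass / b_subclass are sets of (child, parent) pairs.
    mapping sends concepts of ontology A to equivalent concepts of B.
    """
    if not mapping:
        return "no alignment"
    if not (set(mapping) <= a_classes and set(mapping.values()) <= b_classes):
        raise ValueError("mapping refers to unknown concepts")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping must be one-to-one")

    inverse = {b: a for a, b in mapping.items()}

    # Subclass relations between mapped concepts must hold on both sides.
    a_to_b = all((mapping[c], mapping[p]) in b_subclass
                 for c, p in a_subclass if c in mapping and p in mapping)
    b_to_a = all((inverse[c], inverse[p]) in a_subclass
                 for c, p in b_subclass if c in inverse and p in inverse)
    if not (a_to_b and b_to_a):
        return "alignment"  # equivalences stated, inferences not preserved

    covers_all = set(mapping) == a_classes and set(mapping.values()) == b_classes
    return "unification" if covers_all else "partial compatibility"


a_classes = {"Image", "Photo", "Person"}
a_subclass = {("Photo", "Image")}
b_classes = {"Picture", "Photograph", "Agent", "Organisation"}
b_subclass = {("Photograph", "Picture")}
mapping = {"Image": "Picture", "Photo": "Photograph", "Person": "Agent"}

print(classify_integration(a_classes, a_subclass, b_classes, b_subclass, mapping))
```

In this example the mapping preserves the subclass structure but leaves Organisation unmapped, so the sketch reports partial compatibility; removing Organisation from the second ontology would yield unification.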
In addition to the processes discussed above, automation can be developed to reduce the size of ontologies that have outgrown their effectiveness, or simply to reduce the space they occupy to only the most necessary elements. Manual winnowing has proven less effective, as developers often lack the knowledge and experience to determine correctly which elements of the ontology are unnecessary. This is addressed to some degree by the way in which the OntoMedia ontology is divided into several sub-ontologies and pulls from the public server until the completed file is stored on the local drive. However, as is discussed in [16], this type of approach can also strictly limit the end user by restricting access to partitions that are not part of the core set currently being used.

3. Bottom-up Approaches

In contrast to top-down approaches, in which key terms and vocabularies are already defined by the ontology of the domain and assigned according to pre-determined criteria, the concept behind bottom-up approaches is to work from the raw data itself to build the description. Raw data can be compiled automatically or based upon end-user input.

Folksonomies are an example of a bottom-up approach to developing ontologies from the perspective of the end user. The term folksonomy is derived from the combination of ‘folk’ and ‘taxonomy’: consumers of anything with a URL assign their own keywords and descriptors to that location, using their own everyday language, on a social network available to all [17]. This is considered bottom-up because there are no set language codes and no authoritative involvement, and both the URLs and the language used to describe them are completely open-ended.

In automatic image retrieval, the bottom-up process can be conducted by describing the image in terms of its salient regions.
According to [18], peaks in a difference-of-Gaussian pyramid have provided the most stable interest regions in studies that compared them with a variety of other interest point detectors. Using this technique, advances in automatic metadata generation for image retrieval have focused upon adapting classical textual retrieval methods to image analysis and term-matching.

Figure 4 – Example salient regions found from the peaks in the difference-of-Gaussian pyramid.

Once the salient regions have been identified, there remain a number of ways in which they can be described, including colour moments and Gabor texture descriptors. Currently, there is no strict standard on which specific descriptors, if any, should be included in image retrieval systems. However, this automatic generation of descriptors can often produce a very large number of terms to describe a single image. As is described in [19], this can be reduced through the use of other techniques such as stemming, which is already a part of vector-space retrieval.

3.1 Ranking

As reported in [20], the rise in peer-to-peer personal computing has opened up the field of file sharing, including the sharing of images, to the average lay person, who retains the ability to identify files according to their own system and language structures, based upon the way in which they view the document. Ontologies based on this knowledge would be considered bottom-up and are typically ranked based upon the number of links or referrals they receive from end users.
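The link-based ranking described above can be sketched with a generic PageRank-style power iteration. This is not the algorithm used by Swoogle or AKTiveRank, whose details are not given here; it is a minimal illustration of ranking by incoming links, and the ontology names, the damping factor of 0.85 and the iteration count are all assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Rank nodes by incoming links using PageRank-style power iteration.

    links maps each ontology to the list of ontologies it refers to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives a small baseline share regardless of links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical referral graph between four ontologies.
links = {
    "media-ontology": ["core-ontology"],
    "fiction-ontology": ["core-ontology"],
    "core-ontology": ["media-ontology"],
    "new-ontology": ["core-ontology"],
}

ranks = link_rank(links)
for name, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

Here "new-ontology" receives only the baseline score because nothing links to it yet, whatever the quality of its content. That outcome reflects the limitation of link analysis: a lack of referrals says nothing about how well an ontology is built.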
However, falling back on the PageRank method used by Google, Swoogle and OntoKhoj [21] raises two problems. First, highly ranked ontologies are not thereby proven to be a ‘good’ or appropriate representation of the knowledge they represent. Second, the ranking does not give a fair picture of the developed and available ontologies: an ontology that has not received much attention or publicity is not necessarily poorly developed or ‘worse than’ the other available options. Despite this, link analysis remains one of the most widely used methods, if not the only one, for bottom-up ranking of ontologies.

An example of this type of ranking system can be found in AKTiveRank, which queries the Swoogle database. As can be seen in Figure 5, the main component of the architecture is the Java Servlet that receives the query from the user and queries Swoogle for the given search terms. The search terms only allow the user to search for concepts within the ontology, rather than any properties or comments. AKTiveRank then delivers ranked ontology candidates based upon their analysed relevance to the terms supplied.

Figure 5 – The AKTiveRank Architecture.

3.2 Scope

One of the problems associated with the automatic annotation of images with metadata is how to determine which segments of the image are to be defined. Several approaches have been taken in trying to develop systems that automatically identify the key features of images. These include attention to rectangular areas of the image, identification of significant ‘blob’ areas, segmentation, and scene-oriented approaches. The defined areas are then compared to keyword annotations. In a very real sense, the process of automatically generating metadata for images based on segmentation is still dependent upon top-down strategies, in that the overall image must be understood before the segments can be properly identified and annotated [22].
While the basics of what the image contains may be generated automatically, the problem of defining the context of the objects in the image remains difficult to overcome automatically. In other words, while the objects may be identified (tree, bush, people), queries are generally made at a higher level than this, and no system has yet been developed that can make the leap from object identification to context identity (a picnic or a hunting party) [23]. For this reason, automatic annotation systems typically focus on the gap between the descriptors and the objects rather than the gap between the objects and the semantics.

4. Conclusions and Future Work

While automated systems have been developed that can work from the bottom up in defining significant segments of an image and assigning appropriate metadata, most of these programs have yet to bridge the semantic gap between the identification of an object, such as a house or bird, and the identification of the context, such as the birdhouse standing in the front yard. Bottom-up labelling has also emerged as a natural by-product of end-user activities, as lay persons attempt to find their own solutions to the problem of appropriate image retrieval.

While automated programs to identify images can be very helpful in filling in the lower-level data regarding a particular image, studies continue to indicate that end users search for images using higher-level search terms. This means that image classification must still be conducted manually until the semantic gap can be bridged by more automated processes. Further research should be conducted into the possibility of image comparison, more specific object identification and contextual cues. Meanwhile, the inclusion of terms and classifications winnowed from end-user lists can provide developers with possible contextual clues with which to begin automated search programs.

5. References

[1] Smeulders A.W.M., Worring M., Santini S., Gupta A. and Jain R. Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22, 1349-1380 (2000).
[2] Manjunath B.S., Salembier P. and Sikora T. Introduction to MPEG-7: Multimedia Content Description Interface. Wiley & Sons, April 2002. ISBN 0-471-48678-7.
[3] Bray T., Paoli J., Sperberg-McQueen C.M., Maler E. and Yergeau F. Extensible Markup Language (XML) 1.0 (Fourth Edition) - Origin and Goals, World Wide Web Consortium. http://www.w3.org/TR/2006/REC-xml-20060816/#sec-origin-goals last accessed October 2006.
[4] Hunter J. Adding Multimedia to the Semantic Web - Building an MPEG-7 Ontology. DSTC Pty Ltd, University of Queensland, Australia. 2001.
[5] Tsinaraki C., Polydoros P., Moumoutzis N. and Christodoulakis S. Coupling OWL with MPEG-7 and TV-Anytime for Domain-specific Multimedia Information Integration and Retrieval.
[6] Herman I. (W3C) Semantic Web Activity Lead. http://www.w3.org/2001/sw/ last accessed October 2006.
[7] Alani H. Ontology Construction from Online Ontologies. 15th International World Wide Web Conference, Edinburgh, 2006.
[8] Ding L., Pan R., Finin T., Joshi A., Peng Y. and Kolari P. Finding and Ranking Knowledge on the Semantic Web. Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore MD 21250. 2005.
[9] Noy N.F. and Musen M.A. Specifying Ontology Views by Traversal. In 3rd International Semantic Web Conference (ISWC ‘04), Hiroshima, Japan, 2004.
[10] De Roure D., Jennings N.R. and Shadbolt N.R. The Semantic Grid: Past, Present and Future. 2004.
[11] Dupplaw D., Dasmahapatra S., Hu B., Lewis P. and Shadbolt N. Multimedia Distributed Knowledge Management in MIAKT.
[12] Lawrence K.F. and Schraefel M.C. Web Based Semantic Communities – Who, How and Why We Might Want Them in the First Place.
[13] Jewell M.O., Lawrence K.F. and Tuffield M.M. OntoMedia: An Ontology for the Representation of Heterogeneous Media.
[14] CROSI (Capturing Representing and Operationalising Semantic Integration). University of Southampton. 26 April 2005.
[15] Jewell M.O., Lawrence K.F., Prugel-Bennett A. and Schraefel M.C. Annotation of Multimedia Using OntoMedia.
[16] Alani H., Harris S. and O’Neil B. Winnowing Ontologies Based on Application Use.
[17] Al-Khalifa H.S. and Davis H.C. FolksAnnotation: A Semantic Metadata Tool for Annotating Learning Resources Using Folksonomies and Domain Ontologies. 2006.
[18] Hare J.S. and Lewis P.H. On Image Retrieval Using Salient Regions with Vector Spaces and Latent Semantics.
[19] Ibid.
[20] Zhou J., Hall W., De Roure D.C. and Dialani V.K. Supporting Ad Hoc Resource Sharing on the Web: A Peer-to-Peer Approach to Hypermedia Link Services. ACM Transactions on Internet Technology. Vol. 5, N. N, June 2006, pp. 1-26.
[21] Alani H. and Brewster C. Ontology Ranking Based on the Analysis of Concept Structures. K-Cap. Banff, Alberta, Canada, 2005.
[22] Hare J.S. and Lewis P.H. Saliency-based Models of Image Content and their Application to Auto-Annotation by Semantic Propagation. School of Electronics and Computer Science, University of Southampton.
[23] Hare J.S., Lewis P.H., Enser P.G.B. and Sandom C.J. Mind the Gap: Another Look at the Problem of the Semantic Gap in Image Retrieval.
