 differentia


A semantics-driven methodology for high-quality image annotation

Giunchiglia, Fausto, Bagchi, Mayukh, Diao, Xiaolei

arXiv.org Artificial Intelligence

Recent work in Machine Learning and Computer Vision has highlighted the presence of various types of systematic flaws inside ground truth object recognition benchmark datasets. Our basic tenet is that these flaws are rooted in the many-to-many mappings which exist between the visual information encoded in images and the intended semantics of the labels annotating them. The net consequence is that the current annotation process is largely under-specified, thus leaving too much freedom to the subjective judgment of annotators. In this paper, we propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology whose main goal is to make explicit the (otherwise implicit) intended annotation semantics, thus minimizing the number and role of subjective choices. A key element of vTelos is the exploitation of the WordNet lexico-semantic hierarchy as the main means for providing the meaning of natural language labels and, as a consequence, for driving the annotation of images based on the objects and the visual properties they depict. The methodology is validated on images populating a subset of the ImageNet hierarchy.


Egocentric Hierarchical Visual Semantics

Erculiani, Luca, Bontempelli, Andrea, Passerini, Andrea, Giunchiglia, Fausto

arXiv.org Artificial Intelligence

We are interested in aligning how people think about objects with what machines perceive. By this we mean that object recognition, as performed by a machine, should follow a process resembling the one humans follow when thinking of an object associated with a certain concept. The ultimate goal is to build systems that can meaningfully interact with their users, describing what they perceive in the users' own terms. As established in the field of Lexical Semantics, humans organize the meaning of words in hierarchies where the meaning of, e.g., a noun is defined in terms of the meaning of a more general noun, its genus, and of one or more differentiating properties, its differentia. The main tenet of this paper is that object recognition should implement a hierarchical process that follows the hierarchical semantic structure used to define the meaning of words. We achieve this goal by implementing an algorithm which, for any object, recursively recognizes its visual genus and its visual differentia. In other words, the recognition of an object is decomposed into a sequence of steps, each of which recognizes the locally relevant visual features. This paper presents the algorithm and a first evaluation.
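
The recursive genus-and-differentia descent described in this abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the `Concept` class, the `recognize` function, the symbolic feature dictionary, and the small instrument hierarchy are all hypothetical, and hand-written predicates stand in for whatever learned visual features the paper actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: an object is described by symbolic features. A real system
# would use learned visual features instead.
Features = Dict[str, str]


@dataclass
class Concept:
    """A node in a genus/differentia hierarchy (hypothetical structure)."""
    name: str
    # Predicate that distinguishes this concept from its siblings
    # under the same genus (its parent node).
    differentia: Callable[[Features], bool]
    children: List["Concept"] = field(default_factory=list)


def recognize(node: Concept, features: Features) -> List[str]:
    """Return the path from `node` to the most specific matching concept.

    At each level only the locally relevant differentia are tested;
    the descent stops when no child's differentia holds.
    """
    path = [node.name]
    for child in node.children:
        if child.differentia(features):
            return path + recognize(child, features)
    return path


# Toy hierarchy, invented for illustration only.
violin = Concept("violin", lambda f: f.get("played_with") == "bow")
guitar = Concept("guitar", lambda f: f.get("played_with") == "fingers")
stringed = Concept(
    "stringed instrument",
    lambda f: f.get("has_strings") == "yes",
    [violin, guitar],
)
wind = Concept("wind instrument", lambda f: f.get("has_strings") == "no")
root = Concept("musical instrument", lambda f: True, [stringed, wind])

full_path = recognize(root, {"has_strings": "yes", "played_with": "bow"})
partial_path = recognize(root, {"has_strings": "yes"})
print(full_path)
print(partial_path)
```

In this sketch, an object whose differentia cannot be resolved at some level is simply labeled with the most specific genus reached so far, which is one possible design choice among several.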


Towards Visual Semantics

Giunchiglia, Fausto, Erculiani, Luca, Passerini, Andrea

arXiv.org Artificial Intelligence

In Visual Semantics we study how humans build mental representations, i.e., concepts, of what they visually perceive. We call such concepts substance concepts. In this paper we provide a theory and an algorithm which learns substance concepts that correspond to the concepts, which we call classification concepts, that are used in Lexical Semantics to encode word meanings. The theory and algorithm are based on three main contributions: (i) substance concepts are modeled as visual objects, namely sequences of similar frames, as perceived in multiple encounters; (ii) substance concepts are organized into a visual subsumption hierarchy based on the notions of Genus and Differentia, which resemble the notions that, in Lexical Semantics, allow the construction of hierarchies of classification concepts; (iii) human feedback is exploited not to name objects, as has been the case so far, but rather to align the hierarchy of substance concepts with that of classification concepts. The learning algorithm is implemented for the base case of a hierarchy of depth two. The experiments, though preliminary, show that the algorithm manages to acquire the notions of Genus and Differentia with reasonable accuracy, despite seeing a small number of examples and receiving supervision on only a fraction of them.