Max Planck Institute for Informatics
Long-Term Image Boundary Prediction
Bhattacharyya, Apratim (Max Planck Institute for Informatics) | Malinowski, Mateusz (Max Planck Institute for Informatics) | Schiele, Bernt (Max Planck Institute for Informatics) | Fritz, Mario (Max Planck Institute for Informatics)
Boundary estimation in images and videos has been a very active topic of research, and organizing visual information into boundaries and segments is believed to be a cornerstone of visual perception. While prior work has focused on estimating boundaries for observed frames, our work aims at predicting boundaries of future unobserved frames. This requires our model to learn about the fate of boundaries and corresponding motion patterns---including a notion of "intuitive physics." We experiment on natural video sequences along with synthetic sequences with deterministic physics-based and agent-based motions. While not our primary goal, we also show that fusing RGB and boundary prediction leads to improved RGB predictions.
Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags
Tandon, Niket (Max Planck Institute for Informatics) | Hariman, Charles (Max Planck Institute for Informatics) | Urbani, Jacopo (Max Planck Institute for Informatics and VU University Amsterdam) | Rohrbach, Anna (Max Planck Institute for Informatics) | Rohrbach, Marcus (University of California, Berkeley) | Weikum, Gerhard (Max Planck Institute for Informatics)
Commonsense knowledge about part-whole relations (e.g., screen partOf notebook) is important for interpreting user input in web search and question answering, or for object detection in images. Prior work on knowledge base construction has compiled part-whole assertions, but with substantial limitations: i) semantically different kinds of part-whole relations are conflated into a single generic relation, ii) the arguments of a part-whole assertion are merely words with ambiguous meaning, iii) the assertions lack additional attributes like visibility (e.g., a nose is visible but a kidney is not) and cardinality information (e.g., a bird has two legs while a spider has eight), iv) coverage is limited to only tens of thousands of assertions. This paper presents a new method for automatically acquiring part-whole commonsense from Web contents and image tags at an unprecedented scale, yielding many millions of assertions, while specifically addressing the four shortcomings of prior work. Our method combines pattern-based information extraction methods with logical reasoning. We carefully distinguish different relations: physicalPartOf, memberOf, substanceOf. We consistently map the arguments of all assertions onto WordNet senses, eliminating the ambiguity of word-level assertions. We identify whether the parts can be visually perceived, and infer cardinalities for the assertions. The resulting commonsense knowledge base has very high quality and high coverage, with an accuracy of 89% determined by extensive sampling, and is publicly available.
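To make the shape of such typed, sense-disambiguated assertions concrete, here is a minimal sketch in Python. The field names and the `leg.n.01`-style sense identifiers are illustrative assumptions (WordNet-style synset keys), not the paper's actual data format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PartWholeAssertion:
    # WordNet-style sense keys instead of bare words, so that e.g.
    # "mouse" the rodent and "mouse" the device are kept apart.
    part: str                  # e.g. "leg.n.01"
    whole: str                 # e.g. "bird.n.01"
    relation: str              # one of: physicalPartOf, memberOf, substanceOf
    visible: bool              # can the part be visually perceived?
    cardinality: Optional[int] # typical count, if known

# A toy knowledge base illustrating the distinctions described above.
kb = [
    PartWholeAssertion("leg.n.01", "bird.n.01", "physicalPartOf", True, 2),
    PartWholeAssertion("leg.n.01", "spider.n.01", "physicalPartOf", True, 8),
    PartWholeAssertion("kidney.n.01", "person.n.01", "physicalPartOf", False, 2),
    PartWholeAssertion("musician.n.01", "orchestra.n.01", "memberOf", True, None),
]

# Query: visible physical parts of a spider.
visible_parts = [a.part for a in kb
                 if a.whole == "spider.n.01"
                 and a.relation == "physicalPartOf" and a.visible]
```

Keeping visibility and cardinality as first-class attributes is what allows downstream vision systems to ask queries like the one above, rather than treating all part-whole facts alike.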
Multimedia Data for the Visually Impaired
Tandon, Niket (Max Planck Institute for Informatics) | Sharma, Shekhar (PQRS Research) | Makkad, Tanima (PQRS Research)
The Web contains a large amount of information in the form of videos that remains inaccessible to visually impaired people. We identify a class of videos whose information content can be approximately encoded as audio, thereby increasing the amount of accessible videos. We propose a model to automatically identify such videos. Our model jointly relies on the textual metadata and visual content of the video. We use this model to re-rank YouTube video search results based on the accessibility of the video. We present preliminary results by conducting a user study with visually impaired people to measure the effectiveness of our system.
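The re-ranking step can be sketched as follows. The paper's model is learned jointly from metadata and visual content; the hand-set weights, keyword list, and `motion` feature below are purely illustrative stand-ins for that learned model:

```python
def accessibility_score(video, w_text=0.6, w_visual=0.4):
    # Hypothetical cues: a metadata cue (title suggests speech-heavy content)
    # and a visual cue (low motion suggests the audio carries the content).
    text_cue = 1.0 if any(k in video["title"].lower()
                          for k in ("lecture", "speech", "interview")) else 0.0
    visual_cue = 1.0 - video["motion"]  # motion assumed normalized to [0, 1]
    return w_text * text_cue + w_visual * visual_cue

def rerank(results):
    # Stable sort: ties preserve the original search-engine ranking.
    return sorted(results, key=accessibility_score, reverse=True)

results = [
    {"title": "Skateboard tricks compilation", "motion": 0.9},
    {"title": "Physics lecture 3", "motion": 0.2},
]
ranked = rerank(results)
```

A linear combination is only the simplest way to fuse the two signals; the point is that both modalities vote on whether the video's content survives as audio alone.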
Acquiring Comparative Commonsense Knowledge from the Web
Tandon, Niket (Max Planck Institute for Informatics) | Melo, Gerard de (Tsinghua University) | Weikum, Gerhard (Max Planck Institute for Informatics)
Applications are increasingly expected to make smart decisions based on what humans consider basic commonsense. An often overlooked but essential form of commonsense involves comparisons, e.g. the fact that bears are typically more dangerous than dogs, that tables are heavier than chairs, or that ice is colder than water. In this paper, we first rely on open information extraction methods to obtain large amounts of comparisons from the Web. We then develop a joint optimization model for cleaning and disambiguating this knowledge with respect to WordNet. This model relies on integer linear programming and semantic coherence scores. Experiments show that our model outperforms strong baselines and allows us to obtain a large knowledge base of disambiguated commonsense assertions.
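The joint disambiguation objective can be illustrated at toy scale. The paper solves it with integer linear programming; for two words with two candidate senses each, brute-force enumeration finds the same optimum, so the sketch below substitutes exhaustive search, and the sense labels and coherence function are invented for illustration:

```python
from itertools import product

# Candidate WordNet senses for the arguments of the comparison
# "bears are more dangerous than dogs" (labels hypothetical).
senses = {
    "bear": ["bear.n.01 (animal)", "bear.v.01 (carry)"],
    "dog":  ["dog.n.01 (animal)", "dog.v.02 (pursue)"],
}

def coherence(s1, s2):
    # Stand-in for the paper's semantic coherence score: reward sense
    # pairs from the same broad category (here, both animals).
    return 1.0 if "animal" in s1 and "animal" in s2 else 0.0

# Brute-force the tiny optimization the ILP would solve: pick one sense
# per word so that total pairwise coherence is maximized.
best = max(product(senses["bear"], senses["dog"]),
           key=lambda pair: coherence(*pair))
```

In the full model, the ILP additionally enforces consistency across many comparisons sharing the same words, which is what makes the optimization joint rather than per-assertion.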
Deriving a Web-Scale Common Sense Fact Database
Tandon, Niket (Max Planck Institute for Informatics) | Melo, Gerard de (Max Planck Institute for Informatics) | Weikum, Gerhard (Max Planck Institute for Informatics)
The fact that birds have feathers and ice is cold seems trivially true. Yet, most machine-readable sources of knowledge either lack such common sense facts entirely or have only limited coverage. Prior work on automated knowledge base construction has largely focused on relations between named entities and on taxonomic knowledge, while disregarding common sense properties. In this paper, we show how to gather large amounts of common sense facts from Web n-gram data, using seeds from the ConceptNet collection. Our novel contributions include scalable methods for tapping into Web-scale data and a new scoring model to determine which patterns and facts are most reliable. The experimental results show that this approach extends ConceptNet by many orders of magnitude at comparable levels of precision.
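The seed-and-pattern bootstrapping loop described above can be sketched in a few lines. The toy n-gram list, the crude prefix matching, and scoring by raw frequency are all simplifying assumptions standing in for the paper's corpus and its pattern/fact reliability model:

```python
from collections import Counter

# Seed facts (concept, property) in the spirit of ConceptNet's HasProperty.
seeds = {("bird", "feather"), ("ice", "cold")}

# Toy stand-in for a Web n-gram corpus: (n-gram, frequency).
ngrams = [
    ("birds have feathers", 900),
    ("ice is cold", 1200),
    ("snow is cold", 800),
    ("dogs have fur", 700),
]

def matches_seed(first, last):
    # Crude prefix match to absorb plural forms ("birds" ~ "bird").
    return any(first.startswith(c) and last.startswith(p) for c, p in seeds)

# Step 1: induce connecting patterns from n-grams that contain a seed pair.
patterns = set()
for text, _freq in ngrams:
    toks = text.split()
    if matches_seed(toks[0], toks[-1]):
        patterns.add(" ".join(toks[1:-1]))

# Step 2: apply the induced patterns to harvest new candidate facts,
# scored here simply by n-gram frequency (a stand-in for the paper's
# joint scoring of pattern and fact reliability).
candidates = Counter()
for text, freq in ngrams:
    toks = text.split()
    if " ".join(toks[1:-1]) in patterns and not matches_seed(toks[0], toks[-1]):
        candidates[(toks[0], toks[-1])] += freq
```

At Web scale the same loop must prune unreliable patterns (e.g. a bare "is" overgenerates), which is precisely what the scoring model in the paper is for.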