Bhatt, Mehul
Commonsense Visual Sensemaking for Autonomous Driving: On Generalised Neurosymbolic Online Abduction Integrating Vision and Semantics
Suchan, Jakob, Bhatt, Mehul, Varadarajan, Srikrishna
We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking in the backdrop of autonomous driving. A general neurosymbolic method for online visual sensemaking using answer set programming (ASP) is systematically formalised and fully implemented. The method integrates the state of the art in visual computing, and is developed as a modular framework that is generally usable within hybrid architectures for real-time perception and control. We evaluate and demonstrate with the community-established benchmarks KITTIMOD, MOT-2017, and MOT-2020. As use-case, we focus on the significance of human-centred visual sensemaking -- e.g., involving semantic representation and explainability, question-answering, commonsense interpolation -- in safety-critical autonomous driving situations. The developed neurosymbolic framework is domain-independent, with the case of autonomous driving designed to serve as an exemplar for online visual sensemaking in diverse cognitive interaction settings in the backdrop of select human-centred AI technology design considerations. Keywords: Cognitive Vision, Deep Semantics, Declarative Spatial Reasoning, Knowledge Representation and Reasoning, Commonsense Reasoning, Visual Abduction, Answer Set Programming, Autonomous Driving, Human-Centred Computing and Design, Standardisation in Driving Technology, Spatial Cognition and AI.
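To make the abductive setup concrete, the following is a minimal, illustrative sketch in clingo-style ASP: per-frame detections are associated with object tracks by a choice rule, tracks left unassigned in a frame are hypothesised as occluded, and an optimisation statement prefers explanations with as little occlusion as possible. All predicate names (frame/1, track/1, det/2, assign/3, occluded/2) and the toy data are assumptions for illustration and do not reproduce the paper's actual encoding.

```
% Illustrative ASP sketch of online visual abduction over per-frame detections.

% Input facts (normally generated per frame by the visual-computing layer).
frame(1). frame(2).
track(t1). track(t2).
det(1, d1). det(1, d2).
det(2, d3).

% Abduce at most one detection per track and frame.
{ assign(F, T, D) : det(F, D) } 1 :- frame(F), track(T).

% A detection may explain at most one track.
:- det(F, D), 2 { assign(F, T, D) : track(T) }.

% A track not assigned in a frame is hypothesised as occluded (out of sight).
assigned(F, T) :- assign(F, T, _).
occluded(F, T) :- frame(F), track(T), not assigned(F, T).

% Prefer explanations that leave as few tracks unobserved as possible.
#minimize { 1, F, T : occluded(F, T) }.

#show assign/3.
#show occluded/2.
```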
Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving
Kondyli, Vasiliki, Bhatt, Mehul, Suchan, Jakob
We develop a human-centred, cognitive model of visuospatial complexity in everyday, naturalistic driving conditions. With a focus on visual perception, the model incorporates quantitative, structural, and dynamic attributes identifiable in the chosen context; the human-centred basis of the model lies in its behavioural evaluation with human subjects with respect to psychophysical measures pertaining to embodied visuoauditory attention. We report preliminary steps to apply the developed cognitive model of visuospatial complexity for human-factors guided dataset creation and benchmarking, and for its use as a semantic template for the (explainable) computational analysis of visuospatial complexity.
Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving
Suchan, Jakob, Bhatt, Mehul, Varadarajan, Srikrishna
We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates the state of the art in (deep learning based) visual computing, and is developed as a modular framework usable within hybrid architectures for perception and control. We evaluate and demonstrate with the community-established benchmarks KITTIMOD and MOT. As use-case, we focus on the significance of human-centred visual sensemaking -- e.g., semantic representation and explainability, question-answering, commonsense interpolation -- in safety-critical autonomous driving situations.
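A complementary hedged sketch of the "out of sight but not out of mind" intuition, again in plain ASP with hypothetical predicates (visible/2, occludes/3, hidden/2, present/2): an object that disappears behind another is, by default, still believed to be present until it becomes visible again. The rules and data are illustrative only, not the paper's formalisation.

```
% Default persistence of occluded objects (illustrative sketch).

% Hypothetical input facts from the vision pipeline.
visible(car, 1). visible(car, 2).
occludes(truck, car, 3). occludes(truck, car, 4).
visible(car, 5).

% A hidden object is still believed present while it was present before.
hidden(O, T)  :- occludes(_, O, T).
present(O, T) :- visible(O, T).
present(O, T) :- hidden(O, T), present(O, T-1).

% Sanity constraint: an object cannot be both visible and hidden at once.
:- visible(O, T), hidden(O, T).

#show present/2.
```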
Semantic Analysis of (Reflectional) Visual Symmetry: A Human-Centred Computational Model for Declarative Explainability
Suchan, Jakob, Bhatt, Mehul, Varadarajan, Srikrishna, Amirshahi, Seyed Ali, Yu, Stella
We present a computational framework for the semantic interpretation of symmetry in naturalistic scenes. Key features include a human-centred representation, and a declarative, explainable interpretation model supporting deep semantic question-answering founded on an integration of methods in knowledge representation and computer vision. In the backdrop of the visual arts, we showcase the framework's capability to generate human-centred, queryable, relational structures, also evaluating the framework with an empirical study on the human perception of visual symmetry. Our framework represents and is driven by the application of foundational Vision and KR methods in the psychological and social sciences.
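As a rough illustration of the kind of queryable relational structure such a framework might expose, the sketch below (plain ASP; object names, coordinates, axis position, and tolerance are all invented for illustration) pairs objects of the same category whose positions mirror each other about a vertical axis, and declares the scene reflectionally symmetric when every object is paired.

```
% Hypothetical relational structure for reflectional symmetry (illustrative).

axis_x(100).                    % vertical symmetry axis at x = 100
object(o1, face,  60, 40).      % object(Id, Category, X, Y)
object(o2, face, 140, 42).
tolerance(5).

% Two distinct objects of the same category mirror each other about the axis
% if their reflected x-positions and their y-positions agree within tolerance.
mirrored(A, B) :-
    object(A, C, XA, YA), object(B, C, XB, YB), A < B,
    axis_x(M), tolerance(E),
    |(XA + XB) - 2*M| <= E,
    |YA - YB| <= E.

paired(A) :- mirrored(A, _).
paired(B) :- mirrored(_, B).
has_unpaired    :- object(A, _, _, _), not paired(A).
scene_symmetric :- not has_unpaired.

#show mirrored/2.
#show scene_symmetric/0.
```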
Answer Set Programming Modulo `Space-Time'
Schultz, Carl, Bhatt, Mehul, Suchan, Jakob, Wałęga, Przemysław
We present ASP Modulo `Space-Time', a declarative representational and computational framework to perform commonsense reasoning about regions with both spatial and temporal components. Supported are capabilities for mixed qualitative-quantitative reasoning, consistency checking, and inferring compositions of space-time relations; these capabilities combine and synergise for applications in a range of AI application areas where the processing and interpretation of spatio-temporal data is crucial. The framework and resulting system is the only general KR-based method for declaratively reasoning about the dynamics of `space-time' regions as first-class objects. We present an empirical evaluation (with scalability and robustness results), and include diverse application examples involving interpretation and control tasks.
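The actual system extends ASP with dedicated support for space-time regions as first-class objects; the plain-ASP fragment below only sketches the flavour of composition-based inference and consistency checking over qualitative relations, with illustrative relation and predicate names that are not the framework's own syntax.

```
% Illustrative sketch of qualitative space-time inference in plain ASP.

region(a). region(b). region(c).

% Asserted topological relations between space-time regions.
rel(a, b, inside).
rel(b, c, inside).

% A fragment of a composition table: inside is transitive.
rel(X, Z, inside) :- rel(X, Y, inside), rel(Y, Z, inside).

% Consistency: a region cannot be both inside and disconnected from another.
:- rel(X, Y, inside), rel(X, Y, disconnected).

#show rel/3.
```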
Visual Explanation by High-Level Abduction: On Answer-Set Programming Driven Reasoning About Moving Objects
Suchan, Jakob (University of Bremen) | Bhatt, Mehul (University of Bremen) | Wałęga, Przemysław (Örebro University, Sweden) | Schultz, Carl (University of Warsaw)
We propose a hybrid architecture for systematically computing robust visual explanation(s) encompassing hypothesis formation, belief revision, and default reasoning with video data. The architecture consists of two tightly integrated synergistic components: (1) (functional) answer set programming based abductive reasoning with space-time tracklets as native entities; and (2) a visual processing pipeline for detection based object tracking and motion analysis. We present the formal framework, its general implementation as a (declarative) method in answer set programming, and an example application and evaluation based on two diverse video datasets: the MOTChallenge benchmark developed by the vision community, and a recently developed Movie Dataset.
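A minimal, hypothetical ASP sketch of the tracklet-association step of such an abductive pipeline: each tracklet must be explained by exactly one abduced object, temporally overlapping tracklets may not share an object, and the number of abduced objects is minimised. Predicate names and data are assumptions for illustration, not the paper's encoding.

```
% Abductive tracklet association (illustrative sketch).

tracklet(tr1,  1, 10).          % tracklet(Id, StartFrame, EndFrame)
tracklet(tr2, 12, 20).
tracklet(tr3,  5, 15).
object(o1). object(o2).

% Hypothesis formation: assign every tracklet to exactly one abduced object.
1 { explains(O, T) : object(O) } 1 :- tracklet(T, _, _).

% Two tracklets that overlap in time cannot belong to the same object.
overlap(T1, T2) :- tracklet(T1, S1, E1), tracklet(T2, S2, E2),
                   T1 != T2, S1 <= E2, S2 <= E1.
:- explains(O, T1), explains(O, T2), overlap(T1, T2).

% Prefer explanations that use as few distinct objects as possible.
used(O) :- explains(O, _).
#minimize { 1, O : used(O) }.

#show explains/2.
```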
Artificial Intelligence for Predictive and Evidence Based Architecture Design
Bhatt, Mehul (University of Bremen and The DesignSpace Group) | Suchan, Jakob (University of Bremen and The DesignSpace Group) | Schultz, Carl (University of Bremen and The DesignSpace Group) | Kondyli, Vasiliki (University of Bremen and The DesignSpace Group) | Goyal, Saurabh (University of Bremen and The DesignSpace Group)
The evidence-based analysis of people's navigation and wayfinding behaviour in large-scale built-up environments (e.g., hospitals, airports) encompasses the measurement and qualitative analysis of a range of aspects including people's visual perception in new and familiar surroundings, their decision-making procedures and intentions, the affordances of the environment itself, etc. In our research on large-scale evidence-based qualitative analysis of wayfinding behaviour, we construe visual perception and navigation in built-up environments as a dynamic narrative construction process of movement and exploration driven by situation-dependent goals, guided by visual aids such as signage and landmarks, and influenced by environmental (e.g., presence of other people, time of day, lighting) and personal (e.g., age, physical attributes) factors. We employ a range of sensors for measuring the embodied visuo-locomotive experience of building users: eye-tracking, egocentric gaze analysis, external camera-based visual analysis to interpret fine-grained behaviour (e.g., stopping, looking around, interacting with other people), and also manual observations made by human experimenters. Observations are processed, analysed, and integrated in a holistic model of the visuo-locomotive narrative experience at the individual and group level. Our model also combines embodied visual perception analysis with analysis of the structure and layout of the environment (e.g., topology, routes, isovists) computed from available 3D models of the building. In this framework, abstract regions like the visibility space, regions of attention, and eye movement clusters are treated as first-class visuo-spatial and iconic objects that can be used for interpreting the visual experience of subjects in a high-level qualitative manner. The final integrated analysis of the wayfinding experience is such that it can even be presented in a virtual reality environment, thereby providing an immersive experience (e.g., using tools such as the Oculus Rift) of the qualitative analysis for single participants, as well as for a combined analysis of large groups. This capability is especially important for experiments in post-occupancy analysis of building performance. Our construction of indoor wayfinding experience as a form of moving image analysis centralises the role and influence of perceptual visuo-spatial characteristics and morphological features of the built environment in the discourse on wayfinding research. We will demonstrate the impact of this work with several case-studies, particularly focussing on a large-scale experiment conducted at the New Parkland Hospital in Dallas, Texas, USA.
Constructive Geometric Constraint Solving as a General Framework for KR-Based Declarative Spatial Reasoning
Schultz, Carl (University of Muenster) | Bhatt, Mehul (University of Bremen)
We present a robust and scalable KR-centered foundation for modularly supporting general declarative spatial representation and reasoning within diverse declarative programming AI frameworks. Based on Constructive Geometric Constraint Solving, our approach provides the foundations for mixed qualitative-quantitative reasoning about space - mereotopology, relative orientation, size, proximity - encompassing key application-driven capabilities such as qualification, spatial consistency solving, quantification, and dynamic geometry. The paper also demonstrates: (a) the framework with benchmark problems (e.g., contact and orientation problems) and applications in spatial Q/A; (b) integration with constraint logic programming, and (c) empirical results illustrating how the proposed encodings outperform existing methods by orders of magnitude on the selected problems.
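By way of illustration of the "qualification" capability named above (deriving qualitative spatial relations from quantitative geometry), the sketch below uses plain ASP over integer coordinates; the paper's own framework instead reasons analytically over real-valued geometry via constructive geometric constraint solving within (constraint) logic programming, so the predicate names and values here are assumptions for illustration only.

```
% Qualification sketch: qualitative relations between circles from coordinates.

% circle(Id, CentreX, CentreY, Radius); values are illustrative.
circle(c1,  0, 0, 5).
circle(c2, 12, 0, 5).
circle(c3,  3, 0, 5).

% Squared centre distance (avoids square roots on integers).
dist2(A, B, D) :- circle(A, XA, YA, _), circle(B, XB, YB, _), A != B,
                  D = (XA-XB)*(XA-XB) + (YA-YB)*(YA-YB).

% Qualitative relations: discrete (non-overlapping) vs. overlapping circles.
discrete(A, B)    :- dist2(A, B, D), circle(A, _, _, RA), circle(B, _, _, RB),
                     D > (RA+RB)*(RA+RB).
overlapping(A, B) :- dist2(A, B, D), circle(A, _, _, RA), circle(B, _, _, RB),
                     D < (RA+RB)*(RA+RB).

#show discrete/2.
#show overlapping/2.
```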
Between Sense and Sensibility: Declarative narrativisation of mental models as a basis and benchmark for visuo-spatial cognition and computation focussed collaborative cognitive systems
Bhatt, Mehul
What lies between `sensing' and `sensibility'? In other words, what kind of cognitive processes mediate sensing capability, and the formation of sensible impressions -- e.g., abstractions, analogies, hypotheses and theory formation, beliefs and their revision, argument formation -- in domain-specific problem solving, or in regular activities of everyday living, working and simply going around in the environment? How can knowledge and reasoning about such capabilities, as exhibited by humans in particular problem contexts, be used as a model and benchmark for the development of collaborative cognitive (interaction) systems concerned with human assistance, assurance, and empowerment? We pose these questions in the context of a range of assistive technologies concerned with visuo-spatial perception and cognition tasks encompassing aspects such as commonsense, creativity, and the application of specialist domain knowledge and problem-solving thought processes. Assistive technologies being considered include: (a) human activity interpretation; (b) high-level cognitive robotics; (c) people-centred creative design in domains such as architecture & digital media creation, and (d) qualitative analyses in geographic information systems. Computational narratives not only provide a rich cognitive basis, but they also serve as a benchmark of functional performance in our development of computational cognitive assistance systems. We posit that computational narrativisation pertaining to space, actions, and change provides a useful model of visual and spatio-temporal thinking within a wide range of problem-solving tasks and application areas where collaborative cognitive systems could serve an assistive and empowering function.
The AAAI-13 Conference Workshops
Agrawal, Vikas (IBM Research-India) | Archibald, Christopher (Mississippi State University) | Bhatt, Mehul (University of Bremen) | Bui, Hung (Nuance) | Cook, Diane J. (Washington State University) | Cortés, Juan (University of Toulouse) | Geib, Christopher (Drexel University) | Gogate, Vibhav (University of Texas at Dallas) | Guesgen, Hans W. (Massey University) | Jannach, Dietmar (TU Dortmund) | Johanson, Michael (University of Alberta) | Kersting, Kristian (University of Bonn) | Konidaris, George (Massachusetts Institute of Technology) | Kotthoff, Lars (University College Cork) | Michalowski, Martin (Adventium Labs) | Natarajan, Sriraam (Indiana University) | O'Sullivan, Barry (University College Cork) | Pickett, Marc (Naval Research Laboratory) | Podobnik, Vedran (University of Zagreb) | Poole, David (University of British Columbia) | Shastri, Lokendra (GM Research, India) | Shehu, Amarda (George Mason University) | Sukthankar, Gita (University of Central Florida)