kordjamshidi
FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing
Premsri, Tanawan, Kordjamshidi, Parisa
Frame of Reference (FoR) is a fundamental concept in spatial reasoning that humans utilize to comprehend and describe space. With the rapid progress in Multimodal Language models, the moment has come to integrate this long-overlooked dimension into these models. In particular, in text-to-image (T2I) generation, even state-of-the-art models exhibit a significant performance gap when spatial descriptions are provided from perspectives other than the camera. To address this limitation, we propose Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing (FoR-SALE), an extension of the Self-correcting LLM-controlled Diffusion (SLD) framework for T2I. For-Sale evaluates the alignment between a given text and an initially generated image, and refines the image based on the Frame of Reference specified in the spatial expressions. It employs vision modules to extract the spatial configuration of the image, while simultaneously mapping the spatial expression to a corresponding camera perspective. This unified perspective enables direct evaluation of alignment between language and vision. When misalignment is detected, the required editing operations are generated and applied. FoR-SALE applies novel latent-space operations to adjust the facing direction and depth of the generated images. We evaluate FoR-SALE on two benchmarks specifically designed to assess spatial understanding with FoR. Our framework improves the performance of state-of-the-art T2I models by up to 5.3% using only a single round of correction.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Michigan (0.04)
- (3 more...)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization
Kamali, Danial, Barezi, Elham J., Kordjamshidi, Parisa
Compositional generalization is crucial for artificial intelligence agents to solve complex vision-language reasoning tasks. Neuro-symbolic approaches have demonstrated promise in capturing compositional structures, but they face critical challenges: (a) reliance on predefined predicates for symbolic representations that limit adaptability, (b) difficulty in extracting predicates from raw data, and (c) using non-differentiable operations for combining primitive concepts. To address these issues, we propose NeSyCoCo, a neuro-symbolic framework that leverages large language models (LLMs) to generate symbolic representations and map them to differentiable neural computations. NeSyCoCo introduces three innovations: (a) augmenting natural language inputs with dependency structures to enhance the alignment with symbolic representations, (b) employing distributed word representations to link diverse, linguistically motivated logical predicates to neural modules, and (c) using the soft composition of normalized predicate scores to align symbolic and differentiable reasoning. Our framework achieves state-of-the-art results on the ReaSCAN and CLEVR-CoGenT compositional generalization benchmarks and demonstrates robust performance with novel concepts in the CLEVR-SYN benchmark.
- North America > Dominican Republic (0.04)
- Asia > Singapore (0.04)
- North America > United States > Michigan (0.04)
- (3 more...)
Prompt2DeModel: Declarative Neuro-Symbolic Modeling with Natural Language
Faghihi, Hossein Rajaby, Nafar, Aliakbar, Uszok, Andrzej, Karimian, Hamid, Kordjamshidi, Parisa
This paper presents a conversational pipeline for crafting domain knowledge for complex neuro-symbolic models through natural language prompts. It leverages large language models to generate declarative programs in the DomiKnowS framework. The programs in this framework express concepts and their relationships as a graph in addition to logical constraints between them. The graph, later, can be connected to trainable neural models according to those specifications. Our proposed pipeline utilizes techniques like dynamic in-context demonstration retrieval, model refinement based on feedback from a symbolic parser, visualization, and user interaction to generate the tasks' structure and formal knowledge representation. This approach empowers domain experts, even those not well-versed in ML/AI, to formally declare their knowledge to be incorporated in customized neural models in the DomiKnowS framework.
- North America > United States > Michigan (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report (0.82)
- Workflow (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Neuro-symbolic Training for Reasoning over Spatial Language
Premsri, Tanawan, Kordjamshidi, Parisa
Recent research shows that more data and larger models can provide more accurate solutions to natural language problems requiring reasoning. However, models can easily fail to provide solutions in unobserved complex input compositions due to not achieving the level of abstraction required for generalizability. To alleviate this issue, we propose training the language models with neuro-symbolic techniques that can exploit the logical rules of reasoning as constraints and provide additional supervision sources to the model. Training models to adhere to the regulations of reasoning pushes them to make more effective abstractions needed for generalizability and transfer learning. We focus on a challenging problem of spatial reasoning over text. Our results on various benchmarks using multiple language models confirm our hypothesis of effective domain transfer based on neuro-symbolic training.
- North America > United States > Michigan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.68)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models
Rizvi, Md Imbesat Hassan, Zhu, Xiaodan, Gurevych, Iryna
Spatial reasoning is a crucial component of both biological and artificial intelligence. In this work, we present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning. To support our study, we created and contribute a novel Spatial Reasoning Characterization (SpaRC) framework and Spatial Reasoning Paths (SpaRP) datasets, to enable an in-depth understanding of the spatial relations and compositions as well as the usefulness of spatial reasoning chains. We found that all the state-of-the-art LLMs do not perform well on the datasets -- their performances are consistently low across different setups. The spatial reasoning capability improves substantially as model sizes scale up. Finetuning both large language models (e.g., Llama-2-70B) and smaller ones (e.g., Llama-2-13B) can significantly improve their F1-scores by 7--32 absolute points. We also found that the top proprietary LLMs still significantly outperform their open-source counterparts in topological spatial understanding and reasoning.
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- Asia > Singapore (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (6 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
NavHint: Vision and Language Navigation Agent with a Hint Generator
Zhang, Yue, Guo, Quan, Kordjamshidi, Parisa
Existing work on vision and language navigation mainly relies on navigation-related losses to establish the connection between vision and language modalities, neglecting aspects of helping the navigation agent build a deep understanding of the visual environment. In our work, we provide indirect supervision to the navigation agent through a hint generator that provides detailed visual descriptions. The hint generator assists the navigation agent in developing a global understanding of the visual environment. It directs the agent's attention toward related navigation details, including the relevant sub-instruction, potential challenges in recognition and ambiguities in grounding, and the targeted viewpoint description. To train the hint generator, we construct a synthetic dataset based on landmarks in the instructions and visible and distinctive objects in the visual environment. We evaluate our method on the R2R and R4R datasets and achieve state-of-the-art on several metrics. The experimental results demonstrate that generating hints not only enhances the navigation performance but also helps improve the interpretability of the agent's actions.
- Information Technology > Visual Languages (0.75)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
VLN-Trans: Translator for the Vision and Language Navigation Agent
Zhang, Yue, Kordjamshidi, Parisa
Language understanding is essential for the navigation agent to follow instructions. We observe two kinds of issues in the instructions that can make the navigation task challenging: 1. The mentioned landmarks are not recognizable by the navigation agent due to the different vision abilities of the instructor and the modeled agent. 2. The mentioned landmarks are applicable to multiple targets, thus not distinctive for selecting the target among the candidate viewpoints. To deal with these issues, we design a translator module for the navigation agent to convert the original instructions into easy-to-follow sub-instruction representations at each step. The translator needs to focus on the recognizable and distinctive landmarks based on the agent's visual abilities and the observed visual environment. To achieve this goal, we create a new synthetic sub-instruction dataset and design specific tasks to train the translator and the navigation agent. We evaluate our approach on Room2Room~(R2R), Room4room~(R4R), and Room2Room Last (R2R-Last) datasets and achieve state-of-the-art results on multiple benchmarks.
Kordjamshidi
We present Saul, a new probabilistic programming language designed to address some of the shortcomings of programming languages that aim at advancing and simplifying the development of AI systems. Such languages need to interact with messy, naturally occurring data, to allow a programmer to specify what needs to be done at an appropriate level of abstraction rather than at the data level, to be developed on a solid theory that supports moving to and reasoning at this level of abstraction and, finally, to support flexible integration of these learning and inference models within an application program. Saul is an object-functional programming language written in Scala that facilitates these by (1) allowing a programmer to learn, name and manipulate named abstractions over relational data; (2) supporting seamless incorporation of trainable (probabilistic or discriminative) components into the program, and (3) providing a level of inference over trainable models to support composition and make decisions that respect domain and application constraints. Saul is developed over a declaratively defined relational data model, can use piecewise learned factor graphs with declaratively specified learning and inference objectives, and it supports inference over probabilistic models augmented with declarative knowledge-based constraints.We describe the key constructs of Saul and exemplify its use in developing applications that require relational feature engineering and structured output prediction.
DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning
Faghihi, Hossein Rajaby, Guo, Quan, Uszok, Andrzej, Nafar, Aliakbar, Raisi, Elaheh, Kordjamshidi, Parisa
We demonstrate a library for the integration of domain knowledge in deep learning architectures. Using this library, the structure of the data is expressed symbolically via graph declarations and the logical constraints over outputs or latent variables can be seamlessly added to the deep models. The domain knowledge can be defined explicitly, which improves the models' explainability in addition to the performance and generalizability in the low-data regime. Several approaches for such an integration of symbolic and sub-symbolic models have been introduced; however, there is no library to facilitate the programming for such an integration in a generic way while various underlying algorithms can be used. Our library aims to simplify programming for such an integration in both training and inference phases while separating the knowledge representation from learning algorithms. We showcase various NLP benchmark tasks and beyond. The framework is publicly available at Github(https://github.com/HLR/DomiKnowS).
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Michigan (0.04)
- (2 more...)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Indiana (0.05)
- North America > United States > Texas (0.04)
- (2 more...)