AITopics | compositionality

6a42b45af2b72e6e5b5e3a6fe695809f-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Jun-18-2026, 03:59:06 GMT

The model can easily distinguish A and B according to the background (i.e., the so-called geometric skews [26]), but not according to the features of the class instance itself. However, suppose there is another class C that also appears on a black background. In this three-way classification task (distinguishing A, B, and C), an ideal model should focus on the features of the instance itself rather than the background. This is one of the difficulties: distribution bias in the samples, where some features (e.g., background) help classification but do not help the model understand the class in a compositional way. Another difficulty is entanglement of the labels. We provide the labels in a relative way, so that the label of A is '0' and that of B is '1', rather than as their true textual meanings (e.g., white paper and green leaves). The concept information is entangled and embedded in the label, so it is hard for the model to tell which visual features capture the corresponding concepts (i.e., white refers to the color feature and paper refers to the texture feature). We hope our understanding of this issue can inspire researchers to focus more on compositionality and design excellent continual learners.
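
The background shortcut described above can be made concrete with a toy check. This is only a sketch: the sample values, the choice of which class sits on the black background, and the helper name `separates` are all hypothetical and not taken from the paper.

```python
# Each sample is (background, instance_feature, label). All values are made up.
samples_ab = [
    ("white", "paper", "A"),
    ("black", "leaf", "B"),
]
# Class C shares the black background with B but has its own instance feature.
samples_abc = samples_ab + [("black", "helmet", "C")]


def separates(samples, feature_index):
    """Return True if the chosen feature alone determines the label."""
    seen = {}
    for sample in samples:
        value, label = sample[feature_index], sample[-1]
        # setdefault stores the first label seen for this feature value.
        if seen.setdefault(value, label) != label:
            return False
    return True


background_ok_two_classes = separates(samples_ab, 0)     # shortcut works
background_ok_three_classes = separates(samples_abc, 0)  # shortcut breaks
instance_ok_three_classes = separates(samples_abc, 1)    # instance feature still works
```

With only A and B, the background column is enough to tell the classes apart; once C is added, only the instance feature still separates all three.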

artificial intelligence, helmet, machine learning, (14 more...)

Genre: Research Report (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning

Neural Information Processing Systems, Jun-17-2026, 07:17:48 GMT

Vision-language models like CLIP have demonstrated remarkable zero-shot capabilities in classification and retrieval. However, these models often struggle with compositional reasoning - the ability to understand the relationships between concepts. A recent benchmark, SugarCrepe++ [11], reveals that previous works on improving compositionality have mainly improved lexical sensitivity but neglected semantic understanding. In addition, downstream retrieval performance often deteriorates, although one would expect that improving compositionality should enhance retrieval. In this work, we introduce CLIC (Compositionally-aware Learning in CLIP), a fine-tuning method based on a novel training technique combining multiple images and their associated captions. CLIC improves compositionality across architectures as well as differently pre-trained CLIP models, both in terms of lexical and semantic understanding, and achieves consistent gains in retrieval performance. This even applies to the recent CLIPS [33], which achieves SOTA retrieval performance. Nevertheless, the short fine-tuning with CLIC leads to an improvement in retrieval and to the best compositional CLIP model on SugarCrepe++.

caption, large language model, machine learning, (23 more...)

Country: Europe > Germany (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

A Compressive-Expressive Communication Framework for Compositional Representations

Neural Information Processing Systems, Jun-16-2026, 03:46:22 GMT

Compositionality in knowledge and language, the ability to represent complex concepts as a combination of simpler ones, is a hallmark of human cognition and communication. Despite recent advances, deep neural networks still struggle to acquire this property reliably. Neural models for emergent communication look to endow artificial agents with compositional language by simulating the pressures that form human language. In this work, we introduce CELEBI (Compressive-Expressive Language Emergence through a discrete Bottleneck and Iterated learning), a novel self-supervised framework for inducing compositional representations through a reconstruction-based communication game between a sender and a receiver. Building on theories of language emergence and the iterated learning framework, we integrate three mechanisms that jointly promote compressibility, expressivity, and efficiency in the emergent language. First, Progressive Decoding incentivizes intermediate reasoning by requiring the receiver to produce partial reconstructions after each symbol. Second, Final-State Imitation trains successive generations of agents to imitate reconstructions rather than messages, enforcing a tighter communication bottleneck.
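
The idea of scoring a partial reconstruction after each symbol can be sketched as a loss averaged over message prefixes. This is a minimal illustration, not the paper's implementation: the unweighted average, the squared-error metric, and the names `progressive_decoding_loss`, `toy_receiver`, and `CODEBOOK` are assumptions made for the example.

```python
def squared_error(x, y):
    """Sum of squared coordinate differences between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def progressive_decoding_loss(message, target, receiver):
    """Average reconstruction error over every prefix of the message.

    Scoring each prefix (not just the full message) pushes early symbols
    to carry useful information on their own.
    """
    errors = [
        squared_error(receiver(message[: t + 1]), target)
        for t in range(len(message))
    ]
    return sum(errors) / len(errors)


# Hypothetical toy receiver: each symbol adds a fixed vector to the guess.
CODEBOOK = {"a": (1.0, 0.0), "b": (0.0, 2.0)}


def toy_receiver(prefix):
    guess = [0.0, 0.0]
    for symbol in prefix:
        guess = [g + c for g, c in zip(guess, CODEBOOK[symbol])]
    return guess


# Prefix ["a"] reconstructs (1, 0): error 4. Prefix ["a", "b"] reconstructs (1, 2): error 0.
loss = progressive_decoding_loss(["a", "b"], (1.0, 2.0), toy_receiver)
```

In this toy case the loss is the mean of the two prefix errors, so a message whose first symbol already explains part of the target is rewarded.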

artificial intelligence, machine learning, natural language, (18 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following

Neural Information Processing Systems, Jun-14-2026, 05:48:14 GMT

Effective task representations should facilitate compositionality, such that after learning a variety of basic tasks, an agent can perform compound tasks consisting of multiple steps simply by composing the representations of the constituent steps together. While this is conceptually simple and appealing, it is not clear how to automatically learn representations that enable this sort of compositionality. We show that learning to associate the representations of current and future states with a temporal alignment loss can improve compositional generalization, even in the absence of any explicit subtask planning or reinforcement learning. We evaluate our approach across diverse robotic manipulation tasks as well as in simulation, showing substantial improvements for tasks specified with either language or goal images.
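
The abstract does not give the form of the temporal alignment loss. One common way to associate representations of current and future states is a contrastive (InfoNCE-style) objective, and the sketch below assumes that form. The function name `temporal_alignment_loss`, the `temperature` parameter, and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def temporal_alignment_loss(current, future, temperature=1.0):
    """InfoNCE-style loss over a batch of (current, future) representation pairs.

    current[i] should score higher against its own future[i] than against
    the futures of the other items in the batch.
    """
    n = len(current)
    total = 0.0
    for i in range(n):
        logits = [dot(current[i], future[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_norm - logits[i]
    return total / n


states = [(1.0, 0.0), (0.0, 1.0)]
aligned = temporal_alignment_loss(states, [(1.0, 0.0), (0.0, 1.0)])
shuffled = temporal_alignment_loss(states, [(0.0, 1.0), (1.0, 0.0)])
```

The loss is lower when each current-state representation points at its own future state (`aligned`) than when the pairs are mismatched (`shuffled`).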

artificial intelligence, proceedings, representation, (3 more...)

Technology: Information Technology > Artificial Intelligence > Robots (0.80)

Compositional Plan Vectors

Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

Neural Information Processing Systems, Apr-30-2026, 19:24:19 GMT

Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations - for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.
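
The additive property described above can be illustrated with a toy sketch. In the paper the embedding is learned from demonstrations; here the subtask vectors are hand-picked, and the names `SUBTASK_VECTORS`, `cpv`, and the subtask labels are hypothetical.

```python
# Hand-picked stand-ins for learned subtask embeddings (an assumption for illustration).
SUBTASK_VECTORS = {
    "chop_tree": (1.0, 0.0, 0.0),
    "build_house": (0.0, 1.0, 0.0),
    "eat_bread": (0.0, 0.0, 1.0),
}


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def cpv(subtasks):
    """Embed a trajectory as the sum of the vectors of its subtasks."""
    total = (0.0, 0.0, 0.0)
    for name in subtasks:
        total = add(total, SUBTASK_VECTORS[name])
    return total


task_a = ["chop_tree", "build_house"]
task_b = ["eat_bread"]

# Adding the plan vectors of two tasks gives the plan vector of the combined task.
composed = add(cpv(task_a), cpv(task_b))
combined_directly = cpv(task_a + task_b)
```

Because the representation is a plain sum, the composed vector does not encode the order of the subtasks, only which subtasks are present and how many times.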

machine learning, reinforcement learning, trajectory, (18 more...)

Country: North America > United States > California (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

b8b93c48f5bfa385d071342089d70422-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Apr-30-2026, 01:20:31 GMT

caption, large language model, machine learning, (22 more...)

Country: Europe (0.93)

Genre:

Overview (0.68)
Research Report > New Finding (0.46)

Industry:

Information Technology (0.68)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
