AITopics | Memisevic, Roland

Collaborating Authors

Memisevic, Roland

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing Hallucination Detection through Noise Injection

Liu, Litian, Pourreza, Reza, Panchal, Sunny, Bhattacharyya, Apratim, Qin, Yao, Memisevic, Roland

arXiv.org Artificial IntelligenceFeb-8-2025

Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. Effectively detecting hallucinations is therefore crucial for the safe deployment of LLMs. Recent research has linked hallucinations to model uncertainty, suggesting that hallucinations can be detected by measuring dispersion over answer distributions obtained from a set of samples drawn from a model. While drawing from the distribution over tokens defined by the model is a natural way to obtain samples, in this work, we argue that it is sub-optimal for the purpose of detecting hallucinations. We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense. To this end, we propose a very simple and efficient approach that perturbs an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling. We demonstrate its effectiveness across a wide range of datasets and model architectures.

hallucination detection, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.03799

Country:

North America > Mexico (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry: Media (0.46)

Add feedback

ClevrSkills: Compositional Language and Visual Reasoning in Robotics

Haresh, Sanjay, Dijkman, Daniel, Bhattacharyya, Apratim, Memisevic, Roland

arXiv.org Artificial IntelligenceNov-13-2024

Robotics tasks are highly compositional by nature. For example, to perform a high-level task like cleaning the table a robot must employ low-level capabilities of moving the effectors to the objects on the table, pick them up and then move them off the table one-by-one, while re-evaluating the consequently dynamic scenario in the process. Given that large vision language models (VLMs) have shown progress on many tasks that require high level, human-like reasoning, we ask the question: if the models are taught the requisite low-level capabilities, can they compose them in novel ways to achieve interesting high-level tasks like cleaning the table without having to be explicitly taught so? To this end, we present ClevrSkills - a benchmark suite for compositional reasoning in robotics. ClevrSkills is an environment suite developed on top of the ManiSkill2 simulator and an accompanying dataset. The dataset contains trajectories generated on a range of robotics tasks with language and visual annotations as well as multi-modal prompts as task specification. The suite includes a curriculum of tasks with three levels of compositional understanding, starting with simple tasks requiring basic motor skills. We benchmark multiple different VLM baselines on ClevrSkills and show that even after being pre-trained on large numbers of tasks, these models fail on compositional reasoning in robotics tasks.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.09052

Country: North America > United States > New York (0.14)

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits

Khisti, Ashish, Ebrahimi, M. Reza, Dbouk, Hassan, Behboodi, Arash, Memisevic, Roland, Louizos, Christos

arXiv.org Artificial IntelligenceOct-23-2024

We consider multi-draft speculative sampling, where the proposal sequences are sampled independently from different draft models. At each step, a token-level draft selection scheme takes a list of valid tokens as input and produces an output token whose distribution matches that of the target model. Previous works have demonstrated that the optimal scheme (which maximizes the probability of accepting one of the input tokens) can be cast as a solution to a linear program. In this work we show that the optimal scheme can be decomposed into a two-step solution: in the first step an importance sampling (IS) type scheme is used to select one intermediate token; in the second step (single-draft) speculative sampling is applied to generate the output token. For the case of two identical draft models we further 1) establish a necessary and sufficient condition on the distributions of the target and draft models for the acceptance probability to equal one and 2) provide an explicit expression for the optimal acceptance probability. Our theoretical analysis also motives a new class of token-level selection scheme based on weighted importance sampling. Our experimental results demonstrate consistent improvements in the achievable block efficiency and token rates over baseline schemes in a number of scenarios. The transformer architecture (Vaswani et al., 2017) has revolutionized the field of natural language processing and deep learning. One of the key factors contributing to the success story of transformers, as opposed to prior recurrent-based architectures (Hochreiter and Schmidhuber, 1997; Chung et al., 2014), is their inherent train-time parallelization due to the attention mechanism. This allows for massive scaling and lead to the development of state-of-the-art Large Language Models (LLMs) (Touvron et al., 2023; Achiam et al., 2023; Brown et al., 2020; Chowdhery et al., 2023) which have demonstrated remarkable performance across a wide range of tasks.

large language model, machine learning, multi-draft speculative sampling, (16 more...)

arXiv.org Artificial Intelligence

2410.18234

Country:

Europe (0.45)
North America (0.28)

Genre:

Workflow (1.00)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Look, Remember and Reason: Grounded reasoning in videos with language models

Bhattacharyya, Apratim, Panchal, Sunny, Lee, Mingu, Pourreza, Reza, Madan, Pulkit, Memisevic, Roland

arXiv.org Artificial IntelligenceJan-21-2024

Multi-modal language models (LM) have recently shown promising performance in high-level reasoning tasks on videos. However, existing methods still fall short in tasks like causal or compositional spatiotemporal reasoning over actions, in which model predictions need to be grounded in fine-grained low-level details, such as object motions and object interactions. In this work, we propose training an LM end-to-end on low-level surrogate tasks, including object detection, re-identification, and tracking, to endow the model with the required low-level visual capabilities. We show that a two-stream video encoder with spatiotemporal attention is effective at capturing the required static and motion-based cues in the video. By leveraging the LM's ability to perform the low-level surrogate tasks, we can cast reasoning in videos as the three-step process of Look, Remember, Reason wherein visual information is extracted using low-level visual skills step-by-step and then integrated to arrive at a final answer. We demonstrate the effectiveness of our framework on diverse visual reasoning tasks from the ACRE, CATER, Something-Else and STAR datasets. Our approach is trainable end-to-end and surpasses state-of-the-art task-specific methods across these tasks by a large margin.

large language model, machine learning, surrogate task, (20 more...)

arXiv.org Artificial Intelligence

2306.17778

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Add feedback

Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving

Ling, Zhan, Fang, Yunhao, Li, Xuanlin, Mu, Tongzhou, Lee, Mingu, Pourreza, Reza, Memisevic, Roland, Su, Hao

arXiv.org Artificial IntelligenceDec-5-2023

Large Language Models (LLMs) have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching detailed and low-level reasoning chains. However, these methods are still limited in their exploration capabilities, making it challenging for correct solutions to stand out in the huge solution space. In this work, we unleash LLMs' creative potential for exploring multiple diverse problem solving strategies by framing an LLM as a hierarchical policy via in-context learning. This policy comprises of a visionary leader that proposes multiple diverse high-level problem-solving tactics as hints, accompanied by a follower that executes detailed problem-solving processes following each of the high-level instruction. The follower uses each of the leader's directives as a guide and samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal. Additionally, we propose an effective and efficient tournament-based approach to select among these explored solution groups to reach the final answer. Our approach produces meaningful and inspiring hints, enhances problem-solving strategy exploration, and improves the final answer accuracy on challenging problems in the MATH dataset. Code will be released at https://github.com/lz1oceani/LLM-As-Hierarchical-Policy.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2311.00694

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Deductive Verification of Chain-of-Thought Reasoning

Ling, Zhan, Fang, Yunhao, Li, Xuanlin, Huang, Zhiao, Lee, Mingu, Memisevic, Roland, Su, Hao

arXiv.org Artificial IntelligenceOct-3-2023

Large Language Models (LLMs) significantly benefit from Chain-of-Thought (CoT) prompting in performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors, thereby limiting models' ability to solve complex reasoning tasks. Inspired by how humans engage in careful and meticulous deductive logical reasoning processes to solve tasks, we seek to enable language models to perform explicit and rigorous deductive reasoning, and also ensure the trustworthiness of their reasoning process through self-verification. However, directly verifying the validity of an entire deductive reasoning process is challenging, even with advanced models like ChatGPT. In light of this, we propose to decompose a reasoning verification process into a series of step-by-step subprocesses, each only receiving their necessary context and premises. To facilitate this procedure, we propose Natural Program, a natural language-based deductive reasoning format. Our approach enables models to generate precise reasoning steps where subsequent steps are more rigorously grounded on prior steps. It also empowers language models to carry out reasoning self-verification in a step-by-step manner. By integrating this verification process into each deductive reasoning stage, we significantly enhance the rigor and trustfulness of generated reasoning steps. Along this process, we also improve the answer correctness on complex reasoning tasks. Code will be released at https://github.com/lz1oceani/verify_cot.

artificial intelligence, chain-of-thought reasoning, natural language, (1 more...)

arXiv.org Artificial Intelligence

2306.03872

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Painter: Teaching Auto-regressive Language Models to Draw Sketches

Pourreza, Reza, Bhattacharyya, Apratim, Panchal, Sunny, Lee, Mingu, Madan, Pulkit, Memisevic, Roland

arXiv.org Artificial IntelligenceAug-16-2023

Large language models (LLMs) have made tremendous progress in natural language understanding and they have also been successfully adopted in other domains such as computer vision, robotics, reinforcement learning, etc. In this work, we apply LLMs to image generation tasks by directly generating the virtual brush strokes to paint an image. We present Painter, an LLM that can convert user prompts in text description format to sketches by generating the corresponding brush strokes in an auto-regressive way. We construct Painter based on off-the-shelf LLM that is pre-trained on a large text corpus, by fine-tuning it on the new task while preserving language understanding capabilities. We create a dataset of diverse multi-object sketches paired with textual prompts that covers several object types and tasks. Painter can generate sketches from text descriptions, remove objects from canvas, and detect and classify objects in sketches. Although this is an unprecedented pioneering work in using LLMs for auto-regressive image generation, the results are very encouraging.

machine learning, natural language, sketch, (18 more...)

arXiv.org Artificial Intelligence

2308.0852

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Is end-to-end learning enough for fitness activity recognition?

Mercier, Antoine, Berger, Guillaume, Panchal, Sunny, Letsch, Florian, Boehm, Cornelius, Kang, Nahua, Bax, Ingo, Memisevic, Roland

arXiv.org Artificial IntelligenceMay-14-2023

End-to-end learning has taken hold of many computer vision tasks, in particular, related to still images, with task-specific optimization yielding very strong performance. Nevertheless, human-centric action recognition is still largely dominated by hand-crafted pipelines, and only individual components are replaced by neural networks that typically operate on individual frames. As a testbed to study the relevance of such pipelines, we present a new fully annotated video dataset of fitness activities. Any recognition capabilities in this domain are almost exclusively a function of human poses and their temporal dynamics, so pose-based solutions should perform well. We show that, with this labelled data, end-to-end learning on raw pixels can compete with state-of-the-art action recognition pipelines based on pose estimation. We also show that end-to-end learning can support temporally fine-grained tasks such as real-time repetition counting.

artificial intelligence, fitness activity recognition

arXiv.org Artificial Intelligence

2305.08191

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

Metaphors We Learn By

Memisevic, Roland

arXiv.org Artificial IntelligenceNov-17-2022

Gradient based learning using error back-propagation ("backprop") is a wellknown contributor to much of the recent progress in AI. A less obvious, but arguably equally important, ingredient is parameter sharing - most well-known in the context of convolutional networks. In this essay we relate parameter sharing ("weight sharing") to analogy making and the school of thought of cognitive metaphor. We discuss how recurrent and auto-regressive models can be thought of as extending analogy making from static features to dynamic skills and procedures. We also discuss corollaries of this perspective, for example, how it can challenge the currently entrenched dichotomy between connectionist and "classic" rule-based views of computation. It is well-known that neural networks, regardless whether training is supervised or self-supervised, require large amounts of training data to work well. To ensure generalization, one can maximize the number of training examples, minimize the number of tunable parameters, or do both. Parameter sharing is a common principle to reduce the number of tunable parameters without having to reduce the number of actual parameters (synaptic connections) in the network. In fact, it is hard to find any neural network architecture in the literature, that does not make use of parameter sharing in some way.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

2211.06441

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Modeling Deep Temporal Dependencies with Recurrent Grammar Cells""

Michalski, Vincent, Memisevic, Roland, Konda, Kishore

Neural Information Processing SystemsFeb-14-2020, 08:57:59 GMT

We propose modeling time series by representing the transformations that take a frame at time t to a frame at time t 1. To this end we show how a bi-linear model of transformations, such as a gated autoencoder, can be turned into a recurrent network, by training it to predict future frames from the current one and the inferred transformation using backprop-through-time. We also show how stacking multiple layers of gating units in a recurrent pyramid makes it possible to represent the "syntax" of complicated time series, and that it can outperform standard recurrent neural networks in terms of prediction accuracy on a variety of tasks. Papers published at the Neural Information Processing Systems Conference.

artificial intelligence, modeling deep temporal dependency, neural network, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback