AITopics | Sebastian, Abu

Collaborating Authors

Sebastian, Abu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?

Camposampiero, Giacomo, Hersche, Michael, Wattenhofer, Roger, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial IntelligenceMar-14-2025

This work presents a first evaluation of two state-of-the-art Large Reasoning Models (LRMs), OpenAI's o3-mini and DeepSeek R1, on analogical reasoning, focusing on well-established nonverbal human IQ tests based on Raven's progressive matrices. We benchmark with the I-RAVEN dataset and its more difficult extension, I-RAVEN-X, which tests the ability to generalize to longer reasoning rules and ranges of the attribute values. To assess the influence of visual uncertainties on these nonverbal analogical reasoning tests, we extend the I-RAVEN-X dataset, which otherwise assumes an oracle perception. We adopt a two-fold strategy to simulate this imperfect visual perception: 1) we introduce confounding attributes which, being sampled at random, do not contribute to the prediction of the correct answer of the puzzles and 2) smoothen the distributions of the input attributes' values. We observe a sharp decline in OpenAI's o3-mini task accuracy, dropping from 86.6% on the original I-RAVEN to just 17.0% -- approaching random chance -- on the more challenging I-RAVEN-X, which increases input length and range and emulates perceptual uncertainty. This drop occurred despite spending 3.4x more reasoning tokens. A similar trend is also observed for DeepSeek R1: from 80.6% to 23.2%. On the other hand, a neuro-symbolic probabilistic abductive model, ARLC, that achieves state-of-the-art performances on I-RAVEN, can robustly reason under all these out-of-distribution tests, maintaining strong accuracy with only a modest reduction from 98.6% to 88.0%. Our code is available at https://github.com/IBM/raven-large-language-models.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.11207

Country:

North America > United States (0.46)
North America > Canada (0.28)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Education (0.46)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages

Terzić, Aleksandar, Hersche, Michael, Camposampiero, Giacomo, Hofmann, Thomas, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial IntelligenceDec-26-2024

Selective state-space models (SSMs) are an emerging alternative to the Transformer, offering the unique advantage of parallel training and sequential inference. Although these models have shown promising performance on a variety of tasks, their formal expressiveness and length generalization properties remain underexplored. In this work, we provide insight into the workings of selective SSMs by analyzing their expressiveness and length generalization performance on regular language tasks, i.e., finite-state automaton (FSA) emulation. We address certain limitations of modern SSM-based architectures by introducing the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization on a set of various regular language tasks using a single layer. It utilizes a dictionary of dense transition matrices, a softmax selection mechanism that creates a convex combination of dictionary matrices at each time step, and a readout consisting of layer normalization followed by a linear map. We then proceed to evaluate variants of diagonal selective SSMs by considering their empirical performance on commutative and non-commutative automata. We explain the experimental results with theoretical considerations. Our code is available at https://github.com/IBM/selective-dense-state-space-model.

machine learning, natural language, selective ssm, (18 more...)

arXiv.org Artificial Intelligence

2412.1935

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning

Hersche, Michael, Camposampiero, Giacomo, Wattenhofer, Roger, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial IntelligenceDec-7-2024

This work compares large language models (LLMs) and neuro-symbolic approaches in solving Raven's progressive matrices (RPM), a visual abstract reasoning test that involves the understanding of mathematical rules such as progression or arithmetic addition. Providing the visual attributes directly as textual prompts, which assumes an oracle visual perception module, allows us to measure the model's abstract reasoning capability in isolation. Despite providing such compositionally structured representations from the oracle visual perception and advanced prompting techniques, both GPT-4 and Llama-3 70B cannot achieve perfect accuracy on the center constellation of the I-RAVEN dataset. Our analysis reveals that the root cause lies in the LLM's weakness in understanding and executing arithmetic rules. As a potential remedy, we analyze the Abductive Rule Learner with Context-awareness (ARLC), a neuro-symbolic approach that learns to reason with vector-symbolic architectures (VSAs). Here, concepts are represented with distributed vectors s.t. dot products between encoded vectors define a similarity kernel, and simple element-wise operations on the vectors perform addition/subtraction on the encoded values. We find that ARLC achieves almost perfect accuracy on the center constellation of I-RAVEN, demonstrating a high fidelity in arithmetic rules. To stress the length generalization capabilities of the models, we extend the RPM tests to larger matrices (3x10 instead of typical 3x3) and larger dynamic ranges of the attribute values (from 10 up to 1000). We find that the LLM's accuracy of solving arithmetic rules drops to sub-10%, especially as the dynamic range expands, while ARLC can maintain a high accuracy due to emulating symbolic computations on top of properly distributed representations. Our code is available at https://github.com/IBM/raven-large-language-models.

accuracy, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2412.05586

Country:

North America > United States (0.28)
North America > Canada (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

On the Role of Noise in Factorizers for Disentangling Distributed Representations

Karunaratne, Geethan, Hersche, Michael, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial IntelligenceNov-29-2024

To efficiently factorize high-dimensional distributed representations to the constituent atomic vectors, one can exploit the compute-in-superposition capabilities of vector-symbolic architectures (VSA). Such factorizers however suffer from the phenomenon of limit cycles. Applying noise during the iterative decoding is one mechanism to address this issue. In this paper, we explore ways to further relax the noise requirement by applying noise only at the time of VSA's reconstruction codebook initialization. While the need for noise during iterations proves analog in-memory computing systems to be a natural choice as an implementation media, the adequacy of initialization noise allows digital hardware to remain equally indispensable. This broadens the implementation possibilities of factorizers. Our study finds that while the best performance shifts from initialization noise to iterative noise as the number of factors increases from 2 to 4, both extend the operational capacity by at least 50 times compared to the baseline factorizer resonator networks. Our code is available at: https://github.com/IBM/in-memory-factorizer

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.00354

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.46)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.30)

Add feedback

Kernel Approximation using Analog In-Memory Computing

Büchel, Julian, Camposampiero, Giacomo, Vasilopoulos, Athanasios, Lammie, Corey, Gallo, Manuel Le, Rahimi, Abbas, Sebastian, Abu

arXiv.org Artificial IntelligenceNov-5-2024

Kernel functions are vital ingredients of several machine learning algorithms, but often incur significant memory and computational costs. We introduce an approach to kernel approximation in machine learning algorithms suitable for mixed-signal Analog In-Memory Computing (AIMC) architectures. Analog In-Memory Kernel Approximation addresses the performance bottlenecks of conventional kernel-based methods by executing most operations in approximate kernel methods directly in memory. The IBM HERMES Project Chip, a state-of-the-art phase-change memory based AIMC chip, is utilized for the hardware demonstration of kernel approximation. Experimental results show that our method maintains high accuracy, with less than a 1% drop in kernel-based ridge classification benchmarks and within 1% accuracy on the Long Range Arena benchmark for kernelized attention in Transformer neural networks. Compared to traditional digital accelerators, our approach is estimated to deliver superior energy efficiency and lower power consumption. These findings highlight the potential of heterogeneous AIMC architectures to enhance the efficiency and scalability of machine learning applications.

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.03375

Country:

North America > United States (0.68)
Europe (0.46)
Asia (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization

Rashiti, Gentiana, Karunaratne, Geethan, Sachan, Mrinmaya, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial IntelligenceSep-12-2024

The retrieval augmented generation (RAG) system such as Retro has been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce Retro-li that shows retrieval can also help using a small-scale database, but it demands more accurate and better neighbors when searching in a smaller hence sparser non-parametric memory. This can be met by using a proper semantic similarity search. We further propose adding a regularization to the non-parametric memory for the first time: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, exhibiting O(1) search time while causing noise in retrieving neighbors, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.00004

Country:

Asia (1.00)
North America > United States (0.28)
North America > Mexico (0.28)

Genre: Research Report (0.63)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

Towards Learning Abductive Reasoning using VSA Distributed Representations

Camposampiero, Giacomo, Hersche, Michael, Terzić, Aleksandar, Wattenhofer, Roger, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial IntelligenceJun-27-2024

We introduce the Abductive Rule Learner with Context-awareness (ARLC), a model that solves abstract reasoning tasks based on Learn-VRF. ARLC features a novel and more broadly applicable training objective for abductive reasoning, resulting in better interpretability and higher accuracy when solving Raven's progressive matrices (RPM). ARLC allows both programming domain knowledge and learning the rules underlying a data distribution. We evaluate ARLC on the I-RAVEN dataset, showcasing state-of-the-art accuracy across both in-distribution and out-of-distribution (unseen attribute-rule pairs) tests. ARLC surpasses neuro-symbolic and connectionist baselines, including large language models, despite having orders of magnitude fewer parameters. We show ARLC's robustness to post-programming training by incrementally learning from examples on top of programmed knowledge, which only improves its performance and does not result in catastrophic forgetting of the programmed solution. We validate ARLC's seamless transfer learning from a 2x2 RPM constellation to unseen constellations. Our code is available at https://github.com/IBM/abductive-rule-learner-with-context-awareness.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.19121

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Abductive Reasoning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Training of Physical Neural Networks

Momeni, Ali, Rahmani, Babak, Scellier, Benjamin, Wright, Logan G., McMahon, Peter L., Wanjura, Clara C., Li, Yuhang, Skalli, Anas, Berloff, Natalia G., Onodera, Tatsuhiro, Oguz, Ilker, Morichetti, Francesco, del Hougne, Philipp, Gallo, Manuel Le, Sebastian, Abu, Mirhoseini, Azalia, Zhang, Cheng, Marković, Danijela, Brunner, Daniel, Moser, Christophe, Gigan, Sylvain, Marquardt, Florian, Ozcan, Aydogan, Grollier, Julie, Liu, Andrea J., Psaltis, Demetri, Alù, Andrea, Fleury, Romain

arXiv.org Artificial IntelligenceJun-5-2024

Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. To do this will however require rethinking both how AI models work, and how they are trained - primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods including backpropagation-based and backpropagation-free approaches are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized to create both more efficient realizations of current-scale AI models, and to enable unprecedented-scale models.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2406.03372

Country:

Europe > France (0.68)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
(3 more...)

Genre: Research Report (0.64)

Industry:

Education (0.68)
Energy > Oil & Gas (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit for Analog In-Memory Computing

Ferro, Elena, Vasilopoulos, Athanasios, Lammie, Corey, Gallo, Manuel Le, Benini, Luca, Boybat, Irem, Sebastian, Abu

arXiv.org Artificial IntelligenceFeb-12-2024

Analog In-Memory Computing (AIMC) is an emerging technology for fast and energy-efficient Deep Learning (DL) inference. However, a certain amount of digital post-processing is required to deal with circuit mismatches and non-idealities associated with the memory devices. Efficient near-memory digital logic is critical to retain the high area/energy efficiency and low latency of AIMC. Existing systems adopt Floating Point 16 (FP16) arithmetic with limited parallelization capability and high latency. To overcome these limitations, we propose a Near-Memory digital Processing Unit (NMPU) based on fixed-point arithmetic. It achieves competitive accuracy and higher computing throughput than previous approaches while minimizing the area overhead. Moreover, the NMPU supports standard DL activation steps, such as ReLU and Batch Normalization. We perform a physical implementation of the NMPU design in a 14 nm CMOS technology and provide detailed performance, power, and area assessments. We validate the efficacy of the NMPU by using data from an AIMC chip and demonstrate that a simulated AIMC system with the proposed NMPU outperforms existing FP16-based implementations, providing 139$\times$ speed-up, 7.8$\times$ smaller area, and a competitive power consumption. Additionally, our approach achieves an inference accuracy of 86.65 %/65.06 %, with an accuracy drop of just 0.12 %/0.4 % compared to the FP16 baseline when benchmarked with ResNet9/ResNet32 networks trained on the CIFAR10/CIFAR100 datasets, respectively.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.07549

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Zero-shot Classification using Hyperdimensional Computing

Ruffino, Samuele, Karunaratne, Geethan, Hersche, Michael, Benini, Luca, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial IntelligenceJan-30-2024

Classification based on Zero-shot Learning (ZSL) is the ability of a model to classify inputs into novel classes on which the model has not previously seen any training examples. Providing an auxiliary descriptor in the form of a set of attributes describing the new classes involved in the ZSL-based classification is one of the favored approaches to solving this challenging task. In this work, inspired by Hyperdimensional Computing (HDC), we propose the use of stationary binary codebooks of symbol-like distributed representations inside an attribute encoder to compactly represent a computationally simple end-to-end trainable model, which we name Hyperdimensional Computing Zero-shot Classifier~(HDC-ZSC). It consists of a trainable image encoder, an attribute encoder based on HDC, and a similarity kernel. We show that HDC-ZSC can be used to first perform zero-shot attribute extraction tasks and, can later be repurposed for Zero-shot Classification tasks with minimal architectural changes and minimal model retraining. HDC-ZSC achieves Pareto optimal results with a 63.8% top-1 classification accuracy on the CUB-200 dataset by having only 26.6 million trainable parameters. Compared to two other state-of-the-art non-generative approaches, HDC-ZSC achieves 4.3% and 9.9% better accuracy, while they require more than 1.85x and 1.72x parameters compared to HDC-ZSC, respectively.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2401.16876

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback