AITopics

2408.07583

Genre:

Overview (0.69)
Research Report (0.69)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.73)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Frascaroli, Emanuele, Panariello, Aniello, Buzzega, Pietro, Bonicelli, Lorenzo, Porrello, Angelo, Calderara, Simone

CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning

arXiv.org Artificial IntelligenceAug-14-2024

With the emergence of Transformers and Vision-Language Models (VLMs) such as CLIP, fine-tuning large pre-trained models has recently become a prevalent strategy in Continual Learning. This has led to the development of numerous prompting strategies to adapt transformer-based models without incurring catastrophic forgetting. However, these strategies often compromise the original zero-shot capabilities of the pre-trained CLIP model and struggle to adapt to domains that significantly deviate from the pre-training data. In this work, we propose Continual Generative training for Incremental prompt-Learning, a simple and novel approach to mitigate forgetting while adapting CLIP. Briefly, we employ Variational Autoencoders (VAEs) to learn class-conditioned distributions within the embedding space of the visual encoder. We then exploit these distributions to sample new synthetic visual embeddings and train the corresponding class-specific textual prompts during subsequent tasks. Through extensive experiments on different domains, we show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities, evaluated using a novel metric tailored for CL scenarios. Notably, further analysis reveals that our approach can bridge the gap with joint prompt tuning. The codebase is available at https://github.com/aimagelab/mammoth.

computer vision, continual learning, international conference, (10 more...)

2407.15793

Country:

North America > United States > California (0.04)
Europe > Italy (0.04)

Genre:

Research Report (0.70)
Overview (0.48)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

A Survey of Deep Learning for Group-level Emotion Recognition

Huang, Xiaohua, Xu, Jinke, Zheng, Wenming, Mao, Qirong, Dhall, Abhinav

With the advancement of artificial intelligence (AI) technology, group-level emotion recognition (GER) has emerged as an important area in analyzing human behavior. Early GER methods are primarily relied on handcrafted features. However, with the proliferation of Deep Learning (DL) techniques and their remarkable success in diverse tasks, neural networks have garnered increasing interest in GER. Unlike individual's emotion, group emotions exhibit diversity and dynamics. Presently, several DL approaches have been proposed to effectively leverage the rich information inherent in group-level image and enhance GER performance significantly. In this survey, we present a comprehensive review of DL techniques applied to GER, proposing a new taxonomy for the field cover all aspects of GER based on DL. The survey overviews datasets, the deep GER pipeline, and performance comparisons of the state-of-the-art methods past decade. Moreover, it summarizes and discuss the fundamental approaches and advanced developments for each aspect. Furthermore, we identify outstanding challenges and suggest potential avenues for the design of robust GER systems. To the best of our knowledge, thus survey represents the first comprehensive review of deep GER methods, serving as a pivotal references for future GER research endeavors.

emotion, emotion recognition, recognition, (14 more...)

2408.15276

Genre:

Overview (1.00)
Research Report > Promising Solution (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Kim, Minje, Skoglund, Jan

Neural Speech and Audio Coding

This paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems. It highlights the challenges posed by the subjective evaluation processes of speech and audio codecs and discusses the limitations of purely data-driven approaches, which often require inefficiently large architectures to match the performance of model-based methods. The study presents hybrid systems as a viable solution, offering significant improvements to the performance of conventional codecs through meticulously chosen design enhancements. Specifically, it introduces a neural network-based signal enhancer designed to post-process existing codecs' output, along with the autoencoder-based end-to-end models and LPCNet--hybrid systems that combine linear predictive coding (LPC) with neural networks. Furthermore, the paper delves into predictive models operating within custom feature spaces (TF-Codec) or predefined transform domains (MDCTNet) and examines the use of psychoacoustically calibrated loss functions to train end-to-end neural audio codecs. Through these investigations, the paper demonstrates the potential of hybrid systems to advance the field of speech and audio coding by bridging the gap between traditional model-based approaches and modern data-driven techniques.

artificial intelligence, latexit sha1, machine learning, (17 more...)

2408.06954

Country: North America > United States > Illinois (0.04)

Genre: Overview (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ghasemi, Majid, Moosavi, Amir Hossein, Sorkhoh, Ibrahim, Agrawal, Anjali, Alzhouri, Fadi, Ebrahimi, Dariush

An Introduction to Reinforcement Learning: Fundamental Concepts and Practical Applications

Reinforcement Learning (RL) is a branch of Artificial Intelligence (AI) which focuses on training agents to make decisions by interacting with their environment to maximize cumulative rewards. An overview of RL is provided in this paper, which discusses its core concepts, methodologies, recent trends, and resources for learning. We provide a detailed explanation of key components of RL such as states, actions, policies, and reward signals so that the reader can build a foundational understanding. The paper also provides examples of various RL algorithms, including model-free and model-based methods. In addition, RL algorithms are introduced and resources for learning and implementing them are provided, such as books, courses, and online communities. This paper demystifies a comprehensive yet simple introduction for beginners by offering a structured and clear pathway for acquiring and implementing real-time techniques.

agent, algorithm, reinforcement, (14 more...)

2408.07712

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(2 more...)

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Overview (0.93)
Research Report (0.64)

Industry:

Leisure & Entertainment > Games (1.00)
Education > Educational Setting > Online (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Islam, Mohammed Saidul, Laskar, Md Tahmid Rahman, Parvez, Md Rizwan, Hoque, Enamul, Joty, Shafiq

DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts

Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating human intervention, which can be time-consuming and mentally taxing. While Large Language Models (LLMs) excel in various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. In this work, we introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents designed to replicate the human storytelling process: one for understanding and describing the data (Reflection), generating the outline, and narration, and another for verification at each intermediary step. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.

data story, data table, visualization, (16 more...)

2408.05346

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > North Korea (0.14)
Europe > Russia (0.05)
(27 more...)

Genre:

Overview (0.93)
Research Report (0.81)

Industry:

Government > Military (0.93)
Government > Regional Government (0.93)
Information Technology > Security & Privacy (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning

Yadav, Prateek, Raffel, Colin, Muqeeth, Mohammed, Caccia, Lucas, Liu, Haokun, Chen, Tianlong, Bansal, Mohit, Choshen, Leshem, Sordoni, Alessandro

The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task. Model MoErging methods aim to recycle expert models to create an aggregate system with improved performance or generalization. A key component of MoErging methods is the creation of a router that decides which expert model(s) to use for a particular input or application. The promise, effectiveness, and large design space of MoErging has spurred the development of many new methods over the past few years. This rapid pace of development has made it challenging to compare different MoErging methods, which are rarely compared to one another and are often validated in different experimental setups. To remedy such gaps, we present a comprehensive survey of MoErging methods that includes a novel taxonomy for cataloging key design choices and clarifying suitable applications for each method. Apart from surveying MoErging research, we inventory software tools and applications that make use of MoErging. We additionally discuss related fields of study such as model merging, multitask learning, and mixture-of-experts models. Taken as a whole, our survey provides a unified overview of existing MoErging methods and creates a solid foundation for future work in this burgeoning field.

arxiv preprint arxiv, dataset, granularity, (12 more...)

2408.07057

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > Dominican Republic (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Hartmann, Maria, Danoy, Grégoire, Bouvry, Pascal

Heterogeneity: An Open Challenge for Federated On-board Machine Learning

The design of satellite missions is currently undergoing a paradigm shift from the historical approach of individualised monolithic satellites towards distributed mission configurations, consisting of multiple small satellites. With a rapidly growing number of such satellites now deployed in orbit, each collecting large amounts of data, interest in on-board orbital edge computing is rising. Federated Learning is a promising distributed computing approach in this context, allowing multiple satellites to collaborate efficiently in training on-board machine learning models. Though recent works on the use of Federated Learning in orbital edge computing have focused largely on homogeneous satellite constellations, Federated Learning could also be employed to allow heterogeneous satellites to form ad-hoc collaborations, e.g. in the case of communications satellites operated by different providers. Such an application presents additional challenges to the Federated Learning paradigm, arising largely from the heterogeneity of such a system. In this position paper, we offer a systematic review of these challenges in the context of the cross-provider use case, giving a brief overview of the state-of-the-art for each, and providing an entry point for deeper exploration of each issue.

federated learning, learning, satellite, (15 more...)

2408.06903

Country: North America > United States > Florida > Broward County > Fort Lauderdale (0.04)

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Building an Ethical and Trustworthy Biomedical AI Ecosystem for the Translational and Clinical Integration of Foundational Models

Baradwaj, Simha Sankar, Gilliland, Destiny, Rincon, Jack, Hermjakob, Henning, Yan, Yu, Adam, Irsyad, Lemaster, Gwyneth, Wang, Dean, Watson, Karol, Bui, Alex, Wang, Wei, Ping, Peipei

Foundational Models (FMs) are gaining increasing attention in the biomedical AI ecosystem due to their ability to represent and contextualize multimodal biomedical data. These capabilities make FMs a valuable tool for a variety of tasks, including biomedical reasoning, hypothesis generation, and interpreting complex imaging data. In this review paper, we address the unique challenges associated with establishing an ethical and trustworthy biomedical AI ecosystem, with a particular focus on the development of FMs and their downstream applications. We explore strategies that can be implemented throughout the biomedical AI pipeline to effectively tackle these challenges, ensuring that these FMs are translated responsibly into clinical and translational settings. Additionally, we emphasize the importance of key stewardship and co-design principles that not only ensure robust regulation but also guarantee that the interests of all stakeholders, especially those involved in or affected by these clinical and translational applications are adequately represented. We aim to empower the biomedical AI community to harness these models responsibly and effectively. As we navigate this exciting frontier, our collective commitment to ethical stewardship, co-design, and responsible translation will be instrumental in ensuring that the evolution of FMs truly enhances patient care and medical decision making, ultimately leading to a more equitable and trustworthy biomedical AI ecosystem.

ai system, proceedings, stakeholder, (13 more...)

2408.01431

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany (0.04)
(10 more...)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Law > Statutes (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(8 more...)

Bartolo, Matthias, Seychell, Dylan, Bajada, Josef

Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection

With the ever-growing variety of object detection approaches, this study explores a series of experiments that combine reinforcement learning (RL)-based visual attention methods with saliency ranking techniques to investigate transparent and sustainable solutions. By integrating saliency ranking for initial bounding box prediction and subsequently applying RL techniques to refine these predictions through a finite set of actions over multiple time steps, this study aims to enhance RL object detection accuracy. Presented as a series of experiments, this research investigates the use of various image feature extraction methods and explores diverse Deep Q-Network (DQN) architectural variations for deep reinforcement learning-based localisation agent training. Additionally, we focus on optimising the detection pipeline at every step by prioritising lightweight and faster models, while also incorporating the capability to classify detected objects, a feature absent in previous RL approaches. We show that by evaluating the performance of these trained agents using the Pascal VOC 2007 dataset, faster and more optimised models were developed. Notably, the best mean Average Precision (mAP) achieved in this study was 51.4, surpassing benchmarks set by RL-based single object detectors in the literature.

agent, architecture, detection, (14 more...)

2408.06803

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)