AITopics | Bayesian Learning

Collaborating Authors

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

How to Choose a Threshold for an Evaluation Metric for Large Language Models

Sarmah, Bhaskarjit, Li, Mingshu, Lyu, Jingrao, Frank, Sebastian, Castellanos, Nathalia, Pasquali, Stefano, Mehta, Dhagash

arXiv.org Machine LearningDec-10-2024

To ensure and monitor large language models (LLMs) reliably, various evaluation metrics have been proposed in the literature. However, there is little research on prescribing a methodology to identify a robust threshold on these metrics even though there are many serious implications of an incorrect choice of the thresholds during deployment of the LLMs. Translating the traditional model risk management (MRM) guidelines within regulated industries such as the financial industry, we propose a step-by-step recipe for picking a threshold for a given LLM evaluation metric. We emphasize that such a methodology should start with identifying the risks of the LLM application under consideration and risk tolerance of the stakeholders. We then propose concrete and statistically rigorous procedures to determine a threshold for the given LLM evaluation metric using available ground-truth data. As a concrete example to demonstrate the proposed methodology at work, we employ it on the Faithfulness metric, as implemented in various publicly available libraries, using the publicly available HaluBench dataset. We also lay a foundation for creating systematic approaches to select thresholds, not only for LLMs but for any GenAI applications.

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2412.12148

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.70)

Industry:

Banking & Finance (1.00)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Effective Reward Specification in Deep Reinforcement Learning

Roy, Julien

arXiv.org Artificial IntelligenceDec-9-2024

In the last decade, Deep Reinforcement Learning has evolved into a powerful tool for complex sequential decision-making problems. It combines deep learning's proficiency in processing rich input signals with reinforcement learning's adaptability across diverse control tasks. At its core, an RL agent seeks to maximize its cumulative reward, enabling AI algorithms to uncover novel solutions previously unknown to experts. However, this focus on reward maximization also introduces a significant difficulty: improper reward specification can result in unexpected, misaligned agent behavior and inefficient learning. The complexity of accurately specifying the reward function is further amplified by the sequential nature of the task, the sparsity of learning signals, and the multifaceted aspects of the desired behavior. In this thesis, we survey the literature on effective reward specification strategies, identify core challenges relating to each of these approaches, and propose original contributions addressing the issue of sample efficiency and alignment in deep reinforcement learning. Reward specification represents one of the most challenging aspects of applying reinforcement learning in real-world domains. Our work underscores the absence of a universal solution to this complex and nuanced challenge; solving it requires selecting the most appropriate tools for the specific requirements of each unique application.

artificial intelligence, machine learning, survey article, (20 more...)

arXiv.org Artificial Intelligence

2412.07177

Country:

North America > Canada (0.45)
Europe > Switzerland (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.67)
Research Report > Promising Solution (0.65)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.92)
(2 more...)

Add feedback

Monet: Mixture of Monosemantic Experts for Transformers

Park, Jungwoo, Ahn, Young Jin, Kim, Kee-Eung, Kang, Jaewoo

arXiv.org Artificial IntelligenceDec-9-2024

Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity -- where individual neurons respond to multiple, unrelated concepts. While Sparse Autoencoders (SAEs) have attempted to disentangle these features through sparse dictionary learning, they have compromised LLM performance due to reliance on post-hoc reconstruction loss. To address this issue, we introduce Mixture of Monosemantic Experts for Transformers (Monet) architecture, which incorporates sparse dictionary learning directly into end-to-end Mixture-of-Experts pretraining. Our novel expert decomposition method enables scaling the expert count to 262,144 per layer while total parameters scale proportionally to the square root of the number of experts. Our analyses demonstrate mutual exclusivity of knowledge across experts and showcase the parametric knowledge encapsulated within individual experts. Moreover, Monet allows knowledge manipulation over domains, languages, and toxicity mitigation without degrading general performance. Our pursuit of transparent LLMs highlights the potential of scaling expert counts to enhance mechanistic interpretability and directly resect the internal knowledge to fundamentally adjust model behavior. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Monet.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.04139

Country:

Europe > United Kingdom > England > Staffordshire (0.04)
North America > United States > Florida (0.04)
Oceania > New Zealand (0.04)
(32 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Banking & Finance (1.00)
Government > Regional Government (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Add feedback

BayesCNS: A Unified Bayesian Approach to Address Cold Start and Non-Stationarity in Search Systems at Scale

Ardywibowo, Randy, Sunki, Rakesh, Kuo, Lucy, Nayak, Sankalp

arXiv.org Artificial IntelligenceDec-9-2024

Information Retrieval (IR) systems used in search and recommendation platforms frequently employ Learning-to-Rank (LTR) models to rank items in response to user queries. These models heavily rely on features derived from user interactions, such as clicks and engagement data. This dependence introduces cold start issues for items lacking user engagement and poses challenges in adapting to non-stationary shifts in user behavior over time. We address both challenges holistically as an online learning problem and propose BayesCNS, a Bayesian approach designed to handle cold start and non-stationary distribution shifts in search systems at scale. BayesCNS achieves this by estimating prior distributions for user-item interactions, which are continuously updated with new user interactions gathered online. This online learning procedure is guided by a ranker model, enabling efficient exploration of relevant items using contextual information provided by the ranker. We successfully deployed BayesCNS in a large-scale search system and demonstrated its efficacy through comprehensive offline and online experiments. Notably, an online A/B experiment showed a 10.60% increase in new item interactions and a 1.05% improvement in overall success metrics over the existing production baseline.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.02126

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Industry: Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.95)
(2 more...)

Add feedback

Machine Learning Driven Smishing Detection Framework for Mobile Security

Goel, Diksha, Ahmad, Hussain, Jain, Ankit Kumar, Goel, Nikhil Kumar

arXiv.org Artificial IntelligenceDec-9-2024

The increasing reliance on smartphones for communication, financial transactions, and personal data management has made them prime targets for cyberattacks, particularly smishing, a sophisticated variant of phishing conducted via SMS. Despite the growing threat, traditional detection methods often struggle with the informal and evolving nature of SMS language, which includes abbreviations, slang, and short forms. This paper presents an enhanced content-based smishing detection framework that leverages advanced text normalization techniques to improve detection accuracy. By converting nonstandard text into its standardized form, the proposed model enhances the efficacy of machine learning classifiers, particularly the Naive Bayesian classifier, in distinguishing smishing messages from legitimate ones. Our experimental results, validated on a publicly available dataset, demonstrate a detection accuracy of 96.2%, with a low False Positive Rate of 3.87% and False Negative Rate of 2.85%. This approach significantly outperforms existing methodologies, providing a robust solution to the increasingly sophisticated threat of smishing in the mobile environment.

accuracy, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.09641

Country:

Asia > India > Rajasthan > Jaipur (0.04)
Oceania > Australia > South Australia > Adelaide (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Impact of Privacy Parameters on Deep Learning Models for Image Classification

Chaulagain, Basanta

arXiv.org Artificial IntelligenceDec-9-2024

The project aims to develop differentially private deep learning models for image classification on CIFAR-10 datasets \cite{cifar10} and analyze the impact of various privacy parameters on model accuracy. We have implemented five different deep learning models, namely ConvNet, ResNet18, EfficientNet, ViT, and DenseNet121 and three supervised classifiers namely K-Nearest Neighbors, Naive Bayes Classifier and Support Vector Machine. We evaluated the performance of these models under varying settings. Our best performing model to date is EfficientNet with test accuracy of $59.63\%$ with the following parameters (Adam optimizer, batch size 256, epoch size 100, epsilon value 5.0, learning rate $1e-3$, clipping threshold 1.0, and noise multiplier 0.912).

accuracy, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.06689

Country:

North America > United States > Georgia > Clarke County > Athens (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Add feedback

A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks

Liang, Chia Xin, Tian, Pu, Yin, Caitlyn Heqi, Yua, Yao, An-Hou, Wei, Ming, Li, Wang, Tianyang, Bi, Ziqian, Liu, Ming

arXiv.org Artificial IntelligenceDec-8-2024

This survey and application guide to multimodal large language models(MLLMs) explores the rapidly developing field of MLLMs, examining their architectures, applications, and impact on AI and Generative Models. Starting with foundational concepts, we delve into how MLLMs integrate various data types, including text, images, video and audio, to enable complex AI systems for cross-modal understanding and generation. It covers essential topics such as training methods, architectural components, and practical applications in various fields, from visual storytelling to enhanced accessibility. Through detailed case studies and technical analysis, the text examines prominent MLLM implementations while addressing key challenges in scalability, robustness, and cross-modal learning. Concluding with a discussion of ethical considerations, responsible AI development, and future directions, this authoritative resource provides both theoretical frameworks and practical insights. It offers a balanced perspective on the opportunities and challenges in the development and deployment of MLLMs, and is highly valuable for researchers, practitioners, and students interested in the intersection of natural language processing and computer vision.

large language model, machine learning, natural language, (25 more...)

arXiv.org Artificial Intelligence

2411.06284

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Indiana (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(5 more...)

Genre:

Workflow (1.00)
Research Report > Promising Solution (1.00)
Overview (1.00)
(2 more...)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology > Services (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(8 more...)

Add feedback

Depression detection from Social Media Bangla Text Using Recurrent Neural Networks

Ahmed, Sultan, Rakin, Salman, Waliur, Mohammad Washeef Ibn, Islam, Nuzhat Binte, Hossain, Billal, Akbar, Md. Mostofa

arXiv.org Artificial IntelligenceDec-8-2024

Mostofa Akbar Department of CSE Bangladesh University of Engineering & T echnology Dhaka, Bangladesh mostofa@cse.buet.ac.bd Abstract --Emotion artificial intelligence is a field of study that focuses on figuring out how to recognize emotions, especially in the area of text mining. T oday is the age of social media which has opened a door for us to share our individual expressions, emotions, and perspectives on any event. We can analyze sentiment on social media posts to detect positive, negative, or emotional behavior toward society. One of the key challenges in sentiment analysis is to identify depressed text from social media text that is a root cause of mental ill-health. Furthermore, depression leads to severe impairment in day-to-day living and is a major source of suicide incidents. In this paper, we apply natural language processing techniques on Facebook texts for conducting emotion analysis focusing on depression using multiple machine learning algorithms. Preprocessing steps like stemming, stop word removal, etc. are used to clean the collected data, and feature extraction techniques like stylometric feature, TF-IDF, word embedding, etc. are applied to the collected dataset which consists of 983 texts collected from social media posts. In the process of class prediction, LSTM, GRU, support vector machine, and Naive-Bayes classifiers have been used. We have presented the results using the primary classification metrics including F1-score, and accuracy. This work focuses on depression detection from social media posts to help psychologists to analyze sentiment from shared posts which may reduce the undesirable behaviors of depressed individuals through diagnosis and treatment. I NTRODUCTION Text is the most important means of communication in today's world. Popular online social networking sites such as Facebook, Twitter, MySpace, etc. are mainly text-based. The rapid growth of Social Media has created enough opportunities to share information across time and space. Users are now comfortable contributing more to the content of social media websites and posting their own material. The emergence of internet-based media sources has resulted in the availability of substantial user data for the emotional analysis of text and images.

dataset, stylometric feature, vector, (15 more...)

arXiv.org Artificial Intelligence

2412.05861

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.25)
Oceania > Australia (0.04)
North America > United States > Maryland > Baltimore County (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Services (0.93)
Information Technology > Security & Privacy (0.88)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

A Comprehensive Guide to Explainable AI: From Classical Models to LLMs

Hsieh, Weiche, Bi, Ziqian, Jiang, Chuanqi, Liu, Junyu, Peng, Benji, Zhang, Sen, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Chen, Keyu, Feng, Pohsun, Wen, Yizhu, Song, Xinyuan, Wang, Tianyang, Liu, Ming, Yang, Junjie, Li, Ming, Jing, Bowen, Ren, Jintao, Song, Junhao, Tseng, Hong-Ming, Zhang, Yichao, Yan, Lawrence K. Q., Niu, Qian, Chen, Silin, Wang, Yunze, Liang, Chia Xin

arXiv.org Artificial IntelligenceDec-8-2024

Explainable Artificial Intelligence (XAI) addresses the growing need for transparency and interpretability in AI systems, enabling trust and accountability in decision-making processes. This book offers a comprehensive guide to XAI, bridging foundational concepts with advanced methodologies. It explores interpretability in traditional models such as Decision Trees, Linear Regression, and Support Vector Machines, alongside the challenges of explaining deep learning architectures like CNNs, RNNs, and Large Language Models (LLMs), including BERT, GPT, and T5. The book presents practical techniques such as SHAP, LIME, Grad-CAM, counterfactual explanations, and causal inference, supported by Python code examples for real-world applications. Case studies illustrate XAI's role in healthcare, finance, and policymaking, demonstrating its impact on fairness and decision support. The book also covers evaluation metrics for explanation quality, an overview of cutting-edge XAI tools and frameworks, and emerging research directions, such as interpretability in federated learning and ethical AI considerations. Designed for a broad audience, this resource equips readers with the theoretical insights and practical skills needed to master XAI. Hands-on examples and additional resources are available at the companion GitHub repository: https://github.com/Echoslayer/XAI_From_Classical_Models_to_LLMs.

challenge and limitation computational complexity, inspection import partialdependencedisplay 6 7, transformer encoder-decoder architecture, (16 more...)

arXiv.org Artificial Intelligence

2412.008

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > Italy > Marche > Ancona Province > Ancona (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(14 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Add feedback

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

Shi, Haizhou, Wang, Yibin, Han, Ligong, Zhang, Huan, Wang, Hao

arXiv.org Machine LearningDec-7-2024

Estimating the uncertainty of responses of Large Language Models~(LLMs) remains a critical challenge. While recent Bayesian methods have demonstrated effectiveness in quantifying uncertainty through low-rank weight updates, they typically require complex fine-tuning or post-training procedures. In this paper, we propose Training-Free Bayesianization~(TFB), a novel framework that transforms existing off-the-shelf trained LoRA adapters into Bayesian ones without additional training. TFB systematically searches for the maximally acceptable level of variance in the weight posterior, constrained within a family of low-rank isotropic Gaussian distributions. We theoretically demonstrate that under mild conditions, this search process is equivalent to variational inference for the weights. Through comprehensive experiments, we show that TFB achieves superior uncertainty estimation and generalization compared to existing methods while eliminating the need for complex training procedures. Code will be available at https://github.com/Wang-ML-Lab/bayesian-peft.

large language model, machine learning, tfb, (17 more...)

arXiv.org Machine Learning

2412.05723

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)

Add feedback