AITopics

2402.02746

Country:

North America > United States > Utah (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)

Ayyoub, Hani Y., Al-Kadi, Omar S.

Learning Style Identification Using Semi-Supervised Self-Taught Labeling

Education is a dynamic field that must be adaptable to sudden changes and disruptions caused by events like pandemics, war, and natural disasters related to climate change. When these events occur, traditional classrooms with traditional or blended delivery can shift to fully online learning, which requires an efficient learning environment that meets students' needs. While learning management systems support teachers' productivity and creativity, they typically provide the same content to all learners in a course, ignoring their unique learning styles. To address this issue, we propose a semi-supervised machine learning approach that detects students' learning styles using a data mining technique. We use the commonly used Felder Silverman learning style model and demonstrate that our semi-supervised method can produce reliable classification models with few labeled data. We evaluate our approach on two different courses and achieve an accuracy of 88.83% and 77.35%, respectively. Our work shows that educational data mining and semi-supervised machine learning techniques can identify different learning styles and create a personalized learning environment.

dataset, dimension, student, (17 more...)

doi: 10.1109/TLT.2024.3358864

2402.14597

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan > Irbid Governorate > Irbid (0.04)
Europe > United Kingdom (0.04)
(3 more...)

Genre:

Instructional Material > Online (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Overview (0.93)
(2 more...)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Learning with Mixture of Prototypes for Out-of-Distribution Detection

Lu, Haodong, Gong, Dong, Wang, Shuo, Xue, Jason, Yao, Lina, Moore, Kristen

Out-of-distribution (OOD) detection aims to detect testing samples far away from the in-distribution (ID) training data, which is crucial for the safe deployment of machine learning models in the real world. Distance-based OOD detection methods have emerged with enhanced deep representation learning. They identify unseen OOD samples by measuring their distances from ID class centroids or prototypes. However, existing approaches learn the representation relying on oversimplified data assumptions, e.g., modeling ID data of each class with one centroid class prototype or using loss functions not designed for OOD detection, which overlook the natural diversities within the data. Naively enforcing data samples of each class to be compact around only one prototype leads to inadequate modeling of realistic data and limited performance. To tackle these issues, we propose PrototypicAl Learning with a Mixture of prototypes (PALM) which models each class with multiple prototypes to capture the sample diversities, and learns more faithful and compact samples embeddings to enhance OOD detection. Our method automatically identifies and dynamically updates prototypes, assigning each sample to a subset of prototypes via reciprocal neighbor soft assignment weights. To learn embeddings with multiple prototypes, PALM optimizes a maximum likelihood estimation (MLE) loss to encourage the sample embeddings to be compact around the associated prototypes, as well as a contrastive loss on all prototypes to enhance intra-class compactness and inter-class discrimination at the prototype level. Compared to previous methods with prototypes, the proposed mixture prototype modeling of PALM promotes the representations of each ID class to be more compact and separable from others and the unseen OOD samples, resulting in more reliable OOD detection. Moreover, the automatic estimation of prototypes enables our approach to be extended to the challenging OOD detection task with unlabelled ID data. Extensive experiments demonstrate the superiority of PALM over previous methods, achieving state-of-the-art average AUROC performance of 93.82 on the challenging CIFAR-100 benchmark. Code is available at https://github.com/jeff024/PALM. Deep learning (DL) plays a crucial role in many real-world applications, such as autonomous driving (Huang et al., 2020), medical diagnosis (Zimmerer et al., 2022), and cyber-security (Nguyen et al., 2022).

assignment weight, detection, prototype, (13 more...)

2402.02653

Country: Oceania > Australia > New South Wales (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.54)
Health & Medicine > Diagnostic Medicine (0.48)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Variational DAG Estimation via State Augmentation With Stochastic Permutations

Bonilla, Edwin V., Elinas, Pantelis, Zhao, He, Filippone, Maurizio, Kitsios, Vassili, O'Kane, Terry

Estimating the structure of a Bayesian network, in the form of a directed acyclic graph (DAG), from observational data is a statistically and computationally hard problem with essential applications in areas such as causal discovery. Bayesian approaches are a promising direction for solving this task, as they allow for uncertainty quantification and deal with well-known identifiability issues. From a probabilistic inference perspective, the main challenges are (i) representing distributions over graphs that satisfy the DAG constraint and (ii) estimating a posterior over the underlying combinatorial space. We propose an approach that addresses these challenges by formulating a joint distribution on an augmented space of DAGs and permutations. We carry out posterior estimation via variational inference, where we exploit continuous relaxations of discrete distributions. We show that our approach can outperform competitive Bayesian and non-Bayesian benchmarks on a range of synthetic and real datasets.

permutation, state augmentation, variational dag estimation, (11 more...)

2402.02644

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition

Pham, Quang, Do, Giang, Nguyen, Huy, Nguyen, TrungTin, Liu, Chenghao, Sartipi, Mina, Nguyen, Binh T., Ramasamy, Savitha, Li, Xiaoli, Hoi, Steven, Ho, Nhat

Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model complexity beyond the mean of increasing the network's depth or width. However, effective training of SMoE has proven to be challenging due to the representation collapse issue, which causes parameter redundancy and limited representation potentials. In this work, we propose a competition mechanism to address this fundamental challenge of representation collapse. By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator. We further propose CompeteSMoE, an effective and efficient algorithm to train large language models by deploying a simple router that predicts the competition outcomes. Consequently, CompeteSMoE enjoys strong performance gains from the competition routing policy while having low computation overheads. Our extensive empirical evaluations on two transformer architectures and a wide range of tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies.

competesmoe, effective training, sparse mixture, (13 more...)

2402.02526

Country:

Asia > Middle East > Jordan (0.04)
Asia > Singapore (0.04)
Oceania > Australia > Queensland (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Trading (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
(2 more...)

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Pandey, Gaurav, Nandwani, Yatin, Naseem, Tahira, Mishra, Mayank, Xu, Guangxuan, Raghu, Dinesh, Joshi, Sachindra, Munawar, Asim, Astudillo, Ramón Fernandez

Following the success of Proximal Policy Optimization (PPO) for Reinforcement Learning from Human Feedback (RLHF), new techniques such as Sequence Likelihood Calibration (SLiC) and Direct Policy Optimization (DPO) have been proposed that are offline in nature and use rewards in an indirect manner. These techniques, in particular DPO, have recently become the tools of choice for LLM alignment due to their scalability and performance. However, they leave behind important features of the PPO approach. Methods such as SLiC or RRHF make use of the Reward Model (RM) only for ranking/preference, losing fine-grained information and ignoring the parametric form of the RM (eg., Bradley-Terry, Plackett-Luce), while methods such as DPO do not use even a separate reward model. In this work, we propose a novel approach, named BRAIn, that re-introduces the RM as part of a distribution matching approach.BRAIn considers the LLM distribution conditioned on the assumption of output goodness and applies Bayes theorem to derive an intractable posterior distribution where the RM is explicitly represented. BRAIn then distills this posterior into an amortized inference network through self-normalized importance sampling, leading to a scalable offline algorithm that significantly outperforms prior art in summarization and AntropicHH tasks. BRAIn also has interesting connections to PPO and DPO for specific RM choices.

gradient estimate, kl-divergence, objective, (14 more...)

2402.02479

Country: Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
(3 more...)

Supervised Algorithmic Fairness in Distribution Shifts: A Survey

Lin, Yujie, Li, Dong, Zhao, Chen, Wu, Xintao, Tian, Qin, Shao, Minglai

Supervised fairness-aware machine learning under distribution shifts is an emerging field that addresses the challenge of maintaining equitable and unbiased predictions when faced with changes in data distributions from source to target domains. In real-world applications, machine learning models are often trained on a specific dataset but deployed in environments where the data distribution may shift over time due to various factors. This shift can lead to unfair predictions, disproportionately affecting certain groups characterized by sensitive attributes, such as race and gender. In this survey, we provide a summary of various types of distribution shifts and comprehensively investigate existing methods based on these shifts, highlighting six commonly used approaches in the literature. Additionally, this survey lists publicly available datasets and evaluation metrics for empirical studies. We further explore the interconnection with related research fields, discuss the significant challenges, and identify potential directions for future studies.

distribution shift, fairness, target domain, (14 more...)

2402.01327

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York (0.04)
North America > United States > Arkansas (0.04)
(2 more...)

Genre: Overview (1.00)

Industry: Law (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Efficient Causal Graph Discovery Using Large Language Models

Jiralerspong, Thomas, Chen, Xiaoyin, More, Yash, Shah, Vedant, Bengio, Yoshua

We propose a novel framework that leverages LLMs for full causal graph discovery. While previous LLM-based methods have used a pairwise query approach, this requires a quadratic number of queries which quickly becomes impractical for larger causal graphs. In contrast, the proposed framework uses a breadth-first search (BFS) approach which allows it to use only a linear number of queries. We also show that the proposed method can easily incorporate observational data when available, to improve performance. In addition to being more time and data-efficient, the proposed framework achieves state-of-the-art results on real-world causal graphs of varying sizes. The results demonstrate the effectiveness and efficiency of the proposed method in discovering causal relationships, showcasing its potential for broad applicability in causal graph discovery tasks across different domains.

causal graph, causal relationship, graph, (12 more...)

2402.01207

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Tapo, Allahsera Auguste, Traore, Ali, Danioko, Sidy, Tembine, Hamidou

Machine Intelligence in Africa: a survey

arXiv.org Artificial IntelligenceFeb-3-2024

In the last 5 years, the availability of large audio datasets in African countries has opened unlimited opportunities to build machine intelligence (MI) technologies that are closer to the people and speak, learn, understand, and do businesses in local languages, including for those who cannot read and write. Unfortunately, these audio datasets are not fully exploited by current MI tools, leaving several Africans out of MI business opportunities. Additionally, many state-of-the-art MI models are not culture-aware, and the ethics of their adoption indexes are questionable. The lack thereof is a major drawback in many applications in Africa. This paper summarizes recent developments in machine intelligence in Africa from a multi-layer multiscale and culture-aware ethics perspective, showcasing MI use cases in 54 African countries through 400 articles on MI research, industry, government actions, as well as uses in art, music, the informal economy, and small businesses in Africa. The survey also opens discussions on the reliability of MI rankings and indexes in the African continent as well as algorithmic definitions of unclear terms used in MI.

data mining, natural language, pattern recognition, (25 more...)

2402.02218

Country:

Africa > Nigeria (1.00)
Africa > Democratic Republic of the Congo (0.92)
Africa > Cameroon (0.67)
(45 more...)

Genre:

Summary/Review (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
(5 more...)

Industry:

Water & Waste Management > Water Management > Water Supplies & Services (1.00)
Transportation > Ground > Road (1.00)
Telecommunications (1.00)
(47 more...)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
(20 more...)

arXiv.org Machine LearningFeb-3-2024

A flexible Bayesian g-formula for causal survival analyses with time-dependent confounding

Chen, Xinyuan, Hu, Liangyuan, Li, Fan

In longitudinal observational studies with a time-to-event outcome, a common objective in causal analysis is to estimate the causal survival curve under hypothetical intervention scenarios within the study cohort. The g-formula is a particularly useful tool for this analysis. To enhance the traditional parametric g-formula approach, we developed a more adaptable Bayesian g-formula estimator. This estimator facilitates both longitudinal predictive and causal inference. It incorporates Bayesian additive regression trees in the modeling of the time-evolving generative components, aiming to mitigate bias due to model misspecification. Specifically, we introduce a more general class of g-formulas for discrete survival data. These formulas can incorporate the longitudinal balancing scores, which serve as an effective method for dimension reduction and are vital when dealing with an expanding array of time-varying confounders. The minimum sufficient formulation of these longitudinal balancing scores is linked to the nature of treatment regimes, whether static or dynamic. For each type of treatment regime, we provide posterior sampling algorithms, which are grounded in the Bayesian additive regression trees framework. We have conducted simulation studies to illustrate the empirical performance of our proposed Bayesian g-formula estimators, and to compare them with existing parametric estimators. We further demonstrate the practical utility of our methods in real-world scenarios using data from the Yale New Haven Health System's electronic health records.

confounder, longitudinal, treatment regime, (14 more...)

arXiv.org Machine Learning

2402.02306

Country:

North America > United States > North Carolina > Vance County > Henderson (0.04)
North America > United States > Mississippi (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Health Care Providers & Services (0.86)
Health & Medicine > Health Care Technology > Medical Record (0.54)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)