Fernando, Heshan
Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning
Fernando, Heshan, Shen, Han, Ram, Parikshit, Zhou, Yi, Samulowitz, Horst, Baracaldo, Nathalie, Chen, Tianyi
Post-training of pre-trained LLMs, which typically consists of a supervised fine-tuning (SFT) stage and a preference learning (RLHF or DPO) stage, is crucial to effective and safe LLM applications. The widely adopted approach for post-training popular open-source LLMs is to perform SFT and RLHF/DPO sequentially. However, sequential training is sub-optimal in terms of the SFT and RLHF/DPO trade-off: the LLM gradually forgets the first stage's training while undergoing the second stage's training. We theoretically prove the sub-optimality of sequential post-training. Furthermore, we propose a practical joint post-training framework that enjoys theoretical convergence guarantees and empirically outperforms the sequential post-training framework, while having a similar computational cost.

Recent years have witnessed the great capabilities of large language models (LLMs) trained on large corpora of data (OpenAI, 2022; Dubey et al., 2024; Abdin et al., 2024). These models have been applied to a wide range of tasks, including virtual assistance (OpenAI, 2022), code development (Roziere et al., 2023), and education/research (Achiam et al., 2023). Typically, LLMs undergo a pre-training phase followed by a post-training phase.
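The contrast between sequential and joint post-training can be illustrated with a minimal, runnable sketch. This is not the paper's algorithm: the two quadratic "losses" stand in for the SFT and RLHF/DPO objectives, and the step sizes, weight `lam`, and iteration counts are illustrative assumptions.

```python
# Toy sketch: sequential post-training forgets the first objective,
# while a joint update keeps a trade-off between both objectives.
import numpy as np

def sft_loss_grad(x):           # toy stand-in for the SFT objective's gradient
    return x - np.array([1.0, 0.0])

def pref_loss_grad(x):          # toy stand-in for the RLHF/DPO objective's gradient
    return x - np.array([0.0, 1.0])

def sequential(x, lr=0.1, steps=200):
    for _ in range(steps):      # stage 1: SFT only
        x = x - lr * sft_loss_grad(x)
    for _ in range(steps):      # stage 2: preference learning only; SFT progress is lost
        x = x - lr * pref_loss_grad(x)
    return x

def joint(x, lam=0.5, lr=0.1, steps=400):
    for _ in range(steps):      # optimize a weighted sum of both objectives each step
        x = x - lr * (lam * sft_loss_grad(x) + (1 - lam) * pref_loss_grad(x))
    return x

x0 = np.zeros(2)
print("sequential:", sequential(x0))  # ends near the preference optimum only
print("joint:     ", joint(x0))       # ends near a trade-off point between the two
```

Running the sketch shows the sequential run finishing near the second objective's optimum regardless of the first stage, which is the forgetting behavior the abstract describes.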
Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance
Chen, Lisha, Fernando, Heshan, Ying, Yiming, Chen, Tianyi
Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria, data modalities, or learning tasks. Different from single-objective learning, one of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process. Recent works have developed various dynamic weighting algorithms for MOL, such as MGDA and its variants, whose central idea is to find an update direction that avoids conflicts among objectives. Despite its appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practice gap, we focus on a new stochastic variant of MGDA, the Multi-objective gradient with Double sampling (MoDo) algorithm, and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithmic stability. Perhaps surprisingly, we find that the key rationale behind MGDA, namely updating along a conflict-avoidant direction, may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the impact of the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique to MOL. We showcase the generality of our theoretical framework by analyzing other existing stochastic MOL algorithms within the framework. Experiments on various multi-task learning benchmarks are performed to demonstrate the practical applicability. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
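A rough sketch of a MoDo-style update is given below; the paper's repository above has the exact implementation. Here the weight vector is adjusted with two independent stochastic gradient samples (the "double sampling"), projected back onto the simplex, and the model then moves along the weighted direction. The toy objectives, step sizes `alpha` and `gamma`, and helper names `project_simplex` and `stochastic_grads` are illustrative assumptions, not the authors' code.

```python
# Sketch of dynamic weighting with double sampling on two toy quadratic objectives.
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def stochastic_grads(x, rng, noise=0.1):
    """Stacked noisy gradients of M=2 toy objectives, shape (d, M)."""
    g = np.stack([x - np.array([1.0, 0.0]), x - np.array([0.0, 1.0])], axis=1)
    return g + noise * rng.standard_normal(g.shape)

rng = np.random.default_rng(0)
x, lam = np.zeros(2), np.array([0.5, 0.5])
alpha, gamma = 0.1, 0.1
for _ in range(500):
    g1 = stochastic_grads(x, rng)            # first independent sample
    g2 = stochastic_grads(x, rng)            # second independent sample (double sampling)
    g3 = stochastic_grads(x, rng)            # sample used for the model update
    lam = project_simplex(lam - gamma * g1.T @ (g2 @ lam))  # dynamic weight update
    x = x - alpha * g3 @ lam                                # move along weighted direction
print("model:", x, "weights:", lam)
```

The step size `gamma` controls how fast the weights drift toward a conflict-avoidant direction; the abstract's three-way trade-off is precisely about how this variability interacts with optimization and generalization.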
On the Stability Analysis of Open Federated Learning Systems
Sun, Youbang, Fernando, Heshan, Chen, Tianyi, Shahrampour, Shahin
We consider open federated learning (FL) systems, where clients may join and/or leave the system during the FL process. Given the variability in the number of present clients, convergence to a fixed model cannot be guaranteed in open systems. Instead, we resort to a new performance metric that we term the stability of open FL systems, which quantifies the magnitude of the learned model in open systems. Under the assumption that local clients' functions are strongly convex and smooth, we theoretically quantify the radius of stability for two FL algorithms, namely local SGD and local Adam. We observe that this radius depends on several key parameters, including the function condition number and the variance of the stochastic gradients. Our theoretical results are further verified by numerical simulations on synthetic data.

Federated learning (FL) [1] is a machine learning setup where a group of clients works cooperatively to learn a statistical model. The learning process is coordinated by a central server that facilitates the exchange of model updates. FL algorithms enjoy the benefits of model sharing among clients while preserving data privacy, and they also reduce the number of communications without sacrificing much performance [2]. In a canonical FL algorithm, the central server broadcasts the initial model to all clients, and then each client performs several steps of local updates before sending its model back to the server.
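The open-system setting can be mimicked with a small local SGD simulation. This is only a sketch under assumed toy data: each client's objective is a strongly convex quadratic centered at a random point, the client pool, noise level, and step sizes are hypothetical, and the point is that the averaged model keeps drifting within a bounded region (the stability radius) rather than converging to one fixed model.

```python
# Toy local SGD in an open FL system: the set of participating clients
# changes every round, so the server model fluctuates instead of converging.
import numpy as np

rng = np.random.default_rng(1)
optima = rng.standard_normal((10, 2))   # each client's local optimum (quadratic losses)

def local_sgd(w, opt, steps=5, lr=0.1, noise=0.05):
    """A few noisy local gradient steps on one client's strongly convex loss."""
    for _ in range(steps):
        w = w - lr * ((w - opt) + noise * rng.standard_normal(2))
    return w

w = np.zeros(2)
for rnd in range(100):
    # clients join/leave: a different subset is present each round
    present = rng.choice(10, size=rng.integers(3, 8), replace=False)
    # server averages the locally updated models of the present clients
    w = np.mean([local_sgd(w.copy(), optima[c]) for c in present], axis=0)
print("final server model:", w)
```

Printing `w` over rounds shows it hovering near the average of the current clients' optima, which is the kind of bounded fluctuation the stability metric is meant to capture.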