Overview
An Overview on Machine Learning Methods for Partial Differential Equations: from Physics Informed Neural Networks to Deep Operator Learning
Gonon, Lukas, Jentzen, Arnulf, Kuckuck, Benno, Liang, Siyu, Riekert, Adrian, von Wurstemberger, Philippe
The approximation of solutions of partial differential equations (PDEs) with numerical algorithms is a central topic in applied mathematics. For many decades, various types of methods for this purpose have been developed and extensively studied. One class of methods which has received a lot of attention in recent years are machine learning-based methods, which typically involve the training of artificial neural networks (ANNs) by means of stochastic gradient descent type optimization methods. While approximation methods for PDEs using ANNs have first been proposed in the 1990s they have only gained wide popularity in the last decade with the rise of deep learning. This article aims to provide an introduction to some of these methods and the mathematical theory on which they are based. We discuss methods such as physics-informed neural networks (PINNs) and deep BSDE methods and consider several operator learning approaches.
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities
Parthasarathy, Venkatesh Balavadhani, Zafar, Ahtsham, Khan, Aafaq, Shahid, Arsalan
This report examines the fine-tuning of Large Language Models (LLMs), integrating theoretical insights with practical applications. It outlines the historical evolution of LLMs from traditional Natural Language Processing (NLP) models to their pivotal role in AI. A comparison of fine-tuning methodologies, including supervised, unsupervised, and instruction-based approaches, highlights their applicability to different tasks. The report introduces a structured seven-stage pipeline for fine-tuning LLMs, spanning data preparation, model initialization, hyperparameter tuning, and model deployment. Emphasis is placed on managing imbalanced datasets and optimization techniques. Parameter-efficient methods like Low-Rank Adaptation (LoRA) and Half Fine-Tuning are explored for balancing computational efficiency with performance. Advanced techniques such as memory fine-tuning, Mixture of Experts (MoE), and Mixture of Agents (MoA) are discussed for leveraging specialized networks and multi-agent collaboration. The report also examines novel approaches like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), which align LLMs with human preferences, alongside pruning and routing optimizations to improve efficiency. Further sections cover validation frameworks, post-deployment monitoring, and inference optimization, with attention to deploying LLMs on distributed and cloud-based platforms. Emerging areas such as multimodal LLMs, fine-tuning for audio and speech, and challenges related to scalability, privacy, and accountability are also addressed. This report offers actionable insights for researchers and practitioners navigating LLM fine-tuning in an evolving landscape.
MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark
Zhou, Minxuan, Liang, Hao, Li, Tianpeng, Wu, Zhiyu, Lin, Mingan, Sun, Linzhuang, Zhou, Yaqi, Zhang, Yan, Huang, Xiaoqin, Chen, Yicong, Qiao, Yujing, Chen, Weipeng, Cui, Bin, Zhang, Wentao, Zhou, Zenan
With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchmarks have not sufficiently integrated visual and textual information. To address this gap, we proposed MathScape, a new benchmark that emphasizes the understanding and application of combined visual and textual information. MathScape is designed to evaluate photo-based math problem scenarios, assessing the theoretical understanding and application ability of MLLMs through a categorical hierarchical approach. We conduct a multi-dimensional evaluation on 11 advanced MLLMs, revealing that our benchmark is challenging even for the most sophisticated models. By analyzing the evaluation results, we identify the limitations of MLLMs, offering valuable insights for enhancing model performance.
Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach
Peralez, Johan, Delage, Aurélien, Castellini, Jacopo, Cunha, Rafael F., Dibangoye, Jilles S.
Centralized training for decentralized execution paradigm emerged as the state-of-the-art approach to epsilon-optimally solving decentralized partially observable Markov decision processes. However, scalability remains a significant issue. This paper presents a novel and more scalable alternative, namely sequential-move centralized training for decentralized execution. This paradigm further pushes the applicability of Bellman's principle of optimality, raising three new properties. First, it allows a central planner to reason upon sufficient sequential-move statistics instead of prior simultaneous-move ones. Next, it proves that epsilon-optimal value functions are piecewise linear and convex in sufficient sequential-move statistics. Finally, it drops the complexity of the backup operators from double exponential to polynomial at the expense of longer planning horizons. Besides, it makes it easy to use single-agent methods, e.g., SARSA algorithm enhanced with these findings applies while still preserving convergence guarantees. Experiments on two- as well as many-agent domains from the literature against epsilon-optimal simultaneous-move solvers confirm the superiority of the novel approach. This paradigm opens the door for efficient planning and reinforcement learning methods for multi-agent systems.
Context-Aware Temporal Embedding of Objects in Video Data
Farhan, Ahnaf, Hossain, M. Shahriar
In video analysis, understanding the temporal context is crucial for recognizing object interactions, event patterns, and contextual changes over time. The proposed model leverages adjacency and semantic similarities between objects from neighboring video frames to construct context-aware temporal object embeddings. Unlike traditional methods that rely solely on visual appearance, our temporal embedding model considers the contextual relationships between objects, creating a meaningful embedding space where temporally connected object's vectors are positioned in proximity. Empirical studies demonstrate that our context-aware temporal embeddings can be used in conjunction with conventional visual embeddings to enhance the effectiveness of downstream applications. Moreover, the embeddings can be used to narrate a video using a Large Language Model (LLM). This paper describes the intricate details of the proposed objective function to generate context-aware temporal object embeddings for video data and showcases the potential applications of the generated embeddings in video analysis and object classification tasks.
Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification
Murindanyi, Sudi, Nakatumba-Nabende, Joyce, Sanya, Rahman, Nakibuule, Rose, Katumba, Andrew
The increasing popularity of Artificial Intelligence in recent years has led to a surge in interest in image classification, especially in the agricultural sector. With the help of Computer Vision, Machine Learning, and Deep Learning, the sector has undergone a significant transformation, leading to the development of new techniques for crop classification in the field. Despite the extensive research on various image classification techniques, most have limitations such as low accuracy, limited use of data, and a lack of reporting model size and prediction. The most significant limitation of all is the need for model explainability. This research evaluates four different approaches for crop classification, namely traditional ML with handcrafted feature extraction methods like SIFT, ORB, and Color Histogram; Custom Designed CNN and established DL architecture like AlexNet; transfer learning on five models pre-trained using ImageNet such as EfficientNetV2, ResNet152V2, Xception, Inception-ResNetV2, MobileNetV3; and cutting-edge foundation models like YOLOv8 and DINOv2, a self-supervised Vision Transformer Model. All models performed well, but Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds. A key aspect of this research was the application of Explainable AI to provide the explainability of all the models. This journal presents the explainability of Xception model with LIME, SHAP, and GradCAM, ensuring transparency and trustworthiness in the models' predictions. This study highlights the importance of selecting the right model according to task-specific needs. It also underscores the important role of explainability in deploying AI in agriculture, providing insightful information to help enhance AI-driven crop management strategies.
Integrated Hardware and Software Architecture for Industrial AGV with Manual Override Capability
Iob, Pietro, Schiavo, Mauro, Cenedese, Angelo
This paper presents a study on transforming a traditional human-operated vehicle into a fully autonomous device. By leveraging previous research and state-of-the-art technologies, the study addresses autonomy, safety, and operational efficiency in industrial environments. Motivated by the demand for automation in hazardous and complex industries, the autonomous system integrates sensors, actuators, advanced control algorithms, and communication systems to enhance safety, streamline processes, and improve productivity. The paper covers system requirements, hardware architecture, software framework and preliminary results. This research offers insights into designing and implementing autonomous capabilities in human-operated vehicles, with implications for improving safety and efficiency in various industrial sectors.
Self-supervised Learning for Geospatial AI: A Survey
Chen, Yile, Huang, Weiming, Zhao, Kaiqi, Jiang, Yue, Cong, Gao
The proliferation of geospatial data in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across various urban applications. Given the vast yet inherently sparse labeled nature of geospatial data, there is a critical need for techniques that can effectively leverage such data without heavy reliance on labeled datasets. This requirement aligns with the principles of self-supervised learning (SSL), which has attracted increasing attention for its adoption in geospatial data. This paper conducts a comprehensive and up-to-date survey of SSL techniques applied to or developed for three primary data (geometric) types prevalent in geospatial vector data: points, polylines, and polygons. We systematically categorize various SSL techniques into predictive and contrastive methods, discussing their application with respect to each data type in enhancing generalization across various downstream tasks. Furthermore, we review the emerging trends of SSL for GeoAI, and several task-specific SSL techniques. Finally, we discuss several key challenges in the current research and outline promising directions for future investigation. By presenting a structured analysis of relevant studies, this paper aims to inspire continued advancements in the integration of SSL with GeoAI, encouraging innovative methods to harnessing the power of geospatial data.
When In-memory Computing Meets Spiking Neural Networks -- A Perspective on Device-Circuit-System-and-Algorithm Co-design
Moitra, Abhishek, Bhattacharjee, Abhiroop, Li, Yuhang, Kim, Youngeun, Panda, Priyadarshini
This review explores the intersection of bio-plausible artificial intelligence in the form of Spiking Neural Networks (SNNs) with the analog In-Memory Computing (IMC) domain, highlighting their collective potential for low-power edge computing environments. Through detailed investigation at the device, circuit, and system levels, we highlight the pivotal synergies between SNNs and IMC architectures. Additionally, we emphasize the critical need for comprehensive system-level analyses, considering the inter-dependencies between algorithms, devices, circuit & system parameters, crucial for optimal performance. An in-depth analysis leads to identification of key system-level bottlenecks arising from device limitations which can be addressed using SNN-specific algorithm-hardware co-design techniques. This review underscores the imperative for holistic device to system design space co-exploration, highlighting the critical aspects of hardware and algorithm research endeavors for low-power neuromorphic solutions.
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Tang, Kunsheng, Zhou, Wenbo, Zhang, Jie, Liu, Aishan, Deng, Gelei, Li, Shuai, Qi, Peigui, Zhang, Weiming, Zhang, Tianwei, Yu, Nenghai
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques that incorporate counterfactual data augmentation and specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments demonstrate a significant reduction in various gender bias benchmarks, with reductions peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variability in mainstream language tasks, remaining below 2%. By offering a realistic assessment and tailored reduction of gender biases, we hope that our GenderCARE can represent a significant step towards achieving fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.