Overview
Clinical Insights: A Comprehensive Review of Language Models in Medicine
Neveditsin, Nikita, Lingras, Pawan, Mago, Vijay
This paper provides a detailed examination of the advancements and applications of large language models in the healthcare sector, with a particular emphasis on clinical applications. The study traces the evolution of LLMs from their foundational technologies to the latest developments in domain-specific models and multimodal integration. It explores the technical progression from encoder-based models requiring fine-tuning to sophisticated approaches that integrate textual, visual, and auditory data, thereby facilitating comprehensive AI solutions in healthcare. The paper discusses both the opportunities these technologies present for enhancing clinical efficiency and the challenges they pose in terms of ethics, data privacy, and implementation. Additionally, it critically evaluates the deployment strategies of LLMs, emphasizing the necessity of open-source models to ensure data privacy and adaptability within healthcare environments. Future research directions are proposed, focusing on empirical studies to evaluate the real-world efficacy of LLMs in healthcare and the development of open datasets for further research. This review aims to provide a comprehensive resource for both newcomers and multidisciplinary researchers interested in the intersection of AI and healthcare.
Advancing Machine Learning in Industry 4.0: Benchmark Framework for Rare-event Prediction in Chemical Processes
Sudarshan, Vikram, Seider, Warren D.
Previously, using forward-flux sampling (FFS) and machine learning (ML), we developed multivariate alarm systems to counter rare un-postulated abnormal events. Our alarm systems utilized ML-based predictive models to quantify committer probabilities as functions of key process variables (e.g., temperature, concentrations, and the like), with these data obtained in FFS simulations. Herein, we introduce a novel and comprehensive benchmark framework for rare-event prediction, comparing ML algorithms of varying complexity, from simpler ones such as Linear Support-Vector Regressor and k-Nearest Neighbors to more sophisticated ones such as Random Forests, XGBoost, LightGBM, CatBoost, Dense Neural Networks, and TabNet. This evaluation uses comprehensive performance metrics: RMSE; model training, testing, hyperparameter-tuning, and deployment times; and the number and efficiency of alarms. These metrics balance model accuracy, computational efficiency, and alarm-system efficiency, identifying optimal ML strategies for predicting abnormal rare events and enabling operators to obtain safer and more reliable plant operations.
Formal Verification and Control with Conformal Prediction
Lindemann, Lars, Zhao, Yiqi, Yu, Xinyi, Pappas, George J., Deshmukh, Jyotirmoy V.
In this survey, we design formal verification and control algorithms for autonomous systems with practical safety guarantees using conformal prediction (CP), a statistical tool for uncertainty quantification. We focus on learning-enabled autonomous systems (LEASs) in which the complexity of learning-enabled components (LECs) is a major bottleneck that hampers the use of existing model-based verification and design techniques. Instead, we advocate for the use of CP, and we will demonstrate its use in formal verification, systems and control theory, and robotics. We argue that CP is specifically useful due to its simplicity (easy to understand, use, and modify), generality (requires no assumptions on learned models and data distributions, i.e., is distribution-free), and efficiency (real-time capable and accurate). We pursue the following goals with this survey. First, we provide an accessible introduction to CP for non-experts who are interested in using CP to solve problems in autonomy. Second, we show how to use CP for the verification of LECs, e.g., for verifying input-output properties of neural networks. Third and fourth, we review recent articles that use CP for safe control design as well as offline and online verification of LEASs. We summarize their ideas in a unifying framework that can deal with the complexity of LEASs in a computationally efficient manner. In our exposition, we consider simple system specifications, e.g., robot navigation tasks, as well as complex specifications formulated in temporal logic formalisms. Throughout our survey, we compare to other statistical techniques (e.g., scenario optimization, PAC-Bayes theory, etc.) and how these techniques have been used in verification and control. Lastly, we point the reader to open problems and future research directions.
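As a rough illustration of the distribution-free guarantee mentioned above, the following is a minimal sketch of split conformal prediction, not the authors' algorithm. It assumes a scalar regression setting; the `predictor` function, the synthetic data in `sample`, and the choice of absolute error as the nonconformity score are all hypothetical stand-ins for a learned component and its calibration data.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this miscoverage level.
        return math.inf
    return sorted(scores)[k - 1]


def predictor(x):
    # Hypothetical stand-in for a learned component (assumption).
    return 2.0 * x


rng = random.Random(0)


def sample(n):
    # Synthetic exchangeable data: y = 2x + Gaussian noise (assumption).
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]


alpha = 0.1  # target miscoverage level
calibration = sample(500)

# Nonconformity score: absolute prediction error on held-out calibration data.
scores = [abs(y - predictor(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha)

# Fresh points should fall within predictor(x) +/- q with probability
# at least 1 - alpha (marginally over calibration and test data).
test_points = sample(2000)
coverage = sum(abs(y - predictor(x)) <= q for x, y in test_points) / len(test_points)
print(f"interval half-width: {q:.3f}, empirical coverage: {coverage:.3f}")
```

The only requirement on the data in this sketch is exchangeability between calibration and test points; nothing is assumed about the predictor itself, which is why the same recipe can wrap an arbitrary learned component.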
Harnessing the Potential of Omnidirectional Multi-Rotor Aerial Vehicles in Cooperative Jamming Against Eavesdropping
Licea, Daniel Bonilla, Hammouti, Hajar El, Silano, Giuseppe, Saska, Martin
Recent research in communications-aware robotics has been propelled by advancements in 5G and emerging 6G technologies. This field now includes the integration of Multi-Rotor Aerial Vehicles (MRAVs) into cellular networks, with a specific focus on under-actuated MRAVs. These vehicles face challenges in independently controlling position and orientation due to their limited control inputs, which adversely affects communication metrics such as Signal-to-Noise Ratio. In response, a newer class of omnidirectional MRAVs has been developed, which can control both position and orientation simultaneously by tilting their propellers. However, exploiting this capability fully requires sophisticated motion planning techniques. This paper presents a novel application of omnidirectional MRAVs designed to enhance communication security and thwart eavesdropping. It proposes a strategy where one MRAV functions as an aerial Base Station, while another acts as a friendly jammer to secure communications. This study is the first to apply such a strategy to MRAVs in scenarios involving eavesdroppers.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Lu, Chris, Lu, Cong, Lange, Robert Tjarko, Foerster, Jakob, Clune, Jeff, Ha, David
One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion, acting like the human scientific community. We demonstrate its versatility by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a cost of less than $15 per paper. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer. This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world's most challenging problems. Our code is open-sourced at https://github.com/SakanaAI/AI-Scientist
Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness
Large language models (LLMs), such as ChatGPT, have rapidly penetrated into people's work and daily lives over the past few years, due to their extraordinary conversational skills and intelligence. ChatGPT has become the fastest-growing software in history in terms of user numbers and an important foundation model for the next generation of artificial intelligence applications. However, the content generated by LLMs is not entirely reliable, often containing factual errors, biases, and toxicity. Given their vast number of users and wide range of application scenarios, these unreliable responses can lead to many serious negative impacts. This thesis introduces the exploratory work on language model reliability carried out during the PhD study, focusing on the correctness, non-toxicity, and fairness of LLMs from both software testing and natural language processing perspectives. First, to measure the correctness of LLMs, we introduce two testing frameworks, FactChecker and LogicAsker, to evaluate factual knowledge and logical reasoning accuracy, respectively. Second, for the non-toxicity of LLMs, we introduce two works for red-teaming LLMs. Third, to evaluate the fairness of LLMs, we introduce two evaluation frameworks, BiasAsker and XCulturalBench, to measure the social bias and cultural bias of LLMs, respectively.
Towards understanding Diffusion Models (on Graphs)
Diffusion models have emerged from various theoretical and methodological perspectives, each offering unique insights into their underlying principles. In this work, we provide an overview of the most prominent approaches, drawing attention to their striking analogies - namely, how seemingly diverse methodologies converge to a similar mathematical formulation of the core problem. While our ultimate goal is to understand these models in the context of graphs, we begin by conducting experiments in a simpler setting to build foundational insights. Through an empirical investigation of different diffusion and sampling techniques, we explore three critical questions: (1) What role does noise play in these models? Our findings aim to enhance the understanding of diffusion models and in the long run their application in graph machine learning. The forward process is modelled by a Markov process. The reverse process is unknown and needs to be approximated; this is usually done with a neural network. Consider the analogy of dropping a small amount of paint into a glass of water. Initially, the paint is concentrated in one location, but over time, it diffuses throughout the water until it reaches a state of equilibrium.
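The Markov forward process and the paint analogy above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' setup: it assumes a one-dimensional Gaussian transition of the variance-preserving form x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * eps, and the constant noise schedule `betas`, the starting value 5.0, and the function names are hypothetical.

```python
import random
import statistics


def forward_step(x, beta, rng):
    """One Markov transition: shrink the signal slightly, add Gaussian noise."""
    return (1.0 - beta) ** 0.5 * x + beta ** 0.5 * rng.gauss(0.0, 1.0)


def diffuse(x0, betas, rng):
    """Run the whole forward chain; each step depends only on the previous state."""
    x = x0
    for beta in betas:
        x = forward_step(x, beta, rng)
    return x


rng = random.Random(0)
betas = [0.02] * 500  # constant noise schedule (assumption)

# All samples start concentrated at one point, like the drop of paint.
finals = [diffuse(5.0, betas, rng) for _ in range(2000)]

# After enough steps the samples should look approximately standard normal,
# which plays the role of the equilibrium state in the analogy.
mean = statistics.fmean(finals)
std = statistics.pstdev(finals)
print(f"mean: {mean:.3f}, std: {std:.3f}")
```

The reverse process has no such closed form: undoing each step requires knowledge of the data distribution, which is why it is approximated, usually with a neural network.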
Does Alignment Tuning Really Break LLMs' Internal Confidence?
Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration. This study conducts a comprehensive analysis of calibration degradation of LLMs across four dimensions: models, calibration metrics, tasks, and confidence extraction methods. Initial analysis showed that the relationship between alignment and calibration is not always a trade-off, but under stricter analysis conditions, we found the alignment process consistently harms calibration. This highlights the need for (1) a careful approach when measuring model confidences and calibration errors and (2) future research into algorithms that can help LLMs to achieve both instruction-following and calibration without sacrificing either.
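To make the notion of a calibration error concrete, the following is a minimal sketch of the expected calibration error (ECE), one commonly used calibration metric. The abstract does not say which metrics the study uses, so the choice of ECE, the equal-width binning, and the function name are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# A model that claims 90% confidence and is right 9 times out of 10 is
# well calibrated on this sample; one that claims 100% confidence and is
# right 1 time out of 4 is overconfident.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([1.0] * 4, [True, False, False, False])
print(well_calibrated, overconfident)
```

The result depends on how the confidences are extracted from the model and on the binning, which is one reason a careful approach to measuring confidences and calibration errors matters.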
Learning and Verifying Maximal Taylor-Neural Lyapunov functions
Barreau, Matthieu, Bastianello, Nicola
We introduce a novel neural network architecture, termed Taylor-neural Lyapunov functions, designed to approximate Lyapunov functions with formal certification. This architecture innovatively encodes local approximations and extends them globally by leveraging neural networks to approximate the residuals. Our method recasts the problem of estimating the largest region of attraction - specifically for maximal Lyapunov functions - into a learning problem, ensuring convergence around the origin through robust control theory. Physics-informed machine learning techniques further refine the estimation of the largest region of attraction. Remarkably, this method is versatile, operating effectively even without simulated data points. We validate the efficacy of our approach by providing numerical certificates of convergence across multiple examples. Our proposed methodology not only competes closely with state-of-the-art approaches, such as sum-of-squares and LyZNet, but also achieves comparable results even in the absence of simulated data. This work represents a significant advancement in control theory, with broad potential applications in the design of stable control systems and beyond.
From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education
Lee, Unggi, Bae, Jiyeong, Jung, Yeonji, Kang, Minji, Byun, Gyuri, Lee, Yeonseo, Kim, Dohee, Lee, Sookbun, Park, Jaekwon, Ahn, Taekyung, Lee, Gunho, Kim, Hyeoncheol
Knowledge Tracing (KT) is a critical component in online learning, but traditional approaches face limitations in interpretability and cross-domain adaptability. This paper introduces Language Model-based Code Knowledge Tracing (CodeLKT), an innovative application of Language model-based Knowledge Tracing (LKT) to programming education. CodeLKT leverages pre-trained language models to process learning data, demonstrating superior performance over existing KT and Code KT models. We explore Domain Adaptive Pre-Training (DAPT) and Task Adaptive Pre-Training (TAPT), showing enhanced performance in the coding domain and investigating cross-domain transfer between mathematics and coding. Additionally, we present a theoretically informed integrated system combining CodeLKT with large language models to generate personalized, in-depth feedback to support students' programming learning. This work advances the field of Code Knowledge Tracing by expanding the knowledge base with a language model-based approach and offering practical implications for programming education through data-informed feedback.