The Master-Slave Encoder Model for Improving Patent Text Summarization: A New Approach to Combining Specifications and Claims

arXiv.org Artificial Intelligence

Traditional patent abstract generation models draw only on the patent specification, which limits generation quality; rapid patent updates introduce new terminology that causes out-of-vocabulary (OOV) problems; and insufficient consideration of the high professionalism, accuracy, and uniqueness of patent texts leads to information redundancy. To address these problems, we propose a patent text abstract generation model (MSEA) based on a master-slave encoder architecture. First, the MSEA model designs a master-slave encoder, which takes the specification in the patent text together with the claims as input and fully explores the characteristics and details between the two. Then, the model enhances the consideration of new technical terms in the input sequence based on the pointer network, and further enhances the correlation with the input text by re-weighing the "remembered" and "forgotten" parts of the input sequence from the encoder. Finally, an enhanced repetition suppression mechanism for patent text is introduced to ensure that the generated abstracts are accurate and non-redundant. On a publicly available patent text dataset, compared to the state-of-the-art model, Improved Multi-Head Attention Mechanism (IMHAM), the MSEA model achieves improvements of 0.006, 0.005, and 0.005 in Rouge-1, Rouge-2, and Rouge-L scores, respectively. By leveraging the characteristics of patent texts, MSEA effectively enhances the quality of generated patent abstracts, and the experiments demonstrate its effectiveness.
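The abstract above mentions a pointer network for handling new technical terms that fall outside the vocabulary. As a rough illustration only, the sketch below shows the common pointer-generator mixture, in which a generation distribution over a fixed vocabulary is blended with a copy distribution over source tokens. This is not necessarily the formulation MSEA uses; the function name `pointer_generator_dist` and all parameter names are hypothetical.

```python
import numpy as np

def pointer_generator_dist(p_vocab, attention, src_ids, p_gen, extended_size):
    """Blend generation and copy distributions (hypothetical helper).

    p_vocab:       probabilities over the fixed vocabulary, shape (V,)
    attention:     attention weights over source positions, shape (T,)
    src_ids:       extended-vocabulary id of each source token, shape (T,);
                   OOV source tokens get ids >= V
    p_gen:         scalar in [0, 1], probability of generating vs. copying
    extended_size: V plus the number of distinct OOV tokens in the source
    """
    final = np.zeros(extended_size)
    # Generation part: only in-vocabulary words can be generated.
    final[: p_vocab.shape[0]] = p_gen * p_vocab
    # Copy part: add attention mass to the id of each source token.
    # np.add.at accumulates correctly when an id occurs more than once.
    np.add.at(final, src_ids, (1.0 - p_gen) * attention)
    return final

# Toy example: vocabulary of 4 words, one OOV term (id 4) appearing twice.
p_vocab = np.array([0.1, 0.2, 0.3, 0.4])
attention = np.array([0.2, 0.3, 0.5])
src_ids = np.array([1, 4, 4])
dist = pointer_generator_dist(p_vocab, attention, src_ids, p_gen=0.6, extended_size=5)
```

In this toy example the OOV term receives probability mass only through the copy path, which is how a pointer mechanism lets a decoder emit terms it could never generate from the fixed vocabulary.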


REFOL: Resource-Efficient Federated Online Learning for Traffic Flow Forecasting

arXiv.org Artificial Intelligence

Multiple federated learning (FL) methods have been proposed for traffic flow forecasting (TFF) to avoid the heavy-transmission and privacy-leaking concerns that result from disclosing raw data in centralized methods. However, these FL methods adopt offline learning, which may yield subpar performance when concept drift occurs, i.e., when the distributions of historical and future data differ. Online learning can detect concept drift during model training and is thus more applicable to TFF. Nevertheless, the existing federated online learning method for TFF fails to efficiently solve the concept drift problem and causes tremendous computing and communication overhead. Therefore, we propose a novel method named Resource-Efficient Federated Online Learning (REFOL) for TFF, which guarantees prediction performance in a communication-lightweight and computation-efficient way. Specifically, we design a data-driven client participation mechanism to detect the occurrence of concept drift and determine whether each client needs to participate. Subsequently, we propose an adaptive online optimization strategy, which guarantees prediction performance while avoiding meaningless model updates. Then, a graph convolution-based model aggregation mechanism is designed to assess participants' contributions based on spatial correlation without imposing extra communication and computing costs on clients. Finally, we conduct extensive experiments on real-world datasets to demonstrate the superiority of REFOL in terms of prediction improvement and resource savings.


Exact and approximate error bounds for physics-informed neural networks

arXiv.org Artificial Intelligence

The use of neural networks to solve differential equations, as an alternative to traditional numerical solvers, has increased recently. However, error bounds for the obtained solutions have only been developed for certain equations. In this work, we report important progress in calculating error bounds for physics-informed neural network (PINN) solutions of nonlinear first-order ODEs. We give a general expression that describes the error of the solution that the PINN-based method provides for a nonlinear first-order ODE. In addition, we propose a technique to calculate an approximate bound for the general case and an exact bound for a particular case. The error bounds are computed using only the residual information and the equation structure. We apply the proposed methods to particular cases and show that they can successfully provide error bounds without relying on the numerical solution.
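To give a feel for how a bound can follow from the residual and the equation structure alone, here is a short worked example for the simplest linear case. The linear setting, the constant $\lambda > 0$, and the symbols $\hat{u}$, $R$, and $\eta$ are assumptions introduced purely for illustration; the paper above treats nonlinear first-order ODEs, and this is not its derivation.

```latex
% Illustrative linear case (an assumption; the paper addresses nonlinear ODEs).
% ODE: u'(t) + \lambda u(t) = f(t), u(0) = u_0, with constant \lambda > 0.
% Approximate solution \hat{u} with \hat{u}(0) = u_0, residual R, and error \eta:
\[
R(t) = \hat{u}'(t) + \lambda \hat{u}(t) - f(t), \qquad \eta(t) = \hat{u}(t) - u(t).
\]
% Subtracting the ODE from the residual definition gives
% \eta'(t) + \lambda \eta(t) = R(t) with \eta(0) = 0, hence
\[
\eta(t) = \int_0^t e^{-\lambda (t-s)} R(s)\, ds
\quad\Longrightarrow\quad
|\eta(t)| \le \frac{1 - e^{-\lambda t}}{\lambda}\, \max_{0 \le s \le t} |R(s)| .
\]
```

The right-hand side depends only on the residual and on $\lambda$, so in this toy case no reference numerical solution is needed to evaluate it.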


Transfer Learning on Transformers for Building Energy Consumption Forecasting -- A Comparative Study

arXiv.org Artificial Intelligence

This study investigates the application of Transfer Learning (TL) on Transformer architectures to enhance building energy consumption forecasting. Transformers are a relatively new deep learning architecture, which has served as the foundation for groundbreaking technologies such as ChatGPT. While TL has been studied in the past, prior studies considered either one data-centric TL strategy or used older deep learning models such as Recurrent Neural Networks or Convolutional Neural Networks. Here, we carry out an extensive empirical study on six different data-centric TL strategies and analyse their performance under varying feature spaces. In addition to the vanilla Transformer architecture, we also experiment with Informer and PatchTST, specifically designed for time series forecasting. We use 16 datasets from the Building Data Genome Project 2 to create building energy consumption forecasting models. Experimental results reveal that while TL is generally beneficial, especially when the target domain has no data, careful selection of the exact TL strategy should be made to gain the maximum benefit. This decision largely depends on the feature space properties such as the recorded weather features. We also note that PatchTST outperforms the other two Transformer variants (vanilla Transformer and Informer). Our findings advance building energy consumption forecasting through approaches such as TL and Transformer architectures.


The Digital Transformation in Health: How AI Can Improve the Performance of Health Systems

arXiv.org Artificial Intelligence

Mobile health has the potential to revolutionize health care delivery and patient engagement. In this work, we discuss how integrating Artificial Intelligence into digital health applications (focused on supply chain, patient management, and capacity building, among other use cases) can improve the health system and public health performance. We present an Artificial Intelligence and Reinforcement Learning platform that allows the delivery of adaptive interventions whose impact can be optimized through experimentation and real-time monitoring. The system can integrate multiple data sources and digital health applications. The flexibility of this platform to connect to various mobile health applications and digital devices and send personalized recommendations based on past data and predictions can significantly improve the impact of digital tools on health system outcomes. The potential for resource-poor settings, where the impact of this approach on health outcomes could be more decisive, is discussed specifically. This framework is, however, similarly applicable to improving efficiency in health systems where scarcity is not an issue.


Outlier-robust Mean Estimation near the Breakdown Point via Sum-of-Squares

arXiv.org Machine Learning

We revisit the problem of estimating the mean of a high-dimensional distribution in the presence of an $\varepsilon$-fraction of adversarial outliers. When $\varepsilon$ is at most some sufficiently small constant, previous works can achieve optimal error rate efficiently \cite{diakonikolas2018robustly, kothari2018robust}. As $\varepsilon$ approaches the breakdown point $\frac{1}{2}$, all previous algorithms incur either sub-optimal error rates or exponential running time. In this paper we give a new analysis of the canonical sum-of-squares program introduced in \cite{kothari2018robust} and show that this program efficiently achieves optimal error rate for all $\varepsilon \in[0,\frac{1}{2})$. The key ingredient for our results is a new identifiability proof for robust mean estimation that focuses on the overlap between the distributions instead of their statistical distance as in previous works. We capture this proof within the sum-of-squares proof system, thus obtaining efficient algorithms using the sum-of-squares proofs to algorithms paradigm \cite{raghavendra2018high}.


Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models

arXiv.org Machine Learning

Bayesian reasoning in linear mixed-effects models (LMMs) is challenging and often requires advanced sampling techniques like Markov chain Monte Carlo (MCMC). A common approach is to write the model in a probabilistic programming language and then sample via Hamiltonian Monte Carlo (HMC). However, there are many ways a user can transform a model that make inference more or less efficient. In particular, marginalizing some variables can greatly improve inference but is difficult for users to do manually. We develop an algorithm to easily marginalize random effects in LMMs. A naive approach introduces cubic time operations within an inference algorithm like HMC, but we reduce the running time to linear using fast linear algebra techniques. We show that marginalization is always beneficial when applicable and highlight improvements in various models, especially ones from cognitive sciences.
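To illustrate why marginalizing random effects need not cost cubic time in the number of observations, the sketch below evaluates the marginal log-likelihood of a simple LMM two ways: naively, by forming the full n-by-n covariance, and with the Woodbury identity plus the matrix determinant lemma, which only factor a q-by-q matrix. This is a generic linear algebra sketch under assumed notation (one random-effects block with i.i.d. Gaussian effects; `X`, `Z`, `beta`, `sigma_u`, `sigma_e` are hypothetical names), not the algorithm developed in the paper above.

```python
import numpy as np

def marginal_loglik_naive(y, X, Z, beta, sigma_u, sigma_e):
    """log N(y | X beta, sigma_e^2 I + sigma_u^2 Z Z^T), O(n^3) cost."""
    n = y.shape[0]
    r = y - X @ beta
    cov = sigma_e**2 * np.eye(n) + sigma_u**2 * (Z @ Z.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = r @ np.linalg.solve(cov, r)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def marginal_loglik_woodbury(y, X, Z, beta, sigma_u, sigma_e):
    """Same quantity, but only a q x q system is solved (linear in n)."""
    n, q = Z.shape
    r = y - X @ beta
    # Small matrix M = I / sigma_u^2 + Z^T Z / sigma_e^2.
    M = np.eye(q) / sigma_u**2 + (Z.T @ Z) / sigma_e**2
    Zr = Z.T @ r
    # Woodbury identity applied to the quadratic form r^T C^{-1} r.
    quad = r @ r / sigma_e**2 - Zr @ np.linalg.solve(M, Zr) / sigma_e**4
    # Determinant lemma: log|C| = log|M| + q log sigma_u^2 + n log sigma_e^2.
    _, logdet_M = np.linalg.slogdet(M)
    logdet = logdet_M + 2 * q * np.log(sigma_u) + 2 * n * np.log(sigma_e)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Small synthetic check that both routes agree.
rng = np.random.default_rng(0)
n, p, q = 50, 3, 4
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
beta = rng.normal(size=p)
y = rng.normal(size=n)
ll_naive = marginal_loglik_naive(y, X, Z, beta, sigma_u=0.7, sigma_e=1.3)
ll_fast = marginal_loglik_woodbury(y, X, Z, beta, sigma_u=0.7, sigma_e=1.3)
```

With q fixed, the second function costs time linear in n, which is the kind of saving that makes a marginalized model practical inside a sampler that evaluates the log-density many times.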


AI-powered Digital Framework for Personalized Economical Quality Learning at Scale

arXiv.org Artificial Intelligence

The disparity in access to quality education is significant, both between developed and developing countries and within nations, regardless of their economic status. Socioeconomic barriers and rapid changes in the job market further intensify this issue, highlighting the need for innovative solutions that can deliver quality education at scale and low cost. This paper addresses these challenges by proposing an AI-powered digital learning framework grounded in Deep Learning (DL) theory. The DL theory emphasizes learner agency and redefines the role of teachers as facilitators, making it particularly suitable for scalable educational environments. We outline eight key principles derived from learning science and AI that are essential for implementing DL-based Digital Learning Environments (DLEs). Our proposed framework leverages AI for learner modelling based on Open Learner Modeling (OLM), activity suggestions, and AI-assisted support for both learners and facilitators, fostering collaborative and engaging learning experiences. Our framework provides a promising direction for scalable, high-quality education globally, offering practical solutions to some of the AI-related challenges in education.


The Global AI Vibrancy Tool

arXiv.org Artificial Intelligence

This paper presents the latest version of the Global AI Vibrancy Tool (GVT), an interactive suite of visualizations designed to facilitate the comparison of AI vibrancy across 36 countries, using 42 indicators organized into 8 pillars. The tool offers customizable features that allow users to conduct in-depth country-level comparisons and longitudinal analyses of AI-related metrics, all based on publicly available data. By providing a transparent assessment of national progress in AI, it serves the diverse needs of policymakers, industry leaders, researchers, and the general public. Using weights for indicators and pillars developed by AI Index's panel of experts and combined into an index, the Global AI Vibrancy Ranking for 2023 places the United States first by a significant margin, followed by China and the United Kingdom. The ranking also highlights the rise of smaller nations such as Singapore when evaluated on both absolute and per capita bases. The tool offers three sub-indices for evaluating Global AI Vibrancy along different dimensions: the Innovation Index, the Economic Competitiveness Index, and the Policy, Governance, and Public Engagement Index.


Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine

arXiv.org Artificial Intelligence

The remarkable capabilities of Large Language Models (LLMs) make them increasingly compelling for adoption in real-world healthcare applications. However, the risks associated with using LLMs in medical applications have not been systematically characterized. We propose using five key principles for safe and trustworthy medical AI: Truthfulness, Resilience, Fairness, Robustness, and Privacy, along with ten specific aspects. Under this comprehensive framework, we introduce a novel MedGuard benchmark with 1,000 expert-verified questions. Our evaluation of 11 commonly used LLMs shows that the current language models, regardless of their safety alignment mechanisms, generally perform poorly on most of our benchmarks, particularly when compared to the high performance of human physicians. Although recent reports indicate that advanced LLMs like ChatGPT can match or even exceed human performance in various medical tasks, this study underscores a significant safety gap, highlighting the crucial need for human oversight and the implementation of AI safety guardrails.