 Materials


Stochastic Configuration Machines for Industrial Artificial Intelligence

arXiv.org Artificial Intelligence

Industrial artificial intelligence (IAI) stresses the application of artificial intelligence techniques to industries, with some inherent challenges, such as uncertainties in sensory signals, real-time data processing, high modelling accuracy, and the interpretability of predictive models and results [1-7]. Recently, the IAI concept has received considerable attention worldwide due to the availability of cheaper sensors for data acquisition, powerful computing facilities and advanced algorithms that perform speedily at lower computational cost, larger storage devices and cloud computing technology for data management, and faster communication systems for sharing and delivering data. Although the IAI concept is not well-defined so far, the development of advanced machine learning algorithms is strongly expected so that they can meet these requirements of IAI. Machine learning has been a very active research area in AI over the past decades, and significant efforts in building predictive learner models have been made [8]. Among these approaches, the most popular and widely used ones include multilayer perceptrons with error-backpropagation algorithms (MLPs) [9], support vector machines (SVMs) [10], Bayesian networks (BNs) [11], and adaptive neuro-fuzzy inference systems (ANFIS) [12].


Spatial-temporal associations representation and application for process monitoring using graph convolution neural network

arXiv.org Artificial Intelligence

Thank you very much for the attention and concern of colleagues and scholars in this work. With the comments and guidance of experts, editors, and reviewers, this work has been accepted for publication in the journal "Process Safety and Environmental Protection". This paper addresses the spatial-temporal associations among the numerous variables measured in a dynamic industrial process, i.e., these variables are not only highly correlated in time but also interrelated in space. To handle this problem, three key issues need to be addressed: variable characteristics modeling and representation, graph network construction (temporal information), and graph characteristics perception. The first issue is handled by assuming the data follow an improved Gaussian distribution, while the graph network is defined by the monitoring variables as nodes and by edges calculated from their characteristics in time. Finally, the networks corresponding to process states at different times are fed into a graph convolutional neural network, which performs graph classification to achieve process monitoring. A benchmark experiment (Tennessee Eastman chemical process) and one application study (cobalt purification from zinc solution) are employed to demonstrate the feasibility and applicability of the proposed method.
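The final step above, feeding a graph of process variables into a graph convolutional network and classifying the whole graph, can be illustrated with a minimal sketch. The abstract does not state the exact layer or edge definition, so this example assumes the widely used symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) and a mean readout over nodes. All function names are hypothetical and the code is not the authors' implementation.

```python
import math

def normalized_adjacency(adj):
    """Add self-loops and apply symmetric degree normalization (assumed rule)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # positive for non-negative edge weights
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One graph convolution: propagate node features, project, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]

def graph_readout(node_embeddings):
    """Mean-pool node embeddings into one graph-level vector for classification."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]

# Toy graph: two monitored variables joined by one edge, one feature each.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
weights = [[1.0]]
embedding = graph_readout(gcn_layer(adj, features, weights))
```

In a full model the graph-level vector would be passed to a classifier over process states; that part is omitted here because the abstract gives no detail about it.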


Contrastive Post-training Large Language Models on Data Curriculum

arXiv.org Artificial Intelligence

Alignment serves as an important step to steer large language models (LLMs) towards human preferences. In this paper, we explore contrastive post-training techniques for alignment by automatically constructing preference pairs from multiple models of varying strengths (e.g., InstructGPT, ChatGPT and GPT-4). We carefully compare the contrastive techniques of SLiC and DPO to SFT baselines and find that DPO provides a step-function improvement even after continuing SFT saturates. We also explore a data curriculum learning scheme for contrastive post-training, which starts by learning from "easier" pairs and transitions to "harder" ones, and which further improves alignment. Finally, we scale up our experiments to train with more data and larger models like Orca. Remarkably, contrastive post-training further improves the performance of Orca, already a state-of-the-art instruction learning model tuned with GPT-4 outputs, to exceed that of ChatGPT. The rapid evolution of Large Language Models (LLMs) has ushered in a new era of natural language processing capabilities. These models, when scaled to billions of parameters and pretrained over trillions of text tokens, demonstrate unprecedented proficiency in a wide array of tasks (Brown et al., 2020; Chowdhery et al., 2022). Various post-training procedures like supervised instruction tuning and Reinforcement Learning from Human Feedback (RLHF) fine-tune pretrained LLMs to better align with human expectations and preferences (Ouyang et al., 2022; OpenAI, 2023; Touvron et al., 2023a). This additional alignment procedure is crucial, because the pretraining objective of essentially predicting the next token in a text sequence is known to produce LLMs whose outputs are at times incorrect, irrelevant, or unsafe (Bai et al., 2022a). Traditionally, these post-training techniques rely on human preference annotations to inform an LLM which behaviors it ought to adopt in the scenario at hand. 
For instance, RLHF fits a reward model on these preference pairs, against which an LLM policy is then optimized (Ziegler et al., 2019; Bai et al., 2022a; Touvron et al., 2023b). However, such human feedback is expensive to obtain and often noisy (Stiennon et al., 2020; Ouyang et al., 2022; Bai et al., 2022a). To align an LLM without human feedback, other methods such as Reinforcement Learning from AI Feedback (RLAIF) harvest preference signals via automatic feedback from another LLM (Lee et al., 2023; Bai et al., 2022b). However, studies have found AI feedback has a low agreement rate with humans (Perez et al., 2022; Casper et al., 2023b; Lee et al., 2021). Also, these methods suffer from the same drawbacks as RLHF, such as reward hacking (Skalse et al., 2022).
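To make the contrastive objective concrete, here is a minimal sketch of the DPO loss for a single preference pair, in its commonly published form: the negative log-sigmoid of a scaled difference between the policy's and a frozen reference model's log-probability margins. The function names, the scalar interface, and the default `beta=0.1` are illustrative assumptions, not details taken from this paper.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of a full response
    under the trained policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))

# When the policy equals the reference, the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

Under this form, a pair built from a strong and a weak model's outputs (for example GPT-4 versus InstructGPT) simply supplies the chosen and rejected log-probabilities; no separate reward model is fitted.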


A peridynamic-informed deep learning model for brittle damage prediction

arXiv.org Artificial Intelligence

In this study, a novel approach that combines the principles of peridynamic (PD) theory with physics-informed neural networks (PINN) is presented to predict quasi-static damage and crack propagation in brittle materials. To achieve high prediction accuracy and convergence rate, the linearized PD governing equation is enforced in the PINN's residual-based loss function. The proposed PD-INN is able to learn and capture intricate displacement patterns associated with different geometrical parameters, such as pre-crack position and length. Several enhancements, such as a cyclical annealing schedule and a deformation-gradient-aware optimization technique, are proposed to ensure the model does not get stuck in its trivial solution. The model's performance is assessed by monitoring the behavior of the loss function throughout the training process. The PD-INN predictions are also validated on several benchmark cases against results obtained from high-fidelity techniques such as the PD direct numerical method and the Extended Finite Element Method. Our results show the ability of the nonlocal PD-INN to predict damage and crack propagation accurately and efficiently.


Predicting emergence of crystals from amorphous matter with deep learning

arXiv.org Artificial Intelligence

Crystallization of the amorphous phases into metastable crystals plays a fundamental role in the formation of new matter, from geological to biological processes in nature to synthesis and development of new materials in the laboratory. Predicting the outcome of such phase transitions reliably would enable new research directions in these areas, but has remained beyond reach with molecular modeling or ab-initio methods. Here, we show that crystallization products of amorphous phases can be predicted in any inorganic chemistry by sampling the crystallization pathways of their local structural motifs at the atomistic level using universal deep learning potentials. We show that this approach identifies the crystal structures of polymorphs that initially nucleate from amorphous precursors with high accuracy across a diverse set of material systems, including polymorphic oxides, nitrides, carbides, fluorides, chlorides, chalcogenides, and metal alloys. Our results demonstrate that Ostwald's rule of stages can be exploited mechanistically at the molecular level to predictably access new metastable crystals from the amorphous phase in material synthesis.


Combining Deep Learning and GARCH Models for Financial Volatility and Risk Forecasting

arXiv.org Artificial Intelligence

In this paper, we develop a hybrid approach to forecasting the volatility and risk of financial instruments by combining common econometric GARCH time series models with deep learning neural networks. For the latter, we employ Gated Recurrent Unit (GRU) networks, whereas four different specifications are used as the GARCH component: standard GARCH, EGARCH, GJR-GARCH and APARCH. Models are tested using daily logarithmic returns on the S&P 500 index as well as gold and Bitcoin prices, with the three assets representing quite distinct volatility dynamics. As the main volatility estimator, also underlying the target function of our hybrid models, we use the price-range-based Garman-Klass estimator, modified to incorporate the opening and closing prices. Volatility forecasts resulting from the hybrid models are employed to evaluate the assets' risk using the Value-at-Risk (VaR) and Expected Shortfall (ES) at two different tolerance levels of 5% and 1%. Gains from combining the GARCH and GRU approaches are discussed in the contexts of both the volatility and risk forecasts. In general, it can be concluded that the hybrid solutions produce more accurate point volatility forecasts, although this does not necessarily translate into superior VaR and ES forecasts.
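As a rough illustration of the quantities named above, the sketch below computes a daily variance from the textbook Garman-Klass formula and then converts a volatility forecast into VaR and ES. Two assumptions are made that the abstract does not confirm: the formula is the standard open-high-low-close version rather than the authors' modified one, and returns are taken to be Gaussian. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def garman_klass_variance(open_: float, high: float, low: float, close: float) -> float:
    """Standard Garman-Klass daily variance from one OHLC bar."""
    hl = math.log(high / low)      # log high-low range
    co = math.log(close / open_)   # log close-to-open move
    return 0.5 * hl ** 2 - (2.0 * math.log(2.0) - 1.0) * co ** 2

def gaussian_var(mu: float, sigma: float, alpha: float) -> float:
    """Value-at-Risk at tolerance level alpha, reported as a positive loss."""
    z = NormalDist().inv_cdf(alpha)
    return -(mu + sigma * z)

def gaussian_es(mu: float, sigma: float, alpha: float) -> float:
    """Expected Shortfall at level alpha under the same Gaussian assumption."""
    z = NormalDist().inv_cdf(alpha)
    return -mu + sigma * NormalDist().pdf(z) / alpha

# One hypothetical trading day, then risk at the 5% and 1% tolerance levels.
sigma = math.sqrt(garman_klass_variance(100.0, 102.0, 99.0, 101.0))
var_5, es_5 = gaussian_var(0.0, sigma, 0.05), gaussian_es(0.0, sigma, 0.05)
var_1, es_1 = gaussian_var(0.0, sigma, 0.01), gaussian_es(0.0, sigma, 0.01)
```

In a hybrid setup like the one described, `sigma` would come from the model's volatility forecast rather than from a single observed bar.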


Resolving Knowledge Conflicts in Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) often encounter knowledge conflicts, scenarios where discrepancy arises between the internal parametric knowledge of LLMs and non-parametric information provided in the prompt context. In this work we ask what are the desiderata for LLMs when a knowledge conflict arises and whether existing LLMs fulfill them. We posit that LLMs should 1) identify knowledge conflicts, 2) pinpoint conflicting information segments, and 3) provide distinct answers or viewpoints in conflicting scenarios. To this end, we introduce KNOWLEDGE CONFLICT, an evaluation framework for simulating contextual knowledge conflicts and quantitatively evaluating to what extent LLMs achieve these goals. KNOWLEDGE CONFLICT includes diverse and complex situations of knowledge conflict, knowledge from diverse entities and domains, two synthetic conflict creation methods, and settings with progressively increasing difficulty to reflect realistic knowledge conflicts. Extensive experiments with the KNOWLEDGE CONFLICT framework reveal that while LLMs perform well in identifying the existence of knowledge conflicts, they struggle to determine the specific conflicting knowledge and produce a response with distinct answers amidst conflicting information. To address these challenges, we propose new instruction-based approaches that augment LLMs to better achieve the three goals. Further analysis shows that abilities to tackle knowledge conflicts are greatly impacted by factors such as knowledge domain and prompt text, while generating robust responses to knowledge conflict scenarios remains an open research question.


ChemCrow: Augmenting large-language models with chemistry tools

arXiv.org Machine Learning

Over the last decades, excellent computational chemistry tools have been developed. Integrating them into a single platform with enhanced accessibility could help them reach their full potential by overcoming steep learning curves. Recently, large-language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 18 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and ChemCrow's performance. Our work not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.


Twin Neural Network Improved k-Nearest Neighbor Regression

arXiv.org Artificial Intelligence

Twin neural network regression is trained to predict differences between regression targets rather than the targets themselves. A solution to the original regression problem can be obtained by ensembling predicted differences between the targets of an unknown data point and multiple known anchor data points. Choosing the anchors to be the nearest neighbors of the unknown data point leads to a neural network-based improvement of k-nearest neighbor regression. This algorithm is shown to outperform both neural networks and k-nearest neighbor regression on small to medium-sized data sets.
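The ensembling step described above can be sketched as follows. The trained twin network is represented by a hypothetical callable `predict_diff(x_a, x_b)` that estimates y(x_a) - y(x_b); here it is replaced by an exact stand-in for a linear target so the example runs on its own. The Euclidean distance metric and the plain average are assumptions for illustration.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]

def tnn_knn_predict(x: Vector,
                    anchors_x: Sequence[Vector],
                    anchors_y: Sequence[float],
                    predict_diff: Callable[[Vector, Vector], float],
                    k: int = 3) -> float:
    """Average (predicted difference + anchor target) over the k nearest anchors."""
    # Rank the known data points by Euclidean distance to the query point.
    order = sorted(range(len(anchors_x)), key=lambda i: math.dist(x, anchors_x[i]))
    nearest = order[:k]
    # Each anchor yields one estimate of the unknown target.
    estimates = [predict_diff(x, anchors_x[i]) + anchors_y[i] for i in nearest]
    return sum(estimates) / len(estimates)

# Stand-in for a trained twin network: exact differences of y = 2 * x0 + 1.
def toy_predict_diff(a: Vector, b: Vector) -> float:
    return 2.0 * (a[0] - b[0])

anchors_x = [(0.0,), (1.0,), (5.0,)]
anchors_y = [1.0, 3.0, 11.0]
prediction = tnn_knn_predict((2.0,), anchors_x, anchors_y, toy_predict_diff, k=2)
```

With a real network the per-anchor estimates would disagree slightly, and restricting the ensemble to nearby anchors is what ties the method to k-nearest neighbor regression.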


High-curvature, high-force, vine robot for inspection

arXiv.org Artificial Intelligence

Robot performance has advanced considerably both in and out of the factory; however, in tightly constrained, unknown environments such as inside a jet engine or the human heart, current robots are less adept. In such cases, where a borescope or endoscope can't reach, disassembly or surgery is costly. One promising class of inspection devices inspired by plant growth is "vine robots", which can navigate cluttered environments by extending from their tip. Yet these vine robots are currently limited in their ability to simultaneously steer into tight curvatures and apply substantial forces to the environment. Here, we propose a plant-inspired method of steering by asymmetrically lengthening one side of the vine robot to enable high curvature and large force application. Our key development is the introduction of an extremely anisotropic, composite, wrinkled film with elastic moduli 400x different in orthogonal directions. The film is used as the vine robot body, oriented such that it can stretch over 120% axially, but only 3% circumferentially. With the addition of controlled layer jamming, this film enables a steering method inspired by plants in which the circumference of the robot is inextensible, but the sides can stretch to allow turns. This steering method and body pressure do not work against each other, allowing the robot to exhibit higher forces and tighter curvatures than previous vine robot architectures. This work advances the abilities of vine robots, and robots more generally, to not only access tightly constrained environments, but also perform useful work once accessed.