
Collaborating Authors

Wang, Chaojun


Simple Full-Spectrum Correlated k-Distribution Model based on Multilayer Perceptron

arXiv.org Artificial Intelligence

While neural networks have been successfully applied to the full-spectrum k-distribution (FSCK) method over a wide range of thermodynamic states, with k-values predicted by a trained multilayer perceptron (MLP) model, the required a-values still need to be calculated on the fly, which theoretically degrades the FSCK method and may introduce errors. On the other hand, the overly complicated structure of the current MLP model inevitably slows down calculation. Therefore, to balance accuracy, efficiency and storage, a simple MLP designed around the nature of the FSCK method is developed, i.e., the simple FSCK MLP (SFM) model, from which the correlated k-values and corresponding ka-values can be efficiently obtained. Several test cases have been carried out to compare the developed SFM model with other FSCK tools, including look-up tables and the traditional FSCK MLP (TFM) model. Results show that the SFM model can achieve excellent accuracy, even better than that of look-up tables, at a tiny computational cost far below that of the TFM model. Considering accuracy, efficiency and portability, the SFM model is not only an excellent tool for the prediction of spectral properties, but also provides a way to reduce the errors caused by nonlinear effects.
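As a rough illustration of the idea, the sketch below implements a small MLP with two output heads, one for k-values and one for ka-values, so that both quantities are predicted jointly rather than reconstructing a-values on the fly. This is a minimal PyTorch sketch under assumed input/output sizes (three thermodynamic-state inputs, 32 quadrature points), not the authors' actual SFM architecture.

```python
# Minimal sketch (not the authors' SFM): a small MLP that jointly predicts
# correlated k-values and ka-values from thermodynamic-state inputs.
# Input/output sizes (3 state variables, 32 quadrature points) are
# illustrative assumptions.
import torch
import torch.nn as nn

class SimpleFSCKMLP(nn.Module):
    def __init__(self, n_state: int = 3, n_quad: int = 32, hidden: int = 64):
        super().__init__()
        # Shared trunk kept deliberately shallow for cheap inference.
        self.trunk = nn.Sequential(
            nn.Linear(n_state, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Two heads: one per spectral quantity.
        self.k_head = nn.Linear(hidden, n_quad)   # correlated k-values
        self.ka_head = nn.Linear(hidden, n_quad)  # corresponding ka-values

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        return self.k_head(h), self.ka_head(h)

# Usage: a batch of (temperature, pressure, mole fraction) states.
model = SimpleFSCKMLP()
states = torch.rand(8, 3)
k_vals, ka_vals = model(states)
print(k_vals.shape, ka_vals.shape)  # torch.Size([8, 32]) twice
```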


Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

arXiv.org Artificial Intelligence

We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs). Unlike prior work that relies on seed examples or existing datasets to construct instruction-tuning data, GLAN exclusively uses a pre-curated taxonomy of human knowledge and capabilities as input and generates large-scale synthetic instruction data across all disciplines. Specifically, inspired by the systematic structure of the human education system, we build the taxonomy by decomposing human knowledge and capabilities into various fields, sub-fields and, ultimately, distinct disciplines, semi-automatically and facilitated by LLMs. Subsequently, we generate a comprehensive list of subjects for every discipline and design a syllabus tailored to each subject, again using LLMs. With the fine-grained key concepts detailed in every class session of the syllabus, we are able to generate diverse instructions with broad coverage across the entire spectrum of human knowledge and skills. Extensive experiments on large language models (e.g., Mistral) demonstrate that GLAN excels in multiple dimensions, from mathematical reasoning, coding, academic exams and logical reasoning to general instruction following, without using task-specific training data for these tasks. In addition, GLAN allows for easy customization: new fields or skills can be added by simply incorporating a new node into the taxonomy.
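To make the taxonomy-to-instruction pipeline concrete, the sketch below walks a toy taxonomy (field to sub-field to discipline), asks an LLM to draft a syllabus per subject, and then turns each class session's key concepts into instructions. The `call_llm` helper is a hypothetical stand-in for whatever completion API is used, and the tiny taxonomy is made up for demonstration; neither comes from the paper.

```python
# Illustrative sketch of a GLAN-style generation loop.
# `call_llm` is a hypothetical placeholder for an actual LLM completion API;
# the tiny taxonomy below is invented for demonstration.
from typing import List

def call_llm(prompt: str) -> List[str]:
    # Stand-in: a real implementation would query an LLM and parse its output.
    return [f"<LLM output for: {prompt[:40]}...>"]

taxonomy = {
    "Natural Sciences": {
        "Physics": ["Classical Mechanics"],
        "Chemistry": ["Organic Chemistry"],
    },
}

instructions = []
for field, sub_fields in taxonomy.items():
    for sub_field, disciplines in sub_fields.items():
        for discipline in disciplines:
            # 1) Enumerate subjects for the discipline.
            subjects = call_llm(f"List subjects taught in {discipline}.")
            for subject in subjects:
                # 2) Draft a syllabus: class sessions with key concepts.
                sessions = call_llm(f"Design a syllabus for {subject}; "
                                    "list class sessions with key concepts.")
                for session in sessions:
                    # 3) Turn each session's key concepts into instructions.
                    instructions += call_llm(
                        f"Write diverse questions covering: {session}")

print(len(instructions), "synthetic instructions generated")
```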


Progressive Translation: Improving Domain Robustness of Neural Machine Translation with Intermediate Sequences

arXiv.org Artificial Intelligence

Previous studies show that intermediate supervision signals benefit various Natural Language Processing tasks. However, it is not clear whether there exist intermediate signals that benefit Neural Machine Translation (NMT). Borrowing techniques from Statistical Machine Translation, we propose intermediate signals in the form of intermediate sequences that progress from a "source-like" structure to a "target-like" structure. Such intermediate sequences introduce an inductive bias that reflects a domain-agnostic principle of translation, which reduces the spurious correlations that are harmful to out-of-domain generalisation. Furthermore, we introduce full-permutation multi-task learning to alleviate the spurious causal relations from intermediate sequences to the target that result from exposure bias. The Minimum Bayes Risk decoding algorithm is used to pick the best candidate translation from all permutations to further improve performance. Experiments show that the introduced intermediate signals effectively improve the domain robustness of NMT and reduce hallucination on out-of-domain translation. Further analysis shows that our methods are especially promising in low-resource scenarios.
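As a concrete illustration of the final selection step, the sketch below applies Minimum Bayes Risk decoding to a pool of candidate translations: each candidate is scored by its average utility against all the others, and the highest-scoring one is returned. The unigram-F1 utility here is an assumed stand-in for whatever metric (e.g., BLEU) the method actually uses, and the candidate pool stands in for outputs of the different permutation models.

```python
# Minimal MBR decoding sketch: pick the candidate with the highest expected
# utility against the other candidates. Unigram F1 is an assumed stand-in
# utility; in practice a metric such as BLEU would be used.
from collections import Counter
from typing import List

def unigram_f1(hyp: str, ref: str) -> float:
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

def mbr_select(candidates: List[str]) -> str:
    # Expected utility of a candidate, treating the others as pseudo-references.
    def expected_utility(c: str) -> float:
        others = [x for x in candidates if x is not c]
        return sum(unigram_f1(c, o) for o in others) / max(len(others), 1)
    return max(candidates, key=expected_utility)

# Usage: candidates as if produced by the different permutation models.
pool = [
    "the cat sits on the mat",
    "a cat is sitting on the mat",
    "the cat sat on a mat",
]
print(mbr_select(pool))
```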