Collaborating Authors

 Shen, Ming


ToW: Thoughts of Words Improve Reasoning in Large Language Models

arXiv.org Artificial Intelligence

We introduce thoughts of words (ToW), a novel training-time data-augmentation method for next-word prediction. ToW views next-word prediction as a core reasoning task and injects fine-grained thoughts explaining what the next word should be and how it is related to the previous contexts in pre-training texts. Our formulation addresses two fundamental drawbacks of existing next-word prediction learning schemes: they induce factual hallucination and are inefficient for models to learn the implicit reasoning processes in raw texts. While there are many ways to acquire such thoughts of words, we explore the first step of acquiring ToW annotations through distillation from larger models. After continual pre-training with only 70K ToW annotations, we effectively improve models' reasoning performance by 7% to 9% on average and reduce model hallucination by up to 10%. At the same time, ToW is entirely agnostic to tasks and applications, introducing no additional biases on labels or semantics.


Rethinking Data Selection for Supervised Fine-Tuning

arXiv.org Artificial Intelligence

Although supervised finetuning (SFT) has emerged as an essential technique to align large language models with humans, it is considered superficial, with style learning being its nature. At the same time, recent works indicate the importance of data selection for SFT, showing that finetuning with high-quality and diverse subsets of the original dataset leads to superior downstream performance. In this work, we rethink the intuition behind data selection for SFT. Considering SFT is superficial, we propose that essential demonstrations for SFT should focus on reflecting human-like interactions instead of data quality or diversity. However, it is not straightforward to directly assess to what extent a demonstration reflects human styles. Towards an initial attempt in this direction, we find selecting instances with long responses is surprisingly more effective for SFT than utilizing full datasets or instances selected based on quality and diversity. We hypothesize that such a simple heuristic implicitly mimics a crucial aspect of human-style conversation: detailed responses are usually more helpful.
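The length heuristic described in this abstract can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function name `select_longest_responses`, the `"response"` field, and the use of whitespace-separated word count as the length measure are all assumptions made for the example.

```python
from typing import Dict, List


def select_longest_responses(dataset: List[Dict[str, str]], k: int) -> List[Dict[str, str]]:
    """Return the k examples whose responses are longest.

    Assumption: length is measured as the number of whitespace-separated
    words in the hypothetical "response" field; the abstract does not
    specify the exact length measure.
    """
    ranked = sorted(
        dataset,
        key=lambda example: len(example["response"].split()),
        reverse=True,  # longest responses first; ties keep their original order
    )
    return ranked[:k]


# Toy instruction-tuning data (invented for illustration).
toy_dataset = [
    {"instruction": "Define SFT.", "response": "Supervised finetuning."},
    {"instruction": "Explain SFT.", "response": "Supervised finetuning adapts a pretrained model using curated demonstrations."},
    {"instruction": "Say hi.", "response": "Hi."},
]

subset = select_longest_responses(toy_dataset, k=2)
```

With the toy data above, the subset keeps the two most detailed responses and drops the one-word reply.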


Blockage Prediction in Directional mmWave Links Using Liquid Time Constant Network

arXiv.org Artificial Intelligence

We propose to use a liquid time constant (LTC) network to predict the future blockage status of a millimeter wave (mmWave) link using only the received signal power as the input to the system. The LTC network is based on an ordinary differential equation (ODE) system inspired by biology and specialized for near-future prediction for time sequence observation as the input. Using an experimental dataset at 60 GHz, we show that our proposed use of LTC can reliably predict the occurrence of blockage and the length of the blockage without the need for scenario-specific data. The results show that the proposed LTC can predict with upwards of 97.85% accuracy without prior knowledge of the outdoor scenario or retraining/tuning. These results highlight the promising gains of using LTC networks to predict time series-dependent signals, which can lead to more reliable and low-latency communication.


Robust and Efficient Fault Diagnosis of mm-Wave Active Phased Arrays using Baseband Signal

arXiv.org Artificial Intelligence

One key communication block in 5G and 6G radios is the active phased array (APA). To ensure reliable operation, efficient and timely fault diagnosis of APAs on-site is crucial. To date, fault diagnosis has relied on measurement of frequency domain radiation patterns using costly equipment and multiple strictly controlled measurement probes, which are time-consuming, complex, and therefore infeasible for on-site deployment. This paper proposes a novel method exploiting a Deep Neural Network (DNN) tailored to extract the features hidden in the baseband in-phase and quadrature signals for classifying the different faults. It requires only a single probe in one measurement point for fast and accurate diagnosis of the faulty elements and components in APAs. Validation of the proposed method is done using a commercial 28 GHz APA. Accuracies of 99% and 80% have been demonstrated for single- and multi-element failure detection, respectively. Three different test scenarios are investigated: on-off antenna elements, phase variations, and magnitude attenuation variations. At a low signal-to-noise ratio of 4 dB, stable fault detection accuracy above 90% is maintained. This is all achieved with a detection time of milliseconds (e.g., 6 ms), showing a high potential for on-site deployment.


Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization

arXiv.org Artificial Intelligence

Opinion summarization provides an important solution for summarizing opinions expressed among a large number of reviews. However, generating aspect-specific and general summaries is challenging due to the lack of annotated data. In this work, we propose two simple yet effective unsupervised approaches to generate both aspect-specific and general opinion summaries by training on synthetic datasets constructed with aspect-related review contents. Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words and outperforms existing methods by 3.4 ROUGE-L points on SPACE and 0.5 ROUGE-1 point on OPOSUM+ for aspect-specific opinion summarization. Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO), identifies aspect-related sentences utilizing an NLI model in a more general setting without using seed words and outperforms existing approaches by 1.2 ROUGE-L points on SPACE for aspect-specific opinion summarization and remains competitive on other metrics.
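The seed-word matching step behind SW-LOO might look roughly like the sketch below. It is not the authors' code: the function names, the regex-based sentence splitting, and in particular the way leave-one-out pairs are formed (one review's matched portion as the pseudo-summary, the other reviews' matched portions as input) are assumptions, since the abstract does not spell out the construction.

```python
import re
from typing import List, Tuple


def aspect_sentences(review: str, seed_words: List[str]) -> List[str]:
    """Keep sentences that contain any seed word as an exact token match."""
    seeds = {word.lower() for word in seed_words}
    # Naive sentence split on ., ! or ? followed by whitespace (assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    return [s for s in sentences if seeds & set(re.findall(r"[a-z']+", s.lower()))]


def leave_one_out_pairs(reviews: List[str], seed_words: List[str]) -> List[Tuple[List[str], str]]:
    """Build (input portions, pseudo-summary) pairs.

    Assumed construction: each review's aspect-related portion is held out
    as the target, and the remaining reviews' portions form the input.
    """
    portions = [" ".join(aspect_sentences(r, seed_words)) for r in reviews]
    pairs = []
    for i, target in enumerate(portions):
        if not target:
            continue  # this review says nothing about the aspect
        sources = [p for j, p in enumerate(portions) if j != i and p]
        if sources:
            pairs.append((sources, target))
    return pairs


# Toy hotel reviews and seed words for a "rooms" aspect (invented for illustration).
reviews = [
    "The room was clean. Breakfast was cold.",
    "Great location. The room felt small.",
    "Staff were friendly.",
]
pairs = leave_one_out_pairs(reviews, ["room", "rooms"])
```

Here the third review contributes nothing because it contains no seed word, so only the first two reviews produce training pairs.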


Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments

arXiv.org Artificial Intelligence

Learning to detect, characterize and accommodate novelties is a challenge that agents operating in open-world domains need to address to be able to guarantee satisfactory task performance. Certain novelties (e.g., changes in environment dynamics) can interfere with the performance or prevent agents from accomplishing task goals altogether. In this paper, we introduce general methods and architectural mechanisms for detecting and characterizing different types of novelties, and for building an appropriate adaptive model to accommodate them utilizing logical representations and ...

Examples of such domains are "perfect information games" such as Chess, Go, or Ms. Pac-Man, where the rules of the game, the goals of the players, and the entire state of the game are always known by all agents [10, 24, 30]. This characteristic simplifies the game AI behavior by limiting the number of novelties to instances of known types (e.g., a chess move with the bishop a player has not seen before), thus allowing the development of the game AI without needing to anticipate any unknown scenarios within the bounds of the system (e.g., a novel piece with novel rules being introduced).