Pacific Ocean
Locating and Extracting Relational Concepts in Large Language Models
Wang, Zijian, White, Britney, Xu, Chang
Relational concepts are indeed foundational to the structure of knowledge representation, as they facilitate the association between various entity concepts, allowing us to express and comprehend complex world knowledge. By expressing relational concepts in natural language prompts, people can effortlessly interact with large language models (LLMs) and recall desired factual knowledge. However, the process of knowledge recall lacks interpretability, and representations of relational concepts within LLMs remain unknown to us. In this paper, we identify hidden states that can express entity and relational concepts through causal mediation analysis in fact recall processes. Our finding reveals that at the last token position of the input prompt, there are hidden states that solely express the causal effects of relational concepts. Based on this finding, we assume that these hidden states can be treated as relational representations and we can successfully extract them from LLMs. The experimental results demonstrate high credibility of the relational representations: they can be flexibly transplanted into other fact recall processes, and can also be used as robust entity connectors. Moreover, we also show that the relational representations exhibit significant potential for controllable fact recall through relation rewriting.
Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction
You, Junwei, Shi, Haotian, Wu, Keshu, Long, Keke, Fu, Sicheng, Chen, Sikai, Ran, Bin
Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS), enhancing road safety and traffic efficiency. While traditional methods have laid foundational work, modern deep learning techniques, particularly transformer-based models and generative approaches, have significantly improved prediction accuracy by capturing complex and non-linear patterns in vehicle motion and traffic interactions. However, these models often overlook the detailed car-following behaviors and inter-vehicle interactions essential for real-world driving scenarios. This study introduces a Cross-Attention Transformer Enhanced Conditional Diffusion Model (Crossfusor) specifically designed for car-following trajectory prediction. Crossfusor integrates detailed inter-vehicular interactions and car-following dynamics into a robust diffusion framework, improving both the accuracy and realism of predicted trajectories. The model leverages a novel temporal feature encoding framework combining GRU, location-based attention mechanisms, and Fourier embedding to capture historical vehicle dynamics. It employs noise scaled by these encoded historical features in the forward diffusion process, and uses a cross-attention transformer to model intricate inter-vehicle dependencies in the reverse denoising process. Experimental results on the NGSIM dataset demonstrate that Crossfusor outperforms state-of-the-art models, particularly in long-term predictions, showcasing its potential for enhancing the predictive capabilities of autonomous driving systems.
GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting
Zhou, Fan, Pan, Chen, Ma, Lintao, Liu, Yu, Zhang, James, Zhou, Jun, Mei, Hongyuan, Lin, Weitao, Zhuang, Zi, Ning, Wenxin, Hu, Yunhua, Xue, Siqiao
Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g., aggregation from the forecasts of fine granularity to the coarse ones, and allocation from the coarse granularity to the fine ones. These methods merely take the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy. In this paper, we propose a novel granularity message-passing mechanism (GMP) that leverages temporal hierarchy information to improve forecasting performance and also utilizes an adaptive reconciliation (AR) strategy to maintain coherence without performance loss. Furthermore, we introduce an optimization module to achieve task-based targets while adhering to more real-world constraints. Experiments on real-world datasets demonstrate that our framework (GMP-AR) achieves superior performances on temporal hierarchical forecasting tasks compared to state-of-the-art methods. In addition, our framework has been successfully applied to a real-world task of payment traffic management in Alipay by integrating with the task-based optimization module.
Nemotron-4 340B Technical Report
Nvidia, null, :, null, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
Rethinking Spatio-Temporal Transformer for Traffic Prediction:Multi-level Multi-view Augmented Learning Framework
Traffic prediction has become an essential component of Intelligent Transportation Systems (ITS), which encompasses various applications such as traffic management[21], route planning[2] and congestion avoidance[12]. The main challenge lies in efficiently capturing the complex and time-varying spatio-temporal dependencies of traffic data. Recurrent Neural Networks (RNNs)[4, 23] and their variants, such as LSTM[35] and GRU[8], are used to capture temporal dependencies of traffic data. Nonetheless, these methods fail to model spatial correlations. To address this limitation, recent research has combined Convolutional Neural Networks (CNNs)[14, 16, 29] and RNNs to capture spatio-temporal dependencies of grid-based traffic data, with models like ST-ResNet[37] and STDN[34] proposed for this purpose. However, CNNs have inherent limitations in handling common non-Euclidean data representations. Recently, Spatio-Temporal Graph Neural Networks (STGNNs) have been developed for traffic prediction. These models combine GNNs with either RNNs or Temporal Convo-lutional Networks (TCNs) to capture the spatio-temporal correlations of traffic data.
Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask
Senane, Zineb, Cao, Lele, Buchner, Valentin Leonhard, Tashiro, Yusuke, You, Lei, Herman, Pawel, Nordahl, Mats, Tu, Ruibo, von Ehrenheim, Vilhelm
Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
Let Slip the Robot Dogs of War
The Chinese military recently unveiled a new kind of battle buddy for its soldiers: a "robot dog" with a machine gun strapped to its back. In video distributed by the state-run news agency CCTV, People's Liberation Army personnel are shown operating on a testing range alongside a four-legged robot with what appears to be a variant of the standard-issue 5.8 x 42-mm QBZ-95 assault rifle mounted on it as part of China's recent Golden Dragon 24 joint military exercises with Cambodia in the Gulf of Thailand. In one scenario, Chinese soldiers stand on either side of a doorway while the robot dog enters the building ahead of them; in another, the robot fires off a burst of bullets as it advances on a target. "It can serve as a new member in our urban combat operations, replacing our members to conduct reconnaissance and identify enemy [sic] and strike the target during our training," one Chinese soldier shown operating the robot told CCTV. This isn't the first time the Chinese military-industrial complex has shown off an armed robot dog. In October 2022, Chinese defense company Kestrel Defense published a video showing an unmanned aerial vehicle air-dropping a quadrupedal ground vehicle affixed with a 5.8 x 42-mm QBB-97 light machine gun on a roof during an urban warfare experiment.
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture
Li, Wenyan, Zhang, Xinyu, Li, Jiaang, Peng, Qiwei, Tang, Raphael, Zhou, Li, Zhang, Weijia, Hu, Guimin, Yuan, Yifei, Sรธgaard, Anders, Hershcovich, Daniel, Elliott, Desmond
Beijing Chaoshan Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China. We evaluate vision-language Models (VLMs) and large language models (LLMs) on newly collected, unseen food images and corresponding questions. FoodieQA comprises three multiplechoice question-answering tasks where models need to answer questions based on multiple images, Sichuan Guangdong a single image, and text-only descriptions, Figure 1: An example of regional food differences in respectively. While LLMs excel at text-based referring to hotpot in China. The depicted soups and question answering, surpassing human accuracy, dishware visually reflect the ingredients, flavors, and the open-weights VLMs still fall short by traditions of these regions: Beijing in the north, Sichuan 41% on multi-image and 21% on single-image in the southwest, and Guangdong in the south coast. VQA tasks, although closed-weights models perform closer to human levels (within 10%).
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
Wu, Taiqiang, Tao, Chaofan, Wang, Jiahao, Zhao, Zhe, Wong, Ngai
Kullback-Leiber divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking properties manifest in KD for LLMs. Instead, RKL and FKL are found to share the same optimization objective and both converge after a sufficient number of epochs. However, due to practical constraints, LLMs are seldom trained for such an extensive number of epochs. Meanwhile, we further find that RKL focuses on the tail part of the distributions, while FKL focuses on the head part at the beginning epochs. Consequently, we propose a simple yet effective Adaptive Kullback-Leiber (AKL) divergence method, which adaptively allocates weights to combine FKL and RKL. Metric-based and GPT-4-based evaluations demonstrate that the proposed AKL outperforms the baselines across various tasks and improves the diversity and quality of generated responses.
Generative AI-based Prompt Evolution Engineering Design Optimization With Vision-Language Model
Wong, Melvin, Rios, Thiago, Menzel, Stefan, Ong, Yew Soon
Engineering design optimization requires an efficient combination of a 3D shape representation, an optimization algorithm, and a design performance evaluation method, which is often computationally expensive. We present a prompt evolution design optimization (PEDO) framework contextualized in a vehicle design scenario that leverages a vision-language model for penalizing impractical car designs synthesized by a generative model. The backbone of our framework is an evolutionary strategy coupled with an optimization objective function that comprises a physics-based solver and a vision-language model for practical or functional guidance in the generated car designs. In the prompt evolutionary search, the optimizer iteratively generates a population of text prompts, which embed user specifications on the aerodynamic performance and visual preferences of the 3D car designs. Then, in addition to the computational fluid dynamics simulations, the pre-trained vision-language model is used to penalize impractical designs and, thus, foster the evolutionary algorithm to seek more viable designs. Our investigations on a car design optimization problem show a wide spread of potential car designs generated at the early phase of the search, which indicates a good diversity of designs in the initial populations, and an increase of over 20\% in the probability of generating practical designs compared to a baseline framework without using a vision-language model. Visual inspection of the designs against the performance results demonstrates prompt evolution as a very promising paradigm for finding novel designs with good optimization performance while providing ease of use in specifying design specifications and preferences via a natural language interface.