AITopics

2406.13184

Country:

Europe > Germany (0.04)
Asia > North Korea (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(15 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.93)
Education > Health & Safety > School Nutrition (0.68)
Health & Medicine > Consumer Health (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction

You, Junwei, Shi, Haotian, Wu, Keshu, Long, Keke, Fu, Sicheng, Chen, Sikai, Ran, Bin

Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS), enhancing road safety and traffic efficiency. While traditional methods have laid foundational work, modern deep learning techniques, particularly transformer-based models and generative approaches, have significantly improved prediction accuracy by capturing complex and non-linear patterns in vehicle motion and traffic interactions. However, these models often overlook the detailed car-following behaviors and inter-vehicle interactions essential for real-world driving scenarios. This study introduces a Cross-Attention Transformer Enhanced Conditional Diffusion Model (Crossfusor) specifically designed for car-following trajectory prediction. Crossfusor integrates detailed inter-vehicular interactions and car-following dynamics into a robust diffusion framework, improving both the accuracy and realism of predicted trajectories. The model leverages a novel temporal feature encoding framework combining GRU, location-based attention mechanisms, and Fourier embedding to capture historical vehicle dynamics. It employs noise scaled by these encoded historical features in the forward diffusion process, and uses a cross-attention transformer to model intricate inter-vehicle dependencies in the reverse denoising process. Experimental results on the NGSIM dataset demonstrate that Crossfusor outperforms state-of-the-art models, particularly in long-term predictions, showcasing its potential for enhancing the predictive capabilities of autonomous driving systems.

fut stu, trajectory prediction, vehicle, (13 more...)

2406.11941

Country:

North America > United States > Wisconsin > Dane County > Madison (0.15)
North America > United States > Texas (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
(7 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Zhou, Fan, Pan, Chen, Ma, Lintao, Liu, Yu, Zhang, James, Zhou, Jun, Mei, Hongyuan, Lin, Weitao, Zhuang, Zi, Ning, Wenxin, Hu, Yunhua, Xue, Siqiao

Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g., aggregation from the forecasts of fine granularity to the coarse ones, and allocation from the coarse granularity to the fine ones. These methods merely take the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy. In this paper, we propose a novel granularity message-passing mechanism (GMP) that leverages temporal hierarchy information to improve forecasting performance and also utilizes an adaptive reconciliation (AR) strategy to maintain coherence without performance loss. Furthermore, we introduce an optimization module to achieve task-based targets while adhering to more real-world constraints. Experiments on real-world datasets demonstrate that our framework (GMP-AR) achieves superior performances on temporal hierarchical forecasting tasks compared to state-of-the-art methods. In addition, our framework has been successfully applied to a real-world task of payment traffic management in Alipay by integrating with the task-based optimization module.
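The "straightforward methods" the abstract mentions (aggregating fine-grained forecasts upward, or allocating coarse forecasts downward) can be illustrated with a minimal sketch. This is only the baseline idea of coherence between daily and weekly forecasts, not the GMP-AR method itself; the function names and the 7-day period are assumptions made for illustration.

```python
from typing import List


def aggregate(daily: List[float], period: int = 7) -> List[float]:
    """Bottom-up: sum each block of `period` daily forecasts into one coarse forecast."""
    if len(daily) % period != 0:
        raise ValueError("daily forecast length must be a multiple of period")
    return [sum(daily[i:i + period]) for i in range(0, len(daily), period)]


def allocate(weekly: List[float], daily: List[float], period: int = 7) -> List[float]:
    """Top-down: rescale each block of daily forecasts so it sums to the weekly forecast."""
    adjusted: List[float] = []
    for w, total in enumerate(weekly):
        block = daily[w * period:(w + 1) * period]
        block_sum = sum(block)
        # Keep the daily proportions, change only the block total.
        adjusted.extend(total * x / block_sum for x in block)
    return adjusted


def is_coherent(daily: List[float], weekly: List[float],
                period: int = 7, tol: float = 1e-9) -> bool:
    """Forecasts are coherent when the daily values add up to the weekly values."""
    return all(abs(a - b) <= tol for a, b in zip(aggregate(daily, period), weekly))


daily_forecast = [1.0] * 7 + [2.0] * 7
weekly_forecast = [10.0, 14.0]          # produced by a separate model, so not coherent
reconciled = allocate(weekly_forecast, daily_forecast)
```

Both directions enforce coherence by construction, but neither uses the hierarchy to improve the forecasts themselves, which is the limitation the abstract points to.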

forecast, granularity, node, (15 more...)

doi: 10.1609/aaai.v38i8.28795

2406.12242

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
Oceania > New Zealand (0.04)
(8 more...)

Genre: Research Report (0.84)

Industry:

Banking & Finance (0.68)
Government (0.47)
Information Technology (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Nemotron-4 340B Technical Report

Nvidia: Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
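The sizing claim can be checked with back-of-the-envelope arithmetic. The sketch below assumes exactly 340 billion parameters, 80 GB of memory per H100 GPU, and counts weights only (no activations, KV cache, or runtime overhead), so it is a rough lower bound rather than a deployment calculation.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 340e9        # assumption: nominal parameter count from the model name
GPU_MEMORY_GB = 80.0    # assumption: 80 GB per H100
NUM_GPUS = 8            # one DGX H100 node, as stated in the abstract

node_memory_gb = GPU_MEMORY_GB * NUM_GPUS          # 640 GB across the node
fp8_gb = weight_memory_gb(N_PARAMS, 1)             # FP8: 1 byte per parameter
bf16_gb = weight_memory_gb(N_PARAMS, 2)            # BF16: 2 bytes per parameter

fp8_fits = fp8_gb <= node_memory_gb                # 340 GB of weights fit in 640 GB
bf16_fits = bf16_gb <= node_memory_gb              # 680 GB of weights do not
```

Under these assumptions, the FP8 weights leave roughly 300 GB of headroom on the node, whereas 16-bit weights alone would already exceed its capacity.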

arxiv preprint arxiv, dataset, nemotron-4-340b-instruct, (14 more...)

2406.11704

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > Virginia (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Education (1.00)
Law (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rethinking Spatio-Temporal Transformer for Traffic Prediction: Multi-level Multi-view Augmented Learning Framework

Lin, Jiaqi, Ren, Qianqian

Traffic prediction has become an essential component of Intelligent Transportation Systems (ITS), which encompasses various applications such as traffic management[21], route planning[2] and congestion avoidance[12]. The main challenge lies in efficiently capturing the complex and time-varying spatio-temporal dependencies of traffic data. Recurrent Neural Networks (RNNs)[4, 23] and their variants, such as LSTM[35] and GRU[8], are used to capture temporal dependencies of traffic data. Nonetheless, these methods fail to model spatial correlations. To address this limitation, recent research has combined Convolutional Neural Networks (CNNs)[14, 16, 29] and RNNs to capture spatio-temporal dependencies of grid-based traffic data, with models like ST-ResNet[37] and STDN[34] proposed for this purpose. However, CNNs have inherent limitations in handling common non-Euclidean data representations. Recently, Spatio-Temporal Graph Neural Networks (STGNNs) have been developed for traffic prediction. These models combine GNNs with either RNNs or Temporal Convolutional Networks (TCNs) to capture the spatio-temporal correlations of traffic data.

dependency, node, spatial dependency, (12 more...)

2406.11921

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry: Transportation > Infrastructure & Services (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask

Senane, Zineb, Cao, Lele, Buchner, Valentin Leonhard, Tashiro, Yusuke, You, Lei, Herman, Pawel, Nordahl, Mats, Tu, Ruibo, von Ehrenheim, Vilhelm

Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
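To make the three mask types concrete, here is a minimal sketch of one plausible reading of "imputation", "interpolation", and "forecasting" masks over a series with T time steps and K features. The exact masking rules are an assumption (the abstract does not spell them out), and all function names are hypothetical. A value of 1 means observed and 0 means masked.

```python
import random
from typing import List

Mask = List[List[int]]  # mask[t][k]: 1 = observed, 0 = masked


def imputation_mask(T: int, K: int, ratio: float, rng: random.Random) -> Mask:
    """Hide individual (time, feature) entries independently with probability `ratio`."""
    return [[0 if rng.random() < ratio else 1 for _ in range(K)] for _ in range(T)]


def interpolation_mask(T: int, K: int, n_steps: int, rng: random.Random) -> Mask:
    """Hide all features at `n_steps` randomly chosen time steps."""
    hidden = set(rng.sample(range(T), n_steps))
    return [[0] * K if t in hidden else [1] * K for t in range(T)]


def forecasting_mask(T: int, K: int, horizon: int) -> Mask:
    """Hide all features in the last `horizon` time steps."""
    return [[1] * K if t < T - horizon else [0] * K for t in range(T)]


def split(series: List[List[float]], mask: Mask):
    """Separate a series into its observed part and its masked part (None elsewhere)."""
    observed = [[x if m else None for x, m in zip(row, mrow)]
                for row, mrow in zip(series, mask)]
    masked = [[None if m else x for x, m in zip(row, mrow)]
              for row, mrow in zip(series, mask)]
    return observed, masked
```

In the setup the abstract describes, the observed part would be fed to the embedding function, and the masked part would be the target whose added noise the reverse diffusion process learns to predict.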

dataset, representation, tsde, (14 more...)

doi: 10.1145/3637528.3671673

2405.05959

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)
(9 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.45)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

WIRED Jun-16-2024, 09:00:00 GMT

Let Slip the Robot Dogs of War

The Chinese military recently unveiled a new kind of battle buddy for its soldiers: a "robot dog" with a machine gun strapped to its back. In video distributed by the state-run news agency CCTV, People's Liberation Army personnel are shown operating on a testing range alongside a four-legged robot with what appears to be a variant of the standard-issue 5.8 x 42-mm QBZ-95 assault rifle mounted on it as part of China's recent Golden Dragon 24 joint military exercises with Cambodia in the Gulf of Thailand. In one scenario, Chinese soldiers stand on either side of a doorway while the robot dog enters the building ahead of them; in another, the robot fires off a burst of bullets as it advances on a target. "It can serve as a new member in our urban combat operations, replacing our members to conduct reconnaissance and identify enemy [sic] and strike the target during our training," one Chinese soldier shown operating the robot told CCTV. This isn't the first time the Chinese military-industrial complex has shown off an armed robot dog. In October 2022, Chinese defense company Kestrel Defense published a video showing an unmanned aerial vehicle air-dropping a quadrupedal ground vehicle affixed with a 5.8 x 42-mm QBB-97 light machine gun on a roof during an urban warfare experiment.

rifle, robot dog, robot dog outfitted, (9 more...)

WIRED

Country:

North America > United States (1.00)
Asia > China (1.00)
Pacific Ocean > North Pacific Ocean > Gulf of Thailand (0.26)
(3 more...)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Regional Government > Asia Government > China Government (1.00)
Government > Military > Army (0.98)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

arXiv.org Artificial Intelligence Jun-16-2024

FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture

Li, Wenyan, Zhang, Xinyu, Li, Jiaang, Peng, Qiwei, Tang, Raphael, Zhou, Li, Zhang, Weijia, Hu, Guimin, Yuan, Yifei, Søgaard, Anders, Hershcovich, Daniel, Elliott, Desmond

Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China. We evaluate vision-language models (VLMs) and large language models (LLMs) on newly collected, unseen food images and corresponding questions. FoodieQA comprises three multiple-choice question-answering tasks where models need to answer questions based on multiple images, a single image, and text-only descriptions, respectively. While LLMs excel at text-based question answering, surpassing human accuracy, the open-weights VLMs still fall short by 41% on multi-image and 21% on single-image VQA tasks, although closed-weights models perform closer to human levels (within 10%).

Figure 1 (panels: Beijing, Chaoshan, Sichuan, Guangdong): An example of regional food differences in referring to hotpot in China. The depicted soups and dishware visually reflect the ingredients, flavors, and traditions of these regions: Beijing in the north, Sichuan in the southwest, and Guangdong in the south coast.

cuisine type, dataset, information, (15 more...)

2406.1103

Country:

Asia > China > Beijing > Beijing (0.44)
Asia > China > Chongqing Province > Chongqing (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(12 more...)

Genre: Research Report (0.82)

Industry: Consumer Products & Services (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence Jun-16-2024

Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

Wu, Taiqiang, Tao, Chaofan, Wang, Jiahao, Zhao, Zhe, Wong, Ngai

Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking properties manifest in KD for LLMs. Instead, RKL and FKL are found to share the same optimization objective and both converge after a sufficient number of epochs. However, due to practical constraints, LLMs are seldom trained for such an extensive number of epochs. Meanwhile, we further find that RKL focuses on the tail part of the distributions, while FKL focuses on the head part at the beginning epochs. Consequently, we propose a simple yet effective Adaptive Kullback-Leibler (AKL) divergence method, which adaptively allocates weights to combine FKL and RKL. Metric-based and GPT-4-based evaluations demonstrate that the proposed AKL outperforms the baselines across various tasks and improves the diversity and quality of generated responses.
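For readers unfamiliar with the two directions, the sketch below computes FKL and RKL between a teacher and a student distribution over a small discrete vocabulary, plus a weighted mix of the two. The fixed `weight` argument is a hypothetical stand-in: the abstract says AKL allocates the weights adaptively but does not give the rule, so this is not the paper's method. The example distributions are made up.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def forward_kl(teacher: Sequence[float], student: Sequence[float]) -> float:
    """FKL: expectation taken under the teacher distribution."""
    return kl(teacher, student)


def reverse_kl(teacher: Sequence[float], student: Sequence[float]) -> float:
    """RKL: expectation taken under the student distribution."""
    return kl(student, teacher)


def mixed_kl(teacher: Sequence[float], student: Sequence[float], weight: float) -> float:
    """Convex combination of FKL and RKL with a fixed, hand-chosen weight (assumption)."""
    return weight * forward_kl(teacher, student) + (1.0 - weight) * reverse_kl(teacher, student)


teacher = [0.7, 0.2, 0.1]   # illustrative next-token probabilities
student = [0.5, 0.3, 0.2]
fkl = forward_kl(teacher, student)
rkl = reverse_kl(teacher, student)
```

The two values differ because KL divergence is asymmetric, and both are zero only when the student matches the teacher exactly, which is consistent with the abstract's point that they share the same optimum.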

akl, fkl and rkl, rkl, (14 more...)

2404.02657

Country:

North America > United States > California > San Francisco County > San Francisco (0.05)
North America > United States > New York (0.05)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.05)
(12 more...)

Genre: Research Report (1.00)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

arXiv.org Artificial Intelligence Jun-14-2024

Generative AI-based Prompt Evolution Engineering Design Optimization With Vision-Language Model

Wong, Melvin, Rios, Thiago, Menzel, Stefan, Ong, Yew Soon

Engineering design optimization requires an efficient combination of a 3D shape representation, an optimization algorithm, and a design performance evaluation method, which is often computationally expensive. We present a prompt evolution design optimization (PEDO) framework contextualized in a vehicle design scenario that leverages a vision-language model for penalizing impractical car designs synthesized by a generative model. The backbone of our framework is an evolutionary strategy coupled with an optimization objective function that comprises a physics-based solver and a vision-language model for practical or functional guidance in the generated car designs. In the prompt evolutionary search, the optimizer iteratively generates a population of text prompts, which embed user specifications on the aerodynamic performance and visual preferences of the 3D car designs. Then, in addition to the computational fluid dynamics simulations, the pre-trained vision-language model is used to penalize impractical designs and, thus, foster the evolutionary algorithm to seek more viable designs. Our investigations on a car design optimization problem show a wide spread of potential car designs generated at the early phase of the search, which indicates a good diversity of designs in the initial populations, and an increase of over 20% in the probability of generating practical designs compared to a baseline framework without using a vision-language model. Visual inspection of the designs against the performance results demonstrates prompt evolution as a very promising paradigm for finding novel designs with good optimization performance while providing ease of use in specifying design specifications and preferences via a natural language interface.
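The idea of an evolutionary search whose objective adds a practicality penalty to a physics score can be sketched on a toy problem. Everything here is a stand-in: candidates are single numbers rather than text prompts, `toy_drag` replaces the CFD solver, and `toy_impracticality` replaces the vision-language model. It shows only the structure of a penalized objective inside an elitist evolutionary loop, not the PEDO framework.

```python
import random
from typing import List, Tuple


def toy_drag(x: float) -> float:
    """Stand-in for the physics-based solver: lower is better, minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_impracticality(x: float) -> float:
    """Stand-in for the vision-language model: flags designs outside [0, 2]."""
    return 0.0 if 0.0 <= x <= 2.0 else 10.0


def penalized_objective(x: float, penalty_weight: float = 1.0) -> float:
    """Physics score plus a weighted penalty for impractical candidates."""
    return toy_drag(x) + penalty_weight * toy_impracticality(x)


def evolve(generations: int = 30, pop_size: int = 10,
           sigma: float = 0.5, seed: int = 0) -> Tuple[float, List[float]]:
    """Elitist evolutionary loop: keep the best half, add Gaussian-mutated copies."""
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    history = [min(penalized_objective(x) for x in population)]
    for _ in range(generations):
        parents = sorted(population, key=penalized_objective)[: pop_size // 2]
        children = [p + rng.gauss(0.0, sigma) for p in parents]
        population = parents + children
        history.append(min(penalized_objective(x) for x in population))
    return min(population, key=penalized_objective), history


best, history = evolve()
```

Because parents survive into the next generation, the best objective value never gets worse, and the penalty steers the search away from the unconstrained optimum at x = 3 toward the practical region.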

car design, optimization, vision-language model, (14 more...)

2406.09143

Country:

Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.83)