AITopics

2412.13746

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > United States > Michigan (0.05)
(29 more...)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Transportation (0.93)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningDec-18-2024

Timer-XL: Long-Context Transformers for Unified Time Series Forecasting

Liu, Yong, Qin, Guo, Huang, Xiangdong, Wang, Jianmin, Long, Mingsheng

To uniformly predict 1D and 2D time series, we generalize next token prediction, predominantly adopted for causal generation of 1D sequences, to multivariate next token prediction. The proposed paradigm uniformly formulates various forecasting scenarios as a long-context generation problem. We opt for the generative Transformer, which can capture global-range and causal dependencies while providing contextual flexibility, to implement unified forecasting on univariate series characterized by non-stationarity, multivariate time series with complicated dynamics and correlations, and covariate-informed contexts that include both endogenous and exogenous time series. Technically, we propose a universal TimeAttention to facilitate generative Transformers on multiple time series, which can effectively capture fine-grained intra-and inter-series dependencies of flattened time series tokens (patches), and is further enhanced by deftly designed position embeddings for the temporal and variable dimensions. Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach. Based on large-scale pre-training, Timer-XL also demonstrates notable zero-shot performance, making it a promising architecture for large time series models. Transformers have contributed significantly to the fields of natural language and computer vision (Radford et al., 2018; Dosovitskiy et al., 2020), and been extensively applied in time series forecasting, becoming the foundation of specialized forecasters (Zhou et al., 2021; Wu et al., 2021) and large models (Das et al., 2023). As a typical generative task, the quality of predictions heavily relies on the context (Dai et al., 2019). Reliable predictions are made by thoroughly considering endogenous temporal variations and retrieving relevant exogenous correlations into the context (Box, 2013). Further, the pre-training context length, which serves as an indicator of scaling (Kaplan et al., 2020), determines the maximum input and output of generative Transformers, ultimately enabling long-sequence, high-resolution, and high-frequency generation (Yin et al., 2023; Wang et al., 2024a). However, existing Transformers in the time series field crucially encounter the context bottleneck. As shown in Figure 1, unlike Transformers for natural language and vision that learn dependencies among thousands to millions of tokens (Kirillov et al., 2023; OpenAI, 2023), time-series Transformers typically work around limited contexts of up to hundreds of time series tokens (patches) (Nie et al., 2022). For univariate time series, a short context length leads to an insufficient perception of global tendencies, overlooking widespread non-stationarity in real-world time series (Hyndman, 2018). The excessive reliance on stationarization, such as normalization (Kim et al., 2021), restricts the model capacity and leads to overfitting of Transformers (Liu et al., 2022b).

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2410.04803

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Energy > Power Industry (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Menon, Abhinav, Shrivastava, Manish, Krueger, David, Lubana, Ekdeep Singh

Analyzing (In)Abilities of SAEs via Formal Languages

arXiv.org Artificial IntelligenceDec-18-2024

Autoencoders have been used for finding interpretable and disentangled features underlying neural network representations in both image and text domains. While the efficacy and pitfalls of such methods are well-studied in vision, there is a lack of corresponding results, both qualitative and quantitative, for the text domain. We aim to address this gap by training sparse autoencoders (SAEs) on a synthetic testbed of formal languages. Specifically, we train SAEs on the hidden representations of models trained on formal languages (Dyck-2, Expr, and English PCFG) under a wide variety of hyperparameter settings, finding interpretable latents often emerge in the features learned by our SAEs. However, similar to vision, we find performance turns out to be highly sensitive to inductive biases of the training pipeline. Moreover, we show latents correlating to certain features of the input do not always induce a causal impact on model's computation. We thus argue that causality has to become a central target in SAE training: learning of causal features should be incentivized from the ground-up. Motivated by this, we propose and perform preliminary investigations for an approach that promotes learning of causally relevant features in our formal language setting.

logic & formal reasoning, machine learning, natural language, (21 more...)

2410.11767

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceDec-17-2024

Generating Unseen Nonlinear Evolution in Sea Surface Temperature Using a Deep Learning-Based Latent Space Data Assimilation Framework

Zheng, Qingyu, Han, Guijun, Li, Wei, Cao, Lige, Zhou, Gongfu, Wu, Haowen, Shao, Qi, Wang, Ru, Wu, Xiaobo, Cui, Xudong, Li, Hong, Wang, Xuan

Advances in data assimilation (DA) methods have greatly improved the accuracy of Earth system predictions. To fuse multi-source data and reconstruct the nonlinear evolution missing from observations, geoscientists are developing future-oriented DA methods. In this paper, we redesign a purely data-driven latent space DA framework (DeepDA) that employs a generative artificial intelligence model to capture the nonlinear evolution in sea surface temperature. Under variational constraints, DeepDA embedded with nonlinear features can effectively fuse heterogeneous data. The results show that DeepDA remains highly stable in capturing and generating nonlinear evolutions even when a large amount of observational information is missing. It can be found that when only 10% of the observation information is available, the error increase of DeepDA does not exceed 40%. Furthermore, DeepDA has been shown to be robust in the fusion of real observations and ensemble simulations. In particular, this paper provides a mechanism analysis of the nonlinear evolution generated by DeepDA from the perspective of physical patterns, which reveals the inherent explainability of our DL model in capturing multi-scale ocean signals.

artificial intelligence, deepda, machine learning, (16 more...)

2412.13477

Country:

North America > United States (0.28)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Fujian Province > Fuzhou (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceDec-17-2024

A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting

Kiefer, Nicholas, Weyrauch, Arvid, Öz, Muhammed, Streit, Achim, Götz, Markus, Debus, Charlotte

The current landscape in time-series forecasting is dominated by Transformer-based models. Their high parameter count and corresponding demand in computational resources pose a challenge to real-world deployment, especially for commercial and scientific applications with low-power embedded devices. Pruning is an established approach to reduce neural network parameter count and save compute. However, the implications and benefits of pruning Transformer-based models for time series forecasting are largely unknown. To close this gap, we provide a comparative benchmark study by evaluating unstructured and structured pruning on various state-of-the-art multivariate time series models. We study the effects of these pruning strategies on model predictive performance and computational aspects like model size, operations, and inference time. Our results show that certain models can be pruned even up to high sparsity levels, outperforming their dense counterpart. However, fine-tuning pruned models is necessary. Furthermore, we demonstrate that even with corresponding hardware and software support, structured pruning is unable to provide significant time savings.

data mining, machine learning, pruning, (18 more...)

2412.12883

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.06)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Power Industry (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zubić, Nikola, Scaramuzza, Davide

GG-SSMs: Graph-Generating State Space Models

State Space Models (SSMs) are powerful tools for modeling sequential data in computer vision and time series analysis domains. However, traditional SSMs are limited by fixed, one-dimensional sequential processing, which restricts their ability to model non-local interactions in high-dimensional data. While methods like Mamba and VMamba introduce selective and flexible scanning strategies, they rely on predetermined paths, which fails to efficiently capture complex dependencies. We introduce Graph-Generating State Space Models (GG-SSMs), a novel framework that overcomes these limitations by dynamically constructing graphs based on feature relationships. Using Chazelle's Minimum Spanning Tree algorithm, GG-SSMs adapt to the inherent data structure, enabling robust feature propagation across dynamically generated graphs and efficiently modeling complex dependencies. We validate GG-SSMs on 11 diverse datasets, including event-based eye-tracking, ImageNet classification, optical flow estimation, and six time series datasets. GG-SSMs achieve state-of-the-art performance across all tasks, surpassing existing methods by significant margins. Specifically, GG-SSM attains a top-1 accuracy of 84.9% on ImageNet, outperforming prior SSMs by 1%, reducing the KITTI-15 error rate to 2.77%, and improving eye-tracking detection rates by up to 0.33% with fewer parameters. These results demonstrate that dynamic scanning based on feature relationships significantly improves SSMs' representational power and efficiency, offering a versatile tool for various applications in computer vision and beyond.

artificial intelligence, deep learning, machine learning, (16 more...)

2412.12423

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > Alabama (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

CharacterBench: Benchmarking Character Customization of Large Language Models

Zhou, Jinfeng, Huang, Yongkang, Wen, Bosi, Bi, Guanqun, Chen, Yuxuan, Ke, Pei, Chen, Zhuang, Xiao, Xiyao, Peng, Libiao, Tang, Kuntian, Zhang, Rongsheng, Zhang, Le, Lv, Tangjie, Hu, Zhipeng, Wang, Hongning, Huang, Minlie

Character-based dialogue (aka role-playing) enables users to freely customize characters for interaction, which often relies on LLMs, raising the need to evaluate LLMs' character customization capability. However, existing benchmarks fail to ensure a robust evaluation as they often only involve a single character category or evaluate limited dimensions. Moreover, the sparsity of character features in responses makes feature-focused generative evaluation both ineffective and inefficient. To address these issues, we propose CharacterBench, the largest bilingual generative benchmark, with 22,859 human-annotated samples covering 3,956 characters from 25 detailed character categories. We define 11 dimensions of 6 aspects, classified as sparse and dense dimensions based on whether character features evaluated by specific dimensions manifest in each response. We enable effective and efficient evaluation by crafting tailored queries for each dimension to induce characters' responses related to specific dimensions. Further, we develop CharacterJudge model for cost-effective and stable evaluations. Experiments show its superiority over SOTA automatic judges (e.g., GPT-4) and our benchmark's potential to optimize LLMs' character customization. Our repository is at https://github.com/thu-coai/CharacterBench.

large language model, machine learning, natural language, (22 more...)

2412.11912

Country:

Asia > Russia (0.14)
Europe > United Kingdom > England (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(14 more...)

Genre:

Personal > Interview (1.00)
Overview (0.67)
Research Report (0.63)

Industry:

Health & Medicine (1.00)
Government > Military (1.00)
Education (1.00)
Water & Waste Management (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

Li, Xiaoxi, Jin, Jiajie, Zhou, Yujia, Wu, Yongkang, Li, Zhonghua, Ye, Qi, Dou, Zhicheng

Large language models (LLMs) exhibit remarkable generative capabilities but often suffer from hallucinations. Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment costs of separate retrievers, redundant input tokens from retrieved text chunks, and the lack of joint optimization of retrieval and generation. To address these issues, we propose \textbf{RetroLLM}, a unified framework that integrates retrieval and generation into a single, cohesive process, enabling LLMs to directly generate fine-grained evidence from the corpus with constrained decoding. Moreover, to mitigate false pruning in the process of constrained evidence generation, we introduce (1) hierarchical FM-Index constraints, which generate corpus-constrained clues to identify a subset of relevant documents before evidence generation, reducing irrelevant decoding space; and (2) a forward-looking constrained decoding strategy, which considers the relevance of future sequences to improve evidence accuracy. Extensive experiments on five open-domain QA datasets demonstrate RetroLLM's superior performance across both in-domain and out-of-domain tasks. The code is available at \url{https://github.com/sunnynexus/RetroLLM}.

large language model, machine learning, natural language, (20 more...)

2412.11919

Country:

North America > United States > District of Columbia > Washington (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > California > San Francisco County > San Francisco (0.04)
(24 more...)

Genre:

Personal > Honors (1.00)
Research Report > New Finding (0.67)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Media > Music (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning

Liu, Yuti, Liu, Shice, Gao, Junyuan, Jiang, Pengtao, Zhang, Hao, Chen, Jinwei, Li, Bo

Image Aesthetic Assessment (IAA) is a vital and intricate task that entails analyzing and assessing an image's aesthetic values, and identifying its highlights and areas for improvement. Traditional methods of IAA often concentrate on a single aesthetic task and suffer from inadequate labeled datasets, thus impairing in-depth aesthetic comprehension. Despite efforts to overcome this challenge through the application of Multi-modal Large Language Models (MLLMs), such models remain underdeveloped for IAA purposes. To address this, we propose a comprehensive aesthetic MLLM capable of nuanced aesthetic insight. Central to our approach is an innovative multi-scale text-guided self-supervised learning technique. This technique features a multi-scale feature alignment module and capitalizes on a wealth of unlabeled data in a self-supervised manner to structurally and functionally enhance aesthetic ability. The empirical evidence indicates that accompanied with extensive instruct-tuning, our model sets new state-of-the-art benchmarks across multiple tasks, including aesthetic scoring, aesthetic commenting, and personalized image aesthetic assessment. Remarkably, it also demonstrates zero-shot learning capabilities in the emerging task of aesthetic suggesting. Furthermore, for personalized image aesthetic assessment, we harness the potential of in-context learning and showcase its inherent advantages.

large language model, machine learning, natural language, (23 more...)

2412.11952

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > East China Sea > Yellow Sea (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Industry:

Media > Photography (1.00)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Montealegre-Mora, Felipe, Boettiger, Carl, Walters, Carl J., Cahill, Christopher L.

Using machine learning to inform harvest control rule design in complex fishery settings

In fishery science, harvest management of size-structured stochastic populations is a long-standing and difficult problem. Rectilinear precautionary policies based on biomass and harvesting reference points have now become a standard approach to this problem. While these standard feedback policies are adapted from analytical or dynamic programming solutions assuming relatively simple ecological dynamics, they are often applied to more complicated ecological settings in the real world. In this paper we explore the problem of designing harvest control rules for partially observed, age-structured, spasmodic fish populations using tools from reinforcement learning (RL) and Bayesian optimization. Our focus is on the case of Walleye fisheries in Alberta, Canada, whose highly variable recruitment dynamics have perplexed managers and ecologists. We optimized and evaluated policies using several complementary performance metrics. The main questions we addressed were: 1. How do standard policies based on reference points perform relative to numerically optimized policies? 2. Can an observation of mean fish weight, in addition to stock biomass, aid policy decisions?

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2412.124

Country:

North America > Canada > Alberta (0.24)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry: Food & Agriculture > Fishing (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.73)