AITopics | Pacific Ocean

Pacific Ocean

In-context Time Series Predictor

arXiv.org Machine Learning, May-23-2024

Recent Transformer-based large language models (LLMs) demonstrate in-context learning ability to perform various functions based solely on the provided context, without updating model parameters. To fully utilize these in-context capabilities in time series forecasting (TSF) problems, unlike previous Transformer-based or LLM-based time series forecasting methods, we reformulate "time series forecasting tasks" as input tokens by constructing a series of (lookback, future) pairs within the tokens. This method aligns more closely with the inherent in-context mechanisms and is more parameter-efficient, without needing pre-trained LLM parameters. Furthermore, it addresses issues such as overfitting in existing Transformer-based TSF models, consistently achieving better performance across full-data, few-shot, and zero-shot settings compared to previous architectures.
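
As a rough illustration of the (lookback, future) pair construction described in the abstract above, the following Python sketch slides a window over a univariate series and turns each pair into one token. The function names `make_pairs` and `build_context_tokens`, the `stride` parameter, and the zero-padding of the query token's unknown future are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def make_pairs(
    series: Sequence[float], lookback: int, horizon: int, stride: int = 1
) -> List[Tuple[List[float], List[float]]]:
    """Slide a window over `series` and return (lookback, future) pairs."""
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(0, last_start + 1, stride):
        past = list(series[start:start + lookback])
        future = list(series[start + lookback:start + lookback + horizon])
        pairs.append((past, future))
    return pairs


def build_context_tokens(
    series: Sequence[float], lookback: int, horizon: int, stride: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Return (context_tokens, query_token).

    Each context token concatenates a lookback window with its known future.
    The query token holds the most recent lookback window; its future part is
    zero-padded here (an assumption) because it is what a model would predict.
    """
    context = [past + future for past, future in make_pairs(series, lookback, horizon, stride)]
    query = list(series[-lookback:]) + [0.0] * horizon
    return context, query


series = [float(i) for i in range(10)]
context, query = build_context_tokens(series, lookback=3, horizon=2)
```

With this toy series, every context token has length `lookback + horizon`, so a model reading the tokens sees complete forecasting examples before the query whose future it must fill in.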

dataset, ictsp, transformer, (17 more...)

arXiv.org Machine Learning

2405.14982

Country:

Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Activity Model: A Generative Approach for Human Mobility Pattern Synthesis

Liao, Xishun, He, Brian Yueshuai, Jiang, Qinhua, Kuai, Chenchen, Ma, Jiaqi

arXiv.org Artificial Intelligence, May-23-2024

Human mobility significantly impacts various aspects of society, including transportation, urban planning, and public health. The increasing availability of diverse mobility data and advancements in deep learning have revolutionized mobility modeling. Existing deep learning models, however, mainly study spatio-temporal patterns using trajectories and often fall short in capturing the underlying semantic interdependency among activities. Moreover, they are constrained by their data source. These two factors limit their realism and adaptability, respectively. Meanwhile, traditional activity-based models (ABMs) in transportation modeling rely on rigid assumptions and are costly and time-consuming to calibrate, making them difficult to adapt and scale to new regions, especially regions with a limited amount of the required conventional travel data. To address these limitations, we develop a novel generative deep learning approach for human mobility modeling and synthesis, using ubiquitous and open-source data. Additionally, the model can be fine-tuned with local data, enabling adaptable and accurate representations of mobility patterns across different regions. The model is evaluated on a nationwide dataset of the United States, where it demonstrates superior performance in generating activity chains that closely follow ground truth distributions. Further tests using state- or city-specific datasets from California, Washington, and Mexico City confirm its transferability. This innovative approach offers substantial potential to advance mobility modeling research, especially in generating human activity chains as input for downstream activity-based mobility simulation models and providing enhanced tools for urban planners and policymakers.

activity chain, activity type, dataset, (15 more...)

arXiv.org Artificial Intelligence

2405.17468

Country:

North America > Mexico > Mexico City > Mexico City (0.25)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Washington (0.14)
(4 more...)

Genre:

Workflow (1.00)
Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Transportation (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Information Technology (0.93)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

Zhou, Kun, Zhang, Beichen, Wang, Jiapeng, Chen, Zhipeng, Zhao, Wayne Xin, Sha, Jing, Sheng, Zhichao, Wang, Shijin, Wen, Ji-Rong

arXiv.org Artificial Intelligence, May-23-2024

Mathematical reasoning is an important capability of large language models (LLMs) for real-world applications. To enhance this capability, existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs (e.g., GPT-4) to synthesize massive math problems. Both types of work generally lead to large costs in training or synthesis. To reduce the cost, based on available open-source texts, we propose an efficient way that trains a small LLM for math problem synthesis, to efficiently generate sufficient high-quality pre-training data. To achieve it, we create a dataset using GPT-4 to distill its data synthesis capability into the small LLM. Concretely, we craft a set of prompts based on human education stages to guide GPT-4 to synthesize problems covering diverse math knowledge and difficulty levels. In addition, we adopt a gradient-based influence estimation method to select the most valuable math-related texts. Both are fed into GPT-4 to create the knowledge distillation dataset used to train the small LLM. We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which only needs to invoke the GPT-4 API 9.3k times and pre-train on 4.6B data. Experimental results have shown that JiuZhang3.0 achieves state-of-the-art performance on several mathematical reasoning datasets, under both natural language reasoning and tool manipulation settings. Our code and data will be publicly released at https://github.com/RUCAIBox/JiuZhang3.0.

dataset, jiuzhang3, math problem, (15 more...)

arXiv.org Artificial Intelligence

2405.14365

Country:

Oceania > New Zealand > North Island > Waikato (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > K-12 Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning

Liu, Peiyuan, Guo, Hang, Dai, Tao, Li, Naiqi, Bao, Jigang, Ren, Xudong, Jiang, Yong, Xia, Shu-Tao

arXiv.org Artificial Intelligence, May-23-2024

Deep learning (e.g., Transformer) has been widely and successfully used in multivariate time series forecasting (MTSF). Unlike existing methods that focus on training models from a single modality of time series input, large language model (LLM) based MTSF methods with cross-modal text and time series input have recently shown great superiority, especially with limited temporal data. However, current LLM-based MTSF methods usually focus on adapting and fine-tuning LLMs, while neglecting the distribution discrepancy between textual and temporal input tokens, thus leading to sub-optimal performance. To address this issue, we propose a novel Cross-Modal LLM Fine-Tuning (CALF) framework for MTSF by reducing the distribution discrepancy between textual and temporal data, which mainly consists of the temporal target branch with temporal input and the textual source branch with aligned textual input. To reduce the distribution discrepancy, we develop the cross-modal match module to first align cross-modal input distributions. Additionally, to minimize the modality distribution gap in both feature and output spaces, a feature regularization loss is developed to align the intermediate features between the two branches for better weight updates, while an output consistency loss is introduced to allow the output representations of both branches to correspond effectively. Thanks to the modality alignment, CALF establishes state-of-the-art performance for both long-term and short-term forecasting tasks with low computational complexity, and exhibits favorable few-shot and zero-shot abilities similar to those of LLMs. Code is available at https://github.com/Hank0626/LLaTA.

dataset, forecasting, series forecasting, (13 more...)

arXiv.org Artificial Intelligence

2403.073

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

Cheng, Xin, Chen, Xiuying, Li, Shuqi, Luo, Di, Wang, Xun, Zhao, Dongyan, Yan, Rui

arXiv.org Artificial Intelligence, May-22-2024

Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individual series into separate variate tokens. The former method faces challenges in learning variate-centric representations, while the latter risks missing essential temporal information critical for accurate forecasting. In our work, we introduce GridTST, a model that combines the benefits of the two approaches using innovative multi-directional attentions based on a vanilla Transformer. We regard the input time series data as a grid, where the x-axis represents the time steps and the y-axis represents the variates. A vertical slicing of this grid combines the variates at each time step into a time token, while a horizontal slicing embeds the individual series across all time steps into a variate token. Correspondingly, a horizontal attention mechanism focuses on time tokens to comprehend the correlations between data at various time steps, while a vertical, variate-aware attention is employed to grasp multivariate correlations. This combination enables efficient processing of information across both time and variate dimensions, thereby enhancing the model's analytical strength. We also integrate the patch technique, segmenting time tokens into subseries-level patches, ensuring that local semantic information is retained in the embedding. The GridTST model consistently delivers state-of-the-art performance across various real-world datasets.
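
To make the grid view in the abstract above concrete, here is a minimal Python sketch of the two slicings only. It assumes the grid is stored as a list of rows, one row per variate (y-axis) and one column per time step (x-axis). The function names `time_tokens` and `variate_tokens` are hypothetical, and the sketch omits embedding, patching, and attention, none of which are specified in enough detail here to reproduce.

```python
from typing import List

Grid = List[List[float]]  # grid[variate][time_step]


def time_tokens(grid: Grid) -> List[List[float]]:
    """Vertical slicing: one token per time step, holding every variate's value."""
    return [list(column) for column in zip(*grid)]


def variate_tokens(grid: Grid) -> List[List[float]]:
    """Horizontal slicing: one token per variate, holding its full series."""
    return [list(row) for row in grid]


# Two variates observed over three time steps.
grid = [
    [1.0, 2.0, 3.0],  # variate 0
    [4.0, 5.0, 6.0],  # variate 1
]
t_tokens = time_tokens(grid)
v_tokens = variate_tokens(grid)
```

In this layout, attention across `t_tokens` would relate different time steps, while attention across `v_tokens` would relate different variates, matching the two directions the abstract describes.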

dataset, forecasting, gridtst, (14 more...)

arXiv.org Artificial Intelligence

2405.1381

Country:

Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > China (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine (0.89)
Energy (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Interpretable Multivariate Time Series Forecasting Using Neural Fourier Transform

Koren, Noam, Radinsky, Kira

arXiv.org Artificial Intelligence, May-22-2024

Time series forecasting, the process of predicting future values based on observed historical data, is a pivotal task in various fields such as economics, finance, medicine, and environmental science. This task becomes particularly complex when dealing with multivariate temporal data, where predicting future values involves understanding the intricate interdependencies among multiple variables. This is essential in scenarios like weather forecasting, where variables such as temperature and humidity are interlinked, or in financial markets, where the stock prices of interconnected companies are observed. Recent advancements in time series forecasting have transitioned from conventional statistical approaches [1, 2] to sophisticated machine learning techniques, notably deep learning [3, 4, 5]. However, the field still grapples with a scarcity of both precise and interpretable models for multivariate time series prediction.

dataset, forecasting, fourier transform, (12 more...)

arXiv.org Artificial Intelligence

2405.13812

Country:

Europe > Italy (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Banking & Finance > Trading (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Data Science > Data Quality > Data Transformation (0.72)

Measuring Social Norms of Large Language Models

Yuan, Ye, Tang, Kexin, Shen, Jianhao, Zhang, Ming, Wang, Chenguang

arXiv.org Artificial Intelligence, May-22-2024

We present a new challenge to examine whether large language models understand social norms. In contrast to existing datasets, our dataset requires a fundamental understanding of social norms to solve. Our dataset features the largest set of social norm skills, consisting of 402 skills and 12,383 questions covering a wide set of social norms ranging from opinions and arguments to culture and laws. We design our dataset according to the K-12 curriculum. This enables the direct comparison of the social understanding of large language models to humans, more specifically, elementary students. While prior work generates nearly random accuracy on our benchmark, recent large language models such as GPT3.5-Turbo and LLaMA2-Chat are able to improve the performance significantly, only slightly below human performance. We then propose a multi-agent framework based on large language models to improve the models' ability to understand social norms. This method further improves large language models to be on par with humans. Given the increasing adoption of large language models in real-world applications, our finding is particularly important and presents a unique direction for future improvements.

5-turbo socialagent gpt3, language art description, social study description, (14 more...)

arXiv.org Artificial Intelligence

2404.02491

Country:

Asia > China (0.14)
Europe > Russia (0.14)
Asia > Russia (0.14)
(64 more...)

Genre:

Research Report > New Finding (1.00)
Personal (1.00)

Industry:

Transportation (1.00)
Media > Film (1.00)
Leisure & Entertainment > Sports (1.00)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

No One Truly Knows How AI Systems Work. A New Discovery Could Change That

TIME - Tech, May-21-2024, 15:00:00 GMT

Today's artificial intelligence is often described as a "black box." AI developers don't write explicit rules for these systems; instead, they feed in vast quantities of data and the systems learn on their own to spot patterns. But the inner workings of the AI models remain opaque, and efforts to peer inside them to check exactly what is happening haven't progressed very far. Beneath the surface, neural networks--today's most powerful type of AI--consist of billions of artificial "neurons" represented as decimal-point numbers. Nobody truly understands what they mean, or how they work.

artificial intelligence, machine learning, neuron, (16 more...)

TIME - Tech

Country: Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.05)

Genre: Research Report > New Finding (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)

Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models

Bai, Yang, Pei, Ge, Gu, Jindong, Yang, Yong, Ma, Xingjun

arXiv.org Artificial Intelligence, May-20-2024

Large language models (LLMs) have achieved remarkable performance on a wide range of tasks. However, recent studies have shown that LLMs can memorize training data and simple repeated tokens can trick the model to leak the data. In this paper, we take a step further and show that certain special characters or their combinations with English letters are stronger memory triggers, leading to more severe data leakage. The intuition is that, since LLMs are trained with massive data that contains a substantial amount of special characters (e.g. structural symbols {, } of JSON files, and @, # in emails and online posts), the model may memorize the co-occurrence between these special characters and the raw texts. This motivates us to propose a simple but effective Special Characters Attack (SCA) to induce training data leakage. Our experiments verify the high effectiveness of SCA against state-of-the-art LLMs: they can leak diverse training data, such as code corpus, web pages, and personally identifiable information, and sometimes generate non-stop outputs as a byproduct. We further show that the composition of the training data corpus can be revealed by inspecting the leaked data -- one crucial piece of information for pre-training high-performance LLMs. Our work can help understand the sensitivity of LLMs to special characters and identify potential areas for improvement.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.0599

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Oceania > Australia (0.04)
(34 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Voting & Elections (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Alternators For Sequence Modeling

Rezaei, Mohammad Reza, Dieng, Adji Bousso

arXiv.org Machine Learning, May-20-2024

This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively, over a cycle. The parameters of the OTN and the FTN are not time-dependent and are learned via a minimum cross-entropy criterion over the trajectories. Alternators are versatile. They can be used as dynamical latent-variable generative models or as sequence-to-sequence predictors. When alternators are used as generative models, the FTN produces interpretable low-dimensional latent variables that capture the dynamics governing the observations. When alternators are used as sequence-to-sequence predictors, the FTN learns to predict the observed features. In both cases, the OTN learns to produce sequences that match the data. Alternators can uncover the latent dynamics underlying complex sequential data, accurately forecast and impute missing data, and sample new trajectories. We showcase the capabilities of alternators in three applications. We first used alternators to model the Lorenz equations, often used to describe chaotic behavior. We then applied alternators to Neuroscience, to map brain activity to physical activity. Finally, we applied alternators to Climate Science, focusing on sea-surface temperature forecasting. In all our experiments, we found alternators are stable to train, fast to sample from, yield high-quality generated samples and latent variables, and outperform strong baselines such as neural ODEs and diffusion models in the domains we studied.

alternator, sequence, trajectory, (13 more...)

arXiv.org Machine Learning

2405.11848

Country:

North America > Canada > Ontario > Toronto (0.14)
Pacific Ocean (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
