AITopics | subtoken

subtoken

Identifying attributions of causality in political text

Garcia-Corral, Paulina

arXiv.org Artificial Intelligence, Dec-4-2025

Causal attributions are claims that link an outcome to a cause (Kirfel et al., 2022). Causality is so embedded in human reasoning that causal attributions have been shown to emerge immediately in times of crisis (Graham and Singh, 2024) and to be offered spontaneously when people are asked to think about political issues (Iyengar, 1987). Furthermore, because causal attributions are relational rather than treating actors and events as isolated, they highlight the underlying relational reasoning people use to connect events, assign responsibility, and justify actions (Vössing, 2023). Framing is fundamentally a process of making causal explanations, or communicating causal attributions: "[Frames] define problems – determine what a causal agent is doing with what costs and benefits, usually measured in terms of common cultural values; diagnose causes – identify the forces creating the problem; make moral judgments – evaluate causal agents and their effects; and suggest remedies – offer and justify treatments for the problems and predict their likely effects." (Entman,

data mining, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2512.03214

Country:

Europe (1.00)
Asia > Middle East > Palestine (0.30)

Genre: Research Report > Experimental Study (0.94)

Industry:

Government (1.00)
Media > News (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Data Science > Data Mining (0.68)

ea96efc03b9a050d895110db8c4af057-Supplemental.pdf

Neural Information Processing Systems, Aug-18-2025, 11:49:49 GMT

artificial intelligence, machine learning, nexttoken child, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Kronos: A Foundation Model for the Language of Financial Markets

Shi, Yu, Fu, Zongliang, Chen, Shuo, Zhao, Bohan, Xu, Wei, Zhang, Changshui, Li, Jian

arXiv.org Artificial Intelligence, Aug-6-2025

The success of the large-scale pre-training paradigm, exemplified by Large Language Models (LLMs), has inspired the development of Time Series Foundation Models (TSFMs). However, their application to financial candlestick (K-line) data remains limited, and they often underperform non-pre-trained architectures. Moreover, existing TSFMs often overlook crucial downstream tasks such as volatility prediction and synthetic data generation. To address these limitations, we propose Kronos, a unified, scalable pre-training framework tailored to financial K-line modeling. Kronos introduces a specialized tokenizer that discretizes continuous market information into token sequences, preserving both price dynamics and trade activity patterns. We pre-train Kronos using an autoregressive objective on a massive, multi-market corpus of over 12 billion K-line records from 45 global exchanges, enabling it to learn nuanced temporal and cross-asset representations. Kronos excels in a zero-shot setting across a diverse set of financial tasks. On benchmark datasets, Kronos boosts price series forecasting RankIC by 93% over the leading TSFM and 87% over the best non-pre-trained baseline. It also achieves a 9% lower MAE in volatility forecasting and a 22% improvement in generative fidelity for synthetic K-line sequences. These results establish Kronos as a robust, versatile foundation model for end-to-end financial time series analysis. Our pre-trained model is publicly available at https://github.com/shiyu-coder/Kronos.
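The abstract reports gains in RankIC without defining it. RankIC is commonly computed as the Spearman rank correlation between predicted signals and subsequently realized returns across a set of assets. The sketch below assumes that definition; the function names and the sample numbers are illustrative and are not taken from the Kronos code base.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(predicted, realized):
    """Spearman correlation (assumed RankIC definition) for one cross-section."""
    return pearson(ranks(predicted), ranks(realized))


# Hypothetical scores and returns for four assets on one day.
predicted_scores = [0.30, 0.10, 0.20, 0.05]
realized_returns = [0.02, -0.01, 0.03, -0.02]
daily_rank_ic = rank_ic(predicted_scores, realized_returns)
```

In practice the value is usually averaged over many dates; a single cross-section as small as this one is only meant to show the arithmetic.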

large language model, machine learning, rankic 0, (17 more...)

arXiv.org Artificial Intelligence

2508.02739

Country:

Europe (1.00)
Asia > China (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse

Guo, Tianyu, Dong, Hande, Leng, Yichong, Liu, Feng, Lin, Cheater, Xiao, Nong, Zhang, Xianwei

arXiv.org Artificial Intelligence, May-30-2025

Large language models (LLMs) are often used for infilling tasks, which involve predicting or generating missing information in a given text. These tasks typically require multiple interactions with similar context. To reduce the computation of repeated historical tokens, cross-request key-value (KV) cache reuse, a technique that stores and reuses intermediate computations, has become a crucial method in multi-round interactive services. However, in infilling tasks, the KV cache reuse is often hindered by the structure of the prompt format, which typically consists of a prefix and suffix relative to the insertion point. Specifically, the KV cache of the prefix or suffix part is frequently invalidated as the other part (suffix or prefix) is incrementally generated. To address the issue, we propose EFIM, a transformed prompt format of FIM to unleash the performance potential of KV cache reuse. Although the transformed prompt can solve the inefficiency, it exposes subtoken generation problems in current LLMs, where they have difficulty generating partial words accurately. Therefore, we introduce a fragment tokenization training method which splits text into multiple fragments before tokenization during data processing. Experiments on two representative LLMs show that LLM serving with EFIM can lower the latency by 52% and improve the throughput by 98% while maintaining the original infilling capability.
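The cache invalidation described in this abstract can be illustrated with a toy model. The sketch below assumes that cross-request KV reuse only covers the longest shared token prefix of two prompts, and it uses a conventional prefix-suffix-middle layout with placeholder sentinels (`<PRE>`, `<SUF>`, `<MID>`) and whitespace tokenization. It shows the problem only; it does not reproduce the EFIM prompt format, which the abstract does not spell out.

```python
def build_psm_prompt(prefix_tokens, suffix_tokens):
    """Conventional fill-in-the-middle layout: prefix, then suffix, then the gap."""
    return ["<PRE>"] + prefix_tokens + ["<SUF>"] + suffix_tokens + ["<MID>"]


def reusable_prefix_len(previous_tokens, new_tokens):
    """Number of leading tokens shared by two requests (assumed reusable KV entries)."""
    shared = 0
    for old, new in zip(previous_tokens, new_tokens):
        if old != new:
            break
        shared += 1
    return shared


# Two successive requests: the suffix is unchanged, the prefix grew by two tokens.
suffix = "return total".split()
first = build_psm_prompt("def add ( a , b ) :".split(), suffix)
second = build_psm_prompt("def add ( a , b ) : total =".split(), suffix)

shared = reusable_prefix_len(first, second)
recomputed = len(second) - shared
```

Under these assumptions the shared run stops where the prefix was extended, so the unchanged suffix tokens sit after the first mismatch and have to be recomputed even though their text is identical.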

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.21889

Country: Asia > China (0.47)

Genre: Research Report (0.50)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Transformer-based Named Entity Recognition with Combined Data Representation

Marcińczuk, Michał

arXiv.org Artificial Intelligence, Jun-25-2024

This study examines transformer-based models and their effectiveness in named entity recognition tasks. The study investigates data representation strategies, including single, merged, and context, which respectively use one sentence, multiple sentences, and sentences joined with attention to context per vector. Analysis shows that training models with a single strategy may lead to poor performance on different data representations. To address this limitation, the study proposes a combined training procedure that utilizes all three strategies to improve model stability and adaptability. The results of this approach are presented and discussed for four languages (English, Polish, Czech, and German) across various datasets, demonstrating the effectiveness of the combined strategy.
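One possible reading of the three data representation strategies is sketched below. The abstract only names them, so the details are assumptions: `single` emits one sentence per input, `merged` greedily packs consecutive sentences up to a length budget, and `context` surrounds each sentence with its neighbours while recording which span is the one to be labelled. The function names, the `max_len` and `window` parameters, and the sample sentences are all hypothetical.

```python
def single(sentences):
    """One input per sentence."""
    return [list(sentence) for sentence in sentences]


def merged(sentences, max_len):
    """Pack consecutive sentences into one input until max_len tokens is reached."""
    chunks, current = [], []
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_len:
            chunks.append(current)
            current = []
        current = current + list(sentence)
    if current:
        chunks.append(current)
    return chunks


def context(sentences, window=1):
    """One input per sentence, padded with neighbours; target marks the labelled span."""
    examples = []
    for i, sentence in enumerate(sentences):
        left = [tok for prev in sentences[max(0, i - window):i] for tok in prev]
        right = [tok for nxt in sentences[i + 1:i + 1 + window] for tok in nxt]
        examples.append({
            "tokens": left + list(sentence) + right,
            "target": (len(left), len(left) + len(sentence)),
        })
    return examples


document = [
    ["Anna", "lives", "in", "Oslo", "."],
    ["She", "works", "at", "Equinor", "."],
    ["It", "rains", "."],
]
single_inputs = single(document)
merged_inputs = merged(document, max_len=10)
context_inputs = context(document, window=1)
```

A combined training procedure in the spirit of the abstract would draw training examples from all three builders rather than from one.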

data representation, dataset, representation, (15 more...)

arXiv.org Artificial Intelligence

2406.17474

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts

Voznyuk, Anastasia, Konovalov, Vasily

arXiv.org Artificial Intelligence, May-17-2024

The Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection shared task in the SemEval-2024 competition aims to tackle the problem of misusing collaborative human-AI writing. Although many detectors of AI content exist, they are often designed to give a binary answer and thus may not be suitable for the more nuanced problem of finding the boundaries between human-written and machine-generated texts, while hybrid human-AI writing becomes more and more popular. In this paper, we address the boundary detection problem. In particular, we present a pipeline for augmenting data for supervised fine-tuning of DeBERTaV3. With this pipeline, we achieve a new best MAE score according to the leaderboard of the competition.
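The MAE score mentioned in this abstract can be read as the mean absolute distance between the predicted and the true boundary position. The sketch below assumes boundaries are expressed as word indices, one per text; the function name and the sample values are illustrative.

```python
def boundary_mae(predicted, gold):
    """Mean absolute error between predicted and gold boundary indices."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical word indices where the text switches from human to machine.
predicted_boundaries = [10, 25, 0]
gold_boundaries = [12, 25, 3]
score = boundary_mae(predicted_boundaries, gold_boundaries)
```

Lower is better: a score of 0 would mean every boundary was placed on exactly the right word.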

computational linguistic, dataset, detection, (15 more...)

arXiv.org Artificial Intelligence

2405.10629

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Mexico (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (0.82)

Industry: Transportation (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Epicure: Distilling Sequence Model Predictions into Patterns

Allamanis, Miltiadis, Barr, Earl T.

arXiv.org Artificial Intelligence, Aug-16-2023

Most machine learning models predict a probability distribution over concrete outputs and struggle to accurately predict names over high entropy sequence distributions. Here, we explore finding abstract, high-precision patterns intrinsic to these predictions in order to make abstract predictions that usefully capture rare sequences. In this short paper, we present Epicure, a method that distils the predictions of a sequence model, such as the output of beam search, into simple patterns. Epicure maps a model's predictions into a lattice that represents increasingly more general patterns that subsume the concrete model predictions. On the tasks of predicting a descriptive name of a function given the source code of its body and detecting anomalous names given a function, we show that Epicure yields accurate naming patterns that match the ground truth more often compared to just the highest probability model prediction. For a false alarm rate of 10%, Epicure predicts patterns that match 61% more ground-truth names compared to the best model prediction, making Epicure well-suited for scenarios that require high precision.

machine learning, natural language, prediction, (17 more...)

arXiv.org Artificial Intelligence

2308.08203

Country: Europe > United Kingdom (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Context-NER : Contextual Phrase Generation at Scale

Gupta, Himanshu, Verma, Shreyas, Mashetty, Santosh, Mishra, Swaroop

arXiv.org Artificial Intelligence, Jun-8-2023

Named Entity Recognition (NER) has seen significant progress in recent years, with numerous state-of-the-art (SOTA) models achieving high performance. However, very few studies have focused on the generation of entities' context. In this paper, we introduce CONTEXT-NER, a task that aims to generate the relevant context for entities in a sentence, where the context is a phrase describing the entity but not necessarily present in the sentence. To facilitate research in this task, we also present the EDGAR10-Q dataset, which consists of annual and quarterly reports from the top 1500 publicly traded companies. The dataset is the largest of its kind, containing 1M sentences, 2.8M entities, and an average of 35 tokens per sentence, making it a challenging dataset. We propose a baseline approach that combines a phrase generation algorithm with inferencing using a 220M language model, achieving a ROUGE-L score of 27% on the test split. Additionally, we perform a one-shot inference with ChatGPT, which obtains a 30% ROUGE-L, highlighting the difficulty of the dataset. We also evaluate models such as T5 and BART, which achieve a maximum ROUGE-L of 49% after supervised finetuning on EDGAR10-Q. We also find that T5-large, when pre-finetuned on EDGAR10-Q, achieves SOTA results on downstream finance tasks such as Headline, FPB, and FiQA SA, outperforming the vanilla version by 10.81 points. To our surprise, this 66x smaller pre-finetuned model also surpasses the finance-specific LLM BloombergGPT-50B by 15 points. We hope that our dataset and generated artifacts will encourage further research in this direction, leading to the development of more sophisticated language models for financial text analysis.
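The ROUGE-L scores quoted in this abstract are based on the longest common subsequence (LCS) between a generated phrase and a reference phrase. The sketch below is a simplified sentence-level F1 variant with whitespace tokenization; the abstract does not say which ROUGE implementation or settings were used, so those choices and the sample phrases are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """F1 of LCS-based precision and recall over whitespace tokens."""
    cand_tokens, ref_tokens = candidate.split(), reference.split()
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical generated context phrase versus a reference phrase.
generated = "net revenue for the quarter"
reference = "total net revenue for the fiscal quarter"
score = rouge_l_f1(generated, reference)
```

Here all five generated tokens appear in order in the seven-token reference, so precision is 1, recall is 5/7, and the F1 is 10/12.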

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2109.08079

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology (1.00)
Government (1.00)
Banking & Finance > Trading (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Probing Pretrained Models of Source Code

Troshin, Sergey, Chirkova, Nadezhda

arXiv.org Artificial Intelligence, Nov-17-2022

Deep learning models are widely used for solving challenging code processing tasks, such as code generation or code summarization. Traditionally, a specific model architecture was carefully built to solve a particular code processing task. However, recently general pretrained models such as CodeBERT or CodeT5 have been shown to outperform task-specific models in many applications. While pretrained models are known to learn complex patterns from data, they may fail to understand some properties of source code. To test diverse aspects of code understanding, we introduce a set of diagnostic probing tasks. We show that pretrained models of code indeed contain information about code syntactic structure and correctness, the notions of identifiers, data flow and namespaces, and natural language naming. We also investigate how probing results are affected by using code-specific pretraining objectives, varying the model size, or finetuning.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2202.08975

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Character-level White-Box Adversarial Attacks against Transformers via Attachable Subwords Substitution

Liu, Aiwei, Yu, Honghai, Hu, Xuming, Li, Shu'ang, Lin, Li, Ma, Fukun, Yang, Yawen, Wen, Lijie

arXiv.org Artificial Intelligence, Oct-30-2022

We propose the first character-level white-box adversarial attack method against transformer models. The intuition of our method comes from the observation that words are split into subtokens before being fed into the transformer models and the substitution between two close subtokens has a similar effect to the character modification. Our method mainly contains three steps. First, a gradient-based method is adopted to find the most vulnerable words in the sentence. Then we split the selected words into subtokens to replace the original tokenization result from the transformer tokenizer. Finally, we utilize an adversarial loss to guide the substitution of attachable subtokens, in which the Gumbel-softmax trick is introduced to ensure gradient propagation. Meanwhile, we introduce visual and length constraints in the optimization process to achieve minimum character modifications. Extensive experiments on both sentence-level and token-level tasks demonstrate that our method could outperform the previous attack methods in terms of success rate and edit distance. Furthermore, human evaluation verifies that our adversarial examples could preserve their original labels.
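The Gumbel-softmax trick named in this abstract replaces a hard, non-differentiable choice among candidates with a soft, temperature-controlled distribution. The sketch below shows only that sampling step in plain Python, over a hypothetical list of candidate subtokens with made-up logits. It does not compute gradients, and it is not the authors' attack code.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Sample a relaxed one-hot vector over the candidates via Gumbel noise."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Gumbel(0, 1) noise: -log(-log(U)), with U kept away from 0.
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scores = [(logit + g) / temperature for logit, g in zip(logits, noise)]
    # Numerically stable softmax over the perturbed scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attachable subtokens and scores for one substitution slot.
candidates = ["##ing", "##in", "##inq", "##lng"]
logits = [2.0, 0.5, 0.1, -1.0]

weights = gumbel_softmax(logits, temperature=0.5, rng=random.Random(0))
chosen = candidates[weights.index(max(weights))]
```

Lower temperatures push the weights toward a one-hot choice, while higher temperatures keep them closer to uniform. In an autodiff framework the same expression stays differentiable with respect to the logits, which is what allows a loss to guide the substitution.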

machine learning, natural language, subtoken, (20 more...)

arXiv.org Artificial Intelligence

2210.17004

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
(14 more...)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media (0.93)
(2 more...)
