Convex Polytope Trees

Neural Information Processing Systems

A decision tree is commonly restricted to using a single hyperplane to split the covariate space at each of its internal nodes, and it therefore often requires a large number of nodes to achieve high accuracy. In this paper, we propose convex polytope trees (CPT), which expand the family of decision trees through an interpretable generalization of their decision boundary. The splitting function at each node of a CPT is the logical disjunction of a community of differently weighted probabilistic linear decision-makers, which geometrically corresponds to a convex polytope in the covariate space. We place a nonparametric Bayesian prior at each node to infer the community's size, encouraging simpler decision boundaries by shrinking the number of polytope facets. We develop a greedy method to construct CPTs efficiently, along with scalable end-to-end training algorithms for the tree parameters when the tree structure is given. We empirically demonstrate the advantages of CPT over existing state-of-the-art decision trees on several real-world classification and regression tasks from diverse domains.
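One plausible reading of the polytope split can be sketched numerically: each facet of the polytope is a probabilistic linear decision-maker, and routing follows the disjunction "at least one facet's decision-maker fires". The facet weights and the product-form disjunction below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def polytope_split_prob(x, W, b):
    """Probability that x is routed to the 'outside' child of a CPT node.

    Each row of W (with offset b) is one probabilistic linear decision-maker
    (one facet of the polytope).  Routing is a logical disjunction: x leaves
    the polytope if at least one facet's decision-maker fires.
    """
    p_fire = sigmoid(W @ x + b)          # per-facet firing probabilities
    return 1.0 - np.prod(1.0 - p_fire)   # P(at least one facet fires)

# Toy example: a 2-facet polytope (a slab |x1| <= 1) in 2-D.
W = np.array([[1.0, 0.0],    # facet 1 fires when x1 > 1
              [-1.0, 0.0]])  # facet 2 fires when x1 < -1
b = np.array([-1.0, -1.0])

inside = polytope_split_prob(np.array([0.0, 0.0]), W, b)   # deep in the slab
outside = polytope_split_prob(np.array([5.0, 0.0]), W, b)  # far outside it
```

A point well inside the slab gets a low routing probability and a point far outside a high one, while the sigmoids keep the split differentiable for end-to-end training.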


Prospect Theory in Physical Human-Robot Interaction: A Pilot Study of Probability Perception

Lin, Yixiang, Yang, Tiancheng, Eden, Jonathan, Tan, Ying

arXiv.org Artificial Intelligence

Understanding how humans respond to uncertainty is critical for designing safe and effective physical human-robot interaction (pHRI), as physically working with robots introduces multiple sources of uncertainty, including trust, comfort, and perceived safety. Conventional pHRI control frameworks typically build on optimal control theory, which assumes that human actions minimize a cost function; however, human behavior under uncertainty often departs from such optimal patterns. To address this gap, additional understanding of human behavior under uncertainty is needed. This pilot study implemented a physically coupled target-reaching task in which the robot delivered assistance or disturbances with systematically varied probabilities (10% to 90%). Analysis of participants' force inputs and decision-making strategies revealed two distinct behavioral clusters: a "trade-off" group that modulated their physical responses according to disturbance likelihood, and an "always-compensate" group characterized by strong risk aversion irrespective of probability. These findings provide empirical evidence that human decision-making in pHRI is highly individualized and that the perception of a probability can differ from its true value. Accordingly, the study highlights the need for more interpretable behavioral models, such as cumulative prospect theory (CPT), to more accurately capture these behaviors and inform the design of future adaptive robot controllers.
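For context, the probability weighting function of cumulative prospect theory has a standard Tversky-Kahneman (1992) form; the sketch below, using the commonly cited gamma = 0.61 for gains, is general background rather than an analysis from this pilot study:

```python
def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting function.

    Small probabilities are overweighted and large ones underweighted,
    one candidate explanation for an 'always-compensate' strategy that
    treats unlikely disturbances as more probable than they are.
    """
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)

# Perceived weights for the extremes of the disturbance range in the study.
for p in (0.1, 0.5, 0.9):
    print(f"p = {p:.1f}  ->  w(p) = {tk_weight(p):.3f}")
```

With gamma < 1 the curve is inverse-S shaped: w(0.1) exceeds 0.1 while w(0.9) falls below 0.9, which is exactly the kind of distortion between perceived and true probability the abstract describes.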


Adapting Large Language Models to Low-Resource Tibetan: A Two-Stage Continual and Supervised Fine-Tuning Study

Chen, Lifeng, Lai, Ryan, Liu, Tianming

arXiv.org Artificial Intelligence

Adapting large language models (LLMs) to low-resource languages remains a major challenge due to data scarcity and cross-lingual drift. This work presents a two-stage adaptation of Qwen2.5-3B to Tibetan, a morphologically rich and underrepresented language. We employ Continual Pretraining (CPT) to establish Tibetan linguistic grounding, followed by Supervised Fine-Tuning (SFT) for task and translation specialization. Empirical evaluations demonstrate a consistent decrease in perplexity (from 2.98 → 1.54) and substantial improvements in Chinese→Tibetan translation quality (BLEU: 0.046 → 0.261; chrF: 2.2 → 6.6). Layer-wise analysis across 435 layers in Qwen3-4B reveals that adaptation primarily concentrates on embedding and output heads, with mid-to-late MLP projections encoding domain-specific transformations. Our findings suggest that CPT constructs a Tibetan semantic manifold while SFT sharpens task alignment with minimal representational disruption. This study provides the first quantitative exploration of Tibetan adaptation dynamics for LLMs, and offers an open, reproducible framework for extending multilingual foundation models to low-resource settings.
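For reference, the reported perplexities relate to training loss by a simple identity, perplexity = exp(mean per-token negative log-likelihood). The sketch below works backwards from the paper's reported numbers; the implied per-token NLL values are derived, not reported:

```python
import math

def perplexity(nll_per_token):
    """Perplexity is the exponential of the mean per-token NLL (in nats)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Working backwards from the reported perplexities to the implied mean NLL:
nll_before = math.log(2.98)  # mean NLL before adaptation (nats/token)
nll_after = math.log(1.54)   # mean NLL after CPT + SFT (nats/token)
```

The drop from 2.98 to 1.54 thus corresponds to roughly 0.66 nats per token of improvement in the model's Tibetan language modeling loss.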


Cisco Time Series Model Technical Report

Gou, Liang, Khare, Archit, Pabolu, Praneet, Patel, Prachi, Ross, Joseph, Shen, Hercy, Song, Yuhan, Sun, Jingze, Curtis, Kristal, Dharnidharka, Vedant, Mathur, Abhinav, Yang, Hao

arXiv.org Machine Learning

Modern LLMs are capable of learning complex statistical properties of language from a vast corpus of text. Rather than being trained to emulate a particular style or perform a particular task, they learn structure across diverse examples of token sequences, and the learned representations can be transferred to many downstream tasks and applications. The main idea of a time series foundation model (TSFM) is to apply the same playbook - including the transformer architecture that has revolutionized natural language processing - to sequences of numerical data, i.e., time series. Our present focus is to train a univariate TSFM capable of high-quality zero-shot forecasting, with emphasis on time series arising in certain business domains (initially, observability). Thus, having been exposed to patterns across many time series during training, given a segment of a new (unseen) time series, the TSFM is expected to predict its subsequent segment without any auxiliary parameter adjustment or fitting. Architectural differences among TSFMs can be found in their approaches to tokenization, transformer configuration, and prediction heads. PatchTST [Nie+23] introduces the idea of a time series patch as the analogue of a token, uses a linear transformation of a patch as a replacement for the token embedding, and finally applies a standard transformer encoder architecture. TimesFM [Das+24] uses a residual block to embed time series patches, enabling learning of more complex representations, and applies a decoder-only architecture. Chronos [Ans+24] tokenizes individual data points via scaling and then applies the (encoder-decoder) T5 architecture [Raf+20], notably formulating forecasting as a classification problem; subsequent versions (Chronos-Bolt, Chronos-2 [Ans+25]) utilize patching and "meta features" before applying transformer layers, and Chronos-2 uses a T5 encoder.
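The patch-as-token idea shared by these architectures can be sketched in a few lines. The patch length, model width, and the plain linear projection below are illustrative choices, not the configuration of any of the cited models or of the Cisco model:

```python
import numpy as np

def patchify(series, patch_len):
    """Split a univariate series into non-overlapping patches (the TSFM
    analogue of tokens); trailing values that don't fill a patch are dropped."""
    n = len(series) // patch_len
    return series[: n * patch_len].reshape(n, patch_len)

def embed_patches(patches, W, b):
    """PatchTST-style embedding: a linear map of each patch replaces the
    token-embedding lookup used in language models."""
    return patches @ W + b

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8 * np.pi, 103))  # length not a multiple of 16
patches = patchify(series, patch_len=16)          # shape (6, 16)
W = rng.normal(size=(16, 32))                     # d_model = 32 (illustrative)
b = np.zeros(32)
tokens = embed_patches(patches, W, b)             # shape (6, 32)
```

The resulting `tokens` matrix plays the role of the embedded token sequence fed to the transformer; the architectural differences the paragraph describes (residual-block embeddings, decoder-only vs. encoder-decoder stacks, classification heads) all sit downstream of this step.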


Intervention and Conditioning in Causal Bayesian Networks

Neural Information Processing Systems

Causal models are crucial for understanding complex systems and identifying causal relationships among variables. Even though causal models are extremely popular, calculating the conditional probability of formulas involving interventions poses significant challenges. In the case of Causal Bayesian Networks (CBNs), Pearl assumes autonomy of the mechanisms that determine interventions in order to calculate a range of probabilities. We show that by making simple yet often realistic independence assumptions, it is possible to uniquely estimate the probability of an interventional formula (including the well-studied notions of probability of sufficiency and necessity). We discuss when these assumptions are appropriate. Importantly, in many cases of interest, when the assumptions are appropriate, these probability estimates can be evaluated from observational data, which carries immense significance in scenarios where conducting experiments is impractical or infeasible.
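As background, the best-known case where such quantities become uniquely estimable from observational data is Pearl's classical result under exogeneity and monotonicity; the sketch below implements that textbook case, which is not necessarily the specific set of independence assumptions this paper proposes:

```python
def prob_necessity(p_y_given_x, p_y_given_not_x):
    """Probability of necessity PN under Pearl's exogeneity and monotonicity
    assumptions, where interventional quantities reduce to observational
    conditionals: PN = (P(y|x) - P(y|x')) / P(y|x)."""
    return (p_y_given_x - p_y_given_not_x) / p_y_given_x

def prob_necessity_sufficiency(p_y_given_x, p_y_given_not_x):
    """PNS under the same assumptions: PNS = P(y|x) - P(y|x')."""
    return p_y_given_x - p_y_given_not_x

# Purely observational inputs suffice once the assumptions hold:
pn = prob_necessity(0.8, 0.2)                # ≈ 0.75
pns = prob_necessity_sufficiency(0.8, 0.2)   # ≈ 0.6
```

This is the pattern the abstract generalizes: identify assumptions under which interventional formulas collapse to expressions over observational conditionals, so no experiment is needed to evaluate them.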



We thank the reviewers for their kind comments, and for their consensus view that our approach of porting decision

Neural Information Processing Systems

We agree with the reviewer that human risk measures are not necessarily fair; whether their use leads to a "fairer" model remains an open question. We thank the reviewer for pointing out the work of Agarwal et al. In contrast to EHRM, the authors' method requires access to an explicit set of protected attributes during training. Our primary goal is to introduce CPT-inspired risk measures and study the consequences of their use within ML. EHRM is an interesting open problem.



SPEAR-MM: Selective Parameter Evaluation and Restoration via Model Merging for Efficient Financial LLM Adaptation

Kapusuzoglu, Berkcan, Chakraborty, Supriyo, Ni, Renkun, Rawls, Stephen, Sahu, Sambit

arXiv.org Artificial Intelligence

Large language models (LLMs) adapted to financial domains often suffer from catastrophic forgetting of the general reasoning capabilities essential for customer interactions and complex financial analysis. SPEAR-MM approximates each layer's impact on external benchmarks through post-hoc analysis, then selectively freezes or restores transformer layers via spherical-interpolation merging. Applied to LLaMA-3.1-8B for financial tasks, SPEAR-MM achieves 91.2% retention of general capabilities versus 69.7% for standard continual pretraining, while maintaining 94% of domain-adaptation gains. The approach provides interpretable trade-off control and reduces computational costs by 90%, which is crucial for resource-constrained financial institutions. Financial institutions increasingly require domain-specific language models that can understand regulatory documents, analyze market data, and provide accurate customer support while maintaining broad reasoning capabilities for complex financial scenarios.
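The abstract does not spell out the merging procedure, but spherical linear interpolation (slerp) between a base layer's weights and the fine-tuned layer's weights is the standard formulation of this kind of restore step. The sketch below treats a layer's weight tensor as a flat vector, with the coefficient t as the trade-off knob (t = 0 fully restores the base layer, t = 1 keeps the tuned one); the fallback and epsilon are illustrative details:

```python
import numpy as np

def slerp(w_base, w_tuned, t, eps=1e-8):
    """Spherical linear interpolation between two weight tensors,
    flattened to vectors and interpolated along the great circle
    between them rather than along the straight line."""
    a, b = w_base.ravel(), w_tuned.ravel()
    cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < eps:  # near-parallel weights: plain lerp is numerically safer
        merged = (1 - t) * a + t * b
    else:
        merged = (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
    return merged.reshape(w_base.shape)
```

Applied per layer, a selective policy would use t near 0 for layers whose post-hoc analysis shows high impact on general benchmarks (restore them) and t near 1 for layers that mostly carry domain-specific gains.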


Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token

Zychlinski, Shaked, Kainan, Yuval

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are susceptible to jailbreak attacks in which malicious prompts are disguised using ciphers and character-level encodings to bypass safety guardrails. While these guardrails often fail to interpret the encoded content, the underlying models can still process the harmful instructions. We introduce CPT-Filtering, a novel, model-agnostic guardrail technique with negligible cost and near-perfect accuracy that mitigates these attacks by leveraging the intrinsic behavior of Byte-Pair Encoding (BPE) tokenizers. Our method is based on the principle that tokenizers, trained on natural language, represent out-of-distribution text, such as ciphers, using a significantly larger number of shorter tokens. Our technique uses a simple yet powerful artifact of language-model use: the average number of Characters Per Token (CPT) in the text. This approach is motivated by the high compute cost of modern methods, which rely on added modules such as dedicated LLMs or perplexity models. We validate our approach on a dataset of over 100,000 prompts, testing numerous encoding schemes with several popular tokenizers. Our experiments demonstrate that a simple CPT threshold robustly identifies encoded text with high accuracy, even for very short inputs. CPT-Filtering provides a practical defense layer that can be immediately deployed for real-time text filtering and offline data curation.
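The core check reduces to a single ratio over the tokenizer's output. The sketch below uses hypothetical token counts and an illustrative threshold rather than a real BPE tokenizer or the paper's tuned values; a deployment would take counts from the serving model's own tokenizer:

```python
def chars_per_token(text, token_count):
    """Average Characters Per Token (CPT): natural language compresses well
    under a BPE tokenizer, while ciphered or encoded text fragments into
    many short tokens, driving CPT down."""
    return len(text) / max(token_count, 1)

def is_suspicious(text, token_count, threshold=3.0):
    """Flag text whose CPT falls below a threshold (threshold is illustrative)."""
    return chars_per_token(text, token_count) < threshold

# Hypothetical token counts for a BPE tokenizer trained on English:
plain = "Please summarize the quarterly report."  # e.g. 7 tokens
cipher = "Uryyb, jbeyq! Gryy zr n frperg."        # rot13; e.g. 15 short tokens

flag_plain = is_suspicious(plain, token_count=7)
flag_cipher = is_suspicious(cipher, token_count=15)
```

Because only a tokenizer pass and one division are needed, the check adds effectively no latency compared with guardrails that run a dedicated classifier LLM or perplexity model.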