Convex Polytope Trees

Neural Information Processing Systems

A decision tree is commonly restricted to using a single hyperplane to split the covariate space at each of its internal nodes, and it therefore often requires a large number of nodes to achieve high accuracy. In this paper, we propose convex polytope trees (CPT), which expand the family of decision trees through an interpretable generalization of their decision boundary. The splitting function at each node of a CPT is the logical disjunction of a community of differently weighted probabilistic linear decision-makers, which geometrically corresponds to a convex polytope in the covariate space. We place a nonparametric Bayesian prior at each node to infer the community's size, encouraging simpler decision boundaries by shrinking the number of polytope facets. We develop a greedy method to construct CPTs efficiently, along with scalable end-to-end training algorithms for the tree parameters when the tree structure is given. We empirically demonstrate the advantages of CPT over existing state-of-the-art decision trees on several real-world classification and regression tasks from diverse domains.
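One plausible reading of the polytope split can be sketched numerically: each facet of the polytope is a probabilistic linear decision-maker, and routing follows the disjunction "at least one facet's decision-maker fires". The facet weights and the product-form disjunction below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def polytope_split_prob(x, W, b):
    """Probability that x is routed to the 'outside' child of a CPT node.

    Each row of W (with offset b) is one probabilistic linear decision-maker
    (one facet of the polytope).  Routing is a logical disjunction: x leaves
    the polytope if at least one facet's decision-maker fires.
    """
    p_fire = sigmoid(W @ x + b)          # per-facet firing probabilities
    return 1.0 - np.prod(1.0 - p_fire)   # P(at least one facet fires)

# Toy example: a 2-facet polytope (a slab |x1| <= 1) in 2-D.
W = np.array([[1.0, 0.0],    # facet 1 fires when x1 > 1
              [-1.0, 0.0]])  # facet 2 fires when x1 < -1
b = np.array([-1.0, -1.0])

inside = polytope_split_prob(np.array([0.0, 0.0]), W, b)   # deep in the slab
outside = polytope_split_prob(np.array([5.0, 0.0]), W, b)  # far outside it
```

A point well inside the slab gets a low routing probability and a point far outside a high one, while the sigmoids keep the split differentiable for end-to-end training.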


Prospect Theory in Physical Human-Robot Interaction: A Pilot Study of Probability Perception

Lin, Yixiang, Yang, Tiancheng, Eden, Jonathan, Tan, Ying

arXiv.org Artificial Intelligence

Understanding how humans respond to uncertainty is critical for designing safe and effective physical human-robot interaction (pHRI), as physically working with robots introduces multiple sources of uncertainty, including trust, comfort, and perceived safety. Conventional pHRI control frameworks typically build on optimal control theory, which assumes that human actions minimize a cost function; however, human behavior under uncertainty often departs from such optimal patterns. To address this gap, additional understanding of human behavior under uncertainty is needed. This pilot study implemented a physically coupled target-reaching task in which the robot delivered assistance or disturbances with systematically varied probabilities (10% to 90%). Analysis of participants' force inputs and decision-making strategies revealed two distinct behavioral clusters: a "trade-off" group that modulated their physical responses according to disturbance likelihood, and an "always-compensate" group characterized by strong risk aversion irrespective of probability. These findings provide empirical evidence that human decision-making in pHRI is highly individualized and that the perception of a probability can differ from its true value. Accordingly, the study highlights the need for more interpretable behavioral models, such as cumulative prospect theory (CPT), to more accurately capture these behaviors and inform the design of future adaptive robot controllers.
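For context, the probability weighting function of cumulative prospect theory has a standard Tversky-Kahneman (1992) form; the sketch below, using the commonly cited gamma = 0.61 for gains, is general background rather than an analysis from this pilot study:

```python
def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting function.

    Small probabilities are overweighted and large ones underweighted,
    one candidate explanation for an 'always-compensate' strategy that
    treats unlikely disturbances as more probable than they are.
    """
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)

# Perceived weights for the extremes of the disturbance range in the study.
for p in (0.1, 0.5, 0.9):
    print(f"p = {p:.1f}  ->  w(p) = {tk_weight(p):.3f}")
```

With gamma < 1 the curve is inverse-S shaped: w(0.1) exceeds 0.1 while w(0.9) falls below 0.9, which is exactly the kind of distortion between perceived and true probability the abstract describes.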


Adapting Large Language Models to Low-Resource Tibetan: A Two-Stage Continual and Supervised Fine-Tuning Study

Chen, Lifeng, Lai, Ryan, Liu, Tianming

arXiv.org Artificial Intelligence

Adapting large language models (LLMs) to low-resource languages remains a major challenge due to data scarcity and cross-lingual drift. This work presents a two-stage adaptation of Qwen2.5-3B to Tibetan, a morphologically rich and underrepresented language. We employ Continual Pretraining (CPT) to establish Tibetan linguistic grounding, followed by Supervised Fine-Tuning (SFT) for task and translation specialization. Empirical evaluations demonstrate a consistent decrease in perplexity (from 2.98 → 1.54) and substantial improvements in Chinese→Tibetan translation quality (BLEU: 0.046 → 0.261; chrF: 2.2 → 6.6). Layer-wise analysis across 435 layers in Qwen3-4B reveals that adaptation primarily concentrates on embedding and output heads, with mid-to-late MLP projections encoding domain-specific transformations. Our findings suggest that CPT constructs a Tibetan semantic manifold while SFT sharpens task alignment with minimal representational disruption. This study provides the first quantitative exploration of Tibetan adaptation dynamics for LLMs, and offers an open, reproducible framework for extending multilingual foundation models to low-resource settings.
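For reference, the reported perplexities relate to training loss by a simple identity, perplexity = exp(mean per-token negative log-likelihood). The sketch below works backwards from the paper's reported numbers; the implied per-token NLL values are derived, not reported:

```python
import math

def perplexity(nll_per_token):
    """Perplexity is the exponential of the mean per-token NLL (in nats)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Working backwards from the reported perplexities to the implied mean NLL:
nll_before = math.log(2.98)  # mean NLL before adaptation (nats/token)
nll_after = math.log(1.54)   # mean NLL after CPT + SFT (nats/token)
```

The drop from 2.98 to 1.54 thus corresponds to roughly 0.66 nats per token of improvement in the model's Tibetan language modeling loss.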


Cisco Time Series Model Technical Report

Gou, Liang, Khare, Archit, Pabolu, Praneet, Patel, Prachi, Ross, Joseph, Shen, Hercy, Song, Yuhan, Sun, Jingze, Curtis, Kristal, Dharnidharka, Vedant, Mathur, Abhinav, Yang, Hao

arXiv.org Machine Learning

Modern LLMs are capable of learning complex statistical properties of language from a vast corpus of text. Rather than being trained to emulate a particular style or perform a particular task, they learn structure across diverse examples of token sequences, and the learned representations can be transferred to many downstream tasks and applications. The main idea of a time series foundation model (TSFM) is to apply the same playbook - including the transformer architecture that has revolutionized natural language processing - to sequences of numerical data, i.e., time series. Our present focus is to train a univariate TSFM capable of high-quality zero-shot forecasting, with emphasis on time series arising in certain business domains (initially, observability). Thus, having been exposed to patterns across many time series during training, given a segment of a new (unseen) time series, the TSFM is expected to predict its subsequent segment without any auxiliary parameter adjustment or fitting. Architectural differences among TSFMs can be found in their approaches to tokenization, transformer configuration, and prediction heads. PatchTST [Nie+23] introduces the idea of a time series patch as the analogue of a token, uses a linear transformation of a patch as a replacement for the token embedding, and finally applies a standard transformer encoder architecture. TimesFM [Das+24] uses a residual block to embed time series patches, enabling learning of more complex representations, and applies a decoder-only architecture. Chronos [Ans+24] tokenizes individual data points via scaling and then applies the (encoder-decoder) T5 architecture [Raf+20], notably formulating forecasting as a classification problem; subsequent versions (Chronos-Bolt, Chronos-2 [Ans+25]) utilize patching and "meta features" before applying transformer layers, and Chronos-2 uses a T5 encoder.
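The patch-as-token idea shared by these architectures can be sketched in a few lines. The patch length, model width, and the plain linear projection below are illustrative choices, not the configuration of any of the cited models or of the Cisco model:

```python
import numpy as np

def patchify(series, patch_len):
    """Split a univariate series into non-overlapping patches (the TSFM
    analogue of tokens); trailing values that don't fill a patch are dropped."""
    n = len(series) // patch_len
    return series[: n * patch_len].reshape(n, patch_len)

def embed_patches(patches, W, b):
    """PatchTST-style embedding: a linear map of each patch replaces the
    token-embedding lookup used in language models."""
    return patches @ W + b

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8 * np.pi, 103))  # length not a multiple of 16
patches = patchify(series, patch_len=16)          # shape (6, 16)
W = rng.normal(size=(16, 32))                     # d_model = 32 (illustrative)
b = np.zeros(32)
tokens = embed_patches(patches, W, b)             # shape (6, 32)
```

The resulting `tokens` matrix plays the role of the embedded token sequence fed to the transformer; the architectural differences the paragraph describes (residual-block embeddings, decoder-only vs. encoder-decoder stacks, classification heads) all sit downstream of this step.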


Intervention and Conditioning in Causal Bayesian Networks

Neural Information Processing Systems

Causal models are crucial for understanding complex systems and identifying causal relationships among variables. Even though causal models are extremely popular, calculating the conditional probability of formulas involving interventions poses significant challenges. In the case of Causal Bayesian Networks (CBNs), Pearl assumes autonomy of the mechanisms that determine interventions in order to calculate a range of probabilities. We show that by making simple yet often realistic independence assumptions, it is possible to uniquely estimate the probability of an interventional formula (including the well-studied notions of probability of sufficiency and necessity). We discuss when these assumptions are appropriate. Importantly, in many cases of interest, when the assumptions are appropriate, these probability estimates can be evaluated from observational data, which carries immense significance in scenarios where conducting experiments is impractical or infeasible.
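As background, the best-known case where such quantities become uniquely estimable from observational data is Pearl's classical result under exogeneity and monotonicity; the sketch below implements that textbook case, which is not necessarily the specific set of independence assumptions this paper proposes:

```python
def prob_necessity(p_y_given_x, p_y_given_not_x):
    """Probability of necessity PN under Pearl's exogeneity and monotonicity
    assumptions, where interventional quantities reduce to observational
    conditionals: PN = (P(y|x) - P(y|x')) / P(y|x)."""
    return (p_y_given_x - p_y_given_not_x) / p_y_given_x

def prob_necessity_sufficiency(p_y_given_x, p_y_given_not_x):
    """PNS under the same assumptions: PNS = P(y|x) - P(y|x')."""
    return p_y_given_x - p_y_given_not_x

# Purely observational inputs suffice once the assumptions hold:
pn = prob_necessity(0.8, 0.2)                # ≈ 0.75
pns = prob_necessity_sufficiency(0.8, 0.2)   # ≈ 0.6
```

This is the pattern the abstract generalizes: identify assumptions under which interventional formulas collapse to expressions over observational conditionals, so no experiment is needed to evaluate them.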



We thank the reviewers for their kind comments, and for their consensus view that our approach of porting decision

Neural Information Processing Systems

We agree with the reviewer that human risk measures are not necessarily fair; whether their use leads to a "fairer" model remains an open question. We thank the reviewer for pointing out the work of Agarwal et al. In contrast to EHRM, the authors' method requires access to an explicit set of protected attributes during training. Our primary goal is to introduce CPT-inspired risk measures and study the consequences of their use within ML. EHRM is an interesting open problem.



SPEAR-MM: Selective Parameter Evaluation and Restoration via Model Merging for Efficient Financial LLM Adaptation

Kapusuzoglu, Berkcan, Chakraborty, Supriyo, Ni, Renkun, Rawls, Stephen, Sahu, Sambit

arXiv.org Artificial Intelligence

Large language models (LLMs) adapted to financial domains often suffer from catastrophic forgetting of the general reasoning capabilities essential for customer interactions and complex financial analysis. SPEAR-MM approximates each layer's impact on external benchmarks through post-hoc analysis, then selectively freezes or restores transformer layers via spherical-interpolation merging. Applied to LLaMA-3.1-8B for financial tasks, SPEAR-MM achieves 91.2% retention of general capabilities versus 69.7% for standard continual pretraining, while maintaining 94% of domain-adaptation gains. The approach provides interpretable trade-off control and reduces computational costs by 90%, which is crucial for resource-constrained financial institutions. Financial institutions increasingly require domain-specific language models that can understand regulatory documents, analyze market data, and provide accurate customer support while maintaining broad reasoning capabilities for complex financial scenarios.
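The abstract does not spell out the merging procedure, but spherical linear interpolation (slerp) between a base layer's weights and the fine-tuned layer's weights is the standard formulation of this kind of restore step. The sketch below treats a layer's weight tensor as a flat vector, with the coefficient t as the trade-off knob (t = 0 fully restores the base layer, t = 1 keeps the tuned one); the fallback and epsilon are illustrative details:

```python
import numpy as np

def slerp(w_base, w_tuned, t, eps=1e-8):
    """Spherical linear interpolation between two weight tensors,
    flattened to vectors and interpolated along the great circle
    between them rather than along the straight line."""
    a, b = w_base.ravel(), w_tuned.ravel()
    cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < eps:  # near-parallel weights: plain lerp is numerically safer
        merged = (1 - t) * a + t * b
    else:
        merged = (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
    return merged.reshape(w_base.shape)
```

Applied per layer, a selective policy would use t near 0 for layers whose post-hoc analysis shows high impact on general benchmarks (restore them) and t near 1 for layers that mostly carry domain-specific gains.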


Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token

Zychlinski, Shaked, Kainan, Yuval

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are susceptible to jailbreak attacks in which malicious prompts are disguised using ciphers and character-level encodings to bypass safety guardrails. While these guardrails often fail to interpret the encoded content, the underlying models can still process the harmful instructions. We introduce CPT-Filtering, a novel, model-agnostic guardrail technique with negligible cost and near-perfect accuracy that mitigates these attacks by leveraging the intrinsic behavior of Byte-Pair Encoding (BPE) tokenizers. Our method is based on the principle that tokenizers, trained on natural language, represent out-of-distribution text, such as ciphers, using a significantly larger number of shorter tokens. Our technique uses a simple yet powerful artifact of language-model use: the average number of Characters Per Token (CPT) in the text. This approach is motivated by the high compute cost of modern methods, which rely on added modules such as dedicated LLMs or perplexity models. We validate our approach on a dataset of over 100,000 prompts, testing numerous encoding schemes with several popular tokenizers. Our experiments demonstrate that a simple CPT threshold robustly identifies encoded text with high accuracy, even for very short inputs. CPT-Filtering provides a practical defense layer that can be immediately deployed for real-time text filtering and offline data curation.
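The core check reduces to a single ratio over the tokenizer's output. The sketch below uses hypothetical token counts and an illustrative threshold rather than a real BPE tokenizer or the paper's tuned values; a deployment would take counts from the serving model's own tokenizer:

```python
def chars_per_token(text, token_count):
    """Average Characters Per Token (CPT): natural language compresses well
    under a BPE tokenizer, while ciphered or encoded text fragments into
    many short tokens, driving CPT down."""
    return len(text) / max(token_count, 1)

def is_suspicious(text, token_count, threshold=3.0):
    """Flag text whose CPT falls below a threshold (threshold is illustrative)."""
    return chars_per_token(text, token_count) < threshold

# Hypothetical token counts for a BPE tokenizer trained on English:
plain = "Please summarize the quarterly report."  # e.g. 7 tokens
cipher = "Uryyb, jbeyq! Gryy zr n frperg."        # rot13; e.g. 15 short tokens

flag_plain = is_suspicious(plain, token_count=7)
flag_cipher = is_suspicious(cipher, token_count=15)
```

Because only a tokenizer pass and one division are needed, the check adds effectively no latency compared with guardrails that run a dedicated classifier LLM or perplexity model.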