Goto

Collaborating Authors

 inflection point


Speculative Sampling for Parametric Temporal Point Processes

Biloš, Marin, Schneider, Anderson, Nevmyvaka, Yuriy

arXiv.org Artificial Intelligence

Temporal point processes are powerful generative models for event sequences that capture complex dependencies in time-series data. They are commonly specified using autoregressive models that learn the distribution of the next event from the previous events. This makes sampling inherently sequential, limiting efficiency. In this paper, we propose a novel algorithm based on rejection sampling that enables exact sampling of multiple future values from existing TPP models, in parallel, and without requiring any architectural changes or retraining. Besides theoretical guarantees, our method demonstrates empirical speedups on real-world datasets, bridging the gap between expressive modeling and efficient parallel generation for large-scale TPP applications.


The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas

Xu, Baixuan, Zheng, Tianshi, Wang, Zhaowei, Tsang, Hong Ting, Wang, Weiqi, Fang, Tianqing, Song, Yangqiu

arXiv.org Artificial Intelligence

Enabling LLMs to effectively operate long-horizon task which requires long-term planning and multiple interactions is essential for open-world autonomy. Conventional methods adopt planning with actions where a executable action list would be provided as reference. However, this action representation choice would be impractical when the environment action space is combinatorial exploded (e.g., open-ended real world). This naturally leads to a question: As environmental action space scales, what is the optimal action representation for long-horizon agents? In this paper, we systematically study the effectiveness of two different action representations. The first one is conventional planning with actions (PwA) which is predominantly adopted for its effectiveness on existing benchmarks. The other one is planning with schemas (PwS) which instantiate an action schema into action lists (e.g., "move [OBJ] to [OBJ]" -> "move apple to desk") to ensure concise action space and reliable scalability. This alternative is motivated by its alignment with human cognition and its compliance with environment-imposed action format restriction. We propose cognitive bandwidth perspective as a conceptual framework to qualitatively understand the differences between these two action representations and empirically observe a representation-choice inflection point between ALFWorld (~35 actions) and SciWorld (~500 actions), which serve as evidence of the need for scalable representations. We further conduct controlled experiments to study how the location of this inflection point interacts with different model capacities: stronger planning proficiency shifts the inflection rightward, whereas better schema instantiation shifts it leftward. Finally, noting the suboptimal performance of PwS agents, we provide an actionable guide for building more capable PwS agents for better scalable autonomy.


ChronoForge-RL: Chronological Forging through Reinforcement Learning for Enhanced Video Understanding

Chen, Kehua

arXiv.org Artificial Intelligence

In this paper, we propose a novel video understanding framework, called ChronoForge-RL, which combines Temporal Apex Distillation (T AD) and KeyFrame-aware Group Relative Policy Optimization (KF-GRPO) to tackle these issues. Concretely, we introduce a differentiable keyframe selection mechanism that systematically identifies semantic inflection points through a three-stage process to enhance computational efficiency while preserving temporal information. Then, two particular modules are proposed to enable effective temporal reasoning: Firstly, T AD leverages variation scoring, inflection detection, and prioritized distillation to select the most informative frames. Secondly, we introduce KF-GRPO which implements a contrastive learning paradigm with a saliency-enhanced reward mechanism that explicitly incentivizes models to leverage both frame content and temporal relationships. Finally, our proposed ChronoForge-RL achieves 69.1% on VideoMME and 52.7% on L VBench compared to baseline methods, clearly surpassing previous approaches while enabling our 7B parameter model to achieve performance comparable to 72B parameter alternatives, a 10 improvement in performance-to-parameter ratio.


User Behavior Prediction as a Generic, Robust, Scalable, and Low-Cost Evaluation Strategy for Estimating Generalization in LLMs

Saha, Sougata, Choudhury, Monojit

arXiv.org Artificial Intelligence

Measuring the generalization ability of Large Language Models (LLMs) is challenging due to data contamination. As models grow and computation becomes cheaper, ensuring tasks and test cases are unseen during training phases will become nearly impossible. We argue that knowledge-retrieval and reasoning tasks are not ideal for measuring generalization, as LLMs are not trained for specific tasks. Instead, we propose user behavior prediction, also a key aspect of personalization, as a theoretically sound, scalable, and robust alternative. We introduce a novel framework for this approach and test it on movie and music recommendation datasets for GPT-4o, GPT-4o-mini, and Llama-3.1-8B-Instruct. Results align with our framework's predictions, showing GPT-4o outperforms GPT-4o-mini and Llama, though all models have much room for improvement, especially Llama.


New Evidence of the Two-Phase Learning Dynamics of Neural Networks

Zhou, Zhanpeng, Yang, Yongyi, Sugiyama, Mahito, Yan, Junchi

arXiv.org Artificial Intelligence

Understanding how deep neural networks learn remains a fundamental challenge in modern machine learning. A growing body of evidence suggests that training dynamics undergo a distinct phase transition, yet our understanding of this transition is still incomplete. In this paper, we introduce an interval-wise perspective that compares network states across a time window, revealing two new phenomena that illuminate the two-phase nature of deep learning. i) \textbf{The Chaos Effect.} By injecting an imperceptibly small parameter perturbation at various stages, we show that the response of the network to the perturbation exhibits a transition from chaotic to stable, suggesting there is an early critical period where the network is highly sensitive to initial conditions; ii) \textbf{The Cone Effect.} Tracking the evolution of the empirical Neural Tangent Kernel (eNTK), we find that after this transition point the model's functional trajectory is confined to a narrow cone-shaped subset: while the kernel continues to change, it gets trapped into a tight angular region. Together, these effects provide a structural, dynamical view of how deep networks transition from sensitive exploration to stable refinement during training.


Two-parameter superposable S-curves

S, Vijay Prakash

arXiv.org Artificial Intelligence

Straight line equation $y=mx$ with slope $m$, when singularly perturbed as $ay^3+y=mx$ with a positive parameter $a$, results in S-shaped curves or S-curves on a real plane. As $a\rightarrow 0$, we get back $y=mx$ which is a cumulative distribution function of a continuous uniform distribution that describes the occurrence of every event in an interval to be equally probable. As $a\rightarrow\infty$, the derivative of $y$ has finite support only at $y=0$ resembling a degenerate distribution. Based on these arguments, in this work, we propose that these S-curves can represent maximum entropy uniform distribution to a zero entropy single value. We also argue that these S-curves are superposable as they are only parametrically nonlinear but fundamentally linear. So far, the superposed forms have been used to capture the patterns of natural systems such as nonlinear dynamics of biological growth and kinetics of enzyme reactions. Here, we attempt to use the S-curve and its superposed form as statistical models. We fit the models on a classical dataset containing flower measurements of iris plants and analyze their usefulness in pattern recognition. Based on these models, we claim that any non-uniform pattern can be represented as a singular perturbation to uniform distribution. However, our parametric estimation procedure have some limitations such as sensitivity to initial conditions depending on the data at hand.


No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs

Kacprzyk, Krzysztof, van der Schaar, Mihaela

arXiv.org Artificial Intelligence

Data-driven modeling of dynamical systems is a crucial area of machine learning. In many scenarios, a thorough understanding of the model's behavior becomes essential for practical applications. For instance, understanding the behavior of a pharmacokinetic model, constructed as part of drug development, may allow us to both verify its biological plausibility (e.g., the drug concentration curve is non-negative and decays to zero) and to design dosing guidelines. Discovery of closed-form ordinary differential equations (ODEs) can be employed to obtain such insights by finding a compact mathematical equation and then analyzing it (a two-step approach). However, its widespread use is currently hindered because the analysis process may be time-consuming, requiring substantial mathematical expertise, or even impossible if the equation is too complex. Moreover, if the found equation's behavior does not satisfy the requirements, editing it or influencing the discovery algorithms to rectify it is challenging as the link between the symbolic form of an ODE and its behavior can be elusive. This paper proposes a conceptual shift to modeling low-dimensional dynamical systems by departing from the traditional two-step modeling process. Instead of first discovering a closed-form equation and then analyzing it, our approach, direct semantic modeling, predicts the semantic representation of the dynamical system (i.e., description of its behavior) directly from data, bypassing the need for complex post-hoc analysis. This direct approach also allows the incorporation of intuitive inductive biases into the optimization algorithm and editing the model's behavior directly, ensuring that the model meets the desired specifications. Our approach not only simplifies the modeling pipeline but also enhances the transparency and flexibility of the resulting models compared to traditional closed-form ODEs.


Steering Flexible Linear Objects in Planar Environments by Two Robot Hands Using Euler's Elastica Solutions

Levin, Aharon, Rimon, Elon, Shapiro, Amir

arXiv.org Artificial Intelligence

The manipulation of flexible objects such as cables, wires and fresh food items by robot hands forms a special challenge in robot grasp mechanics. This paper considers the steering of flexible linear objects in planar environments by two robot hands. The flexible linear object, modeled as an elastic non-stretchable rod, is manipulated by varying the gripping endpoint positions while keeping equal endpoint tangents. The flexible linear object shape has a closed form solution in terms of the grasp endpoint positions and tangents, called Euler's elastica. This paper obtains the elastica solutions under the optimal control framework, then uses the elastica solutions to obtain closed-form criteria for non self-intersection, stability and obstacle avoidance of the flexible linear object. The new tools are incorporated into a planning scheme for steering flexible linear objects in planar environments populated by sparsely spaced obstacles. The scheme is fully implemented and demonstrated with detailed examples.


AI isn't about unleashing our imaginations, it's about outsourcing them. The real purpose is profit

The Guardian

Back in 2022, when ChatGPT arrived, I was part of the first wave of users. Delighted but also a little uncertain what to do with it, I asked the system to generate all kinds of random things. The quality of what it produced was variable, but it made clear something that is even more apparent now than it was then. Instead its arrival is an inflection point in human history. Over coming years and decades, AI will transform every aspect of our lives.


Pareto Data Framework: Steps Towards Resource-Efficient Decision Making Using Minimum Viable Data (MVD)

Ahmed, Tashfain, Siegel, Josh

arXiv.org Artificial Intelligence

This paper introduces the Pareto Data Framework, an approach for identifying and selecting the Minimum Viable Data (MVD) required for enabling machine learning applications on constrained platforms such as embedded systems, mobile devices, and Internet of Things (IoT) devices. We demonstrate that strategic data reduction can maintain high performance while significantly reducing bandwidth, energy, computation, and storage costs. The framework identifies Minimum Viable Data (MVD) to optimize efficiency across resource-constrained environments without sacrificing performance. It addresses common inefficient practices in an IoT application such as overprovisioning of sensors and overprecision, and oversampling of signals, proposing scalable solutions for optimal sensor selection, signal extraction and transmission, and data representation. An experimental methodology demonstrates effective acoustic data characterization after downsampling, quantization, and truncation to simulate reduced-fidelity sensors and network and storage constraints; results shows that performance can be maintained up to 95\% with sample rates reduced by 75\% and bit depths and clip length reduced by 50\% which translates into substantial cost and resource reduction. These findings have implications on the design and development of constrained systems. The paper also discusses broader implications of the framework, including the potential to democratize advanced AI technologies across IoT applications and sectors such as agriculture, transportation, and manufacturing to improve access and multiply the benefits of data-driven insights.