Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
Hou, Xinyi, Zhao, Yanjie, Wang, Shenao, Wang, Haoyu
The Model Context Protocol (MCP) is a standardized interface designed to enable seamless interaction between AI models and external tools and resources, breaking down data silos and facilitating interoperability across diverse systems. This paper provides a comprehensive overview of MCP, focusing on its core components, workflow, and the lifecycle of MCP servers, which consists of three key phases: creation, operation, and update. We analyze the security and privacy risks associated with each phase and propose strategies to mitigate potential threats. The paper also examines the current MCP landscape, including its adoption by industry leaders and various use cases, as well as the tools and platforms supporting its integration. We explore future directions for MCP, highlighting the challenges and opportunities that will influence its adoption and evolution within the broader AI ecosystem. Finally, we offer recommendations for MCP stakeholders to ensure its secure and sustainable development as the AI landscape continues to evolve.
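For readers new to the protocol, the tool-exposure pattern MCP standardizes can be sketched with the FastMCP helper from the official Python SDK; the server name and the stubbed tool below are our own illustrative choices, not part of the paper.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# The server name and the example tool are hypothetical illustrations.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-weather")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a (stubbed) weather forecast for the given city."""
    # A real server would call an external API here; the stub keeps
    # the example self-contained.
    return f"Forecast for {city}: sunny, 22 C"

if __name__ == "__main__":
    # Serve over stdio so an MCP host (e.g., an LLM client) can connect.
    mcp.run()
```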
A Novel Cholesky Kernel based Support Vector Classifier
Sahoo, Satyajeet, Maiti, Jhareswar
Support Vector Machine (SVM) is a popular supervised classification model that works by first finding the margin boundaries for the training data classes and then calculating the decision boundary, which is subsequently used to classify the test data. This study demonstrates the limitations of traditional support vector classification, which uses Cartesian coordinate geometry to find the margin and decision boundaries in the input space using only a few support vectors, without considering the variance and correlation of the data. The study then proposes a new Cholesky kernel that adjusts for the variance-covariance structure of the data in the decision boundary equation and margin calculations. It demonstrates that the SVM model is valid only in Euclidean space, and that the Cholesky kernel obtained by decomposing the covariance matrix acts as a transformation matrix which, when applied to the original data, maps the data from the input space into Euclidean space. The effectiveness of the Cholesky kernel based SVM classifier is demonstrated by classifying the Wisconsin Breast Cancer (Diagnostic) dataset and comparing against traditional SVM approaches. The Cholesky kernel based SVM model shows marked improvement in precision, recall, and F1 scores compared to linear and other kernel SVMs.
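A minimal sketch of the core idea as we read it from this abstract: take the Cholesky factor $L$ of the sample covariance ($\Sigma = LL^\top$), use $L^{-1}$ to map the data into Euclidean (whitened) space, and fit a standard linear SVM there. The code uses the same Wisconsin Breast Cancer dataset via scikit-learn; the regularization constant and variable names are our assumptions, not the authors' implementation.

```python
# Sketch: whiten the data with the inverse Cholesky factor of the sample
# covariance, then fit a standard linear SVM in the transformed space.
# This is our reading of the abstract, not the authors' reference code.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Cholesky factor of the (lightly regularized) training covariance:
# Sigma = L @ L.T
Sigma = np.cov(X_tr, rowvar=False) + 1e-6 * np.eye(X_tr.shape[1])
L = np.linalg.cholesky(Sigma)

# Applying L^{-1} transforms the data so its covariance becomes identity.
Z_tr = np.linalg.solve(L, X_tr.T).T
Z_te = np.linalg.solve(L, X_te.T).T

clf = SVC(kernel="linear").fit(Z_tr, y_tr)
print("F1 after Cholesky transform:", f1_score(y_te, clf.predict(Z_te)))
```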
Semiparametric Counterfactual Regression
We study counterfactual regression, which aims to map input features to outcomes under hypothetical scenarios that differ from those observed in the data. This is particularly useful for decision-making when adapting to sudden shifts in treatment patterns is essential. We propose a doubly robust-style estimator for counterfactual regression within a generalizable framework that accommodates a broad class of risk functions and flexible constraints, drawing on tools from semiparametric theory and stochastic optimization. Our approach uses incremental interventions to enhance adaptability while maintaining consistency with standard methods. We formulate the target estimand as the optimal solution to a stochastic optimization problem and develop an efficient estimation strategy that can leverage the rapid development of modern optimization algorithms. We then analyze the rates of convergence and characterize the asymptotic distributions. Our analysis shows that the proposed estimators can achieve $\sqrt{n}$-consistency and asymptotic normality for a broad class of problems. Numerical illustrations highlight their effectiveness in adapting to unseen counterfactual scenarios while maintaining parametric convergence rates.
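In symbols, the stochastic-optimization formulation of the target estimand can be summarized as below; the notation ($Q$ for an incremental intervention, $Y^Q$ for the counterfactual outcome, $\ell$ for a risk function in the accommodated class) is our shorthand rather than the paper's exact notation.

```latex
% Our shorthand for the abstract's formulation (not the paper's exact
% notation): Q is an incremental intervention, Y^Q the counterfactual
% outcome, and \ell a member of the accommodated class of risk functions.
\theta^{*} \;\in\; \operatorname*{arg\,min}_{\theta \in \Theta}\;
  \mathbb{E}\!\left[\, \ell\!\left( Y^{Q},\, f_{\theta}(X) \right) \right]
```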
A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design
Tian, Jie, Sobczak, Martin Taylor, Patil, Dhanush, Hou, Jixin, Pang, Lin, Ramanathan, Arunachalam, Yang, Libin, Chen, Xianyan, Golan, Yuval, Zhai, Xiaoming, Sun, Hongyue, Song, Kenan, Wang, Xianqiao
Metamaterials, renowned for their exceptional mechanical, electromagnetic, and thermal properties, hold transformative potential across diverse applications, yet their design remains constrained by labor-intensive trial-and-error methods and limited data interoperability. Here, we introduce CrossMatAgent, a novel multi-agent framework that synergistically integrates large language models with state-of-the-art generative AI to revolutionize metamaterial design. By orchestrating a hierarchical team of agents, each specializing in tasks such as pattern analysis, architectural synthesis, prompt engineering, and supervisory feedback, our system leverages the multimodal reasoning of GPT-4o alongside the generative precision of DALL-E 3 and a fine-tuned Stable Diffusion XL model. This integrated approach automates data augmentation, enhances design fidelity, and produces simulation- and 3D-printing-ready metamaterial patterns. Comprehensive evaluations, including Contrastive Language-Image Pre-training (CLIP) based alignment, SHAP (SHapley Additive exPlanations) interpretability analyses, and mechanical simulations under varied load conditions, demonstrate the framework's ability to generate diverse, reproducible, and application-ready designs. CrossMatAgent thus establishes a scalable, AI-driven paradigm that bridges the gap between conceptual innovation and practical realization, paving the way for accelerated metamaterial development.
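The orchestration pattern described above can be pictured as a supervisor-gated loop over specialized agents. Everything in the sketch below (agent roles, function names, the acceptance check, and the stubbed model calls) is a hypothetical illustration of such an architecture, not the authors' implementation.

```python
# Hypothetical sketch of a hierarchical multi-agent loop in the style
# described above: analyst -> prompt engineer -> generator -> supervisor.
# All roles and function names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Design:
    prompt: str
    image: bytes | None = None
    approved: bool = False

def pattern_analyst(spec: str) -> str:
    # Would call a multimodal LLM (e.g., GPT-4o) to extract geometric motifs.
    return f"lattice motifs for: {spec}"

def prompt_engineer(motifs: str) -> str:
    # Would turn the analysis into a generation prompt.
    return f"metamaterial pattern, {motifs}, printable, top-down view"

def generator(prompt: str) -> bytes:
    # Would call a text-to-image model (e.g., DALL-E 3 or Stable Diffusion XL).
    return b"<image bytes>"

def supervisor(design: Design) -> bool:
    # Would score CLIP alignment / run simulations and accept or reject.
    return True

def design_loop(spec: str, max_rounds: int = 3) -> Design:
    design = Design(prompt="")
    for _ in range(max_rounds):
        design.prompt = prompt_engineer(pattern_analyst(spec))
        design.image = generator(design.prompt)
        design.approved = supervisor(design)
        if design.approved:
            break
    return design

print(design_loop("auxetic honeycomb, high energy absorption").approved)
```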
High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise
Fang, Yuchen, Lavaei, Javad, Na, Sen
In this paper, we consider nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Stochastic Sequential Quadratic Programming (TR-SSQP) method and establish its high-probability iteration complexity bounds for identifying first- and second-order $\epsilon$-stationary points. In our algorithm, we assume that exact objective values, gradients, and Hessians are not directly accessible but can be estimated via zeroth-, first-, and second-order probabilistic oracles. Compared to existing complexity studies of SSQP methods that rely on a zeroth-order oracle with sub-exponential tail noise (i.e., light-tailed) and focus mostly on first-order stationarity, our analysis accommodates irreducible and heavy-tailed noise in the zeroth-order oracle and significantly extends the analysis to second-order stationarity. We show that under heavy-tailed noise conditions, our SSQP method achieves the same high-probability first-order iteration complexity bounds as in the light-tailed noise setting, while further exhibiting promising second-order iteration complexity bounds. Specifically, the method identifies a first-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-2})$ iterations and a second-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-3})$ iterations with high probability, provided that $\epsilon$ is lower bounded by a constant determined by the irreducible noise level in estimation. We validate our theoretical findings and evaluate the practical performance of our method on the CUTEst benchmark test set.
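As a didactic illustration of the trust-region SQP template that the method builds on, the toy sketch below minimizes a noisy quadratic subject to one linear equality constraint. The KKT-system subproblem solve, the crude step-clipping safeguard, the $\ell_2$ merit function, and the noise model are our simplifications; they stand in for, but do not reproduce, the paper's TR-SSQP algorithm and probabilistic oracles.

```python
# Toy illustration of a trust-region SQP step with noisy oracles:
# minimize f(x) = ||x||^2 subject to x1 + x2 = 1 (solution: (0.5, 0.5)).
# A didactic sketch of the generic TR-SQP template, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1e-3  # oracle noise level (mimics stochastic estimates)

f = lambda x: x @ x + sigma * rng.standard_normal()       # noisy objective
g = lambda x: 2 * x + sigma * rng.standard_normal(2)      # noisy gradient
H = lambda x: 2 * np.eye(2)                               # Hessian estimate
c = lambda x: np.array([x[0] + x[1] - 1.0])               # constraint value
J = lambda x: np.array([[1.0, 1.0]])                      # constraint Jacobian

x, Delta = np.array([2.0, -1.0]), 1.0
for _ in range(20):
    # Solve the equality-constrained QP subproblem via its KKT system:
    # [H J^T; J 0] [d; lam] = [-g; -c]
    Hk, Jk, gk, ck = H(x), J(x), g(x), c(x)
    K = np.block([[Hk, Jk.T], [Jk, np.zeros((1, 1))]])
    d = np.linalg.solve(K, np.concatenate([-gk, -ck]))[:2]
    if np.linalg.norm(d) > Delta:       # crude trust-region safeguard
        d *= Delta / np.linalg.norm(d)
    # Accept the step if an l2 merit function decreases; adjust the radius.
    merit = lambda z: f(z) + 10.0 * np.linalg.norm(c(z))
    if merit(x + d) < merit(x):
        x, Delta = x + d, min(2 * Delta, 10.0)
    else:
        Delta *= 0.5
print("approx. solution:", x)  # expected near (0.5, 0.5)
```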
Distortion Bounds of Subdivision Models for SO(3)
In the subdivision approach to robot path planning, we need to subdivide the configuration space of a robot into nice cells to perform various computations. For a rigid spatial robot, this configuration space is $SE(3)=\mathbb{R}^3\times SO(3)$. The subdivision of $\mathbb{R}^3$ is standard, but so far there are no global subdivision schemes for $SO(3)$. We recently introduced a representation for $SO(3)$ suitable for subdivision. This paper investigates the distortion of the natural metric on $SO(3)$ caused by our representation. The proper framework for this study lies in the Riemannian geometry of $SO(3)$, enabling us to obtain sharp distortion bounds.
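For reference, "distortion" can be read in the usual bi-Lipschitz sense; the map $\rho$ and constant $C \ge 1$ below are our generic notation for a representation map and its distortion constant, not the paper's exact statement.

```latex
% Generic bi-Lipschitz reading of "distortion" for a representation map
% \rho from SO(3) into a model space X (C >= 1 is our notation for the
% distortion constant; the paper derives sharp bounds of this kind).
\frac{1}{C}\, d_{SO(3)}(p,q)
  \;\le\; d_{X}\!\left(\rho(p),\rho(q)\right)
  \;\le\; C\, d_{SO(3)}(p,q)
\qquad \text{for all } p, q \in SO(3)
```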
Interval-Valued Time Series Classification Using $D_K$-Distance
In recent years, modeling and analysis of interval-valued time series have garnered increasing attention in econometrics, finance, and statistics. However, these studies have predominantly focused on statistical inference for forecasting univariate and multivariate interval-valued time series, overlooking another important aspect: classification. In this paper, we introduce a classification approach that treats intervals as unified entities and applies to both univariate and multivariate interval-valued time series. Specifically, we first extend point-valued time series imaging methods to interval-valued scenarios using the $D_K$-distance, enabling the imaging of interval-valued time series. We then employ a suitable deep learning model to classify the resulting image dataset, thereby achieving classification of interval-valued time series. On the theoretical side, we derive a sharper excess risk bound for deep multiclass classifiers based on offset Rademacher complexity. Finally, we validate the superiority of the proposed method through comparisons with various existing point-valued time series classification methods in both simulation studies and real-data applications.
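The imaging step can be pictured as filling a matrix with pairwise distances between time points. In the sketch below, `d_K` is a deliberately simple placeholder (Euclidean distance on interval endpoints), not the paper's $D_K$-distance, and the toy series is our own; the resulting matrix would then be fed to a deep classifier.

```python
# Sketch: turn an interval-valued time series into an "image" by filling
# a matrix with pairwise distances between time points.
# d_K below is a simple placeholder, NOT the paper's D_K-distance.
import numpy as np

def d_K(a: np.ndarray, b: np.ndarray) -> float:
    """Placeholder interval distance: L2 on (lower, upper) endpoints."""
    return float(np.linalg.norm(a - b))

def interval_series_to_image(series: np.ndarray) -> np.ndarray:
    """series: (T, 2) array of (lower, upper) intervals -> (T, T) image."""
    T = len(series)
    img = np.zeros((T, T))
    for i in range(T):
        for j in range(T):
            img[i, j] = d_K(series[i], series[j])
    return img

# Example: a short synthetic interval-valued series.
rng = np.random.default_rng(0)
lows = np.cumsum(rng.standard_normal(16))
series = np.stack([lows, lows + rng.uniform(0.1, 1.0, 16)], axis=1)
image = interval_series_to_image(series)  # would feed a CNN classifier
print(image.shape)  # (16, 16)
```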
Scalable Approximate Algorithms for Optimal Transport Linear Models
Kacprzak, Tomasz, Kamper, Francois, Heiss, Michael W., Janka, Gianluca, Dillner, Ann M., Takahama, Satoshi
Recently, linear regression models incorporating an optimal transport (OT) loss have been explored for applications such as supervised unmixing of spectra, music transcription, and mass spectrometry. However, these task-specific approaches often do not generalize readily to a broader class of linear models. In this work, we propose a novel algorithmic framework for solving a general class of non-negative linear regression models with an entropy-regularized OT datafit term, based on Sinkhorn-like scaling iterations. Our framework accommodates convex penalty functions on the weights (e.g., squared $\ell_2$ and $\ell_1$ norms) and admits additional convex loss terms between the transported marginal and the target distribution (e.g., squared error or total variation). We derive simple multiplicative updates for common penalty and datafit terms. This method is suitable for large-scale problems due to its simplicity of implementation and straightforward parallelization.
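For readers unfamiliar with Sinkhorn-like scaling, the classical iteration underlying such schemes alternates two multiplicative updates. The sketch below implements plain balanced entropy-regularized OT between two histograms; it is the standard building block the abstract alludes to, not the paper's regression solver, and the grid, bandwidths, and iteration count are arbitrary choices.

```python
# Classical Sinkhorn scaling for entropy-regularized OT between two
# histograms a and b with cost matrix C. This is the standard building
# block the abstract alludes to, not the paper's full regression solver.
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # multiplicative update for v
        u = a / (K @ v)              # multiplicative update for u
    P = u[:, None] * K * v[None, :]  # entropic transport plan
    return P, np.sum(P * C)          # plan and its (linear) transport cost

# Example: transport between two Gaussian-like histograms on a 1-D grid.
x = np.linspace(0, 1, 50)
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P, cost = sinkhorn(a, b, C)
print(f"approx OT cost: {cost:.4f}")
```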
Better Rates for Random Task Orderings in Continual Linear Models
Evron, Itay, Levinstein, Ran, Schliserman, Matan, Sherman, Uri, Koren, Tomer, Soudry, Daniel, Srebro, Nathan
We study the common continual learning setup where an overparameterized model is sequentially fitted to a set of jointly realizable tasks. We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations. For linear models, we prove that fitting a task is equivalent to a single stochastic gradient descent (SGD) step on a modified objective. We develop novel last-iterate SGD upper bounds in the realizable least squares setup, and apply them to derive new results for continual learning. Focusing on random orderings over $T$ tasks, we establish universal forgetting rates, whereas existing rates depend on the problem dimensionality or complexity. Specifically, in continual regression with replacement, we improve the best existing rate from $O((d-r)/k)$ to $O(\min(k^{-1/4}, \sqrt{d-r}/k, \sqrt{Tr}/k))$, where $d$ is the dimensionality and $r$ the average task rank. Furthermore, we establish the first rates for random task orderings without replacement. The obtained rate of $O(\min(T^{-1/4}, (d-r)/T))$ proves for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences. Finally, we prove a similar $O(k^{-1/4})$ universal rate for the forgetting in continual linear classification on separable data. Our universal rates apply for broader projection methods, such as block Kaczmarz and POCS, illuminating their loss convergence under i.i.d. and one-pass orderings.
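The "fitting a task equals one projection" view can be demonstrated directly: for jointly realizable linear tasks ($A_t w^\star = b_t$ for all $t$), fitting task $t$ projects the iterate onto that task's solution set, exactly as in block Kaczmarz. The toy experiment below, with dimensions and data of our own choosing, tracks the average loss over all tasks under a random ordering with replacement.

```python
# Toy demo: continual linear regression as block-Kaczmarz projections.
# Tasks are jointly realizable (a shared w_star solves all of them);
# fitting task t projects w onto {w : A_t w = b_t} via the pseudoinverse.
# Dimensions and data here are our own illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
d, T, r = 50, 20, 3                      # dimension, #tasks, task rank
w_star = rng.standard_normal(d)
tasks = [rng.standard_normal((r, d)) for _ in range(T)]
targets = [A @ w_star for A in tasks]    # joint realizability

w = np.zeros(d)
for k in range(1, 201):
    t = rng.integers(T)                  # random ordering with replacement
    A, b = tasks[t], targets[t]
    w = w - np.linalg.pinv(A) @ (A @ w - b)  # project onto task t's solutions
    if k % 50 == 0:
        avg_loss = np.mean([np.sum((A @ w - b) ** 2)
                            for A, b in zip(tasks, targets)])
        print(f"iter {k:3d}: avg loss over all tasks = {avg_loss:.3e}")
```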
New Intent Discovery with Pre-training and Contrastive Learning
Zhang, Yuwei, Zhang, Haode, Zhan, Li-Ming, Lam, Albert Y. S., Wu, Xiao-Ming
New intent discovery aims to uncover novel intent categories from user utterances to expand the set of supported intent classes. It is a critical task for the development and service expansion of a practical dialogue system. Despite its importance, this problem remains under-explored in the literature. Existing approaches typically rely on a large number of labeled utterances and employ pseudo-labeling methods for representation learning and clustering, which are label-intensive, inefficient, and inaccurate. In this paper, we provide new solutions to two important research questions for new intent discovery: (1) how to learn semantic utterance representations, and (2) how to better cluster utterances. In particular, we first propose a multi-task pre-training strategy to leverage rich unlabeled data along with external labeled data for representation learning. Then, we design a new contrastive loss to exploit self-supervisory signals in unlabeled data for clustering. Extensive experiments on three intent recognition benchmarks demonstrate the high effectiveness of our proposed method, which outperforms state-of-the-art methods by a large margin in both unsupervised and semi-supervised scenarios. The source code will be available at https://github.com/zhang-yu-wei/MTP-CLNN.
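As background, contrastive objectives of this kind are often instantiated as an NT-Xent-style loss over two views of each utterance. The PyTorch sketch below shows that generic form only; the paper's proposed loss, which exploits self-supervisory signals for clustering, differs in its construction of positives.

```python
# Generic NT-Xent-style contrastive loss for paired utterance embeddings.
# This is the standard formulation, not the paper's proposed loss.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    """z1, z2: (N, D) embeddings of two views of the same N utterances."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, D), unit norm
    sim = z @ z.T / tau                           # scaled cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float("-inf"))         # exclude self-pairs
    # The positive for row i is its other view at index (i + n) mod 2n.
    targets = torch.arange(2 * n).roll(n)
    return F.cross_entropy(sim, targets)

# Example with random embeddings.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())
```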