AITopics | slm

Functional constrained optimization (FCO) has emerged as a powerful tool for solving various machine learning problems. However, with the rapid increase in applications of neural networks in recent years, it has become apparent that both the objective and constraints often involve nonconvex functions, which poses significant challenges in obtaining high-quality solutions. In this work, we focus on a class of nonconvex FCO problems with nonconvex constraints, where the two optimization variables are nonlinearly coupled in the inequality constraint. Leveraging the primal-dual optimization framework, we propose a smoothed first-order Lagrangian method (SLM) for solving this class of problems. We establish the theoretical convergence guarantees of SLM to the Karush-Kuhn-Tucker (KKT) solutions through quantifying dual error bounds. By establishing connections between this structured FCO and equilibrium-constrained nonconvex problems (also known as bilevel optimization), we apply the proposed SLM to tackle bilevel optimization oriented problems where the lower-level problem is nonconvex. Numerical results obtained from both toy examples and hyper-data cleaning problems demonstrate the superiority of SLM compared to benchmark methods.

artificial intelligence, machine learning, optimization, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

fe90657b12193c7b52a3418bdc351807-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 04:01:49 GMT

artificial intelligence, machine learning, optimization, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

486ff0b164cf92b0255fe39863bcf99e-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 17:34:45 GMT

bidirectional context, slm, transcormer, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

Transcormer: TransformerforSentenceScoringwith SlidingLanguageModeling

Neural Information Processing SystemsFeb-8-2026, 17:34:41 GMT

Sentence scoring aims at measuring the likelihood score of a sentence and is widely usedinnatural language processing scenarios, likereranking, which isto select the best sentence from multiple candidates.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

2c5201a7391fedbc40c3cc6aa057a029-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 22:44:48 GMT

Commonly used classification algorithms in machine learning, such as support vector machines, minimize a convex surrogate loss on training examples. In practice, these algorithms are surprisingly robusttoerrors inthe training data.

artificial intelligence, inductive learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Add feedback

What are small language models and how do they differ from large ones?

AIHubJan-6-2026, 11:27:15 GMT

What are small language models and how do they differ from large ones? Microsoft recently released its latest small language model that can operate directly on the user's computer. If you haven't followed the AI industry closely, you might be asking: what exactly a small language model (SLM)? As AI becomes increasingly central to how we work, learn and solve problems, understanding the different types of AI models has never been more important. Large language models (LLMs) such as ChatGPT, Claude, Gemini and others are in widespread use.

language model, small language model, university, (10 more...)

AIHub

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Asia > India (0.05)

Industry: Transportation (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

SLM: A Smoothed First-Order Lagrangian Method for Structured Constrained Nonconvex Optimization

Neural Information Processing SystemsDec-27-2025, 07:29:30 GMT

Functional constrained optimization (FCO) has emerged as a powerful tool for solving various machine learning problems. However, with the rapid increase in applications of neural networks in recent years, it has become apparent that both the objective and constraints often involve nonconvex functions, which poses significant challenges in obtaining high-quality solutions. In this work, we focus on a class of nonconvex FCO problems with nonconvex constraints, where the two optimization variables are nonlinearly coupled in the inequality constraint. Leveraging the primal-dual optimization framework, we propose a smoothed first-order Lagrangian method (SLM) for solving this class of problems. We establish the theoretical convergence guarantees of SLM to the Karush-Kuhn-Tucker (KKT) solutions through quantifying dual error bounds. By establishing connections between this structured FCO and equilibrium-constrained nonconvex problems (also known as bilevel optimization), we apply the proposed SLM to tackle bilevel optimization oriented problems where the lower-level problem is nonconvex. Numerical results obtained from both toy examples and hyper-data cleaning problems demonstrate the superiority of SLM compared to benchmark methods.

name change, smoothed first-order lagrangian method, structured constrained nonconvex optimization, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.98)

Add feedback

RaX-Crash: A Resource Efficient and Explainable Small Model Pipeline with an Application to City Scale Injury Severity Prediction

Zhu, Di, Xie, Chen, Wang, Ziwei, Zhang, Haoyun

arXiv.org Artificial IntelligenceDec-10-2025

New York City reports over one hundred thousand motor vehicle collisions each year, creating substantial injury and public health burden. We present RaX-Crash, a resource efficient and explainable small model pipeline for structured injury severity prediction on the official NYC Motor Vehicle Collisions dataset. RaX-Crash integrates three linked tables with tens of millions of records, builds a unified feature schema in partitioned storage, and trains compact tree based ensembles (Random Forest and XGBoost) on engineered tabular features, which are compared against locally deployed small language models (SLMs) prompted with textual summaries. On a temporally held out test set, XGBoost and Random Forest achieve accuracies of 0.7828 and 0.7794, clearly outperforming SLMs (0.594 and 0.496); class imbalance analysis shows that simple class weighting improves fatal recall with modest accuracy trade offs, and SHAP attribution highlights human vulnerability factors, timing, and location as dominant drivers of predicted severity. Overall, RaX-Crash indicates that interpretable small model ensembles remain strong baselines for city scale injury analytics, while hybrid pipelines that pair tabular predictors with SLM generated narratives improve communication without sacrificing scalability.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2512.07848

Country: