AITopics

2501.10555

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
Asia > Macao (0.04)
(12 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(8 more...)

arXiv.org Machine LearningJan-17-2025

A Family of Controllable Momentum Coefficients for Forward-Backward Accelerated Algorithms

Fu, Mingwei, Shi, Bin

Nesterov's accelerated gradient method (NAG) marks a pivotal advancement in gradient-based optimization, achieving faster convergence compared to the vanilla gradient descent method for convex functions. However, its algorithmic complexity when applied to strongly convex functions remains unknown, as noted in the comprehensive review by Chambolle and Pock [2016]. This issue, aside from the critical step size, was addressed by Li et al. [2024b], with the monotonic case further explored by Fu and Shi [2024]. In this paper, we introduce a family of controllable momentum coefficients for forward-backward accelerated methods, focusing on the critical step size $s=1/L$. Unlike traditional linear forms, the proposed momentum coefficients follow an $\alpha$-th power structure, where the parameter $r$ is adaptively tuned to $\alpha$. Using a Lyapunov function specifically designed for $\alpha$, we establish a controllable $O\left(1/k^{2\alpha} \right)$ convergence rate for the NAG-$\alpha$ method, provided that $r > 2\alpha$. At the critical step size, NAG-$\alpha$ achieves an inverse polynomial convergence rate of arbitrary degree by adjusting $r$ according to $\alpha > 0$. We further simplify the Lyapunov function by expressing it in terms of the iterative sequences $x_k$ and $y_k$, eliminating the need for phase-space representations. This simplification enables us to extend the controllable $O \left(1/k^{2\alpha} \right)$ rate to the monotonic variant, M-NAG-$\alpha$, thereby enhancing optimization efficiency. Finally, by leveraging the fundamental inequality for composite functions, we extended the controllable $O\left(1/k^{2\alpha} \right)$ rate to proximal algorithms, including the fast iterative shrinkage-thresholding algorithm (FISTA-$\alpha$) and its monotonic counterpart (M-FISTA-$\alpha$).

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2501.10051

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Machine LearningJan-17-2025

Differentiable Adversarial Attacks for Marked Temporal Point Processes

Chakraborty, Pritish, Gupta, Vinayak, R, Rahul, Bedathur, Srikanta J., De, Abir

Marked temporal point processes (MTPPs) have been shown to be extremely effective in modeling continuous time event sequences (CTESs). In this work, we present adversarial attacks designed specifically for MTPP models. A key criterion for a good adversarial attack is its imperceptibility. For objects such as images or text, this is often achieved by bounding perturbation in some fixed $L_p$ norm-ball. However, similarly minimizing distance norms between two CTESs in the context of MTPPs is challenging due to their sequential nature and varying time-scales and lengths. We address this challenge by first permuting the events and then incorporating the additive noise to the arrival timestamps. However, the worst case optimization of such adversarial attacks is a hard combinatorial problem, requiring exploration across a permutation space that is factorially large in the length of the input sequence. As a result, we propose a novel differentiable scheme PERMTPP using which we can perform adversarial attacks by learning to minimize the likelihood, while minimizing the distance between two CTESs. Our experiments on four real-world datasets demonstrate the offensive and defensive capabilities, and lower inference times of PERMTPP.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2501.10606

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Data Science > Data Mining (0.68)
(2 more...)

Neural Information Processing SystemsJan-16-2025, 12:59:27 GMT

High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization

Contextual policies are used in many settings to customize system parameters and actions to the specifics of a particular setting. In some real-world settings, such as randomized controlled trials or A/B tests, it may not be possible to measure policy outcomes at the level of context--we observe only aggregate rewards across a distribution of contexts. This makes policy optimization much more difficult because we must solve a high-dimensional optimization problem over the entire space of contextual policies, for which existing optimization methods are not suitable. We develop effective models that leverage the structure of the search space to enable contextual policy optimization directly from the aggregate rewards using Bayesian optimization. We use a collection of simulation studies to characterize the performance and robustness of the models, and show that our approach of inferring a low-dimensional context embedding performs best.

bayesian optimization, high-dimensional contextual policy search, unknown context reward, (1 more...)

Neural Information Processing Systems

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Neural Information Processing SystemsJan-16-2025, 07:42:18 GMT

HONOR: Hybrid Optimization for NOn-convex Regularized problems

Recent years have witnessed the superiority of non-convex sparse learning formulations over their convex counterparts in both theory and practice. However, due to the non-convexity and non-smoothness of the regularizer, how to efficiently solve the non-convex optimization problem for large-scale data is still quite challenging. Specifically, we develop a hybrid scheme which effectively integrates a Quasi-Newton (QN) step and a Gradient Descent (GD) step. Our contributions are as follows: (1) HONOR incorporates the second-order information to greatly speed up the convergence, while it avoids solving a regularized quadratic programming and only involves matrix-vector multiplications without explicitly forming the inverse Hessian matrix.

hybrid optimization, non-convex regularized problem, underline, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.63)

Client-Centric Federated Adaptive Optimization

Sun, Jianhui, Wu, Xidong, Huang, Heng, Zhang, Aidong

Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private. With an increasing scale of clients and models, FL encounters two key challenges, client drift due to a high degree of statistical/system heterogeneity, and lack of adaptivity. However, most existing FL research is based on unrealistic assumptions that virtually ignore system heterogeneity. In this paper, we propose Client-Centric Federated Adaptive Optimization, which is a class of novel federated adaptive optimization approaches. We enable several features in this framework such as arbitrary client participation, asynchronous server aggregation, and heterogeneous local computing, which are ubiquitous in real-world FL systems but are missed in most existing works. We provide a rigorous convergence analysis of our proposed framework for general nonconvex objectives, which is shown to converge with the best-known rate. Extensive experiments show that our approaches consistently outperform the baseline by a large margin across benchmarks.

artificial intelligence, machine learning, natural language, (15 more...)

2501.09946

Country:

North America > United States > District of Columbia > Washington (0.06)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
(11 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Cartis, Coralia, Shao, Zhen, Tansley, Edward

Random Subspace Cubic-Regularization Methods, with Applications to Low-Rank Functions

We propose and analyze random subspace variants of the second-order Adaptive Regularization using Cubics (ARC) algorithm. These methods iteratively restrict the search space to some random subspace of the parameters, constructing and minimizing a local model only within this subspace. Thus, our variants only require access to (small-dimensional) projections of first- and second-order problem derivatives and calculate a reduced step inexpensively. Under suitable assumptions, the ensuing methods maintain the optimal first-order, and second-order, global rates of convergence of (full-dimensional) cubic regularization, while showing improved scalability both theoretically and numerically, particularly when applied to low-rank functions. When applied to the latter, our adaptive variant naturally adapts the subspace size to the true rank of the function, without knowing it a priori.

dimension, iteration, matrix, (15 more...)

2501.09734

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Toutouh, Jamal, Alba, Enrique

Parallel multi-objective metaheuristics for smart communications in vehicular networks

VANETs improve the safety and efficiency of the road traffic through powerful cooperative applications that gather and broadcast real-time road traffic information. Routing in VANETs is a critical issue in today's research due to the high speed of the nodes, rate of topology variability, and real-time restrictions of their applications. Hence, the research community is very active with hot topics, creating new VANET protocols and improving the existent ones (Lee et al. 2009). The Ad hoc On Demand Vector (AODV) routing proto-col (Perkins et al. 2003), which is optimized in this study, has been previously analyzed for use in vehicular environments. Some authors have proposed changes to its parameter configuration to gain huge improvements over its quality-of-service (QoS) in VANETs (Said and Nakamura 2014). The configuration parameters of AODV have a strongly non-linear relationship with each other and a complex influence on its final performance. In fact, they represent a mix of discrete plus continuous variables which makes it a hard challenge to find the "best" configuration in a real-world scenario. Thus, exact and enumerative methods are not applicable for solving the underlying optimization problem of finding the "best" AODV configuration, because they require critically long execution times to perform the search, and because we are far from having a traditional analytical equation. In this context, soft computing methods are a promising approach to find accurate QoS-efficient AODV configurations in rea-sonable times.

accepted version, algorithm, configuration, (15 more...)

doi: 10.1007/s00500-015-1891-2

2501.09725

Country:

North America > United States > New York (0.04)
Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
Europe > Germany > Bremen > Bremen (0.04)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

U-Fair: Uncertainty-based Multimodal Multitask Learning for Fairer Depression Detection

Cheong, Jiaee, Bangar, Aditya, Kalkan, Sinan, Gunes, Hatice

We propose accounting for this gender difference in PHQ-8 distributions via U-Fair. Moreover, each gender may display different PHQ-approach towards building relevant ML for healthcare 8 task distribution which may results in different solutions, we propose a novel method, U-Fair, which PHQ-8 distribution and variance. Although investigation accounts for the gender difference in PHQ-8 distribution on the relationship between the PHQ-8 and and leverages on uncertainty as a MTL task gender has been explored in other fields such as psychiatry reweighing mechanism to achieve better gender fairness (Thibodeau and Asmundson, 2014; Vetter for depression detection. Our key contributions et al., 2013; Leung et al., 2020), this has not been investigated are as follow: nor accounted for in any of the existing ML We conduct the first analysis to investigate how for depression detection methods. Moreover, existing MTL impacts fairness in depression detection by work has demonstrated the risk of a fairness-accuracy using each PHQ-8 subcriterion as a task. We trade-off (Pleiss et al., 2017) and how mainstream show that a simplistic baseline MTL approach MTL objectives might not correlate well with fairness runs the risk of incurring negative transfer and goals (Wang et al., 2021b). No work has investigated may not improve on the Pareto frontier. A how a MTL approach impacts performance Pareto frontier can be understood as the set of across fairness for the task of depression detection.

fairness, u-fair, uncertainty-based multimodal multitask learning, (10 more...)

2501.09687

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.42)

Dahan, Tehila, Levy, Kfir Y.

Weight for Robustness: A Comprehensive Approach towards Optimal Fault-Tolerant Asynchronous ML

We address the challenges of Byzantine-robust training in asynchronous distributed machine learning systems, aiming to enhance efficiency amid massive parallelization and heterogeneous computing resources. Asynchronous systems, marked by independently operating workers and intermittent updates, uniquely struggle with maintaining integrity against Byzantine failures, which encompass malicious or erroneous actions that disrupt learning. The inherent delays in such settings not only introduce additional bias to the system but also obscure the disruptions caused by Byzantine faults. To tackle these issues, we adapt the Byzantine framework to asynchronous dynamics by introducing a novel weighted robust aggregation framework. This allows for the extension of robust aggregators and a recent meta-aggregator to their weighted versions, mitigating the effects of delayed updates. By further incorporating a recent variance-reduction technique, we achieve an optimal convergence rate for the first time in an asynchronous Byzantine environment. Our methodology is rigorously validated through empirical and theoretical analysis, demonstrating its effectiveness in enhancing fault tolerance and optimizing performance in asynchronous ML systems.

aggregator, byzantine worker, inequality, (14 more...)

2501.09621

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)