Optimization
Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation
Wang, Dongjie, Huang, Yanyong, Ying, Wangyang, Bai, Haoyue, Gong, Nanxu, Wang, Xinyuan, Dong, Sixun, Zhe, Tao, Liu, Kunpeng, Xiao, Meng, Wang, Pengfei, Wang, Pengyang, Xiong, Hui, Fu, Yanjie
Tabular data is one of the most widely used formats across industries, driving critical applications in areas such as finance, healthcare, and marketing. In the era of data-centric AI, improving data quality and representation has become essential for enhancing model performance, particularly in applications centered around tabular data. This survey examines the key aspects of tabular data-centric AI, emphasizing feature selection and feature generation as essential techniques for data space refinement. We provide a systematic review of feature selection methods, which identify and retain the most relevant data attributes, and feature generation approaches, which create new features to simplify the capture of complex data patterns. This survey offers a comprehensive overview of current methodologies through an analysis of recent advancements, practical applications, and the strengths and limitations of these techniques. Finally, we outline open challenges and suggest future perspectives to inspire continued innovation in this field.
A Family of Controllable Momentum Coefficients for Forward-Backward Accelerated Algorithms
Nesterov's accelerated gradient method (NAG) marks a pivotal advancement in gradient-based optimization, achieving faster convergence compared to the vanilla gradient descent method for convex functions. However, its algorithmic complexity when applied to strongly convex functions remains unknown, as noted in the comprehensive review by Chambolle and Pock [2016]. This issue, aside from the critical step size, was addressed by Li et al. [2024b], with the monotonic case further explored by Fu and Shi [2024]. In this paper, we introduce a family of controllable momentum coefficients for forward-backward accelerated methods, focusing on the critical step size $s=1/L$. Unlike traditional linear forms, the proposed momentum coefficients follow an $\alpha$-th power structure, where the parameter $r$ is adaptively tuned to $\alpha$. Using a Lyapunov function specifically designed for $\alpha$, we establish a controllable $O\left(1/k^{2\alpha} \right)$ convergence rate for the NAG-$\alpha$ method, provided that $r > 2\alpha$. At the critical step size, NAG-$\alpha$ achieves an inverse polynomial convergence rate of arbitrary degree by adjusting $r$ according to $\alpha > 0$. We further simplify the Lyapunov function by expressing it in terms of the iterative sequences $x_k$ and $y_k$, eliminating the need for phase-space representations. This simplification enables us to extend the controllable $O \left(1/k^{2\alpha} \right)$ rate to the monotonic variant, M-NAG-$\alpha$, thereby enhancing optimization efficiency. Finally, by leveraging the fundamental inequality for composite functions, we extended the controllable $O\left(1/k^{2\alpha} \right)$ rate to proximal algorithms, including the fast iterative shrinkage-thresholding algorithm (FISTA-$\alpha$) and its monotonic counterpart (M-FISTA-$\alpha$).
Differentiable Adversarial Attacks for Marked Temporal Point Processes
Chakraborty, Pritish, Gupta, Vinayak, R, Rahul, Bedathur, Srikanta J., De, Abir
Marked temporal point processes (MTPPs) have been shown to be extremely effective in modeling continuous time event sequences (CTESs). In this work, we present adversarial attacks designed specifically for MTPP models. A key criterion for a good adversarial attack is its imperceptibility. For objects such as images or text, this is often achieved by bounding perturbation in some fixed $L_p$ norm-ball. However, similarly minimizing distance norms between two CTESs in the context of MTPPs is challenging due to their sequential nature and varying time-scales and lengths. We address this challenge by first permuting the events and then incorporating the additive noise to the arrival timestamps. However, the worst case optimization of such adversarial attacks is a hard combinatorial problem, requiring exploration across a permutation space that is factorially large in the length of the input sequence. As a result, we propose a novel differentiable scheme PERMTPP using which we can perform adversarial attacks by learning to minimize the likelihood, while minimizing the distance between two CTESs. Our experiments on four real-world datasets demonstrate the offensive and defensive capabilities, and lower inference times of PERMTPP.
High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization
Contextual policies are used in many settings to customize system parameters and actions to the specifics of a particular setting. In some real-world settings, such as randomized controlled trials or A/B tests, it may not be possible to measure policy outcomes at the level of context--we observe only aggregate rewards across a distribution of contexts. This makes policy optimization much more difficult because we must solve a high-dimensional optimization problem over the entire space of contextual policies, for which existing optimization methods are not suitable. We develop effective models that leverage the structure of the search space to enable contextual policy optimization directly from the aggregate rewards using Bayesian optimization. We use a collection of simulation studies to characterize the performance and robustness of the models, and show that our approach of inferring a low-dimensional context embedding performs best.
HONOR: Hybrid Optimization for NOn-convex Regularized problems
Recent years have witnessed the superiority of non-convex sparse learning formulations over their convex counterparts in both theory and practice. However, due to the non-convexity and non-smoothness of the regularizer, how to efficiently solve the non-convex optimization problem for large-scale data is still quite challenging. Specifically, we develop a hybrid scheme which effectively integrates a Quasi-Newton (QN) step and a Gradient Descent (GD) step. Our contributions are as follows: (1) HONOR incorporates the second-order information to greatly speed up the convergence, while it avoids solving a regularized quadratic programming and only involves matrix-vector multiplications without explicitly forming the inverse Hessian matrix.
Client-Centric Federated Adaptive Optimization
Sun, Jianhui, Wu, Xidong, Huang, Heng, Zhang, Aidong
Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private. With an increasing scale of clients and models, FL encounters two key challenges, client drift due to a high degree of statistical/system heterogeneity, and lack of adaptivity. However, most existing FL research is based on unrealistic assumptions that virtually ignore system heterogeneity. In this paper, we propose Client-Centric Federated Adaptive Optimization, which is a class of novel federated adaptive optimization approaches. We enable several features in this framework such as arbitrary client participation, asynchronous server aggregation, and heterogeneous local computing, which are ubiquitous in real-world FL systems but are missed in most existing works. We provide a rigorous convergence analysis of our proposed framework for general nonconvex objectives, which is shown to converge with the best-known rate. Extensive experiments show that our approaches consistently outperform the baseline by a large margin across benchmarks.
Random Subspace Cubic-Regularization Methods, with Applications to Low-Rank Functions
Cartis, Coralia, Shao, Zhen, Tansley, Edward
We propose and analyze random subspace variants of the second-order Adaptive Regularization using Cubics (ARC) algorithm. These methods iteratively restrict the search space to some random subspace of the parameters, constructing and minimizing a local model only within this subspace. Thus, our variants only require access to (small-dimensional) projections of first- and second-order problem derivatives and calculate a reduced step inexpensively. Under suitable assumptions, the ensuing methods maintain the optimal first-order, and second-order, global rates of convergence of (full-dimensional) cubic regularization, while showing improved scalability both theoretically and numerically, particularly when applied to low-rank functions. When applied to the latter, our adaptive variant naturally adapts the subspace size to the true rank of the function, without knowing it a priori.
Parallel multi-objective metaheuristics for smart communications in vehicular networks
VANETs improve the safety and efficiency of the road traffic through powerful cooperative applications that gather and broadcast real-time road traffic information. Routing in VANETs is a critical issue in today's research due to the high speed of the nodes, rate of topology variability, and real-time restrictions of their applications. Hence, the research community is very active with hot topics, creating new VANET protocols and improving the existent ones (Lee et al. 2009). The Ad hoc On Demand Vector (AODV) routing proto-col (Perkins et al. 2003), which is optimized in this study, has been previously analyzed for use in vehicular environments. Some authors have proposed changes to its parameter configuration to gain huge improvements over its quality-of-service (QoS) in VANETs (Said and Nakamura 2014). The configuration parameters of AODV have a strongly non-linear relationship with each other and a complex influence on its final performance. In fact, they represent a mix of discrete plus continuous variables which makes it a hard challenge to find the "best" configuration in a real-world scenario. Thus, exact and enumerative methods are not applicable for solving the underlying optimization problem of finding the "best" AODV configuration, because they require critically long execution times to perform the search, and because we are far from having a traditional analytical equation. In this context, soft computing methods are a promising approach to find accurate QoS-efficient AODV configurations in rea-sonable times.
U-Fair: Uncertainty-based Multimodal Multitask Learning for Fairer Depression Detection
Cheong, Jiaee, Bangar, Aditya, Kalkan, Sinan, Gunes, Hatice
We propose accounting for this gender difference in PHQ-8 distributions via U-Fair. Moreover, each gender may display different PHQ-approach towards building relevant ML for healthcare 8 task distribution which may results in different solutions, we propose a novel method, U-Fair, which PHQ-8 distribution and variance. Although investigation accounts for the gender difference in PHQ-8 distribution on the relationship between the PHQ-8 and and leverages on uncertainty as a MTL task gender has been explored in other fields such as psychiatry reweighing mechanism to achieve better gender fairness (Thibodeau and Asmundson, 2014; Vetter for depression detection. Our key contributions et al., 2013; Leung et al., 2020), this has not been investigated are as follow: nor accounted for in any of the existing ML We conduct the first analysis to investigate how for depression detection methods. Moreover, existing MTL impacts fairness in depression detection by work has demonstrated the risk of a fairness-accuracy using each PHQ-8 subcriterion as a task. We trade-off (Pleiss et al., 2017) and how mainstream show that a simplistic baseline MTL approach MTL objectives might not correlate well with fairness runs the risk of incurring negative transfer and goals (Wang et al., 2021b). No work has investigated may not improve on the Pareto frontier. A how a MTL approach impacts performance Pareto frontier can be understood as the set of across fairness for the task of depression detection.
Weight for Robustness: A Comprehensive Approach towards Optimal Fault-Tolerant Asynchronous ML
We address the challenges of Byzantine-robust training in asynchronous distributed machine learning systems, aiming to enhance efficiency amid massive parallelization and heterogeneous computing resources. Asynchronous systems, marked by independently operating workers and intermittent updates, uniquely struggle with maintaining integrity against Byzantine failures, which encompass malicious or erroneous actions that disrupt learning. The inherent delays in such settings not only introduce additional bias to the system but also obscure the disruptions caused by Byzantine faults. To tackle these issues, we adapt the Byzantine framework to asynchronous dynamics by introducing a novel weighted robust aggregation framework. This allows for the extension of robust aggregators and a recent meta-aggregator to their weighted versions, mitigating the effects of delayed updates. By further incorporating a recent variance-reduction technique, we achieve an optimal convergence rate for the first time in an asynchronous Byzantine environment. Our methodology is rigorously validated through empirical and theoretical analysis, demonstrating its effectiveness in enhancing fault tolerance and optimizing performance in asynchronous ML systems.