Goto

Collaborating Authors

 Optimization


Clustering-aware Graph Construction: A Joint Learning Perspective

arXiv.org Machine Learning

Graph-based clustering methods have demonstrated the effectiveness in various applications. Generally, existing graph-based clustering methods first construct a graph to represent the input data and then partition it to generate the clustering result. However, such a stepwise manner may make the constructed graph not fit the requirements for the subsequent decomposition, leading to compromised clustering accuracy. To this end, we propose a joint learning framework, which is able to learn the graph and the clustering result simultaneously, such that the resulting graph is tailored to the clustering task. The proposed model is formulated as a well-defined nonnegative and off-diagonal constrained optimization problem, which is further efficiently solved with convergence theoretically guaranteed. The advantage of the proposed model is demonstrated by comparing with 19 state-of-the-art clustering methods on 10 datasets with 4 clustering metrics.


Early Detection of Long Term Evaluation Criteria in Online Controlled Experiments

arXiv.org Artificial Intelligence

A common dilemma encountered by many upon implementing an optimization method or experiment, whether it be a reinforcement learning algorithm, or A/B testing, is deciding on what metric to optimize for. Very often short-term metrics, which are easier to measure are chosen over long term metrics which have undesirable time considerations and often a more complex calculation. In this paper, we argue the importance of choosing a metrics that focuses on long term effects. With this comes the necessity in the ability to measure significant differences between groups relatively early. We present here an efficient methodology for early detection of lifetime differences between groups based on bootstrap hypothesis testing of the lifetime forecast of the response. We present an application of this method in the domain of online advertising and we argue that approach not only allows one to focus on the ultimate metric of importance but also provides a means of accelerating the testing period.


Meta-heuristic for non-homogeneous peak density spaces and implementation on 2 real-world parameter learning/tuning applications

arXiv.org Artificial Intelligence

Observer effect in physics (/psychology) regards bias in measurement (/perception) due to the interference of instrument (/knowledge). Based on these concepts, a new meta-heuristic algorithm is proposed for controlling memory usage per localities without pursuing Tabu-like cut-off approaches. In this paper, first, variations of observer effect are explained in different branches of science from physics to psychology. Then, a metaheuristic algorithm is proposed based on observer effect concepts and the used metrics are explained. The derived optimizer performance has been compared between 1st, non-homogeneous-peaks-density functions, and 2nd, homogeneous-peaks-density functions to verify the algorithm outperformance in the 1st scheme. Finally, performance analysis of the novel algorithms is derived using two real-world engineering applications in Electroencephalogram feature learning and Distributed Generator parameter tuning, each of which having nonlinearity and complex multi-modal peaks distributions as its characteristics. Also, the effect of version improvement has been assessed. The performance analysis among other optimizers in the same context suggests that the proposed algorithm is useful both solely and in hybrid Gradient Descent settings where problem's search space is nonhomogeneous in terms of local peaks density.


Topic Modeling via Full Dependence Mixtures

arXiv.org Machine Learning

We consider the topic modeling problem for large datasets. For this problem, Latent Dirichlet Allocation (LDA) with a collapsed Gibbs sampler optimization is the state-of-the-art approach in terms of topic quality. However, LDA is a slow approach, and running it on large datasets is impractical even with modern hardware. In this paper we propose to fit topics directly to the co-occurances data of the corpus. In particular, we introduce an extension of a mixture model, the Full Dependence Mixture (FDM), which arises naturally as a model of a second moment under general generative assumptions on the data. While there is some previous work on topic modeling using second moments, we develop a direct stochastic optimization procedure for fitting an FDM with a single Kullback Leibler objective. While moment methods in general have the benefit that an iteration no longer needs to scale with the size of the corpus, our approach also allows us to leverage standard optimizers and GPUs for the problem of topic modeling. We evaluate the approach on synthetic and semi-synthetic data, as well as on the SOTU and Neurips Papers corpora, and show that the approach outperforms LDA, where LDA is run on both full and sub-sampled data.


Empowering swarm-based optimizers by multi-scale search to enhance Gradient Descent initialization performance

arXiv.org Machine Learning

Swarm-based optimizers like Particle Swarm Optimization or Imperialistic Competitive Algorithm that act under influences of cooperation or competition among groups, are unable to search in multiple volumes of locality or globality and do not have nested localities. As hybrid optimizers, they may not give satisfactory results as initializers in Gradient Descent approximators used in plenty of multimodal problems like nonlinear subspace learning and neural network training, which have hierarchies of convex spaces due to nonlinearity and multi-layer nature of these models. To search in various levels of scale in a homogenous way, a framework is proposed to equip PSO and ICA a multi-scale search capability. Then, the resulted optimizers are evaluated in single and GD-hybridized mode. Hybrid evaluation as GD randomizer is implemented with the help of a nonlinear subspace filtering objective function over EEG data and optimization loss and validation data accuracy is compared with other hybrids containing GD. A single evaluation is also taken place between the proposed ones, PSO, ICA, CLPSO, and CICA, which are used more in hybrid learning-based approaches. Evaluations were with respect to solution error. Before concluding the paper, it is shown and analyzed that proposed optimizers outperform algorithms of related context both in single and hybrid-GD mode.


A Tight and Unified Analysis of Extragradient for a Whole Spectrum of Differentiable Games

arXiv.org Machine Learning

We consider differentiable games: multi-objective minimization problems, where the goal is to find a Nash equilibrium. The machine learning community has recently started using extrapolation-based variants of the gradient method. A prime example is the extragradient, which yields linear convergence in cases like bilinear games, where the standard gradient method fails. The full benefits of extrapolation-based methods are not known: i) there is no unified analysis for a large class of games that includes both strongly monotone and bilinear games; ii) it is not known whether the rate achieved by extragradient can be improved, e.g. by considering multiple extrapolation steps. We answer these questions through new analysis of the extragradient's local and global convergence properties. Our analysis covers the whole range of settings between purely bilinear and strongly monotone games. It reveals that extragradient converges via different mechanisms at these extremes; in between, it exploits the most favorable mechanism for the given problem. We then present lower bounds on the rate of convergence for a wide class of algorithms with any number of extrapolations. Our bounds prove that the extragradient achieves the optimal rate in this class, and that our upper bounds are tight. Our precise characterization of the extragradient's convergence behavior in games shows that, unlike in convex optimization, the extragradient method may be much faster than the gradient method.


Pairwise Fairness for Ranking and Regression

arXiv.org Machine Learning

We present pairwise metrics of fairness for ranking and regression models that form analogues of statistical fairness notions such as equal opportunity or equal accuracy, as well as statistical parity. Our pairwise formulation supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using constrained optimization and robust optimization techniques based on two player game algorithms developed for fair classification. Experiments illustrate the broad applicability and trade-offs of these methods.


Generalizable Adversarial Attacks Using Generative Models

arXiv.org Machine Learning

Adversarial attacks on deep neural networks traditionally rely on a constrained optimization paradigm, where an optimization procedure is used to obtain a single adversarial perturbation for a given input example. Here, we instead view adversarial attacks as a generative modelling problem, with the goal of producing entire distributions of adversarial examples given an unperturbed input. We show that this generative perspective can be used to design a unified encoder-decoder framework, which is domain-agnostic in that the same framework can be employed to attack different domains with minimal modification. Across three diverse domains---images, text, and graphs---our approach generates whitebox attacks with success rates that are competitive with or superior to existing approaches, with a new state-of-the-art achieved in the graph domain. Finally, we demonstrate that our generative framework can efficiently generate a diverse set of attacks for a single given input, and is even capable of attacking unseen test instances in a zero-shot manner, exhibiting attack generalization.


Meta-Learning via Learned Loss

arXiv.org Artificial Intelligence

We present a meta-learning approach based on learning an adaptive, high-dimensional loss function that can generalize across multiple tasks and different model architectures. We develop a fully differentiable pipeline for learning a loss function targeted at maximizing the performance of an optimizee trained using this loss function. We observe that the loss landscape produced by our learned loss significantly improves upon the original task-specific loss. We evaluate our method on supervised and reinforcement learning tasks. Furthermore, we show that our pipeline is able to operate in sparse reward and self-supervised reinforcement learning scenarios.


Joint Reasoning for Temporal and Causal Relations

arXiv.org Artificial Intelligence

Understanding temporal and causal relations between events is a fundamental natural language understanding task. Because a cause must be before its effect in time, temporal and causal relations are closely related and one relation even dictates the other one in many cases. However, limited attention has been paid to studying these two relations jointly. This paper presents a joint inference framework for them using constrained conditional models (CCMs). Specifically, we formulate the joint problem as an integer linear programming (ILP) problem, enforcing constraints inherently in the nature of time and causality. We show that the joint inference framework results in statistically significant improvement in the extraction of both temporal and causal relations from text.