Profile-based Resource Allocation for Virtualized Network Functions

arXiv.org Machine Learning

The virtualization of compute and network resources enables unprecedented flexibility for deploying network services. A wide spectrum of emerging technologies allows an ever-growing range of orchestration possibilities in cloud-based environments. In this context, however, it remains challenging to reconcile dynamic cloud configurations with deterministic performance. The service operator must map the performance specification in the Service Level Agreement (SLA) to an adequate resource allocation in the virtualized infrastructure. We propose the use of a VNF profile to ease this process. This is illustrated by profiling the performance of four example network functions (a virtual router, switch, firewall and cache server) under varying workloads and resource configurations. We then compare several methods to derive a model from the profiled datasets. We select the most accurate method to further train a model which predicts the service's performance as a function of incoming workload and allocated resources. Our method can offer the service operator a recommended resource allocation for the targeted service, as a function of the targeted performance and maximum workload specified in the SLA. This helps to deploy the softwarized service with an optimal amount of resources to meet the SLA requirements, thereby avoiding unnecessary scaling steps.
The advancements in the domain of cloud computing, Software Defined Networking (SDN) and Network Function Virtualization (NFV) enable an unprecedented flexibility and programmability of both compute and network configurations. By softwarizing network functions, we move away from dedicated, hardware-based, monolithic systems to a virtualized solution for offering telecom services. The service is decomposed into multiple microservices, each of which gets an allocated share of resources such as CPU time, memory access or network bandwidth.
Typical tasks involved in network services include packet forwarding, routing, inspection or any other form of network traffic processing. Beyond the application layer, the deeper layers of the network traffic are checked or manipulated in a chained configuration. This means that network traffic is sequentially steered through a possibly lengthy chain of processors such as routers, firewalls, load balancers or proxy servers. In the NFV domain, the main aim is to provide softwarized solutions for each of those network functions, which can be deployed on commercial off-the-shelf (COTS) servers. Ideally, these solutions match the performance of rigid, dedicated hardware middleboxes, but at a lower cost, with higher flexibility regarding scaling and configuration, and with less exposure to vendor and technology lock-in. At deployment time of the network service, the required capacity and the related resource allocation need to be estimated. The performance contract is given in the Service Level Agreement (SLA) and should be translated to the required resources.
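
The recommendation step described in the abstract (pick an allocation that meets the SLA target at the maximum workload) can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the candidate CPU allocations and the toy predictor (400 Mbps per core, saturating) are all hypothetical stand-ins for a model trained on profiled data.

```python
from typing import Callable, Iterable, Optional


def recommend_allocation(
    predict: Callable[[float, int], float],
    candidates: Iterable[int],
    max_workload: float,
    target_perf: float,
) -> Optional[int]:
    """Return the smallest candidate allocation whose predicted performance
    meets the target at the maximum workload, or None if none does."""
    for cpu_cores in sorted(candidates):
        if predict(max_workload, cpu_cores) >= target_perf:
            return cpu_cores
    return None


def toy_predict(workload_mbps: float, cpu_cores: int) -> float:
    # Hypothetical stand-in for a trained profile model:
    # each core forwards up to 400 Mbps, and throughput cannot exceed the load.
    return min(workload_mbps, 400.0 * cpu_cores)


# SLA (assumed values): sustain 950 Mbps at a maximum offered load of 1000 Mbps.
print(recommend_allocation(toy_predict, [1, 2, 4, 8], 1000.0, 950.0))  # prints 4
```

Searching the candidates in ascending order returns the cheapest allocation that satisfies the SLA, which is what avoids over-provisioning and later scaling steps.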


SySCD: A System-Aware Parallel Coordinate Descent Algorithm

arXiv.org Machine Learning

In this paper we propose a novel parallel stochastic coordinate descent (SCD) algorithm with convergence guarantees that exhibits strong scalability. We start by studying a state-of-the-art parallel implementation of SCD and identify its scalability and system-level performance bottlenecks. We then take a principled approach to develop a new SCD variant which is designed to avoid the identified system bottlenecks, such as limited scaling due to coherence traffic of model sharing across threads, and inefficient CPU cache accesses. Our proposed system-aware parallel coordinate descent algorithm (SySCD) scales to many cores and across NUMA nodes, and offers a consistent bottom-line speedup in training time of up to 12x compared to an optimized asynchronous parallel SCD algorithm and up to 42x compared to state-of-the-art GLM solvers (scikit-learn, Vowpal Wabbit, and H2O) on a range of datasets and multi-core CPU architectures.
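
For readers unfamiliar with the baseline, here is a minimal sequential sketch of stochastic coordinate descent, assuming a ridge regression objective (1/2n)||y - Xw||^2 + (lam/2)||w||^2. It is not SySCD and contains none of the parallel or system-aware design the abstract describes; the function name and the choice of objective are illustrative assumptions.

```python
import random


def scd_ridge(X, y, lam=0.0, epochs=50, seed=0):
    """Sequential SCD for ridge regression with exact per-coordinate updates."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(d))
        rng.shuffle(order)  # visit coordinates in a fresh random order each epoch
        for j in order:
            denom = col_sq[j] + n * lam
            if denom == 0.0:
                continue  # all-zero column with no regularization: nothing to update
            # Correlate column j with the residual that excludes coordinate j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n))
            new_wj = rho / denom  # closed-form minimizer along coordinate j
            delta = new_wj - w[j]
            for i in range(n):
                residual[i] -= X[i][j] * delta  # keep residual equal to y - Xw
            w[j] = new_wj
    return w


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 1.0]  # generated by w = [2, -1]
print(scd_ridge(X, y))
```

Each update touches the shared residual vector, which hints at why naive parallel versions generate the coherence traffic mentioned above.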


The Effectiveness of Variational Autoencoders for Active Learning

arXiv.org Machine Learning

The high cost of acquiring labels is one of the main challenges in deploying supervised machine learning algorithms. Active learning is a promising approach to control the learning process and address the difficulties of data labeling by selecting which training examples to label from a large pool of unlabeled instances. In this paper, we propose a new data-driven approach to active learning by choosing a small set of labeled data points that are both informative and representative. To this end, we present an efficient geometric technique to select a diverse core-set in a low-dimensional latent space obtained by training a Variational Autoencoder (VAE). Our experiments demonstrate an improvement in accuracy over two related techniques and, more importantly, highlight the representation power of generative modeling for developing new active learning methods in high-dimensional data settings.
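
The abstract does not specify the geometric selection technique, so the sketch below substitutes a common one, greedy k-center selection, to illustrate what choosing a diverse core-set in a latent space can look like. The latent vectors are assumed to come from an already trained VAE encoder (not shown), the function name is hypothetical, and the points are assumed to be distinct.

```python
import math


def k_center_greedy(points, k, first=0):
    """Greedily pick k indices so that each new point is the one farthest
    from the points selected so far (Euclidean distance)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [first]
    # Distance from every point to its nearest selected point.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Hypothetical 1-D latent codes; the selected indices would be sent for labeling.
latent = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(k_center_greedy(latent, 3))  # prints [0, 3, 2]
```

Working in the low-dimensional latent space keeps the distance computations cheap and more meaningful than distances between raw high-dimensional inputs.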


Feedback Control for Online Training of Neural Networks

arXiv.org Machine Learning

Zilong Zhao, Sophie Cerf, Bogdan Robu and Nicolas Marchand
Abstract: Convolutional neural networks (CNNs) are commonly used for image classification tasks, raising the challenge of their application on data flows. During their training, adaptation is often performed by tuning the learning rate. Usual learning rate strategies are time-based, i.e. monotonically decreasing. In this paper, we advocate switching to a performance-based adaptation in order to improve the learning efficiency. We present E (Exponential)/PD (Proportional Derivative)-Control, a conditional learning rate strategy that combines a feedback PD controller based on the CNN loss function with an exponential control signal to smartly boost the learning and adapt the PD parameters. A stability proof is provided, as well as an experimental evaluation using two state-of-the-art image datasets (CIFAR-10 and Fashion-MNIST). Results show better performance than related work (faster network accuracy growth reaching higher levels) and robustness of the E/PD-Control with regard to its parametrization.
INTRODUCTION
Convolutional neural networks (CNNs) are popular machine learning algorithms for image classification, as they are well suited for visual pattern recognition and require little preprocessing [1].


A New Ensemble Adversarial Attack Powered by Long-term Gradient Memories

arXiv.org Machine Learning

Deep neural networks are vulnerable to adversarial attacks. More importantly, some adversarial examples crafted against an ensemble of pre-trained source models can transfer to other new target models, and thus pose a security threat to black-box applications (where the attackers have no access to the target models). Despite adopting diverse architectures and parameters, source and target models often share similar decision boundaries. Therefore, if an adversary is capable of fooling several source models concurrently, it can potentially capture intrinsic transferable adversarial information that may allow it to fool a broad class of other black-box target models. Current ensemble attacks, however, only consider a limited number of source models to craft an adversary, and obtain poor transferability.


Towards Quantification of Bias in Machine Learning for Healthcare: A Case Study of Renal Failure Prediction

arXiv.org Machine Learning

Departments of Population Health and Radiology, Center for Data Science, New York University Langone Medical Center
Abstract: As machine learning (ML) models, trained on real-world datasets, become common practice, it is critical to measure and quantify their potential biases. In this paper, we focus on renal failure and compare a commonly used traditional risk score, Tangri, with a more powerful machine learning model, which has access to a larger variable set and was trained on 1.6 million patients' EHR data. We will compare and discuss the generalization and applicability of these two models, in an attempt to quantify biases of status quo clinical practice, compared to ML-driven models.
1 Introduction
Data-driven models have become more common in the U.S. healthcare field as their use in clinical operations and diagnostic procedures has expanded exponentially. The ever-increasing processing power of machine-learning algorithms allows automatic analysis of huge quantities of data, theoretically maximizing the efficiency and accuracy of the medical diagnostic process. Predictions from machine-learning models already drive important healthcare decisions for over 70 million people across the United States [7].


Learning with Good Feature Representations in Bandits and in RL with a Generative Model

arXiv.org Machine Learning

The construction in the recent paper by Du et al. [2019] implies that searching for a near-optimal action in a bandit sometimes requires examining essentially all the actions, even if the learner is given linear features in $\mathbb R^d$ that approximate the rewards with a small uniform error. In this note we use the Kiefer-Wolfowitz theorem to show that by checking only a few actions, a learner can always find an action which is suboptimal with an error of at most $O(\varepsilon \sqrt{d})$ where $\varepsilon$ is the approximation error of the features. Thus, features are useful when the approximation error is small relative to the dimensionality of the features. The idea is applied to stochastic bandits and reinforcement learning with a generative model where the learner has access to $d$-dimensional linear features that approximate the action-value functions for all policies to an accuracy of $\varepsilon$. For bandits we prove a bound on the regret of order $\sqrt{dn \log(k)} + \varepsilon n \sqrt{d} \log(n)$ with $k$ the number of actions and $n$ the horizon. For RL we show that approximate policy iteration can learn a policy that is optimal up to an additive error of order $\varepsilon \sqrt{d} / (1 - \gamma)^2$ and using about $d / (\varepsilon^2(1-\gamma)^4)$ samples from the generative model.


GraLSP: Graph Neural Networks with Local Structural Patterns

arXiv.org Machine Learning

Only recently have graph neural networks (GNNs) been adopted to perform graph representation learning; among them, those based on the aggregation of features within the neighborhood of a node have achieved great success. However, despite such achievements, GNNs show deficiencies in identifying some common structural patterns which, unfortunately, play significant roles in various network phenomena. In this paper, we propose GraLSP, a GNN framework which explicitly incorporates local structural patterns into the neighborhood aggregation through random anonymous walks. Specifically, we capture local graph structures via random anonymous walks, powerful and flexible tools that represent structural patterns. The walks are then fed into the feature aggregation, where we design various mechanisms to address the impact of structural features, including adaptive receptive radius, attention and amplification. In addition, we design objectives that capture similarities between structures and are optimized jointly with node proximity objectives. By adequately leveraging structural patterns, our model is able to outperform competitive counterparts in various prediction tasks on multiple datasets.


Convex Formulation of Overparameterized Deep Neural Networks

arXiv.org Machine Learning

Analysis of over-parameterized neural networks has drawn significant attention in recent years. It was shown that such systems behave like convex systems under various restricted settings, such as for two-level neural networks, and when learning is only restricted locally in the so-called neural tangent kernel space around specialized initializations. However, there are no theoretical techniques that can analyze fully trained deep neural networks encountered in practice. This paper solves this fundamental problem by investigating such overparameterized deep neural networks when fully trained. We generalize a new technique called neural feature repopulation, originally introduced in (Fang et al., 2019a) for two-level neural networks, to analyze deep neural networks. It is shown that under suitable representations, overparameterized deep neural networks are inherently convex, and when optimized, the system can learn effective features suitable for the underlying learning task under mild conditions. This new analysis is consistent with empirical observations that deep neural networks are capable of learning efficient feature representations. Therefore, the highly unexpected result of this paper can satisfactorily explain the practical success of deep neural networks. Empirical studies confirm that predictions of our theory are consistent with results observed in practice.


Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization

arXiv.org Machine Learning

Although ADAM is a very popular algorithm for optimizing the weights of neural networks, it has been recently shown that it can diverge even in simple convex optimization examples. Several variants of ADAM have been proposed to circumvent this convergence issue. In this work, we study the ADAM algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate. The bound on the adaptive step size depends on the Lipschitz constant of the gradient of the objective function and provides safe theoretical adaptive step sizes. Under this boundedness assumption, we show a novel first order convergence rate result in both deterministic and stochastic contexts. Furthermore, we establish convergence rates of the function value sequence using the Kurdyka-Lojasiewicz property.
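
To make the boundedness assumption concrete, the sketch below runs a standard ADAM update in which the per-coordinate adaptive step size is clipped to an upper bound. The abstract only says that the bound depends on the Lipschitz constant L of the gradient; the specific choice max_step = 1/L, the function name and the test objective are assumptions made here for illustration, not the paper's exact condition.

```python
import math


def adam_bounded(grad, x0, lr=0.1, max_step=0.25, beta1=0.9, beta2=0.999,
                 eps=1e-8, steps=500):
    """ADAM with the adaptive step size lr / (sqrt(v_hat) + eps) clipped
    to at most max_step in every coordinate."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (momentum) estimate
    v = [0.0] * len(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i]
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            # Assumed bound: never let the adaptive step exceed max_step.
            step = min(lr / (math.sqrt(v_hat) + eps), max_step)
            x[i] -= step * m_hat
    return x


# f(x) = 0.5 * L * x^2 with L = 4, so the gradient is L-Lipschitz with L = 4.
L = 4.0
x_final = adam_bounded(lambda x: [L * xi for xi in x], [5.0], max_step=1.0 / L)
print(abs(x_final[0]))
```

Without the clip, the adaptive step grows without limit as the second-moment estimate shrinks near a stationary point; the bound removes that failure mode.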