
Error Compensated Distributed SGD Can Be Accelerated

Neural Information Processing Systems

Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that can work for any compressor satisfying a certain contraction property, which includes both unbiased (after appropriate scaling) and biased compressors such as RandK and TopK. Applied naively, gradient compression introduces errors that either slow down convergence or lead to divergence. A popular technique designed to tackle this issue is error compensation/error feedback. Due to the difficulties associated with analyzing biased compressors, it is not known whether gradient compression with error compensation can be combined with acceleration. In this work, we show for the first time that error compensated gradient compression methods can be accelerated. In particular, we propose and study the error compensated loopless Katyusha method, and establish an accelerated linear convergence rate under standard assumptions. We show through numerical experiments that the proposed method converges with substantially fewer communication rounds than previous error compensated algorithms.
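As a rough illustration of the error-feedback mechanism the abstract refers to (not the accelerated Katyusha method itself), here is a minimal NumPy sketch of one worker's step with a TopK compressor: whatever the compressor discards is kept as a residual and added back to the gradient before the next compression.

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest (a biased compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_step(grad, error, k, lr):
    """One error-feedback step: compress (gradient + residual), carry the remainder forward."""
    corrected = grad + error
    compressed = topk(corrected, k)
    new_error = corrected - compressed   # the part compression discarded
    return -lr * compressed, new_error   # model update, updated residual

# toy usage: the residual accumulates exactly what TopK dropped
g = np.array([0.5, -2.0, 0.1, 1.5])
update, err = ef_step(g, np.zeros(4), k=2, lr=0.1)
```

The key invariant is that the transmitted (compressed) part plus the stored residual reconstructs the corrected gradient, so no gradient information is permanently lost.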


Technique Inference Engine: A Recommender Model to Support Cyber Threat Hunting

Turner, Matthew J., Carenzo, Mike, Lasky, Jackie, Morris-King, James, Ross, James

arXiv.org Artificial Intelligence

Cyber threat hunting is the practice of proactively searching for latent threats in a network. Engaging in threat hunting can be difficult due to the volume of network traffic, variety of adversary techniques, and constantly evolving vulnerabilities. To aid analysts in identifying techniques which may be co-occurring as part of a campaign, we present the Technique Inference Engine, a tool to infer tactics, techniques, and procedures (TTPs) which may be related to existing observations of adversarial behavior. We compile the largest (to our knowledge) available dataset of cyber threat intelligence (CTI) reports labeled with relevant TTPs. With the knowledge that techniques are chronically under-reported in CTI, we apply several implicit feedback recommender models to the data in order to predict additional techniques which may be part of a given campaign. We evaluate the results in the context of the cyber analyst's use case and apply t-SNE to visualize the model embeddings. We provide our code and a web interface.
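As a hedged illustration of the general idea (not the authors' model or data), the following toy sketch factorizes a made-up binary report-by-technique matrix with a simple gradient method and uses the learned embeddings to score unobserved techniques; all names and numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical toy data: rows = CTI reports, columns = techniques,
# 1 = technique observed in the report (implicit positive), 0 = unknown
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)

d = 2  # embedding dimension
U = rng.normal(scale=0.1, size=(R.shape[0], d))  # report embeddings
V = rng.normal(scale=0.1, size=(R.shape[1], d))  # technique embeddings

# full-gradient descent on regularized squared error over all cells
# (unknowns treated as weak negatives, a common implicit-feedback simplification)
for _ in range(500):
    E = R - U @ V.T
    U += 0.1 * (E @ V - 0.01 * U)
    V += 0.1 * (E.T @ U - 0.01 * V)

scores = U @ V.T
# report 0 shares techniques 0 and 1 with report 1, so the unobserved
# technique 2 should score higher for report 0 than technique 3 does
```

Production systems would use a proper implicit-feedback objective (e.g. weighted ALS or BPR), but the embedding-then-score structure is the same.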


Who are the creators of AI?

#artificialintelligence

Alan Turing: Considered one of the fathers of modern computing, Turing proposed the concept of a machine that could perform any computation a human could carry out.
John McCarthy: Known as the "father of artificial intelligence," McCarthy is credited with coining the term "artificial intelligence" in 1955.
Marvin Minsky: A pioneer in the field of AI, Minsky co-founded the Massachusetts Institute of Technology's Artificial Intelligence Laboratory, where much of the early research in AI was conducted.
Claude Shannon: Considered the father of information theory, a field fundamental to AI; his work on communication and cryptography laid the foundation for many AI algorithms.
Herbert Simon and Allen Newell: They helped establish the field of artificial intelligence by creating the Logic Theorist and the General Problem Solver (GPS), early AI programs that used heuristic search to find solutions to problems.


Deep Learning Prerequisites: Linear Regression in Python

#artificialintelligence

This course teaches you about one popular technique used in machine learning, data science and statistics: linear regression. We cover the theory from the ground up: derivation of the solution, and applications to real-world problems. We show you how to code your own linear regression module in Python. Linear regression is the simplest machine learning model you can learn, yet there is so much depth that you'll be returning to it for years to come.
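A minimal sketch of what such a hand-rolled module might look like, fitting the ordinary least-squares solution with NumPy (this is illustrative, not the course's actual code):

```python
import numpy as np

def fit(X, y):
    """Fit weights and intercept by ordinary least squares."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)   # solve min ||Xb @ coef - y||^2
    return coef[:-1], coef[-1]                      # weights, intercept

def predict(X, w, b):
    return X @ w + b

# usage on noiseless toy data generated from y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X[:, 0] + 1
w, b = fit(X, y)
```

Using `lstsq` rather than explicitly inverting the normal equations is the numerically safer way to implement the closed-form solution.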


#018 PyTorch - Popular techniques to prevent overfitting in neural networks

#artificialintelligence

In today's post, we will discuss one of the most common problems that arise during the training of deep neural networks. It is called overfitting, and it usually occurs when we increase the complexity of the network. In this post, you will learn the most common techniques to reduce overfitting while training neural networks. When building a neural network, our goal is to develop a model that performs well not only on the training dataset but also on new data it wasn't trained on. However, when our model is too complex, it can start to learn irrelevant information in the dataset. In other words, the model memorizes noise that is present only in the training dataset.
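The post targets PyTorch, but one of the standard techniques it covers, dropout, is easy to sketch framework-agnostically. The following inverted-dropout function is an illustrative NumPy version, not the post's code: activations are randomly zeroed with probability p during training and the survivors rescaled, so nothing changes at test time.

```python
import numpy as np

def dropout(x, p, rng, train=True):
    """Inverted dropout: zero each activation with probability p during
    training and scale the rest by 1/(1-p); identity at test time."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones((4, 1000))
out = dropout(a, p=0.5, rng=rng)   # expected mean stays ~1 despite zeroing
```

The 1/(1-p) rescaling is what keeps the expected activation unchanged, which is why frameworks like PyTorch can simply disable dropout in evaluation mode.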


Tilted empirical risk minimization

AIHub

Classical ERM (t = 0) minimizes the average loss and is shown in pink. As t → -∞ (blue), TERM finds a line of best fit while ignoring outliers. In some applications, these 'outliers' may correspond to minority samples that should not be ignored. As t → +∞ (red), TERM recovers the min-max solution, which minimizes the worst loss. This can ensure the model is a reasonable fit for all samples, reducing unfairness related to representation disparity.
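For reference, the tilted objective behind these limits replaces the plain average of the losses with a log-mean-exp scaled by the tilt parameter t. A small sketch, assuming the standard TERM formulation R(t) = (1/t) log(mean(exp(t · loss))):

```python
import numpy as np

def tilted_loss(losses, t):
    """TERM objective: (1/t) * log(mean(exp(t * losses))).
    t = 0 recovers the plain average; t -> +inf approaches the max loss
    (min-max behavior); t -> -inf approaches the min loss (outlier-robust)."""
    losses = np.asarray(losses, dtype=float)
    if t == 0:
        return losses.mean()
    m = t * losses
    mmax = np.max(m)  # log-sum-exp shift for numerical stability
    return (mmax + np.log(np.mean(np.exp(m - mmax)))) / t

L = [0.1, 0.2, 5.0]  # toy per-sample losses with one outlier
```

With a large positive t the outlier dominates the objective (worst-case fit); with a large negative t it is effectively ignored.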


Introduction to Word Embedding

#artificialintelligence

Humans have always excelled at understanding languages. It is easy for humans to understand the relationships between words, but for computers this task is not so simple. For example, we humans understand that word pairs like king and queen, man and woman, or tiger and tigress share a certain type of relation, but how can a computer figure this out? Word embeddings are a form of word representation that bridges the human understanding of language to that of a machine. They are learned representations of text in an n-dimensional space, where words with similar meanings have similar representations.
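A toy illustration of how vector arithmetic captures such relations, using hand-made 3-dimensional vectors (real embeddings are learned from text and have hundreds of dimensions; these numbers are invented for the example):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1 means same direction, 0 unrelated, -1 opposite."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# hypothetical embeddings: dim 0 ~ "royalty", dim 1 ~ "gender", dim 2 ~ misc
emb = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.0]),
    "woman": np.array([0.1, -0.8, 0.0]),
}

# the classic analogy: king - man + woman should land near queen
target = emb["king"] - emb["man"] + emb["woman"]
```

Because the "gender" direction is shared between the pairs, subtracting man and adding woman moves king onto queen, which is exactly the relation the paragraph describes.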


Cluster Analysis- Theory & workout using SAS and R

#artificialintelligence

About the course - Cluster analysis is one of the most popular techniques used in data mining for marketing needs. The idea behind cluster analysis is to find natural groups within data such that the elements in each group are as similar to one another as possible, while the groups themselves are as dissimilar from one another as possible. Course materials - The course contains video presentations (PowerPoint presentations with voice), PDFs, Excel workbooks and SAS code. Course duration - The course should take roughly 10 hours to understand and internalize the concepts.
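The course works in SAS and R; purely as a language-neutral illustration of the "similar within, dissimilar between" idea, here is a minimal k-means sketch in Python (not course material):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# two well-separated blobs: points within a blob should share a label
X = np.vstack([np.zeros((10, 2)), np.ones((10, 2)) * 10])
labels, _ = kmeans(X, 2)
```

Each iteration can only decrease the within-cluster distances, which is the formal version of "similar within, dissimilar between."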