Unsupervised or Indirectly Supervised Learning


MetAL: Active Semi-Supervised Learning on Graphs via Meta Learning

arXiv.org Machine Learning

The objective of active learning (AL) is to train classification models with fewer labeled instances by selecting only the most informative instances for labeling. AL algorithms designed for other data types, such as images and text, do not perform well on graph-structured data. Although a few heuristics-based AL algorithms have been proposed for graphs, a principled approach is lacking. In this paper, we propose MetAL, an AL approach that selects unlabeled instances that directly improve the future performance of a classification model. For a semi-supervised learning problem, we formulate the AL task as a bilevel optimization problem. Based on recent work in meta-learning, we use meta-gradients to approximate the impact on model performance of retraining the model with any unlabeled instance. Using multiple graph datasets from different domains, we demonstrate that MetAL efficiently outperforms existing state-of-the-art AL algorithms.
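
To make "selecting only the most informative instances" concrete, here is a minimal sketch of a generic heuristic baseline, entropy-based uncertainty sampling. It is not the MetAL method, which relies on meta-gradients of a bilevel objective; the function names, node ids, and probabilities below are all hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` unlabeled instances with the highest entropy.

    `predictions` maps an instance id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:budget]

# Hypothetical model outputs for four unlabeled nodes (illustrative values only).
predictions = {
    "a": [0.98, 0.01, 0.01],
    "b": [0.40, 0.35, 0.25],
    "c": [0.70, 0.20, 0.10],
    "d": [0.34, 0.33, 0.33],
}
queried = select_for_labeling(predictions, budget=2)
print(queried)
```

The two nodes whose predictions are closest to uniform are queried first. A heuristic like this ranks instances by current uncertainty only, whereas the abstract describes estimating the effect of each candidate label on future model performance.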


Efficient Graph-Based Active Learning with Probit Likelihood via Gaussian Approximations

arXiv.org Machine Learning

We present a novel adaptation of active learning to graph-based semi-supervised learning (SSL) under non-Gaussian Bayesian models. We present an approximation of non-Gaussian distributions to adapt previously Gaussian-based acquisition functions to these more general cases. We develop an efficient rank-one update for applying "look-ahead" based methods as well as model retraining. We also introduce a novel "model change" acquisition function based on these approximations that further expands the available collection of active learning acquisition functions for such methods.


Semi-Supervised Learning Approach to Discover Enterprise User Insights from Feedback and Support

arXiv.org Machine Learning

With the evolution of the cloud and a customer-centric culture, we inherently accumulate huge repositories of textual reviews, feedback, and support data. This has driven enterprises to research engagement patterns, user network analysis, topic detection, and similar questions. However, extensive manual work is still necessary to mine this data for actionable outcomes. In this paper, we propose and develop a Semi-Supervised Learning approach that uses Deep Learning and Topic Modeling to better understand the user voice. The approach combines a BERT-based multiclassification algorithm trained through supervised learning with a novel Probabilistic and Semantic Hybrid Topic Inference (PSHTI) model trained through unsupervised learning, aiming to automate the identification of the main topics or areas, as well as the sub-topics, in textual feedback and support data. There are three major breakthroughs: 1. Although advances in deep learning have brought tremendous innovation to the NLP field, traditional topic modeling, one of the NLP applications, lags behind the tide of deep learning. From the methodological and technical perspective, we adopt transfer learning to fine-tune a BERT-based multiclassification system that categorizes the main topics, and then use the novel PSHTI model to infer the sub-topics under the predicted main topics. 2. Traditional unsupervised topic models and clustering methods struggle to generate a meaningful topic label automatically, but our system maps the top words to self-help issues by using domain knowledge about the product obtained through web crawling. 3. This work showcases state-of-the-art methodology in real production, helping to discover user insights and drive business investment priorities.


Alexa and Google Assistant execs on future trends for AI assistants

#artificialintelligence

Businesses and developers making conversational AI experiences should start with the understanding that you're going to have to use unsupervised learning to scale, said Prem Natarajan, Amazon head of product and VP of Alexa AI and NLP. He spoke with Barak Turovsky, Google AI director of product for the NLU team, at VentureBeat's Transform 2020 AI conference today as part of a conversation about future trends for AI assistants. Natarajan called unsupervised learning for language models an important trend for AI assistants and an essential part of creating conversational AI that works for everyone. "Don't wait for the unsupervised learning realization to come to you yet again. Start from the understanding that you're going to have to use unsupervised learning at some level of scale," he said.


A new nature inspired modularity function adapted for unsupervised learning involving spatially embedded networks: A comparative analysis

arXiv.org Machine Learning

Unsupervised machine learning methods can be of great help in many traditional engineering disciplines, where large amounts of labeled data are not readily available or are extremely difficult or costly to generate. Two specific examples are the structure of granular materials and the atomic structure of metallic glasses. While the former is critically important for global industries worth several hundred billion dollars, the latter is still a big puzzle in fundamental science. Common to both examples is that the particles are the elements of ensembles embedded in Euclidean space, and one can create a spatially embedded network to represent their key features. Some recent studies show that clustering, which generically refers to unsupervised learning, holds great promise in partitioning these networks. In many complex networks, the spatial information of nodes plays a very important role in determining the network properties, so understanding the structure of such networks is crucial. We have compared the performance of our newly developed modularity function with some of the well-known modularity functions. We performed this comparison by finding the best partition in 2D and 3D granular assemblies. We show that, for the class of networks considered in this article, our method produces much better results than the competing methods.
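
For readers unfamiliar with modularity functions, the sketch below computes the classic Newman-Girvan modularity of a partition, Q = sum over communities c of (L_c / m) - (d_c / 2m)^2, where L_c counts edges inside c, d_c is the total degree of c, and m is the number of edges. This is a widely used baseline and is not the new function proposed in the paper; the abstract does not say which functions were compared, and the example graph is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    degree = defaultdict(int)    # summed degree of each community's nodes
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Splitting the graph at the bridge scores higher than placing every node in one community. Note that this function ignores node positions entirely, which is the kind of information a modularity function adapted to spatially embedded networks would need to take into account.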


Top Libraries For Quick Implementation Of GANs

#artificialintelligence

"GANs and the variations are the most interesting idea in the last 10 years in ML." The potential of Generative Adversarial Networks (GANs) was already witnessed at the Christie's auction a couple of years ago, when the painting titled Edmond de Belamy, from La Famille de Belamy was sold for a whopping $432,500, and it now hangs opposite the works of pop art geniuses like Andy Warhol. The applications of GANs have made their presence felt on esoteric trading floors, in nuclear facilities and even in the offices of presidents. The rise in their popularity indicates that their usage is limited only by the imagination of their users. In the era of APIs, it's a no-brainer not to build algorithms from scratch.


Mapping the world to help aid workers, with weakly, semi-supervised learning

#artificialintelligence

When disaster or disease strikes, relief agencies respond more effectively when they have detailed mapping tools to know exactly where to deliver assistance. But extremely reliable and precise maps often are not available. So, our team, composed of artificial intelligence researchers and data scientists in Facebook's Boston office, used our computer vision expertise to create and share population density maps that are more accurate and higher resolution than any of their predecessors. Building on our previous publication of similar high-resolution population maps for 22 countries, we're now releasing new maps of the majority of the African continent, and the project will eventually map nearly the whole world's population. When it is completed, humanitarian agencies will be able to determine how populations are distributed even in remote areas, so that health care workers can better reach households and relief workers can better distribute aid.


Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

arXiv.org Machine Learning

While semi-supervised learning (SSL) has proven to be a promising way to leverage unlabeled data when labeled data is scarce, existing SSL algorithms typically assume that training class distributions are balanced. However, SSL algorithms trained under imbalanced class distributions can suffer severely when generalizing to a balanced testing criterion, since they rely on pseudo-labels of unlabeled data that are biased toward majority classes. To alleviate this issue, we formulate a convex optimization problem to softly refine the pseudo-labels generated from the biased model, and develop a simple algorithm, named Distribution Aligning Refinery of Pseudo-label (DARP), that solves it provably and efficiently. Under various class-imbalanced semi-supervised scenarios, we demonstrate the effectiveness of DARP and its compatibility with state-of-the-art SSL schemes.
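
The idea of softly refining biased pseudo-labels can be illustrated with a minimal sketch that alternately rescales classes toward a target distribution and renormalizes each instance. This is a simple iterative-scaling stand-in chosen for illustration, not the DARP formulation or its solver; the function name, the target proportions, and the example probabilities are hypothetical, and the sketch assumes all probabilities are strictly positive.

```python
def align_pseudo_labels(probs, target, iters=50):
    """Softly rescale pseudo-labels so class totals approach `target` proportions.

    probs: list of per-instance class probability lists (each row sums to 1).
    target: desired class proportions (sums to 1).
    """
    n, k = len(probs), len(target)
    refined = [row[:] for row in probs]
    for _ in range(iters):
        # Class step: scale each class so its total mass equals n * target[c].
        totals = [sum(row[c] for row in refined) for c in range(k)]
        for row in refined:
            for c in range(k):
                if totals[c] > 0:
                    row[c] *= n * target[c] / totals[c]
        # Instance step: renormalize each row to a valid distribution.
        for row in refined:
            s = sum(row)
            for c in range(k):
                row[c] /= s
    return refined

# Hypothetical pseudo-labels biased toward class 0, aligned to a balanced target.
biased = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
refined = align_pseudo_labels(biased, target=[0.5, 0.5])
class_means = [sum(row[c] for row in refined) / len(refined) for c in range(2)]
print(class_means)
```

After alignment the average predicted mass per class matches the target, while the relative ordering of instances within a class is preserved, so the most confident class-0 instance remains the most confident.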


Microsoft's SoftNER AI uses unsupervised learning to help triage cloud service outages

#artificialintelligence

Microsoft is using unsupervised learning techniques to extract knowledge about disruptions to cloud services. In a paper published on the preprint server arXiv.org, they claim that it eliminates the need to annotate a large amount of training data while scaling to a high volume of timeouts, slow connections, and other product interruptions. Structured information has inherent value, particularly in the high-stakes cloud and web operations domains. Not only can it be used to build AI models tailored to tasks like triaging, but it can also save engineers time and effort by automating processes like running checks on resources.