Unsupervised Multi-Document Opinion Summarization as Copycat-Review Generation
Bražinskas, Arthur, Lapata, Mirella, Titov, Ivan
Summarization of opinions is the process of automatically creating text summaries that reflect subjective information expressed in input documents, such as product reviews. While most previous research in opinion summarization has focused on the extractive setting, i.e. selecting fragments of the input documents to produce a summary, we let the model generate novel sentences and hence produce fluent text. Supervised abstractive summarization methods typically rely on large quantities of document-summary pairs, which are expensive to acquire. In contrast, we consider the unsupervised setting; in other words, we do not use any summaries in training. We define a generative model for a multi-product review collection. Intuitively, we want to design the model so that, when generating a new review given a set of other reviews of the product, we can control the 'amount of novelty' going into the new review or, equivalently, vary the degree of deviation from the input reviews. At test time, when generating summaries, we force the novelty to be minimal and produce a text reflecting consensus opinions. We capture this intuition by defining a hierarchical variational autoencoder model. Both individual reviews and the products they correspond to are associated with stochastic latent codes, and the review generator ('decoder') has direct access to the text of input reviews through the pointer-generator mechanism. In experiments on Amazon and Yelp data, we show that by setting the review's latent code to its mean at test time, this model produces fluent and coherent summaries.
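The contrast between sampling a latent code during training and fixing it to its mean at test time can be illustrated with a minimal sketch. It assumes a diagonal Gaussian over the review code, which the abstract does not state, and the function name `sample_latent` and the example values are hypothetical; the sketch does not reproduce the authors' model.

```python
import math
import random


def sample_latent(mean, log_var, rng=None):
    """Return a latent code for a diagonal Gaussian (an assumed form).

    With an rng, draw z = mean + sigma * eps (reparameterized sample),
    which injects 'novelty'. Without one, return the mean itself,
    which corresponds to the minimal-novelty setting used at test time.
    """
    if rng is None:
        return list(mean)
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


# Hypothetical parameters of the review's latent distribution.
mean = [0.2, -1.0, 0.5]
log_var = [0.0, -2.0, 1.0]

train_code = sample_latent(mean, log_var, rng=random.Random(0))  # stochastic
test_code = sample_latent(mean, log_var)                         # deterministic
```

In a full model the decoder would condition on this code together with the input reviews; only the choice of code is shown here.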
The gradient complexity of linear regression
Braverman, Mark, Hazan, Elad, Simchowitz, Max, Woodworth, Blake
We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle. We show that for polynomial accuracy, $\Theta(d)$ calls to the oracle are necessary and sufficient even for a randomized algorithm. Our lower bound is based on a reduction to estimating the least eigenvalue of a random Wishart matrix. This simple distribution enables a concise proof, leveraging a few key properties of the random Wishart ensemble.
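To make the access model concrete, the following sketch wraps a matrix in an oracle that only answers matrix-vector products and counts how many times it is queried. It then runs plain power iteration against that oracle. Power iteration is used purely as a familiar illustration of an algorithm that fits this model; it is not the construction analyzed in the paper, and the class name `MatVecOracle` and the example matrix are assumptions.

```python
import math
import random


class MatVecOracle:
    """Exposes a symmetric matrix only through v -> A v and counts the calls."""

    def __init__(self, matrix):
        self._matrix = matrix
        self.calls = 0

    def __call__(self, v):
        self.calls += 1
        return [sum(a * x for a, x in zip(row, v)) for row in self._matrix]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(oracle, dim, num_queries, seed=0):
    """Approximate the top eigenvector using exactly num_queries oracle calls."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(num_queries):
        v = normalize(oracle(v))
    return v


def rayleigh_quotient(oracle, v):
    """Estimate the eigenvalue for a unit vector v (costs one more call)."""
    av = oracle(v)
    return sum(x * y for x, y in zip(v, av))


# Small example matrix with eigenvalues 3, 1, 1.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]

oracle = MatVecOracle(A)
v = power_iteration(oracle, dim=3, num_queries=50)
top = rayleigh_quotient(oracle, v)
print(f"estimated top eigenvalue: {top:.6f} after {oracle.calls} oracle calls")
```

The counter `oracle.calls` is the quantity that query-complexity bounds of this kind refer to: the algorithm never reads entries of the matrix directly.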
Machine Learning using the Variational Predictive Information Bottleneck with a Validation Set
Zellner (1988) modeled statistical inference in terms of information processing and postulated the Information Conservation Principle (ICP) between the input and output of the information processing block, showing that this yielded Bayesian inference as the optimum information processing rule. Recently, Alemi (2019) reviewed Zellner's work in the context of machine learning and showed that the ICP could be seen as a special case of a more general optimum information processing criterion, namely the Predictive Information Bottleneck Objective. However, Alemi modeled the model training step in machine learning as using training and test data sets only, and did not account for the use of a validation data set during training. The present note is an attempt to extend Alemi's information processing formulation of machine learning, and the predictive information bottleneck objective for model training, to the widely-used scenario where training utilizes not only a training but also a validation data set.
A Scalable Multilabel Classification to Deploy Deep Learning Architectures For Edge Devices
Odetola, Tolulope A., Oderhohwo, Ogheneuriri, Hasan, Syed Rafay
Convolutional Neural Networks (CNNs) have performed well in many applications such as object detection, pattern recognition, and video surveillance. CNNs carry out feature extraction on labelled data to perform classification. Multi-label classification assigns more than one label to a particular data sample in a data set. In multi-label classification, properties of a data point that are considered to be mutually exclusive are classified. However, existing multi-label classification requires some form of data pre-processing that involves cropping or tiling the training images. The computation and memory requirements of these multi-label CNN models make their deployment on edge devices challenging. In this paper, we propose a methodology that solves this problem by extending the capability of existing multi-label classification and providing models with lower latency that require less memory when deployed on edge devices. We make use of a single CNN model designed with multiple loss layers and multiple accuracy layers. This methodology is tested on state-of-the-art deep learning algorithms such as AlexNet, GoogleNet and SqueezeNet using the Stanford Cars Dataset and deployed on Raspberry Pi3. The results show that the proposed methodology achieves comparable accuracy with 1.8x fewer MACC operations, 0.97x reduction in latency and 0.5x, 0.84x and 0.97x reduction in size for the generated AlexNet, GoogleNet and SqueezeNet CNN models respectively, when compared to conventional ways of achieving multi-label classification such as hard-coding multi-label instances into single labels. The methodology also yields CNN models that achieve 50% fewer MACC operations and a 50% reduction in latency and size for the generated versions of AlexNet, GoogleNet and SqueezeNet respectively, when compared to the conventional approach of using 2 different single-labelled models to achieve multi-label classification.
Weakly Supervised Fine Tuning Approach for Brain Tumor Segmentation Problem
Pavlov, Sergey, Artemov, Alexey, Sharaev, Maksim, Bernstein, Alexander, Burnaev, Evgeny
Segmentation of tumors in brain MRI images is a challenging task, where most recent methods demand large volumes of data with pixel-level annotations, which are generally costly to obtain. In contrast, image-level annotations, where only the presence of lesion is marked, are generally cheap, generated in far larger volumes compared to pixel-level labels, and contain less labeling noise. In the context of brain tumor segmentation, both pixel-level and image-level annotations are commonly available; thus, a natural question arises whether a segmentation procedure could take advantage of both. In the present work we: 1) propose a learning-based framework that allows simultaneous usage of both pixel- and image-level annotations in MRI images to learn a segmentation model for brain tumor; 2) study the influence of comparative amounts of pixel- and image-level annotations on the quality of brain tumor segmentation; 3) compare our approach to the traditional fully-supervised approach and show that the performance of our method in terms of segmentation quality may be competitive.
Global Adaptive Generative Adjustment
Wang, Bin, Wang, Xiaofei, Guo, Jianhua
Many traditional signal recovery approaches perform well when based on the penalized likelihood. However, they face the difficulty of selecting the hyperparameters or tuning parameters in the penalties. In this article, we propose a global adaptive generative adjustment (GAGA) algorithm for signal recovery, in which multiple hyperparameters are automatically learned and alternately updated with the signal. We further prove that the output of our algorithm directly guarantees the consistency of model selection and the asymptotic normality of the signal estimate. Moreover, we also propose a variant of the GAGA algorithm that improves computational efficiency in high-dimensional data analysis. Finally, in the simulated experiment, we consider the consistency of the outputs of our algorithms, and compare our algorithms to other penalized likelihood methods: the Adaptive LASSO, the SCAD and the MCP. The simulation results support the efficiency of our algorithms for signal recovery, and demonstrate that our algorithms outperform the other algorithms.
Structure of Deep Neural Networks with a Priori Information in Wireless Tasks
Deep neural networks (DNNs) have been employed for designing wireless networks in many aspects, such as transceiver optimization, resource allocation, and information prediction. Existing works use either fully-connected DNNs or DNNs with specific structures that were designed in other domains. In this paper, we show that a priori information that widely exists in wireless tasks is permutation invariant. For these tasks, we propose a DNN with a special structure, where the weight matrices between layers of the DNN consist of only two smaller sub-matrices. With this form of parameter sharing, the number of model parameters is reduced, giving rise to low sample and computational complexity for training a DNN. We take predictive resource allocation as an example to show how the designed DNN can be applied for learning the optimal policy with unsupervised learning. Simulation results validate our analysis and show a dramatic gain of the proposed structure in terms of reducing training complexity.
INTRODUCTION
Deep neural networks (DNNs) have recently been introduced to design wireless networks in various aspects, including signal detection and channel estimation [1], multi-cell coordinated beamforming [2], inter-cell interference management [3], resource allocation [4]-[7], traffic load prediction [8], and uplink/downlink channel calibration [9].
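The parameter-sharing idea in the abstract above can be sketched as follows. The sketch assumes one particular arrangement that the abstract does not spell out: the layer's weight matrix is viewed as a K-by-K grid of blocks, with one sub-matrix `U` on every diagonal block and another sub-matrix `V` on every off-diagonal block. Under that assumption, permuting the K input blocks permutes the output blocks in the same way. The function names and numeric values are hypothetical.

```python
def matvec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def shared_layer(blocks, U, V):
    """Apply the block-structured weight matrix without materializing it.

    For input blocks x_1..x_K the k-th output block is
        y_k = U x_k + V * sum_{j != k} x_j,
    which equals a full weight matrix with U on the diagonal blocks
    and V on all off-diagonal blocks (the assumed arrangement).
    """
    total = [sum(col) for col in zip(*blocks)]
    out = []
    for x in blocks:
        others = [t - xi for t, xi in zip(total, x)]
        ux = matvec(U, x)
        vo = matvec(V, others)
        out.append([a + b for a, b in zip(ux, vo)])
    return out


def count_parameters(K, d_in, d_out):
    """Return (dense, shared) parameter counts for one layer."""
    dense = (K * d_out) * (K * d_in)   # unconstrained weight matrix
    shared = 2 * d_out * d_in          # only U and V are free
    return dense, shared


# Hypothetical sub-matrices and K = 3 input blocks of dimension 2.
U = [[1.0, 2.0], [0.0, 1.0]]
V = [[0.5, 0.0], [0.0, 0.5]]
blocks = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]

out = shared_layer(blocks, U, V)

# Reordering the input blocks reorders the output blocks identically.
perm = [2, 0, 1]
permuted_out = shared_layer([blocks[i] for i in perm], U, V)
print(permuted_out == [out[i] for i in perm])
print(count_parameters(K=3, d_in=2, d_out=2))
```

The dense count grows quadratically with K while the shared count does not depend on K, which is the source of the reduction in training complexity that the abstract describes.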
Probabilistic Similarity Networks
Normative expert systems have not become commonplace because they have been difficult to build and use. Over the past decade, however, researchers have developed the influence diagram, a graphical representation of a decision maker's beliefs, alternatives, and preferences that serves as the knowledge base of a normative expert system. Most people who have seen the representation find it intuitive and easy to use. Consequently, the influence diagram has overcome significantly the barriers to constructing normative expert systems. Nevertheless, building influence diagrams is not practical for extremely large and complex domains. In this book, I address the difficulties associated with the construction of the probabilistic portion of an influence diagram, called a knowledge map, belief network, or Bayesian network. I introduce two representations that facilitate the generation of large knowledge maps. In particular, I introduce the similarity network, a tool for building the network structure of a knowledge map, and the partition, a tool for assessing the probabilities associated with a knowledge map. I then use these representations to build Pathfinder, a large normative expert system for the diagnosis of lymph-node diseases (the domain contains over 60 diseases and over 100 disease findings). In an early version of the system, I encoded the knowledge of the expert using an erroneous assumption that all disease findings were independent, given each disease. When the expert and I attempted to build a more accurate knowledge map for the domain that would capture the dependencies among the disease findings, we failed. Using a similarity network, however, we built the knowledge-map structure for the entire domain in approximately 40 hours. Furthermore, the partition representation reduced the number of probability assessments required by the expert from 75,000 to 14,000.
Conversation Generation with Concept Flow
Zhang, Houyu, Liu, Zhenghao, Xiong, Chenyan, Liu, Zhiyuan
Human conversations naturally evolve around related entities and connected concepts, but may also shift from topic to topic. This paper presents ConceptFlow, which leverages commonsense knowledge graphs to explicitly model such conversation flows for better conversation response generation. ConceptFlow grounds the conversation inputs to the latent concept space and represents the potential conversation flow as a concept flow along the commonsense relations. The concept is guided by a graph attention mechanism that models the possibility of the conversation evolving towards different concepts. The conversation response is then decoded using the encodings of both utterance texts and concept flows, integrating the learned conversation structure in the concept space. Our experiments on Reddit conversations demonstrate the advantage of ConceptFlow over previous commonsense-aware dialog models and fine-tuned GPT-2 models, while using far fewer parameters but with explicit modeling of conversation structures. The rapid advancements of language modeling and natural language generation (NLG) techniques have enabled fully data-driven conversation models, which take user inputs (utterances) and directly generate natural language responses (Shang et al., 2015; Vinyals & Le, 2015; Li et al., 2016). On the other hand, current generation models may still degenerate into dull and repetitive content (Holtzman et al., 2019; Welleck et al., 2019), which, in conversation assistants, leads to irrelevant, off-topic, and unhelpful responses that damage the user experience (Tang et al., 2019; Zhang et al., 2018; Gao et al., 2019).
Towards An Angry-Birds-like Game System for Promoting Mental Well-being of Players Using Art-Therapy-embedded PCG
Fang, Zhou, Paliyawan, Pujana, Thawonmas, Ruck, Harada, Tomohiro
Zhou Fang (1), Pujana Paliyawan (2), Ruck Thawonmas (1) and Tomohiro Harada (1). (1) College of Information Science and Engineering, (2) Research Organization of Science and Technology, Ritsumeikan University, Japan. ruck@is.ritsumei.ac.jp
Abstract: This paper presents an integration of a game system and the art therapy concept for promoting the mental well-being of video game players. In the proposed game system, the player plays an Angry-Birds-like game in which levels are generated based on images the player draws. Upon finishing a game level, the player also receives positive feedback (praising words) on their drawing and the generated level from an Art Therapy AI. The proposed system is composed of three major parts: (1) a drawing recognizer that identifies what object is drawn by the player (Sketcher), (2) a level generator that converts the drawing image into a pixel image, then into a set of blocks representing a game level (PCG AI), and (3) the Art Therapy AI that encourages the player and improves their emotional state. This paper gives an overview of the system and explains how its major components function.