reduction


Telcos collaborate to scale the benefits of AIOps - TM Forum Inform

#artificialintelligence

The AIOps Catalyst team's work has resulted in a new collaborative workstream focused on the topic within TM Forum. Artificial intelligence (AI) offers huge opportunities for communications service providers (CSPs) to do things better, faster and cheaper. In fact, they have no choice but to introduce AI into operations and business processes due to growing complexity and the sheer volume of data and transactions. However, as well as delivering huge benefits, the introduction of AI also creates new challenges relating to the management of services and processes. A TM Forum Catalyst team is taking a two-pronged approach, tackling both of these areas simultaneously to ensure CSPs – and their customers – reap the rewards of AI.


Reducing Reparameterization Gradient Variance

Neural Information Processing Systems

Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparameterization gradients, or gradient estimates computed via the "reparameterization trick," represent a class of noisy gradients often used in Monte Carlo variational inference (MCVI). However, when these gradient estimators are too noisy, the optimization procedure can be slow or fail to converge. One way to reduce noise is to generate more samples for the gradient estimate, but this can be computationally expensive. Instead, we view the noisy gradient as a random variable, and form an inexpensive approximation of the generating procedure for the gradient sample.
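
As a toy illustration (not the paper's variance-reduction method), the sketch below forms a reparameterization gradient for a Gaussian variational distribution and shows how the estimator's variance falls as more samples are drawn; the target f, the helper reparam_grad and all numbers are assumptions made for the example.

```python
# Minimal sketch: a reparameterization gradient for q(z; mu, sigma),
# estimating d/d(mu, log_sigma) of E_q[f(z)] with f(z) = log N(z; 0, 1).
import numpy as np

rng = np.random.default_rng(0)

def f(z):                       # illustrative target log-density (standard normal)
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def df_dz(z):                   # its derivative
    return -z

def reparam_grad(mu, log_sigma, n_samples=1):
    """Monte Carlo estimate via the reparameterization z = mu + sigma * eps."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps        # reparameterized samples
    g = df_dz(z)                # pathwise derivative through z
    grad_mu = g.mean()
    grad_log_sigma = (g * eps * sigma).mean()
    return np.array([grad_mu, grad_log_sigma])

# Variance of the estimator shrinks as more samples are drawn, at extra cost.
for n in (1, 10, 100):
    grads = np.stack([reparam_grad(0.5, -1.0, n) for _ in range(2000)])
    print(n, grads.var(axis=0))
```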


Correlated Bigram LSA for Unsupervised Language Model Adaptation

Neural Information Processing Systems

We propose using correlated bigram LSA for unsupervised LM adaptation for automatic speech recognition. The model is trained using efficient variational EM and smoothed using the proposed fractional Kneser-Ney smoothing, which handles fractional counts. Our approach scales to large training corpora via bootstrapping of bigram LSA from unigram LSA. For LM adaptation, unigram and bigram LSA are integrated into the background N-gram LM via marginal adaptation and linear interpolation, respectively. Experimental results show that applying unigram and bigram LSA together yields a 6%--8% relative perplexity reduction and a 0.6% absolute character error rate (CER) reduction compared to applying only unigram LSA on the Mandarin RT04 test set.
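
As a toy illustration of the linear-interpolation step only (the probabilities, the weight lam and the helper interpolate are made up for the example, not taken from the paper), the following sketch blends a background next-word distribution with an adapted one:

```python
# Minimal sketch: p(w|h) = lam * p_adapted(w|h) + (1 - lam) * p_background(w|h)
def interpolate(p_background, p_adapted, lam=0.3):
    return {w: lam * p_adapted.get(w, 0.0) + (1.0 - lam) * p
            for w, p in p_background.items()}

# Toy next-word distributions given the same history h.
p_bg  = {"stock": 0.10, "market": 0.05, "game": 0.20}
p_lsa = {"stock": 0.30, "market": 0.25, "game": 0.02}   # adapted (e.g. LSA-derived)

print(interpolate(p_bg, p_lsa, lam=0.3))
```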


Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition

Neural Information Processing Systems

We consider the problem of using nearest neighbor methods to provide a conditional probability estimate, P(y|a), when the number of labels y is large and the labels share some underlying structure. We propose a method for learning error-correcting output codes (ECOCs) to model the similarity between labels within a nearest neighbor framework. The learned ECOCs and nearest neighbor information are used to provide conditional probability estimates. We apply these estimates to the problem of acoustic modeling for speech recognition. We demonstrate an absolute reduction in word error rate (WER) of 0.9% (a 2.5% relative reduction in WER) on a lecture recognition task over a state-of-the-art baseline GMM model.
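
To make the ECOC-plus-nearest-neighbor idea concrete, here is a rough sketch that uses a fixed random codebook rather than the learned codes of the paper; the function p_y_given_a, the similarity rule and all numbers are illustrative assumptions.

```python
# Minimal sketch: labels whose codewords resemble the neighbours' codewords
# receive more mass in the conditional estimate P(y | a).
import numpy as np

rng = np.random.default_rng(1)
n_labels, code_len = 5, 8
codes = rng.integers(0, 2, size=(n_labels, code_len))    # one codeword per label

def p_y_given_a(neighbor_labels, temperature=1.0):
    """Turn neighbour labels into P(y|a) via codeword similarity."""
    mean_code = codes[neighbor_labels].mean(axis=0)       # soft codeword of the neighbourhood
    sims = -np.abs(codes - mean_code).sum(axis=1)         # negative L1 distance to each label's code
    logits = sims / temperature
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Five nearest neighbours of acoustic vector `a` carry these labels:
print(p_y_given_a(np.array([2, 2, 3, 2, 0])))
```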


Differential Privacy Has Disparate Impact on Model Accuracy

Neural Information Processing Systems

Differential privacy (DP) is a popular mechanism for training machine learning models with bounded leakage about the presence of specific points in the training data. The cost of differential privacy is a reduction in the model's accuracy. We demonstrate that in neural networks trained using differentially private stochastic gradient descent (DP-SGD), this cost is not borne equally: the accuracy of DP models drops much more for underrepresented classes and subgroups. For example, a gender classification model trained using DP-SGD exhibits much lower accuracy for black faces than for white faces. Critically, this gap is bigger in the DP model than in the non-DP model, i.e., if the original model is unfair, the unfairness becomes worse once DP is applied.
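
For readers unfamiliar with DP-SGD, the following is a minimal sketch of a single update with per-example gradient clipping and Gaussian noise; dp_sgd_step, the clip norm and the noise multiplier are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of one DP-SGD step: clip each example's gradient to norm C,
# sum, add Gaussian noise scaled by sigma * C, then average and take a step.
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g / max(1.0, norm / clip_norm))    # clip this example's gradient
    summed = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=w.shape)
    noisy_mean = (summed + noise) / len(per_example_grads)
    return w - lr * noisy_mean

w = np.zeros(3)
grads = [rng.normal(size=3) for _ in range(32)]           # stand-in per-example gradients
print(dp_sgd_step(w, grads))
```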


Multilabel reductions: what is my loss optimising?

Neural Information Processing Systems

Multilabel classification is a challenging problem arising in applications ranging from information retrieval to image tagging. A popular approach to this problem is to employ a reduction to a suitable series of binary or multiclass problems (e.g., computing a softmax based cross-entropy over the relevant labels). While such methods have seen empirical success, less is understood about how well they approximate two fundamental performance measures: precision@$k$ and recall@$k$. In this paper, we study five commonly used reductions, including the one-versus-all reduction, a reduction to multiclass classification, and normalised versions of the same, wherein the contribution of each instance is normalised by the number of relevant labels. Our main result is a formal justification of each reduction: we explicate their underlying risks, and show they are each consistent with respect to either precision or recall.
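
The two performance measures can be stated compactly in code; the sketch below (with made-up scores and relevant labels) computes precision@k and recall@k for a single instance.

```python
# Minimal sketch: precision@k and recall@k for one instance, given a score
# per label and the set of labels that are actually relevant.
import numpy as np

def precision_at_k(scores, relevant, k):
    top_k = np.argsort(scores)[::-1][:k]          # indices of the k highest-scoring labels
    hits = sum(1 for label in top_k if label in relevant)
    return hits / k

def recall_at_k(scores, relevant, k):
    top_k = np.argsort(scores)[::-1][:k]
    hits = sum(1 for label in top_k if label in relevant)
    return hits / len(relevant)

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])      # one score per label
relevant = {0, 2, 3}                               # relevant labels for this instance
print(precision_at_k(scores, relevant, 3), recall_at_k(scores, relevant, 3))
```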


A Reduction for Efficient LDA Topic Reconstruction

Neural Information Processing Systems

We present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction. The main technical idea is to show that the distribution over the documents generated by LDA can be transformed into a distribution for a much simpler generative model in which documents are generated from {\em the same set of topics} but have a much simpler structure: documents are single-topic and topics are chosen uniformly at random. Furthermore, this reduction is approximation preserving, in the sense that approximate distributions -- the only ones we can hope to compute in practice -- are mapped into approximate distributions in the simplified world. This opens up the possibility of efficiently reconstructing LDA topics in a roundabout way: compute an approximate document distribution from the given corpus, transform it into an approximate distribution for the single-topic world, and run a reconstruction algorithm in the uniform, single-topic world -- a much simpler task than direct LDA reconstruction.
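
As an illustrative sketch of the simpler target model (not the reduction itself), the snippet below generates single-topic documents with the topic chosen uniformly at random; the vocabulary size, topic count and names are assumptions made for the example.

```python
# Minimal sketch: each document picks ONE topic uniformly at random and draws
# every word from that topic, unlike LDA's per-word topic mixture.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_topics, doc_len = 10, 3, 6
topics = rng.dirichlet(np.ones(vocab_size), size=n_topics)    # word distribution per topic

def single_topic_document():
    k = rng.integers(n_topics)                                 # topic chosen uniformly
    return rng.choice(vocab_size, size=doc_len, p=topics[k])   # all words drawn from topic k

print([single_topic_document() for _ in range(3)])
```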


Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets

Neural Information Processing Systems

Sparse-Group Lasso (SGL) has been shown to be a powerful regression technique for simultaneously discovering group and within-group sparse patterns by using a combination of the l1 and l2 norms. However, in large-scale applications, the complexity of the regularizers entails great computational challenges. In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set. The two-layer reduction is able to quickly identify the inactive groups and the inactive features, respectively, which are guaranteed to be absent from the sparse representation and can be removed from the optimization. Existing feature reduction methods are only applicable for sparse models with one sparsity-inducing regularizer.
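
For concreteness, here is a sketch of a standard Sparse-Group Lasso penalty combining the l1 term with group-wise l2 terms; the sqrt-of-group-size weighting and the helper sgl_penalty are common conventions assumed for the example, not the TLFre method itself.

```python
# Minimal sketch: lam1 * ||beta||_1 + lam2 * sum_g sqrt(|g|) * ||beta_g||_2
import numpy as np

def sgl_penalty(beta, groups, lam1=0.1, lam2=0.1):
    l1 = lam1 * np.abs(beta).sum()                                    # within-group sparsity
    l2 = lam2 * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)  # group sparsity
    return l1 + l2

beta = np.array([0.5, 0.0, -0.2, 0.0, 0.0, 1.0])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]    # two feature groups
print(sgl_penalty(beta, groups))
```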


The pursuit of excellence in new-drug development

#artificialintelligence

We are living in a time of enormous scientific innovation and promise for improved human health. Our understanding of biology is expanding rapidly, alongside increased identification of novel targets and their associated modalities. Still, drug-development costs and timelines continue to rise, and the likelihood of success continues to fall. Collectively, the top 20 pharmaceutical companies spend approximately $60 billion on drug development each year, and the estimated average cost of bringing a drug to market (including drug failures) is now $2.6 billion--a 140 percent increase in the past ten years. We believe the time is right for a true step change in drug development.


Google AI model beats humans in detecting breast cancer

#artificialintelligence

In a ray of hope for those undergoing breast cancer screening, and for healthy women who receive false alarms from digital mammography, an Artificial Intelligence (AI)-based Google model has outperformed radiologists at spotting breast cancer from screening X-rays. Reading mammograms is a difficult task, even for experts, and can often result in both false positives and false negatives. In turn, these inaccuracies can lead to delays in detection and treatment, unnecessary stress for patients and a higher workload for radiologists who are already in short supply, Google said in a blog post on Wednesday. Google's AI model spotted breast cancer in de-identified screening mammograms (where identifiable information has been removed) with greater accuracy, fewer false positives and fewer false negatives than experts. "This sets the stage for future applications where the model could potentially support radiologists performing breast cancer screenings," said Shravya Shetty, Technical Lead, Google Health.