Collaborating Authors

 mikolov


Breaking the Activation Function Bottleneck through Adaptive Parameterization

Neural Information Processing Systems

Adaptive parameterization is a means of increasing this flexibility and thereby increasing the model's capacity to learn non-linear patterns. We focus on the feed-forward layer, $f(x) := \phi(Wx + b)$, for some activation function $\phi: \mathbb{R} \to \mathbb{R}$. Define the pre-activation layer as $a = A(x) := Wx + b$ and denote by $g(a) := \phi(a)/a$ the activation effect of $\phi$ given $a$, where division is element-wise.
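The definitions in this abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the choice of tanh for the activation, the layer sizes, and the helper name `activation_effect` are all assumptions made for illustration. It only shows that the layer output can be rewritten as the pre-activation scaled element-wise by the activation effect.

```python
import numpy as np

def activation_effect(a, phi=np.tanh):
    """Return g(a) = phi(a) / a, with element-wise division.

    Hypothetical helper for illustration. It is undefined where a == 0,
    so this sketch assumes all pre-activations are non-zero.
    """
    a = np.asarray(a, dtype=float)
    return phi(a) / a

# Assumed toy dimensions: 4 inputs, 3 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)

a = W @ x + b             # pre-activation a = A(x) = Wx + b
g = activation_effect(a)  # activation effect g(a) = phi(a) / a
f = np.tanh(a)            # layer output f(x) = phi(Wx + b)

# The layer output equals the pre-activation rescaled by g(a).
reconstructed = g * a
```

Under these assumptions `reconstructed` matches `f`, which is the sense in which the activation function acts as an input-dependent scaling of the linear map.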


Review for NeurIPS paper: OTLDA: A Geometry-aware Optimal Transport Approach for Topic Modeling

Neural Information Processing Systems

Additional Feedback: Some minor suggestions and typos:
- Line 21: missing an "and".
- Line 33: "while other developed" -> "while other authors developed".
- Line 50, and elsewhere in the paper: it is stated that LDA/PLSI use a squared Euclidean loss/distance. This is untrue: both models use likelihood-based inference with a multinomial model, and/or Bayesian inference. The older LSI model uses a squared loss, but even the PLSI paper argued that this is insufficient (the implicit Gaussian assumption from squared errors does not hold with small counts as in text data), which motivates the probabilistic modeling approach in PLSI and LDA.
The other papers by Mikolov from 2013 are more fundamental references which are better here, especially: Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality.


Collapse of Self-trained Language Models

arXiv.org Artificial Intelligence

In various fields of knowledge creation, including science, new ideas often build on pre-existing information. In this work, we explore this concept within the context of language models. Specifically, we explore the potential of self-training models on their own outputs, akin to how humans learn and build on their previous thoughts and actions. While this approach is intuitively appealing, our research reveals its practical limitations. We find that extended self-training of the GPT-2 model leads to a significant degradation in performance, resulting in repetitive and collapsed token output.


Research Papers for NLP Beginners - KDnuggets

#artificialintelligence

If you're new to the world of data and have a particular interest in NLP (Natural Language Processing), you're probably looking for resources to help you gain a better understanding. You have probably come across many different research papers and are unsure which one to choose. Because let's face it, they're not short and they do consume a lot of brain power. So it would be smart to choose the right one that will benefit your path to mastering NLP. I have done some research and have collected a few NLP research papers that have been highly recommended for newcomers to the field and for building overall NLP knowledge.


Hyperbolic Centroid Calculations for Text Classification

arXiv.org Artificial Intelligence

A new development in NLP is the construction of hyperbolic word embeddings. As opposed to their Euclidean counterparts, hyperbolic embeddings are represented not by vectors, but by points in hyperbolic space. This makes the most common basic scheme for constructing document representations, namely the averaging of word vectors, meaningless in the hyperbolic setting. We reinterpret the vector mean as the centroid of the points represented by the vectors, and investigate various hyperbolic centroid schemes and their effectiveness at text classification.
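One way to picture a centroid of points in hyperbolic space is the Einstein midpoint in the Klein model, sketched below. This is only an illustration of the general idea: the abstract does not say which centroid schemes the paper evaluates, so the choice of the Einstein midpoint, the function name `einstein_midpoint`, and the toy 2-D points are all assumptions.

```python
import numpy as np

def einstein_midpoint(points):
    """Einstein midpoint of points in the Klein model of hyperbolic space.

    Each point must lie strictly inside the unit ball. The midpoint is a
    weighted Euclidean average, with each point weighted by its Lorentz
    factor gamma_i = 1 / sqrt(1 - ||x_i||^2).
    """
    pts = np.asarray(points, dtype=float)
    sq_norms = np.sum(pts ** 2, axis=1)
    if np.any(sq_norms >= 1.0):
        raise ValueError("all points must lie strictly inside the unit ball")
    gamma = 1.0 / np.sqrt(1.0 - sq_norms)
    return (gamma[:, None] * pts).sum(axis=0) / gamma.sum()

# Hypothetical word embeddings, given as Klein-model coordinates.
word_points = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.3]])
centroid = einstein_midpoint(word_points)
```

Points closer to the boundary of the ball receive larger weights, so the result generally differs from the plain Euclidean average of the coordinates, although it still lies inside the unit ball.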


Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks

arXiv.org Machine Learning

Word2Vec is a prominent tool for Natural Language Processing (NLP) tasks. Similar inspiration is found in distributed embeddings for state-of-the-art (sota) deep neural networks. However, a wrong combination of hyper-parameters can produce poor-quality vectors. The objective of this work is to show that an optimal combination of hyper-parameters exists and to evaluate various combinations. We compare them with the original model released by Mikolov. Both intrinsic and extrinsic (downstream) evaluations, including Named Entity Recognition (NER) and Sentiment Analysis (SA), were carried out. The downstream tasks reveal that the best model is task-specific, that high analogy scores don't necessarily correlate positively with F1 scores, and that the same applies to more data. Increasing vector dimension size after a point leads to poor quality or performance. If ethical considerations to save time, energy and the environment are made, then reasonably smaller corpora may do just as well or even better in some cases. Besides, using a small corpus, we obtain better human-assigned WordSim scores, corresponding Spearman correlation and better downstream (NER & SA) performance compared to Mikolov's model, trained on a 100-billion-word corpus.


Towards non-toxic landscapes: Automatic toxic comment detection using DNN

arXiv.org Machine Learning

The spectacular expansion of the Internet has led to the development of a new research problem in the natural language processing field: automatic toxic comment detection, since many countries prohibit hate speech in public media. There is no clear and formal definition of hate, offensive, toxic and abusive speech. In this article, we put all these terms under the "umbrella" of toxic speech. The contribution of this paper is the design of binary classification and regression-based approaches aiming to predict whether a comment is toxic or not. We compare different unsupervised word representations and different DNN classifiers. Moreover, we study the robustness of the proposed approaches to adversarial attacks by adding one (healthy or toxic) word. We evaluate the proposed methodology on the English Wikipedia Detox corpus. Our experiments show that using BERT fine-tuning outperforms feature-based BERT, Mikolov's word embedding or fastText representations with different DNN classifiers.


Word Embedding for Response-To-Text Assessment of Evidence

arXiv.org Artificial Intelligence

Manually grading the Response to Text Assessment (RTA) is labor intensive. Therefore, an automatic method is being developed for scoring analytical writing when the RTA is administered in large numbers of classrooms. Our long-term goal is to also use this scoring method to provide formative feedback to students and teachers about students' writing quality. As a first step towards this goal, interpretable features for automatically scoring the evidence rubric of the RTA have been developed. In this paper, we present a simple but promising method for improving evidence scoring by employing the word embedding model. We evaluate our method on corpora of responses written by upper elementary students.


You have no idea what artificial intelligence really does

#artificialintelligence

WHEN SOPHIA THE ROBOT first switched on, the world couldn't get enough. It had a cheery personality, it joked with late-night hosts, it had facial expressions that echoed our own. Here it was, finally -- a robot plucked straight out of science fiction, the closest thing to true artificial intelligence that we had ever seen. There's no doubt that Sophia is an impressive piece of engineering. It didn't take much to convince people of Sophia's apparent humanity -- many of Futurism's own articles refer to the robot as "her."


You have no idea what artificial intelligence really does

#artificialintelligence

But as Sophia became more popular and people took a closer look, cracks emerged. It became harder to believe that Sophia was the all-encompassing artificial intelligence that we all wanted it to be. Over time, articles that might have once oohed and ahhed about Sophia's conversational skills became more focused on the fact that they were partially scripted in advance. Ben Goertzel, CEO of SingularityNET and Chief Scientist of Hanson Robotics, isn't under any illusions about what Sophia is capable of. "Sophia and the other Hanson robots are not really 'pure' as computer science research systems, because they combine so many different pieces and aspects in complex ways. They are not pure learning systems, but they do involve learning on various levels (learning in their neural net visual systems, learning in their OpenCog dialogue systems, etc.)," he told Futurism.