New Delhi, Jan 11 (IANS) When it comes to programming languages in India, Python is the most popular among students for its role in Artificial Intelligence (AI) applications, data science, Machine Learning (ML) and data analytics, US-based online education company Coursera has said. Python dominated the top 10 list with courses like "Programming for Everybody", "Python Data Structures", "Python for Data Science and AI" and more. Python is also easy to get started with, offers a lot of flexibility and is versatile. "Its open-source nature makes it easy to learn. A large number of libraries for tasks like web development, text processing and calculations add to its appeal," the report said.
Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, DL advances in language modeling, machine translation and paragraph understanding have been so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. To facilitate further research and applications of DL in this field, we provide a comprehensive review that categorizes and investigates existing DL methods for source code modeling and generation. To address the limitations of traditional source code models, we formulate common program learning tasks under an encoder-decoder framework. We then introduce recent DL mechanisms suitable for solving such problems. Finally, we present state-of-the-art practices and discuss their challenges, with some recommendations for practitioners and researchers.
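The encoder-decoder formulation mentioned above can be illustrated with a toy sketch: the encoder summarizes source tokens into a context vector, and the decoder emits target tokens conditioned on it. The vocabulary, random embeddings, mean-pooling encoder and greedy nearest-neighbour decoder below are illustrative assumptions for exposition, not the survey's actual models.

```python
import numpy as np

# Toy vocabulary with fixed-seed random embeddings (illustrative only).
rng = np.random.default_rng(0)
vocab = ["def", "return", "x", "+", "1", "<eos>"]
emb = {w: rng.normal(size=4) for w in vocab}

def encode(tokens):
    """Encoder: summarize the source sequence as the mean of its embeddings."""
    return np.mean([emb[t] for t in tokens], axis=0)

def decode(context, steps=3):
    """Decoder: greedily emit the vocabulary token whose embedding best
    matches the current (toy) hidden state, then update the state."""
    out, h = [], context
    for _ in range(steps):
        w = max(vocab, key=lambda t: float(emb[t] @ h))
        out.append(w)
        h = 0.5 * h + 0.5 * emb[w]  # toy recurrent state update
    return out

src = ["def", "x", "+", "1"]
print(decode(encode(src)))
```

Real systems replace the mean-pooling encoder and greedy decoder with trained recurrent, convolutional or attention-based networks, but the interface (sequence in, context, sequence out) is the same.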
Transfer learning --- transferring learned knowledge --- has brought a paradigm shift in the way models are trained. The lucrative benefits of improved accuracy and reduced training time have shown promise in training models with constrained computational resources and fewer training samples. Specifically, publicly available text-based models such as GloVe and BERT that are trained on large corpora have seen ubiquitous adoption in practice. In this paper, we ask, "can transfer learning in text prediction models be exploited to perform misclassification attacks?" As our main contribution, we present novel attack techniques that utilize unintended features learnt in the teacher (public) model to generate adversarial examples for student (downstream) models. To the best of our knowledge, ours is the first work to show that transfer learning from state-of-the-art word-based and sentence-based teacher models increases the susceptibility of student models to misclassification attacks. First, we propose a novel word-score based attack algorithm for generating adversarial examples against student models trained using a context-free word-level embedding model. On binary classification tasks trained using the GloVe teacher model, we achieve an average attack accuracy of 97% for IMDB Movie Reviews and 80% for Fake News Detection. For multi-class tasks, we divide the Newsgroup dataset into 6 and 20 classes and achieve an average attack accuracy of 75% and 41% respectively. Next, we present length-based and sentence-based misclassification attacks for the Fake News Detection task trained using a context-aware BERT model and achieve 78% and 39% attack accuracy respectively. Thus, our results motivate the need for designing training techniques that are robust to unintended feature learning, specifically for transfer-learned models.
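The abstract does not spell out the word-score attack, but the general family it belongs to can be sketched: score each word by how much the classifier's output drops when that word is removed, then replace the most influential word with a nearby word in the teacher's embedding space. The toy vocabulary, random embeddings and linear bag-of-embeddings classifier below are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["great", "terrible", "boring", "fine", "movie", "plot"]
emb = {w: rng.normal(size=8) for w in vocab}      # stand-in for teacher embeddings
w_clf = rng.normal(size=8)                        # toy linear student classifier

def score(tokens):
    """Student model: linear score over the mean of word embeddings."""
    return float(w_clf @ np.mean([emb[t] for t in tokens], axis=0))

def word_importance(tokens):
    """Leave-one-out importance: drop in score when word i is removed."""
    base = score(tokens)
    return {i: base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))}

def attack(tokens):
    """Replace the most influential word with its nearest embedding neighbour."""
    imp = word_importance(tokens)
    i = max(imp, key=imp.get)
    target = tokens[i]
    repl = max((w for w in vocab if w != target),
               key=lambda w: float(emb[w] @ emb[target]) /
                             (np.linalg.norm(emb[w]) * np.linalg.norm(emb[target])))
    adv = list(tokens)
    adv[i] = repl
    return adv

print(attack(["great", "movie", "plot"]))
```

The key point the paper makes is that the neighbourhood structure used for the substitution comes from the *public* teacher model, so an attacker needs no access to the student's weights.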
Two forces are driving a surge in the use of machine learning technology and other artificial intelligence-enabling technologies, according to industry analysts: the astounding growth of unstructured content and the use of robotic process automation (RPA) to automate content-related processes. Cognilytica says that between documents, images, emails, online data and videos, up to 90% of the content in the enterprise is in the form of unstructured data, which is growing at an astounding 55% to 65% per year. Consequently, Everest Group Research says intelligent automation technologies are applying ML at the points where RPA intersects and interoperates with content-related processes. The development of ML technologies has given rise to the ability to extract more information and intelligence from the wide range of content in the enterprise, whether structured or unstructured.
Manually grading the Response to Text Assessment (RTA) is labor-intensive. Therefore, an automatic method is being developed for scoring analytical writing when the RTA is administered in large numbers of classrooms. Our long-term goal is to also use this scoring method to provide formative feedback to students and teachers about students' writing quality. As a first step towards this goal, interpretable features for automatically scoring the evidence rubric of the RTA have been developed. In this paper, we present a simple but promising method for improving evidence scoring by employing a word embedding model. We evaluate our method on corpora of responses written by upper elementary students.
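One common way a word embedding model can improve an interpretable evidence feature is to count not only literal mentions of evidence words from the source text but also near-synonyms that are close in embedding space. The word list, random embeddings and similarity threshold below are illustrative assumptions, not the authors' features.

```python
import numpy as np

rng = np.random.default_rng(2)
words = ["malaria", "nets", "disease", "village", "happy", "school"]
emb = {w: rng.normal(size=6) for w in words}  # stand-in for trained embeddings

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def evidence_score(response, evidence_words, threshold=0.4):
    """Count response words that literally match, or lie close in
    embedding space to, some evidence word from the source text."""
    hits = 0
    for w in response:
        if w in evidence_words or any(cos(emb[w], emb[e]) > threshold
                                      for e in evidence_words):
            hits += 1
    return hits

print(evidence_score(["malaria", "happy"], ["malaria", "nets", "disease"]))
```

With trained embeddings, a response saying "sickness" could score against the evidence word "disease" even without an exact string match, which is what makes the feature both soft and still interpretable.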
Automatic estimation of the relative difficulty of a pair of questions is an important and challenging problem in community question answering (CQA) services. Few studies have addressed this problem. Past studies mostly leveraged the expertise of users answering the questions and barely considered other properties of CQA services such as metadata of users and posts, temporal information and textual content. In this paper, we propose DiffQue, a novel system that maps this problem to a network-aided edge directionality prediction problem. DiffQue starts by constructing a novel network structure that captures different notions of difficulty among a pair of questions. It then measures the relative difficulty of two questions by predicting the direction of a (virtual) edge connecting these two questions in the network. It leverages features extracted from the network structure, metadata of users/posts and the textual description of questions and answers. Experiments on datasets obtained from two CQA sites (further divided into four datasets) with human-annotated ground truth show that DiffQue outperforms four state-of-the-art methods by a significant margin (28.77% higher F1 score and 28.72% higher AUC than the best baseline). As opposed to the other baselines, (i) DiffQue appropriately responds to training noise, (ii) DiffQue is capable of adapting to multiple domains (CQA datasets), and (iii) DiffQue can efficiently handle the 'cold start' problem which may arise due to the lack of information for newly posted questions or newly arrived users.
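The reduction to edge directionality prediction can be sketched as: featurize each question, then classify the *difference* of the two feature vectors; the predicted sign gives the direction of the "harder-than" edge. The features and hand-set weights below are illustrative assumptions, not DiffQue's trained model.

```python
# Sketch: relative question difficulty as edge-direction prediction.
# Features and weights are illustrative, not the paper's.

def featurize(q):
    # [asker reputation, number of answers, hours until first answer]
    return [q["asker_rep"], q["n_answers"], q["hours_to_first_answer"]]

# Hand-set weights: fewer answers and a slower first answer suggest a
# harder question; a higher asker reputation does too.
W = [0.001, -1.0, 0.5]

def harder(q1, q2):
    """Return whichever question the linear model predicts is harder,
    i.e. the target of the directed 'harder-than' edge."""
    diff = [a - b for a, b in zip(featurize(q1), featurize(q2))]
    s = sum(w * d for w, d in zip(W, diff))
    return q1 if s > 0 else q2

q_easy = {"id": "A", "asker_rep": 50, "n_answers": 6, "hours_to_first_answer": 0.2}
q_hard = {"id": "B", "asker_rep": 9000, "n_answers": 1, "hours_to_first_answer": 30}
print(harder(q_easy, q_hard)["id"])  # → B
```

Classifying the pairwise difference rather than each question in isolation is what makes the prediction naturally antisymmetric: swapping the two questions flips the sign and hence the edge direction.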
In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for these dialogue systems and then presenting the evaluation methods relevant to that class.
Recent years have witnessed the rising popularity of Natural Language Processing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available even for those without a strong background in the field. Often the student is curious about a specific topic but does not quite know where to begin studying. To answer the question of "what should one learn first," we apply an embedding-based method to learn prerequisite relations for course concepts in the domain of NLP. We introduce LectureBank, a dataset containing 1,352 English lecture files collected from university courses, each classified according to an existing taxonomy, as well as 208 manually labeled prerequisite relation topics; the dataset is publicly available. The dataset will be useful for educational purposes such as lecture preparation and organization, as well as for applications such as reading list generation. Additionally, we experiment with neural graph-based networks and non-neural classifiers to learn these prerequisite relations from our dataset.
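An embedding-based prerequisite classifier typically sees an *ordered* pair of concepts, since the relation is asymmetric: the features for "A is a prerequisite of B" differ from those for "B is a prerequisite of A". A minimal sketch, with made-up concept names, untrained random embeddings and a random weight vector standing in for a trained classifier:

```python
import numpy as np

rng = np.random.default_rng(3)
concepts = ["probability", "language-models", "hmm", "pos-tagging"]
emb = {c: rng.normal(size=5) for c in concepts}  # stand-in for learned embeddings

# Untrained weights; a real classifier would fit these on labeled pairs.
W = rng.normal(size=10)

def prereq_score(a, b):
    """Probability-like score that concept a is a prerequisite of b,
    from a logistic model over the concatenated ordered pair."""
    x = np.concatenate([emb[a], emb[b]])
    return 1.0 / (1.0 + np.exp(-float(W @ x)))  # sigmoid

s_ab = prereq_score("probability", "language-models")
s_ba = prereq_score("language-models", "probability")
print(round(s_ab, 3), round(s_ba, 3))
```

Because the pair is concatenated in order rather than pooled symmetrically, the model can score the two directions differently, which is exactly what a prerequisite relation requires.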