AITopics

2302.05454

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceJan-24-2023

Weakly Supervised Headline Dependency Parsing

Benton, Adrian, Shi, Tianze, İrsoy, Ozan, Malioutov, Igor

English news headlines form a register with unique syntactic properties that have been documented in linguistics literature since the 1930s. However, headlines have received surprisingly little attention from the NLP syntactic parsing community. We aim to bridge this gap by providing the first news headline corpus of Universal Dependencies annotated syntactic dependency trees, which enables us to evaluate existing state-of-the-art dependency parsers on news headlines. To improve English news headline parsing accuracies, we develop a projection method to bootstrap silver training data from unlabeled news headline-article lead sentence pairs. Models trained on silver headline parses demonstrate significant improvements in performance over models trained solely on gold-annotated long-form texts. Ultimately, we find that, although projected silver training data improves parser performance across different news outlets, the improvement is moderated by constructions idiosyncratic to outlet.

artificial intelligence, machine learning, natural language, (17 more...)

2301.10371

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry: Media > News (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceApr-28-2021

Diversity-Aware Batch Active Learning for Dependency Parsing

Shi, Tianze, Benton, Adrian, Malioutov, Igor, İrsoy, Ozan

While the predictive performance of modern statistical dependency parsers relies heavily on the availability of expensive expert-annotated treebank data, not all annotations contribute equally to the training of the parsers. In this paper, we attempt to reduce the number of labeled examples needed to train a strong dependency parser using batch active learning (AL). In particular, we investigate whether enforcing diversity in the sampled batches, using determinantal point processes (DPPs), can improve over their diversity-agnostic counterparts. Simulation experiments on an English newswire corpus show that selecting diverse batches with DPPs is superior to strong selection strategies that do not enforce batch diversity, especially during the initial stages of the learning process. Additionally, our diversityaware strategy is robust under a corpus duplication setting, where diversity-agnostic sampling strategies exhibit significant degradation.

computational linguistics, neural network, text processing, (18 more...)

2104.13936

Country:

Europe (1.00)
North America > Canada (0.68)
North America > United States > California (0.46)

Genre: Research Report (0.64)

Industry:

Banking & Finance (0.93)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

arXiv.org Artificial IntelligenceApr-28-2021

Learning Syntax from Naturally-Occurring Bracketings

Shi, Tianze, İrsoy, Ozan, Malioutov, Igor, Lee, Lillian

Naturally-occurring bracketings, such as answer fragments to natural language questions and hyperlinks on webpages, can reflect human syntactic intuition regarding phrasal boundaries. Their availability and approximate correspondence to syntax make them appealing as distant information sources to incorporate into unsupervised constituency parsing. But they are noisy and incomplete; to address this challenge, we develop a partial-brackets-aware structured ramp loss in learning. Experiments demonstrate that our distantly-supervised models trained on naturally-occurring bracketing data are more accurate in inducing syntactic structures than competing unsupervised systems. On the English WSJ corpus, our models achieve an unlabeled F1 score of 68.9 for constituency parsing.

computational linguistics, deep learning, neural network, (19 more...)

2104.13933

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Pennsylvania (0.28)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Machine LearningDec-30-2020

k\=oan: A Corrected CBOW Implementation

İrsoy, Ozan, Benton, Adrian, Stratos, Karl

It is a common belief in the NLP community that continuous bag-of-words (CBOW) word embeddings tend to underperform skip-gram (SG) embeddings. We find that this belief is founded less on theoretical differences in their training objectives but more on faulty CBOW implementations in standard software libraries such as the official implementation word2vec.c and Gensim. We show that our correct implementation of CBOW yields word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks while being more than three times as fast to train. We release our implementation, k\=oan, at https://github.com/bloomberg/koan.

implementation, neural network, text processing, (21 more...)

2012.15332

Country:

North America > United States > Louisiana (0.14)
Europe > Middle East > Malta (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.91)

arXiv.org Artificial IntelligenceOct-21-2020

Semantic Role Labeling as Syntactic Dependency Parsing

Shi, Tianze, Malioutov, Igor, İrsoy, Ozan

We reduce the task of (span-based) PropBank-style semantic role labeling (SRL) to syntactic dependency parsing. Our approach is motivated by our empirical analysis that shows three common syntactic patterns account for over 98% of the SRL annotations for both English and Chinese data. Based on this observation, we present a conversion scheme that packs SRL annotations into dependency tree representations through joint labels that permit highly accurate recovery back to the original format. This representation allows us to train statistical dependency parsers to tackle SRL and achieve competitive performance with the current state of the art. Our findings show the promise of syntactic dependency trees in encoding semantic role relations within their syntactic domain of locality, and point to potential further integration of syntactic methods into semantic role labeling in the future.

deep learning, neural network, proceedings, (21 more...)

2010.1117

Country:

Asia (1.00)
North America > United States > California (0.46)
North America > United States > Massachusetts (0.28)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningAug-2-2019

Dialogue Act Classification in Group Chats with DAG-LSTMs

İrsoy, Ozan, Gosangi, Rakesh, Zhang, Haimin, Wei, Mu-Hsin, Lund, Peter, Pappadopulo, Duccio, Fahy, Brendan, Nephytou, Neophytos, Ortiz, Camilo

Dialogue act (DA) classification has been studied for the past two decades and has several key applications such as workflow automation and conversation analytics. Researchers have used, to address this problem, various traditional machine learning models, and more recently deep neural network models such as hierarchical convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. In this paper, we introduce a new model architecture, directed-acyclic-graph LSTM (DAG-LSTM) for DA classification. A DAG-LSTM exploits the turn-taking structure naturally present in a multi-party conversation, and encodes this relation in its model structure. Using the STAC corpus, we show that the proposed method performs roughly 0.8% better in accuracy and 1.2% better in macro-F1 score when compared to existing methods. The proposed method is generic and not limited to conversation applications.

deep learning, neural network, utterance, (18 more...)

1908.01821

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.14)

Genre:

Overview (0.67)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningMay-1-2019

On Expected Accuracy

İrsoy, Ozan

We empirically investigate the (negative) expected accuracy as an alternative loss function to cross entropy (negative log likelihood) for classification tasks. Coupled with softmax activation, it has small derivatives over most of its domain, and is therefore hard to optimize. A modified, leaky version is evaluated on a variety of classification tasks, including digit recognition, image classification, sequence tagging and tree tagging, using a variety of neural architectures such as logistic regression, multilayer perceptron, CNN, LSTM and Tree-LSTM. We show that it yields comparable or better accuracy compared to cross entropy. Furthermore, the proposed objective is shown to be more robust to label noise.

accuracy, deep learning, neural network, (20 more...)

1905.00448

Genre:

Research Report > New Finding (0.36)
Research Report > Experimental Study (0.36)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningDec-25-2018

Dropout Regularization in Hierarchical Mixture of Experts

İrsoy, Ozan, Alpaydın, Ethem

Dropout is a very effective method in preventing overfitting and has become the go-to regularizer for multi-layer neural networks in recent years. Hierarchical mixture of experts is a hierarchically gated model that defines a soft decision tree where leaves correspond to experts and decision nodes correspond to gating models that softly choose between its children, and as such, the model defines a soft hierarchical partitioning of the input space. In this work, we propose a variant of dropout for hierarchical mixture of experts that is faithful to the tree hierarchy defined by the model, as opposed to having a flat, unitwise independent application of dropout as one has with multi-layer perceptrons. We show that on a synthetic regression data and on MNIST and CIFAR-10 datasets, our proposed dropout mechanism prevents overfitting on trees with many levels improving generalization and providing smoother fits.

deep learning, dropout, neural network, (20 more...)

1812.10158

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Machine LearningApr-6-2018

Continuously Constructive Deep Neural Networks

İrsoy, Ozan, Alpaydın, Ethem

Traditionally, deep learning algorithms update the network weights whereas the network architecture is chosen manually, using a process of trial and error. In this work, we propose two novel approaches that automatically update the network structure while also learning its weights. The novelty of our approach lies in our parameterization where the depth, or additional complexity, is encapsulated continuously in the parameter space through control parameters that add additional complexity. We propose two methods: In tunnel networks, this selection is done at the level of a hidden unit, and in budding perceptrons, this is done at the level of a network layer; updating this control parameter introduces either another hidden unit or another hidden layer. We show the effectiveness of our methods on the synthetic two-spirals data and on two real data sets of MNIST and MIRFLICKR, where we see that our proposed methods, with the same set of hyperparameters, can correctly adjust the network complexity to the task complexity.

deep learning, neural network, tunnel network, (20 more...)

1804.02491

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)