 glen






GLEN: Generative Retrieval via Lexical Index Learning

arXiv.org Artificial Intelligence

Generative retrieval sheds light on a new paradigm of document retrieval, aiming to directly generate the identifier of a relevant document for a query. While it takes advantage of bypassing the construction of auxiliary index structures, existing studies face two significant challenges: (i) the discrepancy between the knowledge of pre-trained language models and identifiers and (ii) the gap between training and inference that poses difficulty in learning to rank. To overcome these challenges, we propose a novel generative retrieval method, namely Generative retrieval via LExical iNdex learning (GLEN). For training, GLEN effectively exploits a dynamic lexical identifier using a two-phase index learning strategy, enabling it to learn meaningful lexical identifiers and relevance signals between queries and documents. For inference, GLEN utilizes collision-free inference, using identifier weights to rank documents without additional overhead. Experimental results prove that GLEN achieves state-of-the-art or competitive performance against existing generative retrieval methods on various benchmark datasets, e.g., NQ320k, MS MARCO, and BEIR. The code is available at https://github.com/skleee/GLEN.


GLEN: General-Purpose Event Detection for Thousands of Types

arXiv.org Artificial Intelligence

The progress of event extraction research has been hindered by the absence of wide-coverage, large-scale datasets. To make event extraction systems more accessible, we build a general-purpose event detection dataset GLEN, which covers 205K event mentions with 3,465 different types, making it more than 20x larger in ontology than today's largest event dataset. GLEN is created by utilizing the DWD Overlay, which provides a mapping between Wikidata Qnodes and PropBank rolesets. This enables us to use the abundant existing annotation for PropBank as distant supervision. In addition, we also propose a new multi-stage event detection model CEDAR specifically designed to handle the large ontology size in GLEN. We show that our model exhibits superior performance compared to a range of baselines including InstructGPT. Finally, we perform error analysis and show that label noise is still the largest challenge for improving performance for this new dataset. Our dataset, code, and models are released at https://github.com/ZQS1943/GLEN.


Graph Laplacian Learning with Exponential Family Noise

arXiv.org Artificial Intelligence

A common challenge in applying graph machine learning methods is that the underlying graph of a system is often unknown. Although different graph inference methods have been proposed for continuous graph signals, inferring the graph structure underlying other types of data, such as discrete counts, is under-explored. In this paper, we generalize a graph signal processing (GSP) framework for learning a graph from smooth graph signals to the exponential family [...]

Learning the graph structure underlying a set of smooth signals is a classical problem in GSP. Well-established methods optimize a graph representation, usually the graph adjacency matrix or the graph Laplacian, so that the total variation of given signals will be minimal on the learned graph (Dong et al., 2016; Kalofolias, 2016; Egilmez et al., 2017; Kumar et al., 2020). However, smooth graph signals are rarely encountered in the real world and one is often required to deal with noisy signals.
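As a rough illustration of the smoothness criterion mentioned in this excerpt, the sketch below computes the standard Laplacian quadratic form x^T L x, which equals the sum over edges of w_ij (x_i - x_j)^2 and is one common way to measure the total variation of a signal on a graph. The 3-node path graph, the two signals, and the function names are hypothetical choices made for this example; they are not taken from the paper, and the paper's own objective may differ.

```python
def laplacian(weights):
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix
    with a zero diagonal (no self-loops)."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def total_variation(weights, signal):
    """Quadratic form x^T L x; small values mean the signal is smooth on the graph."""
    lap = laplacian(weights)
    n = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j] for i in range(n) for j in range(n))


# Hypothetical path graph 0 - 1 - 2 with unit edge weights (illustrative only).
W = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]

smooth = [1.0, 1.0, 1.0]   # constant signal: no variation across edges
rough = [1.0, -1.0, 1.0]   # sign flips across every edge

tv_smooth = total_variation(W, smooth)
tv_rough = total_variation(W, rough)
print(tv_smooth, tv_rough)
```

A graph learning method of the kind described would search over candidate graphs for one on which the observed signals score low under this measure, usually with extra constraints so that the empty graph is not a trivial solution.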


Unsupervised Domain Adaptation with Global and Local Graph Neural Networks in Limited Labeled Data Scenario: Application to Disaster Management

arXiv.org Artificial Intelligence

Identification and categorization of social media posts generated during disasters are crucial to reduce the suffering of the affected people. However, lack of labeled data is a significant bottleneck in learning an effective categorization system for a disaster. This motivates us to study the problem as unsupervised domain adaptation (UDA) between a previous disaster with labeled data (source) and a current disaster (target). However, if the amount of labeled data available is limited, it restricts the learning capabilities of the model. To handle this challenge, we utilize limited labeled data along with abundantly available unlabeled data, generated during a source disaster, to propose a novel two-part graph neural network. The first part extracts domain-agnostic global information by constructing a token-level graph across domains, and the second part preserves local instance-level semantics. In our experiments, we show that the proposed method outperforms state-of-the-art techniques by $2.74\%$ weighted F$_1$ score on average on two standard public datasets in the area of disaster management. We also report experimental results for granular actionable multi-label classification datasets in the disaster domain for the first time, on which we outperform BERT by $3.00\%$ on average w.r.t. weighted F$_1$. Additionally, we show that our approach can retain performance when very limited labeled data is available.
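For readers unfamiliar with the weighted F$_1$ metric reported above, here is a minimal sketch of the usual single-label definition: per-class F$_1$ scores averaged with weights proportional to each class's share of the true labels. The function name and the toy labels are hypothetical, and the paper's exact evaluation protocol (especially for its multi-label datasets) is not specified in this abstract.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (single-label case)."""
    support = Counter(y_true)          # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1  # weight each class by its support
    return score


# Toy labels, invented for illustration only.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Weighting by support keeps a frequent class from being drowned out by rare ones, which matters when disaster-related categories are heavily imbalanced.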


Short Attention Span Theater: A Quick Look At Quibi's Launch Titles

NPR Technology

Two chefs get blasted in the face with a mystery entree on Dishmantled. The streaming service Quibi -- short for "quick bites" -- calls itself "the first entertainment platform designed specifically for your phone." Perfect for the busy, distracted, on-the-go consumer! Too bad none of us are on-the-going anywhere, these days.


Impact of Digital Transformation & AI on Precision Medicine by Mr Glen de Vries

#artificialintelligence

More data was collected in the life sciences industry in 2017 than ever before. With the rise of modern information systems and digital technologies like artificial intelligence, data analytics and data science capabilities have become critical for research and decision making across industries and academia. In this talk, Mr Glen de Vries, President and Co-Founder of Medidata Solutions, speaks about his career journey from lab scientist to President and Co-Founder of one of the largest public technology companies founded in New York, and the lessons he has learnt. He also discusses the role of digital transformation and artificial intelligence in the age of precision medicine, including how the industry can use them to improve and speed up clinical trial processes and decision making for accelerated outcomes for patients, as well as what the life sciences and clinical trials industry will look like in the next 10 years and the skills and talent the industry will need. Mr Glen de Vries is the President and Co-Founder of Medidata Solutions, the leading cloud platform for life sciences research. He has been driving Medidata's mission since the company's inception in 1999: powering smarter treatments and healthier people.