AITopics

2310.15511

Country:

North America > United States > California > San Diego County > San Diego (0.24)
North America > Central America (0.04)
South America > Uruguay (0.04)
(14 more...)

Genre:

Personal > Honors (0.46)
Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry:

Government (0.46)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fransiska, Mega, Pitaloka, Diah, Saripudin, null, Putra, Satrio, Sutawika, Lintang

Utilizing Weak Supervision To Generate Indonesian Conservation Dataset

arXiv.org Artificial IntelligenceOct-24-2023

Weak supervision has emerged as a promising approach for rapid and large-scale dataset creation in response to the increasing demand for accelerated NLP development. By leveraging labeling functions, weak supervision allows practitioners to generate datasets quickly by creating learned label models that produce soft-labeled datasets. This paper aims to show how such an approach can be utilized to build an Indonesian NLP dataset from conservation news text. We construct two types of datasets: multi-class classification and sentiment classification. We then provide baseline experiments using various pretrained language models. These baseline results demonstrate test performances of 59.79% accuracy and 55.72% F1-score for sentiment classification, 66.87% F1-score-macro, 71.5% F1-score-micro, and 83.67% ROC-AUC for multi-class classification. Additionally, we release the datasets and labeling functions used in this work for further research and exploration.

classification, dataset, keyword, (12 more...)

2310.11258

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > Indonesia (0.04)
Oceania > Australia (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Law (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

arXiv.org Artificial IntelligenceOct-24-2023

Beyond Semantics: Learning a Behavior Augmented Relevance Model with Self-supervised Learning

Chen, Zeyuan, Chen, Wei, Xu, Jia, Liu, Zhongyi, Zhang, Wei

Relevance modeling aims to locate desirable items for corresponding queries, which is crucial for search engines to ensure user experience. Although most conventional approaches address this problem by assessing the semantic similarity between the query and item, pure semantic matching is not everything. In reality, auxiliary query-item interactions extracted from user historical behavior data of the search log could provide hints to reveal users' search intents further. Drawing inspiration from this, we devise a novel Behavior Augmented Relevance Learning model for Alipay Search (BARL-ASe) that leverages neighbor queries of target item and neighbor items of target query to complement target query-item semantic matching. Specifically, our model builds multi-level co-attention for distilling coarse-grained and fine-grained semantic representations from both neighbor and target views. The model subsequently employs neighbor-target self-supervised learning to improve the accuracy and robustness of BARL-ASe by strengthening representation and logit learning. Furthermore, we discuss how to deal with the long-tail query-item matching of the mini apps search scenario of Alipay practically. Experiments on real-world industry data and online A/B testing demonstrate our proposal achieves promising performance with low latency.

behavior neighbor, proceedings, representation, (13 more...)

2308.05379

Country:

Europe > United Kingdom > England > West Midlands > Birmingham (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Inoue, Shumpei, Nguyen, Minh-Tien, Mizokuchi, Hiroki, Nguyen, Tuan-Anh D., Nguyen, Huu-Hiep, Le, Dung Tien

Towards Safer Operations: An Expert-involved Dataset of High-Pressure Gas Incidents for Preventing Future Failures

This paper introduces a new IncidentAI dataset for safety prevention. Different from prior corpora that usually contain a single task, our dataset comprises three tasks: named entity recognition, cause-effect extraction, and information retrieval. The dataset is annotated by domain experts who have at least six years of practical experience as high-pressure gas conservation managers. We validate the contribution of the dataset in the scenario of safety prevention. Preliminary results on the three tasks show that NLP techniques are beneficial for analyzing incident reports to prevent future failures. The dataset facilitates future research in NLP and incident management communities. The access to the dataset is also provided (the IncidentAI dataset is available at: https://github.com/Cinnamon/incident-ai-dataset).

information retrieval, large language model, machine learning, (23 more...)

2310.12074

Country:

North America > United States (1.00)
Asia > Vietnam (0.28)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Oil & Gas (1.00)
Transportation (0.70)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Hoseini, Sayed, Theissen-Lipp, Johannes, Quix, Christoph

Semantic Data Management in Data Lakes

In recent years, data lakes emerged as away to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on the application within data lake systems and scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontologybased data access. In each category, we cover the main techniques and their background, and compare latest research. Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.

data lake, data source, semantic model, (15 more...)

2310.15373

Country:

North America > Canada > Ontario (0.04)
Europe > Germany (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (0.92)
Energy (0.67)
Leisure & Entertainment (0.67)
Media > Film (0.45)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning

Wang, Hao, Chen, Xiahua, Wang, Rui, Chu, Chenhui

Extracting meaningful entities belonging to predefined categories from Visually-rich Form-like Documents (VFDs) is a challenging task. Visual and layout features such as font, background, color, and bounding box location and size provide important cues for identifying entities of the same type. However, existing models commonly train a visual encoder with weak cross-modal supervision signals, resulting in a limited capacity to capture these non-textual features and suboptimal performance. In this paper, we propose a novel \textbf{V}isually-\textbf{A}symmetric co\textbf{N}sisten\textbf{C}y \textbf{L}earning (\textsc{Vancl}) approach that addresses the above limitation by enhancing the model's ability to capture fine-grained visual and layout features through the incorporation of color priors. Experimental results on benchmark datasets show that our approach substantially outperforms the strong LayoutLM series baseline, demonstrating the effectiveness of our approach. Additionally, we investigate the effects of different color schemes on our approach, providing insights for optimizing model performance. We believe our work will inspire future research on multimodal information extraction.

computational linguistic, information, proceedings, (13 more...)

2310.14785

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Shanghai > Shanghai (0.05)
North America > United States > New York > New York County > New York City (0.04)
(11 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.64)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.64)

Zero-shot Query Reformulation for Conversational Search

Yang, Dayu, Zhang, Yue, Fang, Hui

As the popularity of voice assistants continues to surge, conversational search has gained increased attention in Information Retrieval. However, data sparsity issues in conversational search significantly hinder the progress of supervised conversational search methods. Consequently, researchers are focusing more on zero-shot conversational search approaches. Nevertheless, existing zero-shot methods face three primary limitations: they are not universally applicable to all retrievers, their effectiveness lacks sufficient explainability, and they struggle to resolve common conversational ambiguities caused by omission. To address these limitations, we introduce a novel Zero-shot Query Reformulation (ZeQR) framework that reformulates queries based on previous dialogue contexts without requiring supervision from conversational search data. Specifically, our framework utilizes language models designed for machine reading comprehension tasks to explicitly resolve two common ambiguities: coreference and omission, in raw queries. In comparison to existing zero-shot methods, our approach is universally applicable to any retriever without additional adaptation or indexing. It also provides greater explainability and effectively enhances query intent understanding because ambiguities are explicitly and proactively resolved. Through extensive experiments on four TREC conversational datasets, we demonstrate the effectiveness of our method, which consistently outperforms state-of-the-art baselines.

ambiguity, conversational search, query, (13 more...)

doi: 10.1145/3578337.3605143

2307.09384

Country:

North America > United States > Delaware > New Castle County > Newark (0.14)
Asia > Taiwan > Taiwan Province > Taipei (0.05)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Soares, Livio Baldini, Gillick, Daniel, Cole, Jeremy R., Kwiatkowski, Tom

NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders

Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that only requires 10-6% of the Transformer's FLOPs per document and can be served using commodity CPUs. When combined with a BM25 retriever, this approach matches the quality of a state-of-the art dual encoder retriever, that still requires an accelerator for query encoding. We introduce NAIL (Non-Autoregressive Indexing with Language models) as a model architecture that is compatible with recent encoder-decoder and decoder-only large language models, such as T5, GPT-3 and PaLM. This model architecture can leverage existing pre-trained checkpoints and can be fine-tuned for efficiently constructing document representations that do not require neural processing of queries.

query, representation, retrieval, (16 more...)

2305.14499

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.93)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

EDIS: Entity-Driven Image Search over Multimodal Web Content

Liu, Siqi, Feng, Weixi, Fu, Tsu-jui, Chen, Wenhu, Wang, William Yang

Making image retrieval methods practical for real-world search applications requires significant progress in dataset scales, entity comprehension, and multimodal information fusion. In this work, we introduce \textbf{E}ntity-\textbf{D}riven \textbf{I}mage \textbf{S}earch (EDIS), a challenging dataset for cross-modal image search in the news domain. EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description. Unlike datasets that assume a small set of single-modality candidates, EDIS reflects real-world web image search scenarios by including a million multimodal image-text pairs as candidates. EDIS encourages the development of retrieval models that simultaneously address cross-modal information fusion and matching. To achieve accurate ranking results, a model must: 1) understand named entities and events from text queries, 2) ground entities onto images or text descriptions, and 3) effectively fuse textual and visual representations. Our experimental results show that EDIS challenges state-of-the-art methods with dense entities and a large-scale candidate set. The ablation study also proves that fusing textual features with visual features is critical in improving retrieval results.

dataset, query, retrieval, (14 more...)

2305.13631

Country:

Europe > United Kingdom (0.14)
Asia > South Korea (0.14)
Asia > Cambodia > Preah Sihanouk Province > Sihanoukville (0.04)
(11 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance (0.93)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Liu, Nelson F., Zhang, Tianyi, Liang, Percy

Evaluating Verifiability in Generative Search Engines

Generative search engines directly generate responses to user queries, along with in-line citations. A prerequisite trait of a trustworthy generative search engine is verifiability, i.e., systems should cite comprehensively (high citation recall; all statements are fully supported by citations) and accurately (high citation precision; every cite supports its associated statement). We conduct human evaluation to audit four popular generative search engines -- Bing Chat, NeevaAI, perplexity.ai, and YouChat -- across a diverse set of queries from a variety of sources (e.g., historical Google user queries, dynamically-collected open-ended questions on Reddit, etc.). We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations: on average, a mere 51.5% of generated sentences are fully supported by citations and only 74.5% of citations support their associated sentence. We believe that these results are concerningly low for systems that may serve as a primary tool for information-seeking users, especially given their facade of trustworthiness. We hope that our results further motivate the development of trustworthy generative search engines and help researchers and users better understand the shortcomings of existing commercial systems.

precision, query, search engine, (16 more...)

2304.09848

Country:

Europe > United Kingdom (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Trinidad and Tobago (0.04)
Europe > Spain (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area (0.94)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)