AITopics

2409.05405

Country:

Asia > China > Chongqing Province > Chongqing (0.05)
Asia > China > Hong Kong (0.04)
Africa > Togo (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

arXiv.org Artificial IntelligenceSep-10-2024

QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval

Kandula, Hemanth, Karakos, Damianos, Qiu, Haoling, Rozonoyer, Benjamin, Soboroff, Ian, Tarlin, Lee, Min, Bonan

Frequently, users of an Information Retrieval (IR) system start with an overarching information need (a.k.a., an analytic task) and proceed to define finer-grained queries covering various important aspects (i.e., sub-topics) of that analytic task. We present a novel, interactive system called $\textit{QueryBuilder}$, which allows a novice, English-speaking user to create queries with a small amount of effort, through efficient exploration of an English development corpus in order to rapidly develop cross-lingual information retrieval queries corresponding to the user's information needs. QueryBuilder performs near real-time retrieval of documents based on user-entered search terms; the user looks through the retrieved documents and marks sentences as relevant to the information needed. The marked sentences are used by the system as additional information in query formation and refinement: query terms (and, optionally, event features, which capture event $'triggers'$ (indicator terms) and agent/patient roles) are appropriately weighted, and a neural-based system, which better captures textual meaning, retrieves other relevant content. The process of retrieval and marking is repeated as many times as desired, giving rise to increasingly refined queries in each iteration. The final product is a fine-grained query used in Cross-Lingual Information Retrieval (CLIR). Our experiments using analytic tasks and requests from the IARPA BETTER IR datasets show that with a small amount of effort (at most 10 minutes per sub-topic), novice users can form $\textit{useful}$ fine-grained queries including in languages they don't understand. QueryBuilder also provides beneficial capabilities to the traditional corpus exploration and query formation process. A demonstration video is released at https://vimeo.com/734795835

query, querybuilder, retrieval, (13 more...)

2409.04667

Country:

North America > United States > Virginia > Fairfax County > McLean (0.04)
North America > United States > Michigan > Genesee County > Flint (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > North Korea (0.04)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Government > Military (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Zhang, Linfeng, Hu, Changyue, Quan, Zhiyu

NLP-Powered Repository and Search Engine for Academic Papers: A Case Study on Cyber Risk Literature with CyLit

arXiv.org Artificial IntelligenceSep-10-2024

As the body of academic literature continues to grow, researchers face increasing difficulties in effectively searching for relevant resources. Existing databases and search engines often fall short of providing a comprehensive and contextually relevant collection of academic literature. To address this issue, we propose a novel framework that leverages Natural Language Processing (NLP) techniques. This framework automates the retrieval, summarization, and clustering of academic literature within a specific research domain. To demonstrate the effectiveness of our approach, we introduce CyLit, an NLP-powered repository specifically designed for the cyber risk literature. CyLit empowers researchers by providing access to context-specific resources and enabling the tracking of trends in the dynamic and rapidly evolving field of cyber risk. Through the automatic processing of large volumes of data, our NLP-powered solution significantly enhances the efficiency and specificity of academic literature searches. We compare the literature categorization results of CyLit to those presented in survey papers or generated by ChatGPT, highlighting the distinctive insights this tool provides into cyber risk research literature. Using NLP techniques, we aim to revolutionize the way researchers discover, analyze, and utilize academic resources, ultimately fostering advancements in various domains of knowledge.

academic paper, cyber risk literature, nlp-powered repository and search engine, (2 more...)

2409.06226

Genre: Research Report (0.69)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.60)

Acharya, Arkadeep, Murthy, Rudra, Kumar, Vishwajeet, Sen, Jaydeep

NLLB-E5: A Scalable Multilingual Retrieval Model

Despite significant progress in multilingual information retrieval, the lack of models capable of effectively supporting multiple languages, particularly low-resource like Indic languages, remains a critical challenge. This paper presents NLLB-E5: A Scalable Multilingual Retrieval Model. NLLB-E5 leverages the in-built multilingual capabilities in the NLLB encoder for translation tasks. It proposes a distillation approach from multilingual retriever E5 to provide a zero-shot retrieval approach handling multiple languages, including all major Indic languages, without requiring multilingual training data. We evaluate the model on a comprehensive suite of existing benchmarks, including Hindi-BEIR, highlighting its robust performance across diverse languages and tasks. Our findings uncover task and domain-specific challenges, providing valuable insights into the retrieval performance, especially for low-resource languages. NLLB-E5 addresses the urgent need for an inclusive, scalable, and language-agnostic text retrieval model, advancing the field of multilingual information access and promoting digital inclusivity for millions of users globally.

dataset, encoder, nllb-e5 model, (12 more...)

2409.05401

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

RegNLP in Action: Facilitating Compliance Through Automated Information Retrieval and Answer Generation

Gokhan, Tuba, Wang, Kexin, Gurevych, Iryna, Briscoe, Ted

Regulatory documents, issued by governmental regulatory bodies, establish rules, guidelines, and standards that organizations must adhere to for legal compliance. These documents, characterized by their length, complexity and frequent updates, are challenging to interpret, requiring significant allocation of time and expertise on the part of organizations to ensure ongoing compliance.Regulatory Natural Language Processing (RegNLP) is a multidisciplinary subfield aimed at simplifying access to and interpretation of regulatory rules and obligations. We define an Automated Question-Passage Generation task for RegNLP, create the ObliQA dataset containing 27,869 questions derived from the Abu Dhabi Global Markets (ADGM) financial regulation document collection, design a baseline Regulatory Information Retrieval and Answer Generation system, and evaluate it with RePASs, a novel evaluation metric that tests whether generated answers accurately capture all relevant obligations and avoid contradictions.

computational linguistic, dataset, source passage, (15 more...)

2409.05677

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.24)
Asia > Singapore (0.05)
North America > Canada (0.04)
(9 more...)

Genre:

Research Report (0.82)
Workflow (0.68)

Industry:

Law > Statutes (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
Banking & Finance > Trading (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.71)
(2 more...)

Valentini, Francisco, Cotik, Viviana, Furman, Damián, Bercovich, Ivan, Altszyler, Edgar, Pérez, Juan Manuel

MessIRve: A Large-Scale Spanish Information Retrieval Dataset

Information retrieval (IR) is the task of finding relevant documents in response to a user query. Although Spanish is the second most spoken native language, current IR benchmarks lack Spanish data, hindering the development of information access tools for Spanish speakers. We introduce MessIRve, a large-scale Spanish IR dataset with around 730 thousand queries from Google's autocomplete API and relevant documents sourced from Wikipedia. MessIRve's queries reflect diverse Spanish-speaking regions, unlike other datasets that are translated from English or do not consider dialectal variations. The large size of the dataset allows it to cover a wide variety of topics, unlike smaller datasets. We provide a comprehensive description of the dataset, comparisons with existing datasets, and baseline evaluations of prominent IR models. Our contributions aim to advance Spanish IR research and improve information access for Spanish speakers.

dataset, messirve, query, (15 more...)

2409.05994

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
North America > Mexico (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)
(34 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Tian, Yuan, Kummerfeld, Jonathan K., Li, Toby Jia-Jun, Zhang, Tianyi

SQLucid: Grounding Natural Language Database Queries with Interactive Explanations

Though recent advances in machine learning have led to significant improvements in natural language interfaces for databases, the accuracy and reliability of these systems remain limited, especially in high-stakes domains. This paper introduces SQLucid, a novel user interface that bridges the gap between non-expert users and complex database querying processes. SQLucid addresses existing limitations by integrating visual correspondence, intermediate query results, and editable step-by-step SQL explanations in natural language to facilitate user understanding and engagement. This unique blend of features empowers users to understand and refine SQL queries easily and precisely. Two user studies and one quantitative experiment were conducted to validate SQLucid's effectiveness, showing significant improvement in task completion accuracy and user confidence compared to existing interfaces. Our code is available at https://github.com/magic-YuanTian/SQLucid.

explanation, query, sqlucid, (12 more...)

2409.06178

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Haiti (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
(23 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Education (0.93)
Transportation > Air (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.82)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
(2 more...)

Lee, Edward C, Salley, Daniel, Sharma, Abhishek, Cronin, Leroy

AI-Driven Robotic Crystal Explorer for Rapid Polymorph Identification

arXiv.org Artificial IntelligenceSep-8-2024

Crystallisation is an important phenomenon which facilitates the purification as well as structural and bulk phase material characterisation using crystallographic methods. However, different conditions can lead to a vast set of different crystal structure polymorphs and these often exhibit different physical properties, allowing materials to be tailored to specific purposes. This means the high dimensionality that can result from variations in the conditions which affect crystallisation, and the interaction between them, means that exhaustive exploration is difficult, time-consuming, and costly to explore. Herein we present a robotic crystal search engine for the automated and efficient high-throughput approach to the exploration of crystallisation conditions. The system comprises a closed-loop computer crystal-vision system that uses machine learning to both identify crystals and classify their identity in a multiplexed robotic platform. By exploring the formation of a well-known polymorph, we were able to show how a robotic system could be used to efficiently search experimental space as a function of relative polymorph amount and efficiently create a high dimensionality phase diagram with minimal experimental budget and without expensive analytical techniques such as crystallography. In this way, we identify the set of polymorphs possible within a set of experimental conditions, as well as the optimal values of these conditions to grow each polymorph.

classifier, experiment, polymorph, (15 more...)

2409.05196

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Portugal > Braga > Braga (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.64)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.72)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceSep-6-2024

GraphEx: A Graph-based Extraction Method for Advertiser Keyphrase Recommendation

Mishra, Ashirbad, Dey, Soumik, Wu, Marshall, Zhao, Jinyu, Yu, He, Ni, Kaichen, Li, Binbin, Madduri, Kamesh

Online sellers and advertisers are recommended keyphrases for their listed products, which they bid on to enhance their sales. One popular paradigm that generates such recommendations is Extreme Multi-Label Classification (XMC), which involves tagging/mapping keyphrases to items. We outline the limitations of using traditional item-query based tagging or mapping techniques for keyphrase recommendations on E-Commerce platforms. We introduce GraphEx, an innovative graph-based approach that recommends keyphrases to sellers using extraction of token permutations from item titles. Additionally, we demonstrate that relying on traditional metrics such as precision/recall can be misleading in practical applications, thereby necessitating a combination of metrics to evaluate performance in real-world scenarios. These metrics are designed to assess the relevance of keyphrases to items and the potential for buyer outreach. GraphEx outperforms production models at eBay, achieving the objectives mentioned above. It supports near real-time inferencing in resource-constrained production environments and scales effectively for billions of items.

graphex, keyphrase, recommendation, (15 more...)

2409.0314

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)
(7 more...)

Genre: Research Report (0.40)

Industry: Information Technology > Services (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management > Search (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Salatino, Angelo, Aggarwal, Tanay, Mannocci, Andrea, Osborne, Francesco, Motta, Enrico

A Survey on Knowledge Organization Systems of Research Fields: Resources and Challenges

arXiv.org Artificial IntelligenceSep-6-2024

Knowledge Organization Systems (KOSs), such as term lists, thesauri, taxonomies, and ontologies, play a fundamental role in categorising, managing, and retrieving information. In the academic domain, KOSs are often adopted for representing research areas and their relationships, primarily aiming to classify research articles, academic courses, patents, books, scientific venues, domain experts, grants, software, experiment materials, and several other relevant products and agents. These structured representations of research areas, widely embraced by many academic fields, have proven effective in empowering AI-based systems to i) enhance retrievability of relevant documents, ii) enable advanced analytic solutions to quantify the impact of academic research, and iii) analyse and forecast research dynamics. This paper aims to present a comprehensive survey of the current KOS for academic disciplines. We analysed and compared 45 KOSs according to five main dimensions: scope, structure, curation, usage, and links to other KOSs. Our results reveal a very heterogeneous scenario in terms of scope, scale, quality, and usage, highlighting the need for more integrated solutions for representing research knowledge across academic fields. We conclude by discussing the main challenges and the most promising future directions.

koss, ontology, thesaurus, (11 more...)

2409.04432

Country:

Oceania > New Zealand (0.14)
North America > Canada > Alberta (0.14)
North America > United States > New York > New York County > New York City (0.04)
(19 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Education (1.00)
Government > Regional Government (0.94)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
(7 more...)