AITopics

2412.06649

Country:

Asia > India > NCT > New Delhi (0.05)
Asia > India > NCT > Delhi (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Haryana > Faridabad (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.49)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Jamalifard, Mohammadreza, Andreu-Perez, Javier, Hagras, Hani, López, Luis Martínez

Fuzzy Norm-Explicit Product Quantization for Recommender Systems

arXiv.org Artificial Intelligence, Dec-8-2024

As data resources grow, providing recommendations that best meet demand has become a vital requirement in business and life to overcome the information overload problem. However, building a system that suggests relevant recommendations has always been a point of debate. One of the most cost-efficient techniques for producing relevant recommendations at low complexity is Product Quantization (PQ). PQ approaches have continued to develop in recent years. The crucial challenge for such systems is improving product quantization performance in terms of recall without compromising complexity. This makes the algorithm suitable for problems that require a greater number of potentially relevant items without disregarding others, at high speed and low cost, to keep up with traffic. This is the case for online shops, where recommendations that match the customer's purpose are important, although customers may also be open to browsing other products. This research proposes a fuzzy approach to norm-based product quantization. Type-2 Fuzzy sets (T2FSs) define the codebook, allowing sub-vectors (T2FSs) to be associated with more than one element of the codebook; their norm calculus is then resolved by means of integration. Our method improves the recall measure, making the algorithm suitable for problems that require querying as many potentially relevant items as possible without disregarding others. The proposed method outperforms all PQ approaches, such as NEQ, PQ, and RQ, by up to +6%, +5%, and +8%, achieving a recall of 94%, 69%, and 59% on the Netflix, Audio, and Cifar60k datasets, respectively. Moreover, its computing time and complexity nearly equal those of the most computationally efficient existing PQ method in the state of the art.
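For readers unfamiliar with the baseline, the following is a minimal sketch of plain product quantization, not the fuzzy norm-explicit method the abstract proposes. It shows how an item vector is encoded as one codeword index per sub-space and how an inner product with a query is approximated through lookup tables. The codebooks, vectors, and all function names are hypothetical toy values chosen for illustration; real systems learn the codebooks, typically with k-means on each sub-space.

```python
# Plain product quantization (PQ) sketch; NOT the fuzzy norm-explicit method.
# CODEBOOKS holds hand-picked toy codewords (an assumption for illustration).

def split(vec, m):
    """Cut vec into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest codeword."""
    return [
        min(range(len(book)), key=lambda k: sq_dist(sub, book[k]))
        for sub, book in zip(split(vec, len(codebooks)), codebooks)
    ]


def inner_product_tables(query, codebooks):
    """Precompute <query sub-vector, codeword> for every codeword."""
    return [
        [dot(sub, word) for word in book]
        for sub, book in zip(split(query, len(codebooks)), codebooks)
    ]


def approx_inner_product(code, tables):
    """Approximate <query, item> with one table lookup per sub-space."""
    return sum(table[k] for table, k in zip(tables, code))


CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],  # codewords for sub-space 0
    [[0.0, 1.0], [1.0, 0.0]],  # codewords for sub-space 1
]

item = [0.9, 1.1, 0.1, 0.8]
query = [1.0, 2.0, 3.0, 4.0]

code = encode(item, CODEBOOKS)
tables = inner_product_tables(query, CODEBOOKS)
score = approx_inner_product(code, tables)
exact = dot(query, item)
```

Here `score` is the quantized estimate and `exact` the true inner product; the gap between them is the quantization error that recall-oriented PQ variants try to reduce.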

data mining, information retrieval, machine learning, (22 more...)

doi: 10.1109/TFUZZ.2024.3365722

2412.06069

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Essex (0.04)
Europe > Spain > Andalusia > Jaén Province > Jaén (0.04)
(5 more...)

Genre:

Research Report (1.00)
Personal (1.00)
Overview (0.93)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology (0.89)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
(2 more...)

arXiv.org Artificial Intelligence, Dec-8-2024

CardOOD: Robust Query-driven Cardinality Estimation under Out-of-Distribution

Li, Rui, Zhao, Kangfei, Yu, Jeffrey Xu, Wang, Guoren

Query-driven learned estimators are accurate, flexible, and lightweight alternatives to traditional estimators in query optimization. However, existing query-driven approaches struggle with the Out-of-distribution (OOD) problem, where the test workload distribution differs from the training workload, leading to performance degradation. In this paper, we present CardOOD, a general learning framework designed to construct robust query-driven cardinality estimators that are resilient against the OOD problem. Our framework focuses on offline training algorithms that develop one-off models from a static workload, suitable for model initialization and periodic retraining. In CardOOD, we extend classical transfer/robust learning techniques to train query-driven cardinality estimators, and the algorithms fall into three categories: representation learning, data manipulation, and new learning strategies. As these learning techniques are originally evaluated in computer vision tasks, we also propose a new learning algorithm that exploits the property of cardinality estimation. This algorithm, lying in the category of new learning strategy, models the partial order constraint of cardinalities by a self-supervised learning task. Comprehensive experimental studies demonstrate the efficacy of the algorithms of CardOOD in mitigating the OOD problem to varying extents. We further integrate CardOOD into PostgreSQL, showcasing its practical utility in query optimization.

artificial intelligence, machine learning, natural language, (17 more...)

2412.05864

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Dec-7-2024

Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings

Mahmood, Razi, Yan, Pingkun, Reyes, Diego Machado, Wang, Ge, Kalra, Mannudeep K., Kaviani, Parisa, Wu, Joy T., Syeda-Mahmood, Tanveer

Several evaluation metrics have been developed recently to automatically assess the quality of generative AI reports for chest radiographs based only on textual information using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns capturing the location, laterality, and severity of a large number of clinical findings. We then performed phrasal grounding to localize their associated anatomical regions on chest radiograph images. The textual and visual measures are then combined to rate the quality of the generated reports.

While some metrics cover clinical entities and their relations [9, 11], generally scoring metrics do not explicitly capture the textual mention differences in the anatomy, laterality and severity. Further, phrasal grounding of the findings in terms of anatomical localization in images is not exploited in the quality scoring. In this paper, we propose a metric that captures both fine-grained textual descriptions of findings as well as their phrasal grounding information in terms of anatomical locations in images. We present results that compare this evaluation metric to other textual metrics on a gold standard dataset derived from MIMIC collection of chest X-rays and validated reports, to show its robustness and sensitivity to factual errors.

pattern, information retrieval, machine learning, (19 more...)

2412.01031

Country:

North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)

arXiv.org Artificial Intelligence, Dec-6-2024

A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges

Singh, Aditi, Shetty, Akash, Ehtesham, Abul, Kumar, Saket, Khoei, Tala Talaei

Text-to-SQL systems facilitate smooth interaction with databases by translating natural language queries into Structured Query Language (SQL), bridging the gap between non-technical users and complex database management systems. This survey provides a comprehensive overview of the evolution of AI-driven text-to-SQL systems, highlighting their foundational components, advancements in large language model (LLM) architectures, and the critical role of datasets such as Spider, WikiSQL, and CoSQL in driving progress. We examine the applications of text-to-SQL in domains like healthcare, education, and finance, emphasizing their transformative potential for improving data accessibility. Additionally, we analyze persistent challenges, including domain generalization, query optimization, support for multi-turn conversational interactions, and the limited availability of datasets tailored for NoSQL databases and dynamic real-world scenarios. To address these challenges, we outline future research directions, such as extending text-to-SQL capabilities to support NoSQL databases, designing datasets for dynamic multi-turn interactions, and optimizing systems for real-world scalability and robustness. By surveying current advancements and identifying key gaps, this paper aims to guide the next generation of research and applications in LLM-based text-to-SQL systems.

large language model, machine learning, natural language, (18 more...)

2412.05208

Country: North America > United States (0.05)

Genre: Overview (1.00)

Industry:

Education (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

Semantic Retrieval at Walmart

Magnani, Alessandro, Liu, Feng, Chaidaroon, Suthee, Yadav, Sachin, Suram, Praveen Reddy, Puthenputhussery, Ajit, Chen, Sijie, Xie, Min, Kashi, Anirudh, Lee, Tony, Liao, Ciya

In product search, the retrieval of candidate products before re-ranking is more critical and challenging than in other kinds of search, such as web search, especially for tail queries, which have a complex and specific search intent. In this paper, we present a hybrid system for e-commerce search deployed at Walmart that combines a traditional inverted index and embedding-based neural retrieval to better answer user tail queries. Our system significantly improved the relevance of the search engine, as measured by both offline and online evaluations. The improvements were achieved through a combination of different approaches. We present a new technique to train the neural model at scale and describe how the system was deployed in production with little impact on response time. We highlight multiple learnings and practical tricks that were used in the deployment of this system.
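The idea of combining an inverted index with embedding-based retrieval can be sketched as a union of two candidate sets. This is only an illustration under stated assumptions, not the deployed Walmart system: the catalogue, the two-dimensional embeddings, the hand-supplied query vector (standing in for a neural query encoder), and the plain set union used to merge candidates are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy catalogue: product id -> (title, toy embedding).
PRODUCTS = {
    "p1": ("red running shoes", [0.9, 0.1]),
    "p2": ("blue trail sneakers", [0.8, 0.2]),
    "p3": ("steel water bottle", [0.1, 0.9]),
}


def build_inverted_index(products):
    """Map each title token to the set of product ids containing it."""
    index = defaultdict(set)
    for pid, (title, _) in products.items():
        for token in title.split():
            index[token].add(pid)
    return index


def lexical_candidates(query, index):
    """Products sharing at least one token with the query."""
    hits = set()
    for token in query.split():
        hits |= index.get(token, set())
    return hits


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm


def embedding_candidates(query_vec, products, k):
    """The k products whose embeddings are closest to the query vector."""
    ranked = sorted(
        products,
        key=lambda pid: cosine(query_vec, products[pid][1]),
        reverse=True,
    )
    return set(ranked[:k])


def hybrid_candidates(query, query_vec, index, products, k=2):
    """Union of lexical and embedding candidates, passed on to re-ranking."""
    return lexical_candidates(query, index) | embedding_candidates(
        query_vec, products, k
    )


INDEX = build_inverted_index(PRODUCTS)
candidates = hybrid_candidates("running shoes", [1.0, 0.0], INDEX, PRODUCTS)
```

In this toy run the inverted index alone finds only the product whose title matches the query tokens, while the embedding side also surfaces a semantically close product with no shared tokens, which is the kind of gap a hybrid retriever is meant to close.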

information retrieval, machine learning, natural language, (20 more...)

2412.04637

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > California > Santa Clara County > Sunnyvale (0.05)
Asia > India > Karnataka > Bengaluru (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Retail (0.64)
Information Technology > Services > e-Commerce Services (0.35)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
(2 more...)

Rohde, Florens, Christen, Victor, Franke, Martin, Rahm, Erhard

Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based on gradual information disclosure

Record linkage, also known as entity resolution, aims at identifying different representations of the same real-world entity, such as a person. It is a crucial step in many data integration tasks in order to combine multiple data sources, allowing enhanced data analysis. Typically, unique record identifiers, which would enable a join-like operation, are not available. Therefore, records are compared pairwise based on their identifying attributes, such as first name, last name and date of birth, and classified as match or non-match. However, record linkage may potentially harm the privacy of individuals by combining information that can be used against their interests. As a consequence, conducting such a linkage is subject to many legal and organizational constraints [CRS20]. Privacy-preserving record linkage (PPRL) methods aim to enable such linkages without sharing sensitive plaintext information between the data owners or with a third party. To protect the identifying data, the data owners encode it before sending it to an independent linkage unit, which performs the matching on the encoded data only. A variety of such perturbation-based encoding techniques have been proposed, but the most popular, and a quasi-standard, is based on Bloom filters [Gk21].
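A minimal sketch of Bloom-filter encoding as it is commonly described for PPRL may help here; it is not the multi-layer protocol of this paper. Each attribute value is split into character bigrams, every bigram is hashed several times with a keyed hash, and the linkage unit compares the resulting bit sets with the Dice coefficient. The filter length, number of hash functions, secret key, and function names are hypothetical choices for illustration.

```python
import hashlib
import hmac

FILTER_BITS = 1024             # hypothetical Bloom filter length
NUM_HASHES = 4                 # hypothetical number of hash functions
SECRET_KEY = b"shared-secret"  # hypothetical key known only to data owners


def bigrams(value):
    """Character bigrams of the lower-cased, padded value."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value):
    """Return the set of bit positions set in the value's Bloom filter."""
    bits = set()
    for gram in bigrams(value):
        for seed in range(NUM_HASHES):
            digest = hmac.new(
                SECRET_KEY, f"{seed}:{gram}".encode(), hashlib.sha256
            ).digest()
            bits.add(int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return bits


def dice(a, b):
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# The linkage unit sees only the encodings, never the plaintext names.
same = dice(encode("smith"), encode("smith"))
similar = dice(encode("smith"), encode("smyth"))
different = dice(encode("smith"), encode("jones"))
```

Because similar strings share most of their bigrams, their filters share most of their set bits, so `similar` lands well above `different` and a threshold on the Dice score can classify pairs as match or non-match.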

clerical review, information, protocol, (14 more...)

2412.04178

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Saxony > Leipzig (0.05)
Oceania > Australia (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.42)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.34)

Lauc, Davor, Rutherford, Attapol, Wongwarawipatr, Weerin

AyutthayaAlpha: A Thai-Latin Script Transliteration Transformer

This study introduces AyutthayaAlpha, an advanced transformer-based machine learning model designed for the transliteration of Thai proper names into Latin script. Our system achieves state-of-the-art performance with 82.32% first-token accuracy and 95.24% first-three-token accuracy, while maintaining a low character error rate of 0.0047. The complexity of Thai phonology, including tonal features and vowel length distinctions, presents significant challenges for accurate transliteration, which we address through a novel two-model approach: AyutthayaAlpha-Small, based on the ByT5 architecture, and AyutthayaAlpha-VerySmall, a computationally efficient variant that unexpectedly outperforms its larger counterpart. Our research combines linguistic rules with deep learning, training on a carefully curated dataset of 1.2 million Thai-Latin name pairs, augmented through strategic upsampling to 2.7 million examples. Extensive evaluations against existing transliteration methods and human expert benchmarks demonstrate that AyutthayaAlpha not only achieves superior accuracy but also effectively captures personal and cultural preferences in name romanization. The system's practical applications extend to cross-lingual information retrieval, international data standardization, and identity verification systems, with particular relevance for government databases, academic institutions, and global business operations. This work represents a significant advance in bridging linguistic gaps between Thai and Latin scripts, while respecting the cultural and personal dimensions of name transliteration.

dataset, romanization, transliteration, (16 more...)

2412.03877

Country:

Europe > Croatia > Zagreb County > Zagreb (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Zinchenko, Sergey, Iazov, Sergey

HERO: Hint-Based Efficient and Reliable Query Optimizer

We propose a novel model for learned query optimization which provides query hints leading to better execution plans. The model addresses the three key challenges in learned hint-based query optimization: reliable hint recommendation (ensuring non-degradation of query latency), efficient hint exploration, and fast inference. We provide an in-depth analysis of existing NN-based approaches to hint-based optimization and experimentally confirm the named challenges for them. Our alternative solution consists of a new inference schema, based on an ensemble of context-aware models and a graph storage for reliable hint suggestion and fast inference, and a budget-controlled training procedure with a local search algorithm that solves the issue of exponential search space exploration. In experiments on standard benchmarks, our model demonstrates optimization capability close to the best achievable with coarse-grained hints. Controlling the degree of parallelism (query dop) in addition to operator-related hints enables our model to achieve a 3x latency improvement on the JOB benchmark, which sets a new standard for optimization. Our model is interpretable and easy to debug, which is particularly important for deployment in production.

optimization, optimizer, query, (17 more...)

2412.02372

Country:

North America > United States > Oregon (0.04)
Asia > Middle East > Palestine > Gaza Strip > Rafah Governorate > Rafah (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Cao, Clinton, Panichella, Annibale, Verwer, Sicco

Automated Test-Case Generation for REST APIs Using Model Inference Search Heuristic

arXiv.org Artificial Intelligence, Dec-4-2024

The rising popularity of the microservice architectural style has led to a growing demand for automated testing approaches tailored to these systems. EvoMaster is a state-of-the-art tool that uses Evolutionary Algorithms (EAs) to automatically generate test cases for microservices' REST APIs. One limitation of these EAs is the use of unit-level search heuristics, such as branch distances, which focus on fine-grained code coverage and may not effectively capture the complex, interconnected behaviors characteristic of system-level testing. To address this limitation, we propose a new search heuristic (MISH) that uses real-time automaton learning to guide the test case generation process. We capture the sequential call patterns exhibited by a test case by learning an automaton from the stream of log events output by different microservices within the same system. MISH thus learns a representation of the system-wide behavior, allowing us to define the fitness of a test case based on the path it traverses within the inferred automaton. We empirically evaluate MISH's effectiveness on six real-world benchmark microservice applications and compare it against a state-of-the-art technique, MOSA, for testing REST APIs. Our evaluation shows promising results for using MISH to guide the automated test case generation within EvoMaster.

evolutionary algorithm, information retrieval, machine learning, (18 more...)

2412.0342

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Netherlands > South Holland > Delft (0.05)
Europe > Switzerland (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.88)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.81)