AITopics

2410.09776

Country:

North America > United States (0.28)
Europe (0.14)
Asia > Nepal (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

arXiv.org Artificial IntelligenceOct-12-2024

Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering

Yu, Ting, Fu, Kunhao, Wang, Shuhui, Huang, Qingming, Yu, Jun

Video Question Answering (VideoQA) represents a crucial intersection between video understanding and language processing, requiring both discriminative unimodal comprehension and sophisticated cross-modal interaction for accurate inference. Despite advancements in multi-modal pre-trained models and video-language foundation models, these systems often struggle with domain-specific VideoQA due to their generalized pre-training objectives. Addressing this gap necessitates bridging the divide between broad cross-modal knowledge and the specific inference demands of VideoQA tasks. To this end, we introduce HeurVidQA, a framework that leverages domain-specific entity-action heuristics to refine pre-trained video-language foundation models. Our approach treats these models as implicit knowledge engines, employing domain-specific entity-action prompters to direct the model's focus toward precise cues that enhance reasoning. By delivering fine-grained heuristics, we improve the model's ability to identify and interpret key entities and actions, thereby enhancing its reasoning capabilities. Extensive evaluations across multiple VideoQA datasets demonstrate that our method significantly outperforms existing models, underscoring the importance of integrating domain-specific knowledge into video-language models for more accurate and context-aware VideoQA.

large language model, machine learning, question answering, (21 more...)

doi: 10.1109/TCSVT.2024.3475510

2410.0938

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:

Research Report > Promising Solution (0.46)
Research Report > New Finding (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.87)
(2 more...)

arXiv.org Artificial IntelligenceOct-12-2024

Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering

Yu, Ting, Fu, Kunhao, Zhang, Jian, Huang, Qingming, Yu, Jun

Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task focusing on semantic understanding of untrimmed long-term videos and diverse free-form questions, simultaneously emphasizing comprehensive cross-modal reasoning to yield precise answers. The canonical approaches often rely on off-the-shelf feature extractors to detour the expensive computation overhead, but often result in domain-independent modality-unrelated representations. Furthermore, the inherent gradient blocking between unimodal comprehension and cross-modal interaction hinders reliable answer generation. In contrast, recent emerging successful video-language pre-training models enable cost-effective end-to-end modeling but fall short in domain-specific ratiocination and exhibit disparities in task formulation. Toward this end, we present an entirely end-to-end solution for long-term VideoQA: Multi-granularity Contrastive cross-modal collaborative Generation (MCG) model. To derive discriminative representations possessing high visual concepts, we introduce Joint Unimodal Modeling (JUM) on a clip-bone architecture and leverage Multi-granularity Contrastive Learning (MCL) to harness the intrinsically or explicitly exhibited semantic correspondences. To alleviate the task formulation discrepancy problem, we propose a Cross-modal Collaborative Generation (CCG) module to reformulate VideoQA as a generative task instead of the conventional classification scheme, empowering the model with the capability for cross-modal high-semantic fusion and generation so as to rationalize and answer. Extensive experiments conducted on six publicly available VideoQA datasets underscore the superiority of our proposed method.

machine learning, natural language, question answering, (15 more...)

doi: 10.1109/TIP.2024.3390984

2410.09379

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial IntelligenceOct-12-2024

Information Discovery in e-Commerce

Ren, Zhaochun, He, Xiangnan, Yin, Dawei, de Rijke, Maarten

Electronic commerce, or e-commerce, is the buying and selling of goods and services, or the transmitting of funds or data online. E-commerce platforms come in many kinds, with global players such as Amazon, Airbnb, Alibaba, eBay and platforms targeting specific geographic regions. Information retrieval has a natural role to play in e-commerce, especially in connecting people to goods and services. Information discovery in e-commerce concerns different types of search (e.g., exploratory search vs. lookup tasks), recommender systems, and natural language processing in e-commerce portals. The rise in popularity of e-commerce sites has made research on information discovery in e-commerce an increasingly active research area. This is witnessed by an increase in publications and dedicated workshops in this space. Methods for information discovery in e-commerce largely focus on improving the effectiveness of e-commerce search and recommender systems, on enriching and using knowledge graphs to support e-commerce, and on developing innovative question answering and bot-based solutions that help to connect people to goods and services. In this survey, an overview is given of the fundamental infrastructure, algorithms, and technical solutions for information discovery in e-commerce. The topics covered include user behavior and profiling, search, recommendation, and language technology in e-commerce.

conversational recommendation system, multi-turn chinese dialogue dataset, post-click conversion rate estimation, (15 more...)

2410.05763

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Africa > Togo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > Promising Solution (0.92)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
(13 more...)

Neural Information Processing SystemsOct-11-2024, 15:23:28 GMT

Exploring Models and Data for Image Question Answering

This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers.

algorithm, dataset, exploring model and data

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Neural Information Processing SystemsOct-11-2024, 09:12:00 GMT

A Central Limit Theorem for Differentially Private Query Answering

Perhaps the single most important use case for differential privacy is to privately answer numerical queries, which is usually achieved by adding noise to the answer vector. The central question is, therefore, to understand which noise distribution optimizes the privacy-accuracy trade-off, especially when the dimension of the answer vector is high. Accordingly, an extensive literature has been dedicated to the question and the upper and lower bounds have been successfully matched up to constant factors (Bun et al.,2018; Steinke & Ullman, 2017). In this paper, we take a novel approach to address this important optimality question. We first demonstrate an intriguing central limit theorem phenomenon in the high-dimensional regime.

central limit theorem, mechanism, privacy-accuracy trade-off, (3 more...)

Genre: Research Report (0.42)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.40)

Neural Information Processing SystemsOct-11-2024, 07:55:43 GMT

Inductive Logical Query Answering in Knowledge Graphs

Formulating and answering logical queries is a standard communication interface for knowledge graphs (KGs). Alleviating the notorious incompleteness of real-world KGs, neural methods achieved impressive results in link prediction and complex query answering tasks by learning representations of entities, relations, and queries. Still, most existing query answering methods rely on transductive entity embeddings and cannot generalize to KGs containing new entities without retraining entity embeddings. In this work, we study the inductive query answering task where inference is performed on a graph containing new entities with queries over both seen and unseen entities. To this end, we devise two mechanisms leveraging inductive node and relational structure representations powered by graph neural networks (GNNs).Experimentally, we show that inductive models are able to perform logical reasoning at inference time over unseen nodes generalizing to graphs up to 500% larger than training ones.

knowledge graph, query, representation, (3 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Neural Information Processing SystemsOct-11-2024, 01:21:42 GMT

EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL

Reinforcement learning (RL) in long horizon and sparse reward tasks is notoriously difficult and requires a lot of training steps. A standard solution to speed up the process is to leverage additional reward signals, shaping it to better guide the learning process.In the context of language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities for more efficient ways of shaping the reward.In this paper, we leverage this idea and propose an automated reward shaping method where the agent extracts auxiliary objectives from the general language goal. These auxiliary objectives use a question generation (QG) and a question answering (QA) system: they consist of questions leading the agent to try to reconstruct partial information about the global goal using its own trajectory.When it succeeds, it receives an intrinsic reward proportional to its confidence in its answer. This incentivizes the agent to generate trajectories which unambiguously explain various aspects of the general language goal.Our experimental study using various BabyAI environments shows that this approach, which does not require engineer intervention to design the auxiliary objectives, improves sample efficiency by effectively directing the exploration.

automatic reward shaping, general language goal, language-guided rl, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.81)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)

arXiv.org Artificial IntelligenceOct-11-2024

Measuring the Groundedness of Legal Question-Answering Systems

Trautmann, Dietrich, Ostapuk, Natalia, Grail, Quentin, Pol, Adrian Alan, Bonifazi, Guglielmo, Gao, Shang, Gajek, Martin

In high-stakes domains like legal question-answering, the accuracy and trustworthiness of generative AI systems are of paramount importance. This work presents a comprehensive benchmark of various methods to assess the groundedness of AI-generated responses, aiming to significantly enhance their reliability. Our experiments include similarity-based metrics and natural language inference models to evaluate whether responses are well-founded in the given contexts. We also explore different prompting strategies for large language models to improve the detection of ungrounded responses. We validated the effectiveness of these methods using a newly created grounding classification corpus, designed specifically for legal queries and corresponding responses from retrieval-augmented prompting, focusing on their alignment with source material. Our results indicate potential in groundedness classification of generated responses, with the best method achieving a macro-F1 score of 0.8. Additionally, we evaluated the methods in terms of their latency to determine their suitability for real-world applications, as this step typically follows the generation process. This capability is essential for processes that may trigger additional manual verification or automated response regeneration. In summary, this study demonstrates the potential of various detection methods to improve the trustworthiness of generative AI in legal settings.

large language model, machine learning, question answering, (22 more...)

2410.08764

Country:

North America > United States (0.28)
Europe > Switzerland > Zug > Zug (0.04)
Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Government & the Courts (0.46)
Law > Civil Rights & Constitutional Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)

Christmann, Philipp, Vakulenko, Svitlana, Sorodoc, Ionut Teodor, Byrne, Bill, de Gispert, Adrià

Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision

arXiv.org Artificial IntelligenceOct-11-2024

Long-form question answering (LFQA) aims at generating in-depth answers to end-user questions, providing relevant information beyond the direct answer. However, existing retrievers are typically optimized towards information that directly targets the question, missing out on such contextual information. Furthermore, there is a lack of training data for relevant context. To this end, we propose and compare different weak supervision techniques to optimize retrieval for contextual information. Experiments demonstrate improvements on the end-to-end QA performance on ASQA, a dataset for long-form question answering. Importantly, as more contextual information is retrieved, we improve the relevant page recall for LFQA by 14.7% and the groundedness of generated long-form answers by 12.5%. Finally, we show that long-form answers often anticipate likely follow-up questions, via experiments on a conversational QA dataset.

information, machine learning, question answering, (20 more...)

2410.08623

Country:

North America > United States > New Jersey (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.82)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)