Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering (Appendix)
We chose the Google Search corpus [Luo et al., 2021] for our question-answering system as it provides good coverage of the required knowledge and is publicly available. As with any web-derived corpus, it is advised to conduct an ethical review prior to deploying the system in live service. Table 1 shows the data statistics of the OK-VQA dataset. We build a DPR retriever as a baseline for FLMR. Inner-product search (supported by FAISS [Johnson et al., 2019]) is used to train and evaluate the retrievers. For answer generation, we use t5-large and Salesforce/blip2-flan-t5-xl.
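The inner-product retrieval named above is easy to illustrate with FAISS. Below is a minimal sketch of exact maximum-inner-product search over dense passage embeddings; the 768-dimensional random vectors stand in for real DPR/FLMR embeddings and are an assumption for illustration only.

```python
import faiss
import numpy as np

dim = 768                      # assumed embedding size (DPR-style encoders)
rng = np.random.default_rng(0)

# Stand-ins for real passage/query embeddings from a trained retriever.
passage_embs = rng.standard_normal((10_000, dim)).astype("float32")
query_embs = rng.standard_normal((4, dim)).astype("float32")

# Exact maximum-inner-product search, as supported by FAISS.
index = faiss.IndexFlatIP(dim)
index.add(passage_embs)

scores, ids = index.search(query_embs, 5)   # top-5 passages per query
print(ids[0], scores[0])
```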
Disentangling Task Interference within Neurons: Model Merging in Alignment with Neuronal Mechanisms
Fang, Zitao, DU, Guodong, Yu, Shuyang, Guo, Yifei, Zhang, Yiwei, Li, Jing, Tang, Ho-Kin, Goh, Sim Kuan
Fine-tuning pre-trained models on targeted datasets enhances task-specific performance but often comes at the expense of generalization. Model merging techniques, which integrate multiple fine-tuned models into a single multi-task model through task arithmetic at various levels (model, layer, or parameter), offer a promising solution. However, task interference remains a fundamental challenge, leading to performance degradation and suboptimal merged models. Existing approaches largely overlook the fundamental role of individual neurons and their connectivity, resulting in a lack of interpretability in both the merging process and the merged models. In this work, we present the first study on the impact of neuronal alignment in model merging. We decompose task-specific representations into two complementary neuronal subspaces that regulate neuron sensitivity and input adaptability. Leveraging this decomposition, we introduce NeuroMerging, a novel merging framework developed to mitigate task interference within neuronal subspaces, enabling training-free model fusion across diverse tasks. Through extensive experiments, we demonstrate that NeuroMerging achieves superior performance compared to existing methods on multi-task benchmarks across both vision and natural language domains. Our findings highlight the importance of aligning neuronal mechanisms in model merging, offering new insights into mitigating task interference and improving knowledge fusion.
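The abstract does not spell out the decomposition, so the sketch below shows one natural reading as a hypothetical illustration: splitting each neuron's task vector (fine-tuned minus pre-trained weights) into a component parallel to the pre-trained weight direction and an orthogonal remainder, then merging subspace-wise. The function name and merge coefficients are assumptions, not the paper's construction.

```python
import numpy as np

def split_task_vector(w_pre: np.ndarray, w_ft: np.ndarray):
    """Split a neuron's task vector (w_ft - w_pre) into the component
    parallel to the pre-trained weight vector and the orthogonal
    remainder -- one plausible reading of 'two complementary neuronal
    subspaces' (hypothetical, not the paper's exact construction)."""
    tau = w_ft - w_pre
    direction = w_pre / (np.linalg.norm(w_pre) + 1e-12)
    parallel = np.dot(tau, direction) * direction
    orthogonal = tau - parallel
    return parallel, orthogonal

# Toy example: merge two task vectors for one neuron's input weights.
rng = np.random.default_rng(0)
w_pre = rng.standard_normal(16)
tasks = [w_pre + 0.1 * rng.standard_normal(16) for _ in range(2)]

merged = w_pre.copy()
for w_ft in tasks:
    par, orth = split_task_vector(w_pre, w_ft)
    merged += 0.5 * (par + orth)   # subspace-wise coefficients could differ
print(np.linalg.norm(merged - w_pre))
```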
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Nandi, Palash, Sharma, Shivam, Chakraborty, Tanmoy
Memes act as cryptic tools for sharing sensitive ideas, often requiring contextual knowledge to interpret. This makes moderating multimodal memes challenging, as existing works either lack high-quality datasets on nuanced hate categories or rely on low-quality social media visuals. Here, we curate two novel multimodal hate speech datasets, MHS and MHS-Con, that capture fine-grained hateful abstractions in regular and confounding scenarios, respectively. We benchmark these datasets against several competing baselines. Furthermore, we introduce SAFE-MEME (Structured reAsoning FramEwork), a novel multimodal Chain-of-Thought-based framework employing Q&A-style reasoning (SAFE-MEME-QA) and hierarchical categorization (SAFE-MEME-H) to enable robust hate speech detection in memes. SAFE-MEME-QA outperforms existing baselines, achieving an average improvement of approximately 5% and 4% on MHS and MHS-Con, respectively. In comparison, SAFE-MEME-H achieves an average improvement of 6% in MHS while outperforming only multimodal baselines in MHS-Con. We show that fine-tuning a single-layer adapter within SAFE-MEME-H outperforms fully fine-tuned models in regular fine-grained hateful meme detection. However, full fine-tuning with a Q&A setup is more effective for handling confounding cases. We also systematically examine the error cases, offering valuable insights into the robustness and limitations of the proposed structured reasoning framework for analyzing hateful memes.
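A Q&A-style chain-of-thought pipeline like SAFE-MEME-QA can be sketched as iterative prompting of a vision-language model, as below. The `vlm` callable, the question set, and the label vocabulary are all illustrative assumptions, not the released system.

```python
# Hypothetical sketch of Q&A-style structured reasoning over a meme;
# `vlm` is an assumed callable returning a text answer for an
# (image, prompt) pair -- not the paper's released pipeline.
QUESTIONS = [
    "What entities appear in the image and the overlaid text?",
    "What implicit claim does the meme make about those entities?",
    "Does that claim target a protected group? Answer yes/no and why.",
]

def classify_meme(vlm, image, ocr_text: str) -> str:
    context = f"Meme text: {ocr_text}"
    for q in QUESTIONS:
        answer = vlm(image=image, prompt=f"{context}\nQ: {q}\nA:")
        context += f"\nQ: {q}\nA: {answer}"   # chain answers as context
    verdict = vlm(image=image,
                  prompt=f"{context}\nGiven the reasoning above, "
                         f"label the meme as HATEFUL or NOT-HATEFUL:")
    return verdict
```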
Facts-and-Feelings: Capturing both Objectivity and Subjectivity in Table-to-Text Generation
Dey, Tathagata, Bhattacharyya, Pushpak
Table-to-text generation, a long-standing challenge in natural language generation, has remained unexplored through the lens of subjectivity. Subjectivity here encompasses the comprehension of information derived from the table that cannot be described solely by objective data. Given the absence of pre-existing datasets, we introduce the Ta2TS dataset with 3849 data instances. We perform the task of fine-tuning sequence-to-sequence models on the linearized tables and prompting on popular large language models. We analyze the results from a quantitative and qualitative perspective to ensure the capture of subjectivity and factual consistency. The analysis shows that the fine-tuned LMs perform close to the prompted LLMs. Both types of models can capture the tabular data, generating texts with 85.15% BERTScore and 26.28% Meteor score. To the best of our knowledge, we provide the first-of-its-kind dataset on tables with multiple genres and subjectivity included and present the first comprehensive analysis and comparison of different LLM performances on this task.
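Table linearization, the preprocessing step named above, can be sketched as follows; the separator conventions (`[SEP]`, `[ROW]`, `|`) are common choices assumed here, not necessarily the exact Ta2TS scheme.

```python
# A minimal sketch of table linearization for sequence-to-sequence
# fine-tuning; separator tokens are illustrative assumptions.
def linearize_table(headers, rows, caption=""):
    cells = []
    for row in rows:
        for h, v in zip(headers, row):
            cells.append(f"{h} : {v}")
        cells.append("[ROW]")
    return f"caption : {caption} [SEP] " + " | ".join(cells)

table = linearize_table(
    headers=["Team", "Wins", "Losses"],
    rows=[["Lions", "10", "2"], ["Tigers", "7", "5"]],
    caption="2023 season standings",
)
print(table)
# The linearized string is then paired with a target description and
# fed to a seq2seq model (e.g., T5) in a standard fine-tuning loop.
```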
NewsQs: Multi-Source Question Generation for the Inquiring Mind
Hwang, Alyssa, Dixit, Kalpit, Ballesteros, Miguel, Benajiba, Yassine, Castelli, Vittorio, Dreyer, Markus, Bansal, Mohit, McKeown, Kathleen
We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show that fine-tuning a model with control codes produces questions that are judged acceptable more often than the same model without them as measured through human evaluation. We use a QNLI model with high correlation with human annotations to filter our data. We release our final dataset of high-quality questions, answers, and document clusters as a resource for future work in query-based multi-document summarization.
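The two moving parts described here, control-code question generation and QNLI-based filtering, can be sketched with Hugging Face pipelines. The control-code string, the QNLI checkpoint, and the score threshold are illustrative assumptions; label and score semantics vary by checkpoint, so consult the model card before relying on this.

```python
# Sketch of control-code question generation plus QNLI filtering;
# the control code "<ask_question>" and both checkpoints are assumptions.
from transformers import pipeline

generator = pipeline("text2text-generation", model="t5-large")
qnli = pipeline("text-classification",
                model="cross-encoder/qnli-electra-base")

article = "The city council approved the new transit budget on Monday..."
prompt = f"<ask_question> {article}"        # hypothetical control code
question = generator(prompt, max_new_tokens=32)[0]["generated_text"]

# Keep the question only if the QNLI score suggests the article answers it.
score = qnli({"text": question, "text_pair": article})[0]["score"]
# Assumption: higher score means the passage answers the question;
# score/label semantics differ across QNLI checkpoints.
if score > 0.5:
    print("kept:", question)
```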
Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo
Sundararajan, Barkavi, Sripada, Somayajulu, Reiter, Ehud
Neural Table-to-Text models tend to hallucinate, producing texts that contain factual errors. We investigate whether such errors in the output can be traced back to problems with the input. We manually annotated 1,837 texts generated by multiple models in the politics domain of the ToTTo dataset. We identify the input problems that are responsible for many output errors and show that fixing these inputs reduces factual errors by between 52% and 76% (depending on the model). In addition, we observe that models struggle to process tabular inputs that are structured in a non-standard way, particularly when the input lacks distinct row and column values or when the column headers are not correctly mapped to corresponding values.
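The header-to-value mapping problem mentioned at the end suggests a simple automated check, sketched below as a hypothetical illustration (the paper's annotation was manual): flag rows whose cell count cannot be aligned with the header row.

```python
# Illustrative input check: flag ToTTo-style tables whose rows cannot
# be mapped one-to-one onto the header row (a likely source of errors).
def find_header_mismatches(headers, rows):
    problems = []
    for i, row in enumerate(rows):
        if len(row) != len(headers):
            problems.append((i, len(headers), len(row)))
    return problems

headers = ["Year", "Party", "Votes"]
rows = [["2016", "Liberal", "12,345"],
        ["2020", "Liberal"]]          # missing value: unmappable header
print(find_header_mismatches(headers, rows))   # [(1, 3, 2)]
```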
How to Train Data-Efficient LLMs
Sachdeva, Noveen, Coleman, Benjamin, Kang, Wang-Cheng, Ni, Jianmo, Hong, Lichan, Chi, Ed H., Caverlee, James, McAuley, Julian, Cheng, Derek Zhiyuan
The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. We seek to understand the tradeoffs associated with data selection routines based on (i) expensive-to-compute data-quality estimates, and (ii) maximization of coverage and diversity-based measures in the feature space. Our first technique, Ask-LLM, leverages the zero-shot reasoning capabilities of instruction-tuned LLMs to directly assess the quality of a training example. To target coverage, we propose Density sampling, which models the data distribution to select a diverse sample. In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density are the best methods in their respective categories. Coverage sampling can recover the performance of the full data, while models trained on Ask-LLM data consistently outperform full-data training -- even when we reject 90% of the original dataset, and they converge up to 70% faster.
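Both sampler families admit compact sketches. Below, the Ask-LLM prompt wording and the `llm` scorer interface are assumptions, and the density sampler is one common kernel-density variant (inverse-propensity sampling to favor coverage), not necessarily the paper's exact recipe.

```python
# Hedged sketches of the two sampler families; prompt wording, scorer
# interface, and kernel bandwidth are illustrative assumptions.
import numpy as np

ASK_LLM_PROMPT = (
    "Does the following text contain informative, well-written content "
    "that would help train a language model? Answer yes or no.\n\n{text}"
)

def ask_llm_score(llm, text: str) -> float:
    # Assumed interface: llm returns P("yes") for the given prompt.
    return llm(ASK_LLM_PROMPT.format(text=text))

def density_sample(embs: np.ndarray, k: int, bandwidth: float = 1.0,
                   seed: int = 0) -> np.ndarray:
    """Score each point with a Gaussian kernel density estimate, then
    sample inversely to the density so sparse regions are covered."""
    sq = ((embs[:, None, :] - embs[None, :, :]) ** 2).sum(-1)
    density = np.exp(-sq / (2 * bandwidth**2)).mean(axis=1)
    probs = (1.0 / density) / (1.0 / density).sum()
    rng = np.random.default_rng(seed)
    return rng.choice(len(embs), size=k, replace=False, p=probs)

embs = np.random.default_rng(0).standard_normal((500, 32))
print(density_sample(embs, k=50)[:10])
```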
Unifying Structured Data as Graph for Data-to-Text Pre-Training
Li, Shujie, Li, Liang, Geng, Ruiying, Yang, Min, Li, Binhua, Yuan, Guanghu, He, Wanwei, Yuan, Shao, Ma, Can, Huang, Fei, Li, Yongbin
Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved to be powerful in enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures or designed training objectives tailored for a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different data-to-text generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer, encoding relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix to incorporate graph structures into the original Transformer by taking the available explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source codes are available at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/unid2t.
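The structure-enhanced attention idea can be sketched as an additive bias on the attention logits derived from the graph's adjacency, as below; this illustrates the general technique, not the paper's exact position and attention matrices.

```python
# Minimal sketch: inject graph connectivity into self-attention via an
# additive bias (edges and self-loops attend; other pairs are blocked).
import numpy as np

def graph_attention_bias(adj: np.ndarray, neg_inf: float = -1e9):
    mask = adj + np.eye(adj.shape[0])
    return np.where(mask > 0, 0.0, neg_inf)

def attend(Q, K, V, bias):
    scores = Q @ K.T / np.sqrt(Q.shape[-1]) + bias  # structure-aware logits
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

n, d = 4, 8
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)   # a path graph over 4 nodes
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attend(Q, K, V, graph_attention_bias(adj)).shape)  # (4, 8)
```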