AITopics

2502.03038

Country:

Europe > Germany > Bremen > Bremen (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom (0.14)
(8 more...)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Health & Medicine (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
(2 more...)

arXiv.org Artificial Intelligence, Feb-4-2025

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

Liu, Zechun, Zhao, Changsheng, Huang, Hanxian, Chen, Sijia, Zhang, Jing, Zhao, Jiawei, Roy, Scott, Jin, Lisa, Xiong, Yunyang, Shi, Yangyang, Xiao, Lin, Tian, Yuandong, Soran, Bilge, Krishnamoorthi, Raghuraman, Blankevoort, Tijmen, Chandra, Vikas

The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose that 1.58-bit offers superior results. However, the lack of a cohesive framework for different bits has left such conclusions relatively tenuous. We present ParetoQ, the first unified framework that facilitates rigorous comparisons across 1-bit, 1.58-bit, 2-bit, 3-bit, and 4-bit quantization settings. Our findings reveal a notable learning transition between 2 and 3 bits: For 3-bits and above, the fine-tuned models stay close to their original pre-trained distributions, whereas for learning 2-bit networks or below, the representations change drastically. By optimizing training schemes and refining quantization functions, ParetoQ surpasses all previous methods tailored to specific bit widths. Remarkably, our ParetoQ ternary 600M-parameter model even outperforms the previous SoTA ternary 3B-parameter model in accuracy, using only one-fifth of the parameters. Extensive experimentation shows that ternary, 2-bit, and 3-bit quantization maintains comparable performance in the size-accuracy trade-off and generally exceeds 4-bit and binary quantization. Considering hardware constraints, 2-bit quantization offers promising potential for memory reduction and speedup.
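To make the bit-width settings in the abstract concrete, here is a minimal sketch of two generic weight quantizers in Python. It is not ParetoQ's quantization function, which the abstract does not specify. The function names, the mean-absolute-value scale for the ternary case, and the max-absolute-value scale for the k-bit case are all assumptions chosen for illustration. The one fact relied on is that three levels carry log2(3), about 1.58, bits per weight, which is where the "1.58-bit" label comes from.

```python
import math


def ternary_quantize(weights):
    """Map weights to codes in {-1, 0, 1} with one shared scale.

    Assumption: the scale is the mean absolute weight, a common choice
    for ternary quantizers. ParetoQ's own quantizer may differ.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def uniform_quantize(weights, bits):
    """Symmetric uniform quantizer with 2**bits integer levels (bits >= 2).

    Assumption: the scale maps the largest-magnitude weight to the top
    positive level. This is a generic baseline, not the paper's method.
    """
    if bits < 2:
        raise ValueError("use a sign-based quantizer for 1-bit weights")
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    codes = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate real-valued weights from integer codes."""
    return [c * scale for c in codes]


def bits_per_weight(num_levels):
    """Information content of one weight with `num_levels` distinct values."""
    return math.log2(num_levels)


weights = [0.9, -0.1, 0.4, -1.2]  # hypothetical example weights
ternary_codes, ternary_scale = ternary_quantize(weights)
four_bit_codes, four_bit_scale = uniform_quantize(weights, bits=4)
```

Calling `dequantize(ternary_codes, ternary_scale)` shows how coarse the three-level approximation is compared with the 4-bit one for the same weights.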

large language model, machine learning, quantization, (20 more...)

2502.02631

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)

Neural Information Processing Systems, Jan-25-2025, 15:34:46 GMT

Reviews: Heterogeneous Graph Learning for Visual Commonsense Reasoning

Originality: The VCR task is a novel task (proposed by Zellers et al., CVPR19). The proposed HGL framework for this interesting task is novel and interesting. The paper applies the HGL framework on top of the baseline model (R2C from Zellers et al., CVPR19) and shows significant improvements. The paper compares other existing graph learning approaches. The main difference between the proposed approach and other graph learning approaches is the heterogeneous nature (across domains – vision and language) of the graph learning framework. Quality: The paper does a good job of evaluating the proposed approach and its ablations.

author please comment, heterogeneous graph learning, visual commonsense reasoning, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.40)

Neural Information Processing Systems, Jan-25-2025, 15:34:35 GMT

Reviews: Heterogeneous Graph Learning for Visual Commonsense Reasoning

After considering the author response and discussing this submission, all reviewers recommend acceptance -- including two high ratings. The reviewers generally found the approach novel but were interested in how it applies outside of the VCR task to other question answering datasets. With the addition of the experiments from the rebuttal, this is a strong submission.

heterogeneous graph learning, submission, visual commonsense reasoning

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.40)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Neural Information Processing Systems, Jan-25-2025, 11:11:22 GMT

Reviews: Connective Cognition Network for Directional Visual Commonsense Reasoning

Originality: The paper proposes a novel model for the recently introduced VCR task. The main novelty of the proposed model lies in the GraphVLAD and directional GCN modules. The paper describes that one of the closest works to this work is that of Narasimhan et al., NeurIPS 2018, which used a GCN to infer answers in VQA; however, that work constructs an undirected graph, ignoring the directional information between the graph nodes. This paper uses a directed graph instead and shows the usefulness of incorporating directional information. It would be good for this paper to include more related work on the GraphVLAD front. Quality: The paper evaluates the proposed approach on the VCR dataset and compares with the baselines and previous state-of-the-art, demonstrating how the proposed work improves the previous best performance significantly.

connective cognition network, directional information, directional visual commonsense reasoning, (10 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.42)

Neural Information Processing Systems, Jan-25-2025, 11:11:11 GMT

Reviews: Connective Cognition Network for Directional Visual Commonsense Reasoning

After considering the author response and discussing the submission, all reviewers voted to accept. Most reviewers shared a concern that the neurological connection is relatively weak and encouraged the authors to discuss it in looser terms like "inspired by". Reviewers generally found the paper's claims well established.

connective cognition network, directional visual commonsense reasoning, reviewer

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.40)

Mazhar, Abdullah, Shaik, Zuhair Hasan, Srivastava, Aseem, Ruhnke, Polly, Vaddavalli, Lavanya, Katragadda, Sri Keshav, Yadav, Shweta, Akhtar, Md Shad

Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification

arXiv.org Artificial Intelligence, Jan-25-2025

The expression of mental health symptoms through non-traditional means, such as memes, has gained remarkable attention over the past few years, with users often highlighting their mental health struggles through figurative intricacies within memes. While humans rely on commonsense knowledge to interpret these complex expressions, current Multimodal Language Models (MLMs) struggle to capture these figurative aspects inherent in memes. To address this gap, we introduce a novel dataset, AxiOM, derived from the GAD anxiety questionnaire, which categorizes memes into six fine-grained anxiety symptoms. Next, we propose a commonsense and domain-enriched framework, M3H, to enhance MLMs' ability to interpret figurative language and commonsense knowledge. The overarching goal remains to first understand and then classify the mental health symptoms expressed in memes. We benchmark M3H against 6 competitive baselines (with 20 variations), demonstrating improvements in both quantitative and qualitative metrics, including a detailed human evaluation. We observe a clear improvement of 4.20% and 4.66% on the weighted-F1 metric. To assess generalizability, we perform extensive experiments on a public dataset, RESTORE, for depressive symptom identification, presenting an extensive ablation study that highlights the contribution of each module in both datasets. Our findings reveal limitations in existing models and the advantage of employing commonsense to enhance figurative understanding.
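For readers unfamiliar with the weighted-F1 metric reported above, the following Python sketch computes it in the standard way: per-class F1 scores averaged with weights proportional to each class's share of the true labels. The function name and the toy labels are hypothetical and are not drawn from AxiOM or RESTORE.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1  # weight by class frequency
    return score


# Hypothetical three-class example (labels are arbitrary integers).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
example_score = weighted_f1(y_true, y_pred)
```

Because the weights follow class frequency, frequent symptom classes influence the score more than rare ones, which matters when class sizes are imbalanced.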

artificial intelligence, large language model, natural language, (14 more...)

2501.15321

Country:

Oceania > Australia > New South Wales > Sydney (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > India > NCT > New Delhi (0.04)
(9 more...)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Neural Information Processing Systems, Jan-19-2025, 23:14:53 GMT

SituatedGen: Incorporating Geographical and Temporal Contexts into Generative Commonsense Reasoning

Recently, commonsense reasoning in text generation has attracted much attention. Generative commonsense reasoning is the task that requires machines, given a group of keywords, to compose a single coherent sentence with commonsense plausibility. While existing datasets targeting generative commonsense reasoning focus on everyday scenarios, it is unclear how well machines reason under specific geographical and temporal contexts. We formalize this challenging task as SituatedGen, where machines with commonsense should generate a pair of contrastive sentences given a group of keywords including geographical or temporal entities. We introduce a corresponding English dataset consisting of 8,268 contrastive sentence pairs, which are built upon several existing commonsense reasoning benchmarks with minimal manual labor.

generative commonsense reasoning, incorporating geographical and temporal context, situatedgen, (3 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)

Neural Information Processing Systems, Jan-17-2025, 21:32:26 GMT

SalKG: Learning From Knowledge Graph Explanations for Commonsense Reasoning

Augmenting pre-trained language models with knowledge graphs (KGs) has achieved success on various commonsense reasoning tasks. However, for a given task instance, the KG, or certain parts of the KG, may not be useful. Although KG-augmented models often use attention to focus on specific KG components, the KG is still always used, and the attention mechanism is never explicitly taught which KG components should be used. Meanwhile, saliency methods can measure how much a KG feature (e.g., graph, node, path) influences the model to make the correct prediction, thus explaining which KG features are useful. This paper explores how saliency explanations can be used to improve KG-augmented models' performance.
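The idea of measuring how much a KG feature influences the correct prediction can be illustrated with a simple occlusion-style score: the probability the model assigns to the correct answer with the KG input, minus the same probability without it. This Python sketch is an assumption made for illustration and is not claimed to be the saliency method used in SalKG. The function names, the threshold, and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def occlusion_saliency(logits_with_kg, logits_without_kg, correct_idx):
    """Change in correct-answer probability when the KG input is added.

    Positive values mean the KG helped on this instance; negative values
    mean it hurt.
    """
    p_with = softmax(logits_with_kg)[correct_idx]
    p_without = softmax(logits_without_kg)[correct_idx]
    return p_with - p_without


def kg_is_useful(saliency, threshold=0.0):
    """Binary usefulness label (hypothetical threshold) for the KG."""
    return saliency > threshold


# Hypothetical two-choice question where answer 0 is correct.
example_saliency = occlusion_saliency([2.0, 0.0], [0.0, 0.0], correct_idx=0)
```

Labels produced this way could in principle serve as a training signal for when to rely on the KG, which is the general direction the abstract describes.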

explanation, kg-augmented model, saliency explanation, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.64)

arXiv.org Artificial Intelligence, Jan-1-2025

MSWA: Refining Local Attention with Multi-Scale Window Attention

Xu, Yixing, Nag, Shivank, Li, Dong, Tian, Lu, Barsoum, Emad

Transformer-based LLMs have achieved exceptional performance across a wide range of NLP tasks. However, the standard self-attention mechanism suffers from quadratic time complexity and linearly increased cache size. Sliding window attention (SWA) solves this problem by restricting the attention range to a fixed-size local context window. Nevertheless, SWA employs a uniform window size for each head in each layer, making it inefficient in capturing context of varying scales. To mitigate this limitation, we propose Multi-Scale Window Attention (MSWA) which applies diverse window sizes across heads and layers in the Transformer. It not only allows for different window sizes among heads within the same layer but also progressively increases window size allocation from shallow to deep layers, thus enabling the model to capture contextual information with different lengths and distances. Experimental results on language modeling and common-sense reasoning tasks substantiate that MSWA outperforms traditional local attention in both effectiveness and efficiency.
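The difference between a single sliding window and per-head window sizes can be sketched with boolean attention masks. The Python below is a minimal illustration, not the MSWA implementation. The doubling rule in `head_windows` and the `layer_scale` parameter are hypothetical stand-ins for the paper's allocation scheme, which the abstract does not spell out.

```python
def sliding_window_mask(seq_len, window):
    """Causal local mask: query i may attend to keys i-window+1 .. i."""
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


def head_windows(base_window, num_heads, layer_scale=1):
    """Hypothetical allocation: each head doubles the previous head's window.

    `layer_scale` lets deeper layers use larger windows, mirroring the
    shallow-to-deep growth described in the abstract.
    """
    return [base_window * layer_scale * 2 ** h for h in range(num_heads)]


def multi_scale_masks(seq_len, windows):
    """One mask per head, each built with that head's own window size."""
    return [sliding_window_mask(seq_len, w) for w in windows]


# Uniform SWA would reuse one window for every head; here heads differ.
shallow = head_windows(base_window=2, num_heads=3)               # [2, 4, 8]
deep = head_windows(base_window=2, num_heads=3, layer_scale=2)   # [4, 8, 16]
masks = multi_scale_masks(seq_len=4, windows=[1, 4])
```

With a window of 1 a head sees only the current token, while a window of 4 on a 4-token sequence reduces to ordinary causal attention, so the two masks bracket the range of context scales.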

large language model, machine learning, mechanism, (20 more...)

2501.01039

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.87)