AITopics

Industry: Law > Civil Rights & Constitutional Law (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.84)

Neural Information Processing SystemsDec-24-2025, 04:53:51 GMT

On the Modularity of Hypernetworks

In the context of learning to map an input $I$ to a function $h_I:\mathcal{X}\to \mathbb{R}$, two alternative methods are compared: (i) an embedding-based method, which learns a fixed function in which $I$ is encoded as a conditioning signal $e(I)$ and the learned function takes the form $h_I(x) = q(x,e(I))$, and (ii) hypernetworks, in which the weights $\theta_I$ of the function $h_I(x) = g(x;\theta_I)$ are given by a hypernetwork $f$ as $\theta_I=f(I)$. In this paper, we define the property of modularity as the ability to effectively learn a different function for each input instance $I$. For this purpose, we adopt an expressivity perspective of this property and extend the theory of~\cite{devore} and provide a lower bound on the complexity (number of trainable parameters) of neural networks as function approximators, by eliminating the requirements for the approximation method to be robust. Our results are then used to compare the complexities of $q$ and $g$, showing that under certain conditions and when letting the functions $e$ and $f$ be as large as we wish, $g$ can be smaller than $q$ by orders of magnitude.

hypernetwork, modularity, name change, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Neural Information Processing SystemsOct-3-2025, 01:22:27 GMT

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar

In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods.

artificial intelligence, machine learning, natural language, (22 more...)

Country:

Europe (1.00)
North America > United States (0.93)
North America > Canada (0.68)

Industry: Law > Civil Rights & Constitutional Law (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceSep-25-2025

Towards Visual Text Grounding of Multimodal Large Language Model

Li, Ming, Zhang, Ruiyi, Chen, Jian, Wang, Chenguang, Gu, Jiuxiang, Zhou, Yufan, Dernoncourt, Franck, Zhu, Wanrong, Zhou, Tianyi, Sun, Tong

Despite the existing evolution of Multimodal Large Language Models (MLLMs), a non-neglectable limitation remains in their struggle with visual text grounding, especially in text-rich images of documents. Document images, such as scanned forms and infographics, highlight critical challenges due to their complex layouts and textual content. However, current benchmarks do not fully address these challenges, as they mostly focus on visual grounding on natural images, rather than text-rich document images. Thus, to bridge this gap, we introduce TRIG, a novel task with a newly designed instruction dataset for benchmarking and improving the Text-Rich Image Grounding capabilities of MLLMs in document question-answering. Specifically, we propose an OCR-LLM-human interaction pipeline to create 800 manually annotated question-answer pairs as a benchmark and a large-scale training set of 90$ synthetic data based on four diverse datasets. A comprehensive evaluation of various MLLMs on our proposed benchmark exposes substantial limitations in their grounding capability on text-rich images. In addition, we propose two simple and effective TRIG methods based on general instruction tuning and plug-and-play efficient embedding, respectively. By finetuning MLLMs on our synthetic dataset, they promisingly improve spatial reasoning and grounding capabilities.

large language model, machine learning, natural language, (18 more...)

2504.04974

Country:

North America > United States (0.28)
Europe > Austria (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJun-6-2025

Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification

Wei, Lu, Li, Liangzhi, Xiang, Tong, Liu, Xiao, Garcia, Noa

The internet has become a hotspot for hate speech (HS), threatening societal harmony and individual well-being. While automatic detection methods perform well in identifying explicit hate speech (ex-HS), they struggle with more subtle forms, such as implicit hate speech (im-HS). We tackle this problem by introducing a new taxonomy for im-HS detection, defining six encoding strategies named codetypes. We present two methods for integrating codetypes into im-HS detection: 1) prompting large language models (LLMs) directly to classify sentences based on generated responses, and 2) using LLMs as encoders with codetypes embedded during the encoding process. Experiments show that the use of codetypes improves im-HS detection in both Chinese and English datasets, validating the effectiveness of our approach across different languages.

codetype, large language model, machine learning, (19 more...)

2506.04693

Country:

Europe (0.67)
North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Terrorism (0.67)
Information Technology > Services (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Neural Information Processing SystemsMay-27-2025, 12:26:39 GMT

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy. Most prior works attribute this poor performance to the low-dimensional bottleneck in embedding-based methods. In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods. These findings motivate us to investigate novel data augmentation and regularization techniques to mitigate overfitting. To this end, we propose GLaS, a new regularizer for embedding-based neural network approaches.

embedding-based classifier, embedding-based method, glass ceiling, (3 more...)

Industry: Law > Civil Rights & Constitutional Law (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

arXiv.org Artificial IntelligenceFeb-23-2025

FanChuan: A Multilingual and Graph-Structured Benchmark For Parody Detection and Analysis

Zheng, Yilun, Li, Sha, Wu, Fangkun, Ziyi, Yang, Hongchao, Lin, Hu, Zhichao, Xinjun, Cai, Wang, Ziming, Chen, Jinxuan, Luan, Sitao, Xu, Jiahao, Chen, Lihui

Parody is an emerging phenomenon on social media, where individuals imitate a role or position opposite to their own, often for humor, provocation, or controversy. Detecting and analyzing parody can be challenging and is often reliant on context, yet it plays a crucial role in understanding cultural values, promoting subcultures, and enhancing self-expression. However, the study of parody is hindered by limited available data and deficient diversity in current datasets. To bridge this gap, we built seven parody datasets from both English and Chinese corpora, with 14,755 annotated users and 21,210 annotated comments in total. To provide sufficient context information, we also collect replies and construct user-interaction graphs to provide richer contextual information, which is lacking in existing datasets. With these datasets, we test traditional methods and Large Language Models (LLMs) on three key tasks: (1) parody detection, (2) comment sentiment analysis with parody, and (3) user sentiment analysis with parody. Our extensive experiments reveal that parody-related tasks still remain challenging for all models, and contextual information plays a critical role. Interestingly, we find that, in certain scenarios, traditional sentence embedding methods combined with simple classifiers can outperform advanced LLMs, i.e. DeepSeek-R1 and GPT-o3, highlighting parody as a significant challenge for LLMs.

dataset, detection, parody detection, (15 more...)

2502.16503

Country:

North America > United States (0.14)
North America > Canada > Quebec (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-10-2024, 13:46:17 GMT

On the Modularity of Hypernetworks

In the context of learning to map an input I to a function h_I:\mathcal{X}\to \mathbb{R}, two alternative methods are compared: (i) an embedding-based method, which learns a fixed function in which I is encoded as a conditioning signal e(I) and the learned function takes the form h_I(x) q(x,e(I)), and (ii) hypernetworks, in which the weights \theta_I of the function h_I(x) g(x;\theta_I) are given by a hypernetwork f as \theta_I f(I) . In this paper, we define the property of modularity as the ability to effectively learn a different function for each input instance I . For this purpose, we adopt an expressivity perspective of this property and extend the theory of \cite{devore} and provide a lower bound on the complexity (number of trainable parameters) of neural networks as function approximators, by eliminating the requirements for the approximation method to be robust. Our results are then used to compare the complexities of q and g, showing that under certain conditions and when letting the functions e and f be as large as we wish, g can be smaller than q by orders of magnitude. Besides, we show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.

hypernetwork, modularity, trainable parameter, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

Neural Information Processing SystemsOct-10-2024, 08:25:28 GMT

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy. Most prior works attribute this poor performance to the low-dimensional bottleneck in embedding-based methods. In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods. These findings motivate us to investigate novel data augmentation and regularization techniques to mitigate overfitting. To this end, we propose GLaS, a new regularizer for embedding-based neural network approaches.

embedding-based classifier, embedding-based method, glass ceiling, (3 more...)

Industry: Law > Civil Rights & Constitutional Law (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Singh, Aaditya K., Yang, Yu, Tirumala, Kushal, Elhoushi, Mostafa, Morcos, Ari S.

Brevity is the soul of wit: Pruning long files for code generation

arXiv.org Artificial IntelligenceJun-29-2024

Data curation is commonly considered a "secret-sauce" for LLM training, with higher quality data usually leading to better LLM performance. Given the scale of internet-scraped corpora, data pruning has become a larger and larger focus. Specifically, many have shown that de-duplicating data, or sub-selecting higher quality data, can lead to efficiency or performance improvements. Generally, three types of methods are used to filter internetscale corpora: embedding-based, heuristic-based, and classifier-based. In this work, we contrast the former two in the domain of finetuning LLMs for code generation. We find that embedding-based methods are often confounded by length, and that a simple heuristic-- pruning long files--outperforms other methods in compute-limited regimes. Our method can yield up to a 2x efficiency benefit in training (while matching performance) or a 3.5% absolute performance improvement on HumanEval (while matching compute). However, we find that perplexity on held-out long files can increase, begging the question of whether optimizing data mixtures for common coding benchmarks (HumanEval, MBPP) actually best serves downstream use cases. Overall, we hope our work builds useful intuitions about code data (specifically, the low quality of extremely long code files) provides a compelling heuristic-based method for data pruning, and brings to light questions in how we evaluate code generation models.

code generation, dataset, pruning, (17 more...)

2407.00434

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)