Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control

Neural Information Processing Systems

As the development and application of Large Language Models (LLMs) continue to advance rapidly, enhancing their trustworthiness and aligning them with human preferences has become a critical area of research. Traditional methods rely heavily on extensive data for Reinforcement Learning from Human Feedback (RLHF), but representation engineering offers a new, training-free approach. This technique leverages semantic features to control the representations in an LLM's intermediate hidden states, enabling the model to meet specific requirements such as increased honesty or heightened safety awareness. However, a significant challenge arises when attempting to fulfill multiple requirements simultaneously: it proves difficult to encode distinct semantic contents, such as honesty and safety, into a single semantic feature, which restricts the technique's practicality.
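As background, the representation-engineering approach described above is commonly implemented by extracting a direction from contrastive activations and adding it to a hidden state at inference time. Below is a minimal sketch of that general idea, not the paper's Sparse Activation Control method; all names, dimensions, and the scaling factor `alpha` are illustrative assumptions.

```python
# Minimal sketch of representation steering (illustrative only):
# derive a "semantic direction" from contrastive hidden states and
# add it to a hidden state at inference time, with no training.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Hypothetical hidden states collected at one layer for contrastive
# prompt pairs (e.g., honest vs. dishonest completions).
h_positive = rng.normal(size=(100, d_model)) + 0.5   # "honest" activations
h_negative = rng.normal(size=(100, d_model))         # "dishonest" activations

# The steering direction is the normalized mean difference of the two sets.
direction = h_positive.mean(axis=0) - h_negative.mean(axis=0)
direction /= np.linalg.norm(direction)

def steer(hidden_state: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Shift a hidden state along the semantic direction."""
    return hidden_state + alpha * direction

# At inference, the steered state nudges the model toward the target
# behavior without any gradient updates.
h = rng.normal(size=d_model)
print(np.dot(steer(h) - h, direction))  # equals alpha, since direction is unit-norm
```

The difficulty the abstract points to is visible even in this sketch: a single `direction` vector must carry one semantic concept, so stacking several concepts (honesty, safety, fairness) into it tends to entangle them.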


Towards Comprehensive Detection of Chinese Harmful Memes

Junyu Lu

Neural Information Processing Systems

Harmful memes have proliferated on the Chinese Internet, yet research on detecting Chinese harmful memes lags significantly behind due to the absence of reliable datasets and effective detectors. To address this gap, we focus on the comprehensive detection of Chinese harmful memes.


Understanding the Gains from Repeated Self-Distillation

Neural Information Processing Systems

Self-Distillation is a special type of knowledge distillation where the student model has the same architecture as the teacher model. Despite using the same architecture and the same training data, self-distillation has been empirically observed to improve performance, especially when applied repeatedly. For such a process, there is a fundamental question of interest: How much gain is possible by applying multiple steps of self-distillation? To investigate this relative gain, we propose studying the simple but canonical task of linear regression. Our analysis shows that the excess risk achieved by multi-step self-distillation can significantly improve upon a single step of self-distillation, reducing the excess risk by a factor as large as d, where d is the input dimension.
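To make the setup concrete, here is a minimal sketch of repeated self-distillation in ridge-regularized linear regression: each student refits the previous model's predictions on the same inputs. The paper's analysis concerns appropriately tuned regularization at each step; this sketch fixes the regularization strength, so it illustrates the procedure rather than the d-factor gain, and all constants are assumptions.

```python
# Sketch of repeated self-distillation for ridge regression: each
# student refits the previous model's predictions on the same inputs.
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 200, 20, 1.0

X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

def ridge(X: np.ndarray, targets: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge solution for the given targets."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ targets)

targets = y
for step in range(4):
    w = ridge(X, targets, lam)
    targets = X @ w          # the student's predictions become the next labels
    # In-sample estimate of the excess risk E[(x^T (w - w_true))^2].
    excess = np.mean((X @ (w - w_true)) ** 2)
    print(f"step {step}: excess risk {excess:.4f}")
```

Each distillation step reapplies the ridge shrinkage to the previous predictions, which is the knob the paper's multi-step analysis exploits.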


LT-Defense: Searching-free Backdoor Defense via Exploiting the Long-tailed Effect

Neural Information Processing Systems

Language models have been shown to be vulnerable to backdoor attacks, threatening the security of services built on them. To mitigate the threat, existing solutions attempt to search for backdoor triggers, which can be time-consuming when the search space is large. Looking into the attack process, we observe that poisoned data create a long-tailed effect in the victim model, causing the decision boundary to shift towards the attack targets. Inspired by this observation, we introduce LT-Defense, the first searching-free backdoor defense, which exploits this long-tailed effect. Specifically, LT-Defense employs a small set of clean examples and two metrics to distinguish backdoor-related features in the target model. Upon detecting a backdoored model, LT-Defense additionally provides test-time backdoor freezing and attack-target prediction. Extensive experiments demonstrate the effectiveness of LT-Defense in both detection accuracy and efficiency: in task-agnostic scenarios, it achieves 98% accuracy across 1,440 models at less than 1% of the time cost of state-of-the-art solutions.
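The paper's two metrics are not reproduced here, but the underlying intuition, that a backdoored model's decision boundary is skewed toward the attack target, can be sketched with a single long-tail statistic computed over a small clean set. Everything below (the `tail_skew` function, the simulated logits, the class shift) is a hypothetical illustration, not LT-Defense itself.

```python
# Illustrative check for long-tailed prediction skew on clean data:
# a backdoored model tends to over-assign probability mass to the
# attack-target class even on trigger-free inputs.
import numpy as np

def tail_skew(logits: np.ndarray) -> tuple[int, float]:
    """Return the most-predicted class over a clean set and its share
    of predictions; a share far above 1/num_classes hints at a
    decision boundary shifted toward that class."""
    preds = logits.argmax(axis=1)
    counts = np.bincount(preds, minlength=logits.shape[1])
    target = int(counts.argmax())
    return target, counts[target] / len(preds)

rng = np.random.default_rng(2)
num_classes = 10
clean_logits = rng.normal(size=(256, num_classes))
clean_logits[:, 3] += 1.5   # simulate a boundary shifted toward class 3

target, share = tail_skew(clean_logits)
print(f"suspected target: {target}, prediction share: {share:.2f}")
# A share near 1/num_classes suggests a clean model; a dominant class
# suggests a backdoor whose target can be read off directly, which is
# why no trigger search is needed.
```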



Submodular Field Grammars: Representation, Inference, and Application to Image Parsing

Neural Information Processing Systems

Natural scenes contain many layers of part-subpart structure, and distributions over them are thus naturally represented by stochastic image grammars, with one production per decomposition of a part. Unfortunately, in contrast to language grammars, where the number of possible split points for a production A → BC is linear in the length of A, in an image there are an exponential number of ways to split a region into subregions. This makes parsing intractable and requires image grammars to be severely restricted in practice, for example by allowing only rectangular regions. In this paper, we address this problem by associating with each production a submodular Markov random field whose labels are the subparts and whose labeling segments the current object into these subparts. We call the resulting model a submodular field grammar (SFG). Finding the MAP split of a region into subregions is now tractable, and by exploiting this we develop an efficient approximate algorithm for MAP parsing of images with SFGs. Empirically, we show promising improvements in accuracy when using SFGs for scene understanding, and demonstrate exponential improvements in inference time compared to traditional methods, while returning comparable minima.
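As a toy illustration of the MAP-split subproblem, the sketch below labels each cell of a small one-dimensional "region" with one of two subparts by minimizing a pairwise energy whose smoothness term (a Potts penalty) is submodular. It uses exhaustive search for clarity on 8 cells; the point of SFGs is that graph cuts recover the same MAP labeling in polynomial time on real image regions. All names and constants are illustrative assumptions, not the paper's parser.

```python
# Toy MAP split of a tiny 1-D "region" into two subparts under a
# submodular pairwise energy (unary costs + Potts smoothness term).
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_cells = 8                              # a 1-D "region" of 8 cells
unary = rng.normal(size=(n_cells, 2))    # cost of assigning each subpart label
pairwise = 1.0                           # Potts penalty for disagreeing neighbors

def energy(labels: tuple) -> float:
    """Unary costs plus a submodular Potts penalty on neighbor pairs."""
    e = sum(unary[i, l] for i, l in enumerate(labels))
    e += pairwise * sum(a != b for a, b in zip(labels, labels[1:]))
    return e

# Exhaustive search over all 2^8 labelings; graph cuts would find the
# same minimizer without enumeration, which is what keeps SFG parsing
# tractable on exponentially many candidate splits.
best = min(itertools.product((0, 1), repeat=n_cells), key=energy)
print("MAP split:", best, "energy:", round(energy(best), 3))
```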