Goto

Collaborating Authors

 Law


IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents

Neural Information Processing Systems

In this paper, we introduce IMPACT (Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents), a large-scale multimodal patent dataset with detailed captions for design patent figures. Our dataset includes half a million design patents comprising 3.61 million figures along with captions from patents granted by the United States Patent and Trademark Office (USPTO) over a 16-year period from 2007 to 2022. We incorporate the metadata of each patent application with elaborate captions that are coherent with multiple viewpoints of designs. Even though patents themselves contain a variety of design figures, titles, and descriptions of viewpoints, we find that they lack detailed descriptions that are necessary to perform multimodal tasks such as classification and retrieval. IMPACT closes this gap thereby providing researchers with necessary ingredients to instantiate a variety of multimodal tasks.


SILENCE: Protecting privacy in offloaded speech understanding on resource-constrained devices

Neural Information Processing Systems

Speech serves as a ubiquitous input interface for embedded mobile devices. Cloud-based solutions, while offering powerful speech understanding services, raise significant concerns regarding user privacy. To address this, disentanglement-based encoders have been proposed to remove sensitive information from speech signals without compromising the speech understanding functionality. However, these encoders demand high memory usage and computation complexity, making them impractical for resource-constrained wimpy devices.Our solution is based on a key observation that speech understanding hinges on long-term dependency knowledge of the entire utterance, in contrast to privacy-sensitive elements that are short-term dependent. Exploiting this observation, we propose SILENCE, a lightweight system that selectively obscuring short-term details, without damaging the long-term dependent speech understanding performance.The crucial part of SILENCE is a differential mask generator derived from interpretable learning to automatically configure the masking process.We have implemented SILENCE on the STM32H7 microcontroller and evaluate its efficacy under different attacking scenarios.


Apple's triple threat: tariffs, AI troubles and a Fortnite fail

The Guardian

This week in tech: Apple struggles on multiple fronts, OpenAI grows increasingly ambitious, and Trump helps some of his fans lose money on cryptocurrency. Long dominant and unassailable, Apple is showing signs of weakness. The CEO, Tim Cook, can't tame Donald Trump's threats of tariffs that would spike the price of an iPhone; Apple's AI offerings pale against its competitors; and the company can't win a Fortnite match – or a single battle in its legal war with Epic Games – to save its life. On Friday, the president threatened to levy a 25% tariff on any iPhone not made in the US. Trump said in the post: "I have long ago informed Tim Cook of Apple that I expect their iPhones that will be sold in the United States of America will be manufactured and built in the United States, not India, or anyplace else. If that is not the case, a Tariff of at least 25% must be paid by Apple to the US."


Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Neural Information Processing Systems

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy. Most prior works attribute this poor performance to the low-dimensional bottleneck in embedding-based methods. In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods. These findings motivate us to investigate novel data augmentation and regularization techniques to mitigate overfitting. To this end, we propose GLaS, a new regularizer for embedding-based neural network approaches.


D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

Neural Information Processing Systems

Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually adopt laborious human efforts by grid-searching on a set of mixture ratios, which require high GPU training consumption costs. Besides, we cannot guarantee the selected ratio is optimal for the specific domain. To address the limitations of existing methods, inspired by the Scaling Law for performance prediction, we propose to investigate the Scaling Law of the Domain-specific Continual Pre-Training (D-CPT Law) to decide the optimal mixture ratio with acceptable training costs for LLMs of different sizes.


Constrained Predictive Coding as a Biologically Plausible Model of the Cortical Hierarchy

Neural Information Processing Systems

Predictive coding (PC) has emerged as an influential normative model of neural computation with numerous extensions and applications. As such, much effort has been put into mapping PC faithfully onto the cortex, but there are issues that remain unresolved or controversial. In particular, current implementations often involve separate value and error neurons and require symmetric forward and backward weights across different brain regions. These features have not been experimentally confirmed. In this work, we show that the PC framework in the linear regime can be modified to map faithfully onto the cortical hierarchy in a manner compatible with empirical observations.


Disentangled Style Domain for Implicit z -Watermark Towards Copyright Protection

Neural Information Processing Systems

Text-to-image models have shown surprising performance in high-quality image generation, while also raising intensified concerns about the unauthorized usage of personal dataset in training and personalized fine-tuning. Recent approaches, embedding watermarks, introducing perturbations, and inserting backdoors into datasets, rely on adding minor information vulnerable to adversarial training, limiting their ability to detect unauthorized data usage. In this paper, we introduce a novel implicit Zero-Watermarking scheme that first utilizes the disentangled style domain to detect unauthorized dataset usage in text-to-image models. Specifically, our approach generates the watermark from the disentangled style domain, enabling self-generalization and mutual exclusivity within the style domain anchored by protected units.


Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?

Neural Information Processing Systems

Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before updating weights. Recent work suggests that PC can converge in fewer learning steps than backpropagation thanks to its inference procedure. However, these advantages are not always observed, and the impact of PC inference on learning is not theoretically well understood. To address this gap, we study the geometry of the PC weight landscape at the inference equilibrium of the network activities. For deep linear networks, we first show that the equilibrated PC energy is equal to a rescaled mean squared error loss with a weight-dependent rescaling.


Societal Impacts Research Requires Benchmarks for Creative Composition Tasks

arXiv.org Artificial Intelligence

Foundation models that are capable of automating cognitive tasks represent a pivotal technological shift, yet their societal implications remain unclear. These systems promise exciting advances, yet they also risk flooding our information ecosystem with formulaic, homogeneous, and potentially misleading synthetic content. Developing benchmarks grounded in real use cases where these risks are most significant is therefore critical. Through a thematic analysis using 2 million language model user prompts, we identify creative composition tasks as a prevalent usage category where users seek help with personal tasks that require everyday creativity. Our fine-grained analysis identifies mismatches between current benchmarks and usage patterns among these tasks. Crucially, we argue that the same use cases that currently lack thorough evaluations can lead to negative downstream impacts. This position paper argues that benchmarks focused on creative composition tasks is a necessary step towards understanding the societal harms of AI-generated content. We call for greater transparency in usage patterns to inform the development of new benchmarks that can effectively measure both the progress and the impacts of models with creative capabilities.


Incentivizing High-Quality Human Annotations with Golden Questions

arXiv.org Machine Learning

Human-annotated data plays a vital role in training large language models (LLMs), such as supervised fine-tuning and human preference alignment. However, it is not guaranteed that paid human annotators produce high-quality data. In this paper, we study how to incentivize human annotators to do so. We start from a principal-agent model to model the dynamics between the company (the principal) and the annotator (the agent), where the principal can only monitor the annotation quality by examining $n$ samples. We investigate the maximum likelihood estimators (MLE) and the corresponding hypothesis testing to incentivize annotators: the agent is given a bonus if the MLE passes the test. By analyzing the variance of the outcome, we show that the strategic behavior of the agent makes the hypothesis testing very different from traditional ones: Unlike the exponential rate proved by the large deviation theory, the principal-agent model's hypothesis testing rate is of $Θ(1/\sqrt{n \log n})$. Our theory implies two criteria for the \emph{golden questions} to monitor the performance of the annotators: they should be of (1) high certainty and (2) similar format to normal ones. In that light, we select a set of golden questions in human preference data. By doing incentive-compatible experiments, we find out that the annotators' behavior is better revealed by those golden questions, compared to traditional survey techniques such as instructed manipulation checks.