 dcp


Discrimination-aware Channel Pruning for Deep Neural Networks

Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu

Neural Information Processing Systems

Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels.


DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism

Jiang, Chenyu, Cai, Zhenkun, Tian, Ye, Jia, Zhen, Wang, Yida, Wu, Chuan

arXiv.org Artificial Intelligence

Context parallelism has emerged as a key technique to support long-context training, a growing trend in generative AI for modern large models. However, existing context parallel methods rely on static parallelization configurations that overlook the dynamic nature of training data, specifically, the variability in sequence lengths and token relationships (i.e., attention patterns) across samples. As a result, these methods often suffer from unnecessary communication overhead and imbalanced computation. In this paper, we present DCP, a dynamic context parallel training framework that introduces fine-grained blockwise partitioning of both data and computation. By enabling flexible mapping of data and computation blocks to devices, DCP can adapt to varying sequence characteristics, effectively reducing communication and improving memory and computation balance. Micro-benchmarks demonstrate that DCP accelerates attention by 1.19x~2.45x under causal masks and 2.15x~3.77x under sparse attention patterns. Additionally, we observe up to 0.94x~1.16x end-to-end training speed-up for causal masks, and 1.00x~1.46x for sparse masks.
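To make the idea of blockwise partitioning concrete, here is a minimal sketch. It is not the DCP framework's algorithm: the function names, the block size, and the greedy least-loaded placement are all assumptions chosen for illustration, and the sketch balances computation only, ignoring the communication cost that the abstract says DCP also reduces.

```python
import heapq

def causal_blocks(seq_len, block):
    """Enumerate (query_block, key_block) pairs needed under a causal mask."""
    n = -(-seq_len // block)  # ceiling division: number of blocks in the sequence
    return [(q, k) for q in range(n) for k in range(q + 1)]

def assign(seq_lens, block, n_devices):
    """Greedily place each computation block on the currently least-loaded device.

    Hypothetical helper: every block is assumed to cost one unit of work.
    """
    heap = [(0, d) for d in range(n_devices)]  # (load, device id)
    placement = {}
    for s, length in enumerate(seq_lens):
        for q, k in causal_blocks(length, block):
            load, dev = heapq.heappop(heap)
            placement[(s, q, k)] = dev
            heapq.heappush(heap, (load + 1, dev))
    return placement, sorted(load for load, _ in heap)

# Three sequences of very different lengths share four devices.
placement, loads = assign([4096, 1024, 512], block=512, n_devices=4)
```

Because work is assigned per block rather than per sequence, the long sequence no longer pins its whole quadratic attention cost to one device.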


Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning

Jiralerspong, Marco, Derman, Esther, Vucetic, Danilo, Malkin, Nikolay, Sun, Bilun, Zhang, Tianyu, Bacon, Pierre-Luc, Gidel, Gauthier

arXiv.org Artificial Intelligence

A major bottleneck in scientific discovery consists of narrowing an exponentially large set of objects, such as proteins or molecules, to a small set of promising candidates with desirable properties. While this process can rely on expert knowledge, recent methods leverage reinforcement learning (RL) guided by a proxy reward function to enable this filtering. By employing various forms of entropy regularization, these methods aim to learn samplers that generate diverse candidates that are highly rated by the proxy function. In this work, we make two main contributions. First, we show that these methods are liable to generate overly diverse, suboptimal candidates in large search spaces. To address this issue, we introduce a novel unified operator that combines several regularized RL operators into a general framework that better targets peakier sampling distributions. Secondly, we offer a novel, robust RL perspective of this filtering process. The regularization can be interpreted as robustness to a compositional form of uncertainty in the proxy function (i.e., the true evaluation of a candidate differs from the proxy's evaluation). Our analysis leads us to a novel, easy-to-use algorithm we name trajectory general mellowmax (TGM): we show it identifies higher quality, diverse candidates than baselines in both synthetic and real-world tasks. Code: https://github.com/marcojira/tgm.
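For orientation, the standard mellowmax operator from the RL literature can be sketched as below. This is the plain mellowmax, not the paper's trajectory-level TGM operator; the function name, the parameter name omega, and the sample rewards are assumptions made for illustration.

```python
import math

def mellowmax(values, omega):
    """Standard mellowmax: (1 / omega) * log(mean(exp(omega * v)))."""
    if omega == 0:
        raise ValueError("omega must be non-zero")
    # Shift by the extreme value for numerical stability (log-sum-exp trick).
    shift = max(values) if omega > 0 else min(values)
    total = sum(math.exp(omega * (v - shift)) for v in values)
    return shift + math.log(total / len(values)) / omega

rewards = [1.0, 2.0, 5.0]            # illustrative proxy rewards
soft_small = mellowmax(rewards, 0.01)  # close to the arithmetic mean
soft_large = mellowmax(rewards, 50.0)  # close to the maximum
```

A small omega behaves like an average over candidates, while a large omega approaches the hard maximum, which is one way to picture the trade-off between diverse and peaky sampling.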


that our implementation will be a widely used tool for embedding convex optimization problems in end-to-end learning

Neural Information Processing Systems

We thank the reviewers for their constructive feedback on our paper. We especially appreciate our reviewers' conviction ... Reviewers 1 and 2 found some of our explanations of ASA form and DPP difficult to follow. We will also explain the motivation for our ruleset (reviewer 1's guess is essentially correct). This is what we meant by our vague phrasing "jointly DCP ... [with] one ... We will separately explain how to reduce certain expressions in which parameters are multiplied together (e.g., ... We will clarify this point. In the revision, we will make sure to clearly explain this.


Adversarial Text Generation with Dynamic Contextual Perturbation

Waghela, Hetvi, Sen, Jaydip, Rakshit, Sneha, Dasgupta, Subhasis

arXiv.org Artificial Intelligence

Adversarial attacks on Natural Language Processing (NLP) models expose vulnerabilities by introducing subtle perturbations to input text, often leading to misclassification while maintaining human readability. Existing methods typically focus on word-level or local text segment alterations, overlooking the broader context, which results in detectable or semantically inconsistent perturbations. We propose a novel adversarial text attack scheme named Dynamic Contextual Perturbation (DCP). DCP dynamically generates context-aware perturbations across sentences, paragraphs, and documents, ensuring semantic fidelity and fluency. Leveraging the capabilities of pre-trained language models, DCP iteratively refines perturbations through an adversarial objective function that balances the dual objectives of inducing model misclassification and preserving the naturalness of the text. This comprehensive approach allows DCP to produce more sophisticated and effective adversarial examples that better mimic natural language patterns. Our experimental results, conducted on various NLP models and datasets, demonstrate the efficacy of DCP in challenging the robustness of state-of-the-art NLP systems. By integrating dynamic contextual analysis, DCP significantly enhances the subtlety and impact of adversarial attacks. This study highlights the critical role of context in adversarial attacks and lays the groundwork for creating more robust NLP systems capable of withstanding sophisticated adversarial strategies.


DCP: Learning Accelerator Dataflow for Neural Network via Propagation

Xu, Peng, Shao, Wenqi, Ding, Mingyu, Luo, Ping

arXiv.org Artificial Intelligence

Deep neural network (DNN) hardware (HW) accelerators have achieved great success in improving DNNs' performance and efficiency. One key reason is the dataflow used to execute a DNN layer, including on-chip data partitioning, computation parallelism, and scheduling policy, which have a large impact on latency and energy consumption. Unlike prior works that required considerable effort from HW engineers to design suitable dataflows for different DNNs, this work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort. It has several attractive benefits that prior work lacks. (i) We translate the HW dataflow configuration into a code representation in a unified dataflow coding space, which can be optimized by backpropagating gradients given a DNN layer or network. (ii) DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives, e.g., latency and energy. (iii) It can be easily generalized to unseen HW configurations in a zero-shot or few-shot learning manner. For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples. Extensive experiments on several representative models such as MobileNet, ResNet, and ViT show that DCP outperforms its counterparts in various settings.


Stochastic COLREGs Evaluation for Safe Navigation under Uncertainty

Hansen, Peter Nicholas, Papageorgiou, Dimitrios, Galeazzi, Roberto, Blanke, Mogens

arXiv.org Artificial Intelligence

The encounter situation between marine vessels determines how they should navigate to obey COLREGs, but time-varying and stochastic uncertainty in the estimation of angles of encounter, and of the closest point of approach, easily gives rise to different assessments of the situation by two approaching vessels. This may lead to high-risk conditions and could cause a collision. This article considers decision making under uncertainty and suggests a novel method for probabilistic interpretation of vessel encounters that is explainable and provides a measure of uncertainty in the evaluation. The method is equally useful for decision support on a manned bridge and on Marine Autonomous Surface Ships (MASS), where it provides input for automated navigation. The method makes formal safety assessment and validation feasible. We obtain a resilient algorithm for machine interpretation of COLREGs under uncertainty and show its efficacy by simulations.
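The closest point of approach mentioned above has a simple closed form for two vessels on constant courses, and the effect of estimation noise on it can be illustrated by sampling. The sketch below is not the article's method: the function names, the Gaussian position-noise model, and all numeric values are assumptions made for illustration.

```python
import math
import random

def cpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """Return (time, distance) of closest point of approach for constant velocities."""
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = tgt_vel[0] - own_vel[0], tgt_vel[1] - own_vel[1]
    speed_sq = vx * vx + vy * vy
    # With zero relative velocity the range never changes; otherwise clamp to the future.
    t = 0.0 if speed_sq == 0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)

def prob_close_quarters(own_pos, own_vel, tgt_pos, tgt_vel, sigma, limit, n=2000, seed=0):
    """Monte Carlo fraction of noisy target-position samples with CPA distance below limit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Assumed noise model: independent Gaussian error on each position coordinate.
        noisy = (tgt_pos[0] + rng.gauss(0, sigma), tgt_pos[1] + rng.gauss(0, sigma))
        if cpa(own_pos, own_vel, noisy, tgt_vel)[1] < limit:
            hits += 1
    return hits / n

# Own ship heads north, target heads west; positions in metres, speeds in m/s.
t_cpa, d_cpa = cpa((0, 0), (0, 5), (1000, 1000), (-5, 0))
p = prob_close_quarters((0, 0), (0, 5), (1000, 1000), (-5, 0), sigma=50, limit=100)
```

Reporting a probability rather than a single CPA value is one simple way to expose how sensitive an encounter assessment is to estimation error.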


Programming Distributed Collective Processes in the eXchange Calculus

Audrito, Giorgio, Casadei, Roberto, Damiani, Ferruccio, Torta, Gianluca, Viroli, Mirko

arXiv.org Artificial Intelligence

Recent trends like the Internet of Things (IoT) suggest a vision of dense and multi-scale deployments of computing devices in nearly all kinds of environments. A prominent engineering challenge revolves around programming the collective adaptive behaviour of such computational ecosystems. This requires abstractions able to capture concepts like ensembles (dynamic groups of cooperating devices) and collective tasks (joint activities carried out by ensembles). In this work, we consider collections of devices that interact with neighbours and execute in nearly-synchronised sense-compute-interact rounds, where the computation is given by a single program mapping sensing values and incoming messages to output and outgoing messages. To support programming whole computational collectives, we propose the abstraction of a distributed collective process, which can be used to define at once the ensemble formation logic and its collective task. We formalise the abstraction in the eXchange Calculus (XC), a core functional language based on neighbouring values (maps from neighbours to values) where state and interaction are handled through a single primitive, exchange, and provide a corresponding implementation in the FCPP language. Then, we exercise distributed collective processes using two case studies: multi-hop message propagation and distributed monitoring of spatial properties. Finally, we discuss the features of the abstraction and its suitability for different kinds of distributed computing applications.


Dynamically Conservative Self-Driving Planner for Long-Tail Cases

Zhou, Weitao, Cao, Zhong, Deng, Nanshan, Liu, Xiaoyu, Jiang, Kun, Yang, Diange

arXiv.org Artificial Intelligence

Self-driving vehicles (SDVs) are becoming reality but still suffer from "long-tail" challenges during natural driving: the SDVs will continually encounter rare, safety-critical cases that may not be included in the dataset on which they were trained. Some safety-assurance planners solve this problem by being conservative in all possible cases, which may significantly affect driving mobility. To this end, this work proposes a method to automatically adjust the conservative level according to each case's "long-tail" rate, named dynamically conservative planner (DCP). We first define the "long-tail" rate as an SDV's confidence in passing a driving case. The rate indicates the probability of safety-critical events and is estimated from historical data using a statistical bootstrap method. Then, a reinforcement learning-based planner is designed to contain candidate policies with different conservative levels. The final policy is optimized based on the estimated "long-tail" rate. In this way, the DCP is designed to automatically become more conservative in low-confidence "long-tail" cases while remaining efficient otherwise. The DCP is evaluated in the CARLA simulator using driving cases with "long-tail" distributed training data. The results show that the DCP can accurately estimate the "long-tail" rate to identify potential risks. Based on the rate, the DCP automatically avoids potential collisions in "long-tail" cases using conservative decisions while not affecting the average velocity in other typical cases. Thus, the DCP is safer and more efficient than the baselines with fixed conservative levels, e.g., an always conservative planner. This work provides a technique to guarantee an SDV's performance in unexpected driving cases without resorting to a globally conservative setting, which contributes to solving the "long-tail" problem practically.
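A bootstrap estimate of confidence from historical outcomes might look like the sketch below. It is a generic percentile bootstrap, not the paper's estimator: the function name, the 0/1 outcome encoding, the 5% level, and the 0.9 threshold for switching to a conservative policy are all assumptions made for illustration.

```python
import random

def bootstrap_lower_bound(outcomes, alpha=0.05, n_resamples=1000, seed=0):
    """Lower (alpha) percentile of the resampled success rate."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the history with replacement and record each resample's success rate.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    return means[int(alpha * n_resamples)]

# Hypothetical history for one driving case: 18 safe passes, 2 safety-critical events.
history = [1] * 18 + [0] * 2
confidence = bootstrap_lower_bound(history)
conservative = confidence < 0.9  # assumed threshold for choosing the cautious policy
```

Using a lower percentile rather than the raw success rate makes cases with little or noisy history look less trustworthy, which matches the intent of being conservative only where confidence is low.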


Feature Decomposition for Reducing Negative Transfer: A Novel Multi-task Learning Method for Recommender System

Zhou, Jie, Yu, Qian, Luo, Chuan, Zhang, Jing

arXiv.org Artificial Intelligence

In recent years, thanks to the rapid development of deep learning (DL), DL-based multi-task learning (MTL) has made significant progress, and it has been successfully applied to recommender systems (RS). However, in a recommender system, the correlations among the involved tasks are complex. Therefore, the existing MTL models designed for RS suffer from negative transfer to different degrees, which harms optimization in MTL. We find that the root cause of negative transfer is feature redundancy, in which features learned for different tasks interfere with each other. To alleviate the issue of negative transfer, we propose a novel multi-task learning method termed Feature Decomposition Network (FDN). The key idea of the proposed FDN is to reduce feature redundancy by explicitly decomposing features into task-specific features and task-shared features with carefully designed constraints. We demonstrate the effectiveness of the proposed method on two datasets, a synthetic dataset and a public dataset (i.e., Ali-CCP). Experimental results show that our proposed FDN can outperform the state-of-the-art (SOTA) methods by a noticeable margin.