Goto

Collaborating Authors

 fred





Frequency Domain-based Dataset Distillation

Neural Information Processing Systems

Unlike conventional approaches that focus on the spatial domain, FreD employs frequency-based transforms to optimize the frequency representations of each data instance.


Opacity as Authority: Arbitrariness and the Preclusion of Contestation

Kayembe, Naomi Omeonga wa

arXiv.org Artificial Intelligence

This article redefines arbitrariness not as a normative flaw or a symptom of domination, but as a foundational functional mechanism structuring human systems and interactions. Diverging from critical traditions that conflate arbitrariness with injustice, it posits arbitrariness as a semiotic trait: a property enabling systems - linguistic, legal, or social - to operate effectively while withholding their internal rationale. Building on Ferdinand de Saussure's concept of l'arbitraire du signe, the analysis extends this principle beyond language to demonstrate its cross-domain applicability, particularly in law and social dynamics. The paper introduces the "Motivation -> Constatability -> Contestability" chain, arguing that motivation functions as a crucial interface rendering an act's logic vulnerable to intersubjective contestation. When this chain is broken through mechanisms like "immotivization" or "Conflict Lateralization" (exemplified by "the blur of the wolf drowned in the fish"), acts produce binding effects without exposing their rationale, thus precluding justiciability. This structural opacity, while appearing illogical, is a deliberate design protecting authority from accountability. Drawing on Shannon's entropy model, the paper formalizes arbitrariness as A = H(L|M) (conditional entropy). It thereby proposes a modern theory of arbitrariness as a neutral operator central to control as well as care, an overlooked dimension of interpersonal relations. While primarily developed through human social systems, this framework also illuminates a new pathway for analyzing explainability in advanced artificial intelligence systems.


FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

Rashidi, Saeed, Won, William, Srinivasan, Sudarshan, Gupta, Puneet, Krishna, Tushar

arXiv.org Artificial Intelligence

Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks into multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating high-end accelerators with high-speed wafer-scale interconnects, making it an attractive platform for distributed training. However, the wafer-scale interconnect should offer high performance and flexibility for various parallelization strategies to enable maximum optimizations for compute and memory usage. In this paper, we propose FRED, a wafer-scale interconnect that is tailored for the high-BW requirements of wafer-scale networks and can efficiently execute communication patterns of different parallelization strategies. Furthermore, FRED supports in-switch collective communication execution that reduces the network traffic by approximately 2X. Our results show that FRED can improve the average end-to-end training time of ResNet-152, Transformer-17B, GPT-3, and Transformer-1T by 1.76X, 1.87X, 1.34X, and 1.4X, respectively when compared to a baseline waferscale 2D-Mesh fabric.


In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

Kuratov, Yuri, Bulatov, Aydar, Anokhin, Petr, Sorokin, Dmitry, Sorokin, Artyom, Burtsev, Mikhail

arXiv.org Artificial Intelligence

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to $10^4$ elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to $11\times 10^6$ elements. This achievement marks a substantial leap, as it is by far the longest input processed by any neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.


FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection

Lee, Chanho, Son, Jinsu, Shon, Hyounguk, Jeon, Yunho, Kim, Junmo

arXiv.org Artificial Intelligence

Rotation-equivariance is an essential yet challenging property in oriented object detection. While general object detectors naturally leverage robustness to spatial shifts due to the translation-equivariance of the conventional CNNs, achieving rotation-equivariance remains an elusive goal. Current detectors deploy various alignment techniques to derive rotation-invariant features, but still rely on high capacity models and heavy data augmentation with all possible rotations. In this paper, we introduce a Fully Rotation-Equivariant Oriented Object Detector (FRED), whose entire process from the image to the bounding box prediction is strictly equivariant. Specifically, we decouple the invariant task (object classification) and the equivariant task (object localization) to achieve end-to-end equivariance. We represent the bounding box as a set of rotation-equivariant vectors to implement rotation-equivariant localization. Moreover, we utilized these rotation-equivariant vectors as offsets in the deformable convolution, thereby enhancing the existing advantages of spatial adaptation. Leveraging full rotation-equivariance, our FRED demonstrates higher robustness to image-level rotation compared to existing methods. Furthermore, we show that FRED is one step closer to non-axis aligned learning through our experiments. Compared to state-of-the-art methods, our proposed method delivers comparable performance on DOTA-v1.0 and outperforms by 1.5 mAP on DOTA-v1.5, all while significantly reducing the model parameters to 16%.


Frequency Domain-based Dataset Distillation

Shin, Donghyeok, Shin, Seungjae, Moon, Il-Chul

arXiv.org Artificial Intelligence

This paper presents FreD, a novel parameterization method for dataset distillation, which utilizes the frequency domain to distill a small-sized synthetic dataset from a large-sized original dataset. Unlike conventional approaches that focus on the spatial domain, FreD employs frequency-based transforms to optimize the frequency representations of each data instance. By leveraging the concentration of spatial domain information on specific frequency components, FreD intelligently selects a subset of frequency dimensions for optimization, leading to a significant reduction in the required budget for synthesizing an instance. Through the selection of frequency dimensions based on the explained variance, FreD demonstrates both theoretical and empirical evidence of its ability to operate efficiently within a limited budget, while better preserving the information of the original dataset compared to conventional parameterization methods. Furthermore, based on the orthogonal compatibility of FreD with existing methods, we confirm that FreD consistently improves the performances of existing distillation methods over the evaluation scenarios with different benchmark datasets. We release the code at https://github.com/sdh0818/FreD.


Privately Customizing Prefinetuning to Better Match User Data in Federated Learning

Hou, Charlie, Zhan, Hongyuan, Shrivastava, Akshat, Wang, Sid, Livshits, Aleksandr, Fanti, Giulia, Lazar, Daniel

arXiv.org Artificial Intelligence

In Federated Learning (FL), accessing private client data incurs communication and privacy costs. As a result, FL deployments commonly prefinetune pretrained foundation models on a (large, possibly public) dataset that is held by the central server; they then FL-finetune the model on a private, federated dataset held by clients. Evaluating prefinetuning dataset quality reliably and privately is therefore of high importance. To this end, we propose FreD (Federated Private Fr\'echet Distance) -- a privately computed distance between a prefinetuning dataset and federated datasets. Intuitively, it privately computes and compares a Fr\'echet distance between embeddings generated by a large language model on both the central (public) dataset and the federated private client data. To make this computation privacy-preserving, we use distributed, differentially-private mean and covariance estimators. We show empirically that FreD accurately predicts the best prefinetuning dataset at minimal privacy cost. Altogether, using FreD we demonstrate a proof-of-concept for a new approach in private FL training: (1) customize a prefinetuning dataset to better match user data (2) prefinetune (3) perform FL-finetuning.