lsc
- Asia > Middle East > Israel (0.04)
- Asia > China > Guangdong Province (0.04)
- North America > United States > Virginia > Albemarle County > Charlottesville (0.15)
- Asia > Singapore (0.05)
- North America > United States > New York > New York County > New York City (0.04)
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs), which connect distant network blocks, can aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients down. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement from LSC scaling, is still missing. To address this, we theoretically show that the coefficients of LSCs in UNet strongly affect the stability of forward and backward propagation and the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed input and predicts an output far from the desired output, yielding oscillatory loss and thus oscillatory gradients. We also show the theoretical benefits of LSC coefficient scaling for the stability of hidden features and gradients and for robustness. Finally, inspired by our theory, we propose an effective coefficient scaling framework, ScaleLong, that scales the coefficients of LSCs in UNet and improves the training stability of UNet. Experimental results on CIFAR10, CelebA, ImageNet, and COCO show that our methods are superior at stabilizing training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones.
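The role of the LSC coefficients can be illustrated with a toy example. The sketch below is not the ScaleLong method itself: the function `toy_unet`, its scalar features, and the identity blocks are hypothetical simplifications chosen only to show how coefficients below 1 shrink the contribution that each long skip connection adds in the decoder.

```python
def toy_unet(x, coeffs, block=lambda h: h):
    """Run a scalar toy UNet with len(coeffs) long skip connections.

    Assumption: every block is the same function (identity by default),
    and each decoder stage merges its skip as x + c * skip. Real UNets
    use learned blocks and tensor features.
    """
    skips = []
    for _ in coeffs:
        x = block(x)          # encoder stage
        skips.append(x)       # feature carried by the long skip connection
    x = block(x)              # bottleneck
    for c in coeffs:
        # The innermost (most recent) skip is consumed first.
        x = block(x + c * skips.pop())
    return x


unscaled = toy_unet(1.0, [1.0, 1.0, 1.0])   # coefficients left at 1
scaled = toy_unet(1.0, [0.5, 0.5, 0.5])     # coefficients scaled down
print(unscaled, scaled)
```

With identity blocks, every skip adds its full value when the coefficient is 1, so the output grows with depth; smaller coefficients damp that growth.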
Feature-Guided Analysis of Neural Networks: A Replication Study
Formica, Federico, Gregis, Stefano, Zanenga, Aurora Francesca, Rota, Andrea, Lawford, Mark, Menghi, Claudio
Understanding why neural networks make certain decisions is pivotal for their use in safety-critical applications. Feature-Guided Analysis (FGA) extracts slices of neural networks relevant to their tasks. Existing feature-guided approaches typically monitor the activation of the neural network neurons to extract the relevant rules. Preliminary results are encouraging and demonstrate the feasibility of this solution by assessing the precision and recall of Feature-Guided Analysis on two pilot case studies. However, the applicability in industrial contexts needs additional empirical evidence. To address this need, this paper assesses the applicability of FGA on a benchmark composed of the MNIST and LSC datasets. We assessed the effectiveness of FGA in computing rules that explain the behavior of the neural network. Our results show that FGA has a higher precision on our benchmark than the results from the literature. We also evaluated how the selection of the neural network architecture, training, and feature selection affect the effectiveness of FGA. Our results show that the selection significantly affects the recall of FGA, while it has a negligible impact on its precision.
- North America > Canada > Ontario > Hamilton (0.14)
- Europe > Italy (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)
Separation of Unconscious Robots with Obstructed Visibility
Pyati, Prajyot, Kaur, Navjot, Jana, Saswata, Bhattacharya, Adri, Mandal, Partha Sarathi
We study a recently introduced unconscious mobile robot model, where each robot is associated with a color, which is visible to other robots but not to itself. The robots are autonomous, anonymous, oblivious and silent, operating in the Euclidean plane under the conventional Look-Compute-Move cycle. A primary task in this model is the separation problem, where unconscious robots sharing the same color must separate from others, forming recognizable geometric shapes such as circles, points, or lines. All prior works model the robots as transparent, enabling each to know the positions and colors of all other robots. In contrast, we model the robots as opaque, where a robot can obstruct the visibility of two other robots if it lies on the line segment between them. Under this obstructed visibility, we consider a variant of the separation problem in which robots, starting from any arbitrary initial configuration, are required to separate into concentric semicircles. We present a collision-free algorithm that solves the separation problem under a semi-synchronous scheduler in $O(n)$ epochs, where $n$ is the number of robots. The robots agree on one coordinate axis but have no knowledge of $n$.
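The obstructed-visibility rule described in the abstract (a robot blocks two others when it lies on the line segment between them) can be sketched as a small geometric check. This is only an illustration of that rule, not the paper's separation algorithm; the function names `blocks` and `visible_from`, the point-robot representation as `(x, y)` tuples, and the tolerance `eps` are assumptions made for this example.

```python
def blocks(p, q, r, eps=1e-9):
    """Return True if point r lies strictly inside the segment from p to q."""
    # Cross product is zero when p, q, r are collinear.
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    if abs(cross) > eps:
        return False
    # Projection of r onto pq must fall strictly between the endpoints.
    dot = (r[0] - p[0]) * (q[0] - p[0]) + (r[1] - p[1]) * (q[1] - p[1])
    sq_len = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return eps < dot < sq_len - eps


def visible_from(i, robots):
    """Indices of robots that robot i can see when all robots are opaque."""
    p = robots[i]
    seen = []
    for j, q in enumerate(robots):
        if j == i:
            continue
        obstructed = any(
            blocks(p, q, r) for k, r in enumerate(robots) if k not in (i, j)
        )
        if not obstructed:
            seen.append(j)
    return seen


robots = [(0, 0), (1, 0), (2, 0), (0, 1)]
print(visible_from(0, robots))  # robot 1 hides robot 2 from robot 0
```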
- Asia > India (0.14)
- Europe > Switzerland (0.04)
- Asia > Middle East > Israel (0.04)
- Asia > China > Guangdong Province (0.04)
- North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
- Asia > Singapore (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Efficient Latent Semantic Clustering for Scaling Test-Time Computation of LLMs
Lee, Sungjae, Kim, Hoyoung, Hwang, Jeongyeon, Park, Eunhyeok, Ok, Jungseul
Scaling test-time computation (generating and analyzing multiple or sequential outputs for a single input) has become a promising strategy for improving the reliability and quality of large language models (LLMs), as evidenced by advances in uncertainty quantification and multi-step reasoning. A key shared component is semantic clustering, which groups outputs that differ in form but convey the same meaning. Semantic clustering enables estimation of the distribution over the semantics of outputs and helps avoid redundant exploration of reasoning paths. However, existing approaches typically rely on external models, which introduce substantial computational overhead and often fail to capture context-aware semantics. We propose Latent Semantic Clustering (LSC), a lightweight and context-sensitive method that leverages the generator LLM's internal hidden states for clustering, eliminating the need for external models. Our extensive experiments across various LLMs and datasets show that LSC significantly improves the computational efficiency of test-time scaling while maintaining or exceeding the performance of existing methods.
- Asia > South Korea (0.04)
- Europe > Russia > Volga Federal District > Nizhny Novgorod Oblast > Nizhny Novgorod (0.04)
- Europe > Middle East > Cyprus (0.04)
- Health & Medicine > Therapeutic Area (0.93)
- Education (0.93)
Line Space Clustering (LSC): Feature-Based Clustering using K-medians and Dynamic Time Warping for Versatility
Chulev, Joanikij, Mladenovska, Angela
Clustering high-dimensional data is a critical challenge in machine learning due to the curse of dimensionality and the presence of noise. Traditional clustering algorithms often fail to capture the intrinsic structures in such data. This paper explores a combination of clustering methods, which we call Line Space Clustering (LSC): a representation that transforms data points into lines in a newly defined feature space, enabling clustering based on the similarity of feature value patterns, essentially treating features as sequences. LSC employs a combined distance metric that uses Euclidean and Dynamic Time Warping (DTW) distances, weighted by a parameter α, allowing flexibility in emphasizing shape or magnitude similarities. We delve deeply into the mechanics of DTW and the Savitzky-Golay filter, explaining their roles in the algorithm. Extensive experiments demonstrate the efficacy of LSC on synthetic and real-world datasets, showing that randomly experimenting with time-series-optimized methods can sometimes work surprisingly well on a complex dataset, particularly in noisy environments. Source code and experiments are available at: https://github.com/JoanikijChulev/LSC.
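A combined Euclidean and DTW distance of the kind the abstract describes might look like the minimal sketch below. It is not taken from the linked repository: the function names, the absolute-difference local cost in DTW, and the convention that the weighting parameter (here `alpha`) multiplies the Euclidean term while `1 - alpha` multiplies the DTW term are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Classic dynamic-programming DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def combined_distance(a, b, alpha=0.5):
    """Assumed convention: alpha weights magnitude (Euclidean), 1 - alpha weights shape (DTW)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * euclidean(a, b) + (1.0 - alpha) * dtw(a, b)


a = [0, 1, 2, 1, 0]
b = [0, 0, 1, 2, 1]   # same bump as a, shifted by one position
print(euclidean(a, b), dtw(a, b), combined_distance(a, b, alpha=0.5))
```

In this example the shifted bump is penalized less by DTW than by the Euclidean distance, which is the kind of shape-versus-magnitude trade-off the weighting parameter is meant to control.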
A General Framework to Evaluate Methods for Assessing Dimensions of Lexical Semantic Change Using LLM-Generated Synthetic Data
Baes, Naomi, Merx, Raphaël, Haslam, Nick, Vylomova, Ekaterina, Dubossarsky, Haim
Lexical Semantic Change (LSC) offers insights into cultural and social dynamics. However, the validity of methods for measuring kinds of LSC has yet to be established due to the absence of historical benchmark datasets. To address this gap, we develop a novel three-stage evaluation framework that involves: 1) creating a scalable, domain-general methodology for generating synthetic datasets that simulate theory-driven LSC across time, leveraging In-Context Learning and a lexical database; 2) using these datasets to evaluate the effectiveness of various methods; and 3) assessing their suitability for specific dimensions and domains. We apply this framework to simulate changes across key dimensions of LSC (SIB: Sentiment, Intensity, and Breadth) using examples from psychology, and evaluate the sensitivity of selected methods to detect these artificially induced changes. Our findings support the utility of the synthetic data approach, validate the efficacy of tailored methods for detecting synthetic changes in SIB, and reveal that a state-of-the-art LSC model faces challenges in detecting affective dimensions of LSC. This framework provides a valuable tool for dimension- and domain-specific benchmarking and evaluation of LSC methods, with particular benefits for the social sciences.
- Asia > Thailand (0.14)
- North America > Canada (0.14)
- Europe > Middle East > Malta (0.14)
- Law (0.93)
- Health & Medicine > Health Care Providers & Services (0.93)
- Health & Medicine > Consumer Health (0.68)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.68)