centralization
- North America > United States > New York (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- (4 more...)
We thank all the Reviewers for their feedback and their service to the community. We summarize, in the following paragraphs, our view on all the raised issues and how we will address them, which will certainly improve our work. In particular, taking localization to mean "analysis of the variance-constrained star-convex hull", [...] Instead, taking localization to mean "constrain by raw variance", the above issue is solved, but now [...] We intend to fully explore this connection in future work.
What are you sinking? A geometric approach on attention sink
Ruscio, Valeria, Nanni, Umberto, Silvestri, Fabrizio
Attention sink (AS) is a consistent pattern in transformer attention maps in which certain tokens (often special tokens or positional anchors) disproportionately attract attention from other tokens. We show that in transformers AS is not an architectural artifact but the manifestation of a fundamental geometric principle: the establishment of reference frames that anchor representational spaces. Analyzing several architectures, we identify three distinct reference frame types (centralized, distributed, and bidirectional) that correlate with the attention sink phenomenon. We show that these frames emerge during the earliest stages of training as optimal solutions to the problem of establishing stable coordinate systems in high-dimensional spaces. We also show how architectural components, particularly position encoding implementations, influence the specific type of reference frame. This perspective transforms our understanding of transformer attention mechanisms and provides insights for both architecture design and the interpretation of AS itself.
- Research Report > Experimental Study (0.72)
- Research Report > New Finding (0.50)
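The "disproportionate attention" the abstract describes can be made concrete with a toy measurement. The sketch below (the function name, the toy attention map, and the bias toward key 0 are all illustrative assumptions, not the authors' metric) flags a key position whose average incoming attention far exceeds the uniform baseline:

```python
import numpy as np

def attention_sink_scores(attn: np.ndarray) -> np.ndarray:
    """Average attention mass each key position receives.

    attn: (num_queries, num_keys) row-stochastic attention map.
    A 'sink' token is one whose score sits far above the
    uniform baseline 1 / num_keys.
    """
    return attn.mean(axis=0)

# Toy map: 6 queries over 6 keys where key 0 (e.g. a BOS token)
# attracts most of the attention mass.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 6))
logits[:, 0] += 4.0                      # bias toward the anchor token
attn = np.exp(logits)
attn /= attn.sum(axis=1, keepdims=True)  # softmax over each row

scores = attention_sink_scores(attn)
print(scores.argmax())  # key 0 dominates
```

On real models the same column-mean statistic is typically computed per head and per layer, which is where the centralized vs. distributed distinction in the abstract would show up.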
Weight Factorization and Centralization for Continual Learning in Speech Recognition
Ugan, Enes Yavuz, Pham, Ngoc-Quan, Waibel, Alexander
Modern neural-network-based speech recognition models are required to continually absorb new data without retraining the whole system, especially in downstream applications that build on foundation models and have no access to the original training data. Continually training the models under a rehearsal-free, multilingual, and language-agnostic condition is likely to lead to catastrophic forgetting, where a seemingly insignificant disruption to the weights can destructively harm the quality of the models. Inspired by the human brain's ability to learn and consolidate knowledge through the waking-sleeping cycle, we propose a continual learning approach with two distinct phases, factorization and centralization, which learn and merge knowledge respectively. Our experiments on a sequence of varied code-switching datasets show that the centralization stage can effectively prevent catastrophic forgetting by accumulating knowledge in multiple scattered low-rank adapters.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Asia > East Asia (0.04)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
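The two-phase shape described in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming low-rank adapters of the usual B @ A form; the random stand-in adapters, dimensions, and function names are hypothetical, and the paper's actual training objective and merging rule are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 2                      # weight dimension, adapter rank
W = rng.normal(size=(d, d))      # frozen base weight matrix

def factorize_phase(num_tasks: int):
    """One rank-r adapter (B, A) per task; random stand-ins here."""
    return [(rng.normal(size=(d, r)) * 0.1,
             rng.normal(size=(r, d)) * 0.1) for _ in range(num_tasks)]

def centralize_phase(W: np.ndarray, adapters) -> np.ndarray:
    """Merge (centralize) the accumulated low-rank updates into W."""
    delta = sum(B @ A for B, A in adapters)
    return W + delta

adapters = factorize_phase(num_tasks=3)
W_merged = centralize_phase(W, adapters)
print(W_merged.shape)  # (8, 8)
```

The appeal of the low-rank form is that each per-task update stores only 2 * d * r parameters instead of d * d, and merging is a plain matrix addition.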
Review for NeurIPS paper: Sharp uniform convergence bounds through empirical centralization
Weaknesses: The authors consider the supremum of the absolute deviation, $\sup_{f} \left| \hat{E}_x[f] - E_D[f] \right|$. In statistical learning theory it suffices to estimate $\sup_{f} \left( E_D[f] - \hat{E}_x[f] \right)$, i.e., without the absolute value. The results in this paper may therefore be less interesting, since centralization does not help obtain the smaller complexity that already suffices for generalization. Centralization in terms of the expectation has been considered in the literature for both Rademacher averages and variances; this paper extends that centralization to empirical centralization.
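The reviewer's distinction can be written out side by side (notation as in the review itself):

```latex
\sup_{f \in \mathcal{F}} \left| \hat{E}_x[f] - E_D[f] \right|
\quad \text{(two-sided deviation, studied in the paper)}
\qquad \text{vs.} \qquad
\sup_{f \in \mathcal{F}} \bigl( E_D[f] - \hat{E}_x[f] \bigr)
\quad \text{(one-sided deviation, sufficient for generalization upper bounds)}
```

The one-sided supremum is never larger than the two-sided one, which is why bounding it can be achieved with a smaller complexity measure.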
Uncovering the hidden core-periphery structure in hyperbolic networks
Ansari, Imran, Yadav, Pawanesh, Sahni, Niteesh
Hyperbolic network models exhibit fundamental and essential features such as small-worldness, scale-freeness, a high clustering coefficient, and community structure. In this paper, we comprehensively explore the presence of another important feature often exhibited by real-world networks: the core-periphery structure. We focus on well-known hyperbolic models, namely the popularity-similarity optimization (PSO) model and the S1/H2 models, and study core-periphery structures using a well-established method based on the standard random-walk Markov chain model. The observed core-periphery centralization values indicate that the core-periphery structure can be very pronounced under certain conditions. We also validate our findings by statistically testing the significance of the observed core-periphery structure in the network geometry. This study extends network science and reveals core-periphery insights applicable to various domains, enhancing network performance and resiliency in transportation and information systems.
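The intuition behind random-walk-based core detection is that a walker spends more stationary time on densely connected core nodes than on sparsely attached periphery nodes. The toy graph and power-iteration scheme below are illustrative assumptions, not the specific method the paper uses:

```python
import numpy as np

# Toy graph: a dense core {0, 1, 2} plus periphery nodes {3, 4, 5},
# each periphery node attached to a single core node.
A = np.zeros((6, 6))
core_edges = [(0, 1), (0, 2), (1, 2)]
periphery_edges = [(3, 0), (4, 1), (5, 2)]
for i, j in core_edges + periphery_edges:
    A[i, j] = A[j, i] = 1.0

# Row-stochastic transition matrix of a simple random walk.
P = A / A.sum(axis=1, keepdims=True)

# Stationary distribution via power iteration (the graph contains a
# triangle, so the chain is aperiodic and the iteration converges).
pi = np.full(6, 1 / 6)
for _ in range(200):
    pi = pi @ P

print(pi.round(3))  # core nodes carry more stationary mass
```

For an undirected walk the stationary mass of a node is proportional to its degree, so here each core node gets 3/12 of the mass and each periphery node 1/12, cleanly separating the two tiers.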
Collectionless Artificial Intelligence
By and large, the professional handling of huge data collections is regarded as a fundamental ingredient of the progress of machine learning and of its spectacular results in related disciplines, with growing agreement on the risks connected to the centralization of such collections. This paper advances the position that the time has come to think of new learning protocols in which machines conquer cognitive skills in a truly human-like context centered on environmental interactions. This comes with specific restrictions on the learning protocol according to the collectionless principle, which states that, at each time instant, data acquired from the environment is processed with the purpose of contributing to the update of the agent's current internal representation of the environment, and that the agent is not given the privilege of recording the temporal stream. In particular, the agent has no permission to store the temporal information coming from the sensors, which promotes the development of self-organized memorization skills at a more abstract level, instead of relying on bare storage to simulate the learning dynamics typical of offline algorithms. This purposely extreme position is intended to stimulate the development of machines that learn to dynamically organize information following human-based schemes. This challenge suggests developing new foundations for the computational processes of learning and reasoning that might open the door to a truly orthogonal, competitive track of AI technologies that avoid data accumulation by design, thus offering a framework better suited with respect to privacy, control, and customizability. Finally, by pushing towards massively distributed computation, the collectionless approach to AI will likely reduce the concentration of power in companies and governments, thus better addressing geopolitical issues.
- North America > United States > Texas > Coleman County (0.04)
- Europe > Italy > Liguria > Genoa (0.04)
- Education (1.00)
- Information Technology > Security & Privacy (0.54)
AI set to benefit from blockchain-based data infrastructure
The rise of ChatGPT has been nothing short of spectacular. Within two months of launch, the artificial intelligence (AI)-based application reached 100 million unique users. In January 2023 alone, ChatGPT registered about 590 million visits. In addition to AI, blockchain is another disruptive technology with increasing adoption. Decentralized protocols, applications and business models have matured and gained market traction since the Bitcoin (BTC) white paper was published in 2008.
- Information Technology > e-Commerce > Financial Technology (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)