hash table
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)
- North America > Canada > British Columbia > Vancouver (0.04)
- Asia > Middle East > Israel (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Texas (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Education > Educational Setting (0.46)
- Information Technology > Smart Houses & Appliances (0.34)
- North America > United States > Maryland (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Asia > China (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Africa > Mali (0.04)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
TowardsCrowdsourcedTrainingofLargeNeural NetworksusingDecentralizedMixture-of-Experts
Many recent breakthroughs in deep learning were achieved by training increasingly larger models on massivedatasets. However,training such models can be prohibitively expensive. For instance, the cluster used to train GPT-3 costs over $250 million2. Asaresult, most researchers cannot afford totrain state oftheart models and contribute to their development.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Jordan (0.04)
Large-Scale Distributed Learning via Private On-Device LSH
Locality-sensitive hashing (LSH) based frameworks have been used efficiently to select weight vectors in a dense hidden layer with high cosine similarity to an input, enabling dynamic pruning. While this type of scheme has been shown to improve computational training efficiency, existing algorithms require repeated randomized projection of the full layer weight, which is impractical for computational-and memory-constrained devices. In a distributed setting, deferring LSH analysis to a centralized host is (i) slow if the device cluster is large and (ii) requires access to input data which is forbidden in a federated context. Using a new family of hash functions, we develop the first private, personalized, and memory-efficient on-device LSH framework.Our framework enables privacy and personalization by allowing each device to generate hash tables, without the help of a central host, using device-specific hashing hyper-parameters (e.g., number of hash tables or hash length).Hash tables are generated with a compressed set of the full weights, and can be serially generated and discarded if the process is memory-intensive.This allows devices to avoid maintaining (i) the fully-sized model and (ii) large amounts of hash tables in local memory for LSH analysis. We prove several statistical and sensitivity properties of our hash functions, and experimentally demonstrate that our framework is competitive in training large scale recommender networks compared to other LSH frameworks which assume unrestricted on-device capacity.
Murmur2Vec: A Hashing Based Solution For Embedding Generation Of COVID-19 Spike Sequences
Early detection and characterization of coronavirus disease (COVID-19), caused by SARS-CoV-2, remain critical for effective clinical response and public-health planning. The global availability of large-scale viral sequence data presents significant opportunities for computational analysis; however, existing approaches face notable limitations. Phylogenetic tree-based methods are computationally intensive and do not scale efficiently to today's multi-million-sequence datasets. Similarly, current embedding-based techniques often rely on aligned sequences or exhibit suboptimal predictive performance and high runtime costs, creating barriers to practical large-scale analysis. In this study, we focus on the most prevalent SARS-CoV-2 lineages associated with the spike protein region and introduce a scalable embedding method that leverages hashing to generate compact, low-dimensional representations of spike sequences. These embeddings are subsequently used to train a variety of machine learning models for supervised lineage classification. We conduct an extensive evaluation comparing our approach with multiple baseline and state-of-the-art biological sequence embedding methods across diverse metrics. Our results demonstrate that the proposed embeddings offer substantial improvements in efficiency, achieving up to 86.4\% classification accuracy while reducing embedding generation time by as much as 99.81\%. This highlights the method's potential as a fast, effective, and scalable solution for large-scale viral sequence analysis.
- Asia > China > Guangdong Province (0.14)
- North America > United States > California > Alameda County > Fremont (0.04)
- Asia > Pakistan > Sindh > Karachi Division > Karachi (0.04)
Dynamic Tree Databases in Automated Planning
Joergensen, Oliver, Drexler, Dominik, Seipp, Jendrik
A central challenge in scaling up explicit state-space search for large tasks is compactly representing the set of generated states. Tree databases, a data structure from model checking, require constant space per generated state in the best case, but they need a large preallocation of memory. We propose a novel dynamic variant of tree databases for compressing state sets over propositional and numeric variables and prove that it maintains the desirable properties of the static counterpart. Our empirical evaluation of state compression techniques for grounded and lifted planning on classical and numeric planning tasks reveals compression ratios of several orders of magnitude, often with negligible runtime overhead.