AITopics | replica

replica

SurDis: A Surface Discontinuity Dataset for Wearable Technology to Assist Blind Navigation in Urban Environments

Neural Information Processing Systems | Apr-26-2026, 20:33:20 GMT

According to the World Health Organization, an estimated 2.2 billion people worldwide have a near or distance vision impairment. Difficulty in self-navigation is one of the greatest challenges to independence for blind and low vision (BLV) people. Through consultations with several BLV service providers, we realized that negotiating surface discontinuities is one of the most prominent challenges when navigating outdoor urban environments. Surface discontinuities are commonly formed by rises and drop-offs along a pathway. They can threaten balance during a walk, and perceiving such a threat is highly challenging for BLV people.

artificial intelligence, machine learning, surface discontinuity, (17 more...)

Neural Information Processing Systems

Country:

North America (0.46)
Asia > Malaysia (0.15)
Asia > Japan (0.14)
Oceania > Australia (0.14)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.66)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)

cfd382c5eb817d52c7faf45a96f20b81-Paper.pdf

Neural Information Processing Systems | Feb-10-2026, 11:23:21 GMT

noise, replica, (17 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Asia > Japan (0.04)

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

Neural Information Processing Systems | Dec-24-2025, 16:00:20 GMT

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is achieved by simulating a collection of replicas in parallel at different temperatures and periodically swapping them. When evolving the replicas' states, Nosé-Hoover dynamics is applied, which adaptively neutralizes the mini-batch noise. To perform proper exchanges, a new protocol is developed with a noise-aware acceptance test, by which detailed balance is preserved asymptotically. Its efficacy on complex multimodal posteriors has been illustrated by tests on synthetic distributions, and experiments with deep Bayesian neural networks on large-scale datasets have shown significant improvements over strong baselines.
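The replica-swapping idea in the abstract above can be illustrated with a minimal sketch of classic replica exchange (parallel tempering). This is not the paper's method: it uses plain Metropolis updates with exact energies instead of Nosé-Hoover dynamics on mini-batches, and the textbook swap criterion instead of the noise-aware test. The double-well `energy`, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def energy(x):
    # Hypothetical double-well potential with minima at x = -2 and x = +2.
    return (x * x - 4.0) ** 2 / 4.0


def swap_probability(beta_i, beta_j, e_i, e_j):
    # Textbook replica-exchange acceptance:
    # min(1, exp((beta_i - beta_j) * (e_i - e_j))).
    exponent = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def metropolis_step(x, beta, rng, step=0.5):
    # One random-walk Metropolis update at inverse temperature beta.
    proposal = x + rng.uniform(-step, step)
    delta = energy(proposal) - energy(x)
    if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
        return proposal
    return x


def parallel_tempering(betas, n_sweeps, swap_every, seed=0):
    # betas[0] is the coldest replica (the one whose samples we keep).
    rng = random.Random(seed)
    states = [2.0 for _ in betas]  # all replicas start in the right-hand well
    cold_samples = []
    for sweep in range(n_sweeps):
        # Evolve every replica independently at its own temperature.
        states = [metropolis_step(x, b, rng) for x, b in zip(states, betas)]
        # Periodically attempt swaps between neighbouring temperatures.
        if sweep % swap_every == 0:
            for i in range(len(betas) - 1):
                p = swap_probability(
                    betas[i], betas[i + 1],
                    energy(states[i]), energy(states[i + 1]),
                )
                if rng.random() < p:
                    states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[0])
    return cold_samples


if __name__ == "__main__":
    samples = parallel_tempering([1.0, 0.5, 0.1], n_sweeps=5000, swap_every=10)
    left = sum(1 for s in samples if s < 0.0)
    print(f"fraction of cold samples in the left well: {left / len(samples):.2f}")
```

The hot replicas cross the barrier between the wells more easily, and accepted swaps hand those states down to the cold replica. The abstract's contribution concerns what happens when the energies are only available as noisy mini-batch estimates, which this sketch does not model.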

bayesian learning, nosé-hoover dynamic, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Lou, Chiheng, Qi, Sheng, Kang, Rui, Zhang, Yong, Sun, Chen, Wang, Pengcheng, Liu, Bingyang, Liu, Xuanzhe, Jin, Xin

arXiv.org Artificial Intelligence | Dec-11-2025

Deploying multiple models within shared GPU clusters is promising for improving resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems optimize GPU utilization at the cost of worse inference performance, especially time-to-first-token (TTFT). We identify the root cause of such compromise as their unawareness of future workload characteristics. In contrast, recent analysis on real-world traces has shown the high periodicity and long-term predictability of LLM serving workloads. We propose universal GPU workers to enable one-for-many GPU prewarming that loads models with knowledge of future workloads. Based on universal GPU workers, we design and build WarmServe, a multi-LLM serving system that (1) mitigates cluster-wide prewarming interference by adopting an evict-aware model placement strategy, (2) prepares universal GPU workers in advance by proactive prewarming, and (3) manages GPU memory with a zero-overhead memory switching mechanism. Evaluation under real-world datasets shows that WarmServe improves TTFT by up to 50.8× compared to the state-of-the-art autoscaling-based system, while being capable of serving up to 2.5× more requests compared to the GPU-sharing system.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.09472

Genre: Research Report (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Hamadanian, Pouya, Karimi, Pantea, Nasr-Esfahany, Arash, Noorbakhsh, Kimia, Chandler, Joseph, ParandehGheibi, Ali, Alizadeh, Mohammad, Balakrishnan, Hari

arXiv.org Artificial Intelligence | Nov-18-2025

Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Each agent specializes in reasoning, experimentation, and analysis, collaborating through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning process. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.27176

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.46)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Communication Efficient LLM Pre-training with SparseLoCo

Sarfi, Amir, Thérien, Benjamin, Lidin, Joel, Belilovsky, Eugene

arXiv.org Artificial Intelligence | Nov-7-2025

Communication-efficient distributed training algorithms have received considerable interest recently due to their benefits for training Large Language Models (LLMs) in bandwidth-constrained settings, such as across datacenters and over the internet. Despite reducing communication frequency, these methods still typically require communicating a full copy of the model's gradients, resulting in a communication bottleneck even for cross-datacenter links. Furthermore, they can slightly degrade performance compared to a naive AdamW DDP baseline. While quantization is often applied to reduce the pseudo-gradient's size, in the context of LLM pre-training, existing approaches have been unable to additionally leverage sparsification and have obtained limited quantization. In this work, we introduce SparseLoCo, a communication-efficient training algorithm for LLMs that effectively leverages error feedback with Top-k sparsification and 2-bit quantization to reach extreme sparsity as low as 1-3% while outperforming full-precision DiLoCo. Our key observations are that outer momentum can be locally approximated by an error feedback accumulator combined with aggressive sparsity, and that sparse aggregation can actually improve model performance. We empirically demonstrate in a range of communication-constrained LLM training settings that SparseLoCo provides significant benefits in both performance and communication cost.
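The combination of Top-k sparsification and error feedback mentioned in the abstract above can be sketched generically as follows. This is a minimal illustration of the general technique, not SparseLoCo itself: it omits the 2-bit quantization, the pseudo-gradient computation, and the aggregation across workers, and the function name and the plain-list vector representation are assumptions.

```python
def top_k_with_error_feedback(update, residual, k):
    """Keep the k largest-magnitude entries and carry the rest forward.

    update:   the new dense update for this round (list of floats)
    residual: entries dropped in earlier rounds (same length)
    Returns (sparse_update, new_residual).
    """
    # Add back whatever was dropped in previous rounds.
    corrected = [u + r for u, r in zip(update, residual)]
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(corrected)),
                   key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(order[:k])
    # Only the kept entries would be communicated.
    sparse = [v if i in keep else 0.0 for i, v in enumerate(corrected)]
    # Everything not sent is stored and retried in the next round.
    new_residual = [v - s for v, s in zip(corrected, sparse)]
    return sparse, new_residual


if __name__ == "__main__":
    residual = [0.0, 0.0, 0.0, 0.0]
    sparse, residual = top_k_with_error_feedback(
        [0.5, -2.0, 0.1, 1.5], residual, k=2)
    print(sparse, residual)
```

Because dropped entries accumulate in the residual, a coordinate that is repeatedly too small to be sent will eventually grow large enough to enter the top k, so no part of the update is discarded permanently.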

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2508.15706

Country:

North America > United States (0.46)
Europe > Italy (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models

Cui, Xingqi, Liang, Chieh-Jan Mike, Xing, Jiarong, Qiu, Haoran

arXiv.org Artificial Intelligence | Nov-5-2025

Serving large generative models such as LLMs and multimodal transformers requires balancing user-facing SLOs (e.g., time-to-first-token, time-between-tokens) with provider goals of efficiency and cost reduction. Existing solutions rely on static provisioning or model-level autoscaling, both of which treat the model as a monolith. This coarse-grained resource management leads to degraded performance or significant resource underutilization due to poor adaptability to the dynamic inference traffic that is common online. The root cause of this inefficiency lies in the internal structure of generative models: they are executed as graphs of interconnected operators. Through detailed characterization and systematic analysis, we find that operators are heterogeneous in their compute and memory footprints and exhibit diverse sensitivity to workload and resource factors such as batch size, sequence length, and traffic rate. This heterogeneity suggests that the operator, rather than the entire model, is the right granularity for scaling decisions. We propose an operator-level autoscaling framework, which allocates resources at finer (operator-level) granularity, optimizing the scaling, batching, and placement based on individual operator profiles. Evaluated on production-scale traces, our approach preserves SLOs with up to 40% fewer GPUs and 35% less energy, or under fixed resources achieves 1.6x higher throughput with 5% less energy. These results show that the operator, rather than the model, is fundamentally a more effective unit for scaling large generative workloads.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.02248

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.81)

SOCRATES: Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations

Zhang, Haoting, Chen, Haoxian, Zhan, Donglin, Zhao, Hanyang, Lam, Henry, Tang, Wenpin, Yao, David, Zheng, Zeyu

arXiv.org Machine Learning | Nov-4-2025

The field of simulation optimization (SO) encompasses various methods developed to optimize complex, expensive-to-sample stochastic systems. Established methods include, but are not limited to, ranking-and-selection for finite alternatives and surrogate-based methods for continuous domains, with broad applications in engineering and operations management. The recent advent of large language models (LLMs) offers a new paradigm for exploiting system structure and automating the strategic selection and composition of these established SO methods into a tailored optimization procedure. This work introduces SOCRATES (Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations), a novel two-stage procedure that leverages LLMs to automate the design of tailored SO algorithms. The first stage constructs an ensemble of digital replicas of the real system. An LLM is employed to implement causal discovery from a textual description of the system, generating a structural 'skeleton' that guides the sample-efficient learning of the replicas. In the second stage, this replica ensemble is used as an inexpensive testbed to evaluate a set of baseline SO algorithms. An LLM then acts as a meta-optimizer, analyzing the performance trajectories of these algorithms to iteratively revise and compose a final, hybrid optimization schedule. This schedule is designed to be adaptive, with the ability to be updated during the final execution on the real system when the optimization performance deviates from expectations. By integrating LLM-driven reasoning with LLM-assisted trajectory-aware meta-optimization, SOCRATES creates an effective and sample-efficient solution for complex SO problems.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Machine Learning

2511.00685

Country:

North America > United States > Hawaii (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Learning by Steering the Neural Dynamics: A Statistical Mechanics Perspective

Scardecchia, Mattia

arXiv.org Artificial Intelligence | Oct-15-2025

Despite the striking successes of deep neural networks trained with gradient-based optimization, these methods differ fundamentally from their biological counterparts. This gap raises key questions about how nature achieves robust, sample-efficient learning at minimal energy costs and solves the credit-assignment problem without backpropagation. We take a step toward bridging contemporary AI and computational neuroscience by studying how neural dynamics can support fully local, distributed learning that scales to simple machine-learning benchmarks. Using tools from statistical mechanics, we identify conditions for the emergence of robust dynamical attractors in random asymmetric recurrent networks. We derive a closed-form expression for the number of fixed points as a function of self-coupling strength, and we reveal a phase transition in their structure: below a critical self-coupling, isolated fixed points coexist with exponentially many narrow clusters showing the overlap-gap property; above it, subdominant yet dense and extensive clusters appear. These fixed points become accessible, including to a simple asynchronous dynamical rule, after an algorithm-dependent self-coupling threshold. Building on this analysis, we propose a biologically plausible algorithm for supervised learning with any binary recurrent network. Inputs are mapped to fixed points of the dynamics, by relaxing under transient external stimuli and stabilizing the resulting configurations via local plasticity. We show that our algorithm can learn an entangled version of MNIST, leverages depth to develop hierarchical representations and increase hetero-association capacity, and is applicable to several architectures. Finally, we highlight the strong connection between algorithm performance and the unveiled phase transition, and we suggest a cortex-inspired alternative to self-couplings for its emergence.

artificial intelligence, machine learning, visited, (19 more...)

arXiv.org Artificial Intelligence

2510.11984

Country:

North America > United States (0.46)
Europe (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Scaling Homomorphic Applications in Deployment

Marinelli, Ryan, Chowdhury, Angelica

arXiv.org Artificial Intelligence | Oct-6-2025

In this endeavor, a proof-of-concept homomorphic application is developed to determine the production readiness of encryption ecosystems. A movie recommendation app is implemented for this purpose and productionized through containerization and orchestration. By tuning deployment configurations, the computational limitations of Fully Homomorphic Encryption (FHE) are mitigated through additional infrastructure optimizations.

cloud computing, machine learning, replica, (19 more...)

arXiv.org Artificial Intelligence

2510.02376

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.96)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Security & Privacy (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
