Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Audio-Visual Source Localization (AVSL) aims to locate sounding objects within video frames given paired audio clips. Existing methods predominantly rely on self-supervised contrastive learning of audio-visual correspondence. Without any bounding-box annotations, they struggle to achieve precise localization, especially for small objects, and suffer from blurry boundaries and false positives. Moreover, naive semi-supervised methods fail to effectively utilize the abundance of unlabeled audio-visual pairs. In this paper, we propose a novel Semi-Supervised Learning framework for AVSL, namely Dual Mean-Teacher (DMT), comprising two teacher-student structures to circumvent the confirmation bias issue. Specifically, two teachers, pre-trained on limited labeled data, are employed to filter out noisy samples via the consensus between their predictions, and then generate high-quality pseudo-labels by intersecting their confidence maps. The optimal utilization of both labeled and unlabeled data, combined with this unbiased framework, enables DMT to outperform current state-of-the-art methods by a large margin, with a CIoU of $\textbf{90.4\%}$.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Massachusetts (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)
- Transportation (0.68)
- Health & Medicine > Therapeutic Area (0.68)
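The consensus-filtering and confidence-map-intersection step described in the DMT abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the binarization threshold `tau`, and the agreement threshold `agree_min` are all assumptions.

```python
import numpy as np

def dmt_pseudo_label(conf_a, conf_b, tau=0.5, agree_min=0.7):
    """Sketch of Dual Mean-Teacher style pseudo-labeling.

    conf_a, conf_b: (H, W) confidence maps in [0, 1] from the two
    pre-trained teachers. `tau` and `agree_min` are illustrative
    thresholds, not values from the paper.
    """
    fg_a = conf_a >= tau                 # binarize teacher A's map
    fg_b = conf_b >= tau                 # binarize teacher B's map
    inter = fg_a & fg_b
    union = fg_a | fg_b
    iou = inter.sum() / max(union.sum(), 1)
    if iou < agree_min:                  # low consensus: treat the
        return None                      # sample as noisy and drop it
    return inter.astype(float)           # intersection as pseudo-label
```

Filtering on teacher agreement before intersecting keeps only regions both teachers are confident about, which is the mechanism by which the framework avoids reinforcing a single model's confirmation bias.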
Accelerating Drug Discovery Through Agentic AI: A Multi-Agent Approach to Laboratory Automation in the DMTA Cycle
Fehlis, Yao, Crain, Charles, Jensen, Aidan, Watson, Michael, Juhasz, James, Mandel, Paul, Liu, Betty, Mahon, Shawn, Wilson, Daren, Lynch-Jonely, Nick, Leedom, Ben, Fuller, David
The pharmaceutical industry faces unprecedented challenges in drug discovery, with traditional approaches struggling to meet modern therapeutic development demands. This paper introduces a novel AI framework, Tippy, that transforms laboratory automation through specialized AI agents operating within the Design-Make-Test-Analyze (DMTA) cycle. Our multi-agent system employs five specialized agents - Supervisor, Molecule, Lab, Analysis, and Report, with Safety Guardrail oversight - each designed to excel in specific phases of the drug discovery pipeline. Tippy represents the first production-ready implementation of specialized AI agents for automating the DMTA cycle, providing a concrete example of how AI can transform laboratory workflows. By leveraging autonomous AI agents that reason, plan, and collaborate, we demonstrate how Tippy accelerates DMTA cycles while maintaining scientific rigor essential for pharmaceutical research. The system shows significant improvements in workflow efficiency, decision-making speed, and cross-disciplinary coordination, offering a new paradigm for AI-assisted drug discovery.
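As a purely illustrative sketch of the supervisor-plus-guardrail pattern the Tippy abstract describes: the agent names come from the abstract, but their behaviors, the phase routing, and the guardrail check below are invented stand-ins, since the system's actual APIs are not given.

```python
# Stand-in for a supervisor-routed multi-agent DMTA loop.
# Agent behaviors are stubs; only the structure mirrors the abstract.
def safety_guardrail(task: str) -> bool:
    # Placeholder check: a real guardrail would screen hazards and policy.
    return "restricted" not in task.lower()

AGENTS = {
    "design":  lambda t: f"Molecule agent: propose candidates for {t}",
    "make":    lambda t: f"Lab agent: schedule synthesis for {t}",
    "test":    lambda t: f"Analysis agent: run assays for {t}",
    "analyze": lambda t: f"Report agent: summarize results for {t}",
}

def supervisor(task: str):
    """Route one task through the four DMTA phases, guardrail first."""
    if not safety_guardrail(task):
        return ["task rejected by Safety Guardrail"]
    return [AGENTS[phase](task)
            for phase in ("design", "make", "test", "analyze")]
```

The design point this sketch captures is that the Supervisor owns phase ordering and the Safety Guardrail sits in front of every agent, so no specialized agent can act on a vetoed task.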
Aliens are already here...they are intelligent but have a dark side and operate on us
Users of a naturally occurring psychedelic drug are convinced they've encountered real alien beings, including 'machine elves,' which inhabit a realm beyond our Earth. These machine elves, described as chattering, mischievous entities, consistently appear in the visions of those who take DMT, which one neuroscientist suggested could mean users are actually entering a shared alien reality. DMT (or N,N-Dimethyltryptamine) is present in thousands of plants, including ayahuasca, which is used in religious ceremonies, but is also present in small amounts within the human body. Dr Andrew Gallimore, who holds a PhD in biological chemistry and has studied computational neuroscience, said he encountered these beings firsthand after being transported to a hyper-dimensional world teeming with intelligent lifeforms. Unlike earthly creatures, these beings, ranging from insectoids to God-like figures, seem to exist in a space that defies our three-dimensional understanding.
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Taheri, Hossein, Thrampoulidis, Christos, Mazumdar, Arya
In this paper, we study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation. Our first result is a novel bound on the excess risk of deep networks trained by the logistic loss, via an algorithmic stability analysis. Compared to previous works, our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds. Importantly, the bounds we derive in this paper are tighter, hold even for neural networks of small width, do not scale unfavorably with width, are algorithm-dependent, and consequently capture the role of initialization on the sample complexity of gradient descent for deep nets. Specialized to noiseless data separable with margin $\gamma$ by neural tangent kernel (NTK) features of a network of width $\Omega(\text{poly}(\log(n)))$, we show the test-error rate to be $e^{O(L)}/(\gamma^2 n)$, where $n$ is the training set size and $L$ denotes the number of hidden layers. This is an improvement in the test loss bound compared to previous works while maintaining the poly-logarithmic width conditions. We further investigate excess risk bounds for deep nets trained with noisy data, establishing that under a polynomial condition on the network width, gradient descent can achieve the optimal excess risk. Finally, we show that a large step-size significantly improves upon the NTK regime's results in classifying the XOR distribution. In particular, we show for a one-hidden-layer neural network of constant width $m$ with quadratic activation and standard Gaussian initialization that mini-batch SGD with linear sample complexity and with a large step-size $\eta = m$ reaches perfect test accuracy after only $\lceil \log(d) \rceil$ iterations, where $d$ is the data dimension.
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > British Columbia (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
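The XOR result above rests on quadratic activations being able to express the product of two coordinates exactly, via the identity $x_1 x_2 = \frac{1}{4}\big((x_1+x_2)^2 - (x_1-x_2)^2\big)$; the abstract's contribution is that large-step-size SGD actually finds such a solution. The sketch below only demonstrates the representability half, with hand-set weights and an illustrative dimension (not the paper's experiment).

```python
import numpy as np

# A width-2 network with quadratic activation that computes y = x1 * x2
# exactly on {-1, +1}^d inputs, using
#   x1*x2 = ((x1 + x2)^2 - (x1 - x2)^2) / 4.
d = 5                                   # illustrative data dimension
W = np.zeros((2, d))
W[0, :2] = [1.0, 1.0]                   # w1 = e1 + e2
W[1, :2] = [1.0, -1.0]                  # w2 = e1 - e2
a = np.array([0.25, -0.25])             # second-layer weights

def f(X):
    # One hidden layer, quadratic activation: f(x) = sum_j a_j (w_j . x)^2
    return (X @ W.T) ** 2 @ a

rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(100, d))
y = X[:, 0] * X[:, 1]                   # XOR-style labels
assert np.allclose(f(X), y)             # exact fit, hence perfect accuracy
```

No linear model over the raw coordinates can fit these labels, which is why the quadratic activation (and, in the NTK-regime comparison, the choice of step size) matters here.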
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation
Luo, Liang, Zhang, Buyun, Tsang, Michael, Ma, Yinbin, Chu, Ching-Hsiang, Chen, Yuxin, Li, Shen, Hao, Yuchen, Zhao, Yanli, Lakshminarayanan, Guna, Wen, Ellie Dingqiao, Park, Jongsoo, Mudigere, Dheevatsa, Naumov, Maxim
We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm, and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global embedding lookup process into disjoint towers to exploit data center locality; (2) Tower Module (TM), a synergistic dense component attached to each tower to reduce model complexity and communication volume through hierarchical feature interaction; and (3) Tower Partitioner (TP), a feature partitioner to systematically create towers with meaningful feature interactions and load balanced assignments to preserve model quality and training throughput via learned embeddings. We show that DMT can achieve up to 1.9x speedup compared to the state-of-the-art baselines without losing accuracy across multiple generations of hardware at large data center scales. Since the embedding tables can be huge, state-of-the-art practice trains these models in a hybrid fashion: the sparse embedding tables are sharded across devices with model parallelism, while the dense parameters are replicated and synchronized through AllReduce operations.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
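The tower decomposition in the DMT abstract can be sketched numerically. Everything below is an assumption-laden stand-in: the round-robin partitioner approximates the role of the Tower Partitioner (the real TP uses learned embeddings), and each random matrix stands in for a trained Tower Module. The point the sketch makes is structural: embedding lookups stay local to a tower, and only the small tower outputs need to cross hosts.

```python
import numpy as np

n_features, n_towers, emb_dim, tower_out = 8, 2, 4, 3  # illustrative sizes
rng = np.random.default_rng(0)

# One embedding table per sparse feature (100 ids each, emb_dim wide).
tables = [rng.normal(size=(100, emb_dim)) for _ in range(n_features)]

# Tower Partitioner stand-in: round-robin assignment of features to towers.
towers = [[f for f in range(n_features) if f % n_towers == t]
          for t in range(n_towers)]

# Tower Module stand-in: concat local embeddings -> tower_out dimensions.
tms = [rng.normal(size=(len(tw) * emb_dim, tower_out)) for tw in towers]

def forward(feature_ids):
    """feature_ids: one categorical id per feature for a single example."""
    outs = []
    for tw, tm in zip(towers, tms):
        # Lookups for this tower's features happen on the tower's own host.
        local = np.concatenate([tables[f][feature_ids[f]] for f in tw])
        outs.append(local @ tm)          # compress before communicating
    return np.concatenate(outs)          # only tower outputs cross hosts

out = forward([7] * n_features)          # (n_towers * tower_out,) vector
```

Compressing each tower from `len(tw) * emb_dim` numbers down to `tower_out` before any cross-tower exchange is what reduces communication volume relative to a monolithic global lookup.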