torch
Appendix for "Episodic Multi-Task Learning with Heterogeneous Neural Processes"
In this section, we list frequently asked questions from researchers who helped proofread this manuscript. As shown in Table 1, we use "Heterogeneous tasks" to distinguish the different branches of multi-task learning. Meanwhile, "Episodic training" is used to describe the data-feeding strategy. Thus, "Heterogeneous tasks" is not available here (-). In episodic multi-task learning, we restrict the scope of the problem to the case where tasks in the same episode are related and share the same target space. This also implies that tasks with the same target space are related.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > China > Hunan Province > Changsha (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
- Information Technology > Artificial Intelligence > Natural Language (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
MIT professor designs 2026 Winter Olympics torch
Officially named 'Essential,' the torch was designed by Carlo Ratti and weighs only 2.5 pounds. Every Olympic Games has a torch. Every torch has a designer. For the 2026 Milano Cortina Olympic Games and Paralympic Games, that designer is MIT engineer and architect Carlo Ratti.
- Oceania > Australia (0.06)
- Europe > Italy > Piedmont > Turin Province > Turin (0.06)
- Europe > Greece (0.06)
Mixture of Lookup Key-Value Experts
Recent research has developed several LLM architectures suitable for inference on end-user devices, such as the Mixture of Lookup Experts (MoLE) (Jie et al., 2025). A key feature of MoLE is that each token id is associated with a dedicated group of experts. For a given input, only the experts corresponding to the input token id will be activated. Since the communication overhead of loading this small number of activated experts into RAM during inference is negligible, expert parameters can be offloaded to storage, making MoLE suitable for resource-constrained devices. However, MoLE's context-independent expert selection mechanism, based solely on input ids, may limit model performance. To address this, we propose the Mixture of Lookup Key-Value Experts (MoLKV) model. In MoLKV, each expert is structured as a key-value pair. For a given input, the input-derived query interacts with the cached key-value experts from the current sequence, generating a context-aware expert output. This context-aware mechanism alleviates the limitation of MoLE, and experimental results demonstrate that MoLKV achieves significantly lower validation loss in small-scale evaluations.
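The contrast between the two lookup schemes can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the query and the cached key-value experts interact, so the causal, softmax-weighted dot-product attention below is an assumption, and every name (`make_table`, `mole_lookup`, `molkv_lookup`) is hypothetical. It uses only the Python standard library.

```python
import math
import random


def make_table(vocab_size, dim, seed):
    """Build a toy per-token-id table of vectors (stands in for stored experts)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mole_lookup(value_table, token_ids):
    """MoLE-style lookup: the output depends only on each position's token id."""
    return [list(value_table[t]) for t in token_ids]


def molkv_lookup(key_table, value_table, token_ids, queries):
    """Assumed MoLKV-style lookup: the query at position i attends over the
    key-value experts of tokens 0..i, so the output depends on context."""
    dim = len(key_table[0])
    outputs = []
    for i, query in enumerate(queries):
        context = token_ids[: i + 1]  # experts cached so far in this sequence
        scores = [
            sum(q * k for q, k in zip(query, key_table[t])) / math.sqrt(dim)
            for t in context
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value_table[t][d] for w, t in zip(weights, context))
            for d in range(dim)
        ])
    return outputs
```

In this sketch, a token id that occurs twice in a sequence gets identical outputs from `mole_lookup`, whereas `molkv_lookup` generally returns different outputs because each position attends over a different set of cached experts.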
Understanding Diffusion Models via Code Execution
Diffusion models have achieved remarkable performance in generative modeling, yet their theoretical foundations are often intricate, and the gap between mathematical formulations in papers and practical open-source implementations can be difficult to bridge. Existing tutorials primarily focus on deriving equations, offering limited guidance on how diffusion models actually operate in code. To address this, we present a concise implementation of approximately 300 lines that explains diffusion models from a code-execution perspective. Our minimal example preserves the essential components -- including forward diffusion, reverse sampling, the noise-prediction network, and the training loop -- while removing unnecessary engineering details. This technical report aims to provide researchers with a clear, implementation-first understanding of how diffusion models work in practice and how code and theory correspond. Our code and pre-trained models are available at: https://github.com/disanda/GM/tree/main/DDPM-DDIM-ClassifierFree.
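As a rough illustration of the forward-diffusion and training-loss components named above, the sketch below uses the standard DDPM closed form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε. It is not taken from the linked repository. The function names are hypothetical, and the linear schedule from 1e-4 to 0.02 is a commonly used default rather than a value stated in this abstract. Plain Python lists stand in for tensors.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed default range)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    """Forward diffusion in closed form: noise x0 directly to step t."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and true noise."""
    return sum((p - n) ** 2 for p, n in zip(predicted_noise, true_noise)) / len(true_noise)


# Usage: noise a toy 3-element "image" to a random timestep.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [1.0, -0.5, 0.25]
t = rng.randrange(len(alpha_bars))
noise = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = q_sample(x0, t, alpha_bars, noise)
```

In a training loop, a noise-prediction network would take `x_t` and `t` as input, and its output would be compared with `noise` using `noise_prediction_loss`. Reverse sampling then applies the trained network step by step from pure noise.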
- Asia > China > Chongqing Province > Chongqing (0.04)
- North America > Mexico > Gulf of Mexico (0.04)
CoMind: Towards Community-Driven Agents for Machine Learning Engineering
Li, Sijie, Sun, Weiwei, Li, Shanda, Talwalkar, Ameet, Yang, Yiming
Large language model (LLM) agents show promise in automating machine learning (ML) engineering. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a multi-agent system designed to actively integrate external knowledge. CoMind employs an iterative parallel exploration mechanism, developing multiple solutions simultaneously to balance exploratory breadth with implementation depth. On 75 past Kaggle competitions within our MLE-Live framework, CoMind achieves a 36% medal rate, establishing a new state of the art. Critically, when deployed in eight live, ongoing competitions, CoMind outperforms 92.6% of human competitors on average, placing in the top 5% on three official leaderboards and the top 1% on one.
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- Africa > Cameroon > Gulf of Guinea (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Education (0.67)