Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains
As deep learning applications continue to become more diverse, an interesting question arises: Can general problem solving arise from jointly learning several such diverse tasks? To approach this question, deep multi-task learning is extended in this paper to the setting where there is no obvious overlap between task architectures. The idea is that any set of (architecture, task) pairs can be decomposed into a set of potentially related subproblems, whose sharing is optimized by an efficient stochastic algorithm. The approach is first validated in a classic synthetic multi-task learning benchmark, and then applied to sharing across disparate architectures for vision, NLP, and genomics tasks. It discovers regularities across these domains, encodes them into sharable modules, and combines these modules systematically to improve performance in the individual tasks. The results confirm that sharing learned functionality across diverse domains and architectures is indeed beneficial, thus establishing a key ingredient for general problem solving in the future.
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
Su, Yi, Yu, Dian, Song, Linfeng, Li, Juntao, Mi, Haitao, Tu, Zhaopeng, Zhang, Min, Yu, Dong
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are accessible for verification. However, its extension to broader, less structured domains remains underexplored. In this work, we investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education, where structured reference answers are typically unavailable. We reveal that binary verification judgments on broad-domain tasks exhibit high consistency across various LLMs when expert-written reference answers are available. Motivated by this finding, we utilize a generative scoring technique that yields soft, model-based reward signals to overcome the limitations of binary verification, especially in free-form, unstructured answer scenarios. We further demonstrate the feasibility of training cross-domain generative reward models using relatively small (7B) LLMs without the need for extensive domain-specific annotation. Through comprehensive experiments, our RLVR framework establishes clear performance gains, significantly outperforming state-of-the-art open-source aligned models such as Qwen2.5-72B and DeepSeek-R1-Distill-Qwen-32B across domains in free-form settings. Our approach notably enhances the robustness, flexibility, and scalability of RLVR, representing a substantial step towards practical reinforcement learning applications in complex, noisy-label scenarios.
- Health & Medicine (0.93)
- Education (0.68)
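To make the reward design in the abstract above concrete, here is a minimal Python sketch contrasting binary verification with the soft, generative scoring it describes. The prompt template and the `judge` callable (e.g. a small 7B reward model behind any text-completion interface) are illustrative assumptions, not the paper's exact implementation.

```python
import re

def binary_verify(answer: str, reference: str) -> float:
    """Baseline RLVR reward: 1.0 only on an exact (normalized) match.
    Brittle for free-form answers in domains like medicine or economics."""
    norm = lambda s: re.sub(r"\s+", " ", s.strip().lower())
    return float(norm(answer) == norm(reference))

def soft_generative_reward(question, answer, reference, judge) -> float:
    """Sketch of a soft, model-based reward: a generative reward model
    grades the answer against an expert-written reference on a 0-10 scale.
    The prompt and `judge` callable are assumptions, not the paper's setup."""
    prompt = (
        "Grade the answer against the reference on a 0-10 scale. "
        "Reply with a single integer.\n"
        f"Question: {question}\nReference: {reference}\nAnswer: {answer}\nScore:"
    )
    reply = judge(prompt)  # judge: prompt -> completion string, e.g. a 7B LLM
    match = re.search(r"\d+", reply)
    score = int(match.group()) if match else 0
    return min(max(score, 0), 10) / 10.0  # soft reward in [0, 1]
```

A reward in [0, 1] rather than {0, 1} gives the policy a smoother learning signal on free-form answers that are only partially correct, which is where binary verification fails.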
Reviews: Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains
I have consequently increased my score. The paper proposes to decompose the parameters into L distinct parameter blocks. Each of these blocks is seen as solving a "pseudo-task", learning a linear map from inputs to outputs. The parameters of these blocks are generated by K hypermodules (small hypernetworks) that condition on a context vector for each pseudo-task. The alignment of hypermodules to pseudo-tasks is governed by a softmax function and learned during training, similar to a mixture-of-experts.
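As a reading aid, the mechanism summarized in this review can be sketched in a few lines of PyTorch. The class name, dimensions, and initialization below are our own assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class HypermoduleBank(nn.Module):
    """Sketch of the reparameterization described above: K hypermodules
    generate the weights of L pseudo-task parameter blocks from per-block
    context vectors, with a learned softmax alignment (sizes assumed)."""

    def __init__(self, num_blocks, num_modules, ctx_dim, block_in, block_out):
        super().__init__()
        # One context vector per pseudo-task (parameter block).
        self.contexts = nn.Parameter(torch.randn(num_blocks, ctx_dim) * 0.01)
        # Each hypermodule is a small linear hypernetwork mapping a context
        # vector to a flattened (block_in x block_out) weight block.
        self.hypermodules = nn.ModuleList(
            [nn.Linear(ctx_dim, block_in * block_out) for _ in range(num_modules)]
        )
        # Unnormalized alignment scores between blocks and hypermodules,
        # softmaxed per block (mixture-of-experts style).
        self.align_logits = nn.Parameter(torch.zeros(num_blocks, num_modules))
        self.block_shape = (block_in, block_out)

    def forward(self):
        align = torch.softmax(self.align_logits, dim=-1)       # (L, K)
        # Candidate weights from every hypermodule for every block.
        cands = torch.stack(
            [h(self.contexts) for h in self.hypermodules], dim=1
        )                                                      # (L, K, in*out)
        # Blend candidates with the learned alignment weights.
        blocks = (align.unsqueeze(-1) * cands).sum(dim=1)      # (L, in*out)
        return blocks.view(-1, *self.block_shape)              # (L, in, out)
```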
Reviews: Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains
The submission proposes a multi-task learning method based on sharing linear submodules. The proposed idea is interesting, novel, and shown to be effective. On the other hand, reviewers raised various issues about the empirical study. The authors did a good job addressing these issues in their response, and the final evaluations of all reviewers are positive. The paper is a good addition to the conference, and I recommend acceptance.
QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain
Sun, Wenfang, Du, Yingjun, Liu, Gaowen, Snoek, Cees G. M.
We tackle the problem of quantifying the number of objects generated by a text-to-image model. Rather than retraining such a model for each new image domain of interest, which leads to high computational costs and limited scalability, we are the first to consider this problem from a domain-agnostic perspective. We propose QUOTA, an optimization framework for text-to-image models that enables effective object quantification across unseen domains without retraining. It leverages a dual-loop meta-learning strategy to optimize a domain-invariant prompt. Further, by integrating prompt learning with learnable counting and domain tokens, our method captures stylistic variations and maintains accuracy even for object classes not encountered during training. For evaluation, we adopt a new benchmark specifically designed for object quantification in domain generalization, enabling rigorous assessment of object quantification accuracy and adaptability across unseen domains in text-to-image generation. Extensive experiments demonstrate that QUOTA outperforms conventional models in both object quantification accuracy and semantic consistency, setting a new benchmark for efficient and scalable text-to-image generation for any domain.
- Europe > Switzerland (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > China (0.04)
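The dual-loop strategy in the QUOTA abstract can be sketched as a first-order meta-learning step over a learnable prompt embedding: adapt on one domain, then update so the adapted prompt also counts correctly on a held-out domain. The `count_loss` callable stands in for the (unspecified) object-quantification loss of the text-to-image model; everything here is an illustrative assumption, not the authors' implementation.

```python
import torch

def quota_meta_step(prompt_emb, domains, count_loss, inner_lr=0.1, outer_lr=0.01):
    """One dual-loop meta-learning step over a learnable prompt.
    prompt_emb: leaf tensor with requires_grad=True, e.g.
        prompt_emb = torch.randn(8, 768, requires_grad=True)
    count_loss(emb, domain) must return a scalar loss tensor."""
    meta_train, meta_test = domains               # e.g. ("photo", "sketch")
    # Inner loop: one gradient step on the meta-train domain.
    loss_in, = count_loss(prompt_emb, meta_train),
    grad_in, = torch.autograd.grad(loss_in, prompt_emb)
    adapted = prompt_emb - inner_lr * grad_in     # first-order adaptation
    # Outer loop: evaluate the adapted prompt on the unseen domain and
    # update the original prompt toward domain-invariant counting.
    loss_out = count_loss(adapted, meta_test)
    grad_out, = torch.autograd.grad(loss_out, prompt_emb)
    with torch.no_grad():
        prompt_emb -= outer_lr * grad_out
    return loss_out.item()
```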
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
Yin, Guoli, Bai, Haoping, Ma, Shuang, Nan, Feng, Sun, Yanchao, Xu, Zhaoyang, Ma, Shen, Lu, Jiarui, Kong, Xiang, Zhang, Aonan, Yap, Dian Ang, Zhang, Yizhe, Ahnert, Karsten, Kamath, Vik, Berglund, Mathias, Walsh, Dominic, Gindele, Tobias, Wiest, Juergen, Lai, Zhengfeng, Wang, Xiaoming, Shan, Jiulong, Cao, Meng, Pang, Ruoming, Wang, Zirui
Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern where failures stem from. Additionally, setting up these environments requires considerable effort, and issues of unreliability and reproducibility sometimes arise, especially in interactive tasks. To address these limitations, we introduce the Massive Multitask Agent Understanding (MMAU) benchmark, featuring comprehensive offline tasks that eliminate the need for complex environment setups. It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics, and covers five essential capabilities: Understanding, Reasoning, Planning, Problem-solving, and Self-correction. With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents. By testing 18 representative models on MMAU, we provide deep and insightful analyses. Ultimately, MMAU not only sheds light on the capabilities and limitations of LLM agents but also enhances the interpretability of their performance. Datasets and evaluation scripts of MMAU are released at https://github.com/apple/axlearn/tree/main/docs/research/mmau.
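Because MMAU scores offline tasks along both domains and capabilities, a capability-centric report can be assembled with a plain aggregation pass. The records below are made-up placeholders, not real MMAU numbers, and the schema is an assumption for illustration only.

```python
from collections import defaultdict

# Hypothetical per-task results: (domain, capability, accuracy).
results = [
    ("Tool-use", "Planning", 0.61),
    ("DAG QA", "Reasoning", 0.54),
    ("Mathematics", "Self-correction", 0.47),
]

def aggregate(results):
    """Average accuracy per capability, mirroring how a capability-level
    summary can be derived from offline task scores."""
    by_cap = defaultdict(list)
    for _domain, capability, acc in results:
        by_cap[capability].append(acc)
    return {cap: sum(v) / len(v) for cap, v in by_cap.items()}

print(aggregate(results))
```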
Unveiling the frontiers of deep learning: innovations shaping diverse domains
Ahmed, Shams Forruque, Alam, Md. Sakib Bin, Kabir, Maliha, Afrin, Shaila, Rafa, Sabiha Jannat, Mehjabin, Aanushka, Gandomi, Amir H.
Deep learning (DL) enables the development of computer models that are capable of learning from, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate its latest developments and applications in these disciplines. However, the literature lacks a survey of deep learning applications across all potential sectors. This paper thus extensively investigates the potential applications of deep learning across all major fields of study, along with the associated benefits and challenges. As evidenced in the literature, DL is accurate in prediction and analysis, which makes it a powerful computational tool, and its ability to adapt and optimize itself makes it effective at processing data without prior training. At the same time, deep learning necessitates massive amounts of data for effective analysis and processing, and its performance scales with data volume. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures like LSTMs and GRUs can be utilized. For multimodal learning, a network needs neurons shared across all tasks alongside neurons specialized for particular tasks.
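The last two sentences of this abstract describe a concrete architectural pattern: a gated recurrent trunk shared by all tasks plus small task-specific heads. A minimal PyTorch sketch follows, with sizes and task names assumed purely for illustration.

```python
import torch
import torch.nn as nn

class SharedTrunkMultiTask(nn.Module):
    """Sketch of 'shared neurons for all tasks, specialized neurons per
    task': a GRU trunk shared across tasks with one small head per task.
    All dimensions and task names are illustrative assumptions."""

    def __init__(self, in_dim=32, hidden=64, task_dims=None):
        super().__init__()
        task_dims = task_dims or {"classify": 10, "regress": 1}
        # Gated recurrent trunk: the shared parameters, suited to long
        # medical/scientific/environmental sequences the survey cites.
        self.trunk = nn.GRU(in_dim, hidden, batch_first=True)
        # One lightweight specialized head per task.
        self.heads = nn.ModuleDict(
            {t: nn.Linear(hidden, d) for t, d in task_dims.items()}
        )

    def forward(self, x, task):
        _, h = self.trunk(x)            # h: (num_layers, batch, hidden)
        return self.heads[task](h[-1])  # route through the task's head

# Usage: y = SharedTrunkMultiTask()(torch.randn(4, 20, 32), "classify")
```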
Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering
Nikandrou, Mavina, Yu, Lu, Suglia, Alessandro, Konstas, Ioannis, Rieser, Verena
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge. Although continual learning has been widely studied in computer vision, its application to Vision+Language tasks is less straightforward, as settings can be parameterized in multiple ways according to their input modalities. In this paper, we present a detailed study of how different settings affect performance for Visual Question Answering. We first propose three plausible task formulations and demonstrate their impact on the performance of continual learning algorithms. We then break down several factors of task similarity, showing that performance and sensitivity to task order depend heavily on the shift of the output distribution. We also investigate the potential of pretrained models and compare the robustness of transformer models with different visual embeddings. Finally, we provide an analysis interpreting model representations and their impact on forgetting. Our results highlight the importance of stabilizing visual representations in deeper layers.
- North America > United States (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (5 more...)
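The experimental protocol in the continual-learning study above reduces to a simple harness: train on tasks in sequence and re-evaluate all earlier tasks after each stage to measure forgetting. A generic sketch, where `train_fn` and `eval_fn` are assumed callables rather than the authors' code:

```python
def continual_train(model, tasks, train_fn, eval_fn):
    """Train sequentially over tasks (list of (name, data) pairs) and
    report per-task forgetting: peak accuracy minus final accuracy."""
    history = {name: [] for name, _ in tasks}
    for i, (name, data) in enumerate(tasks):
        train_fn(model, data)                        # incremental stage i
        for seen_name, seen_data in tasks[: i + 1]:  # never peek at future tasks
            history[seen_name].append(eval_fn(model, seen_data))
    forgetting = {n: max(a) - a[-1] for n, a in history.items() if a}
    return history, forgetting
```

Running the same harness under each of the three task formulations is what exposes how strongly forgetting depends on the shift of the output distribution.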