model pool
- Asia > China > Zhejiang Province > Hangzhou (0.06)
- Oceania > Australia > Queensland > Brisbane (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- North America > United States > Utah (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (0.68)
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
Lu, Yifan, Liu, Rixin, Yuan, Jiayi, Cui, Xingqi, Zhang, Shenrun, Liu, Hongyi, Xing, Jiarong
Today's LLM ecosystem comprises a wide spectrum of models that differ in size, capability, and cost. No single model is optimal for all scenarios; hence, LLM routers have become essential for selecting the most appropriate model under varying circumstances. However, the rapid emergence of new routers makes choosing the right one increasingly challenging. To address this problem, we need a comprehensive router comparison and a standardized leaderboard, similar to those available for models. In this work, we introduce RouterArena, the first open platform enabling comprehensive comparison of LLM routers. RouterArena has (1) a principled, systematically constructed dataset with broad knowledge-domain coverage, (2) distinguishable difficulty levels for each domain, (3) an extensive list of evaluation metrics, and (4) an automated framework for leaderboard updates. Using this framework, we have produced an initial leaderboard with a detailed metric comparison, as shown in Figure 1. Our framework for evaluating new routers is available at https://github.com/RouteWorks/RouterArena, and our leaderboard is at https://routeworks.github.io/.
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > France (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
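To make the leaderboard idea concrete, here is a minimal sketch of ranking routers from per-query evaluation records. It assumes each record stores only whether the model chosen by the router answered correctly and the cost of that call; the record fields, the router names, and the accuracy-then-cost ordering are illustrative assumptions, not RouterArena's actual metrics or API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RoutedQuery:
    """One evaluation record (hypothetical schema): the router under test,
    whether the model it picked answered correctly, and the cost of that call."""
    router: str
    correct: bool
    cost_usd: float


def leaderboard(records):
    """Return (router, accuracy, mean_cost) rows, best accuracy first,
    with ties broken by lower mean cost."""
    by_router = {}
    for rec in records:
        by_router.setdefault(rec.router, []).append(rec)
    rows = []
    for name, recs in by_router.items():
        accuracy = mean(1.0 if r.correct else 0.0 for r in recs)
        mean_cost = mean(r.cost_usd for r in recs)
        rows.append((name, accuracy, mean_cost))
    rows.sort(key=lambda row: (-row[1], row[2]))
    return rows


# Toy records for two hypothetical routers.
records = [
    RoutedQuery("router_a", True, 0.002),
    RoutedQuery("router_a", False, 0.001),
    RoutedQuery("router_b", True, 0.010),
    RoutedQuery("router_b", True, 0.012),
]
rows = leaderboard(records)
for name, acc, cost in rows:
    print(f"{name}: accuracy={acc:.2f}, mean cost=${cost:.4f}")
```

Sorting by accuracy and breaking ties by cost is only one possible policy; a leaderboard with many metrics would more likely report each one as its own column.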
Tracking Functional Changes in Nonstationary Signals
Two strategies, evolve-at-changes and history-model-archive, are designed to further improve efficiency and stability. Experiments with simulations and neural signals demonstrate that EvoEnsemble can effectively track changes in function, thereby improving the accuracy and robustness of neural decoding. The improvement is most significant for neural signals with functional changes.
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- North America > United States > Utah (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (0.68)
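The two strategies named above are only summarized in this excerpt. As one possible reading, the sketch below refits a model only when its error on a recent window exceeds a threshold (evolve-at-changes) and first tries to reuse a previously archived model before fitting a new one (history-model-archive). It uses a single linear model rather than an ensemble; the class name, the threshold rule, and the least-squares fitter are hypothetical and are not EvoEnsemble's implementation.

```python
def fit_line(window):
    """Least-squares line through (x, y) pairs; returns a predictor x -> y."""
    n = len(window)
    mx = sum(x for x, _ in window) / n
    my = sum(y for _, y in window) / n
    sxx = sum((x - mx) ** 2 for x, _ in window)
    sxy = sum((x - mx) * (y - my) for x, y in window)
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


class ChangeTriggeredModel:
    """Hypothetical sketch: refit only at detected changes, reuse archived models."""

    def __init__(self, fit, error_threshold):
        self.fit = fit
        self.threshold = error_threshold
        self.active = None
        self.archive = []

    def _error(self, model, window):
        # Mean absolute error of a model on the recent window.
        return sum(abs(model(x) - y) for x, y in window) / len(window)

    def update(self, window):
        """Process one window of recent (x, y) samples and report the action taken."""
        if self.active is not None and self._error(self.active, window) <= self.threshold:
            return "kept"  # no functional change detected
        # Change detected: first look for an archived model that still fits.
        best = min(self.archive, key=lambda m: self._error(m, window), default=None)
        if best is not None and self._error(best, window) <= self.threshold:
            self.archive.remove(best)
            if self.active is not None:
                self.archive.append(self.active)
            self.active = best
            return "restored"
        # No archived model fits: archive the current one and fit a new model.
        if self.active is not None:
            self.archive.append(self.active)
        self.active = self.fit(window)
        return "evolved"


regime_a = [(x, 2.0 * x) for x in range(5)]
regime_b = [(x, -1.0 * x) for x in range(5)]
tracker = ChangeTriggeredModel(fit_line, error_threshold=0.1)
events = [tracker.update(w) for w in (regime_a, regime_a, regime_b, regime_a)]
print(events)
```

Returning to regime A reuses the archived model instead of refitting, which is the kind of saving a history archive is meant to provide when earlier functions recur.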
CABENCH: Benchmarking Composable AI for Solving Complex Tasks through Composing Ready-to-Use Models
Pham, Tung-Thuy, Luong, Duy-Quan, Duong, Minh-Quan, Nguyen, Trung-Hieu, Nguyen, Thu-Trang, Nguyen, Son, Vo, Hieu Dinh
Composable AI offers a scalable and effective paradigm for tackling complex AI tasks by decomposing them into sub-tasks and solving each sub-task using ready-to-use, well-trained models. However, systematically evaluating methods in this setting remains largely unexplored. In this paper, we introduce CABENCH, the first public benchmark comprising 70 realistic composable AI tasks, along with a curated pool of 700 models across multiple modalities and domains. We also propose an evaluation framework that enables end-to-end assessment of composable AI solutions. To establish initial baselines, we provide human-designed reference solutions and compare their performance with two LLM-based approaches. Our results illustrate the promise of composable AI in addressing complex real-world problems while highlighting the need for methods that can fully unlock its potential by automatically generating effective execution pipelines.
- Overview (0.68)
- Research Report > New Finding (0.48)
- Health & Medicine (0.68)
- Information Technology (0.68)
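The decompose-then-compose idea can be sketched as a plan of sub-task names executed against a model pool, each model's output feeding the next. This assumes a simple linear pipeline; the pool entries below are hypothetical string-processing stand-ins for real pretrained models, and CABENCH's actual task format and pipelines may be richer (for example, branching graphs).

```python
from typing import Callable, Dict, List

ModelFn = Callable[[str], str]

# Hypothetical stand-ins for ready-to-use models; a real pool would wrap
# pretrained checkpoints (speech recognition, translation, ...) instead.
MODEL_POOL: Dict[str, ModelFn] = {
    "uppercase": str.upper,
    "strip_punct": lambda s: "".join(c for c in s if c.isalnum() or c.isspace()),
    "word_count": lambda s: str(len(s.split())),
}


def run_pipeline(plan: List[str], task_input: str,
                 pool: Dict[str, ModelFn] = MODEL_POOL) -> str:
    """Execute sub-tasks in order, feeding each model's output into the next."""
    data = task_input
    for step in plan:
        if step not in pool:
            raise KeyError(f"no model in pool for sub-task {step!r}")
        data = pool[step](data)
    return data


result = run_pipeline(["strip_punct", "word_count"], "Hello, composable AI!")
print(result)
```

An automatic method would have to produce the `plan` itself, choosing both the decomposition and which pool member handles each step.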
Multi-Agent Reinforcement Learning with Focal Diversity Optimization
Tekin, Selim Furkan, Ilhan, Fatih, Huang, Tiansheng, Hu, Sihao, Yahn, Zachary, Liu, Ling
The advancement of Large Language Models (LLMs) and their finetuning strategies has renewed interest in multi-agent reinforcement learning. In this paper, we introduce a focal-diversity-optimized multi-agent reinforcement learning approach, coined MARL-Focal, with three unique characteristics. First, we develop an agent-fusion framework that encourages multiple LLM-based agents to collaborate in producing the final inference output for each LLM query. Second, we develop a focal-diversity-optimized agent selection algorithm that chooses a small subset of the available agents based on how well they complement one another in generating the query output. Finally, we design a conflict-resolution method that detects output inconsistency among multiple agents and produces the MARL-Focal output through reward-aware and policy-adaptive inference fusion. Extensive evaluations on five benchmarks show that MARL-Focal is cost-efficient and adversarially robust. Our multi-agent fusion model achieves a performance improvement of 5.51% over the best individual LLM agent and offers stronger robustness on the TruthfulQA benchmark. Code is available at https://github.com/sftekin/rl-focal
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
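The focal-diversity idea can be illustrated with a simplified score: for each member of a candidate subset, look at the queries it gets wrong and measure how often the other members get those queries right. The sketch below exhaustively scores subsets of size k with that proxy and fuses answers by weighted majority vote. Both the diversity formula and the plain weighted vote are simplifying assumptions; MARL-Focal's actual focal-diversity metric and its reward-aware, policy-adaptive fusion are not reproduced here, and the agent names and weights are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def focal_diversity(subset, correct):
    """Simplified proxy: for each focal agent, the fraction of (other agent,
    failed query) pairs where another subset member answered correctly."""
    scores = []
    for focal in subset:
        others = [a for a in subset if a != focal]
        failed = [i for i, ok in enumerate(correct[focal]) if not ok]
        if not others or not failed:
            continue  # nothing to complement for this focal agent
        rescued = sum(correct[a][i] for a in others for i in failed)
        scores.append(rescued / (len(others) * len(failed)))
    return sum(scores) / len(scores) if scores else 0.0


def select_agents(correct, k):
    """Exhaustively pick the size-k subset with the highest diversity score."""
    agents = sorted(correct)
    return max(combinations(agents, k), key=lambda s: focal_diversity(s, correct))


def fuse(answers, weights):
    """Weighted majority vote over the selected agents' answers."""
    tally = defaultdict(float)
    for agent, answer in answers.items():
        tally[answer] += weights.get(agent, 1.0)
    return max(tally, key=tally.get)


# Per-query correctness on a validation set for three hypothetical agents.
correct = {
    "agent_a": [True, True, False, False],
    "agent_b": [True, True, False, False],  # fails on the same queries as agent_a
    "agent_c": [False, False, True, True],  # complements agent_a and agent_b
}
chosen = select_agents(correct, k=2)
final = fuse({"agent_a": "yes", "agent_c": "no"}, {"agent_a": 0.6, "agent_c": 0.9})
print(chosen, final)
```

Exhaustive search over `combinations` is only practical for small agent pools; a larger pool would need a greedy or learned selection policy.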
Improved Robustness for Deep Learning-based Segmentation of Multi-Center Myocardial Perfusion MRI Datasets Using Data Adaptive Uncertainty-guided Space-time Analysis
Yalcinkaya, Dilek M., Youssef, Khalid, Heydari, Bobak, Wei, Janet, Merz, Noel Bairey, Judd, Robert, Dharmakumar, Rohan, Simonetti, Orlando P., Weinsaft, Jonathan W., Raman, Subha V., Sharif, Behzad
Background. Fully automatic analysis of myocardial perfusion MRI datasets enables rapid and objective reporting of stress/rest studies in patients with suspected ischemic heart disease. Developing deep learning techniques that can analyze multi-center datasets despite limited training data and variations in software and hardware is an ongoing challenge. Methods. Datasets from 3 medical centers acquired at 3T (n = 150 subjects) were included: an internal dataset (inD; n = 95) and two external datasets (exDs; n = 55) used for evaluating the robustness of the trained deep neural network (DNN) models against differences in pulse sequence (exD-1) and scanner vendor (exD-2). A subset of inD (n = 85) was used for training/validation of a pool of DNNs for segmentation, all using the same spatiotemporal U-Net architecture and hyperparameters but with different parameter initializations. We employed a space-time sliding-patch analysis approach that automatically yields a pixel-wise "uncertainty map" as a byproduct of the segmentation process. In this approach, a given test case is segmented by all members of the DNN pool, and the resulting uncertainty maps are used to automatically select the "best" solution from the pool. Results. The proposed data-adaptive uncertainty-guided space-time (DAUGS) analysis approach performed similarly to the established approach on the internal dataset (p = n.s.), whereas it significantly outperformed the established approach on the external datasets (p < 0.005 for exD-1 and exD-2). Moreover, the number of image series with "failed" segmentation was significantly lower for the proposed vs. the established approach (4.3% vs. 17.1%, p < 0.0005). Conclusions. The proposed DAUGS analysis approach has the potential to improve the robustness of deep learning methods for segmentation of multi-center stress perfusion datasets with variations in the choice of pulse sequence, site location, or scanner vendor.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Indiana > Marion County > Indianapolis (0.04)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
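The selection step described in the Methods can be sketched as follows, under two assumptions not stated in the abstract: pixel-wise uncertainty is the variance of the predictions that overlapping patches assign to a pixel, and the "best" candidate is the one with the lowest mean uncertainty. Images are flattened to lists for brevity, and the model names are hypothetical.

```python
from statistics import fmean, pvariance


def pixel_uncertainty(patch_predictions):
    """patch_predictions[p] lists the foreground probabilities that overlapping
    sliding patches assigned to pixel p; uncertainty is their variance (assumed)."""
    return [pvariance(preds) if len(preds) > 1 else 0.0 for preds in patch_predictions]


def select_best(uncertainty_maps):
    """Pick the pool member whose uncertainty map has the lowest mean (assumed rule)."""
    return min(uncertainty_maps, key=lambda name: fmean(uncertainty_maps[name]))


# Toy 3-pixel test case segmented by two hypothetical pool members.
maps = {
    "dnn_1": pixel_uncertainty([[0.9, 0.9], [0.1, 0.1], [0.8, 0.8]]),
    "dnn_2": pixel_uncertainty([[0.9, 0.2], [0.5, 0.1], [0.8, 0.8]]),
}
best = select_best(maps)
print(best, maps)
```

Because every pool member shares the same architecture and differs only in initialization, the selection is made per test case, which is what lets the approach adapt to unseen sequences or vendors.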
Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting
Sima, Qi, Zhang, Xinze, Bao, Yukun, Yang, Siyue, Shen, Liang
Recurrent neural network-based sequence-to-sequence models have been extensively applied for multi-step-ahead time series forecasting. These models typically involve a decoder trained using either its previous forecasts or the actual observed values as the decoder inputs. However, relying on self-generated predictions can lead to the rapid accumulation of errors over multiple steps, while using the actual observations introduces exposure bias, as these values are unavailable during the extrapolation stage. In this regard, this study proposes a novel training approach called reinforced decoder, which introduces auxiliary models to generate alternative decoder inputs that remain accessible when extrapolating. Additionally, a reinforcement learning algorithm is utilized to dynamically select the optimal inputs to improve accuracy.
Multi-step-ahead time series prediction, which involves extrapolating a sequence of future values based on historical observations, plays a vital role in various real-world … Accordingly, research efforts have been devoted to developing statistical and machine learning techniques for multi-step-ahead time … extrapolating process, i.e., feeding back the one-step-ahead prediction to the decoder to predict the value at the next step. … some non-autoregressive architectures were proposed to obviate the error propagation issue [10], [16], [17].
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
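The input-selection idea can be illustrated with an epsilon-greedy bandit that picks, at each decoding step, which candidate input to feed the decoder, rewarding each choice by the negative absolute error of the resulting prediction. Only inputs available at extrapolation time are offered as candidates. This is a deliberate simplification: the paper uses a reinforcement learning algorithm whose details are not given in this excerpt, and the toy decoder, the fixed input errors, and the source names are hypothetical.

```python
import random


class EpsilonGreedyInputSelector:
    """Bandit that picks which candidate decoder input to use at each step
    (a simplification, not the paper's reinforcement learning algorithm)."""

    def __init__(self, sources, epsilon=0.1, seed=0):
        self.sources = list(sources)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {s: 0 for s in self.sources}
        self.values = {s: 0.0 for s in self.sources}  # running mean reward

    def select(self):
        untried = [s for s in self.sources if self.counts[s] == 0]
        if untried:
            return untried[0]  # try every source once
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.sources)  # explore
        return max(self.sources, key=self.values.__getitem__)  # exploit

    def update(self, source, reward):
        self.counts[source] += 1
        self.values[source] += (reward - self.values[source]) / self.counts[source]


def train_step(selector, candidates, decoder, target):
    """candidates maps source name -> decoder input available at extrapolation time."""
    source = selector.select()
    prediction = decoder(candidates[source])
    selector.update(source, -abs(prediction - target))  # reward = negative error
    return source, prediction


def decoder(x):
    # Toy one-step decoder for the series y_t = t.
    return x + 1.0


selector = EpsilonGreedyInputSelector(["self_forecast", "aux_model"], epsilon=0.1)
for t in range(1, 51):
    candidates = {
        "self_forecast": (t - 1) + 0.5,  # stand-in for an accumulated-error forecast
        "aux_model": (t - 1) + 0.1,      # stand-in for a hypothetical auxiliary model
    }
    train_step(selector, candidates, decoder, target=float(t))
preferred = max(selector.values, key=selector.values.get)
print(preferred, selector.values)
```

The actual observation is deliberately absent from `candidates`, which mirrors the exposure-bias argument: anything the decoder learns to rely on must still exist when extrapolating.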
Improve Cross-Architecture Generalization on Dataset Distillation
Zhou, Binglin, Zhong, Linhao, Chen, Wentao
Dataset distillation, a pragmatic approach in machine learning, aims to create a smaller synthetic dataset from a larger existing one. However, existing distillation methods primarily adopt a model-based paradigm, in which the synthetic dataset inherits model-specific biases that limit its generalizability to other models. To address this limitation, we propose a novel methodology termed "model pool". This approach selects models from a diverse model pool according to a specific probability distribution during the data distillation process. Additionally, we integrate the model pool with the established knowledge distillation approach and apply knowledge distillation when testing the distilled dataset. Our experimental results validate the effectiveness of the model pool approach across a range of existing models at test time, demonstrating superior performance compared to existing methods.
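A minimal sketch of the two ingredients, assuming a fixed categorical distribution over a small pool of architecture names and the temperature-scaled soft-target loss commonly used for knowledge distillation. The pool contents, probabilities, and temperature are hypothetical; the paper's actual distribution and loss weighting are not specified in this abstract.

```python
import math
import random


def sample_model(pool, probs, rng):
    """Draw one architecture name from the pool using a categorical distribution."""
    return rng.choices(pool, weights=probs, k=1)[0]


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target loss T^2 * KL(teacher || student) at temperature T."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical pool and distribution: one architecture per distillation iteration.
rng = random.Random(0)
pool = ["ConvNet", "ResNet-18", "VGG-11"]
probs = [0.6, 0.4, 0.0]
schedule = [sample_model(pool, probs, rng) for _ in range(100)]
print(schedule[:5])
print(kd_loss([2.0, 1.0, 0.1], [0.5, 2.5, 0.1]))
```

Sampling a different architecture per iteration is what keeps the synthetic data from overfitting one model's biases; setting a probability to zero, as for `VGG-11` here, simply excludes that architecture from distillation.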