general model
Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning
He, Yang, Ding, Xiao, Cai, Bibo, Zhang, Yufei, Xiong, Kai, Sun, Zhouhao, Qin, Bing, Liu, Ting
While reasoning-augmented large language models (RLLMs) significantly enhance complex task performance through extended reasoning chains, they inevitably introduce substantial unnecessary token consumption, particularly for simpler problems where Short Chain-of-Thought (Short CoT) suffices. This overthinking phenomenon leads to inefficient resource usage without proportional accuracy gains. To address this issue, we propose Self-Route, a dynamic reasoning framework that automatically selects between general and reasoning modes based on model capability estimation. Our approach introduces a lightweight pre-inference stage to extract capability-aware embeddings from hidden layer representations, enabling real-time evaluation of the model's ability to solve problems. We further construct Gradient-10K, a model difficulty estimation-based dataset with dense complexity sampling, to train the router for precise capability boundary detection. Extensive experiments demonstrate that Self-Route achieves comparable accuracy to reasoning models while reducing token consumption by 30-55\% across diverse benchmarks. The proposed framework demonstrates consistent effectiveness across models with different parameter scales and reasoning paradigms, highlighting its general applicability and practical value.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
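The routing step this abstract describes can be sketched as a lightweight probe over a pooled hidden-layer embedding: a capability score decides whether Short CoT suffices or the reasoning mode is needed. The logistic probe form, the names, and the threshold below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def route(hidden_state, w, b, threshold=0.5):
    """Pick 'general' or 'reasoning' mode from a capability estimate.

    hidden_state: pooled hidden-layer embedding of the prompt, shape (d,)
    w, b: parameters of a lightweight probe, which would be trained on a
          difficulty-graded dataset (random placeholders here).
    """
    # Probe output ~ estimated probability the model solves this with Short CoT.
    score = 1.0 / (1.0 + np.exp(-(hidden_state @ w + b)))
    return "general" if score >= threshold else "reasoning"

d = 16
w, b = rng.normal(size=d), 0.0
print(route(rng.normal(size=d), w, b))
```

The pre-inference cost is one forward pass to the probed layer plus a dot product, which is how such a router can stay cheap relative to a full reasoning chain.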
EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports
Moukheiber, Lama, Moukheiber, Mira, Moukheiber, Dana, Ju, Jae-Woo, Lee, Hyung-Chul
We introduce a novel question-answering (QA) dataset using echocardiogram reports sourced from the Medical Information Mart for Intensive Care database. This dataset is specifically designed to enhance QA systems in cardiology, consisting of 771,244 QA pairs addressing a wide array of cardiac abnormalities and their severity. We compare large language models (LLMs), including open-source and biomedical-specific models for zero-shot evaluation, and closed-source models for zero-shot and three-shot evaluation. Our results show that fine-tuning LLMs improves performance across various QA metrics, validating the value of our dataset. Clinicians also qualitatively evaluate the best-performing model to assess the LLM responses for correctness. Further, we conduct fine-grained fairness audits to assess the bias-performance trade-off of LLMs across various social determinants of health. Our objective is to propel the field forward by establishing a benchmark for LLM AI agents aimed at supporting clinicians with cardiac differential diagnoses, thereby reducing the documentation burden that contributes to clinician burnout and enabling healthcare professionals to focus more on patient care.
- North America > United States > Massachusetts (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe (0.04)
- (3 more...)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
The interplay between domain specialization and model size: a case study in the legal domain
Junior, Roseval Malaquias, Pires, Ramon, Almeida, Thales Sales, Sakiyama, Kenzo, Romero, Roseli, Nogueira, Rodrigo
Scaling laws for language models have so far focused on finding the compute-optimal model size and token count for training from scratch. However, achieving this optimal balance requires significant compute resources due to the extensive data demands when training models from randomly initialized weights. Continual pre-training offers a cost-effective alternative, leveraging the compute investment from pre-trained models to incorporate new knowledge without requiring extensive new data. Recent findings suggest that data quality influences constants in scaling laws, thereby altering the optimal parameter-token allocation ratio. Building on this insight, we investigate the interplay between domain specialization and model size during continual pre-training under compute-constrained scenarios. Our goal is to identify a compute-efficient training regime for this scenario and, potentially, detect patterns in this interplay that can be generalized across different model sizes and domains. To compare general and specialized training, we filtered a web-based dataset to extract legal domain data. We pre-trained models with 1.5B, 3B, 7B and 14B parameters on both the unfiltered and filtered datasets, then evaluated their performance on legal exams. Results show that as model size increases, the compute-effectiveness gap between specialized and general models widens.
- Asia > Middle East > Jordan (0.04)
- South America > Brazil > São Paulo (0.04)
- South America > Brazil > Rio Grande do Sul (0.04)
- (3 more...)
Pre-Ictal Seizure Prediction Using Personalized Deep Learning
Jaddu, Shriya, Jaddu, Sidh, Gutierrez, Camilo, Tran, Quincy K.
Introduction: Approximately 23 million people, or 30% of epilepsy patients worldwide, suffer from drug-resistant epilepsy (DRE). The unpredictability of seizure occurrences, which causes safety issues as well as social concerns, restricts the lifestyles of DRE patients. Surgical solutions and EEG-based solutions are expensive, unreliable, invasive, or impractical. The goal of this research was to apply improved technologies and methods to epilepsy patients' physiological data and predict seizures up to two hours before onset, enabling non-invasive, affordable seizure prediction for DRE patients. Methods: This research used a 1D Convolutional Neural Network-Based Bidirectional Long Short-Term Memory network that was trained on a diverse set of epileptic patient physiological data to predict seizures. Transfer learning was further utilized to personalize and optimize predictions for specific patients. Clinical data was retrospectively obtained for nine epilepsy patients via wearable devices over a period of about three to five days from a prospectively maintained database. The physiological data covered 54 seizure occurrences and included heart rate, blood volume pulse, accelerometry, body temperature, and electrodermal activity. Results and Conclusion: A general deep-learning model trained on the physiological data with randomly sampled test data achieved an accuracy of 91.94%. However, such a generalized deep learning model showed varied performance on data from unseen patients. When the general model was personalized (further trained) with patient-specific data, the personalized model achieved significantly improved performance with accuracies as high as 97%. This preliminary research shows that patient-specific personalization may be a viable approach to achieve affordable, non-invasive seizure prediction that can improve the quality of life for DRE patients.
- North America > United States > Maryland > Baltimore (0.05)
- North America > United States > Virginia > Alexandria County > Alexandria (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- Health & Medicine > Therapeutic Area > Neurology > Epilepsy (1.00)
- Health & Medicine > Therapeutic Area > Genetic Disease (1.00)
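The personalization step (transfer learning on patient-specific data) can be sketched under a common simplifying assumption: treat the general network as a frozen feature extractor and fine-tune only a final logistic layer on one patient's windows. All names, the toy data, and the reduction to a logistic head are illustrative, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def personalize(features, labels, w, b, lr=0.1, epochs=200):
    """Fine-tune only the final classification layer on one patient's data.

    features: (n, d) embeddings from the frozen general feature extractor
    labels:   (n,) 1 = pre-ictal window, 0 = interictal window
    w, b:     classifier parameters inherited from the general model
    """
    w, b = w.copy(), b
    for _ in range(epochs):
        p = sigmoid(features @ w + b)
        grad_w = features.T @ (p - labels) / len(labels)  # logistic-loss gradient
        grad_b = (p - labels).mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy patient data: two well-separated clusters standing in for embeddings.
d = 6
X = np.vstack([rng.normal(-1, 0.5, size=(40, d)), rng.normal(1, 0.5, size=(40, d))])
y = np.array([0] * 40 + [1] * 40)
w, b = personalize(X, y, np.zeros(d), 0.0)
acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"personalized accuracy: {acc:.2f}")
```

Freezing the shared backbone keeps the per-patient update cheap enough to run on small amounts of wearable data, which is the practical point of the personalization result.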
A General Model for Detecting Learner Engagement: Implementation and Evaluation
Malekshahi, Somayeh, Kheyridoost, Javad M., Fatemi, Omid
Attending to learner engagement benefits both learners and instructors. Instructors can help learners increase their attention, involvement, motivation, and interest; they can also improve their own instructional performance by evaluating the cumulative results of all learners and upgrading their training programs. This paper proposes a general, lightweight model for selecting and processing features to detect learners' engagement levels while preserving the sequential temporal relationship over time. During training and testing, we analyzed the videos from the publicly available DAiSEE dataset to capture the dynamic essence of learner engagement. We have also proposed an adaptation policy to find new labels that utilize the affective states of this dataset related to education, thereby improving the models' judgment. The suggested model achieves an accuracy of 68.57\% in a specific implementation and outperforms the studied state-of-the-art models in detecting learners' engagement levels.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- (8 more...)
- Research Report > New Finding (0.46)
- Instructional Material > Online (0.46)
- Instructional Material > Course Syllabus & Notes (0.46)
- Education > Educational Setting > Online (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (0.69)
Powering In-Database Dynamic Model Slicing for Structured Data Analytics
Zeng, Lingze, Xing, Naili, Cai, Shaofeng, Chen, Gang, Ooi, Beng Chin, Pei, Jian, Wu, Yuncheng
Relational database management systems (RDBMS) are widely used for the storage and retrieval of structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations, and then apply deep neural networks (DNN) training and inference on these respective subdatasets in a separate machine learning system. The process can be prohibitively expensive, especially when there are a combinatorial number of subdatasets extracted for different analytical purposes. This calls for efficient in-database support of advanced analytical methods. In this paper, we introduce LEADS, a novel SQL-aware dynamic model slicing technique to customize models for subdatasets specified by SQL queries. LEADS improves the predictive modeling of structured data via the mixture of experts (MoE) technique and maintains inference efficiency by a SQL-aware gating network. At the core of LEADS is the construction of a general model with multiple expert sub-models via MoE trained over the entire database. This SQL-aware MoE technique scales up the modeling capacity, enhances effectiveness, and preserves efficiency by activating only necessary experts via the gating network during inference. Additionally, we introduce two regularization terms during the training process of LEADS to strike a balance between effectiveness and efficiency. We also design and build an in-database inference system, called INDICES, to support end-to-end advanced structured data analytics by non-intrusively incorporating LEADS into PostgreSQL. Our extensive experiments on real-world datasets demonstrate that LEADS consistently outperforms baseline models, and INDICES delivers effective in-database analytics with a considerable reduction in inference latency compared to traditional solutions.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > Singapore (0.04)
- Asia > China (0.04)
- Information Technology > Information Management (1.00)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
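The SQL-aware gating idea can be sketched as a mixture of experts whose gate reads a featurized query and activates only the top-k experts. This is a toy with linear experts and a softmax gate; the expert and gate forms are assumptions, not the LEADS implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_predict(x, q, experts, W_gate, top_k=2):
    """SQL-aware dynamic model slicing, reduced to a sketch.

    x: input record features (d,); q: featurized SQL query (m,)
    experts: list of (W, b) linear expert sub-models; W_gate: (m, n_experts)
    """
    gate = softmax(q @ W_gate)        # expert weights come from the query, not x
    keep = np.argsort(gate)[-top_k:]  # "slice": activate only necessary experts
    out = np.zeros_like(experts[0][1])
    for i in keep:
        W, b = experts[i]
        out += gate[i] * (x @ W + b)  # weighted mixture of active experts
    return out / gate[keep].sum()     # renormalize over the kept experts

d, m, n_exp, out_dim = 8, 4, 4, 1
experts = [(rng.normal(size=(d, out_dim)), np.zeros(out_dim)) for _ in range(n_exp)]
W_gate = rng.normal(size=(m, n_exp))
print(moe_predict(rng.normal(size=d), rng.normal(size=m), experts, W_gate))
```

Because the gate depends only on the query, its output can be computed once per SQL statement and reused across every row the query touches, which is one plausible source of the inference-efficiency claim.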
Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance
Pecher, Branislav, Srba, Ivan, Bielikova, Maria
When solving NLP tasks with limited labelled data, researchers can either use a general large language model without further update, or use a small number of labelled examples to tune a specialised smaller model. In this work, we address the research gap of how many labelled samples are required for the specialised small models to outperform general large models, while taking the performance variance into consideration. By observing the behaviour of fine-tuning, instruction-tuning, prompting and in-context learning on 7 language models, we identify such performance break-even points across 8 representative text classification tasks of varying characteristics. We show that the specialised models often need only a few samples (on average $10 - 1000$) to be on par or better than the general ones. At the same time, the number of required labels strongly depends on the dataset or task characteristics, with this number being significantly lower on multi-class datasets (up to $100$) than on binary datasets (up to $5000$). When performance variance is taken into consideration, the number of required labels increases on average by $100 - 200\%$ and even up to $1500\%$ in specific cases.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Singapore (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)
Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation
Azad, Parham Abed, Beigy, Hamid
The rapid expansion of texts' volume and diversity presents formidable challenges in multi-domain settings. These challenges are also visible in Persian named entity recognition (NER). Traditional approaches, either employing a unified model for multiple domains or individual models for each domain, frequently pose significant limitations. Single models often struggle to capture the nuances of diverse domains, while utilizing multiple large models can lead to resource constraints, rendering the training of a model for each domain virtually impractical. Therefore, this paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters. We utilize techniques such as prompt tuning and adapters, combined with the incorporation of additional layers, to add parameters that we can train for the specific domains. This enables the model to perform comparably to individual models for each domain. Experimental results on different formal and informal datasets show that by employing these added parameters, the proposed model significantly surpasses existing practical models in performance. Remarkably, the proposed model requires only one instance for training and storage, yet achieves outstanding results across all domains, even surpassing the state-of-the-art in some. Moreover, we analyze each adaptation strategy, delineating its strengths, weaknesses, and optimal hyper-parameters for Persian NER. Finally, we introduce a document-based domain detection pipeline tailored for scenarios with unknown text domains, enhancing the adaptability and practicality of this approach in real-world applications.
- North America > United States (0.05)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
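The adapter mechanism this entry leverages can be illustrated with a standard bottleneck adapter: a small trainable detour around frozen shared weights, so each domain adds only a few parameters to the one core model. The shapes and zero-initialization below are common conventions, not necessarily Multi-BERT's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

def adapter(h, W_down, W_up):
    """Bottleneck adapter applied to frozen hidden states.

    Only W_down / W_up are trained per domain; the core model is shared.
    h: hidden states, shape (seq_len, d)
    """
    z = np.maximum(h @ W_down, 0.0)  # project down to the bottleneck + ReLU
    return h + z @ W_up              # project back up, residual connection

d, r = 32, 4                          # r << d keeps per-domain parameters tiny
W_down = rng.normal(size=(d, r)) * 0.1
W_up = np.zeros((r, d))               # zero-init: the untrained adapter is exactly identity
h = rng.normal(size=(5, d))
out = adapter(h, W_down, W_up)
print(np.allclose(out, h))            # True before any domain-specific training
```

The zero-initialized up-projection is a deliberate design choice: a freshly attached adapter leaves the core model's behavior unchanged, so each domain starts from the shared model's performance and only deviates as its own parameters are trained.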
Balancing the AI Strength of Roles in Self-Play Training with Regret Matching+
When training artificial intelligence for games encompassing multiple roles, the development of a generalized model capable of controlling any character within the game presents a viable option. This strategy not only conserves computational resources and time during the training phase but also reduces resource requirements during deployment. However, training such a generalized model often encounters challenges related to uneven capabilities when controlling different roles. A simple method based on Regret Matching+ is introduced, which helps the model achieve more balanced strength when controlling various roles.
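The abstract does not give the update details, but generic Regret Matching+ on a toy zero-sum game illustrates the balancing idea: each side plays in proportion to its clipped cumulative regrets, and the average strategy drifts toward balanced (equilibrium) strength. Rock-paper-scissors is used here only because its balanced strategy is known to be uniform; the game and all names are illustrative.

```python
import numpy as np

def regret_matching_plus(payoffs, iters=20000):
    """Regret Matching+ self-play on a two-player zero-sum matrix game.

    Strategies are proportional to clipped cumulative regrets; the row
    player's average strategy converges toward an equilibrium, i.e.
    balanced strength against a best-responding opponent.
    """
    n, m = payoffs.shape
    r_row = np.zeros(n)
    r_row[0] = 1.0            # asymmetric start so play actually moves
    r_col = np.zeros(m)
    avg_row = np.zeros(n)

    def strategy(r):
        s = r.sum()
        return r / s if s > 0 else np.full(len(r), 1.0 / len(r))

    for _ in range(iters):
        p, q = strategy(r_row), strategy(r_col)
        u_row = payoffs @ q   # row action values vs current column play
        u_col = -(p @ payoffs)  # column action values (zero-sum)
        # RM+ update: accumulate instantaneous regret, clip at zero.
        r_row = np.maximum(r_row + u_row - p @ u_row, 0.0)
        r_col = np.maximum(r_col + u_col - q @ u_col, 0.0)
        avg_row += p
    return avg_row / iters

# Rock-paper-scissors: the balanced strategy is uniform over the three actions.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
print(regret_matching_plus(rps))
```

In the multi-role setting the same machinery would treat role choices (rather than rock/paper/scissors) as the actions whose regrets are tracked, steering training pressure toward whichever roles the shared model currently plays worst.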
SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults
Grossi, Alessandra, Gasparini, Francesca
In this paper, SER_AMPEL, a multi-source dataset for speech emotion recognition (SER), is presented. The peculiarity of the dataset is that it is collected with the aim of providing a reference for speech emotion recognition for Italian older adults. The dataset is collected following different protocols: acted conversations extracted from movies and TV series, and recordings of natural conversations where the emotions are elicited by proper questions. The evidence of the need for such a dataset emerges from the analysis of the state of the art. Preliminary considerations on the critical issues of SER are reported, based on the classification results on a subset of the proposed dataset.
- North America > United States (0.14)
- Europe > Italy > Lombardy > Milan (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)