Latent Traits and Cross-Task Transfer: Deconstructing Dataset Interactions in LLM Fine-tuning
Krishna, Shambhavi, Naik, Atharva, Agarwal, Chaitali, Govindan, Sudharshan, Lee, Taesung, Chang, Haw-Shiuan
Large language models are increasingly deployed across diverse applications, often including tasks they have not encountered during training. Because enumerating and obtaining high-quality training data for every task is infeasible, we often need to rely on transfer learning from datasets with different characteristics and to anticipate out-of-distribution requests. Motivated by this practical need, we propose an analysis framework, combining a transfer learning matrix with dimensionality reduction, to dissect these cross-task interactions. We train and analyze 10 models to identify latent abilities (e.g., Reasoning, Sentiment Classification, NLU, Arithmetic) and to uncover the side effects of transfer learning. Our findings reveal that performance improvements often defy explanations based on surface-level dataset similarity or source data quality. Instead, hidden statistical factors of the source dataset, such as class distribution and generation-length proclivities, alongside specific linguistic features, are more influential. This work offers insights into the complex dynamics of transfer learning, paving the way for more predictable and effective LLM adaptation.
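As a concrete illustration of the matrix-plus-dimensionality-reduction idea, here is a minimal sketch assuming scores come from some external evaluation harness; the dataset and task names, and the use of PCA specifically, are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch: build a (sources x targets) transfer matrix and project it
# onto a few latent components. All names below are hypothetical.
import numpy as np
from sklearn.decomposition import PCA

SOURCE_DATASETS = ["gsm8k", "imdb", "squad", "anli"]      # hypothetical sources
TARGET_TASKS    = ["arithmetic", "sentiment", "nlu", "reasoning"]

def build_transfer_matrix(scores):
    """scores[s][t] = target-task score of a model fine-tuned on source s."""
    M = np.array([[scores[s][t] for t in TARGET_TASKS] for s in SOURCE_DATASETS])
    # Center per target task so rows reflect relative transfer, not task difficulty.
    return M - M.mean(axis=0, keepdims=True)

def latent_abilities(M, k=2):
    """Project the transfer matrix onto k latent components."""
    pca = PCA(n_components=k)
    source_coords = pca.fit_transform(M)   # how each source loads on the abilities
    ability_axes  = pca.components_        # target-task mix defining each ability
    return source_coords, ability_axes, pca.explained_variance_ratio_
```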
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Detecting and Rectifying Noisy Labels: A Similarity-based Approach
Huu-Tien, Dang, Nguyen, Minh-Phuong, Inoue, Naoya
Label noise in datasets can significantly damage the performance and robustness of deep neural networks (DNNs) trained on them. As modern DNNs grow in size, so does the demand for automated tools that detect such errors. In this paper, we propose post-hoc, model-agnostic noise detection and rectification methods utilizing the penultimate feature from a DNN. Our idea is based on the observation that the similarity between the penultimate feature of a mislabeled data point and the data points of its true class is higher than its similarity to data points from other classes, making the probability of label occurrence within a tight, similar cluster informative for detecting and rectifying errors. Through theoretical and empirical analyses, we demonstrate that our approach achieves high detection performance across diverse, realistic noise scenarios and can automatically rectify these errors to improve dataset quality.
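The following is a hedged sketch of one similarity-based rule in this spirit, using a simple k-nearest-neighbor label-agreement heuristic over penultimate features; the paper's exact scoring and rectification procedure may differ.

```python
# Flag a point as likely mislabeled when few of its nearest neighbors in
# penultimate-feature space share its label, then rectify with the majority
# neighbor label. Z: (n, d) penultimate features; y: (n,) integer labels.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def detect_and_rectify(Z, y, k=10, threshold=0.5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
    _, idx = nn.kneighbors(Z)                  # idx[:, 0] is the point itself
    neighbor_labels = y[idx[:, 1:]]            # labels of the k nearest neighbors
    agreement = (neighbor_labels == y[:, None]).mean(axis=1)
    noisy = agreement < threshold              # flagged as likely mislabeled
    rectified = y.copy()
    for i in np.where(noisy)[0]:               # majority label among neighbors
        vals, counts = np.unique(neighbor_labels[i], return_counts=True)
        rectified[i] = vals[np.argmax(counts)]
    return noisy, rectified
```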
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
CLAQS: Compact Learnable All-Quantum Token Mixer with Shared-ansatz for Text Classification
Chen, Junhao, Zhou, Yifan, Jiang, Hanqi, Pan, Yi, Li, Yiwei, Zhao, Huaqin, Zhang, Wei, Wang, Yingfeng, Liu, Tianming
Quantum computing is scaling fast, from cloud QPUs to high-throughput GPU simulators, making it timely to prototype quantum NLP beyond toy tasks. However, devices remain qubit- and depth-limited, training can be unstable, and classical attention is compute- and memory-heavy. This motivates compact, phase-aware quantum token mixers that stabilize amplitudes and scale to long sequences. We present CLAQS, a compact, fully quantum token mixer for text classification that jointly learns complex-valued mixing and nonlinear transformations within a unified quantum circuit. To enable stable end-to-end optimization, we apply l1 normalization to regulate amplitude scaling and introduce a two-stage parameterized quantum architecture that decouples shared token embeddings from a window-level quantum feed-forward module. Operating under a sliding-window regime with document-level aggregation, CLAQS requires only eight data qubits and shallow circuits, yet achieves 91.64% accuracy on SST-2 and 87.08% on IMDB, outperforming both classical Transformer baselines and strong hybrid quantum-classical counterparts.
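A purely classical numpy sketch of two ingredients named above, l1 amplitude normalization and sliding-window mixing with document-level aggregation, is given below; the actual quantum circuit, shared ansatz, and qubit layout are not reproduced, and the complex mixing matrix `W` is a hypothetical stand-in for the learned quantum mixer.

```python
# Classical stand-in for the sliding-window mixer: l1-normalize complex
# "amplitudes", mix each window linearly, and average over windows.
import numpy as np

def l1_normalize(a, eps=1e-9):
    """Rescale a complex amplitude vector so its l1 mass is 1."""
    return a / (np.abs(a).sum() + eps)

def mix_window(tokens, W):
    """Complex-valued linear mixing of one (window, d) block of embeddings.
    W has shape (out_dim, window * d)."""
    return l1_normalize(W @ tokens.flatten())

def classify_document(token_embs, W, window=8, stride=4):
    # Slide a fixed-size window over the sequence (assumes len >= window),
    # then aggregate window outputs at the document level.
    outs = [mix_window(token_embs[i:i + window], W)
            for i in range(0, len(token_embs) - window + 1, stride)]
    return np.abs(np.mean(outs, axis=0))       # document-level class scores
```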
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Tennessee (0.04)
- Asia > China (0.04)
Foundation Artificial Intelligence Models for Health Recognition Using Face Photographs (FAHR-Face)
Haugg, Fridolin, Lee, Grace, He, John, Nürnberg, Leonard, Bontempi, Dennis, Bitterman, Danielle S., Catalano, Paul, Prudente, Vasco, Glubokov, Dmitrii, Warrington, Andrew, Pai, Suraj, De Ruysscher, Dirk, Guthier, Christian, Kann, Benjamin H., Gladyshev, Vadim N., Aerts, Hugo JWL, Mak, Raymond H.
Background: Facial appearance offers a noninvasive window into health. We built FAHR-Face, a foundation model trained on >40 million facial images, and fine-tuned it for two distinct tasks: biological age estimation (FAHR-FaceAge) and survival risk prediction (FAHR-FaceSurvival). Methods: FAHR-FaceAge underwent a two-stage, age-balanced fine-tuning on 749,935 public images; FAHR-FaceSurvival was fine-tuned on 34,389 photos of cancer patients. Model robustness (cosmetic surgery, makeup, pose, lighting) and independence (saliency mapping) were tested extensively. Both models were clinically tested in two independent cancer patient datasets, with survival analyzed by multivariable Cox models adjusted for clinical prognostic factors. Findings: For age estimation, FAHR-FaceAge had the lowest mean absolute error of 5.1 years on public datasets, outperforming benchmark models and maintaining accuracy across the full human lifespan. In cancer patients, FAHR-FaceAge outperformed a prior facial age estimation model in survival prognostication. FAHR-FaceSurvival demonstrated robust prediction of mortality, and the highest-risk quartile had more than triple the mortality of the lowest (adjusted hazard ratio 3.22; P<0.001). These findings were validated in the independent cohort, and both models showed generalizability across age, sex, race, and cancer subgroups. The two algorithms provided distinct, complementary prognostic information; saliency mapping revealed that each model relied on distinct facial regions. The combination of FAHR-FaceAge and FAHR-FaceSurvival improved prognostic accuracy. Interpretation: A single foundation model can generate inexpensive, scalable facial biomarkers that capture both biological ageing and disease-related mortality risk. The foundation model enabled effective training using relatively small clinical datasets.
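For readers wanting the shape of the survival analysis, a minimal sketch with the lifelines library is below, assuming a dataframe with follow-up time, an event indicator, a top-risk-quartile flag, and numerically encoded clinical covariates; all column names are hypothetical.

```python
# Multivariable Cox model: hazard ratio of the model's highest-risk quartile,
# adjusted for clinical prognostic factors (columns assumed numeric/encoded).
import pandas as pd
from lifelines import CoxPHFitter

def adjusted_hazard_ratio(df: pd.DataFrame) -> float:
    cph = CoxPHFitter()
    cph.fit(df[["time", "event", "risk_q4", "age", "sex", "stage"]],
            duration_col="time", event_col="event")
    cph.print_summary()                        # coefficients, CIs, p-values
    return float(cph.hazard_ratios_["risk_q4"])  # exp(coef) for the quartile flag
```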
- Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.40)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > New York > Albany County > Albany (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles
Liang, Xinzhu, Lukens, Joseph M., Lohani, Sanjaya, Kirby, Brian T., Searles, Thomas A., Qiu, Xin, Law, Kody J. H.
This work introduces a new method called scalable Bayesian Monte Carlo (SBMC). The model interpolates between a point estimator and the posterior, and the algorithm is a parallel implementation of a consistent (asymptotically unbiased) Bayesian deep learning algorithm: sequential Monte Carlo (SMC) or Markov chain Monte Carlo (MCMC). The method is motivated theoretically, and its utility is demonstrated on practical examples: MNIST, CIFAR, and IMDb. A systematic numerical study reveals that parallel implementations of SMC and MCMC are comparable to serial implementations in terms of performance and total cost, and they achieve accuracy at or beyond state-of-the-art (SOTA) methods like deep ensembles at convergence, along with substantially improved uncertainty quantification (UQ)--in particular, epistemic UQ. However, even parallel implementations are expensive, with an irreducible time barrier much larger than the cost of the MAP estimator. Compressing the time budget further leads to rapid degradation of accuracy, whereas the UQ remains valuable. By anchoring to a point estimator we can recover accuracy while retaining valuable UQ, ultimately delivering strong performance across metrics for a cost comparable to the SOTA.
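One plausible numpy reading of "anchoring to a point estimator" is sketched below: posterior draws are recentered on the MAP estimate so accuracy follows the point estimator while the spread of the draws still supplies epistemic UQ. This is an illustration of the idea, not the paper's exact SBMC construction; `forward`, `lam`, and the shapes are assumptions.

```python
# Recenter posterior samples on the MAP estimate, then average predictions
# over the anchored samples; the sample variance serves as epistemic UQ.
import numpy as np

def anchor_samples(samples, theta_map, lam=1.0):
    """samples: (K, d) posterior draws; lam in [0, 1] scales retained spread."""
    return theta_map + lam * (samples - samples.mean(axis=0))

def predict_with_uq(samples, theta_map, forward, x, lam=1.0):
    anchored = anchor_samples(samples, theta_map, lam)
    preds = np.stack([forward(theta, x) for theta in anchored])
    return preds.mean(axis=0), preds.var(axis=0)   # prediction + epistemic UQ
```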
- Research Report > New Finding (0.45)
- Research Report > Experimental Study (0.45)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Exploring Scaling Trends in LLM Robustness
Howe, Nikolaus, Zajac, Michał, McKenzie, Ian, Hollinsworth, Oskar, Tseng, Tom, Bacon, Pierre-Luc, Gleave, Adam
Language model capabilities predictably improve from scaling a model's size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models are vulnerable to adversarial prompts, such as "jailbreaks" that hijack models to perform undesired behaviors, posing a significant risk of misuse. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically, finding that larger models respond substantially better to adversarial training, but there is little to no benefit from model scale in the absence of explicit defenses.
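As a hedged sketch of what an "explicit defense" like adversarial training can look like in practice, the PyTorch fragment below perturbs input embeddings with a few PGD steps and trains on the perturbed batch; it assumes a HuggingFace-style classifier accepting `inputs_embeds`, and is not the specific attack or training recipe used in the paper.

```python
# PGD-style adversarial training step in embedding space (a sketch).
# Assumes `model(inputs_embeds=...)` returns an object with `.logits`.
import torch

def adversarial_training_loss(model, embeds, labels, loss_fn,
                              eps=0.01, alpha=0.004, steps=3):
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):                       # inner maximization (PGD)
        loss = loss_fn(model(inputs_embeds=embeds + delta).logits, labels)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()         # ascend the loss
            delta.clamp_(-eps, eps)              # stay in the l-inf ball
    # Outer minimization: train on the adversarially perturbed embeddings.
    return loss_fn(model(inputs_embeds=embeds + delta.detach()).logits, labels)
```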
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Oregon (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)
- Media > Film (0.93)
- Leisure & Entertainment (0.93)
RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis
Rahman, Md. Mostafizer, Shiplu, Ariful Islam, Watanobe, Yutaka, Alam, Md. Ashad
Effectively analyzing comments to uncover latent intentions holds immense value for strategic decision-making across various domains. However, several challenges hinder sentiment analysis, including the lexical diversity exhibited in comments, long-range dependencies within the text, unknown symbols and words, and imbalanced datasets. Moreover, existing sentiment analysis work has mostly leveraged sequential models to encode texts with long-range dependencies, which incurs longer execution times because the text is processed sequentially. In contrast, the Transformer requires less execution time due to its parallel processing nature. In this work, we introduce a novel hybrid deep learning model, RoBERTa-BiLSTM, which combines the Robustly Optimized BERT Pretraining Approach (RoBERTa) with Bidirectional Long Short-Term Memory (BiLSTM) networks. RoBERTa is utilized to generate meaningful word embedding vectors, while the BiLSTM effectively captures the contextual semantics of texts with long-range dependencies. The RoBERTa-BiLSTM hybrid model leverages the strengths of both sequential and Transformer models to enhance performance in sentiment analysis. We conducted experiments using datasets from IMDb, Twitter US Airline, and Sentiment140 to evaluate the proposed model against existing state-of-the-art methods. Our experimental findings demonstrate that the RoBERTa-BiLSTM model surpasses baseline models (e.g., BERT, RoBERTa-base, RoBERTa-GRU, and RoBERTa-LSTM), achieving accuracies of 80.74%, 92.36%, and 82.25% on the Twitter US Airline, IMDb, and Sentiment140 datasets, respectively. Additionally, the model achieves F1-scores of 80.73%, 92.35%, and 82.25% on the same datasets, respectively.
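A minimal PyTorch sketch of this hybrid architecture is shown below; the hidden size, pooling choice, and classification head are illustrative assumptions, not the paper's exact configuration.

```python
# RoBERTa produces contextual embeddings; a BiLSTM re-encodes them before
# a linear classifier. Hyperparameters here are illustrative.
import torch
import torch.nn as nn
from transformers import RobertaModel

class RobertaBiLSTM(nn.Module):
    def __init__(self, num_classes=2, lstm_hidden=256):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        self.bilstm = nn.LSTM(self.roberta.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        embs = self.roberta(input_ids, attention_mask=attention_mask).last_hidden_state
        out, _ = self.bilstm(embs)             # (B, T, 2 * lstm_hidden)
        return self.classifier(out.mean(dim=1))  # mean-pool (an illustrative choice)
```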
- Asia > Japan (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Indonesia (0.04)
- Transportation > Passenger (1.00)
- Transportation > Air (1.00)
- Consumer Products & Services > Travel (1.00)
- Information Technology > Services (0.93)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Sci-fi series becomes IMDB's highest-rated after 'disappointing' first season FLOPPED in 2022 - and it even beat Netflix's Stranger Things and Black Mirror
A sci-fi series has taken the number one spot on IMDB following the release of its second season - despite the show's 'disappointing' debut in 2022. The first season of the video game adaptation was deemed a 'one-hit wonder' by viewers who felt the story was written by a 'high schooler' and the graphics were 'low budget CGI.' But Halo season two, released this month, now sits at number one in IMDB's list of top sci-fi TV series. The Paramount series has 7.2 stars and more than 81,000 votes - overtaking popular shows like Netflix's Stranger Things and Black Mirror. Halo also has an 89 percent on Rotten Tomatoes - a jump from season one's 61 percent rating.
- Media > Television (1.00)
- Leisure & Entertainment > Games > Computer Games (0.57)