MobilePlantViT: A Mobile-friendly Hybrid ViT for Generalized Plant Disease Image Classification
Tonmoy, Moshiur Rahman, Hossain, Md. Mithun, Dey, Nilanjan, Mridha, M. F.
Plant diseases significantly threaten global food security by reducing crop yields and undermining agricultural sustainability. AI-driven automated classification has emerged as a promising solution, with deep learning models demonstrating impressive performance in plant disease identification. However, deploying these models on mobile and edge devices remains challenging due to high computational demands and resource constraints, highlighting the need for lightweight, accurate solutions for accessible smart agriculture systems. To address this, we propose MobilePlantViT, a novel hybrid Vision Transformer (ViT) architecture designed for generalized plant disease classification, which optimizes resource efficiency while maintaining high performance. Extensive experiments across diverse plant disease datasets of varying scales show our model's effectiveness and strong generalizability, achieving test accuracies ranging from 80% to over 99%. Notably, with only 0.69 million parameters, our architecture outperforms the smallest versions of MobileViTv1 and MobileViTv2, despite their higher parameter counts. These results underscore the potential of our approach for real-world, AI-powered automated plant disease classification in sustainable and resource-efficient smart agriculture systems. All code will be available in the GitHub repository: https://github.com/moshiurtonmoy/MobilePlantViT
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Europe > Switzerland (0.04)
- Asia > India > West Bengal > Kolkata (0.04)
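The abstract does not spell out the architecture, so the following is only a hedged sketch of the general recipe a mobile-friendly hybrid ViT tends to follow: a depthwise-separable convolutional stem for cheap local features, a small transformer encoder over the resulting patch tokens for global context, and a lightweight classification head. All layer sizes below are illustrative placeholders, not MobilePlantViT's actual configuration.

```python
# Illustrative sketch (not the authors' released code) of a generic
# mobile-friendly hybrid conv + transformer classifier.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.SiLU(),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.SiLU(),
        )
    def forward(self, x):
        return self.block(x)

class HybridViTSketch(nn.Module):
    def __init__(self, num_classes=38, dim=96):  # 38 = placeholder class count
        super().__init__()
        self.stem = nn.Sequential(               # cheap local features
            DepthwiseSeparableConv(3, 32, stride=2),
            DepthwiseSeparableConv(32, 64, stride=2),
            DepthwiseSeparableConv(64, dim, stride=2),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=2 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        f = self.stem(x)                       # (B, dim, H/8, W/8)
        tokens = f.flatten(2).transpose(1, 2)  # (B, N, dim) patch tokens
        z = self.encoder(tokens).mean(dim=1)   # global average over tokens
        return self.head(z)

model = HybridViTSketch()
print(sum(p.numel() for p in model.parameters()) / 1e6, "M params")
```

The parameter-count check mirrors the efficiency argument in the abstract; the actual 0.69M configuration would depend on design choices not given here.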
OCT Data is All You Need: How Vision Transformers with and without Pre-training Benefit Imaging
Han, Zihao, De Wilde, Philippe
Optical Coherence Tomography (OCT) provides high-resolution cross-sectional images useful for diagnosing various diseases, but OCT images differ markedly from natural images, raising questions about whether large-scale pre-training on datasets like ImageNet is always beneficial. In this paper, we investigate the impact of ImageNet-based pre-training on Vision Transformer (ViT) performance for OCT image classification across different dataset sizes. Our experiments cover four retinal pathology categories (CNV, DME, Drusen, Normal). Results suggest that while pre-training can accelerate convergence and potentially offer better performance on smaller datasets, training from scratch may achieve comparable or even superior accuracy when sufficient OCT data is available. Our findings highlight the importance of matching domain characteristics in pre-training and call for further study on large-scale OCT-specific pre-training.
- Europe > United Kingdom (0.14)
- Asia > China (0.04)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.95)
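A minimal sketch of the two regimes the paper compares, assuming a torchvision (>= 0.13) ViT-B/16 backbone; the paper's exact model variant and training setup may differ. The four-way head matches the CNV/DME/Drusen/Normal task.

```python
# Build the same ViT twice: once with ImageNet weights, once from scratch.
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

def build_vit(pretrained: bool, num_classes: int = 4) -> nn.Module:
    weights = ViT_B_16_Weights.IMAGENET1K_V1 if pretrained else None
    model = vit_b_16(weights=weights)
    # Replace the 1000-way ImageNet head with a 4-way OCT head.
    model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)
    return model

vit_pretrained = build_vit(pretrained=True)   # transfer learning
vit_scratch = build_vit(pretrained=False)     # random initialization
```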
Unveiling Topological Structures in Text: A Comprehensive Survey of Topological Data Analysis Applications in NLP
The surge of data available on the internet has led to the adoption of various computational methods to analyze and extract valuable insights from this wealth of information. Among these, the field of Machine Learning (ML) has thrived by leveraging data to extract meaningful insights. However, ML techniques face notable challenges when dealing with real-world data, often due to issues of imbalance, noise, insufficient labeling, and high dimensionality. To address these limitations, some researchers advocate for the adoption of Topological Data Analysis (TDA), a statistical approach that captures the intrinsic shape of data even in the presence of noise. Despite its potential, TDA has not gained as much traction in Natural Language Processing (NLP) as in structurally distinct areas like computer vision. Nevertheless, a dedicated community of researchers has been exploring the application of TDA in NLP, yielding 87 papers that we comprehensively survey in this paper. Our findings categorize these efforts into theoretical and non-theoretical approaches. Theoretical approaches aim to explain linguistic phenomena from a topological viewpoint, while non-theoretical approaches merge TDA with ML features, utilizing diverse numerical representation techniques. We conclude by exploring the challenges and unresolved questions that persist in this niche field. Resources and a list of papers on this topic can be found at: https://github.com/AdaUchendu/AwesomeTDA4NLP.
- Oceania > Australia (0.04)
- North America > United States > Texas (0.04)
- North America > United States > Missouri > Greene County > Springfield (0.04)
- (13 more...)
- Overview (1.00)
- Research Report > New Finding (0.34)
- Government (0.93)
- Information Technology > Security & Privacy (0.69)
- Health & Medicine > Therapeutic Area (0.46)
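As a concrete instance of the non-theoretical recipe the survey describes (merging TDA with ML features), the sketch below treats a document's word embeddings as a point cloud, computes persistent homology with the ripser.py library, and summarizes the diagrams into a small feature vector. The library choice and the total-persistence/max-lifetime features are illustrative, not prescribed by the surveyed papers.

```python
# Persistent-homology features from a document's word-embedding cloud.
import numpy as np
from ripser import ripser

def tda_features(embeddings: np.ndarray) -> np.ndarray:
    """embeddings: (n_words, dim) point cloud for one document."""
    diagrams = ripser(embeddings, maxdim=1)['dgms']
    feats = []
    for dgm in diagrams:                      # H0 and H1 diagrams
        finite = dgm[np.isfinite(dgm[:, 1])]  # drop the infinite H0 bar
        lifetimes = finite[:, 1] - finite[:, 0]
        feats += [lifetimes.sum(), lifetimes.max(initial=0.0)]
    return np.array(feats)  # e.g., feed into a downstream classifier

doc = np.random.randn(120, 50)  # stand-in for 120 word vectors
print(tda_features(doc))
```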
Designing Domain-Specific Large Language Models: The Critical Role of Fine-Tuning in Public Opinion Simulation
Large language models (LLMs) have transformed natural language processing, yet they face challenges in specialized tasks such as simulating opinions on environmental policies. This paper introduces a novel fine-tuning approach that integrates socio-demographic data from the UK Household Longitudinal Study, uniquely incorporating profiling factors such as age, gender, income, education, and region. This method enhances the accuracy and representativeness of generated views. By emulating diverse synthetic profiles, the fine-tuned models significantly outperform their pre-trained counterparts, achieving measurable improvements in capturing demographic nuances. Evaluation metrics, including Chi-Squared, Cosine Similarity, Jaccard Index, and KL-divergence, reveal a strong alignment between synthetic and real-world opinions. This work demonstrates the potential of fine-tuned LLMs tailored to societal contexts to enable more ethical and precise policy simulations. Its broader implications include deploying LLMs in domains like healthcare and education, fostering inclusive and data-driven decision-making in both research and practice.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > West Midlands (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Law > Environmental Law (1.00)
- Government (1.00)
- Energy (1.00)
- Education (1.00)
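The abstract names its evaluation metrics explicitly; below is a minimal sketch of how two of them (KL-divergence and cosine similarity) could be computed between real and synthetic opinion distributions using scipy. The 5-point response scale and the example shares are assumptions for illustration.

```python
# Compare model-generated opinion shares against real survey shares.
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import cosine

real = np.array([0.10, 0.20, 0.30, 0.25, 0.15])       # survey shares
synthetic = np.array([0.12, 0.18, 0.33, 0.22, 0.15])  # model shares

kl = entropy(real, synthetic)            # KL(real || synthetic)
cos_sim = 1.0 - cosine(real, synthetic)  # scipy returns cosine *distance*
print(f"KL divergence: {kl:.4f}, cosine similarity: {cos_sim:.4f}")
```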
Learning Latent Wireless Dynamics from Channel State Information
Chaaya, Charbel Bou, Girgis, Abanoub M., Bennis, Mehdi
In this work, we propose a novel data-driven machine learning (ML) technique to model and predict the dynamics of the wireless propagation environment in latent space. Leveraging the idea of channel charting, which learns compressed representations of high-dimensional channel state information (CSI), we incorporate a predictive component to capture the dynamics of the wireless system. Hence, we jointly learn a channel encoder that maps the estimated CSI to an appropriate latent space, and a predictor that models the relationships between such representations. Accordingly, our problem boils down to training a joint-embedding predictive architecture (JEPA) that simulates the latent dynamics of a wireless network from CSI. We present numerical evaluations on measured data and show that the proposed JEPA displays a two-fold increase in accuracy over benchmarks for longer look-ahead prediction tasks.
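A hedged sketch of the JEPA pattern the abstract describes, not the authors' implementation: an online encoder maps CSI to a latent, a predictor forecasts the latent k steps ahead, and the regression target comes from an exponential-moving-average copy of the encoder. All dimensions and the EMA-target choice are illustrative assumptions.

```python
# JEPA-style latent prediction for CSI (illustrative sizes throughout).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

csi_dim, latent_dim = 256, 32
encoder = nn.Sequential(nn.Linear(csi_dim, 128), nn.ReLU(),
                        nn.Linear(128, latent_dim))
predictor = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                          nn.Linear(64, latent_dim))
target_encoder = copy.deepcopy(encoder)  # updated by EMA, not by SGD
for p in target_encoder.parameters():
    p.requires_grad = False

def jepa_loss(csi_t, csi_tk):
    z_t = encoder(csi_t)                   # latent at time t
    z_pred = predictor(z_t)                # predicted latent at t+k
    with torch.no_grad():
        z_target = target_encoder(csi_tk)  # target latent at t+k
    return F.mse_loss(z_pred, z_target)

@torch.no_grad()
def ema_update(tau=0.99):
    for p, p_t in zip(encoder.parameters(), target_encoder.parameters()):
        p_t.mul_(tau).add_((1 - tau) * p)
```

Predicting in latent space rather than reconstructing raw CSI is the defining JEPA design choice: the encoder is free to discard unpredictable channel noise and keep only the dynamics that matter.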
Pre-trained Transformer Uncovers Meaningful Patterns in Human Mobility Data
We empirically demonstrate that a transformer pre-trained on country-scale unlabeled human mobility data learns embeddings capable, through fine-tuning, of developing a deep understanding of the target geography and its corresponding mobility patterns. Utilizing an adaptation framework, we evaluate the performance of our pre-trained embeddings in encapsulating a broad spectrum of concepts directly and indirectly related to human mobility. This includes basic notions, such as geographic location and distance, and extends to more complex constructs, such as administrative divisions and land cover. Our extensive empirical analysis reveals a substantial performance boost gained from pre-training, reaching up to 38% in tasks such as tree-cover regression. We attribute this result to the ability of the pre-training to uncover meaningful patterns hidden in the raw data, beneficial for modeling relevant high-level concepts. The pre-trained embeddings emerge as robust representations of regions and trajectories, potentially valuable for a wide range of downstream applications.
Figure 1: A transformer pre-trained from scratch on country-scale unlabeled human mobility data is adapted to model a variety of high-level concepts manifesting at different levels.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.14)
- North America > United States > New York (0.04)
- (4 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
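The abstract does not state the pre-training objective, so the following is purely one plausible illustration: trajectories discretized into grid-cell tokens and pre-trained with BERT-style masked location prediction.

```python
# Hypothetical masked-location pre-training over discretized trajectories.
import torch
import torch.nn as nn

n_cells, dim, mask_id = 10_000, 128, 0  # grid-cell vocabulary + [MASK] id

embed = nn.Embedding(n_cells, dim)
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=4)
lm_head = nn.Linear(dim, n_cells)
loss_fn = nn.CrossEntropyLoss()

def masked_step(tokens):                 # tokens: (B, T) cell ids
    targets = tokens.clone()
    mask = torch.rand_like(tokens, dtype=torch.float) < 0.15
    corrupted = tokens.masked_fill(mask, mask_id)
    hidden = backbone(embed(corrupted))  # (B, T, dim)
    logits = lm_head(hidden[mask])       # predict only the masked cells
    return loss_fn(logits, targets[mask])

loss = masked_step(torch.randint(1, n_cells, (8, 64)))
```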
Rehabilitation Exercise Quality Assessment through Supervised Contrastive Learning with Hard and Soft Negatives
Karlov, Mark, Abedi, Ali, Khan, Shehroz S.
Exercise-based rehabilitation programs have proven effective in enhancing the quality of life and reducing mortality and rehospitalization rates. AI-driven virtual rehabilitation, which allows patients to independently complete exercises at home, utilizes AI algorithms to analyze exercise data, providing feedback to patients and updating clinicians on their progress. These programs commonly prescribe a variety of exercise types, leading to a distinct challenge in rehabilitation exercise assessment datasets: while abundant in overall training samples, these datasets often have a limited number of samples for each individual exercise type. This disparity hampers the ability of existing approaches to train generalizable models with such a small sample size per exercise. Addressing this issue, our paper introduces a novel supervised contrastive learning framework with hard and soft negative samples that effectively utilizes the entire dataset to train a single model applicable to all exercise types. This model, built on a Spatial-Temporal Graph Convolutional Network (ST-GCN) architecture, demonstrates enhanced generalizability across exercises and reduced overall complexity. Through extensive experiments on three publicly available rehabilitation exercise assessment datasets, the University of Idaho-Physical Rehabilitation Movement Data (UI-PRMD), IntelliRehabDS (IRDS), and KInematic assessment of MOvement and clinical scores for remote monitoring of physical REhabilitation (KIMORE), our method has been shown to surpass existing methods, setting a new benchmark in rehabilitation exercise assessment accuracy.
- North America > United States > Idaho (0.24)
- North America > Canada > Ontario > Toronto (0.14)
- North America > Canada > British Columbia > East Kootenay Region > Fernie (0.04)
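For reference, a minimal supervised contrastive loss in the style of Khosla et al., on which the paper builds; the per-negative weight matrix `w` is a placeholder for the paper's hard/soft negative scheme (e.g., down-weighting negatives that are semantically close to the anchor), which is not reproduced here.

```python
# Supervised contrastive loss, averaged over positive pairs.
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, temperature=0.1, w=None):
    """z: (B, d) L2-normalized embeddings; labels: (B,) class ids."""
    sim = z @ z.T / temperature                  # pairwise similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim.masked_fill_(eye, float('-inf'))         # exclude self-pairs
    pos = (labels[:, None] == labels[None, :]) & ~eye
    if w is None:
        w = torch.ones_like(sim)                 # uniform negative weights
    log_prob = sim - torch.log((w * sim.exp()).sum(1, keepdim=True))
    return -(log_prob[pos]).mean()

z = F.normalize(torch.randn(16, 64), dim=1)
labels = torch.randint(0, 4, (16,))
print(supcon_loss(z, labels))
```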
On Robustness of Finetuned Transformer-based NLP Models
Neerudu, Pavan Kalyan Reddy, Oota, Subba Reddy, Marreddy, Mounika, Kagita, Venkateswara Rao, Gupta, Manish
Transformer-based pretrained models like BERT, GPT-2 and T5 have been finetuned for a large number of natural language processing (NLP) tasks and have been shown to be very effective. However, what changes across layers in these models during finetuning, relative to their pretrained checkpoints, is under-studied. Further, how robust are these models to perturbations in input text? Does the robustness vary depending on the NLP task for which the models have been finetuned? While there exists some work on studying the robustness of BERT finetuned for a few NLP tasks, there is no rigorous study that compares this robustness across encoder-only, decoder-only and encoder-decoder models. In this paper, we characterize changes between pretrained and finetuned language model representations across layers using two metrics: CKA and STIR. Further, we study the robustness of three language models (BERT, GPT-2 and T5) with eight different text perturbations on classification tasks from the General Language Understanding Evaluation (GLUE) benchmark, and on generation tasks like summarization, free-form generation and question generation. GPT-2 representations are more robust than those of BERT and T5 across multiple types of input perturbation. Although the models exhibit good robustness broadly, dropping nouns or verbs, or changing characters, are the most impactful perturbations. Overall, this study provides valuable insights into perturbation-specific weaknesses of popular Transformer-based models, which should be kept in mind when constructing model inputs. We make the code and models publicly available at https://github.com/PavanNeerudu/Robustness-of-Transformers-models.
- Europe > Kosovo (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
- (3 more...)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.46)
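Of the two representation-similarity metrics the paper uses, CKA has a compact closed form; a standard linear-CKA implementation is sketched below (STIR is not shown). X and Y hold activations of the same n examples from, e.g., a pretrained and a finetuned layer.

```python
# Linear CKA between two activation matrices (Kornblith et al., 2019).
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """X: (n, d1), Y: (n, d2) activations for the same n examples."""
    X = X - X.mean(axis=0)                  # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return hsic / (norm_x * norm_y)

pre = np.random.randn(512, 768)               # e.g., pretrained layer
fine = pre + 0.1 * np.random.randn(512, 768)  # e.g., finetuned layer
print(linear_cka(pre, fine))                  # near 1 => similar layers
```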
Beyond Scale: The Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data
Lee, Alycia, Miranda, Brando, Sundar, Sudharsan, Koyejo, Sanmi
Current trends in pre-training capable Large Language Models (LLMs) mostly focus on scaling model and dataset size. However, the quality of pre-training data is an important factor for training powerful LLMs, yet it is a nebulous concept that has not been fully characterized. Therefore, we use the recently proposed Task2Vec diversity coefficient to ground and understand formal aspects of data quality, going beyond scale alone. Specifically, we measure the diversity coefficient of publicly available pre-training datasets to demonstrate that their formal diversity is high when compared to theoretical lower and upper bounds. In addition, to build confidence in the diversity coefficient, we conduct interpretability experiments and find that the coefficient aligns with intuitive properties of diversity, e.g., it increases as the number of latent concepts increases. We conclude that the diversity coefficient is reliable, show that it is high for publicly available LLM datasets, and conjecture that it can be used to build useful, diverse datasets for LLMs.
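A sketch of how the diversity coefficient is typically computed: the expected cosine distance between Task2Vec embeddings of batches sampled from the dataset. The `task2vec_embed` helper below is a hypothetical stand-in for the real procedure (the diagonal Fisher Information of a probe network evaluated on each batch).

```python
# Diversity coefficient = mean pairwise cosine distance of batch embeddings.
import numpy as np
from scipy.spatial.distance import cosine

def task2vec_embed(batch) -> np.ndarray:
    raise NotImplementedError  # placeholder for the FIM-based embedding

def diversity_coefficient(batches) -> float:
    embs = [task2vec_embed(b) for b in batches]
    dists = [cosine(embs[i], embs[j])
             for i in range(len(embs)) for j in range(i + 1, len(embs))]
    return float(np.mean(dists))  # higher => formally more diverse data
```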
A Memory-Augmented Neural Network Model of Abstract Rule Learning
Sinha, Ishan, Webb, Taylor W., Cohen, Jonathan D.
Human intelligence is characterized by a remarkable ability to infer abstract rules from experience and apply these rules to novel domains. As such, designing neural network algorithms with this capacity is an important step toward the development of deep learning systems with more human-like intelligence. However, doing so is a major outstanding challenge, one that some argue will require neural networks to use explicit symbol-processing mechanisms. In this work, we focus on neural networks' capacity for arbitrary role-filler binding, the ability to associate abstract "roles" to context-specific "fillers," which many have argued is an important mechanism underlying the ability to learn and apply rules abstractly. Using a simplified version of Raven's Progressive Matrices, a hallmark test of human intelligence, we introduce a sequential formulation of a visual problem-solving task that requires this form of binding. Further, we introduce the Emergent Symbol Binding Network (ESBN), a recurrent neural network model that learns to use an external memory as a binding mechanism. This mechanism enables symbol-like variable representations to emerge through the ESBN's training process without the need for explicit symbol-processing machinery. We empirically demonstrate that the ESBN successfully learns the underlying abstract rule structure of our task and perfectly generalizes this rule structure to novel fillers.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area > Neurology (0.68)
- Leisure & Entertainment > Games (0.46)
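A minimal sketch of the external key-value memory that gives the ESBN its binding capacity: embeddings of observed items ("fillers") are bound to controller-generated keys ("roles"), and a later query embedding reads back the keys bound to the most similar stored items. Dimensions are placeholders, and the recurrent controller that drives reads and writes is omitted.

```python
# Key-value binding memory in the spirit of the ESBN's external memory.
import torch
import torch.nn.functional as F

class BindingMemory:
    def __init__(self):
        self.keys, self.values = [], []   # values = item embeddings

    def write(self, key: torch.Tensor, emb: torch.Tensor):
        self.keys.append(key)             # bind a role-like key ...
        self.values.append(emb)           # ... to a filler-like embedding

    def read(self, emb: torch.Tensor) -> torch.Tensor:
        V = torch.stack(self.values)      # (n, d_emb) stored fillers
        K = torch.stack(self.keys)        # (n, d_key) bound keys
        attn = F.softmax(V @ emb, dim=0)  # match query against fillers
        return attn @ K                   # weighted sum of bound keys

mem = BindingMemory()
mem.write(torch.randn(8), torch.randn(16))
mem.write(torch.randn(8), torch.randn(16))
print(mem.read(torch.randn(16)))          # (8,) retrieved key
```

Because the keys carry no perceptual content, whatever rule the controller learns over keys transfers to novel fillers, which is the generalization behavior the abstract reports.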