genera
ReefNet: A Large scale, Taxonomically Enriched Dataset and Benchmark for Hard Coral Classification
Battach, Yahia, Felemban, Abdulwahab, Khan, Faizan Farooq, Radwan, Yousef A., Li, Xiang, Marchese, Fabio, Beery, Sara, Jones, Burton H., Benzoni, Francesca, Elhoseiny, Mohamed
Coral reefs are rapidly declining due to anthropogenic pressures such as climate change, underscoring the urgent need for scalable, automated monitoring. We introduce ReefNet, a large public coral reef image dataset with point-label annotations mapped to the World Register of Marine Species (WoRMS). ReefNet aggregates imagery from 76 curated CoralNet sources and an additional site from Al Wajh in the Red Sea, totaling approximately 925000 genus-level hard coral annotations with expert-verified labels. Unlike prior datasets, which are often limited by size, geography, or coarse labels and are not ML-ready, ReefNet offers fine-grained, taxonomically mapped labels at a global scale to WoRMS. We propose two evaluation settings: (i) a within-source benchmark that partitions each source's images for localized evaluation, and (ii) a cross-source benchmark that withholds entire sources to test domain generalization. We analyze both supervised and zero-shot classification performance on ReefNet and find that while supervised within-source performance is promising, supervised performance drops sharply across domains, and performance is low across the board for zero-shot models, especially for rare and visually similar genera. This provides a challenging benchmark intended to catalyze advances in domain generalization and fine-grained coral classification. We will release our dataset, benchmarking code, and pretrained models to advance robust, domain-adaptive, global coral reef monitoring and conservation.
- Indian Ocean > Red Sea (0.25)
- Asia > Middle East > Yemen (0.25)
- Africa > Sudan (0.25)
- (35 more...)
Are the LLMs Capable of Maintaining at Least the Language Genus?
Mitrović, Sandra, Kletz, David, Dolamic, Ljiljana, Rinaldi, Fabio
Large Language Models (LLMs) display notable variation in multilingual behavior, yet the role of genealogical language structure in shaping this variation remains underexplored. In this paper, we investigate whether LLMs exhibit sensitivity to linguistic genera by extending prior analyses on the MultiQ dataset. We first check if models prefer to switch to genealogically related languages when prompt language fidelity is not maintained. Next, we investigate whether knowledge consistency is better preserved within than across genera. We show that genus-level effects are present but strongly conditioned by training resource availability. We further observe distinct multilingual strategies across LLMs families. Our findings suggest that LLMs encode aspects of genus-level structure, but training data imbalances remain the primary factor shaping their multilingual performance.
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (6 more...)
Unsupervised Urban Tree Biodiversity Mapping from Street-Level Imagery Using Spatially-Aware Visual Clustering
Abuhani, Diaa Addeen, Seccaroni, Marco, Mazzarello, Martina, Zualkernan, Imran, Duarte, Fabio, Ratti, Carlo
Urban tree biodiversity is critical for climate resilience, ecological stability, and livability in cities, yet most municipalities lack detailed knowledge of their canopies. Field-based inventories provide reliable estimates of Shannon and Simpson diversity but are costly and time-consuming, while supervised AI methods require labeled data that often fail to generalize across regions. We introduce an unsupervised clustering framework that integrates visual embeddings from street-level imagery with spatial planting patterns to estimate biodiversity without labels. Applied to eight North American cities, the method recovers genus-level diversity patterns with high fidelity, achieving low Wasserstein distances to ground truth for Shannon and Simpson indices and preserving spatial autocorrelation. This scalable, fine-grained approach enables biodiversity mapping in cities lacking detailed inventories and offers a pathway for continuous, low-cost monitoring to support equitable access to greenery and adaptive management of urban ecosystems.
- North America > United States > California > San Francisco County > San Francisco (0.15)
- North America > United States > California > Los Angeles County > Los Angeles (0.15)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- (16 more...)
From Transformers to Large Language Models: A systematic review of AI applications in the energy sector towards Agentic Digital Twins
Antonesi, Gabriel, Cioara, Tudor, Anghel, Ionut, Michalakopoulos, Vasilis, Sarmas, Elissaios, Toderean, Liana
Artificial intelligence (AI) has long promised to improve energy management in smart grids by enhancing situational awareness and supporting more effective decision-making. While traditional machine learning has demonstrated notable results in forecasting and optimization, it often struggles with generalization, situational awareness, and heterogeneous data integration. Recent advances in foundation models such as Transformer architecture and Large Language Models (LLMs) have demonstrated improved capabilities in modelling complex temporal and contextual relationships, as well as in multi-modal data fusion which is essential for most AI applications in the energy sector. In this review we synthesize the rapid expanding field of AI applications in the energy domain focusing on Transformers and LLMs. We examine the architectural foundations, domain-specific adaptations and practical implementations of transformer models across various forecasting and grid management tasks. We then explore the emerging role of LLMs in the field: adaptation and fine tuning for the energy sector, the type of tasks they are suited for, and the new challenges they introduce. Along the way, we highlight practical implementations, innovations, and areas where the research frontier is rapidly expanding. These recent developments reviewed underscore a broader trend: Generative AI (GenAI) is beginning to augment decision-making not only in high-level planning but also in day-to-day operations, from forecasting and grid balancing to workforce training and asset onboarding. Building on these developments, we introduce the concept of the Agentic Digital Twin, a next-generation model that integrates LLMs to bring autonomy, proactivity, and social interaction into digital twin-based energy management systems.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Europe > Romania > Nord-Vest Development Region > Cluj County > Cluj-Napoca (0.04)
- Europe > Belgium (0.04)
- (16 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
EuLearn: A 3D database for learning Euler characteristics
Fritz, Rodrigo, Suárez-Serrato, Pablo, Mijangos, Victor, Martinez-Hernandez, Anayanzi D., Richards, Eduardo Ivan Velazquez
We present EuLearn, the first surface datasets equitably representing a diversity of topological types. We designed our embedded surfaces of uniformly varying genera relying on random knots, thus allowing our surfaces to knot with themselves. EuLearn contributes new topological datasets of meshes, point clouds, and scalar fields in 3D. We aim to facilitate the training of machine learning systems that can discern topological features. We experimented with specific emblematic 3D neural network architectures, finding that their vanilla implementations perform poorly on genus classification. To enhance performance, we developed a novel, non-Euclidean, statistical sampling method adapted to graph and manifold data. We also introduce adjacency-informed adaptations of PointNet and Transformer architectures that rely on our non-Euclidean sampling strategy. Our results demonstrate that incorporating topological information into deep learning workflows significantly improves performance on these otherwise challenging EuLearn datasets.
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Mexico (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Government (0.67)
- Law (0.67)
- Information Technology (0.46)
Multivariate Analysis of Gut Microbiota Composition and Prevalence of Gastric Cancer
Shankarnarayanan, Aadhith, Gangopadhyay, Dheeman, Alzaatreh, Ayman
The global surge in the cases of gastric cancer has prompted an investigation into the potential of gut microbiota as a predictive marker for the disease. The alterations in gut diversity are suspected to be associated with an elevated risk of gastric cancer. This paper delves into finding the correlation between gut microbiota and gastric cancer, focusing on patients who have undergone total and subtotal gastrectomy. Utilizing data mining and statistical learning methods, an analysis was conducted on 16S-RNA sequenced genes obtained from 96 participants with the aim of identifying specific genera of gut microbiota associated with gastric cancer. The study reveals several prominent bacterial genera that could potentially serve as biomarkers assessing the risk of gastric cancer. These findings offer a pathway for early risk assessment and precautionary measures in the diagnosis of gastric cancer. The intricate mechanisms through which these gut microbiotas influence gastric cancer progression warrant further investigation. This research significantly aims to contribute to the growing understanding of the gut-cancer axis and its implications in disease prediction and prevention.
- Asia > Middle East > UAE > Sharjah Emirate > Sharjah (0.06)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Health & Medicine > Therapeutic Area > Oncology > Gastric Cancer (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
PhaGO: Protein function annotation for bacteriophages by integrating the genomic context
Guan, Jiaojiao, Ji, Yongxin, Peng, Cheng, Zou, Wei, Tang, Xubo, Shang, Jiayu, Sun, Yanni
Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are important in understanding phage biology, such as virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins presents several challenges, including their inherent diversity and the scarcity of annotated ones. Existing tools have yet to fully leverage the unique properties of phages in annotating protein functions. In this work, we propose a new protein function annotation tool for phages by leveraging the modular genomic structure of phage genomes. By employing embeddings from the latest protein foundation models and Transformer to capture contextual information between proteins in phage genomes, PhaGO surpasses state-of-the-art methods in annotating diverged proteins and proteins with uncommon functions by 6.78% and 13.05% improvement, respectively. PhaGO can annotate proteins lacking homology search results, which is critical for characterizing the rapidly accumulating phage genomes. We demonstrate the utility of PhaGO by identifying 688 potential holins in phages, which exhibit high structural conservation with known holins. The results show the potential of PhaGO to extend our understanding of newly discovered phages.
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.48)
DeepLINK-T: deep learning inference for time series data using knockoffs and LSTM
Zuo, Wenxuan, Zhu, Zifan, Du, Yuxuan, Yeh, Yi-Chun, Fuhrman, Jed A., Lv, Jinchi, Fan, Yingying, Sun, Fengzhu
High-dimensional longitudinal time series data is prevalent across various real-world applications. Many such applications can be modeled as regression problems with high-dimensional time series covariates. Deep learning has been a popular and powerful tool for fitting these regression models. Yet, the development of interpretable and reproducible deep-learning models is challenging and remains underexplored. This study introduces a novel method, Deep Learning Inference using Knockoffs for Time series data (DeepLINK-T), focusing on the selection of significant time series variables in regression while controlling the false discovery rate (FDR) at a predetermined level. DeepLINK-T combines deep learning with knockoff inference to control FDR in feature selection for time series models, accommodating a wide variety of feature distributions. It addresses dependencies across time and features by leveraging a time-varying latent factor structure in time series covariates. Three key ingredients for DeepLINK-T are 1) a Long Short-Term Memory (LSTM) autoencoder for generating time series knockoff variables, 2) an LSTM prediction network using both original and knockoff variables, and 3) the application of the knockoffs framework for variable selection with FDR control. Extensive simulation studies have been conducted to evaluate DeepLINK-T's performance, showing its capability to control FDR effectively while demonstrating superior feature selection power for high-dimensional longitudinal time series data compared to its non-time series counterpart. DeepLINK-T is further applied to three metagenomic data sets, validating its practical utility and effectiveness, and underscoring its potential in real-world applications.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > North Sea (0.14)
- Southern Ocean (0.04)
- (4 more...)
Search Still Matters: Information Retrieval in the Era of Generative AI
Objective: Information retrieval (IR, also known as search) systems are ubiquitous in modern times. How does the emergence of generative artificial intelligence (AI), based on large language models (LLMs), fit into the IR process? Process: This perspective explores the use of generative AI in the context of the motivations, considerations, and outcomes of the IR process with a focus on the academic use of such systems. Conclusions: There are many information needs, from simple to complex, that motivate use of IR. Users of such systems, particularly academics, have concerns for authoritativeness, timeliness, and contextualization of search. While LLMs may provide functionality that aids the IR process, the continued need for search systems, and research into their improvement, remains essential.
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.89)
Automating Wood Species Detection and Classification in Microscopic Images of Fibrous Materials with Deep Learning
Nieradzik, Lars, Sieburg-Rockel, Jördis, Helmling, Stephanie, Keuper, Janis, Weibel, Thomas, Olbrich, Andrea, Stephani, Henrike
We have developed a methodology for the systematic generation of a large image dataset of macerated wood references, which we used to generate image data for nine hardwood genera. This is the basis for a substantial approach to automate, for the first time, the identification of hardwood species in microscopic images of fibrous materials by deep learning. Our methodology includes a flexible pipeline for easy annotation of vessel elements. We compare the performance of different neural network architectures and hyperparameters. Our proposed method performs similarly well to human experts. In the future, this will improve controls on global wood fiber product flows to protect forests.
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Europe > Portugal > Braga > Braga (0.04)