taxon
Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments
Brinner, Marc Felix, Zarrieß, Sina
This study explores strategies for efficiently classifying scientific full texts using both small, BERT-based models and local large language models like Llama-3.1 8B. We focus on developing methods for selecting subsets of input sentences to reduce input size while simultaneously enhancing classification performance. To this end, we compile a novel dataset consisting of full-text scientific papers from the field of invasion biology, specifically addressing the impacts of invasive species. These papers are aligned with publicly available impact assessments created by researchers for the International Union for Conservation of Nature (IUCN). Through extensive experimentation, we demonstrate that various sources, such as human evidence annotations, LLM-generated annotations, or explainability scores, can be used to train sentence selection models that improve the performance of both encoder- and decoder-based language models while also improving efficiency by reducing input length. The resulting models outperform even models like ModernBERT that can process the complete text as input. Additionally, we find that repeated sampling of shorter inputs is a very effective strategy that, at a slightly increased cost, can further improve classification performance.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Variational Bayesian Supertrees
Karcher, Michael, Zhang, Cheng, Matsen, Frederick A IV
Fields such as phylogenetics often work with a sort of abstracted family tree, called a phylogenetic tree, frequently abbreviated here as "tree". These trees have different members of a population as their tips, and their branching points describe the relations between the tips and how recently they had a common ancestor. If some of the tips are censored, the tree topology simplifies in a process we refer to as restriction. If one has multiple trees restricted from the same original, uncensored tree, one may wish to reconstruct the original supertree. If instead one has multiple probability distributions over restricted trees, one may be interested in reconstructing the supertree probability distribution.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
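The restriction operation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the nested-tuple tree representation and the function name `restrict` are assumptions made purely for illustration. Tips outside the kept set are dropped, and any internal node left with a single child is collapsed into that child.

```python
def restrict(tree, keep):
    """Restrict a rooted tree to the tips in `keep`.

    Hypothetical representation: a tip is a string label and an
    internal node is a tuple of subtrees. Returns the restricted
    tree, or None if no tip of this subtree is kept.
    """
    if isinstance(tree, str):
        # A tip survives only if it is not censored.
        return tree if tree in keep else None
    # Restrict every child and discard the ones that vanished.
    children = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not children:
        return None
    if len(children) == 1:
        # A node with one remaining child carries no branching
        # information, so it is collapsed into that child.
        return children[0]
    return tuple(children)


full = ((("A", "B"), "C"), ("D", "E"))
other = (("A", "C"), ("B", "D"))

# Censoring tips B and E simplifies the topology.
restricted = restrict(full, {"A", "C", "D"})
# A different topology can restrict to the same tree.
restricted_other = restrict(other, {"A", "C", "D"})
```

Here both `full` and `other` restrict to `(("A", "C"), "D")`, which shows why a single restricted tree does not determine the original tree and why several restrictions are needed to reconstruct a supertree.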
Combination of Topology and Nonmonotonic Logics for Typicality in a Scientific Field: Paleoanthropology
Jouis, Christophe (LIP6 (UPMC / CNRS)) | Jouis, Claude (Ecole Polytechnique) | Guy, Franck (Universite de Poitiers) | Habib, Bassel (LIP6 (UPMC / CNRS)) | Ganascia, Jean-Gabriel (LIP6 (UPMC / CNRS))
In computer science, an ontology is a model of a domain in the form of classes and of relationships between these classes. Classes are organized in a graph whose edges are semantic relations. An ontology is static because the class hierarchy is fixed. In paleontology, systematics (i.e., the class hierarchies and the class relationships) is complicated by the time variable. Morphological changes over time lead, through natural selection, to the emergence of new forms (taxa) that differ from the ancestral morph and from contemporaneous taxa of the same class hierarchy. Discovering new taxa therefore implies rearranging the class hierarchy or defining new classes, based on the degree of atypicality of the new morph. This phenomenon also occurs in many other domains, such as physics, biology, and linguistics.
- North America > United States > California (0.04)
- North America > United States > Florida > Miami-Dade County > Miami > Coconut Grove (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Nonmonotonic Logic (0.67)