AITopics

2501.14114

Country:

Asia > China (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Law > Civil Rights & Constitutional Law (0.41)
Law > International Law (0.40)
Government > Intergovernmental Programs (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.62)

arXiv.org Artificial Intelligence, Jan-23-2025

Enhancing kelp forest detection in remote sensing images using crowdsourced labels with Mixed Vision Transformers and ConvNeXt segmentation models

Nasios, Ioannis

Kelp forests, as foundation species, are vital to marine ecosystems, providing essential food and habitat for numerous organisms. This study explores the integration of crowdsourced labels with advanced artificial intelligence models to develop a fast and accurate kelp canopy detection pipeline using Landsat images. Building on the success of a machine learning competition, where this approach ranked third and performed consistently well on local validation and on both the public and private leaderboards, the research highlights the effectiveness of combining Mixed Vision Transformers (MiT) with ConvNeXt models. Training these models on various image sizes significantly enhanced the accuracy of the ensemble results. U-Net emerged as the best segmentation architecture, with UPerNet also contributing to the final ensemble. Key Landsat bands, such as ShortWave InfraRed (SWIR1) and Near-InfraRed (NIR), were crucial, while altitude data was used in postprocessing to eliminate false positives on land. The methodology achieved a high detection rate, accurately identifying about three out of four pixels containing kelp canopy while keeping false positives low. Despite the medium resolution of Landsat satellites, their extensive historical coverage makes them effective for studying kelp forests. This work also underscores the potential of combining machine learning models with crowdsourced data for effective and scalable environmental monitoring. The code for training all models and running inference is available at https://github.com/IoannisNasios/Kelp_Forests.
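The ensemble and altitude postprocessing described in this abstract can be illustrated with a minimal sketch. It is not the author's pipeline (see the linked repository for that): the plain averaging of model outputs, the 0.5 threshold, the `land_level` cutoff, and the function name `ensemble_and_mask` are all assumptions made for illustration.

```python
import numpy as np

def ensemble_and_mask(prob_maps, elevation, threshold=0.5, land_level=0.0):
    """Average per-model kelp probabilities, threshold them, and drop land pixels.

    prob_maps: list of 2-D arrays with per-pixel kelp probabilities (one per model).
    elevation: 2-D array of altitude values aligned with the probability maps.
    threshold and land_level are illustrative assumptions, not values from the paper.
    """
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    kelp = mean_prob > threshold
    # Kelp canopy cannot occur on land, so any pixel above land_level is cleared.
    kelp[elevation > land_level] = False
    return kelp

# Toy 2x2 example: two models, one pixel that lies on land (elevation 12 m).
probs_a = np.array([[0.9, 0.2], [0.7, 0.6]])
probs_b = np.array([[0.8, 0.4], [0.9, 0.2]])
elevation = np.array([[-5.0, -3.0], [12.0, -1.0]])

mask = ensemble_and_mask([probs_a, probs_b], elevation)
print(mask.tolist())  # [[True, False], [False, False]]
```

The bottom-left pixel has a high averaged probability (0.8) but is removed because its elevation is above sea level, which is the kind of false positive the altitude step is meant to eliminate.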

artificial intelligence, kelp forest, machine learning, (20 more...)

doi: 10.1080/01431161.2024.2448307

2501.14001

Country:

South America > Falkland Islands (0.05)
North America > Canada (0.04)
Indian Ocean > Red Sea (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.53)
Health & Medicine (0.46)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Florez, Juan Andres Medina, Raza, Shaina, Lynn, Rashida, Shakeri, Zahra, Smith, Brendan T., Dolatabadi, Elham

Academic Case Reports Lack Diversity: Assessing the Presence and Diversity of Sociodemographic and Behavioral Factors related to Post COVID-19 Condition

arXiv.org Artificial Intelligence, Jan-23-2025

Understanding the prevalence, disparities, and symptom variations of Post COVID-19 Condition (PCC) for vulnerable populations is crucial to improving care and addressing intersecting inequities. This study aims to develop a comprehensive framework for integrating social determinants of health (SDOH) into PCC research by leveraging NLP techniques to analyze disparities and variations in SDOH representation within PCC case reports. Following construction of a PCC Case Report Corpus, comprising over 7,000 case reports from the LitCOVID repository, a subset of 709 reports was annotated with 26 core SDOH-related entity types using pre-trained named entity recognition (NER) models, human review, and data augmentation to improve quality, diversity, and representation of entity types. An NLP pipeline integrating NER, natural language inference (NLI), trigram and frequency analyses was developed to extract and analyze these entities. Both encoder-only transformer models and RNN-based models were assessed for the NER objective. Fine-tuned encoder-only BERT models outperformed traditional RNN-based models in generalizability to distinct sentence structures and greater class sparsity. Exploratory analysis revealed variability in entity richness, with prevalent entities like condition, age, and access to care, and underrepresentation of sensitive categories like race and housing status. Trigram analysis highlighted frequent co-occurrences among entities, including age, gender, and condition. The NLI objective (entailment and contradiction analysis) showed attributes like "Experienced violence or abuse" and "Has medical insurance" had high entailment rates (82.4%-80.3%), while attributes such as "Is female-identifying," "Is married," and "Has a terminal condition" exhibited high contradiction rates (70.8%-98.5%).
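The trigram analysis mentioned above can be sketched as counting consecutive triples of entity labels in each report. This is a hypothetical illustration only: the abstract does not say how trigrams were formed, so the use of ordered label sequences, the label names, and the function `entity_trigrams` are assumptions.

```python
from collections import Counter

def entity_trigrams(label_sequences):
    """Count consecutive triples of entity labels across all reports."""
    counts = Counter()
    for labels in label_sequences:
        for i in range(len(labels) - 2):
            counts[tuple(labels[i:i + 3])] += 1
    return counts

# Toy input: entity labels in order of appearance in two invented reports.
reports = [
    ["age", "gender", "condition", "access_to_care"],
    ["age", "gender", "condition"],
]

counts = entity_trigrams(reports)
top = counts.most_common(1)
print(top)  # [(('age', 'gender', 'condition'), 2)]
```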

case report, contradiction, entailment, (16 more...)

2501.12538

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > South Korea (0.14)
South America > Brazil (0.04)
(58 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Personality Disorder (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Eufrazio, Rafael P., Montesuma, Eduardo Fernandes, Cavalcante, Charles C.

A dimensionality reduction technique based on the Gromov-Wasserstein distance

arXiv.org Machine Learning, Jan-23-2025

Analyzing relationships between objects is a pivotal problem within data science. In this context, dimensionality reduction (DR) techniques are employed to generate smaller and more manageable data representations. This paper proposes a new method for dimensionality reduction, based on optimal transportation theory and the Gromov-Wasserstein distance. We offer a new probabilistic view of the classical Multidimensional Scaling (MDS) algorithm and of Isomap (Isometric Mapping, or Isometric Feature Mapping), the nonlinear dimensionality reduction algorithm that extends classical MDS. In this view, we use the Gromov-Wasserstein distance between the probability measure of high-dimensional data and its low-dimensional representation. Through gradient descent, our method embeds high-dimensional data into a lower-dimensional space, providing a robust and efficient solution for analyzing complex high-dimensional datasets.
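To make the link between MDS and a Gromov-Wasserstein objective concrete, here is a simplified sketch that is not the authors' algorithm. It assumes uniform weights on the n points and fixes the coupling to the identity matching (each point coupled only to its own embedding), in which case the squared-loss Gromov-Wasserstein objective reduces to the raw MDS stress, (1/n^2) * sum over i,k of (D_ik - ||y_i - y_k||)^2. A full Gromov-Wasserstein method would also optimize the coupling. The function names, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def stress(D, Y):
    """Squared-loss objective with the coupling fixed to the identity matching."""
    n = D.shape[0]
    return ((D - pairwise_dist(Y)) ** 2).sum() / n ** 2

def embed(D, dim=2, steps=500, lr=0.05, seed=0):
    """Plain gradient descent on stress(D, Y) starting from a small random Y."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    Y = rng.normal(scale=0.1, size=(n, dim))
    for _ in range(steps):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.sqrt((diff ** 2).sum(-1))
        np.fill_diagonal(d, 1.0)        # avoid division by zero on the diagonal
        coef = (d - D) / d
        np.fill_diagonal(coef, 0.0)     # a point exerts no force on itself
        # Gradient of the stress with respect to each row of Y.
        grad = (4.0 / n ** 2) * (coef[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
    return Y

# Toy data: corners of a unit square, embedded back into two dimensions.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = pairwise_dist(X)
Y = embed(D)
print(stress(D, Y))
```

Replacing `D` with geodesic (shortest-path) distances on a neighborhood graph would give an Isomap-style variant of the same sketch.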

artificial intelligence, data mining, machine learning, (18 more...)

2501.13732

Country: South America > Brazil (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)

Daily Mail - Science & tech, Jan-22-2025, 16:41:48 GMT

The stupidest things footballers have said - as scientists claim professional players are actually 'super-clever individuals'

From Kevin Keegan to David Beckham and Michael Owen, many prolific footballers have won themselves simple-minded reputations as well as trophies. But scientists say elite football stars are actually 'super-clever individuals'. 'Footballers often do not pursue higher education, such as university degrees, because their focus and interests lie elsewhere – primarily in their sport,' Professor Leonardo Bonetti, study author at Aarhus University in Denmark, told MailOnline. 'While this may mean they are less knowledgeable in certain academic areas, it does not reflect a lack of intelligence. Unfortunately, people often confuse being less formally educated with being less clever, which perpetuates this unfair stereotype.' Famously, former striker and England manager Keegan once said of Argentina: 'They're the second-best team in the world, and there's no higher praise than that.' Meanwhile, Beckham memorably commented after the birth of his eldest son: 'I want Brooklyn to be christened, but I don't know into what religion.'

artificial intelligence, footballer, scientist claim professional player, (10 more...)

Country:

South America > Argentina (0.25)
Europe > United Kingdom > England (0.25)
Europe > Denmark (0.25)

Genre: Research Report > New Finding (0.70)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology: Information Technology > Artificial Intelligence > Cognitive Science (0.49)

Mousavi, Seyed Mahed, Alghisi, Simone, Riccardi, Giuseppe

LLMs as Repositories of Factual Knowledge: Limitations and Solutions

LLMs' sources of knowledge are data snapshots containing factual information about entities collected at different timestamps and from different media types (e.g. wikis, social media, etc.). Such unstructured knowledge is subject to change due to updates through time from past to present. Equally important are the inconsistencies and inaccuracies occurring in different information sources. Consequently, the model's knowledge about an entity may be perturbed while training over the sequence of snapshots or at inference time, resulting in inconsistent and inaccurate model performance. In this work, we study the appropriateness of Large Language Models (LLMs) as repositories of factual knowledge. We consider twenty-four state-of-the-art LLMs that are either closed-, partially (weights), or fully (weight and training data) open-source. We evaluate their reliability in responding to time-sensitive factual questions in terms of accuracy and consistency when prompts are perturbed. We further evaluate the effectiveness of state-of-the-art methods to improve LLMs' accuracy and consistency. We then propose "ENtity-Aware Fine-tuning" (ENAF), a soft neurosymbolic approach aimed at providing a structured representation of entities during fine-tuning to improve the model's performance.
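The evaluation idea in this abstract (accuracy and consistency when prompts are perturbed) can be sketched as follows. The abstract does not define the metrics, so the definitions here are assumptions: accuracy is the share of all answers that match the gold answer, and consistency is the share of questions for which every perturbed prompt yields the same normalized answer. The data and the function name are invented for illustration.

```python
def accuracy_and_consistency(answers_per_question, gold):
    """Score model answers collected over several perturbed prompts per question.

    answers_per_question: dict mapping question id -> list of answers,
        one answer per prompt perturbation.
    gold: dict mapping question id -> reference answer.
    """
    correct = 0
    total = 0
    consistent = 0
    for qid, answers in answers_per_question.items():
        normalized = [a.strip().lower() for a in answers]
        reference = gold[qid].strip().lower()
        correct += sum(a == reference for a in normalized)
        total += len(normalized)
        # A question counts as consistent only if all perturbations agree.
        consistent += len(set(normalized)) == 1
    return correct / total, consistent / len(answers_per_question)

answers = {
    "q1": ["Paris", "paris ", "Paris"],  # same answer under every perturbation
    "q2": ["2021", "2022"],              # answer flips when the prompt changes
}
gold = {"q1": "Paris", "q2": "2022"}

accuracy, consistency = accuracy_and_consistency(answers, gold)
print(accuracy, consistency)  # 0.8 0.5
```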

large language model, machine learning, natural language, (21 more...)

2501.12774

Country:

Europe (1.00)
North America (0.68)
South America (0.68)
Asia > Middle East > Saudi Arabia (0.46)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Government (1.00)
Energy > Oil & Gas (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gayoso-Cabada, Joaquín, Goicoechea-de-Jorge, María, Gómez-Albarrán, Mercedes, Sanz-Cabrerizo, Amelia, Sarasa-Cabezuelo, Antonio, Sierra, José-Luis

Ontology-Enhanced Educational Annotation Activities

Information and communications technology and technology-enhanced learning have unquestionably transformed traditional teaching-learning processes and are positioned as key factors to promote quality education, one of the basic sustainable development goals of the 2030 agenda. Document annotation, which was traditionally carried out with pencil and paper and currently benefits from digital document annotation tools, is a representative example of this transformation. Using document annotation tools, students can enrich the documents with annotations that highlight the most relevant aspects of these documents. As the conceptual complexity of the learning domain increases, the annotation of the documents may require comprehensive domain knowledge and an expert analysis capability that students usually lack. Consequently, a proliferation of irrelevant, incorrect, and/or poorly decontextualized annotations may appear, while other relevant aspects are completely ignored by the students. The main hypothesis proposed by this paper is that the use of a guiding annotation ontology in the annotation activities is a keystone aspect to alleviate these shortcomings. Consequently, comprehension is improved, exhaustive content analysis is promoted, and meta-reflective thinking is developed. To test this hypothesis, we describe our own annotation tool, @note, which fully implements this ontology-enhanced annotation paradigm, and we provide experimental evidence about how @note can improve academic performance via a pilot study concerning critical literary annotation.

annotation, artificial intelligence, machine learning, (17 more...)

doi: 10.3390/su11164455

2501.12943

Country:

Europe > Spain > Galicia > Madrid (0.05)
North America > United States > New York > New York County > New York City (0.05)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
(15 more...)

Genre:

Instructional Material > Course Syllabus & Notes (0.68)
Research Report > Experimental Study (0.46)

Industry:

Education > Educational Setting > Online (0.68)
Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Zahran, Raghda, Xu, Jianfei, Liang, Huizhi, Forshaw, Matthew

Data Science Students Perspectives on Learning Analytics: An Application of Human-Led and LLM Content Analysis

Objective: This study is part of a series of initiatives at a UK university designed to cultivate a deep understanding of students' perspectives on analytics that resonate with their unique learning needs. It explores collaborative data processing undertaken by postgraduate students who examined an Open University Learning Analytics Dataset (OULAD). Methods: A qualitative approach was adopted, integrating a Retrieval-Augmented Generation (RAG) and a Large Language Model (LLM) technique with human-led content analysis to gather information about students' perspectives based on their submitted work. The study involved 72 postgraduate students in 12 groups. Findings: The analysis of group work revealed diverse insights into essential learning analytics from the students' perspectives. All groups adopted a structured data science methodology. The questions formulated by the groups were categorised into seven themes, reflecting their specific areas of interest. While there was variation in the selected variables to interpret correlations, a consensus was found regarding the general results. Conclusion: A significant outcome of this study is that students specialising in data science exhibited a deeper understanding of learning analytics, effectively articulating their interests through inferences drawn from their analyses. While human-led content analysis provided a general understanding of students' perspectives, the LLM offered nuanced insights.

data mining, large language model, machine learning, (19 more...)

2502.10409

Country:

South America > Uruguay > Maldonado > Maldonado (0.06)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (0.43)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Yamaguchi, Tomoya, Hoxha, Bardh, Nickovic, Dejan

RTAMT -- Runtime Robustness Monitors with Application to CPS and Robotics

The library implements a flexible architecture that supports: (1) various environments connected by an Application Programming Interface (API) in Python, (2) various flavors of temporal logic specification and robustness notion such as STL, including an interface-aware variant that distinguishes between input and output variables, and (3) discrete-time and dense-time interpretation of STL with generation of online and offline monitors. We specifically focus on robotics and Cyber-Physical Systems (CPSs) applications, showing how to integrate RTAMT with (1) the Robot Operating System (ROS) and (2) MATLAB/Simulink environments. We evaluate the tool by demonstrating several use scenarios involving service robotic and avionic applications.
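The notion of robustness that such monitors compute can be illustrated without the library. The sketch below is hand-written and does not use RTAMT's API. It assumes the standard quantitative semantics of STL over a finite, discrete-time trace: the robustness of `x >= c` at a time step is `x - c`, `always` takes the minimum over the remaining trace, and `eventually` takes the maximum. All function names are illustrative.

```python
def rob_ge(signal, c):
    """Robustness of the predicate x >= c at each time step."""
    return [x - c for x in signal]

def rob_always(rho):
    """Robustness of 'always phi': minimum of rho from each step to the end."""
    out = []
    running = float("inf")
    for r in reversed(rho):
        running = min(running, r)
        out.append(running)
    return out[::-1]

def rob_eventually(rho):
    """Robustness of 'eventually phi': maximum of rho from each step to the end."""
    out = []
    running = float("-inf")
    for r in reversed(rho):
        running = max(running, r)
        out.append(running)
    return out[::-1]

# Offline monitoring of a short trace against "always (x >= 0.5)".
signal = [3.0, 2.5, 1.0, 2.0]
rho = rob_ge(signal, 0.5)
always_rho = rob_always(rho)
eventually_rho = rob_eventually(rho)

print(always_rho)      # [0.5, 0.5, 0.5, 1.5]
print(eventually_rho)  # [2.5, 2.0, 1.5, 1.5]
```

A positive value at time 0 means the whole trace satisfies the specification, and its magnitude indicates how far the signal could be shifted before the verdict changes.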

artificial intelligence, robustness, specification, (17 more...)

doi: 10.1007/S10009-023-00720-3

2501.18608

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Spain > Galicia > Madrid (0.04)
(20 more...)

Genre: Research Report (0.40)

Industry:

Transportation > Air (0.48)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)

NExtLong: Toward Effective Long-Context Training without Long Documents

Gao, Chaochen, Wu, Xing, Lin, Zijia, Zhang, Debing, Hu, Songlin

Large language models (LLMs) with extended context windows have made significant strides, yet training them remains a challenge due to the scarcity of long documents. Existing methods tend to synthesize long-context data but lack a clear mechanism to reinforce long-range dependency modeling. To address this limitation, we propose NExtLong, a novel framework for synthesizing long-context data through Negative document Extension. NExtLong decomposes a document into multiple meta-chunks and extends the context by interleaving hard negative distractors retrieved from pretraining corpora. This approach compels the model to discriminate long-range dependent context from distracting content, enhancing its ability to model long-range dependencies. Extensive experiments demonstrate that NExtLong achieves significant performance improvements on the HELMET and RULER benchmarks compared to existing long-context synthesis approaches and leading models, which are trained on non-synthetic long documents. These findings highlight NExtLong's ability to reduce reliance on non-synthetic long documents, making it an effective framework for developing advanced long-context LLMs.
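The chunk-and-interleave idea can be sketched in a few lines. This is a schematic illustration, not the NExtLong implementation: the fixed-size chunking, the placement of distractors directly after each meta-chunk, and the `retrieve_negatives` callback (a stand-in for a retriever over a pretraining corpus) are all assumptions.

```python
def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive meta-chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def extend_with_negatives(tokens, chunk_size, retrieve_negatives, k=1):
    """Build a longer sequence by inserting k distractors after each meta-chunk.

    retrieve_negatives(chunk, k) is a hypothetical callback that returns k
    token lists similar to the chunk but taken from other documents.
    """
    extended = []
    for chunk in split_into_chunks(tokens, chunk_size):
        extended.extend(chunk)
        for negative in retrieve_negatives(chunk, k):
            extended.extend(negative)
    return extended

def toy_retriever(chunk, k):
    """Stand-in retriever that returns the same one-token distractor k times."""
    return [["<neg>"] for _ in range(k)]

document = ["a", "b", "c", "d"]
long_sequence = extend_with_negatives(document, chunk_size=2,
                                      retrieve_negatives=toy_retriever)
print(long_sequence)  # ['a', 'b', '<neg>', 'c', 'd', '<neg>']
```

The original chunks stay in order but end up far apart, so a model trained on the extended sequence has to link them across the distracting content in between.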

large language model, machine learning, preprint arxiv, (20 more...)

2501.12766

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)