Goto

Collaborating Authors

 Asmara


A Benchmark Dataset and a Framework for Urdu Multimodal Named Entity Recognition

arXiv.org Artificial Intelligence

The emergence of multimodal content, particularly text and images on social media, has positioned Multimodal Named Entity Recognition (MNER) as an increasingly important area of research within Natural Language Processing. Despite progress in high-resource languages such as English, MNER remains underexplored for low-resource languages like Urdu. The primary challenges include the scarcity of annotated multimodal datasets and the lack of standardized baselines. To address these challenges, we introduce the U-MNER framework and release the Twitter2015-Urdu dataset, a pioneering resource for Urdu MNER. Adapted from the widely used Twitter2015 dataset, it is annotated with Urdu-specific grammar rules. We establish benchmark baselines by evaluating both text-based and multimodal models on this dataset, providing comparative analyses to support future research on Urdu MNER. The U-MNER framework integrates textual and visual context using Urdu-BERT for text embeddings and ResNet for visual feature extraction, with a Cross-Modal Fusion Module to align and fuse information. Our model achieves state-of-the-art performance on the Twitter2015-Urdu dataset, laying the groundwork for further MNER research in low-resource languages.


The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs

arXiv.org Artificial Intelligence

Coral reefs are declining worldwide due to climate change and local stressors. To inform effective conservation or restoration, monitoring at the highest possible spatial and temporal resolution is necessary. Conventional coral reef surveying methods are limited in scalability due to their reliance on expert labor time, motivating the use of computer vision tools to automate the identification and abundance estimation of live corals from images. However, the design and evaluation of such tools has been impeded by the lack of large high quality datasets. We release the Coralscapes dataset, the first general-purpose dense semantic segmentation dataset for coral reefs, covering 2075 images, 39 benthic classes, and 174k segmentation masks annotated by experts. Coralscapes has a similar scope and the same structure as the widely used Cityscapes dataset for urban scene segmentation, allowing benchmarking of semantic segmentation models in a new challenging domain which requires expert knowledge to annotate. We benchmark a wide range of semantic segmentation models, and find that transfer learning from Coralscapes to existing smaller datasets consistently leads to state-of-the-art performance. Coralscapes will catalyze research on efficient, scalable, and standardized coral reef surveying methods based on computer vision, and holds the potential to streamline the development of underwater ecological robotics.


Quantifying Public Response to COVID-19 Events: Introducing the Community Sentiment and Engagement Index

arXiv.org Artificial Intelligence

This study introduces the Community Sentiment and Engagement Index (CSEI), developed to capture nuanced public sentiment and engagement variations on social media, particularly in response to major events related to COVID-19. Constructed with diverse sentiment indicators, CSEI integrates features like engagement, daily post count, compound sentiment, fine-grain sentiments (fear, surprise, joy, sadness, anger, disgust, and neutral), readability, offensiveness, and domain diversity. Each component is systematically weighted through a multi-step Principal Component Analysis (PCA)-based framework, prioritizing features according to their variance contributions across temporal sentiment shifts. This approach dynamically adjusts component importance, enabling CSEI to precisely capture high-sensitivity shifts in public sentiment. The development of CSEI showed statistically significant correlations with its constituent features, underscoring internal consistency and sensitivity to specific sentiment dimensions. CSEI's responsiveness was validated using a dataset of 4,510,178 Reddit posts about COVID-19. The analysis focused on 15 major events, including the WHO's declaration of COVID-19 as a pandemic, the first reported cases of COVID-19 across different countries, national lockdowns, vaccine developments, and crucial public health measures. Cumulative changes in CSEI revealed prominent peaks and valleys aligned with these events, indicating significant patterns in public sentiment across different phases of the pandemic. Pearson correlation analysis further confirmed a statistically significant relationship between CSEI daily fluctuations and these events (p = 0.0428), highlighting the capacity of CSEI to infer and interpret shifts in public sentiment and engagement in response to major events related to COVID-19.


Interpretable LLM-based Table Question Answering

arXiv.org Artificial Intelligence

Interpretability for Table Question Answering (Table QA) is critical, particularly in high-stakes industries like finance or healthcare. Although recent approaches using Large Language Models (LLMs) have significantly improved Table QA performance, their explanations for how the answers are generated are ambiguous. To fill this gap, we introduce Plan-of-SQLs ( or POS), an interpretable, effective, and efficient approach to Table QA that answers an input query solely with SQL executions. Through qualitative and quantitative evaluations with human and LLM judges, we show that POS is most preferred among explanation methods, helps human users understand model decision boundaries, and facilitates model success and error identification. Furthermore, when evaluated in standard benchmarks (TabFact, WikiTQ, and FetaQA), POS achieves competitive or superior accuracy compared to existing methods, while maintaining greater efficiency by requiring significantly fewer LLM calls and database queries.


Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance

arXiv.org Artificial Intelligence

With the continuous development of technological and educational innovation, learners nowadays can obtain a variety of support from agents such as teachers, peers, education technologies, and recently, generative artificial intelligence such as ChatGPT. The concept of hybrid intelligence is still at a nascent stage, and how learners can benefit from a symbiotic relationship with various agents such as AI, human experts and intelligent learning systems is still unknown. The emerging concept of hybrid intelligence also lacks deep insights and understanding of the mechanisms and consequences of hybrid human-AI learning based on strong empirical research. In order to address this gap, we conducted a randomised experimental study and compared learners' motivations, self-regulated learning processes and learning performances on a writing task among different groups who had support from different agents (ChatGPT, human expert, writing analytics tools, and no extra tool). A total of 117 university students were recruited, and their multi-channel learning, performance and motivation data were collected and analysed. The results revealed that: learners who received different learning support showed no difference in post-task intrinsic motivation; there were significant differences in the frequency and sequences of the self-regulated learning processes among groups; ChatGPT group outperformed in the essay score improvement but their knowledge gain and transfer were not significantly different. Our research found that in the absence of differences in motivation, learners with different supports still exhibited different self-regulated learning processes, ultimately leading to differentiated performance. What is particularly noteworthy is that AI technologies such as ChatGPT may promote learners' dependence on technology and potentially trigger metacognitive laziness.


Learning Algorithms Made Simple

arXiv.org Artificial Intelligence

In this paper, we discuss learning algorithms and their importance in different types of applications which includes training to identify important patterns and features in a straightforward, easy-to-understand manner. We will review the main concepts of artificial intelligence (AI), machine learning (ML), deep learning (DL), and hybrid models. Some important subsets of Machine Learning algorithms such as supervised, unsupervised, and reinforcement learning are also discussed in this paper. These techniques can be used for some important tasks like prediction, classification, and segmentation. Convolutional Neural Networks (CNNs) are used for image and video processing and many more applications. We dive into the architecture of CNNs and how to integrate CNNs with ML algorithms to build hybrid models. This paper explores the vulnerability of learning algorithms to noise, leading to misclassification. We further discuss the integration of learning algorithms with Large Language Models (LLM) to generate coherent responses applicable to many domains such as healthcare, marketing, and finance by learning important patterns from large volumes of data. Furthermore, we discuss the next generation of learning algorithms and how we may have an unified Adaptive and Dynamic Network to perform important tasks. Overall, this article provides brief overview of learning algorithms, exploring their current state, applications and future direction.


Cleaning Robots in Public Spaces: A Survey and Proposal for Benchmarking Based on Stakeholders Interviews

arXiv.org Artificial Intelligence

Autonomous cleaning robots for public spaces have potential for addressing current societal challenges, such as labor shortages and cleanliness in public spaces. Other application domains like autonomous driving, bin picking, or search and rescue have shown that benchmarking platforms and approaches in competitive settings can advance their respective research fields, resulting in more applicable systems under real-world conditions. For this paper, we analyzed seven semi-structured, qualitative stakeholder interviews about outdoor cleaning, identified current needs as well as limitations, and considered those results for the development of a benchmarking scenario based on the previous observations.


PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering

arXiv.org Artificial Intelligence

Question answering (QA) can only make progress if we know if an answer is correct, but for many of the most challenging and interesting QA examples, current efficient answer correctness (AC) metrics do not align with human judgments, particularly verbose, free-form answers from large language models (LLMs). There are two challenges: a lack of diverse evaluation data and that models are too big and non-transparent; LLM-based scorers correlate better with humans, but this expensive task has only been tested on limited QA datasets. We rectify these issues by providing guidelines and datasets for evaluating machine QA adopted from human QA community. We also propose an efficient, low-resource, and interpretable QA evaluation method more stable than an exact match and neural methods.


SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings

arXiv.org Artificial Intelligence

Movement disorders are typically diagnosed by consensus-based expert evaluation of clinically acquired patient videos. However, such broad sharing of patient videos poses risks to patient privacy. Face blurring can be used to de-identify videos, but this process is often manual and time-consuming. Available automated face blurring techniques are subject to either excessive, inconsistent, or insufficient facial blurring - all of which can be disastrous for video assessment and patient privacy. Furthermore, assessing movement disorders in these videos is often subjective. The extraction of quantifiable kinematic features can help inform movement disorder assessment in these videos, but existing methods to do this are prone to errors if using pre-blurred videos. We have developed an open-source software called SecurePose that can both achieve reliable face blurring and automated kinematic extraction in patient videos recorded in a clinic setting using an iPad. SecurePose, extracts kinematics using a pose estimation method (OpenPose), tracks and uniquely identifies all individuals in the video, identifies the patient, and performs face blurring. The software was validated on gait videos recorded in outpatient clinic visits of 116 children with cerebral palsy. The validation involved assessing intermediate steps of kinematics extraction and face blurring with manual blurring (ground truth). Moreover, when SecurePose was compared with six selected existing methods, it outperformed other methods in automated face detection and achieved ceiling accuracy in 91.08% less time than a robust manual face blurring method. Furthermore, ten experienced researchers found SecurePose easy to learn and use, as evidenced by the System Usability Scale. The results of this work validated the performance and usability of SecurePose on clinically recorded gait videos for face blurring and kinematics extraction.


CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering

arXiv.org Artificial Intelligence

Question answering (QA) can only make progress if we know if an answer is correct, but for many of the most challenging and interesting QA examples, current evaluation metrics to determine answer equivalence (AE) often do not align with human judgments, particularly more verbose, free-form answers from large language models (LLM). There are two challenges: a lack of data and that models are too big: LLM-based scorers can correlate better with human judges, but this task has only been tested on limited QA datasets, and even when available, update of the model is limited because LLMs are large and often expensive. We rectify both of these issues by providing clear and consistent guidelines for evaluating AE in machine QA adopted from professional human QA contests. We also introduce a combination of standard evaluation and a more efficient, robust, and lightweight discriminate AE classifier-based matching method (CFMatch, smaller than 1 MB), trained and validated to more accurately evaluate answer correctness in accordance with adopted expert AE rules that are more aligned with human judgments.