Collaborating Authors

 imran


Semantically Enriched Cross-Lingual Sentence Embeddings for Crisis-related Social Media Texts

Lamsal, Rabindra, Read, Maria Rodriguez, Karunasekera, Shanika

arXiv.org Artificial Intelligence

Tasks such as semantic search and clustering on crisis-related social media texts enhance our comprehension of crisis discourse, aiding decision-making and targeted interventions. Pre-trained language models have advanced performance in crisis informatics, but their contextual embeddings lack semantic meaningfulness. Although the CrisisTransformers family includes a sentence encoder to address the semanticity issue, it remains monolingual, processing only English texts. Furthermore, employing separate models for different languages leads to embeddings in distinct vector spaces, introducing challenges when comparing semantic similarities between multi-lingual texts. Therefore, we propose multi-lingual sentence encoders (CT-XLMR-SE and CT-mBERT-SE) that embed crisis-related social media texts for over 50 languages, such that texts with similar meanings are in close proximity within the same vector space, irrespective of language diversity. Results in sentence encoding and sentence matching tasks are promising, suggesting these models could serve as robust baselines when embedding multi-lingual crisis-related social media texts. The models are publicly available at: https://huggingface.co/crisistransformers.
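To illustrate what "close proximity within the same vector space" means in practice, here is a minimal sketch of cosine-similarity ranking over sentence embeddings. The three-dimensional vectors and the example sentences are made-up placeholders, not outputs of CT-XLMR-SE or CT-mBERT-SE; real embeddings would come from the published models and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, corpus):
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [(text, cosine_similarity(query_vec, vec))
              for text, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings: an English and a Spanish flood report
# sit near each other, while an unrelated sentence points elsewhere.
toy_corpus = {
    "Flood waters are rising downtown": [0.90, 0.10, 0.00],
    "El agua de la inundación sube en el centro": [0.85, 0.15, 0.05],
    "Concert tickets on sale tomorrow": [0.00, 0.20, 0.95],
}
query_vec = [0.88, 0.12, 0.02]  # hypothetical embedding of a flood query

ranking = rank_by_similarity(query_vec, toy_corpus)
for text, score in ranking:
    print(f"{score:.3f}  {text}")
```

With a shared multilingual space, the two flood reports score close together regardless of language, which is the property that separate monolingual encoders cannot guarantee.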


NADBenchmarks -- a compilation of Benchmark Datasets for Machine Learning Tasks related to Natural Disasters

Proma, Adiba Mahbub, Islam, Md Saiful, Ciko, Stela, Baten, Raiyan Abdul, Hoque, Ehsan

arXiv.org Artificial Intelligence

Climate change has increased the intensity, frequency, and duration of extreme weather events and natural disasters across the world. While the increased data on natural disasters improves the scope of machine learning (ML) in this field, progress is relatively slow. One bottleneck is the lack of benchmark datasets that would allow ML researchers to quantify their progress against a standard metric. The objective of this short paper is to explore the state of benchmark datasets for ML tasks related to natural disasters, categorizing them according to the disaster management cycle. We compile a list of existing benchmark datasets introduced in the past five years. We propose a web platform - NADBenchmarks - where researchers can search for benchmark datasets for natural disasters, and we develop a preliminary version of such a platform using our compiled list. This paper is intended to aid researchers in finding benchmark datasets to train their ML models on, and provide general directions for topics where they can contribute new benchmark datasets.


Urdu Speech and Text Based Sentiment Analyzer

Ahmad, Waqar, Edalati, Maryam

arXiv.org Artificial Intelligence

Discovering what other people think has always been a key part of how we gather information. Thanks to the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, people can now actively use information technology to seek out and understand the opinions of others. Sentiment analysis (SA) is therefore a crucial task for understanding people's opinions. Existing research, however, focuses primarily on English, with only a small amount of work devoted to low-resource languages. This work presents a new multi-class Urdu dataset for sentiment analysis based on user evaluations, collected from Twitter. The proposed dataset includes 10,000 reviews that human experts have carefully classified into two categories: positive and negative. The primary purpose of this research is to construct a manually annotated dataset for Urdu sentiment analysis and to establish baseline results. Five lexicon- and rule-based algorithms (Naive Bayes, Stanza, TextBlob, VADER, and Flair) are evaluated, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms.
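As a rough illustration of how a lexicon-based baseline and its accuracy might be computed, consider the sketch below. The word lists, the tie-breaking rule, and the example reviews are hypothetical English stand-ins; they are not the paper's Urdu data and not the internals of any of the tools named above.

```python
# Hypothetical miniature lexicons, for illustration only.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "terrible"}


def lexicon_label(text):
    """Label a text by counting positive and negative lexicon hits."""
    tokens = text.lower().split()
    score = (sum(token in POSITIVE_WORDS for token in tokens)
             - sum(token in NEGATIVE_WORDS for token in tokens))
    # Ties default to "positive" here; this is an arbitrary choice.
    return "positive" if score >= 0 else "negative"


def accuracy(predictions, gold_labels):
    """Fraction of predictions that match the human annotations."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels differ in length")
    if not gold_labels:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Made-up reviews with made-up gold labels.
reviews = ["great service", "terrible and bad", "poor but good value"]
gold = ["positive", "negative", "negative"]

predicted = [lexicon_label(review) for review in reviews]
print(predicted, accuracy(predicted, gold))
```

The third review shows a typical weakness of plain word counting: mixed wording produces a tie, so the result depends on the tie-breaking rule.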


40 Algorithms Every Programmer Should Know: Hone your problem-solving skills by learning different algorithms and their implementation in Python 1, Ahmad, Imran, eBook - Amazon.com

#artificialintelligence

Imran has been involved in cutting-edge research on algorithms and machine learning for many years. He completed his PhD in 2010, in which he proposed a new linear-programming-based algorithm for optimally assigning resources in a large-scale cloud computing environment. In 2017, Imran developed a real-time analytics framework named StreamSensing. He has since authored multiple research papers that use StreamSensing to process multimedia data for various machine learning algorithms. Imran currently works as a Data Scientist at the Advanced Analytics Solution Center (A2SC) of the Canadian Federal Government, where he applies machine learning algorithms to critical use cases. Imran is a visiting professor at Carleton University, Ottawa, and has taught for Google and Learning Tree for many years. The topics he teaches include algorithms, cloud computing, and deep learning. Over his career, Imran has written many research papers, and a couple of his recent papers have won best paper awards. Imran also regularly writes blog posts on selected IT topics. Outside his professional work, Imran enjoys nature photography and has taken thousands of nature photos over the years. Imran's passion is to find ways to make technology work for the betterment of humanity, and this passion is the main motivation behind his research.


HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

Alam, Firoj, Qazi, Umair, Imran, Muhammad, Ofli, Ferda

arXiv.org Artificial Intelligence

Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues, automatic classification systems have been developed using supervised modeling approaches, thanks to earlier efforts to create labeled datasets. However, existing datasets are limited in several respects (e.g., small size, presence of duplicates) and are less suitable for supporting more advanced, data-hungry deep learning models. In this paper, we present a new large-scale dataset with ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019. Moreover, we propose a data collection and sampling pipeline, which is important for sampling social media data for human annotation. We report multiclass classification results using classic and deep learning (fastText and transformer) based models to set the ground for future studies. The dataset and associated resources are publicly available at https://crisisnlp.qcri.org/humaid_dataset.html
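Since duplicates are named as a limitation of earlier datasets, the following sketch shows one simple way duplicate tweets could be dropped before sampling for annotation. The normalization rules (lowercasing, removing URLs, collapsing whitespace) and the example tweets are assumptions made for illustration; the abstract does not describe the authors' actual pipeline, and this is not a reproduction of it.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def normalize(tweet):
    """Lowercase, drop URLs, and collapse whitespace (assumed rules)."""
    text = URL_PATTERN.sub(" ", tweet.lower())
    return " ".join(text.split())


def deduplicate(tweets):
    """Keep the first tweet for each distinct normalized form."""
    seen = set()
    kept = []
    for tweet in tweets:
        key = normalize(tweet)
        if key and key not in seen:
            seen.add(key)
            kept.append(tweet)
    return kept


# Made-up example tweets.
sample = [
    "Flood in city http://a.b/x",
    "flood  in CITY",
    "Fire nearby",
]
print(deduplicate(sample))
```

This catches only exact matches after normalization; near-duplicates such as lightly edited retweets would need a similarity-based method.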


Shot to the Gut: "Robotic" Pill Sails Through Human Safety Study

IEEE Spectrum Robotics

An average person with type 1 diabetes and no insulin pump sticks a needle into their abdomen between 700 and 1,000 times per year. A person with the hormone disorder acromegaly travels to a doctor's office to receive a painful injection into the muscles of the butt once a month. Someone with multiple sclerosis may inject the disease-slowing interferon beta drug three times per week, varying the injection site among the arms, legs and back. Medical inventor Mir Imran, holder of more than 400 patents, spent the last seven years working on an alternate way to deliver large drug molecules like these, and his solution--an unusual "robotic" pill--was recently tested in humans. The RaniPill capsule works like a miniature Rube Goldberg device: Once swallowed, the capsule travels to the intestines where the shell dissolves to mix two chemicals to inflate a balloon to push out a needle to pierce the intestinal wall to deliver a drug into the bloodstream.