alam
Foreign aid cuts hurt the most vulnerable in world's largest refugee camp
Cox's Bazar, Bangladesh – The sound of children at play echoes through the verdant lanes of one of the dozens of refugee camps on the outskirts of Cox's Bazar, a densely populated coastal town in southeast Bangladesh. Just for a moment, the sounds manage to soften the harsh living conditions faced by the more than one million people who live here in the world's largest refugee camp. Described as the most persecuted people on the planet, the Rohingya Muslim refugees in Bangladesh may now be one of the most forgotten populations in the world, eight years after being ethnically cleansed from their homes in neighbouring Myanmar by a predominantly Buddhist military regime. "Cox's Bazar is ground zero for the impact of budget cuts on people in desperate need," UN Secretary-General Antonio Guterres said during a visit to the sprawling camps in May. The UN chief's visit followed United States President Donald Trump's gutting of the US Agency for International Development (USAID), which has stalled several key projects in the camps, and the United Kingdom announcing cuts to foreign aid in order to increase defence spending.
Toxic language detection: a systematic review of Arabic datasets
Bensalem, Imene, Rosso, Paolo, Zitouni, Hanane
The detection of toxic language in the Arabic language has emerged as an active area of research in recent years, and reviewing the existing datasets employed for training the developed solutions has become a pressing need. This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically gathered a total of 54 available datasets and their corresponding papers and conducted a thorough analysis, considering 18 criteria across four primary dimensions: availability details, content, annotation process, and reusability. This analysis enabled us to identify existing gaps and make recommendations for future research works. For the convenience of the research community, the list of the analysed datasets is maintained in a GitHub repository (https://github.com/Imene1/Arabic-toxic-language).
Ipswich Hospital advances stroke care thanks to tech partnership
Ipswich Hospital has seen benefits to its stroke pathway thanks to a strategic tech partnership between Visionable and AI-powered medtech solutions company Brainomix. The partnership is combining Visionable's virtual healthcare collaboration platform with Brainomix's e-Stroke imaging software. The collaboration is saving time, reducing clinicians' workload and supporting improved outcomes for patients. The benefit of combining these two technologies has already been felt at Ipswich Hospital. Recent performance metrics have confirmed that Ipswich Hospital is delivering best-in-class service.
Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text
Tarannum, Prerona, Alam, Firoj, Hasan, Md. Arid, Noori, Sheak Rashed Haider
The wide use of social media and digital technologies facilitates the sharing of news and information about events and activities. Alongside positive information, however, misleading and false information is also spreading on social media. There have been efforts to identify such misleading information both manually by human experts and with automatic tools. Manual effort does not scale well because of the high volume of information containing factual claims that appears online. Therefore, automatically identifying check-worthy claims can be very useful for human experts. In this study, we describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch and Spanish) of the CheckThat! lab at CLEF 2022. We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact checking or not. We used an oversampling technique to balance the dataset and applied SVM and Random Forest (RF) with TF-IDF representations. We also used BERT multilingual (BERT-m) and XLM-RoBERTa-base pre-trained models for the experiments. We used BERT-m for the official submissions, and our systems ranked 3rd, 5th, and 12th in Spanish, Dutch, and English, respectively. In further experiments, our evaluation shows that transformer models (BERT-m and XLM-RoBERTa-base) outperform SVM and RF in Dutch and English, whereas a different scenario is observed for Spanish.
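The abstract does not say which oversampling variant was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class examples until the classes are the same size), the balancing step could look like the following. The function name `random_oversample` and the toy tweets are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data: 1 = check-worthy, 0 = not check-worthy.
texts = ["claim a", "chat b", "chat c", "chat d", "claim e", "chat f"]
labels = [1, 0, 0, 0, 1, 0]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In a setup like the one described, oversampling would normally be applied to the training split only, so that duplicated examples do not leak into the evaluation data.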
Alam
We present a novel negotiation protocol to facilitate energy exchange between off-grid homes that are equipped with renewable energy generation and electricity storage. Our protocol imposes restrictions over negotiation such that it reduces the complex interdependent multi-issue negotiation to one where agents have a strategy profile in subgame perfect Nash equilibrium. We show that our protocol is concurrent, scalable and, under certain conditions, leads to Pareto-optimal outcomes.
Alam
The popularization and quick growth of Linked Open Data (LOD) has led to challenging aspects regarding quality assessment and data exploration of the RDF triples that shape the LOD cloud. Particularly, we are interested in the completeness of data and its potential to provide concept definitions in terms of necessary and sufficient conditions. In this work we propose a novel technique based on Formal Concept Analysis which organizes RDF data into a concept lattice. This allows data exploration as well as the discovery of implications, which are used to automatically detect missing information and then to complete RDF data. Moreover, this is a way of reconciling syntax and semantics in the LOD cloud. Finally, experiments on the DBpedia knowledge base show that the approach is well-founded and effective.
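To illustrate the general idea of Formal Concept Analysis on RDF-like data, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical toy context in which each subject is an object and each predicate/value pair is flattened into an attribute string. All names (`context`, `extent`, `intent`, `formal_concepts`, `implication_holds`, `violations`) are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy formal context: object -> set of attributes.
context = {
    "Paris": {"type:City", "type:Capital", "has:country"},
    "Berlin": {"type:City", "type:Capital", "has:country"},
    "Lyon": {"type:City", "has:country"},
    "Nice": {"type:City"},
}
attributes = sorted(set().union(*context.values()))


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs."""
    if not objs:
        return frozenset(attributes)
    return frozenset(set.intersection(*(context[o] for o in objs)))


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every attribute subset.
    Exponential in the number of attributes, so only suitable for toy data."""
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(attributes, r):
            objs = extent(frozenset(subset))
            concepts.add((objs, intent(objs)))
    return concepts


def implication_holds(premise, conclusion):
    """premise -> conclusion holds if every object with premise has conclusion."""
    return extent(frozenset(premise)) <= extent(frozenset(conclusion))


def violations(premise, conclusion):
    """Objects that have the premise but lack the conclusion."""
    return extent(frozenset(premise)) - extent(frozenset(conclusion))


concepts = formal_concepts()
for objs, attrs in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(objs), sorted(attrs))
```

In this toy context the implication `type:Capital -> has:country` holds for every object, while `violations({"type:City"}, {"has:country"})` lists the objects that would be candidates for missing information if one expected every city to have a country.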
Alam
During time-critical situations such as natural disasters, rapid classification of data posted on social networks by affected people is useful for humanitarian organizations to gain situational awareness and to plan response efforts. However, the scarcity of labeled data in the early hours of a crisis hinders machine learning tasks and thus delays crisis response. In this work, we propose to use an inductive semi-supervised technique to utilize unlabeled data, which is often abundant at the onset of a crisis event, along with fewer labeled data. Specifically, we adopt a graph-based deep learning framework to learn an inductive semi-supervised model. We use two real-world crisis datasets from Twitter to evaluate the proposed approach. Our results show significant improvements using unlabeled data as compared to only using labeled data.
HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks
Alam, Firoj, Qazi, Umair, Imran, Muhammad, Ofli, Ferda
Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its significantly large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues, automatic classification systems have been developed using supervised modeling approaches, thanks to earlier efforts on creating labeled datasets. However, existing datasets are limited in different aspects (e.g., limited size, presence of duplicates) and are less suitable to support more advanced and data-hungry deep learning models. In this paper, we present a new large-scale dataset with ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019. Moreover, we propose a data collection and sampling pipeline, which is important for social media data sampling for human annotation. We report multiclass classification results using classic and deep learning (fastText and transformer) based models to set the ground for future studies. The dataset and associated resources are publicly available at https://crisisnlp.qcri.org/humaid_dataset.html
Gene Shaving using influence function of a kernel method
Alam, Md. Ashad, Shahjama, Mohammad, Rahman, Md. Ferdush
Gene shaving, the identification of significant subsets of genes, is an essential and challenging problem in biomedical research because of the huge number of genes and the complex nature of biological networks. Since positive definite kernel based methods on genomic information can improve the prediction of diseases, in this paper we propose a new method, "kernel gene shaving" (kernel canonical correlation analysis (kernel CCA) based gene shaving). The problem is addressed using the influence function of kernel CCA. To compare the performance of the proposed method with three popular gene selection methods (T-test, SAM and LIMMA), we used extensive simulated and real microarray gene expression datasets. The AUC performance measure was computed for each method. The proposed method outperformed the three well-known gene selection methods. In real data analysis, the proposed method identified a subset of $210$ genes out of $2000$ genes. The network of these genes has significantly more interactions than expected, which indicates that they may function in a concerted effort on colon cancer.