 Cabo Delgado Province


Expanding FLORES+ Benchmark for more Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation

arXiv.org Artificial Intelligence

As part of the Open Language Data Initiative shared tasks, we have expanded the FLORES+ evaluation set to include Emakhuwa, a low-resource language widely spoken in Mozambique. We translated the dev and devtest sets from Portuguese into Emakhuwa, and we detail the translation process and quality assurance measures used. Our methodology involved various quality checks, including post-editing and adequacy assessments. The resulting datasets consist of multiple reference sentences for each source. We present baseline results from training a Neural Machine Translation system and fine-tuning existing multilingual translation models. Our findings suggest that spelling inconsistencies remain a challenge in Emakhuwa. Additionally, the baseline models underperformed on this evaluation set, underscoring the necessity for further research to enhance machine translation quality for Emakhuwa. The data is publicly available at https://huggingface.co/datasets/LIACC/Emakhuwa-FLORES.


From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives

arXiv.org Machine Learning

In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent's COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting likely COD using the VA interview and (ii) performing inference with predicted CODs (e.g. modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in "prediction-powered inference" to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground truth estimates, regardless of which NLP model produced predictions and regardless of whether they were produced by a more accurate predictor like GPT-4-32k or a less accurate predictor like KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggest that if inference tasks are the end goal, having a small amount of contextually relevant, high quality labeled data is essential regardless of the NLP algorithm.
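The idea of correcting predicted outcomes with a small labeled sample can be illustrated with the basic prediction-powered point estimate of class proportions. This is only a minimal sketch and not the multiPPI++ method itself: the function names, the two-class example, and the data are hypothetical, and no confidence intervals are computed.

```python
from collections import Counter


def class_proportions(labels, classes):
    """Share of each class in a list of labels."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts[c] / n for c in classes}


def ppi_proportions(pred_unlabeled, pred_labeled, true_labeled, classes):
    """Basic prediction-powered estimate of class proportions.

    Naive estimate from predictions on the large unlabeled sample, plus a
    correction (true minus predicted proportions) measured on the small
    labeled sample. Illustrative assumption, not the paper's estimator.
    """
    naive = class_proportions(pred_unlabeled, classes)
    pred_lab = class_proportions(pred_labeled, classes)
    true_lab = class_proportions(true_labeled, classes)
    return {c: naive[c] + (true_lab[c] - pred_lab[c]) for c in classes}


# Hypothetical data: the predictor over-reports cause "A".
classes = ["A", "B"]
pred_unlabeled = ["A"] * 80 + ["B"] * 20   # predictions only
pred_labeled = ["A"] * 8 + ["B"] * 2       # predictions on labeled deaths
true_labeled = ["A"] * 6 + ["B"] * 4       # ground-truth causes
estimate = ppi_proportions(pred_unlabeled, pred_labeled, true_labeled, classes)
```

In this toy case the naive share of "A" (0.8) is pulled down by the bias observed on the labeled sample, which is the intuition behind why a small amount of high quality labeled data matters regardless of the predictor.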


Taking it further: leveraging pseudo labels for field delineation across label-scarce smallholder regions

arXiv.org Artificial Intelligence

Transfer learning allows for resource-efficient geographic transfer of pre-trained field delineation models. However, the scarcity of labeled data for complex and dynamic smallholder landscapes, particularly in Sub-Saharan Africa, remains a major bottleneck for large-area field delineation. This study explores opportunities of using sparse field delineation pseudo labels for fine-tuning models across geographies and sensor characteristics. We build on a FracTAL ResUNet trained for crop field delineation in India (median field size of 0.24 ha) and use this pre-trained model to generate pseudo labels in Mozambique (median field size of 0.06 ha). We designed multiple pseudo label selection strategies and compared the quantities, area properties, seasonal distribution, and spatial agreement of the pseudo labels against human-annotated training labels (n = 1,512). We then used the human-annotated labels and the pseudo labels for model fine-tuning and compared predictions against human field annotations (n = 2,199). Our results indicate i) a good baseline performance of the pre-trained model in both field delineation and field size estimation, and ii) the added value of regional fine-tuning with performance improvements in nearly all experiments. Moreover, we found iii) substantial performance increases when using only pseudo labels (up to 77% of the IoU increases and 68% of the RMSE decreases obtained by human labels), and iv) additional performance increases when complementing human annotations with pseudo labels. Pseudo labels can be efficiently generated at scale and thus facilitate domain adaptation in label-scarce settings. The workflow presented here is a stepping stone for overcoming the persisting data gaps in heterogeneous smallholder agriculture of Sub-Saharan Africa, where labels are commonly scarce.
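For readers unfamiliar with the metrics quoted above, the following sketch shows a pixel-wise IoU, an RMSE on field sizes, and how a statement such as "77% of the IoU increases obtained by human labels" can be read as a share of gain over the pre-trained baseline. The function names and all numbers are hypothetical, and the study may compute IoU differently (for example per field object rather than per pixel).

```python
import math


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (lists of 0/1 rows)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred_mask, true_mask):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


def rmse(predicted, observed):
    """Root mean squared error, e.g. of field sizes in hectares."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def share_of_gain(baseline, pseudo, human):
    """Gain from pseudo labels as a fraction of the gain from human labels."""
    return (pseudo - baseline) / (human - baseline)


# Hypothetical toy masks and scores, not values from the study.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
mask_iou = iou(pred, truth)
size_rmse = rmse([0.05, 0.07], [0.06, 0.06])
gain = share_of_gain(baseline=0.50, pseudo=0.577, human=0.60)
```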


AI, analytics key to developing African hydrocarbons - IT-Online

#artificialintelligence

Africa has had massive oil and gas discoveries in recent years – including the Greater Tortue Ahmeyim offshore Senegal and Mauritania, the Luiperd and Brulpadda in South Africa and the Rovuma Basin discoveries offshore Mozambique, among others – but development has been slow owing largely to restricted investment, Covid-19 impacts and a lack of modern digital solutions. With more than 600-million people living without access to electricity in Africa, the accelerated development of Africa's oil and gas is key for making energy poverty history. Now, with the emergence of AI and analytics across the oil and gas sector, an opportunity has arisen for Africa to drive modern and sustainable energy growth for years to come. With oil and gas production decreasing in Africa due to natural declines in legacy projects, increasing the use of AI and analytics across the upstream segment could help simplify drilling activities, revitalise the sector and expand the continent's hydrocarbons reserves for energy reliability, saving project developers, operators and owners time and resources. Furthermore, with African hydrocarbon-producing countries such as Nigeria losing billions in revenue due to theft and vandalism of infrastructure – a condition that is restraining Africa's oil and gas sector from expanding – AI and analytics tools can help optimise industry growth by enhancing infrastructure maintenance and security across the entire oil and gas value chain, thereby helping reduce energy and revenue loss, and in the process stimulating investments across the oil and gas sector. What's more, despite Africa accounting for less than 3% of all carbon emissions, global energy transition related policies are hindering the deployment of investments necessary for boosting the continent's hydrocarbons sector.


Episode 42: How Far Can We Take AI?

#artificialintelligence

On this episode of the eeDesignIt Podcast, we're joined by Dhonam Pemba to explore artificial intelligence (AI) and his new company KidX AI. Dhonam is a neural engineer by PhD, a former rocket scientist and a serial AI entrepreneur. He was CTO of Kadho, which was acquired by Roybi for its Voice AI technology. He was also Chief Scientist at Kadho Sports, which had clients in MLB, USA Volleyball, NFL, NHL, NBA, and NCAA. His latest company, KidX, is in the AI edtech space, where he has built NLP and Voice assessment to serve China's leading robotics company with 4M users.


Interview with AI Specialist Dhonam Pemba

#artificialintelligence

For our latest expert interview on our blog, we've welcomed Dhonam Pemba to share his thoughts on the topic of artificial intelligence (AI) and his journey behind founding KidX AI. Dhonam is a neural engineer by PhD, a former rocket scientist and a serial AI entrepreneur with one exit. He was CTO of Kadho, which was acquired by Roybi for its Voice AI technology. He was also Chief Scientist at Kadho Sports, which had clients in MLB, USA Volleyball, NFL, NHL, NBA, and NCAA. His latest company, KidX, is in the AI edtech space, where he has built NLP and Voice assessment to serve China's leading robotics company with 4M users.


'At first I thought, this is crazy': the real-life plan to use novels to predict the next war

#artificialintelligence

As the car with the blacked-out windows came to a halt in a sidestreet near Tübingen's botanical gardens, keen-eyed passersby may have noticed something unusual about its numberplate. In Germany, the first few letters usually denote the municipality where a vehicle is registered. The letter Y, however, is reserved for members of the armed forces. Military men are a rare, not to say unwelcome, sight in Tübingen. A picturesque 15th-century university town that brought forth great German minds including the philosopher Hegel and the poet Friedrich Hölderlin, it is also a modern stronghold of the German Green party, thanks to its left-leaning academic population. In 2018, there was growing resistance on campus against plans to establish Europe's leading artificial intelligence research hub in the surrounding area: the involvement of arms manufacturers in Tübingen's "cyber valley", argued students who occupied a lecture hall that year, brought shame to the university's intellectual tradition. Yet the two high-ranking officials in field-grey Bundeswehr uniforms who stepped out of the Y-plated vehicle on 1 February 2018 had travelled into hostile territory to shake hands on a collaboration with academia, the like of which the world had never seen before. The name of the initiative was Project Cassandra: for the next two years, university researchers would use their expertise to help the German defence ministry predict the future. Instead, the people the colonels had sought out in a stuffy top-floor room were a small team of literary scholars led by Jürgen Wertheimer, a professor of comparative literature with wild curls and a penchant for black roll-necks.


AI Edtech Entrepreneur's Journey from Neuroscience to Toys

#artificialintelligence

Dr. Dhonam Pemba is the CEO and Co-Founder of KidX. He is a neural engineer by education, a former rocket scientist by profession, and an AI entrepreneur. He received his Biomedical Engineering undergraduate degree from Johns Hopkins University and his PhD, also in BME, from the University of California, Irvine, where he worked on neural interfaces for his thesis. Can you tell me about the NASA JPL project and how it was related to your PhD work? My PhD work was building micro implantable neural implants. Very similar to the work that Elon Musk's company Neuralink is now doing.


Google Earth relaunches today with stunning detail

Daily Mail - Science & tech

Google has today launched a re-imagined version of its free Earth mapping service, weaving in storytelling and artificial intelligence. The new programme lets people get a close-up look of the planet from the comfort of their computers, smartphones or tablets. The new-look Google Earth enables its users to learn about far-flung corners of the globe under the guidance of scientists from Nasa and prestigious research institutions. Google Earth's new start-up screen offers a global view of the Earth. 'This is our gift to the world,' Google Earth director Rebecca Moore said.


Normalized Information Distance

arXiv.org Artificial Intelligence

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
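The compression-based approximation mentioned above is commonly written as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. A minimal sketch follows, assuming zlib as the compressor; the choice of compressor and the sample strings are illustrative assumptions, not taken from the chapter.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a computable stand-in for
    Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Illustrative inputs: repeated English text versus unrelated byte values.
text = b"the quick brown fox jumps over the lazy dog " * 20
other = bytes(range(256)) * 4

same = ncd(text, text)        # near 0: the second copy adds little
different = ncd(text, other)  # near 1: the inputs share almost nothing
```

Because the function needs only raw bytes, the resulting pairwise distances can feed any standard clustering routine without feature engineering, which is the sense in which the approach is feature-free.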