 Pemba


Expanding FLORES+ Benchmark for more Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation

arXiv.org Artificial Intelligence

As part of the Open Language Data Initiative shared tasks, we have expanded the FLORES+ evaluation set to include Emakhuwa, a low-resource language widely spoken in Mozambique. We translated the dev and devtest sets from Portuguese into Emakhuwa, and we detail the translation process and quality assurance measures used. Our methodology involved various quality checks, including post-editing and adequacy assessments. The resulting datasets consist of multiple reference sentences for each source. We present baseline results from training a Neural Machine Translation system and fine-tuning existing multilingual translation models. Our findings suggest that spelling inconsistencies remain a challenge in Emakhuwa. Additionally, the baseline models underperformed on this evaluation set, underscoring the necessity for further research to enhance machine translation quality for Emakhuwa. The data is publicly available at https://huggingface.co/datasets/LIACC/Emakhuwa-FLORES.


From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives

arXiv.org Machine Learning

In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent's COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting the likely COD from the VA interview and (ii) performing inference with the predicted CODs (e.g., modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in "prediction-powered inference" to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground truth estimates regardless of which NLP model produced the predictions, whether a more accurate predictor like GPT-4-32k or a less accurate predictor like KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggest that if inference tasks are the end goal, having a small amount of contextually relevant, high-quality labeled data is essential regardless of the NLP algorithm.
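The idea of correcting predicted outcomes with a small labeled sample can be illustrated with the basic prediction-powered point estimate of a single cause's share of deaths. This is only a minimal sketch of the general prediction-powered inference idea, not the multiPPI++ method described in the abstract; the function name `ppi_proportion` and its arguments are hypothetical, and it omits confidence intervals and the multinomial extension entirely.

```python
from statistics import fmean


def ppi_proportion(preds_unlabeled, preds_labeled, truth_labeled, cause):
    """Prediction-powered point estimate of the share of deaths due to `cause`.

    preds_unlabeled: model-predicted causes for deaths without a ground-truth label
    preds_labeled:   model-predicted causes for deaths that also have a true label
    truth_labeled:   true causes for the labeled deaths (same order as preds_labeled)
    All names here are illustrative assumptions, not taken from the paper.
    """
    if len(preds_labeled) != len(truth_labeled):
        raise ValueError("labeled predictions and labels must align")

    # Naive estimate: share of the large unlabeled sample predicted as `cause`.
    naive = fmean(1.0 if p == cause else 0.0 for p in preds_unlabeled)

    # Rectifier: average (truth - prediction) on the labeled sample, which
    # estimates how far the predictor's share is off from the true share.
    rectifier = fmean(
        (1.0 if y == cause else 0.0) - (1.0 if p == cause else 0.0)
        for y, p in zip(truth_labeled, preds_labeled)
    )
    return naive + rectifier
```

If the predictor over-predicts a cause on the labeled sample, the rectifier is negative and pulls the naive share down, which is why the quality of the small labeled sample matters more than the accuracy of the predictor.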


Episode 42: How Far Can We Take AI?

#artificialintelligence

On this episode of the eeDesignIt Podcast, we're joined by Dhonam Pemba to explore artificial intelligence (AI) and his new company, KidX AI. Dhonam is a neural engineer by PhD, a former rocket scientist, and a serial AI entrepreneur. He was CTO of Kadho, which was acquired by Roybi for its Voice AI technology. He was also Chief Scientist at Kadho Sports, which had clients in MLB, USA Volleyball, NFL, NHL, NBA, and NCAA. His latest company, KidX, is in the AI edtech space, where he has built NLP and voice assessment to serve China's leading robotics company with 4M users.


Interview with AI Specialist Dhonam Pemba

#artificialintelligence

For the latest expert interview on our blog, we've welcomed Dhonam Pemba to share his thoughts on artificial intelligence (AI) and the journey behind founding KidX AI. Dhonam is a neural engineer by PhD, a former rocket scientist, and a serial AI entrepreneur with one exit. He was CTO of Kadho, which was acquired by Roybi for its Voice AI technology. He was also Chief Scientist at Kadho Sports, which had clients in MLB, USA Volleyball, NFL, NHL, NBA, and NCAA. His latest company, KidX, is in the AI edtech space, where he has built NLP and voice assessment to serve China's leading robotics company with 4M users.


AI Edtech Entrepreneur's Journey from Neuroscience to Toys

#artificialintelligence

Dr. Dhonam Pemba is the CEO and Co-Founder of KidX. He is a neural engineer by education, a former rocket scientist by work, and an AI entrepreneur by entrepreneurship. He received his undergraduate degree in Biomedical Engineering from Johns Hopkins University and his PhD, also in BME, from the University of California, Irvine, where he worked on neural interfaces for his thesis. Can you tell me about the NASA JPL project and how it was related to your PhD work? My PhD work was building micro implantable neural implants, very similar to the work that Elon Musk's company Neuralink is now doing.


Google Earth relaunches today with stunning detail

Daily Mail - Science & tech

Google has today launched a re-imagined version of its free Earth mapping service, weaving in storytelling and artificial intelligence. The new programme lets people get a close-up look of the planet from the comfort of their computers, smartphones or tablets. The new-look Google Earth enables its users to learn about far-flung corners of the globe under the guidance of scientists from Nasa and prestigious research institutions. Google Earth's new start-up screen offers a global view of the Earth. 'This is our gift to the world,' Google Earth director Rebecca Moore said.


Normalized Information Distance

arXiv.org Artificial Intelligence

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and is thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
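The compression-based realization mentioned above is commonly written as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length under some real compressor. The following is a minimal sketch, assuming zlib as the compressor (an illustrative choice, not one named in the abstract); the helper names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their information;
    values near 1 suggest they share little. Because zlib is an imperfect
    compressor, results can fall slightly outside the [0, 1] range.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A pairwise matrix of `ncd` values can be passed to any standard clustering routine, which is what makes the approach feature-free: no domain-specific features are extracted, and only the choice of compressor matters.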