Zambia
CARROT: A Cost Aware Rate Optimal Router
Somerstep, Seamus, Polo, Felipe Maia, de Oliveira, Allysson Flavio Melo, Mangal, Prattyush, Silva, Mírian, Bhardwaj, Onkar, Yurochkin, Mikhail, Maity, Subha
With the rapid growth in the number of Large Language Models (LLMs), there has been recent interest in LLM routing, or directing queries to the cheapest LLM that can deliver a suitable response. Following this line of work, we introduce CARROT, a Cost AwaRe Rate Optimal rouTer that can select models based on any desired trade-off between performance and cost. Given a query, CARROT selects a model based on estimates of models' cost and performance. Its simplicity lends CARROT computational efficiency, while our theoretical analysis demonstrates minimax rate-optimality in its routing performance. Alongside CARROT, we also introduce the Smart Price-aware Routing (SPROUT) dataset to facilitate routing on a wide spectrum of queries with the latest state-of-the-art LLMs. Using SPROUT and prior benchmarks such as Routerbench and open-LLM-leaderboard-v2, we empirically validate CARROT's performance against several alternative routers.
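To make "select a model based on estimates of cost and performance under a desired trade-off" concrete, here is a minimal sketch. It assumes the per-query estimates have already been produced by some predictor (not shown) and that the trade-off is a simple linear scalarization weighted by a parameter `mu`. The abstract does not give CARROT's exact scoring rule, so `ModelEstimate`, `route`, `mu`, and the model names are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class ModelEstimate:
    """Hypothetical per-query estimates for one candidate LLM."""
    name: str
    predicted_score: float  # estimated response quality, assumed to lie in [0, 1]
    predicted_cost: float   # estimated cost of answering this query (e.g. USD)


def route(estimates: list[ModelEstimate], mu: float) -> str:
    """Return the name of the model maximising (1 - mu) * score - mu * cost.

    Assumed trade-off: mu = 0 picks the best predicted quality regardless of
    cost, mu = 1 picks the cheapest model regardless of quality, and values
    in between interpolate linearly between the two objectives.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    best = max(
        estimates,
        key=lambda m: (1 - mu) * m.predicted_score - mu * m.predicted_cost,
    )
    return best.name


if __name__ == "__main__":
    # Made-up estimates for two hypothetical models on a single query.
    candidates = [
        ModelEstimate("large-model", predicted_score=0.92, predicted_cost=0.030),
        ModelEstimate("small-model", predicted_score=0.80, predicted_cost=0.002),
    ]
    for mu in (0.0, 0.5, 0.9, 1.0):
        print(mu, route(candidates, mu))
```

Because the router only compares precomputed estimates, choosing a model costs one pass over the candidates per query. This is consistent with the abstract's point that CARROT's simplicity makes it computationally cheap, though the real estimators are where most of the work lies.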
World's biggest bat colony gathers in Zambia every year: we used artificial intelligence to count them
Everybody who visits Kasanka National Park in Zambia during "bat season" agrees that the evening emergence of African straw-coloured fruit bats from their roost site is one of the wildlife wonders of the world. The bats (Eidolon helvum) arrive at Kasanka every year around October. The numbers swell rapidly until they peak in November. By January they are gone again. Once they recover from the shock of the breathtaking spectacle, everyone also converges on the same question – how many bats are there?
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Sikasote, Claytone, Siaminwe, Kalinda, Mwape, Stanly, Zulu, Bangiwe, Phiri, Mofya, Phiri, Martin, Zulu, David, Nyirenda, Mayumbo, Anastasopoulos, Antonios
This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. It contains two collections of datasets: unlabelled audio recordings of radio news and talk show programs (160 hours) and labelled data (over 80 hours) consisting of read speech recorded from text sourced from publicly available literature books. The dataset is created for speech recognition but can be extended to multilingual speech processing research for both supervised and unsupervised learning approaches. To our knowledge, this is the first multilingual speech dataset created for Zambian languages. We exploit pretraining and cross-lingual transfer learning by finetuning the Wav2Vec2.0
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Sikasote, Claytone, Mukonde, Eunice, Alam, Md Mahfuz Ibn, Anastasopoulos, Antonios
We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba. While Bemba is the most populous language of Zambia, it suffers from a dearth of resources, which renders the development of language technologies or language processing research almost impossible. The dataset comprises multi-turn dialogues between Bemba speakers based on images, transcribed and translated into English. There are more than 92,000 utterances/sentences, amounting to more than 180 hours of audio data with corresponding transcriptions and English translations. We also provide baselines on speech recognition (ASR), machine translation (MT) and speech translation (ST) tasks, and sketch out other potential future multimodal uses of our dataset. We hope that by making the dataset available to the research community, this work will foster research and encourage collaboration across the language, speech, and vision communities, especially for languages outside the "traditionally" used high-resourced ones. All data and code are publicly available: https://github.com/csikasote/bigc.
Protecting Endangered Animals With AI
While AI is making a big impact in pretty much every business area, it is also worth noting some of the ways it is helping to save our planet. Conservationists are increasingly turning to AI as an innovative way to tackle various biodiversity crises. It helps protect a diverse set of species and assists law enforcement agents, who are often short-staffed and find it almost impossible to cover a vast stretch of land such as a national park. This is one of the reasons AI is so useful: it can take a lot of time-consuming work, such as constantly monitoring surveillance data, off the shoulders of human workers. In this article, we will talk about some of the interesting ways AI is being used to protect endangered species and the data annotation required to build these systems.
AI is helping treat healthcare as if it's a supply chain problem
Over the last few years, companies across industries from retail to manufacturing have started using digital twins to weather the worst of the world's ongoing supply-chain disruptions. "We wanted to step back and look at a country's whole health care network," says Heidi Albert, head of FIND South Africa. "That's what led us to supply-chain thinking." FIND (Foundation for Innovative New Diagnostics) is a nonprofit based in Switzerland. Testing is one of the weakest links in global health care, says Albert: "Our aim is to make sure that everyone who needs a test has access to one."
Deep learning for detecting pulmonary tuberculosis via chest radiography: an international study across 10 countries
Kazemzadeh, Sahar, Yu, Jin, Jamshy, Shahar, Pilgrim, Rory, Nabulsi, Zaid, Chen, Christina, Beladia, Neeral, Lau, Charles, McKinney, Scott Mayer, Hughes, Thad, Kiraly, Atilla, Kalidindi, Sreenivasa Raju, Muyoyeta, Monde, Malemela, Jameson, Shih, Ting, Corrado, Greg S., Peng, Lily, Chou, Katherine, Chen, Po-Hsuan Cameron, Liu, Yun, Eswaran, Krish, Tse, Daniel, Shetty, Shravya, Prabhakara, Shruthi
Tuberculosis (TB) is a top-10 cause of death worldwide. Though the WHO recommends chest radiographs (CXRs) for TB screening, the limited availability of CXR interpretation is a barrier. We trained a deep learning system (DLS) to detect active pulmonary TB using CXRs from 9 countries across Africa, Asia, and Europe, and utilized large-scale CXR pretraining, attention pooling, and noisy student semi-supervised learning. Evaluation was on (1) a combined test set spanning China, India, US, and Zambia, and (2) an independent mining population in South Africa. Given WHO targets of 90% sensitivity and 70% specificity, the DLS's operating point was prespecified to favor sensitivity over specificity. On the combined test set, the DLS's ROC curve was above all 9 India-based radiologists, with an AUC of 0.90 (95%CI 0.87-0.92). The DLS's sensitivity (88%) was higher than the India-based radiologists (75% mean sensitivity), p<0.001 for superiority; and its specificity (79%) was non-inferior to the radiologists (84% mean specificity), p=0.004. Similar trends were observed within HIV positive and sputum smear positive sub-groups, and in the South Africa test set. We found that 5 US-based radiologists (where TB isn't endemic) were more sensitive and less specific than the India-based radiologists (where TB is endemic). The DLS also remained non-inferior to the US-based radiologists. In simulations, using the DLS as a prioritization tool for confirmatory testing reduced the cost per positive case detected by 40-80% compared to using confirmatory testing alone. To conclude, our DLS generalized to 5 countries, and merits prospective evaluation to assist cost-effective screening efforts in radiologist-limited settings. Operating point flexibility may permit customization of the DLS to account for site-specific factors such as TB prevalence, demographics, clinical resources, and customary practice patterns.
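The abstract's point about a prespecified operating point that favours sensitivity (against WHO targets of 90% sensitivity and 70% specificity) can be illustrated with a small sketch. It assumes the system outputs a continuous TB score per radiograph and that a case is called positive when the score is at or above a threshold. It picks the highest threshold that still meets a target sensitivity, i.e. the most specific point satisfying the sensitivity goal. The function names and toy scores are hypothetical, and the paper's actual procedure for fixing its operating point is not described here.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called TB-positive.

    Assumes labels use 1 for TB-positive and 0 for TB-negative, and that both
    classes are present (otherwise a denominator would be zero).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def pick_operating_point(scores, labels, target_sensitivity=0.90):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity reaches the target.

    Lowering the threshold can only raise sensitivity (and lower specificity),
    so scanning candidate thresholds from high to low and stopping at the first
    one that meets the target gives the most specific qualifying point.
    """
    for t in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(scores, labels, t)
        if sens >= target_sensitivity:
            return t, sens, spec
    raise ValueError("no threshold reaches the target sensitivity")


if __name__ == "__main__":
    # Hypothetical model scores and ground-truth labels for ten radiographs.
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
    print(pick_operating_point(scores, labels, target_sensitivity=0.90))
```

On these toy numbers, demanding 90% sensitivity forces the threshold down far enough to catch every positive, at the price of lower specificity. This is the same trade-off the abstract describes when it notes that operating-point flexibility could let sites tune the system to local TB prevalence and resources.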
On the Exponential View
The following is the text of a talk I gave in San Francisco on December 1st, 2016. The audience was readers of my newsletter, Exponential View. This is a long (7,500-word) transcript of the talk. You can scan it to see the slides and accompanying exhibits if that is easier. Or even read it in more than one sitting…. Exponential View has a purpose. In between all the emojis and all the spelling mistakes, this is what it's about: This is me on my first day at school back when I was in Zambia in sub-Saharan Africa. On the right is my friend Rehan, with whom I recently reconnected through Facebook. He is now known as Dr. Freeze and he does non-invasive body sculpting in Orange County. So I can get you a good rate. But I think this starting point is important. We are often inspired by where we come from, and what the hell was I doing in Zambia? My dad was trained as an economist and accountant. He is retired now, but back then he was an economist, down in Zambia building the kind of institutions that we take for granted in countries like the U.S. and the U.K. to make the country function. Zambia had just gained independence from the U.K. It needed a deeper civil service, it was having to build its legal system, create its system of distribution and so on. So I got an early exposure to the importance of economic institutions for making societies wealthier and making them work. I was down in Zambia, which is a land-locked country without great access to the sea, and this was the 1970s, so we didn't have a vast range of toys.
The month in games: battle by upvote
Despite the sophistication of their products, video game publishers are just as susceptible as less technically inclined brands to finding their carefully organised media coverage turning on them. This month, the trailer for upcoming game of drones and shooting people Call Of Duty: Infinite Warfare became the second most disliked video in YouTube history, while fellow online first-person shooter, Battlefield 1 became one of the 150 most liked. One reason posited for this vast discrepancy is that players have finally got bored with the glib futurism of many current military games, their fatigue at yet more satellite strikes and exoskeletons brought into sharp relief by Battlefield 1's earthy, steampunk alternate first world war. While there may be an element of truth in that, it's mostly the result of vote-brigading by rabidly contrarian posters on games forums, and demonstrates that even with rigorous planning and budgetary figures normally associated with money laundering operations, you can still be the victim of unintended consequences. Capcom, makers of the Resident Evil series, also found out that publicity can create unpredictable knock-on effects.
AI helps answer thousands of health queries in Zambia via SMS
For many people in Zambia with health queries, sending a text message is the best way to get them answered. U-report, a free SMS-based service set up by UNICEF and run by volunteers, receives many thousands of questions a month, many specifically about HIV and AIDS. Also popular in Uganda, U-report has seen usage triple in the last three years, and about a thousand new users register every day. The volume of messages is growing so fast that the volunteers can't keep up, so UNICEF is testing software that reads and responds to many of the messages automatically. In Zambia, there are roughly 27,000 new HIV infections a year, according to UNICEF, and 40 per cent of these are among people aged 15 to 24.