 Kanchanaburi


CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs

Kowshik, Suhas S, Divekar, Abhishek, Malik, Vijit

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable performance in diverse tasks using zero-shot and few-shot prompting. Although their data synthesis capabilities have been well studied in recent years, the generated data suffers from a lack of diversity, weaker adherence to the prompt, and potential biases that creep into the data from the generator model. In this work, we tackle the challenge of generating datasets with high diversity, upon which a student model is trained for downstream tasks. Taking the route of decoding-time guidance-based approaches, we propose CorrSynth, which generates data that is more diverse and faithful to the input prompt using a correlated sampling strategy. Further, our method overcomes the complexity drawbacks of some other guidance-based techniques like classifier-based guidance. With extensive experiments, we show the effectiveness of our approach and substantiate our claims. In particular, we perform intrinsic evaluation to show the improvements in diversity. Our experiments show that CorrSynth improves both student metrics and intrinsic metrics over competitive baselines across four datasets, showing the innate advantage of our method.
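
The abstract does not spell out the sampling rule, so the following is only a minimal sketch of one generic form of decoding-time contrastive guidance, not the authors' exact method. It assumes two generations run in parallel under different prompts (here, hypothetical "positive" and "negative" class prompts) and that each one's next-token distribution is reweighted against the other's. The function names, the `gamma` strength parameter, and the toy probabilities are all illustrative assumptions.

```python
import math
import random
from typing import Dict


def contrast(p_self: Dict[str, float], p_other: Dict[str, float], gamma: float) -> Dict[str, float]:
    """Reweight p_self against p_other.

    Returns a normalized distribution proportional to
    p_self * (p_self / p_other) ** gamma. With gamma = 0 this is p_self.
    """
    scores = {
        tok: (1.0 + gamma) * math.log(p_self[tok]) - gamma * math.log(p_other[tok])
        for tok in p_self
    }
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def sample(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from a normalized distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Toy next-token distributions for two parallel generations (made-up numbers).
p_pos = {"great": 0.5, "movie": 0.4, "awful": 0.1}
p_neg = {"great": 0.1, "movie": 0.4, "awful": 0.5}

adjusted_pos = contrast(p_pos, p_neg, gamma=1.0)
adjusted_neg = contrast(p_neg, p_pos, gamma=1.0)

rng = random.Random(0)
next_pos_token = sample(adjusted_pos, rng)
next_neg_token = sample(adjusted_neg, rng)
```

In this toy setting, tokens that both generations consider likely ("movie") are down-weighted, while tokens distinctive to one prompt gain probability mass. That is one way correlated generations could be pushed apart without a separate classifier.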


Injured hornbill found in Thailand can eat again after vets fit a new beak made with a 3D printer

Daily Mail - Science & tech

An injured hornbill that was found in Thailand with part of its beak snapped off can now eat again after vets fitted it with a replacement made using a 3D printer. The adult bird -- dubbed 'Coco' -- was found sprawled on the ground with a broken wing and its lower bill missing in Kanchanaburi, western Thailand, on April 18. Wildlife officers are unsure how Coco was injured, but believe that she may have been shot or attacked by hunters or poachers and then left for dead in the forest. Although veterinarians were able to give urgent care and stabilise the bird, they were sadly unable to find its missing bill in order to reattach it. Realising it would be impossible for Coco to eat without her signature long bill, they scanned her body and used 3D printing technology to create plastic replacements.


HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset

Kiskin, Ivan, Cobb, Adam D., Wang, Lawrence, Roberts, Stephen

arXiv.org Machine Learning

Mosquitoes are the only known vector of malaria, which leads to hundreds of thousands of deaths each year. Understanding the number and location of potential mosquito vectors is of paramount importance to aid the reduction of malaria transmission cases. In recent years, deep learning has become widely used for bioacoustic classification tasks. In order to enable further research applications in this field, we release a new dataset of mosquito audio recordings. With over a thousand contributors, we obtained 195,434 labels of two-second duration, of which approximately 10 percent signify mosquito events. We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels. We hope this will become a vital resource for those researching all aspects of malaria, and add to the existing audio datasets for bioacoustic detection and signal processing.


Personalizing Image Search Results on Flickr

Lerman, Kristina, Plangprasopchok, Anon, Wong, Chio

arXiv.org Artificial Intelligence

The social media site Flickr allows users to upload their photos, annotate them with tags, submit them to groups, and also to form social networks by adding other users as contacts. Flickr offers multiple ways of browsing or searching it. One option is tag search, which returns all images tagged with a specific keyword. If the keyword is ambiguous, e.g., "beetle" could mean an insect or a car, tag search results will include many images that are not relevant to the sense the user had in mind when executing the query. We claim that users express their photography interests through the metadata they add in the form of contacts and image annotations. We show how to exploit this metadata to personalize search results for the user, thereby improving search performance. First, we show that we can significantly improve search precision by filtering tag search results by the user's contacts or a larger social network that includes those contacts' contacts. Secondly, we describe a probabilistic model that takes advantage of tag information to discover latent topics contained in the search results. The users' interests can similarly be described by the tags they used for annotating their images. The latent topics found by the model are then used to personalize search results by finding images on topics that are of interest to the user.
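
The contact-based filtering step described above might look roughly like the sketch below. It assumes that "filtering by contacts" means keeping only images whose owner is in the searching user's network; the data layout, function names, and example users are hypothetical and do not come from Flickr's API or the paper.

```python
from typing import Dict, List, Set, Tuple

Contacts = Dict[str, Set[str]]  # user -> set of that user's contacts
Hit = Tuple[str, str]           # (image_id, owner) returned by a tag search


def extended_network(user: str, contacts: Contacts) -> Set[str]:
    """Direct contacts plus their contacts, excluding the user."""
    direct = contacts.get(user, set())
    network = set(direct)
    for contact in direct:
        network |= contacts.get(contact, set())
    network.discard(user)
    return network


def filter_hits(hits: List[Hit], allowed_owners: Set[str]) -> List[Hit]:
    """Keep only hits whose owner is in the allowed set, preserving order."""
    return [hit for hit in hits if hit[1] in allowed_owners]


# Made-up example: alice searches for the ambiguous tag "beetle".
contacts = {"alice": {"bob"}, "bob": {"alice", "carol"}}
hits = [
    ("beetle_car.jpg", "dave"),
    ("beetle_insect.jpg", "bob"),
    ("beetle_macro.jpg", "carol"),
]

by_contacts = filter_hits(hits, contacts["alice"])
by_network = filter_hits(hits, extended_network("alice", contacts))
```

Filtering by direct contacts keeps only bob's image, while the extended network also admits carol's. Widening the network trades some precision for more results.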