tarr
LLM Stability: A detailed analysis with some surprises
Atil, Berk; Chittams, Alexa; Fu, Liseng; Ture, Ferhan; Xu, Lixinyu; Baldwin, Breck
LLM (large language model) practitioners commonly notice that outputs can vary for the same inputs, but we have been unable to find work that evaluates LLM stability as the main objective. In our study of 6 deterministically configured LLMs across 8 common tasks with 5 identical runs, we see accuracy variations up to 10%. In addition, no LLM consistently delivers repeatable accuracy across all tasks. We also show examples of variation that are not normally distributed and compare configurations with zero-shot/few-shot prompting and fine-tuned examples. To better quantify what is going on, we introduce metrics focused on stability: TARr@N for the total agreement rate at N runs over raw output, and TARa@N for total agreement over parsed-out answers. We suggest that stability metrics be integrated into leaderboards and research results going forward.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Monaco (0.04)
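The two stability metrics named in the LLM Stability abstract above can be illustrated with a minimal Python sketch. It assumes that "total agreement" for an item means all N runs produced the same string for that item, and that the rate is the fraction of items with total agreement; the abstract does not spell out the exact formula, so this is an interpretation rather than the authors' implementation. The function names, the `parse_answer` helper, and the "(A)"-style answer format are hypothetical and used only for illustration.

```python
import re
from typing import Callable, List, Sequence


def total_agreement_rate(
    runs: Sequence[Sequence[str]],
    key: Callable[[str], str] = lambda s: s,
) -> float:
    """Fraction of items on which all N runs agree after applying `key`.

    `runs` holds N runs; each run is a list of outputs, one per item,
    in the same item order. With the identity `key` this corresponds to
    agreement over raw output (TARr@N); with an answer parser as `key`
    it corresponds to agreement over parsed answers (TARa@N).
    """
    if not runs or not runs[0]:
        raise ValueError("need at least one run with at least one item")
    n_items = len(runs[0])
    if any(len(run) != n_items for run in runs):
        raise ValueError("all runs must cover the same number of items")

    # zip(*runs) groups the N outputs that belong to the same item.
    agreeing = sum(
        1 for outputs in zip(*runs) if len({key(o) for o in outputs}) == 1
    )
    return agreeing / n_items


def parse_answer(raw: str) -> str:
    """Hypothetical parser: take the last '(A)'..'(D)' letter in the output.

    Falls back to the stripped raw text when no such letter is found.
    """
    letters: List[str] = re.findall(r"\(([A-D])\)", raw)
    return letters[-1] if letters else raw.strip()


# Made-up outputs for 3 runs over 4 multiple-choice items.
runs = [
    ["The answer is (A)", "(B)", "I pick (C).", "(D)"],
    ["Answer: (A)", "(B)", "I pick (C).", "(A)"],
    ["The answer is (A)", "(B)", "I pick (C).", "(D)"],
]

tar_raw = total_agreement_rate(runs)                       # TARr@3
tar_answer = total_agreement_rate(runs, key=parse_answer)  # TARa@3
print(tar_raw, tar_answer)  # 0.5 0.75
```

In this toy data the first item is worded differently across runs but always parses to the same letter, so it counts against raw agreement and not against answer agreement. The fourth item differs in the parsed answer itself, so it counts against both.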
Irish teenager wins national award for 'deepfake' video detector
A teenage student in Ireland has won a national science competition for developing technology that can more easily detect "deepfake" videos online. Greg Tarr, from County Cork, was declared the winner of the 2021 BT Young Scientist & Technologist of the Year award last week for his project, "Towards Deepfake Detection". The picture or audio of deepfake videos is altered by artificial intelligence (AI) to make it appear as though someone has said or done something they have not. The viral spread of deepfake videos has caused international concern, in an age of digital news consumption, and social media companies have come under renewed scrutiny on how to tackle the spread of this misinformation. An altered video, claiming to show US President-elect Joe Biden falling asleep during a television interview, was widely shared before November's election.
- North America > United States (0.61)
- Europe > Ireland > Munster > County Cork (0.28)
Dataset bridges human vision and machine learning
Neuroscientists and computer vision scientists say a new dataset of unprecedented size--comprising brain scans of four volunteers who each viewed 5,000 images--will help researchers better understand how the brain processes images. Researchers at Carnegie Mellon University and Fordham University, reporting today in the journal Scientific Data, said acquiring functional magnetic resonance imaging (fMRI) scans at this scale presented unique challenges. Each volunteer participated in 20 or more hours of MRI scanning, challenging both their perseverance and the experimenters' ability to coordinate across scanning sessions. The extreme design decision to run the same individuals over so many sessions was necessary for disentangling the neural responses associated with individual images. The resulting dataset, dubbed BOLD5000, allows cognitive neuroscientists to better leverage the deep learning models that have dramatically improved artificial vision systems.
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)