 sheldon


AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models

arXiv.org Artificial Intelligence

The creation of high-quality multimodal datasets remains fundamental for advancing role-playing capabilities in large language models (LLMs). While existing works predominantly focus on text-based persona simulation, Audio Role-Playing (ARP) presents unique challenges due to the need for synchronized alignment of semantic content and vocal characteristics. To address this gap, we propose AudioRole, a meticulously curated dataset from 13 TV series spanning 1K+ hours with 1M+ character-grounded dialogues, providing synchronized audio-text pairs annotated with speaker identities and contextual metadata. In addition, to demonstrate the effectiveness of the dataset, we introduce ARP-Eval, a dual-aspect evaluation framework that assesses both response quality and role fidelity. Empirical validation shows that GLM-4-Voice trained on AudioRole (which we call ARP-Model) achieves an average Acoustic Personalization score of 0.31, significantly outperforming the original GLM-4-Voice and the more powerful model MiniCPM-O-2.6, which specifically supports role-playing in one-shot scenarios. The ARP-Model also achieves a Content Personalization score of 0.36, surpassing the untrained original model by about 38% and matching MiniCPM-O-2.6. AudioRole features dialogues from over 115 main characters, 6 trained ARP-Models that role-play different characters, and evaluation protocols. Together, they provide an essential resource for advancing audio-grounded role-playing research.


Amazon, Google and Meta are 'pillaging culture, data and creativity' to train AI, Australian inquiry finds

The Guardian

Tech companies Amazon, Google and Meta have been criticised by a Senate select committee inquiry for being especially vague over how they used Australian data to train their powerful artificial intelligence products. Labor senator Tony Sheldon, the inquiry's chair, was frustrated by the multinationals' refusal to answer direct questions about their use of Australians' private and personal information. "Watching Amazon, Meta, and Google dodge questions during the hearings was like sitting through a cheap magic trick – plenty of hand-waving, a puff of smoke, and nothing to show for it in the end," Sheldon said in a statement, after releasing the final report of the inquiry on Tuesday. He called the tech companies "pirates" that were "pillaging our culture, data, and creativity for their gain while leaving Australians empty-handed." The report found some general-purpose AI models – such as OpenAI's GPT, Meta's Llama and Google's Gemini – should automatically default to a "high risk" category, and be subjected to mandated transparency and accountability requirements.


Reviews: Tomography of the London Underground: a Scalable Model for Origin-Destination Data

Neural Information Processing Systems

I thank the authors for the clarification in their rebuttal. It is even more clear that the authors should better contrast their work with aggregate approaches such as Dan Sheldon's collective graphical models (e.g., Sheldon and Dietterich (2011), Kumar et al. 2013, Bernstein and Sheldon 2016). Part of the confusion came from some of the modeling choices: In equation (1) the travel time added by one station is Poisson distributed?! Poisson is often used for link loads (how many people there are in a given station), not to model time. Is the quantization of time too coarse for a continuous-time model? Wouldn't a phase-type distribution (e.g., Erlang) be a better choice for time? Such modeling choices must be explained.
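To make the reviewer's distinction concrete, the following is a minimal sketch, not the paper's model: it contrasts a Poisson variable (integer counts) with an Erlang variable (a continuous time built as a sum of exponential stages) at the same mean. The stage count `k`, the rate, and the function names are hypothetical values chosen for illustration only.

```python
import math
import random


def erlang_sample(k, rate, rng):
    """Draw one Erlang(k, rate) time as the sum of k exponential stage times."""
    return sum(rng.expovariate(rate) for _ in range(k))


def poisson_pmf(n, mean):
    """Probability that a Poisson variable with the given mean equals n."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


# Hypothetical parameters: 3 stages, each completed at 1.5 per minute.
k, rate = 3, 1.5
rng = random.Random(0)
samples = [erlang_sample(k, rate, rng) for _ in range(20000)]

erlang_mean = sum(samples) / len(samples)
erlang_var = sum((x - erlang_mean) ** 2 for x in samples) / len(samples)

# A Poisson variable with the same mean has variance equal to that mean
# and only takes integer values, so it cannot represent fractional minutes.
poisson_mean = k / rate
poisson_var = poisson_mean
poisson_total = sum(poisson_pmf(n, poisson_mean) for n in range(51))

print(f"Erlang : mean={erlang_mean:.3f}, variance={erlang_var:.3f}")
print(f"Poisson: mean={poisson_mean:.3f}, variance={poisson_var:.3f}")
```

With the same mean, the Erlang variance is k / rate^2 while the Poisson variance is tied to the mean, so the two choices imply different spreads as well as different supports (continuous versus integer).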


Meta's AI is scraping users' photos and posts. Europeans can opt out, but Australians cannot

The Guardian

Meta is using the public Facebook and Instagram photos and posts of its users to train artificial intelligence and, while European users have been allowed to opt out of the mass-scraping of their content, Australian users do not have that option, a parliamentary committee has heard. The parent company of Facebook and Instagram paused the launch of its AI product in Europe in July because of the General Data Protection Regulation (GDPR) privacy rules. As a result of GDPR law, Meta was ordered to stop training its large language model on data from European users over privacy concerns, and Meta has given European users an opt-out option. Labor's chair of the inquiry examining AI adoption in Australia, senator Tony Sheldon, questioned Meta executives on Tuesday why that option had not been extended to Australian users. "I'll be very frank with you. I'd like to opt out in Australia … and I'd like to have the options similar to Europe, for all Australians, including for myself personally. Why can't I have that option?"


Hi Sheldon! Creating Deep Personalized Characters from TV Shows

arXiv.org Artificial Intelligence

Imagine an interesting multimodal interactive scenario in which you can see, hear, and chat with an AI-generated digital character who is capable of behaving like Sheldon from The Big Bang Theory, as a DEEP copy from appearance to personality. Towards this fantastic multimodal chatting scenario, we propose a novel task, named Deep Personalized Character Creation (DPCC): creating multimodal chat personalized characters from multimodal data such as TV shows. Specifically, given a single- or multi-modality input (text, audio, video), the goal of DPCC is to generate a multi-modality (text, audio, video) response, which should be well matched to the personality of a specific character such as Sheldon, and of high quality as well. To support this novel task, we further collect a character-centric multimodal dialogue dataset, named Deep Personalized Character Dataset (DPCD), from TV shows. DPCD contains character-specific multimodal dialogue data of ~10k utterances and ~6 hours of audio/video per character, which is around 10 times larger than existing related datasets. On DPCD, we present a baseline method for the DPCC task and create 5 Deep personalized digital Characters (DeepCharacters) from Big Bang TV Shows. We conduct both subjective and objective experiments to evaluate the multimodal response from DeepCharacters in terms of characterization and quality. The results demonstrate that, on our collected DPCD dataset, the proposed baseline can create personalized digital characters for generating multimodal response. Our collected DPCD dataset, the code of data collection and our baseline will be published soon.


Voila raises $6M for its A.I.-powered storefronts for online creators – TechCrunch

#artificialintelligence

Voila, a startup building infrastructure for social commerce, is bringing concepts from China's e-commerce market to the U.S. The company offers an alternative to the "link in bio" solutions used today by creators, like Linktree and Beacons, which direct followers to creators' social profiles, personal websites, and other recommendations. Instead of a link list or landing page, Voila creates A.I.-powered customizable, shoppable storefronts by automatically detecting items in the creators' online content then generating shoppable links. With now over 10,000 creators signed up for the service, Voila is today announcing the close of its $6 million Series A led by Sinnovation Ventures and joined by Fosun Rz Capital. To date, Voila has raised $7.5 million, including from investors SOSV and Artesian. Voila founder Ke Shang first moved from China to the U.S. to attend college.


SCROLLS: Standardized CompaRison Over Long Language Sequences

arXiv.org Artificial Intelligence

NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder, indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.


Using Artificial Intelligence to Track Birds' Dark-of-Night Migrations - insideBIGDATA

#artificialintelligence

On many evenings during spring and fall migration, tens of millions of birds take flight at sunset and pass over our heads, unseen in the night sky. Though these flights have been recorded for decades by the National Weather Service's network of constantly scanning weather radars, until recently these data have been mostly out of reach for bird researchers. That's because the sheer magnitude of information and lack of tools to analyze it made only limited studies possible, says artificial intelligence (AI) researcher Dan Sheldon at the University of Massachusetts Amherst. Ornithologists and ecologists with the time and expertise to analyze individual radar images could clearly see patterns that allowed them to discriminate precipitation from birds and study migration, he adds. But the massive amount of information – over 200 million images and hundreds of terabytes of data – significantly limited their ability to sample enough nights, over enough years and in enough locations to be useful in characterizing, let alone tracking, seasonal, continent-wide migrations, he explains.


AI tracks migratory birds using weather radar

#artificialintelligence

Tens of millions of birds make migratory flights for the winter each year, often flying during nighttime. They're frequently spotted by the National Weather Service's network of 159 ground-based radars, which scan the skies every 4 to 10 minutes by emitting pulses of microwaves and measuring their reflections. However, ecologists have historically struggled to make use of the resulting data sets because of their sheer magnitude, which can range up to hundreds of millions of images and hundreds of terabytes over decades. In an effort to lighten the workload, scientists at Cornell's Lab of Ornithology and the University of Massachusetts' College of Information and Computer Sciences recently investigated an AI system capable of distinguishing birds in radar images from precipitation. They say that their tool, dubbed MistNet after the fine nets ornithologists use to capture migratory songbirds, not only aids with classification tasks, but can be used to estimate birds' flying velocity and traffic rates.


Artificial intelligence helps scientists track birds migrating at night

#artificialintelligence

It's difficult to track birds flying across the sky in the dark of night, but every fall and spring, millions of birds migrate through the night. Weather radar can offer a spotty view of the phenomenon, but to track nighttime migrations with greater accuracy and reliability, a group of researchers at the University of Massachusetts at Amherst turned to artificial intelligence. Scientists designed a machine-learning algorithm to analyze weather radar images and differentiate migrating birds from precipitation. The algorithm replicates the power of neural networks to analyze and classify radar images. Researchers used the new artificial intelligence program to survey decades-long radar data sets, revealing seasonal and continent-wide migration patterns.