On The Landscape of Spoken Language Models: A Comprehensive Survey

arXiv.org Artificial Intelligence

The field of spoken language processing is undergoing a shift from training custom-built, task-specific models toward using and optimizing spoken language models (SLMs) which act as universal speech processing systems. This trend is similar to the progression toward universal language models that has taken place in the field of (text) natural language processing. SLMs include both "pure" language models of speech -- models of the distribution of tokenized speech sequences -- and models that combine speech encoders with text language models, often including both spoken and written input or output. Work in this area is very diverse, with a range of terminology and evaluation settings. This paper aims to contribute an improved understanding of SLMs via a unifying literature survey of recent work in the context of the evolution of the field. Our survey categorizes the work in this area by model architecture, training, and evaluation choices, and describes some key challenges and directions for future work.


BOISHOMMO: Holistic Approach for Bangla Hate Speech

arXiv.org Artificial Intelligence

One of the most alarming issues in digital society is hate speech (HS) on social media. The severity is so high that researchers across the globe are captivated by this domain. A notable amount of work has been conducted to address the identification and alarm system. However, a noticeable gap exists, especially for low-resource languages. Comprehensive datasets are the main problem among the constrained resource languages, such as Bangla. Interestingly, hate speech or any particular speech has no single dimensionality. Similarly, the hate component can simultaneously have multiple abusive attributes, which seems to be missed in the existing datasets. Thus, a multi-label Bangla hate speech dataset named BOISHOMMO has been compiled and evaluated in this work. That includes categories of HS across race, gender, religion, politics, and more. With over two thousand annotated examples, BOISHOMMO provides a nuanced understanding of hate speech in Bangla and highlights the complexities of processing non-Latin scripts. Apart from evaluating with multiple algorithmic approaches, it also highlights the complexities of processing Bangla text and assesses model performance. This unique multi-label approach enriches future hate speech detection and analysis studies for low-resource languages by providing a more nuanced, diverse dataset.


LLM for Comparative Narrative Analysis

arXiv.org Artificial Intelligence

In this paper, we conducted a Multi-Perspective Comparative Narrative Analysis (CNA) on three prominent LLMs: GPT-3.5, PaLM2, and Llama2. We applied identical prompts and evaluated their outputs on specific tasks, ensuring an equitable and unbiased comparison between various LLMs. Our study revealed that the three LLMs generated divergent responses to the same prompt, indicating notable discrepancies in their ability to comprehend and analyze the given task. Human evaluation was used as the gold standard, evaluating four perspectives to analyze differences in LLM performance.


TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation

arXiv.org Artificial Intelligence

Human-centric motion control in video generation remains a critical challenge, particularly when jointly controlling camera movements and human poses in scenarios like the iconic Grammy Glambot moment. While recent video diffusion models have made significant progress, existing approaches struggle with limited motion representations and inadequate integration of camera and human motion controls. In this work, we present TokenMotion, the first DiT-based video diffusion framework that enables fine-grained control over camera motion, human motion, and their joint interaction. We represent camera trajectories and human poses as spatio-temporal tokens to enable local control granularity. Our approach introduces a unified modeling framework utilizing a decouple-and-fuse strategy, bridged by a human-aware dynamic mask that effectively handles the spatially-and-temporally varying nature of combined motion signals. Through extensive experiments, we demonstrate TokenMotion's effectiveness across both text-to-video and image-to-video paradigms, consistently outperforming current state-of-the-art methods in human-centric motion control tasks. Our work represents a significant advancement in controllable video generation, with particular relevance for creative production applications.


SPHERE: An Evaluation Card for Human-AI Systems

arXiv.org Artificial Intelligence

In the era of Large Language Models (LLMs), establishing effective evaluation methods and standards for diverse human-AI interaction systems is increasingly challenging. To encourage more transparent documentation and facilitate discussion on human-AI system evaluation design options, we present an evaluation card SPHERE, which encompasses five key dimensions: 1) What is being evaluated?; 2) How is the evaluation conducted?; 3) Who is participating in the evaluation?; 4) When is evaluation conducted?; 5) How is evaluation validated? We conduct a review of 39 human-AI systems using SPHERE, outlining current evaluation practices and areas for improvement. We provide three recommendations for improving the validity and rigor of evaluation practices.


AI humanoid robot learns to mimic human emotions and behavior

FOX News

Ready for a robot that not only looks human but also acts and reacts like one, expressing emotions like shyness, excitement or friendliness? Disney Research, the innovation powerhouse behind The Walt Disney Company, has turned this into reality. Its latest creation is an autonomous humanoid robot that can mimic human emotions and behaviors in real time. Think of it as a real-life WALL-E, but with even more personality. This groundbreaking robot uses advanced artificial intelligence to replicate natural gestures and deliberate actions with striking accuracy.


Jeff Bridges Is Digging It

The New Yorker

The interior of Jeff Bridges's garage, in Santa Barbara, California, has the ramshackle ease of an extravagant dorm room: a tiger-print rug, a potter's wheel, guitars, a rogue toothbrush, taped-up printouts of ideas he finds provocative or perhaps grounding ("Enlightenment is a communal experience"), and piles of books, from Richard Powers's "Bewilderment" to "Who Cares?! A black-and-white portrait of Captain Beefheart, incongruously dressed in a jacket and tie, hangs on a wall near an electric piano. When I arrived, on a recent afternoon, I did not take note of a lava lamp, but its presence didn't feel out of the question. Bridges was wearing rubber slides and a periwinkle-blue cardigan. He excitedly spread out a large furry blanket on a recliner and invited me to sit down: "Your throne, man!" he said. Earlier this month, Bridges released "Slow Magic, 1977-1978," a series of songs he recorded when he was in his late twenties, an emergent movie star, and involved in a regular Wednesday-night jam session with a coterie of musicians and oddballs from the west side of Los Angeles (the jams were organized by Steve Baim, who attended University High School with Bridges; they took place in various beach houses and, occasionally, at the Village, the recording studio where, around the same time, Fleetwood Mac was making "Tusk"). "Slow Magic" is great and also bonkers. On "Kong," Bridges recounts a story line he pitched for a potential "King Kong" sequel (in 1976, Bridges starred as the long-haired primatologist Jack Prescott in a "Kong" remake produced by Dino De Laurentiis); the track features animated narration from the actor Burgess Meredith, and its lyrics are centered on the revelation that Kong is actually a robot. "It's a sad story, but he was just a monkey machine!" Bridges wails in a tottering falsetto. 
On "Obnoxious," a weirdly tender song about feeling sad and having a stomachache ("I went to the bathroom / And threw up"), there are echoes of Frank Zappa and the Band. What I like most about the record is how social it feels: friends in a room, being dumb, intermittently (even inadvertently) doing something miraculous. "When recording technology kept improving, I said, 'Oh, I don't need anybody!


Pro-life journalist assaulted on street assigns blame to Democratic rhetoric

FOX News

'Live Action' journalist Savannah Craven Antao speaks out after being punched by an interviewee on 'The Will Cain Show.' Pro-life activist Savannah Craven Antao believes the Democratic Party's recent rhetoric about "punching" at their Republican opponents contributed to the attack that left her bloody during a recent interview. Antao, a young pro-life influencer who was punched in the face by a woman she was interviewing in New York City earlier this month, pointed to Rep. Jasmine Crockett's, D-Texas, recent line about Democrats "punching" as inspiring the attack that happened to her. "She said, 'I think that you punch,'" Antao told Fox News Digital. "'I think you're okay with punching.' So yeah – pretty much just describes the left at this point. They're totally fine with just using force like that to hurt people if they don't agree with them."


Doctor Who 'The Robot Revolution' review: Meet Belinda Chandra

Engadget

The start of any season of Doctor Who is important, doubly so when there's a new co-star to introduce. "The Robot Revolution" has to get us to fall in love with Belinda Chandra (Varada Sethu), ensnare new fans and keep existing ones hooked. Especially since it's the second of two series that Disney paid for, meaning it's got to do well enough to keep the money flowing. It's an awkward teenage date, with Alan clearly trying to win the heart of his beau by buying her one of those star adoption certificates. In 2025, Belinda is now a nurse at a busy London hospital where, in the background, the Doctor is searching for her.


Netflix is reportedly testing a search function powered by OpenAI

Engadget

Netflix has started testing a new search feature powered by OpenAI that can help customers find movies and shows to watch, according to Bloomberg. The streaming service has reportedly given select users in Australia and New Zealand the option to use the tool. It will allow users to search for terms other than a specific show's title, an actor's name or the genre they want to watch. Bloomberg says it will give them a way to search for content using more specific terms, like their mood. Presumably, that means the service can surface dramatic shows for a search query that says "sad," and seeing as it's powered by generative AI, users will most likely be able to use natural language in their search terms.