 video chat


SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models

Neural Information Processing Systems

Achieving fine-grained spatio-temporal understanding in videos remains a major challenge for current Video Large Multimodal Models (Video LMMs). Addressing this challenge requires mastering two core capabilities: video referring understanding, which captures the semantics of video regions, and video grounding, which segments object regions based on natural language descriptions. However, most existing approaches tackle these tasks in isolation, limiting progress toward unified, referentially grounded video interaction. We identify a key bottleneck in the lack of high-quality, unified video instruction data and a comprehensive benchmark for evaluating referentially grounded video chat. To address these challenges, we contribute in three core aspects: dataset, model, and benchmark. First, we introduce SAMA-239K, a large-scale dataset comprising 15K videos specifically curated to enable joint learning of video referring understanding, grounding, and multi-turn video chat. Second, we propose the SAMA model, which incorporates a versatile spatio-temporal context aggregator and a Segment Anything Model to jointly enhance fine-grained video comprehension and precise grounding capabilities. Finally, we establish SAMA-Bench, a meticulously designed benchmark consisting of 5,067 questions from 522 videos, to comprehensively evaluate the integrated capabilities of Video LMMs in multi-turn, spatio-temporal referring understanding and grounded dialogue. Extensive experiments and benchmarking results show that SAMA not only achieves strong performance on SAMA-Bench but also sets a new state-of-the-art on general grounding benchmarks, while maintaining highly competitive performance on standard visual understanding benchmarks.


Make a Video Call with LLM: A Measurement Campaign over Five Mainstream Apps

arXiv.org Artificial Intelligence

In 2025, Large Language Model (LLM) services have launched a new feature -- AI video chat -- allowing users to interact with AI agents via real-time video communication (RTC), just like chatting with real people. Despite its significance, no systematic study has characterized the performance of existing AI video chat systems. To address this gap, this paper proposes a comprehensive benchmark with carefully designed metrics across four dimensions: quality, latency, internal mechanisms, and system overhead. Using custom testbeds, we further evaluate five mainstream AI video chatbots with this benchmark. This work provides the research community a baseline of real-world performance and identifies unique system bottlenecks. In the meantime, our benchmarking results also open up several research questions for future optimizations of AI video chatbots.


The clever tech powering a wave of pig-butchering scams

FOX News

Fox News' Danamarie McNicholl reports alongside the Secret Service as they detect and prevent the use of credit card skimmers, traced to a crime ring led in Eastern Europe. Pig-butchering scams are getting more sophisticated -- and more costly -- by the day. One report found criminals have swindled an estimated $75 billion from victims. And just recently, a criminal organization in Asia was taken down, adding another $46 million to that tally. I've talked to lots of pig-butchering victims.


Amazon's Echo Show 5 falls to $40 in smart display sale

Engadget

Amazon's Echo Show smart displays with Alexa voice control are already a good value next to the competition, but a big smart display sale is making them even cheaper. The Show 5 is the least expensive, on sale right now for just $40, or 53 percent off the regular price -- a great deal for Alexa capability with a display. And if you need a larger screen, the Echo Show 8 is priced at just $60 (54 percent off) and the Echo Show 10 is $160, for a savings of 36 percent. The Echo Show 5 earned a very solid score of 85 in our Engadget review, as its small size is ideal if you don't have a ton of space on your desk, nightstand or countertop. It has a 5.5-inch, 960 x 480 resolution display that shows things like weather forecasts, calendar events, photos and more. The 2MP camera can be used to video chat with friends and family, but it can also be used as a makeshift security camera of sorts.


Soon You'll Be Zooming in Roblox

WIRED

Right around the time Meta started making a feverish pitch for the headset-powered metaverse, executives at other tech companies began piping up to point out that the metaverse could already be accessed through hugely popular mobile apps like Fortnite and Roblox. People love these apps--especially kids and teens. Who needs a full-face computer when you can easily spend hours chatting with friends using the screens you already have? Now Roblox, which isn't just a game but an entire platform of user-generated video games, is adding more power to its metaverse punch. Starting in November, Roblox plans to launch an immersive video-chat option for gamers, Roblox chief executive David Baszucki said in an exclusive interview with WIRED ahead of the company's developers conference this week.


How to avoid the worst dating app scammers

FOX News

You can help prevent others from falling victim to the same romance scam and remember if something seems too good to be true. Get ready for this quick heartbreaking story about love gone wrong from a crafty and callous global dating scam artist. I recently received an email from Linda, who is concerned and wondering if she should worry about falling for a scam from a person she's been talking to online. Here's what she had to say: "I have been in contact with a man who is a Structural Engineer that says he lives and has his office in Wisconsin, but currently is in Dubai overseeing the construction of buildings that he was awarded a contract to build, we talk on the phone all the time and text all the time. He has shared everything that I have asked.


I've Unlocked the Secret to Making First Dates (Mostly) Bearable

Slate

Earlier this year, Zoom announced a Byzantine policy change that, if I thought about it at all when it happened, I probably would have expected to have almost no impact on my life: One-on-one video calls, which had previously been free and unrestricted for all non-paying users of its platform, would now have a 40-minute time limit just like group calls. A bummer for thrifty Zoom power users, perhaps, but at the time, I was blessed to live an existence of only sporadic Zooming. Then a few months ago I had occasion to start using Zoom a little more frequently. I would love to leave the reasons for this sudden Zoomassaince vague and retain one emotional support shred of dignity, but there's no real way to explain the rest without disclosing the following: I had decided it was time to "get back out there" and was using Zoom to go on video dates. After meeting and chatting with people on dating apps, I would suggest we talk on video before actually getting together in person, et voilà: video date.


Tinder launches 'Face to Face' video calls

Daily Mail - Science & tech

Locked down singles in Britain looking for love on Tinder, one of the world's most popular dating apps, can now video chat with their matches. Tinder has announced it is rolling out its 'Face to Face' feature to its global customer base today. To prevent creeps and weirdos exploiting the feature to berate or harass their matches, video calling only becomes available when both parties opt in. It is designed to be used to complement and boost conversation once a spark has been established.


Proliferation Of Machine Learning Video Chat In Relationships

#artificialintelligence

Machine learning is becoming more important in our daily lives. But most of us probably never envisioned a day when it would be important in online dating or the beginning of new relationships. A growing number of video chat services are utilizing machine learning features in interesting ways. MarTech Series published an article last year on the growing relevance of machine learning in video conferencing. The same principles can be just as applicable to video chats with online dating services.


Zoom finally conferences in Alexa, Google and Facebook on Echo Show, Portal and Nest Hub Max

USATODAY - Tech Top Stories

After frustrating stay-at-home workers and parents looking for an easy way to connect their kids to Zoom because it wasn't available, the world's most popular video meeting application is finally coming to Amazon, Google and Facebook video display units. The Echo Show, Google Nest Hub Max and Facebook Portal were originally released as a way for folks to engage in video chatting and home entertainment without the bother of turning on the computer, phone or TV. That there would be a coronavirus that sent Zoom usage up 10x in less than a year wasn't foreseen, nor the need to have a dedicated video display that could handle one-touch setup for the meetings and classes, without the bother of phones and computers. But now, Zoom is finally coming to the devices, first to Portal, in September, the latest version of the Amazon Echo Show (version 8) and Google's Nest Hub Max, but not until later this year.