 vlogger


CAViAR: Critic-Augmented Video Agentic Reasoning

Menon, Sachit, Iscen, Ahmet, Nagrani, Arsha, Weyand, Tobias, Vondrick, Carl, Schmid, Cordelia

arXiv.org Artificial Intelligence

Video understanding has seen significant progress in recent years, with models' performance on perception from short clips continuing to rise. Yet, multiple recent benchmarks, such as LVBench, Neptune, and ActivityNet-RTL, show that performance wanes for tasks requiring complex reasoning on videos as queries grow more complex and videos grow longer. In this work, we ask: can existing perception capabilities be leveraged to successfully perform more complex video reasoning? In particular, we develop a large language model agent given access to video modules as subagents or tools. Rather than following a fixed procedure to solve queries as in previous work such as Visual Programming, ViperGPT, and MoReVQA, the agent uses the results of each call to a module to determine subsequent steps. Inspired by work in the textual reasoning domain, we introduce a critic to distinguish between instances of successful and unsuccessful sequences from the agent. We show that the combination of our agent and critic achieves strong performance on the previously mentioned datasets.


Vlogger: Make Your Dream A Vlog

Zhuang, Shaobin, Li, Kunchang, Chen, Xinyuan, Wang, Yaohui, Liu, Ziwei, Qiao, Yu, Wang, Yali

arXiv.org Artificial Intelligence

In this work, we present Vlogger, a generic AI system for generating a minute-level video blog (i.e., vlog) from user descriptions. Unlike short videos lasting a few seconds, a vlog often contains a complex storyline with diversified scenes, which is challenging for most existing video generation approaches. To break through this bottleneck, Vlogger leverages a Large Language Model (LLM) as Director and decomposes the long video generation task of a vlog into four key stages, where we invoke various foundation models to play the critical roles of vlog professionals, including (1) Script, (2) Actor, (3) ShowMaker, and (4) Voicer. With this design, which mimics human vlog production, Vlogger can generate vlogs through explainable cooperation of top-down planning and bottom-up shooting. Moreover, we introduce a novel video diffusion model, ShowMaker, which serves as a videographer in Vlogger for generating the video snippet of each shooting scene. By incorporating Script and Actor attentively as textual and visual prompts, it can effectively enhance spatial-temporal coherence in the snippet. In addition, we design a concise mixed training paradigm for ShowMaker, boosting its capacity for both T2V generation and prediction. Finally, extensive experiments show that our method achieves state-of-the-art performance on zero-shot T2V generation and prediction tasks. More importantly, Vlogger can generate vlogs of over 5 minutes from open-world descriptions, without loss of video coherence on script and actor. The code and model are available at https://github.com/zhuangshaobin/Vlogger.


First look: DJI's GoPro killer, Osmo Action leaves GoPro with no worries

USATODAY - Tech Top Stories

DJI is treading on GoPro's turf with their new Osmo Action camera featuring their "rock steady" stabilization. How does it compare with the Hero 7? Robert Hanashiro, USA TODAY And it acts like it as well, with a caveat: The company that makes it, Chinese drone-maker DJI, says video footage on its Osmo Action camera is smoother than on the GoPro. We tried the camera in a back-to-back shootout with the GoPro Hero 7 Black, and the "RockSteady" footage isn't steadier and is, in fact, barely as smooth as the GoPro. But it is less wild and saturated, meaning that if you use an action camera to shoot footage of human beings, as opposed to surfing, cycling, hiking and the like, your subjects are going to look better. My guess is, though, that you want the camera for action, not portraits.


This AI-Enabled Camera is Perfect for Vloggers

#artificialintelligence

For everyone out there who wants to capture all the cool things they do in motion.


Sony's mid-range A6400 mirrorless camera is ideal for vloggers

Engadget

Sony has boosted its mid-range APS-C lineup with the launch of the 24-megapixel A6400 mirrorless camera. It looks much the same as its predecessor, the A6300, but has much-improved specs and should be especially well suited to vloggers, thanks to 4K 30fps video and a flip-up touch screen. The A6400 is also getting a bunch of features from its full-frame A7 III and A9 siblings, like 425-point contrast- and phase-detect autofocus with the "world's fastest" 0.02-second acquisition speed. It can handle high-speed continuous shooting at up to 11 fps with the mechanical shutter or 8 fps in silent shooting mode, both with continuous autofocus and auto-exposure tracking (you can capture up to 116 JPG frames or 46 RAW before the buffer fills up). As for low-light performance, you can shoot at up to 32,000 ISO or 102,400 expanded with less noise, Sony notes.


Sony's RX100 VI compact hides a ridiculous zoom lens

Engadget

Sony has unveiled the RX100 VI 20.1-megapixel premium compact camera with a new telephoto lens that will make it a lot more interesting for travel photography. Compared with the lens of the last model, the Mark VI packs a hugely longer, optically stabilized 24-200mm f/2.8-4.5 8.3X zoom lens. It is not as fast as the previous lens, but the extra range will be worth it for many folks, and Sony has boosted other specs to make up for it. The RX100 VI is equipped with a 1.0-type Exmor RS CMOS sensor with a built-in DRAM chip to maximize speed. It can shoot at 24 fps with full AF/AE tracking as before, but now focuses in just 0.03 seconds compared to 0.05 seconds. Moreover, eye-tracking Eye AF focus is twice as fast as on the RX100 V.


Logan Paul 'more popular' than Zoella with children

BBC News

Controversial vlogger Logan Paul has knocked Zoella off the top spot as UK children's favourite YouTuber, according to a study. Paul was embroiled in a row earlier this year when he posted a video showing the body of an apparent suicide victim. His rise suggested children were seeking "edgier" content, said an analyst for research firm Childwise. It was likely children had been exposed to "shocking content", he added. The Childwise study looked at the media habits of 2,000 children in the UK aged five to 16.


Voices of Vlogging

Biel, Joan-Isaac (Idiap Research Institute) | Gatica-Perez, Daniel (Idiap Research Institute)

AAAI Conferences

Vlogs have rapidly evolved from the ‘chat from your bedroom’ format to a highly creative form of expression and communication. However, despite the high popularity of vlogging, automatic analysis of conversational vlogs has not been attempted in the literature. In this paper, we present a novel analysis of conversational vlogs based on the characterization of vloggers’ nonverbal behavior. We investigate the use of four nonverbal cues extracted automatically from the audio channel to measure the behavior of vloggers and explore the relation to their degree of popularity and that of their videos. Our study is validated on over 2200 videos and 150 hours of data, and shows that one nonverbal cue (speaking time) is correlated with levels of popularity with a medium size effect.