 sandwich



Supplementary Materials A Numerical Example on Convergence Bounds

Neural Information Processing Systems

We use the following numerical experiment to further illustrate our finite-time bounds on the convergence of double Q-learning. In this experiment, the optimal Q-function can be calculated explicitly, so the learning errors can be tracked. We choose γ = 0.8, α … We prove Lemma 1 by induction. First, it is easy to verify that the initial case is satisfied, i.e., … In this appendix, we provide a detailed proof of Theorem 1.
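As a rough illustration of the kind of experiment the excerpt describes, the sketch below runs double Q-learning on a tiny MDP where the optimal Q-function can be computed exactly, so the sup-norm learning error can be tracked at every step. Only γ = 0.8 comes from the text. The two-state MDP, the step size `ALPHA`, the uniform exploration, and all function names are assumptions made for illustration, and this is not the authors' experimental setup.

```python
import random

GAMMA = 0.8   # discount factor, as stated in the excerpt
ALPHA = 0.1   # step size: an assumption, the source value is truncated

# Hypothetical deterministic MDP with 2 states and 2 actions (not from the source).
NEXT_STATE = [[0, 1], [0, 1]]      # NEXT_STATE[s][a] is the successor state
REWARD = [[0.0, 1.0], [2.0, 0.0]]  # REWARD[s][a] is the immediate reward


def optimal_q(tol=1e-12):
    """Compute the optimal Q-function by Q-value iteration."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    while True:
        new_q = [[REWARD[s][a] + GAMMA * max(q[NEXT_STATE[s][a]])
                  for a in range(2)] for s in range(2)]
        diff = max(abs(new_q[s][a] - q[s][a])
                   for s in range(2) for a in range(2))
        q = new_q
        if diff < tol:
            return q


def sup_error(q, q_star):
    """Sup-norm distance between a Q-table and the optimal Q-function."""
    return max(abs(q[s][a] - q_star[s][a])
               for s in range(2) for a in range(2))


def double_q_learning(steps=20000, seed=0):
    """Run double Q-learning and return the error of table A after each step."""
    rng = random.Random(seed)
    q_star = optimal_q()
    qa = [[0.0, 0.0], [0.0, 0.0]]
    qb = [[0.0, 0.0], [0.0, 0.0]]
    errors = []
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # uniform random exploration (assumption)
        s_next, r = NEXT_STATE[s][a], REWARD[s][a]
        if rng.random() < 0.5:
            # Update table A: A selects the action, B evaluates it.
            best = max(range(2), key=lambda b: qa[s_next][b])
            qa[s][a] += ALPHA * (r + GAMMA * qb[s_next][best] - qa[s][a])
        else:
            # Update table B: B selects the action, A evaluates it.
            best = max(range(2), key=lambda b: qb[s_next][b])
            qb[s][a] += ALPHA * (r + GAMMA * qa[s_next][best] - qb[s][a])
        errors.append(sup_error(qa, q_star))
        s = s_next
    return errors
```

Plotting the returned `errors` list against the step index would give an empirical error curve that could be compared with a finite-time bound.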





CAViAR: Critic-Augmented Video Agentic Reasoning

Menon, Sachit, Iscen, Ahmet, Nagrani, Arsha, Weyand, Tobias, Vondrick, Carl, Schmid, Cordelia

arXiv.org Artificial Intelligence

Video understanding has seen significant progress in recent years, with models' performance on perception from short clips continuing to rise. Yet multiple recent benchmarks, such as LVBench, Neptune, and ActivityNet-RTL, show that performance wanes on tasks requiring complex reasoning as queries grow more complex and videos grow longer. In this work, we ask: can existing perception capabilities be leveraged to successfully perform more complex video reasoning? In particular, we develop a large language model agent given access to video modules as subagents or tools. Rather than following a fixed procedure to solve queries as in previous work such as Visual Programming, ViperGPT, and MoReVQA, the agent uses the results of each call to a module to determine subsequent steps. Inspired by work in the textual reasoning domain, we introduce a critic to distinguish between instances of successful and unsuccessful sequences from the agent. We show that the combination of our agent and critic achieves strong performance on the previously mentioned datasets.


Hasan Piker Will Never Run for Office

WIRED

The Twitch streamer could pivot from influencer to candidate. But he tells WIRED's podcast he'd rather use his platform to tell Dems "you can't podcast your way out of this problem." Hasan Piker is many things to many people. They don't all feel the same way about Piker or his politics, but most presumably agree on one thing: He is a relentless human being. Most days a week, you can find the 34-year-old Twitch streamer talking to his audience, often for six to nine hours at a stretch. And during President Trump's second term, there's plenty of that to go around. He has nearly 3 million followers on Twitch and has hosted conversations with Senator Bernie Sanders and US representative Alexandria Ocasio-Cortez. He claims his election night stream in 2024 reached a staggering 7.5 million viewers. On this episode of, I talked to Piker about his looks, his love of Italian sandwiches, and any future political aspirations he might (or might not) want to tease. It's great to be here. I heard you were just at the gym. Yeah, I was at the park. Some days I take my dog and I play a little bit of basketball and get to hang out with some people.


Sandwich: Separating Prefill-Decode Compilation for Efficient CPU LLM Serving

Zhao, Juntao, Li, Jiuru, Wu, Chuan

arXiv.org Artificial Intelligence

Utilizing CPUs to serve large language models (LLMs) is a resource-friendly alternative to GPU serving. Existing CPU-based solutions ignore workload differences between the prefill and the decode phases of LLM inference, applying a static per-NUMA (Non-Uniform Memory Access) node model partition and utilizing vendor libraries for operator-level execution, which is suboptimal. We propose Sandwich, a hardware-centric CPU-based LLM serving engine that uses different execution plans for the prefill and decode phases and optimizes them separately. We evaluate Sandwich across diverse baselines and datasets on five CPU platforms, including x86 with AVX-2 and AVX-512, as well as ARM with NEON. Sandwich achieves an average 2.01x throughput improvement and 90% satisfactory time-to-first-token (TTFT) and time-per-output-token (TPOT) latencies with up to 3.40x lower requirements in single sequence serving, and significant improvement in Goodput in continuous-batching serving. The GEMM kernels generated by Sandwich outperform representative vendor kernels and other dynamic shape solutions, achieving performance comparable to static compilers with three orders of magnitude less kernel tuning costs.


Tesla opens its first DINER in Hollywood - complete with robot servers, a drive-in cinema, and CyberTruck happy meals

Daily Mail - Science & tech

From flamethrowers to hot pants, Elon Musk has already released a range of weird and wacky products. Now, the billionaire is taking on the likes of McDonald's, Wendy's, and IHOP, with his very first diner. The Tesla Diner is described as a 'retro-futuristic diner and drive-in charging experience.' The diner itself has over 250 seats for dining, with dishes on offer ranging from 7 cinnamon rolls to 10 salads. Alternatively, those hoping to relax for a few hours can enjoy a movie on either of the two 66ft LED megascreens outside the diner.


Dietary Intake Estimation via Continuous 3D Reconstruction of Food

Lee, Wallace, Chen, YuHao

arXiv.org Artificial Intelligence

Monitoring dietary habits is crucial for preventing health risks associated with overeating and undereating, including obesity, diabetes, and cardiovascular diseases. Traditional methods for tracking food intake rely on data self-reported before or after eating, which is prone to inaccuracies. This study proposes an approach to accurately monitor ingestion behaviours by leveraging 3D food models constructed from monocular 2D video. Using COLMAP and pose estimation algorithms, we generate detailed 3D representations of food, allowing us to observe changes in food volume as it is consumed. Experiments with toy models and real food items demonstrate the approach's potential. We also propose a new methodology for automated state recognition that accurately detects state changes and maintains model fidelity. The 3D reconstruction approach shows promise in capturing comprehensive dietary behaviour insights, ultimately contributing to the development of automated and accurate dietary monitoring tools.