EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Neural Information Processing Systems

We introduce EgoSchema, a very long-form video question-answering dataset and benchmark for evaluating the long-video understanding capabilities of modern vision and language systems. Derived from Ego4D, EgoSchema consists of over 5000 human-curated multiple-choice question-answer pairs, spanning over 250 hours of real video data and covering a very broad range of natural human activity and behavior. For each question, EgoSchema requires the correct answer to be selected from five given options based on a three-minute-long video clip. While some prior works have proposed video datasets with long clip lengths, we posit that the length of the video clip alone does not truly capture the temporal difficulty of the task being considered. To remedy this, we introduce temporal certificate sets, a general notion for capturing the intrinsic temporal understanding length associated with a broad range of video understanding tasks and datasets.
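The abstract gives no formula, but a temporal certificate can be read as the minimal set of sub-clips a verifier must watch to confirm an answer, with its length being the merged duration of those intervals. A minimal sketch under that reading (the `(start, end)` interval representation is an assumption, not the paper's definition):

```python
def certificate_length(intervals):
    """Total duration covered by a set of (start, end) sub-clips,
    merging overlapping intervals before summing."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # overlaps the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return sum(end - start for start, end in merged)
```

On this reading, a question whose certificate spans most of the three-minute clip is temporally harder than one answerable from a single short sub-clip, regardless of total clip length.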


XR-NPE: High-Throughput Mixed-precision SIMD Neural Processing Engine for Extended Reality Perception Workloads

Chaudhari, Tejas, J., Akarsh, Dewangan, Tanushree, Lokhande, Mukul, Vishvakarma, Santosh Kumar

arXiv.org Artificial Intelligence

This work proposes XR-NPE, a high-throughput mixed-precision SIMD Neural Processing Engine designed for extended reality (XR) perception workloads such as visual inertial odometry (VIO), object classification, and eye gaze extraction. XR-NPE is the first to support FP4, Posit (4,1), Posit (8,0), and Posit (16,1) formats, with a layer-adaptive hybrid-algorithmic implementation supporting ultra-low bit precision to significantly reduce memory bandwidth requirements, accompanied by quantization-aware training for minimal accuracy loss. The proposed Reconfigurable Mantissa Multiplication and Exponent processing Circuitry (RMMEC) reduces dark silicon in the SIMD MAC compute engine, assisted by selective power gating to reduce energy consumption, providing 2.85x improved arithmetic intensity. XR-NPE achieves a maximum operating frequency of 1.72 GHz, an area of 0.016 mm2, and an arithmetic intensity of 14 pJ at CMOS 28nm, reducing area by 42% and power by 38% compared to the best state-of-the-art MAC approaches. The proposed XR-NPE-based AXI-enabled matrix-multiplication co-processor consumes 1.4x fewer LUTs and 1.77x fewer FFs, and provides 1.2x better energy efficiency compared to SoTA accelerators on the VCU129. The proposed co-processor provides 23% better energy efficiency and 4% better compute density for VIO workloads. XR-NPE establishes itself as a scalable, precision-adaptive compute engine for future resource-constrained XR devices. The complete set of code for reproducing the results is released publicly, enabling designers and researchers to readily adopt and build upon it. https://github.com/mukullokhande99/XR-NPE.
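The Posit(n, es) formats named above pack a sign bit, a run-length-encoded "regime", es exponent bits, and a fraction into n bits. A minimal reference decoder (pure Python, illustrating the format only, not the paper's RMMEC hardware datapath):

```python
def decode_posit(bits: int, n: int, es: int) -> float:
    """Decode an n-bit posit with es exponent bits to a float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):                 # NaR ("not a real")
        return float('nan')
    sign = 1.0
    if bits >> (n - 1):                      # negative pattern:
        sign, bits = -1.0, (-bits) & mask    # two's-complement negate
    s = format(bits, f'0{n}b')[1:]           # bits after the sign bit
    run_bit = s[0]                           # regime: run of equal bits
    run = len(s) - len(s.lstrip(run_bit))
    k = run - 1 if run_bit == '1' else -run
    rest = s[run + 1:]                       # skip regime terminator
    exp_bits = (rest + '0' * es)[:es]        # truncated bits read as 0
    exp = int(exp_bits, 2) if exp_bits else 0
    frac_bits = rest[es:]
    frac = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0
    return sign * (1.0 + frac) * 2.0 ** (k * (1 << es) + exp)
```

For example, `decode_posit(0b0110, 4, 1)` yields 4.0, one regime step (useed = 2^2^es = 4) in the Posit(4,1) format listed above.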

Brian Chesky Says Big Things Are Coming for Airbnb in 2025

WIRED

Big changes could be coming to Airbnb next year. In a conversation at WIRED's Big Interview event in San Francisco on Tuesday, the company's cofounder and CEO Brian Chesky told global editorial director Katie Drummond that he hopes that, in 2025, "people say 'that was one of the biggest reinventions of a company in recent memory.'" Though Chesky kept details scant, he did say the company hopes to reimagine its Experiences section, which he says consumers really like but which he doesn't think has caught on as much as it could. The move seems to be an extension of Chesky's belief in the value of physical experiences and physical community, which he still thinks trump most digital experiences, even in the age of AI. To prove that, even two years into the AI revolution, fundamentally very little has changed for most people, Chesky challenged the room to look at the apps on their phone home screens and consider how much any of them have been substantially changed by generative AI.


Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

Ramachandran, Akshat, Wan, Zishen, Jeong, Geonhwa, Gustafson, John, Krishna, Tushar

arXiv.org Artificial Intelligence

Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamically adapts to DNN weight/activation distributions by parameterizing the LP bit fields. We also develop a novel genetic-algorithm-based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters while reducing representational divergence between quantized and full-precision models through a novel global-local contrastive objective. Additionally, we design a unified mixed-precision LP accelerator (LPA) architecture comprising processing elements (PEs) that incorporate LP in the computational datapath. Our algorithm-hardware co-design demonstrates, on average, a <1% drop in top-1 accuracy across various CNN and ViT models. It also achieves ~2x improvement in performance per unit area and 2.2x gains in energy efficiency compared to state-of-the-art quantization accelerators using different data types.
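As a rough intuition for the log-domain side of LP (not the paper's parameterized bit fields or the LPQ genetic search), a toy quantizer that snaps weights to signed powers of two looks like this:

```python
import numpy as np

def log2_quantize(w: np.ndarray, exp_range: int = 8) -> np.ndarray:
    """Toy log-domain quantizer: snap each weight to the nearest
    signed power of two, clipping exponents to +/- exp_range.
    Illustrative only; LP additionally parameterizes the bit fields
    to track the weight/activation distribution."""
    sign = np.sign(w)
    mag = np.where(w == 0, 1.0, np.abs(w))   # avoid log2(0)
    exp = np.clip(np.round(np.log2(mag)), -exp_range, exp_range)
    return np.where(w == 0, 0.0, sign * 2.0 ** exp)
```

Powers of two turn multiplications into shifts in hardware, which is one reason log-domain encodings are attractive for accelerator datapaths.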


Low-Precision Mixed-Computation Models for Inference on Edge

Azizi, Seyedarmin, Nazemi, Mahdi, Kamal, Mehdi, Pedram, Massoud

arXiv.org Artificial Intelligence

This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed-point (FixP) number systems. The mixed-computation approach employs 4-bit Posit (Posit4), which has higher precision around zero, for representing weights with high sensitivity, while it uses 4-bit FixP (FixP4) for representing the other weights. A heuristic for analyzing the importance and the quantization error of the weights is presented to assign the proper number system to each weight. Additionally, a gradient approximation for the Posit representation is introduced to improve the quality of weight updates during backpropagation. Because fully Posit-based computation is energy-intensive, neural network operations are carried out in FixP or mixed Posit/FixP. An efficient hardware implementation of a MAC operation whose first operand is a Posit and whose second operand and accumulator are FixP is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the accuracy of the mixed computation is about 1.5% higher than that of FixP, at a cost of only 0.19% energy overhead.
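The paper's heuristic scores each weight's importance and quantization error; as a crude stand-in for that analysis, one can route small-magnitude weights (where Posit4's tapered precision is densest) to Posit4 and the rest to FixP4. A hypothetical sketch (the threshold value is an assumption, not from the paper):

```python
import numpy as np

def assign_number_system(weights: np.ndarray, thresh: float = 0.25):
    """Route each weight to Posit4 (dense precision near zero) or
    FixP4 (uniform precision) by magnitude. The paper's actual
    heuristic also weighs per-weight importance and quantization
    error, not magnitude alone."""
    return np.where(np.abs(weights) < thresh, 'posit4', 'fixp4')
```

The intuition is that Posit4's accuracy is concentrated near zero, so the small, sensitive weights benefit most from it, while FixP4's uniform spacing suffices elsewhere.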


Posits, a revolution in mathematics? - Actu IA

#artificialintelligence

Mathematics plays a prominent role in the developments brought about by artificial intelligence in recent years: think in particular of machine learning, or computational neuroscience. Recently, two researchers have started what could be a revolution thanks to posits, which are nothing more or less than a different way of representing numbers. Revolutionary AI applications demand enormous computing power: do you know, for example, how many operations it took to train GPT-3, OpenAI's most advanced language model? All at an estimated cost of $5 million.


Looking for Alien Life? Seek Out Alien Tech

WIRED

Back in 1950, Enrico Fermi posed the question now known as the Fermi Paradox: Given the countless galaxies, stars, and planets out there, the odds are that life exists elsewhere. So why haven't we found it? The size of the universe is only one possible answer. Maybe humans have already encountered extraterrestrial (ET) life but didn't recognize it. Maybe it doesn't want to be found. Maybe it doesn't find us interesting.