EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Neural Information Processing Systems

We introduce EgoSchema, a very long-form video question-answering dataset and benchmark for evaluating the long-video understanding capabilities of modern vision and language systems. Derived from Ego4D, EgoSchema consists of over 5000 human-curated multiple-choice question-answer pairs, spanning over 250 hours of real video data and covering a very broad range of natural human activity and behavior. For each question, EgoSchema requires the correct answer to be selected from five given options based on a three-minute-long video clip. While some prior works have proposed video datasets with long clip lengths, we posit that clip length alone does not truly capture the temporal difficulty of the video task being considered. To remedy this, we introduce temporal certificate sets, a general notion for capturing the intrinsic temporal understanding length associated with a broad range of video understanding tasks and datasets.


Brian Chesky Says Big Things Are Coming for Airbnb in 2025

WIRED

Big changes could be coming to Airbnb next year. In a conversation at WIRED's Big Interview event in San Francisco on Tuesday, the company's cofounder and CEO Brian Chesky told global editorial director Katie Drummond that he hopes that, in 2025, "people say 'that was one of the biggest reinventions of a company in recent memory.'" Though Chesky kept details scant, he did say that the company hopes to reimagine its Experiences section, which he says consumers really like but which he doesn't think has caught on as much as it could. The move seems to be an extension of Chesky's belief in the value of physical experiences and physical community, which he still thinks trump most digital experiences, even in the age of AI. In an effort to prove that, even two years into the AI revolution, fundamentally very little has changed for most people, Chesky challenged the room to look at the apps on their phone home screens and consider how much any of them have been substantially changed by generative AI.


Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

Ramachandran, Akshat, Wan, Zishen, Jeong, Geonhwa, Gustafson, John, Krishna, Tushar

arXiv.org Artificial Intelligence

Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamically adapts to DNN weight/activation distributions by parameterizing the LP bit fields. We also develop a novel genetic-algorithm-based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters while reducing representational divergence between quantized and full-precision models through a novel global-local contrastive objective. Additionally, we design a unified mixed-precision LP accelerator (LPA) architecture comprising processing elements (PEs) that incorporate LP in the computational datapath. Our algorithm-hardware co-design demonstrates, on average, a less than 1% drop in top-1 accuracy across various CNN and ViT models. It also achieves ~2x improvements in performance per unit area and 2.2x gains in energy efficiency compared to state-of-the-art quantization accelerators using different data types.
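The exact logarithmic-posit bit fields are part of the paper's contribution and are not restated here. As a rough sketch of the log-domain idea such formats build on, the following is plain logarithmic (power-of-two) quantization; the `n_levels` and `min_exp` parameters are illustrative choices, not values from the paper:

```python
import math

def log2_quantize(w: float, n_levels: int = 8, min_exp: int = -6) -> float:
    """Snap a weight to the nearest signed power of two (nearest in the
    log domain) within a small exponent range. With power-of-two values,
    multiplications in hardware reduce to shifts."""
    if w == 0.0:
        return 0.0
    e = round(math.log2(abs(w)))                      # nearest exponent
    e = max(min_exp, min(min_exp + n_levels - 1, e))  # clamp to the range
    return math.copysign(2.0 ** e, w)
```

For example, a weight of 0.3 snaps to 0.25 and a weight of -0.9 snaps to -1.0; values far outside the exponent range saturate at the largest representable magnitude.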


Low-Precision Mixed-Computation Models for Inference on Edge

Azizi, Seyedarmin, Nazemi, Mahdi, Kamal, Mehdi, Pedram, Massoud

arXiv.org Artificial Intelligence

This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed-point (FixP) number systems. This mixed-computation approach employs 4-bit posits (Posit4), which have higher precision around zero, for representing weights with high sensitivity, while it uses 4-bit fixed point (FixP4) for representing the remaining weights. A heuristic for analyzing the importance and quantization error of the weights is presented to assign the proper number system to each weight. Additionally, a gradient approximation for the Posit representation is introduced to improve the quality of weight updates during backpropagation. Because fully Posit-based computation is energy-intensive, neural network operations are carried out in FixP or mixed Posit/FixP arithmetic. An efficient hardware implementation of a MAC operation with a Posit first operand and a FixP second operand and accumulator is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the accuracy of the mixed-computation approach is about 1.5% higher than that of FixP, at the cost of a 0.19% energy overhead.
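To see why a 4-bit posit concentrates precision near small magnitudes while fixed point spaces values uniformly, here is a minimal generic posit decoder following the standard sign/regime/exponent/fraction layout. This is a sketch of the standard format, not the paper's hardware; the paper's exponent-bit count `es` for Posit4 is not restated in the abstract, so `es` is left as a parameter:

```python
def decode_posit(bits: int, n: int, es: int) -> float:
    """Decode an n-bit posit pattern (unsigned int) with es exponent bits:
    value = (-1)^s * useed^k * 2^e * (1 + f), where useed = 2^(2^es)."""
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("inf")            # NaR (Not a Real); inf as a stand-in
    sign = (bits >> (n - 1)) & 1
    if sign:                           # negatives use two's complement
        bits = (-bits) & ((1 << n) - 1)
    rest = bits & ((1 << (n - 1)) - 1)
    first = (rest >> (n - 2)) & 1      # regime: run of identical bits
    run = 0
    for i in range(n - 2, -1, -1):
        if ((rest >> i) & 1) == first:
            run += 1
        else:
            break
    k = run - 1 if first == 1 else -run
    remaining = max(0, (n - 1) - (run + 1))   # bits left after regime
    tail = rest & ((1 << remaining) - 1)
    e_bits = min(es, remaining)
    # missing exponent bits are treated as zeros (left shift pads them)
    exp = (tail >> (remaining - e_bits)) << (es - e_bits) if e_bits else 0
    f_bits = remaining - e_bits
    frac = (tail & ((1 << f_bits) - 1)) / (1 << f_bits) if f_bits else 0.0
    useed = 2 ** (2 ** es)
    value = useed ** float(k) * 2.0 ** exp * (1.0 + frac)
    return -value if sign else value
```

For `n=4, es=0` the positive code points decode to 0.25, 0.5, 0.75, 1, 1.5, 2, and 4: the spacing is finest around 1 and tapers toward the extremes, unlike the uniform grid of FixP4.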


Posits, a revolution in mathematics? - Actu IA

#artificialintelligence

Mathematics plays a prominent role in the advances brought about by artificial intelligence in recent years, notably in machine learning and computational neuroscience. Recently, two researchers have started what could be a revolution thanks to posits, which are nothing more or less than a different way of representing numbers. Revolutionary AI applications demand enormous computing power: do you know, for example, how many operations it took to train GPT-3, OpenAI's most advanced language model? All at an estimated cost of $5 million.


Looking for Alien Life? Seek Out Alien Tech

WIRED

Back in 1950, Enrico Fermi posed the question now known as the Fermi Paradox: Given the countless galaxies, stars, and planets out there, the odds are that life exists elsewhere, so why haven't we found it? The size of the universe is only one possible answer. Maybe humans have already encountered extraterrestrial (ET) life but didn't recognize it. Maybe it doesn't want to be found. Maybe it doesn't find us interesting.


ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems

Nambi, Suresh, Ullah, Salim, Lohana, Aditya, Sahoo, Siva Satyendra, Merchant, Farhad, Kumar, Akash

arXiv.org Artificial Intelligence

The recent advances in machine learning in general, and Artificial Neural Networks (ANN) in particular, have made smart embedded systems an attractive option for a larger number of application areas. However, the high computational complexity, memory footprint, and energy requirements of machine learning models hinder their deployment on resource-constrained embedded systems. Most state-of-the-art works have addressed this problem by proposing various low bit-width data representation schemes, optimized implementations of arithmetic operators, and different complexity-reduction techniques such as network pruning. To further elevate the implementation gains offered by these individual techniques, there is a need to cross-examine and combine their unique features. This paper presents ExPAN(N)D, a framework to analyze and combine the efficacy of the Posit number representation scheme and the efficiency of fixed-point arithmetic implementations for ANNs. The Posit scheme offers a better dynamic range and higher precision for various applications than the IEEE 754 single-precision floating-point format. However, due to the dynamic nature of the various fields of the Posit scheme, the corresponding arithmetic circuits have higher critical path delay and resource requirements than single-precision-based arithmetic units. Towards this end, we propose a novel Posit-to-fixed-point converter for enabling high-performance and energy-efficient hardware implementations for ANNs with minimal drop in output accuracy. We also propose a modified Posit-based representation to store the trained parameters of a network. Compared to an 8-bit fixed-point-based inference accelerator, our proposed implementation offers approximately 46% and 18% reductions in the storage requirements of the parameters and the energy consumption of the MAC units, respectively.
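The converter described above is a hardware circuit; a minimal software analogue of its final step (rounding a decoded real value to a signed fixed-point code, saturating at the format limits) might look like the sketch below. The Q1.6-style default layout for an 8-bit word is an assumed illustration, not the paper's configuration:

```python
def to_fixp(value: float, frac_bits: int = 6, total_bits: int = 8) -> int:
    """Round-to-nearest conversion of a real value (e.g. a decoded posit)
    to a signed fixed-point integer, saturating at the representable range."""
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))

def from_fixp(q: int, frac_bits: int = 6) -> float:
    """Map a fixed-point code back to the real value it represents."""
    return q / (1 << frac_bits)
```

With 6 fraction bits, 0.75 converts to code 48 and round-trips exactly, while out-of-range values such as ±5.0 saturate at the 8-bit limits (127 and -128), mirroring the saturation behavior a hardware converter needs.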