Speculative Decoding with Big Little Decoder
Kim, Sehoon, Mangalam, Karttikeya, Moon, Suhong, Malik, Jitendra, Mahoney, Michael W., Gholami, Amir, Keutzer, Kurt
The recent emergence of Large Language Models based on the Transformer architecture has enabled dramatic advancements in the field of Natural Language Processing. However, these models have long inference latency, which limits their deployment and makes them prohibitively expensive for various real-time applications. The inference latency is further exacerbated by autoregressive generative tasks, as models need to run iteratively to generate tokens sequentially without leveraging token-level parallelization. To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications. The BiLD framework contains two models with different sizes that collaboratively generate text. The small model runs autoregressively to generate text with a low inference cost, and the large model is only invoked occasionally to refine the small model's inaccurate predictions in a non-autoregressive manner. To coordinate the small and large models, BiLD introduces two simple yet effective policies: (1) the fallback policy that determines when to hand control over to the large model; and (2) the rollback policy that determines when the large model needs to correct the small model's inaccurate predictions. To evaluate our framework across different tasks and models, we apply BiLD to various text generation scenarios encompassing machine translation on IWSLT 2017 De-En and WMT 2014 De-En, and summarization on XSUM and CNN/DailyMail. On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation. Furthermore, our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture. Our code is open-sourced.
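The abstract describes the fallback and rollback policies but not their mechanics. The sketch below is a hypothetical, greatly simplified rendition of that control loop, not the paper's implementation: `small_step` and `large_verify` are invented toy stand-ins, and in the real system the large model scores the entire draft in a single non-autoregressive pass rather than token by token.

```python
def generate_bild(prompt, small_step, large_verify, max_len=20, fallback_thresh=0.5):
    """Toy rendition of BiLD's fallback/rollback control loop.

    small_step(context) -> (next_token, confidence) from the small model.
    large_verify(context, draft) -> (num_accepted, next_token): how many
        draft tokens the large model accepts, plus its own next token
        (the replacement for the first rejected token, i.e. the rollback).
    """
    tokens = list(prompt)
    while len(tokens) < max_len:
        # Small model drafts autoregressively until its confidence drops
        # below the fallback threshold.
        draft = []
        while len(tokens) + len(draft) < max_len:
            token, conf = small_step(tokens + draft)
            if conf < fallback_thresh:
                break  # fallback: hand control to the large model
            draft.append(token)
        # Large model checks the draft (a single parallel pass in the
        # real system) and emits one token of its own.
        num_accepted, next_token = large_verify(tokens, draft)
        tokens += draft[:num_accepted]
        tokens.append(next_token)
    return tokens[:max_len]


# Toy stand-ins: the small model emits increasing integers and reports
# low confidence at every third position; the large model accepts all
# draft tokens and continues the sequence.
def small_step(context):
    t = len(context)
    return t, (0.3 if t % 3 == 0 else 0.9)

def large_verify(context, draft):
    return len(draft), len(context) + len(draft)
```

The key efficiency property survives even in this sketch: the large model is consulted only at fallback points, once per drafted run, instead of at every generated token.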
What future for journalism in the age of AI?
The article you are about to read was written by a human. This kind of disclaimer will become an everyday occurrence as chatbots, or large language models, infiltrate deeper into our media space. Doubts about the veracity of such disclaimers will also become commonplace. With the leaps and bounds registered by machine learning and large language models over the past couple of years, it is becoming increasingly difficult to prove that a human is on the other side of a written or spoken communication. How would I prove to you that these words were the product of human creativity and exertion?
The owner of Insider and Politico tells journalists: AI is coming for your jobs
One of Europe's biggest media groups has warned journalists that artificial intelligence (AI) could steal their jobs, and has provided tips for how reporters can avoid the chop. The chief executive of Axel Springer -- which owns Insider, Politico and German tabloid newspaper Bild -- told employees in a memo Tuesday that "artificial Intelligence has the potential to make independent journalism better than it ever was -- or simply replace it." In the memo, shared with CNN, Mathias Döpfner predicts that AI will soon be able to aggregate information much better than humans, and urges newsrooms to place a greater emphasis on commentary, exclusive news and investigations that can't be done by machines. Journalists would still be needed to understand people's "true motives", he said. "In short, the creation of exclusive and attractive content remains irreplaceable and is going to become even more critical to success for publishers," Döpfner wrote.
Watch: This AI gadget makes stop-motion animation easy
Artificial intelligence is far too often a solution looking for a problem. So it's refreshing when someone manages to find the perfect way to apply a simple AI trick to a huge problem plaguing humanity. And that's exactly what serial creator Nick Bild's done (again) with his novel AI-powered stop-motion animation system. Simply put, the big problem with creating stop-motion animation is that it takes forever.
Nick Bild's Deep Clean System Flags Potentially Contaminated Surfaces
Amid the continued spread of coronavirus, extra care is being taken by just about everyone to wash hands and wipe down surfaces, from countertops to groceries. To spotlight potentially contaminated surfaces, hobbyist Nick Bild has come up with Deep Clean, a stereo camera system that flags objects that have been touched in a room. The device can be used by cleaning crews at hospitals and assisted living facilities or anyone who'd like to know what areas need special attention when trying to prevent disease transmission. Deep Clean uses an NVIDIA Jetson AGX Xavier developer kit as the main processing unit to map out a room, detecting where different objects lie within it. Jetson helps pinpoint the exact location (x,y-coordinates) and depth (z-coordinate) of each object.
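The article doesn't publish Deep Clean's code, but the flagging step it describes — mark anything a hand came near, given 3D object coordinates from the stereo camera — reduces to a distance check. The sketch below is purely illustrative geometry; the function name, the 0.15 m threshold, and the data layout are all assumptions, not details from the project.

```python
def flag_touched(objects, hand_positions, radius=0.15):
    """Mark objects whose 3D position comes within `radius` meters of any
    observed hand position.

    objects: dict mapping object name -> (x, y, z) position in meters.
    hand_positions: list of (x, y, z) hand observations over time.
    Returns the set of object names flagged as potentially contaminated.
    """
    touched = set()
    for name, (ox, oy, oz) in objects.items():
        for hx, hy, hz in hand_positions:
            # Squared-distance comparison avoids a square root per check.
            if (ox - hx) ** 2 + (oy - hy) ** 2 + (oz - hz) ** 2 <= radius ** 2:
                touched.add(name)
                break
    return touched
```

In a system like the one described, the object and hand coordinates would come from the Jetson's detection pipeline; here they are plain tuples so the logic stands alone.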
Watch: Performance-enhancing AI could change baseball forever
Its creator says it can predict whether a baseball pitch will land inside or outside of the strike zone. Tipper was developed by Nick Bild, a serial creator who seems to have an unquenchable thirst to create and innovate. He makes apps, trains neural networks, and literally has a gold badge in 'problem solving' on HackerRank. He says he was inspired to build Tipper while sitting idle in traffic, pondering the world from an engineer's point of view. A modified Nerf tennis ball launcher is programmatically fired with a solenoid. A 100FPS camera is pointed in the direction of the launcher and captures two successive images of the ball early in flight.
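The excerpt doesn't say how Tipper turns those two frames into a prediction, so the following is not Tipper's method — just a simple kinematic baseline showing why two early observations are enough in principle: two positions a known time apart give a velocity, which can be extrapolated to the plate. All names and units here are assumptions for illustration.

```python
def predict_plate_crossing(p1, p2, dt, plate_dist):
    """Extrapolate where a ball crosses the plate from two early observations.

    p1, p2: (x, y, z) positions in meters, with z pointing toward the plate.
    dt: seconds between the two frames (0.01 s for a 100 FPS camera).
    plate_dist: remaining z-distance from p2 to the plate, in meters.
    Assumes constant horizontal velocity and gravity acting on y.
    Returns the (x, y) position at the plate.
    """
    g = 9.81
    vx = (p2[0] - p1[0]) / dt
    vy = (p2[1] - p1[1]) / dt
    vz = (p2[2] - p1[2]) / dt
    t = plate_dist / vz                      # flight time from p2 to the plate
    x = p2[0] + vx * t                       # horizontal offset at the plate
    y = p2[1] + vy * t - 0.5 * g * t * t     # height at the plate, with drop
    return x, y
```

Comparing the returned (x, y) against a strike-zone rectangle then yields the in/out call; a learned model like Tipper's can additionally absorb effects this baseline ignores, such as spin and drag.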