Large Language Model
Synthetic Data Is About To Transform Artificial Intelligence
These people do not exist. These faces were artificially generated using a form of deep learning known as generative adversarial networks (GANs). Synthetic data like this is becoming increasingly indistinguishable from real-world data. Imagine if it were possible to produce infinite amounts of the world's most valuable resource, cheaply and quickly. What dramatic economic transformations and opportunities would result? This is a reality today. It is called synthetic data. Synthetic data is not a new idea, but it is now approaching a critical inflection point in terms of real-world impact.
Does AI really need a paradigm shift?
The provocative public conversation I am having with Scott Alexander, of SlateStarCodex fame, which has already become a meme, continues! In a fresh reply to my "What does it mean when AI fails?", Alexander has put forward his second stimulating critique of the week, "Somewhat Contra Marcus On AI Scaling". Which is not to say it's perfect; on the other hand, one doesn't need perfection in order to be provocative. I will respond briefly to the disappointing part, and then we get to the good stuff.
Tesla Full Self Driving Is Using GPT For Vision -- Dr. Know It All Explains What This Means
Tesla's Full Self-Driving is using generative pre-trained transformers (GPT) for vision, Elon Musk tweeted recently. He added that the GPTs are running natively on Tesla TRIP chips versus needing to round trip to iGPU. I think it's important to take a quick deep dive into this, because this is kind of the heart and soul of FSD. I asked "Dr. Know It All Knows It All" to translate what all of this means. Learning new things is something we all should be open to, and that's why I'm writing this today.
Elon Musk's initial tweet was a response to @JeffTutorials, who asked Elon Musk to add software release notes to the Tesla app, adding that it would be nice to see what was new right from the phone. In that thread, Elon Musk noted that the transformers are replacing C heuristics for post-processing of the vision neural networks' giant bag of points.
I asked Dr. Know It All to share a bit more about TRIP chips, and he pointed me to a project that the Department of Computer Science at The University of Texas at Austin worked on. I think, but am not 100% sure, that Elon was referring to TRIPS, which is a type of microprocessor architecture. KL Manish shared a definition of a TRIP chip on Twitter, and Elon Musk confirmed it.
Dr. Know It All noted that Elon Musk revealed a lot of useful information, and his video is a short dive into what exactly Elon Musk is talking about and why it matters. I'm sure Jeff didn't plan on starting a conversation about artificial intelligence and GPT, and Elon's reply to Jeff is a bit off topic. What Jeff was asking for was making the release notes available in the Tesla app as well as on the car's screen. It's a brilliant suggestion, and it would make it easier to take screenshots of the release notes for those who share them on Twitter, which we writers then write about.
Dr. Know It All explained that GPT is something OpenAI is working on, specifically GPT-3, which has 175 billion parameters. He added: "Now, I'm not saying GPT-3 is what Tesla is using here, but I just wanted to put that as a contextual element there."
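To make "transformers post-processing a bag of points" a little more concrete, here is a minimal sketch of scaled dot-product self-attention over an unordered set of 2D points, in plain Python. It is not Tesla's code: the point format, the identity query/key/value projections and the function names are assumptions for illustration. The property worth noticing is that each point's output depends on every other point in the set, which is the kind of set-wide reasoning that hand-written heuristics would otherwise have to spell out case by case.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(points: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections (illustrative only)."""
    d = len(points[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in points:
        # Score this query point against every point in the set, including itself.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in points]
        weights = softmax(scores)
        # The output is the attention-weighted average of all points (values = points).
        out = [sum(w * v[j] for w, v in zip(weights, points)) for j in range(d)]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    bag = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # hypothetical detected points
    for p, o in zip(bag, self_attention(bag)):
        print(p, "->", [round(x, 3) for x in o])
```

Because there are no positional encodings, the points are treated as a set: shuffling the input only shuffles the output in the same way, which suits an unordered "bag of points".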
Sparse models and cheap SRAM for language models
As compelling as the leading large-scale language models may be, the fact remains that only the largest companies have the resources to actually deploy and train them at meaningful scale. For enterprises eager to leverage AI for a competitive advantage, a cheaper, pared-down alternative may be a better fit, especially if it can be tuned to particular industries or domains. That's where an emerging set of AI startups hopes to carve out a niche: by building sparse, tailored models that, while maybe not as powerful as GPT-3, are good enough for enterprise use cases and run on hardware that ditches expensive high-bandwidth memory (HBM) for commodity DDR. German AI startup Aleph Alpha is one such example. Founded in 2019, the Heidelberg-based company's Luminous natural-language model boasts many of the same headline-grabbing features as OpenAI's GPT-3: copywriting, classification, summarization, and translation, to name a few.
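Much of the case for sparse models on commodity memory comes down to bytes. As a back-of-the-envelope sketch, the snippet below estimates how much memory a weight matrix needs when stored densely versus in compressed sparse row (CSR) form after pruning. The 16-bit values, 32-bit indices, layer size and densities are assumptions chosen for illustration, not figures reported for Luminous or any other model.

```python
def dense_bytes(rows: int, cols: int, value_bytes: int = 2) -> int:
    """Bytes for a dense weight matrix; value_bytes=2 assumes fp16/bf16 storage."""
    # Every weight is stored, whether or not it is zero.
    return rows * cols * value_bytes


def csr_bytes(rows: int, cols: int, density: float,
              value_bytes: int = 2, index_bytes: int = 4) -> int:
    """Bytes for the same matrix in CSR form after pruning to `density` non-zeros."""
    nnz = round(rows * cols * density)
    # CSR stores one value and one column index per non-zero,
    # plus a row-pointer array with rows + 1 entries.
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes


if __name__ == "__main__":
    rows = cols = 4096  # hypothetical layer size
    for density in (1.0, 0.5, 0.1):
        print(f"density={density:.1f}  dense={dense_bytes(rows, cols):,} B  "
              f"csr={csr_bytes(rows, cols, density):,} B")
    # Whole-model scale: 175 billion parameters at 2 bytes each, stored densely.
    print(f"175B params in fp16: {dense_bytes(175_000_000_000, 1) / 1e9:.0f} GB")
```

Under these assumptions a 4096 × 4096 layer takes 32 MiB dense and about 9.6 MiB in CSR at 10% density, but CSR is actually larger than dense at 50% density, because each stored non-zero costs 6 bytes instead of 2; the break-even point is one-third density. At whole-model scale, 175 billion parameters at 2 bytes each come to 350 GB before counting activations or other overhead.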
The Google engineer who thinks the company's AI has come to life
"We now have machines that can mindlessly generate words, but we haven't learned how to stop imagining a mind behind them," said Emily M. Bender, a linguistics professor at the University of Washington. The terminology used with large language models, like "learning" or even "neural nets," creates a false analogy to the human brain, she said. Humans learn their first languages by connecting with caregivers. These large language models "learn" by being shown lots of text and predicting what word comes next, or by being shown text with words dropped out and filling them in.
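The two training objectives Bender describes can be illustrated with a deliberately tiny stand-in: a bigram counter that predicts the next word from the previous one, and a filler that guesses a dropped-out word from its left and right neighbours. Real large language models learn these objectives with neural networks trained on enormous amounts of text; the counting approach, the toy corpus and the function names here are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigrams(words: List[str]) -> Dict[str, Counter]:
    # For each word, count which words follow it in the training text.
    following: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    # Next-word prediction: return the most frequent follower seen in training.
    counts = model.get(word)
    return counts.most_common(1)[0][0] if counts else None


def fill_mask(words: List[str], left: str, right: str) -> Optional[str]:
    # Masked-word filling: return the word most often seen between `left` and `right`.
    candidates = Counter(
        mid for a, mid, b in zip(words, words[1:], words[2:])
        if a == left and b == right
    )
    return candidates.most_common(1)[0][0] if candidates else None


corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug .".split()
model = train_bigrams(corpus)
print(predict_next(model, "the"))       # cat
print(fill_mask(corpus, "sat", "the"))  # on
```

Neither function has any notion of what a cat or a mat is; both simply reproduce statistics of the text they were shown, which is a small-scale version of the point Bender is making.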
A.I. gurus are leaving Big Tech to work on buzzy new start-ups
Artificial intelligence gurus are quitting top jobs at companies like Google, Meta, OpenAI and DeepMind and joining a new breed of start-ups that want to take AI to the next level, according to people familiar with the matter and LinkedIn analysis. Four of the best-funded new AI start-ups (Inflection, Cohere, Adept and Anthropic) have recently poached dozens of AI scientists with backgrounds in Big Tech. Their hiring efforts are being fueled by venture capital firms and billionaires keen to cash in on any success they have. Collectively, these firms have raised over $1 billion, and they're using these vast war chests to poach talented individuals, who command high salaries, from their previous employers. The start-ups are building their products and services with a relatively new "architecture," which is a set of rules and methods that's used to describe the functionality, organization and implementation of a computer system.
Why Gato from DeepMind is a game changer - DataScienceCentral.com
While no agent can be expected to excel in all imaginable control tasks, especially those far outside of its training distribution, we here test the hypothesis that training an agent which is generally capable on a large number of tasks is possible; and that this general agent can be adapted with little extra data to succeed at an even larger number of tasks. We hypothesize that such an agent can be obtained through scaling data, compute and model parameters, continually broadening the training distribution while maintaining performance, towards covering any task, behavior and embodiment of interest. In this setting, natural language can act as a common grounding across otherwise incompatible embodiments, unlocking combinatorial generalization to new behaviors. The guiding design principle of Gato is to train on the widest variety of relevant data possible, including diverse modalities such as images, text, proprioception, joint torques, button presses, and other discrete and continuous observations and actions. To enable processing this multi-modal data, we serialize all data into a flat sequence of tokens.
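The final sentence of the excerpt, "we serialize all data into a flat sequence of tokens," is the key engineering move. Below is a minimal sketch of one way to do it, loosely following the scheme described in the Gato paper: text keeps its own token ids, while continuous values such as joint torques are squashed with mu-law companding, discretized into uniform bins, and shifted past the text vocabulary so the two id ranges never collide. The vocabulary size, bin count, mu and M values, the simple text-then-values ordering, and the helper names are assumptions for illustration; the paper is the authority on the exact settings.

```python
import math
from typing import List, Sequence

TEXT_VOCAB_SIZE = 32_000  # assumed size of the text tokenizer's vocabulary
NUM_BINS = 1_024          # assumed number of uniform bins for continuous values
MU, M = 100.0, 256.0      # assumed mu-law parameters


def mu_law(x: float) -> float:
    # Compress the dynamic range; the result lies in [-1, 1] when |x| <= M.
    return math.copysign(math.log(abs(x) * MU + 1.0) / math.log(M * MU + 1.0), x)


def continuous_to_tokens(values: Sequence[float]) -> List[int]:
    tokens = []
    for v in values:
        y = max(-1.0, min(1.0, mu_law(v)))  # clip values with |x| > M to [-1, 1]
        # Map [-1, 1] onto a uniform bin index in [0, NUM_BINS).
        b = min(int((y + 1.0) / 2.0 * NUM_BINS), NUM_BINS - 1)
        tokens.append(TEXT_VOCAB_SIZE + b)  # shift past the text token ids
    return tokens


def serialize(text_ids: Sequence[int], observations: Sequence[float]) -> List[int]:
    # One flat sequence: text tokens first, then the discretized continuous values.
    if not all(0 <= t < TEXT_VOCAB_SIZE for t in text_ids):
        raise ValueError("text token id out of range")
    return list(text_ids) + continuous_to_tokens(observations)


if __name__ == "__main__":
    # Hypothetical text ids followed by hypothetical joint-torque readings.
    print(serialize([5, 17], [0.0, 0.3, -2.5, 40.0]))
```

Once everything is an integer in one shared vocabulary, a single sequence model can be trained on text, images, proprioception and actions alike, which is the design principle the excerpt describes.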