bigger model
2025: The Year of the AI App
What a great idea I had for the first Plaintext of 2025. After following the frantic competition between OpenAI, Google, Meta, and Anthropic to churn out brainier and deeper "frontier" foundation models, I settled on a thesis about what's ahead: In the new year, those mighty trailblazers will consume billions of dollars, countless gigawatts, and all the silicon Nvidia can muster in their pursuit of AGI. We'll be bombarded by press releases boasting advanced reasoning, more tokens, and maybe even assurances that their models won't make up crazy facts. But people are tired of hearing about how AI is transformational and seeing few transformations to their day-to-day existence. Getting an AI summary of Google search results or having Facebook ask if you want to pose a follow-up question on a post doesn't make you a traveler to the neo-human future.
Accurate estimation of feature importance faithfulness for tree models
Mateusz Gajewski, Adam Karczmarz, Mateusz Rapicki, Piotr Sankowski
One of the key challenges in deploying modern machine learning models in areas such as medical diagnosis lies in the ability to indicate why a certain prediction has been made. Such an indication may be of critical importance when a human decides whether the prediction can be relied on. This is one of the reasons various aspects of the explainability of machine learning models have been the subject of extensive research lately (see, e.g., [BH21]). For some basic types of models (e.g., single decision trees), the rationale behind a prediction is easy for a human to understand. However, the predictions of more complex models (e.g., those based on neural networks or decision tree ensembles, which offer much better accuracy) are much more difficult to interpret. Accurate and concise explanations understandable to humans might not always exist. In such cases, it is still beneficial to have methods that give a flavor of which factors might have influenced the prediction the most.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Europe > Poland > Greater Poland Province > Poznań (0.04)
- (4 more...)
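The abstract above motivates feature-importance explanations without spelling out a method here; as a hedged illustration only (not the paper's algorithm), the sketch below computes two standard baselines for a tree ensemble with scikit-learn: impurity-based importances and permutation importances on held-out data.

```python
# Minimal sketch, assuming scikit-learn is available. This is NOT the method
# proposed in the paper above, just two common baselines for tree ensembles.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# A small medical-diagnosis-style dataset, in the spirit of the abstract.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# 1) Impurity-based importances: cheap, but biased toward high-cardinality features.
impurity = sorted(zip(X.columns, model.feature_importances_),
                  key=lambda kv: kv[1], reverse=True)

# 2) Permutation importances on held-out data: slower, usually more faithful.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
permuted = sorted(zip(X.columns, perm.importances_mean),
                  key=lambda kv: kv[1], reverse=True)

for name, score in impurity[:5]:
    print(f"impurity     {name:30s} {score:.3f}")
for name, score in permuted[:5]:
    print(f"permutation  {name:30s} {score:.3f}")
```

Comparing the two rankings is a quick, informal check of how stable the "most influential factors" are for a given model.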
In AI, is bigger always better?
Artificial-intelligence systems that can churn out fluent text, such as OpenAI's ChatGPT, are the newest darlings of the technology industry. But when faced with mathematical queries that require reasoning to answer, these large language models (LLMs) often stumble. "A line parallel to y = 4x + 6 passes through (5, 10). What is the y-coordinate of the point where this line crosses the y-axis?" Although LLMs can sometimes answer these types of question correctly, they more often get them wrong. In one early test of its reasoning abilities, ChatGPT scored just 26% when faced with a sample of questions from the 'MATH' data set of secondary-school-level mathematical problems [1]. This is to be expected: given input text, an LLM simply generates new text in accordance with statistical regularities in the words, symbols and sentences that make up the model's training data.
- North America > Canada > Quebec > Montreal (0.15)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Washington > King County > Redmond (0.04)
- (6 more...)
- Energy (1.00)
- Education > Educational Setting (0.54)
- Information Technology > Services (0.47)
- Government > Regional Government (0.46)
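For reference, the sample question quoted in the excerpt above takes only one reasoning step: parallel lines share the slope 4, so the line through (5, 10) has intercept 10 - 4*5 = -10. A tiny sketch of that step:

```python
# The one-step reasoning the excerpt says LLMs often miss.
slope = 4                      # parallel lines share the slope of y = 4x + 6
x0, y0 = 5, 10                 # the line passes through (5, 10)
intercept = y0 - slope * x0    # from y = slope * x + b  =>  b = y0 - slope * x0
print(intercept)               # -10: the line crosses the y-axis at (0, -10)
```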
Closer to AGI? – O'Reilly
DeepMind's new model, Gato, has sparked a debate on whether artificial general intelligence (AGI) is nearer, almost at hand, just a matter of scale. Gato is a model that can solve multiple unrelated problems: it can play a large number of different games, label images, chat, operate a robot, and more. Not so many years ago, one problem with AI was that AI systems were only good at one thing. After IBM's Deep Blue defeated Garry Kasparov in chess, it was easy to say "But the ability to play chess isn't really what we mean by intelligence." A model that plays chess can't also play space wars.
TinyML in a Nutshell
Most machine learning models are created to figure out that you want to see 50% memes and 50% cute cats. To do just that, they use huge clusters of computers with CPUs, GPUs, and even TPUs to deliver these outstanding state-of-the-art artificial intelligence recommendation technologies to you. As we all know, this and much more computational hardware is used for training; GPT-3, for example, cost millions of dollars in electricity alone to train. But most of the time, running inference (that is, making predictions) with these models is computationally expensive too, which is why these energy-costly operations happen mostly in data centers far away from your phone.
The Double Descent Hypothesis Explains How Bigger Models can Hurt Performance
🔆🔅 Go Big First, Then Compress
Conventional wisdom in machine learning (ML) tells us that bigger models are better. In the current ML ecosystem, dominated by supervised learning models, the mantra is to go big. Bigger deep learning models tend to outperform smaller versions in most deep learning scenarios. However, bigger models are also slow, expensive to run, and difficult to operate. Model compression is one of the techniques that helps address those limitations.
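The excerpt does not name a specific compression technique; as one hedged example (my own sketch, not the article's recipe), post-training dynamic quantization in PyTorch stores the weights of Linear layers as int8, shrinking a trained model while keeping its inference API:

```python
# Minimal sketch, assuming PyTorch is installed. Dynamic quantization is just
# one compression technique; pruning, distillation, and low-rank factorization
# are common alternatives.
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for a "big" trained model.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

# Post-training dynamic quantization: Linear weights are stored as int8 and
# dequantized on the fly, shrinking the model and often speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def serialized_mb(m: nn.Module) -> float:
    # Serialize the state_dict to measure its size, as in PyTorch's tutorials.
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32 model: {serialized_mb(model):.2f} MB")
print(f"int8 model: {serialized_mb(quantized):.2f} MB")

# The inference API is unchanged.
x = torch.randn(1, 1024)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Which technique pays off depends on the deployment target; quantization is usually the cheapest to try because it needs no retraining.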
Why a major AI Revolution is coming, but it's not what you think -- AAAI 2020
You already know that Deep Learning is good at vision, translation, playing games, and other tasks. But Neural Networks don't "learn" the way humans do; they are just really good at fast pattern matching. Today's research mainly focuses on bigger models, larger datasets, and complicated loss functions. But the next revolution is likely going to be more fundamental. Let's take a look at two approaches: adding logic with Stacked Capsule Autoencoders, and Self-Supervised Learning at scale. This about sums up what most AI scientists already know: Deep Learning is really good at doing narrow, pattern-based tasks such as object or speech recognition.
Why Big Is Not Always Better In Machine Learning
Neural networks are trained to fit the data exactly. Such models would usually be considered over-fitted, and yet they manage to obtain high accuracy on test data. It is counter-intuitive -- but it works. This has raised many eyebrows, especially regarding the mathematical foundations of machine learning and their relevance to practitioners. To address these contradictions, researchers at OpenAI, in their recent work, take a hard look at the widely held belief that bigger is better. The paper attempts to reconcile the classical understanding and modern practice within a unified performance curve.
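The unified performance curve mentioned above is the double descent curve. A minimal toy sketch (my own setup with random ReLU features and minimum-norm least squares, not the experiment from the OpenAI paper) typically reproduces the spike in test error near the interpolation threshold, where the number of features roughly equals the number of training points:

```python
# Toy double-descent demo (hypothetical setup, no data from the paper).
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)

n_train, n_test = 20, 200
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
x_test = np.linspace(-1, 1, n_test)
y_test = target(x_test)

def relu_features(x, W, b):
    # Random ReLU features: phi_j(x) = max(0, W_j * x + b_j) with fixed random W, b.
    return np.maximum(0.0, np.outer(x, W) + b)

for width in [2, 5, 10, 15, 20, 25, 40, 100, 400, 1000]:
    W = rng.standard_normal(width)
    b = rng.standard_normal(width)
    Phi_train = relu_features(x_train, W, b)
    Phi_test = relu_features(x_test, W, b)
    # lstsq returns the minimum-norm least-squares solution, the implicit bias
    # assumed in most double-descent analyses of over-parameterized models.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"width={width:5d}  test MSE={test_mse:8.3f}")
```

On this toy setup the test error usually rises as the width approaches n_train, peaks near width = n_train, and then falls again for much wider models, which is the "bigger can hurt, then help" shape the excerpt refers to.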
Facebook's latest giant language AI hits computing wall at 500 Nvidia GPUs – ZDNet
Facebook's giant "XLM-R" neural network is engineered to work word problems across 100 different languages, including Swahili and Urdu, but it runs up against computing constraints even using 500 of Nvidia's world-class GPUs. With the trend toward bigger and bigger machine learning models, state-of-the-art artificial intelligence research continues to run up against the limits of conventional computing technology. Last week Facebook researchers published a report on their invention, XLM-R, a natural language model based on the wildly popular Transformer model from Google. XLM-R is engineered to be able to perform translations between one hundred different languages. It builds upon work that Conneau did earlier this year with Guillaume Lample at Facebook, the creation of the initial XLM.
- Information Technology > Services (0.65)
- Information Technology > Hardware (0.63)