Google DeepMind has a new way to look inside an AI's "mind"

Nov-14-2024, 10:00:00 GMT–MIT Technology Review

"I want to be able to look inside a model and see if it's being deceptive," says Neel Nanda, who runs the mechanistic interpretability team at Google DeepMind. "It seems like being able to read a model's mind should help." Mechanistic interpretability, also known as "mech interp," is a new research field that aims to understand how neural networks actually work. At the moment, very basically, we put inputs into a model in the form of a lot of data, and then we get a bunch of model weights at the end of training. These are the parameters that determine how a model makes decisions. We have some idea of what's happening between the inputs and the model weights: Essentially, the AI is finding patterns in the data and making conclusions from those patterns, but these patterns can be incredibly complex and often very hard for humans to interpret.

autoencoder, deepmind, sparse autoencoder, (11 more...)

MIT Technology Review

Nov-14-2024, 10:00:00 GMT

News Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)