Anthropic can now track the bizarre inner workings of a large language model
It's no secret that large language models work in mysterious ways. Few--if any--mass-market technologies have ever been so little understood. That makes figuring out what makes them tick one of the biggest open challenges in science. Shedding some light on how these models work would expose their weaknesses, revealing why they make stuff up and can be tricked into going off the rails. It would help resolve deep disputes about exactly what these models can and can't do.
Mar-27-2025, 17:00:00 GMT