
Why Deep Learning Needs Assembler Hackers

Matrix multiplication looks simple, yet it turns out to be amazingly hard for compilers to speed up without a lot of human intervention. It is the heart of GEMM (General Matrix Multiply), the function that powers deep learning, and every fast implementation I know of has come from old-school assembler jockeys hand-tweaking instructions!

When I first started looking at the engineering side of neural networks, I assumed I'd follow the same path I had taken throughout my career: get most of my performance wins from improving the algorithms, writing clean code, and generally getting out of the way so the compiler could do its job of optimizing it. Instead, I spend a large amount of my time worrying about instruction dependencies and all the other hardware details that we were supposed to have escaped in the 21st century. Matrix multiplies remain a hard case for modern compilers to handle.
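To make the "simple" part concrete, here is a minimal sketch of the arithmetic GEMM performs, C = alpha * A * B + beta * C, written as a plain triple loop in C. It assumes single-precision floats, row-major storage, no transposes, and tightly packed matrices; real BLAS routines such as `cblas_sgemm` also take transpose flags and leading-dimension arguments, and they treat beta == 0 specially, which this sketch does not. The names `naive_sgemm` and `reordered_sgemm` are hypothetical, chosen for illustration only. The second function computes the same result with the two inner loops swapped, one of the simplest hand transformations of the kind the post describes.

```c
#include <stddef.h>

/* Hypothetical naive GEMM sketch: C = alpha * A * B + beta * C.
 * Assumptions: row-major storage, no transposes, A is M x K,
 * B is K x N, C is M x N, and C is initialized by the caller. */
void naive_sgemm(size_t M, size_t N, size_t K, float alpha,
                 const float *A, const float *B, float beta, float *C) {
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            /* Dot product of row i of A with column j of B. The inner
             * loop strides through B by N floats per step, which makes
             * poor use of caches for large N. */
            float sum = 0.0f;
            for (size_t k = 0; k < K; ++k) {
                sum += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * sum + beta * C[i * N + j];
        }
    }
}

/* Hypothetical hand-reordered version (i-k-j loop order). The innermost
 * loop now walks a row of B and a row of C contiguously, which is
 * friendlier to caches and easier to vectorize. Rounding can differ
 * slightly from naive_sgemm because alpha is applied per term. */
void reordered_sgemm(size_t M, size_t N, size_t K, float alpha,
                     const float *A, const float *B, float beta, float *C) {
    for (size_t i = 0; i < M; ++i) {
        /* Scale the existing row of C by beta once, before accumulating. */
        for (size_t j = 0; j < N; ++j) {
            C[i * N + j] *= beta;
        }
        for (size_t k = 0; k < K; ++k) {
            /* Broadcast one element of A across a whole row of B. */
            const float a = alpha * A[i * K + k];
            for (size_t j = 0; j < N; ++j) {
                C[i * N + j] += a * B[k * N + j];
            }
        }
    }
}
```

For example, with A = {1, 2, 3, 4}, B = {5, 6, 7, 8}, alpha = 1, beta = 0, and C zeroed, both functions should leave C = {19, 22, 43, 50}. Neither is fast in absolute terms: optimized libraries typically go much further, with cache blocking, register tiling, and hand-written SIMD kernels, which is exactly the territory this post is about.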