Photo: CppCon / YouTube
How Matrix Multiplication Goes from Slow to 180 Gigaflops
Engineer Aliaksei Sala shows how to optimize matrix multiplication in C++, taking it from a naive implementation to peak performance using cache blocking, SIMD, and clever tricks.
AI. Yuki Okonkwo, about 1 month ago