Google researchers have revealed that memory and interconnect, not compute power, are the primary bottlenecks for LLM inference, with memory bandwidth lagging compute growth by 4.7x.
We all know AI has a power problem. Globally, AI usage already drew as much energy as the entire nation of Cyprus did in 2021. But engineering researchers at the University of Minnesota ...