Basic HBM structure
HBM stacks multiple DRAM dies vertically, connects them to a base die through TSVs (through-silicon vias) and micro-bumps, and links the stack to the GPU through a silicon interposer. It trades the board-level flexibility of discrete memory packages for very wide I/O, lower energy per bit, and a compact package footprint.
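To see where the wide-I/O advantage comes from, here is a minimal back-of-the-envelope sketch. The numbers are illustrative assumptions, roughly in line with published HBM3 (1024-bit interface per stack, ~6.4 Gb/s per pin) and GDDR6 (32-bit interface per chip, ~16 Gb/s per pin) figures.

```python
# Peak bandwidth = interface width (bits) * per-pin data rate / 8 bits per byte.
# Numbers are illustrative, roughly in line with HBM3 and GDDR6 specs (assumed).

def peak_bandwidth_gb_s(width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s for one memory stack or chip."""
    return width_bits * pin_rate_gbps / 8

hbm3_stack = peak_bandwidth_gb_s(width_bits=1024, pin_rate_gbps=6.4)  # ~819 GB/s
gddr6_chip = peak_bandwidth_gb_s(width_bits=32, pin_rate_gbps=16.0)   # ~64 GB/s

print(f"One HBM3 stack: {hbm3_stack:.0f} GB/s")
print(f"One GDDR6 chip: {gddr6_chip:.0f} GB/s")
```

The design choice is visible in the arithmetic: HBM moves many bits in parallel at a modest per-pin rate over short in-package traces, rather than driving a few pins very fast across the board.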
Why bandwidth matters
AI training and inference are often limited by both matrix compute and memory access. As compute throughput grows, the GPU stalls whenever weights, activations, and the KV cache cannot be streamed fast enough. HBM places thousands of interface bits inside the package, delivering far higher bandwidth and better energy efficiency than memory sitting farther away on the board.
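A rough estimate shows how bandwidth caps inference speed. In autoregressive decoding, every generated token must read the full weight set at least once, so per-token latency has a floor of weight bytes divided by bandwidth. All values in the sketch below are illustrative assumptions, not measurements of any specific GPU.

```python
# Memory-bound decode estimate: each generated token reads every weight
# at least once, so HBM bandwidth sets a floor on per-token latency.
# All values below are illustrative assumptions.

params = 70e9             # assumed 70B-parameter model
bytes_per_param = 2       # FP16/BF16 weights
hbm_bw = 3.35e12          # ~3.35 TB/s, in the range of a current HBM3 GPU

weight_bytes = params * bytes_per_param
min_latency = weight_bytes / hbm_bw  # lower bound; ignores KV cache traffic

print(f"Weight traffic per token: {weight_bytes / 1e9:.0f} GB")
print(f"Bandwidth-bound floor: {min_latency * 1e3:.1f} ms/token "
      f"(~{1 / min_latency:.0f} tokens/s ceiling)")
```

Under these assumptions the ceiling is roughly 24 tokens/s per GPU no matter how much compute is available, which is why the text says the GPU waits on memory.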
Engineering constraints
HBM is not a free upgrade. Taller stacks make TSV control, heat dissipation, yield, test, and mechanical package stress harder, so capacity, bandwidth, power, and cost must be balanced together. Delivery of advanced GPUs is often gated by HBM supply, interposer area, and advanced-packaging capacity.
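One way to see the balancing act is to compare hypothetical stack configurations, as in the sketch below. The die capacity (~3 GB, i.e. a 24 Gb DRAM die) and per-stack bandwidth (~819 GB/s, HBM3-like) are assumptions chosen for illustration.

```python
# Capacity scales with layers per stack and with stack count; bandwidth
# scales only with stack count, since each stack has a fixed-width interface.
# Die capacity and per-stack bandwidth below are assumed values.

DIE_GB = 3            # ~24 Gb DRAM die (assumption)
STACK_BW_GB_S = 819   # per-stack peak bandwidth, HBM3-like (assumption)

for stacks, layers in [(4, 8), (6, 8), (6, 12), (8, 12)]:
    capacity_gb = stacks * layers * DIE_GB
    bandwidth_tb_s = stacks * STACK_BW_GB_S / 1000
    print(f"{stacks} stacks x {layers}-high: {capacity_gb:4d} GB, "
          f"{bandwidth_tb_s:.1f} TB/s")
```

The tension shows up immediately: going from 8-high to 12-high stacks adds 50% capacity with no bandwidth gain while worsening heat and yield, whereas each extra stack raises both capacity and bandwidth but consumes interposer area and packaging capacity.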