CXL Memory Expansion and Compute Performance: Key Takeaways

Dec 9, 2025

Overview of CXL

* CXL (Compute Express Link) lets hyperscalers reuse older DDR4 memory from retired machines by attaching it through a separate protocol that runs over PCIe lanes.
* CXL devices contain a memory controller that can host DDR4 or DDR5 modules and communicates with the host via CXL.

Structura X: Memory Expansion

* DDR4 Version: 12 DIMMs, up to 3x DIMM capacity
* DDR5 Version: 6 DIMMs, providing higher-speed memory
* Inline LZ4 compression roughly doubles usable capacity, based on a theoretical compression ratio of about 1.8:1 to 2:1
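
The capacity effect of inline compression is simple arithmetic: usable capacity is raw capacity multiplied by the compression ratio actually achieved. The sketch below illustrates this under stated assumptions. The 64 GB DIMM size and the ratio values 1.8 and 2.0 are illustrative, not vendor figures, and `effective_capacity_gb` is a hypothetical helper name.

```python
def effective_capacity_gb(raw_gb: float, compression_ratio: float) -> float:
    """Usable capacity when stored data compresses at `compression_ratio`:1."""
    if compression_ratio < 1.0:
        raise ValueError("compression ratio must be at least 1.0")
    return raw_gb * compression_ratio


# Hypothetical card: 12 DIMMs of 64 GB each (the DIMM size is an assumption).
RAW_GB = 12 * 64

# Ratios are illustrative; real ratios depend on how compressible the data is.
for ratio in (1.0, 1.8, 2.0):
    usable = effective_capacity_gb(RAW_GB, ratio)
    print(f"{ratio:.1f}:1 -> {usable:.0f} GB usable")
```

In practice the achieved ratio depends on the workload, so data that is already compressed or encrypted would see little or no gain.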

Structura A: Acceleration

* Similar form factor to the DDR5 version of Structura X, with added Arm Neoverse V2 cores for local compute
* Combines DDR5 memory, compression, and local compute to improve performance in AI, vector search, and other memory-bandwidth-heavy workloads
* Increased queries per second (QPS) by 11.5k using one card, 24k with two cards, and 36k with three cards
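
One way to read the QPS figures above is to compare them against ideal linear scaling from the one-card result. The following is a minimal sketch of that arithmetic. It uses only the three reported numbers, and `scaling_efficiency` is a hypothetical helper name.

```python
# QPS figures as reported above, keyed by number of cards.
reported_qps = {1: 11_500, 2: 24_000, 3: 36_000}


def scaling_efficiency(qps_by_cards: dict[int, float]) -> dict[int, float]:
    """Ratio of reported QPS to ideal linear scaling from the one-card figure."""
    base = qps_by_cards[1]
    return {n: qps / (n * base) for n, qps in sorted(qps_by_cards.items())}


for cards, eff in scaling_efficiency(reported_qps).items():
    print(f"{cards} card(s): {reported_qps[cards]} QPS, {eff:.2f}x of linear")
```

A value near 1.0 means each added card contributes about as much as the first, which would suggest the workload scales with the added memory bandwidth and local compute.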

Demonstrations and Performance Benefits

* A demonstration showed that a 31-billion-parameter LLM increased its KV cache size by 1s when using CXL memory expansion
* CXL provides lower latency than expected for local compute-intensive workloads
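
To see why extra memory translates into a larger KV cache, it helps to estimate the cache size with the usual formula for a transformer: two tensors (keys and values) per layer, each of size KV heads × head dimension × tokens. The sketch below applies that formula. The model shape (60 layers, 8 KV heads, head dimension 128, 16-bit values) and the 256 GiB of added memory are hypothetical and are not taken from the demonstration.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size: keys and values for every layer and token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem


def tokens_that_fit(memory_bytes: int, layers: int, kv_heads: int,
                    head_dim: int, bytes_per_elem: int = 2) -> int:
    """Number of cached tokens that fit in `memory_bytes`."""
    per_token = kv_cache_bytes(layers, kv_heads, head_dim, 1, bytes_per_elem)
    return memory_bytes // per_token


# Hypothetical model shape and added capacity (assumptions, not demo values).
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128
EXTRA_BYTES = 256 * 2**30  # 256 GiB of CXL-attached memory

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens=1)
extra_tokens = tokens_that_fit(EXTRA_BYTES, LAYERS, KV_HEADS, HEAD_DIM)
print(f"KV cache per token: {per_token} bytes")
print(f"Additional tokens cacheable: {extra_tokens}")
```

The estimate ignores allocator overhead and any KV cache quantization, so it gives an upper bound for the chosen shape.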

Future Developments

* CXL 2.0: Adds switches and memory shelves to enable shared memory across multiple systems
* CXL 3.0: Support is expected in the near future, adding further features and improvements
link: https://www.youtube.com/watch?v=Sw3tgTipUy8