NeuKompression™ IP
NeuKompression™: Breaking the Inference Bottleneck and the 4-bit AI Revolution
U.S. Patent No. 11,615,286 B2
With the rapid evolution of LLMs and generative AI, the memory subsystem has emerged as the primary bottleneck for inference performance and power efficiency. NeuKompression™ addresses this challenge with compression technology that converts pre-trained high-precision models (FP32, FP16, or FP8) into 4-bit representations. Its core advantage is the ability to shrink model size significantly without sacrificing accuracy, reducing the memory footprint by 2x to 8x. By easing the bandwidth demands of weight transfer, NeuKompression™ cuts power requirements while boosting the throughput and cost-effectiveness of AI accelerators, making high-performance inference more competitive in the market.
- Offline software compression: 2x-8x compression into an efficient bitstream
- On-the-fly hardware decompression: restores FP8 weights for computation
- Broad applicability: supports both language models (Llama series proven) and vision models (Stable Diffusion family, Flux proven)
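To make the offline-compress / on-the-fly-decompress flow concrete, the sketch below models a generic group-wise symmetric 4-bit quantization scheme in Python. This is purely illustrative: the actual NeuKompression™ bitstream format, grouping strategy, and hardware decompression path are proprietary, and the function names here (`compress_int4`, `decompress`) are hypothetical.

```python
def compress_int4(weights, group_size=64):
    # Offline software step (illustrative): map FP32 weights to signed
    # 4-bit codes in [-7, 7], storing one floating-point scale per group.
    # The real NeuKompression bitstream format is proprietary; this only
    # shows the general idea of group-wise low-bit quantization.
    codes, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7.0 or 1.0  # avoid div-by-zero
        codes.append([max(-7, min(7, round(w / scale))) for w in group])
        scales.append(scale)
    return codes, scales

def decompress(codes, scales):
    # On-the-fly hardware step (modelled in software): expand the 4-bit
    # codes back into low-precision weights usable for computation.
    out = []
    for group, scale in zip(codes, scales):
        out.extend(q * scale for q in group)
    return out

# Example: quantize a small weight vector and reconstruct it.
w = [0.5, -1.2, 3.1, 0.0, -0.7] * 20
codes, scales = compress_int4(w, group_size=25)
w_hat = decompress(codes, scales)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Storing 4 bits per weight plus one scale per group yields roughly a 7x-8x reduction versus FP32 (less versus FP16/FP8 inputs), consistent with the 2x-8x range quoted above; the per-group scale bounds the reconstruction error at half a quantization step.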