NeuKompression™: Breaking the Inference Bottleneck and the 4-bit AI Revolution

U.S. Patent No. 11,615,286 B2

Maximizing AI Accelerator Efficiency without Compromising Accuracy

With the rapid evolution of LLMs and generative AI, the memory subsystem has emerged as the primary bottleneck for inference performance and power efficiency. To address this challenge, NeuKompression™ compresses pre-trained high-precision models (FP32, FP16, or FP8) into 4-bit representations. Its core advantage is that it shrinks model size without sacrificing accuracy, reducing the memory footprint by 2 to 8 times. By reducing the bandwidth needed to move model data, NeuKompression™ lowers power consumption and raises the throughput and cost-effectiveness of AI accelerators, making high-performance inference more competitive in the market.
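The 2 to 8 times range can be checked with simple arithmetic. The sketch below is a minimal illustration, assuming the range follows from bit-width ratios alone (FP32, FP16, or FP8 down to 4 bits per weight) and ignoring any bitstream metadata. The 7-billion-parameter model size and the function names are hypothetical, chosen only for illustration, and are not taken from NeuKompression™ itself.

```python
# Bits per weight for each source format named in the text.
SOURCE_BITS = {"FP32": 32, "FP16": 16, "FP8": 8}
TARGET_BITS = 4  # 4-bit compressed representation


def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Raw weight storage in gigabytes (1 GB = 10^9 bytes), metadata ignored."""
    return num_params * bits_per_weight / 8 / 1e9


def compression_ratio(source_bits: int, target_bits: int = TARGET_BITS) -> float:
    """Ratio of source size to compressed size, from bit widths alone."""
    return source_bits / target_bits


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical model size, for illustration only
    compressed_gb = weight_footprint_gb(num_params, TARGET_BITS)
    for name, bits in SOURCE_BITS.items():
        source_gb = weight_footprint_gb(num_params, bits)
        ratio = compression_ratio(bits)
        print(f"{name}: {source_gb:.1f} GB -> {compressed_gb:.1f} GB ({ratio:.0f}X)")
```

Under these assumptions, an FP8 source yields 2X, FP16 yields 4X, and FP32 yields 8X, which matches the stated range. Because weights must be moved from memory for computation, the same ratios give a rough upper bound on the reduction in weight traffic.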

Key Features
  • Offline software compression: 2X-8X compression into an efficient bitstream
  • On-the-fly hardware decompression: restores FP8 weights for computation
  • Broad applicability: supports both language models (proven on the Llama series) and vision models (proven on the Stable Diffusion family and Flux)
*Stable Diffusion Example: Cat in front of a laptop looking up from the screen

Benefits
Seamless Integration & Flexibility
Designed as a standalone IP, NeuKompression™ offers a streamlined licensing model and an architecture that integrates easily into diverse customer chips and platforms, significantly accelerating time-to-market.
Cloud-to-Edge Versatility
A highly scalable solution optimized for both high-throughput cloud environments and power-constrained edge AI inference applications.
Market Differentiation & Cost Leadership
Empowers customers to launch AI solutions that achieve higher efficiency and lower total cost of ownership (TCO) through significant memory and bandwidth savings.