Blue Magpie - Neural Processing Unit IP

The Edge Computing Core Built for the Generative AI Era | Breaking Computational Boundaries, Redefining GenAI Performance at the Edge

In the Generative AI era, traditional NPUs can no longer meet the complex matrix computation requirements of Large Language Models (LLMs) and multimodal applications (e.g., VLMs, CNNs). Blue Magpie is a high-performance neural processing unit IP, Silicon-Proven in advanced process nodes and specifically designed to overcome the computational bottlenecks of Generative AI. Through its proprietary MVP (Matrix-Vector Processors) architecture and flexibly configurable NeuKompression technology, it delivers high computing power while significantly improving prompt processing speed and data transfer efficiency.

Core Features
A Comprehensive Inference Acceleration Engine

The MVP core of Blue Magpie is not just a computing engine; it is an accelerator redesigned around the underlying logic of modern AI models.

  • Matrix Kernels — The Computing Soul of LLM: Deep hardware optimization for the most dominant tasks in the Transformer architecture
  • GEMM (General Matrix Multiplication): Targeted at massive parallel computing during the Prefill Phase; supports 2–64 TFLOPS scalable power with HW acceleration for FP8/6/4b and INT8/4b
  • GEMV (General Matrix-Vector Multiplication): Targeted at token-by-token generation during Autoregressive Decoding; provides up to 2 TFLOPS (FP32/16b) for memory-bound token generation
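The prefill/decode split above is the standard reason LLM accelerators separate GEMM from GEMV: prefill processes all prompt tokens in one matrix-matrix product, while each decode step does a single vector-matrix product. A minimal NumPy sketch of the two access patterns (shapes are illustrative, not Blue Magpie's actual configuration):

```python
import numpy as np

# Hypothetical sizes for illustration only.
d_model, prompt_len = 64, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, d_model)).astype(np.float32)  # one weight matrix

# Prefill: all prompt tokens at once -> GEMM (matrix x matrix),
# compute-bound and highly parallel.
prompt = rng.standard_normal((prompt_len, d_model)).astype(np.float32)
prefill_out = prompt @ W               # (prompt_len, d_model)

# Decode: one new token per step -> GEMV (vector x matrix),
# memory-bound: the full weight matrix is streamed for one row of work.
token = rng.standard_normal(d_model).astype(np.float32)
decode_out = token @ W                 # (d_model,)
```

This is why prefill throughput governs TTFT while decode throughput governs tokens-per-second.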
Convolutional Kernels
  • The Source of Multimodal Vision: High efficiency for VLM image encoding
  • Standard Convolution: Combines im2col with MVP to convert 2D/3D conv into high-speed matrix ops
  • Specialized Support: Optimized for Depthwise Separable & Dilated Convolutions, reducing parameters and expanding receptive field
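The im2col-plus-MVP approach named above is the standard way to map convolution onto a matrix engine: each sliding-window patch is unfolded into a row, after which convolution is one matrix product. A small sketch (stride 1, no padding; illustrative, not the hardware interface):

```python
import numpy as np

def im2col(x, k):
    """Unfold k x k patches of a 2D input into rows (stride 1, no padding)."""
    H, W = x.shape
    rows = [x[i:i + k, j:j + k].ravel()
            for i in range(H - k + 1) for j in range(W - k + 1)]
    return np.stack(rows)              # ((H-k+1)*(W-k+1), k*k)

x = np.arange(16, dtype=np.float32).reshape(4, 4)
kern = np.ones((3, 3), dtype=np.float32)

# Convolution becomes a single matrix-vector product on the unfolded input.
cols = im2col(x, 3)                    # (4, 9)
conv = cols @ kern.ravel()             # flattened 2x2 output feature map

# Direct sliding-window result for comparison.
direct = np.array([x[i:i + 3, j:j + 3].sum()
                   for i in range(2) for j in range(2)], dtype=np.float32)
assert np.allclose(conv, direct)
```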
Activation Functions & Vector Kernels
  • Activation & Non-linear Mapping: Dedicated HW circuits + proprietary patented algorithms for near-zero latency
  • Common Operators: ReLU, Sigmoid, Tanh
  • GenAI Operators: Softmax, GeLU, SiLU (Swish)
  • Normalization: HW acceleration for LayerNorm & RMSNorm to stabilize distributions
  • Vector Ops: Pooling (Max/Average) & Element-wise arithmetic (Add/Mul) for Residual Connections
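For reference, the GenAI operators and normalization listed above have these well-known mathematical definitions (reference formulas only; the IP implements them in dedicated hardware):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())            # subtract max for numerical stability
    return e / e.sum()

def gelu(x):                           # common tanh approximation of GeLU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def silu(x):                           # SiLU / Swish: x * sigmoid(x)
    return x / (1 + np.exp(-x))

def rmsnorm(x, eps=1e-6):              # RMSNorm: divide by root-mean-square
    return x / np.sqrt(np.mean(x**2) + eps)

v = np.array([1.0, 2.0, 3.0], dtype=np.float32)
probs = softmax(v)                     # sums to 1.0
normed = rmsnorm(v)                    # mean square ~ 1.0
```

Softmax's exponentials and divisions are exactly the kind of non-linear mapping that benefits from dedicated circuits rather than general-purpose ALUs.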
Data-Movement Engine
  • Built-in Master/Slave mode 2D/3D Gather/Scatter and Remapping engines to minimize memory traffic and break through the “Memory Wall” bottleneck
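A gather/scatter engine of this kind replaces many small, CPU-issued copies of non-contiguous data with one descriptor-driven transfer. A minimal NumPy sketch of the access pattern (illustrative only, not the hardware programming model):

```python
import numpy as np

memory = np.arange(100, dtype=np.float32)   # flat address space stand-in
indices = np.array([3, 17, 42, 99])         # scattered source addresses

# Gather: scattered reads collected into one dense buffer for the compute core.
gathered = memory[indices]

# Scatter: dense results written back to the same scattered addresses.
out = np.zeros_like(memory)
out[indices] = gathered * 2.0
```

Packing scattered operands into dense buffers keeps the MVP cores fed, which is the point of attacking the "Memory Wall".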

Key Advantages
Industry-Leading Prompt Processing Efficiency
  • Deep architectural redesign for matrix-matrix efficiency, surpassing CNN-first edge NPUs
  • TTFT (Time To First Token): Faster prompt prefill response, minimizing user wait times
  • Massive Context Length Handling: Easily manages high-volume text analysis via flexible core configurations
NeuKompression
Ultimate Flexibility & Scalability
  • Modular design: Configure cores based on scenario
  • Low-Power Mode: Wearables / battery-powered edge detection
  • High-Performance Mode: Data center edge nodes for large multimodal models
  • Low-Latency Interconnect: MVP/CPU/DSP linked via a proprietary Local Interconnect Bus for near-zero-latency task dispatch
Silicon-Proven Guarantee for Faster Time-to-Market
  • As an advanced-process, Silicon-Proven IP, Blue Magpie helps Neuchips customers reduce SoC development risk and shorten integration cycles, securing a competitive edge

Applications
Smart Cockpit

Perfect integration of real-time Voice Assistants (LLM) and Driver Behavior Vision Analysis (VLM)

Edge Server

Handling large Context Lengths for document summarization, private Knowledge Bases (RAG), and long-text retrieval

Intelligent Surveillance

Utilizing Dilated Convolutions and FP8 computing power for ultra-low latency image feature extraction and automated alerts

Smart Factory

Running complex multimodal predictive maintenance and high-precision defect detection under strict power constraints
