Blue Magpie - Neural Processing Unit IP

The Edge Computing Core Built for the Generative AI Era | Breaking Computational Boundaries, Redefining GenAI Performance at the Edge

In the Generative AI era, traditional NPUs can no longer meet the complex matrix computation requirements of Large Language Models (LLMs) and multimodal applications (e.g., VLMs, CNNs). Blue Magpie is a high-performance neural processing unit IP, Silicon-Proven in advanced process nodes and specifically designed to overcome the computational bottlenecks of Generative AI. Through its proprietary MVP (Matrix-Vector Processors) architecture and flexibly configurable NeuKompression technology, it delivers high computing power while significantly improving prompt processing speed and data transfer efficiency.

Core Features
A Comprehensive Inference Acceleration Engine

The MVP core of Blue Magpie is not just a computing engine; it is an accelerator redesigned around the underlying logic of modern AI models.

  • Matrix Kernels — The Computing Soul of LLM: Deep hardware optimization for the most dominant tasks in the Transformer architecture
  • GEMM (General Matrix Multiplication): Targeted at massive parallel computing during the Prefill Phase; supports 2–64 TFLOPS scalable power with HW acceleration for FP8/6/4b and INT8/4b
  • GEMV (General Matrix-Vector Multiplication): Targeted at token-by-token generation during Autoregressive Decoding; provides up to 2 TFLOPS (FP32/16b) for memory-bound token generation
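The prefill/decode split above is the standard reason LLM accelerators separate GEMM from GEMV: prefill processes all prompt tokens in one matrix-matrix product, while each decode step does a single vector-matrix product. A minimal NumPy sketch of the two access patterns (shapes are illustrative, not Blue Magpie's actual configuration):

```python
import numpy as np

# Hypothetical sizes for illustration only.
d_model, prompt_len = 64, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, d_model)).astype(np.float32)  # one weight matrix

# Prefill: all prompt tokens at once -> GEMM (matrix x matrix),
# compute-bound and highly parallel.
prompt = rng.standard_normal((prompt_len, d_model)).astype(np.float32)
prefill_out = prompt @ W               # (prompt_len, d_model)

# Decode: one new token per step -> GEMV (vector x matrix),
# memory-bound: the full weight matrix is streamed for one row of work.
token = rng.standard_normal(d_model).astype(np.float32)
decode_out = token @ W                 # (d_model,)
```

This is why prefill throughput governs TTFT while decode throughput governs tokens-per-second.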
Convolutional Kernels
  • The Source of Multimodal Vision: High efficiency for VLM image encoding
  • Standard Convolution: Combines im2col with MVP to convert 2D/3D conv into high-speed matrix ops
  • Specialized Support: Optimized for Depthwise Separable & Dilated Convolutions, reducing parameters and expanding receptive field
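The im2col-plus-MVP approach named above is the standard way to map convolution onto a matrix engine: each sliding-window patch is unfolded into a row, after which convolution is one matrix product. A small sketch (stride 1, no padding; illustrative, not the hardware interface):

```python
import numpy as np

def im2col(x, k):
    """Unfold k x k patches of a 2D input into rows (stride 1, no padding)."""
    H, W = x.shape
    rows = [x[i:i + k, j:j + k].ravel()
            for i in range(H - k + 1) for j in range(W - k + 1)]
    return np.stack(rows)              # ((H-k+1)*(W-k+1), k*k)

x = np.arange(16, dtype=np.float32).reshape(4, 4)
kern = np.ones((3, 3), dtype=np.float32)

# Convolution becomes a single matrix-vector product on the unfolded input.
cols = im2col(x, 3)                    # (4, 9)
conv = cols @ kern.ravel()             # flattened 2x2 output feature map

# Direct sliding-window result for comparison.
direct = np.array([x[i:i + 3, j:j + 3].sum()
                   for i in range(2) for j in range(2)], dtype=np.float32)
assert np.allclose(conv, direct)
```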
Activation Functions & Vector Kernels
  • Activation & Non-linear Mapping: Dedicated HW circuits + proprietary patented algorithms for near-zero latency
  • Common Operators: ReLU, Sigmoid, Tanh
  • GenAI Operators: Softmax, GeLU, SiLU (Swish)
  • Normalization: HW acceleration for LayerNorm & RMSNorm to stabilize distributions
  • Vector Ops: Pooling (Max/Average) & Element-wise arithmetic (Add/Mul) for Residual Connections
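For reference, the GenAI operators and normalization listed above have these well-known mathematical definitions (reference formulas only; the IP implements them in dedicated hardware):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())            # subtract max for numerical stability
    return e / e.sum()

def gelu(x):                           # common tanh approximation of GeLU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def silu(x):                           # SiLU / Swish: x * sigmoid(x)
    return x / (1 + np.exp(-x))

def rmsnorm(x, eps=1e-6):              # RMSNorm: divide by root-mean-square
    return x / np.sqrt(np.mean(x**2) + eps)

v = np.array([1.0, 2.0, 3.0], dtype=np.float32)
probs = softmax(v)                     # sums to 1.0
normed = rmsnorm(v)                    # mean square ~ 1.0
```

Softmax's exponentials and divisions are exactly the kind of non-linear mapping that benefits from dedicated circuits rather than general-purpose ALUs.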
Data-Movement Engine
  • Built-in Master/Slave mode 2D/3D Gather/Scatter and Remapping engines to minimize memory traffic and break through the “Memory Wall” bottleneck
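A gather/scatter engine of this kind replaces many small, CPU-issued copies of non-contiguous data with one descriptor-driven transfer. A minimal NumPy sketch of the access pattern (illustrative only, not the hardware programming model):

```python
import numpy as np

memory = np.arange(100, dtype=np.float32)   # flat address space stand-in
indices = np.array([3, 17, 42, 99])         # scattered source addresses

# Gather: scattered reads collected into one dense buffer for the compute core.
gathered = memory[indices]

# Scatter: dense results written back to the same scattered addresses.
out = np.zeros_like(memory)
out[indices] = gathered * 2.0
```

Packing scattered operands into dense buffers keeps the MVP cores fed, which is the point of attacking the "Memory Wall".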

Key Advantages
Industry-Leading Prompt Processing Efficiency
  • Deep architectural redesign for matrix-matrix efficiency, surpassing CNN-first edge NPUs
  • TTFT (Time To First Token): Faster prompt prefill response, minimizing user wait times
  • Massive Context Length Handling: Easily manages high-volume text analysis via flexible core configurations
NeuKompression
Ultimate Flexibility & Scalability
  • Modular design: Configure cores based on scenario
  • Low-Power Mode: Wearables / battery-powered edge detection
  • High-Performance Mode: Data center edge nodes for large multimodal models
  • Low-Latency Interconnect: MVP/CPU/DSP linked via a proprietary Local Interconnect Bus for near-zero-latency task dispatch
Silicon-Proven Guarantee for Faster Time-to-Market
  • As an advanced-process, Silicon-Proven IP, Blue Magpie helps Neuchips customers reduce SoC development risk and shorten integration cycles, securing a competitive edge

Applications
Smart Cockpit

Perfect integration of real-time Voice Assistants (LLM) and Driver Behavior Vision Analysis (VLM)

Edge Server

Handling large Context Lengths for document summarization, private Knowledge Bases (RAG), and long-text retrieval

Intelligent Surveillance

Utilizing Dilated Convolutions and FP8 computing power for ultra-low latency image feature extraction and automated alerts

Smart Factory

Running complex multimodal predictive maintenance and high-precision defect detection under strict power constraints
