Technology

Conventional neural network accelerators such as graphics processing units (GPUs) from NVIDIA and Google’s tensor processing units (TPUs) are extremely powerful platforms that support not only artificial intelligence (AI) but also weather forecasting, drug discovery and computer-aided design. In contrast, TernaryNet is developing a new class of neural network accelerators optimised specifically for inference. Just as a Formula 1 car outperforms a sport utility vehicle (SUV) on the track by giving up generality, specialising our architecture allows much higher performance to be reached.

There has been a steady trend of reducing the precision of AI calculations to lower the energy and computational costs of accelerators. NVIDIA’s Hopper and Blackwell architectures support 8-bit floating-point arithmetic, which substantially speeds up both training and inference. TernaryNet takes this to the extreme by supporting ternary neural networks (TNNs), in which each weight can only take the values -1, 0 or +1. While training is still carried out at higher precision, for inference, ternary versions of large language models (LLMs) such as LLaMA can match the accuracy of their 32-bit floating-point counterparts.
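To make the idea concrete, the sketch below shows one common way a trained floating-point weight matrix can be quantised to ternary values: weights with small magnitude become 0 and the rest keep only their sign, with a per-matrix scale preserving dynamic range. The function name and the threshold factor are illustrative choices, not a description of TernaryNet's actual toolchain.

```python
import numpy as np

def ternarize(w, threshold_factor=0.7):
    """Quantise a float weight matrix to {-1, 0, +1} plus a scale.

    Weights whose magnitude falls below a data-dependent threshold map
    to 0; the rest keep their sign. (threshold_factor = 0.7 is an
    illustrative value, not a fixed standard.)
    """
    scale = np.mean(np.abs(w))            # per-matrix scaling factor
    threshold = threshold_factor * scale
    t = np.zeros_like(w, dtype=np.int8)
    t[w > threshold] = 1
    t[w < -threshold] = -1
    return t, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
t, scale = ternarize(w)
```

Each ternary weight needs at most 2 bits of storage (theoretically log2(3) ≈ 1.58 bits), versus 32 bits for single-precision floating point.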

TernaryNet’s architecture utilises the following features of TNNs to improve speed and reduce power consumption:

  1. LLMs require large weight matrices, which are stored either off-chip or, in the case of 2.5D or 3D integration, in high-bandwidth memory (HBM) on a separate die in the same package. The system-level bottleneck lies in the associated data transfers. Since ternary weights occupy fewer bits than floating-point ones, less data needs to be stored and transferred, reducing memory footprint, execution time and energy.
  2. In a conventional accelerator chip, the computational bottleneck is the multiplication of an activation matrix with a weight matrix. If the weight matrix contains only ternary values, the multipliers can be replaced by adder/subtractors; since multiplier area grows roughly quadratically with operand width while adder area grows only linearly, this yields a large reduction in chip area.
  3. For inference, the weight matrices are known in advance and multiplication with zero-valued weights can be skipped. This allows sparse matrix techniques to be employed to further improve speed and reduce energy.
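The three points above can be mirrored in a small software sketch: a matrix-vector product over ternary weights needs no multiplies, only adds and subtracts, and zero-valued weights are skipped outright. Hardware would realise this with adder/subtractor trees; the explicit loop here (a hypothetical reference model, not TernaryNet's implementation) just makes the dataflow visible.

```python
import numpy as np

def ternary_matvec(t, x):
    """Compute y = t @ x where t holds only {-1, 0, +1}.

    Each output element is formed by adding the activations where the
    weight is +1 and subtracting where it is -1. Zero weights contribute
    nothing and are skipped, so no multiplier is ever needed.
    """
    rows, cols = t.shape
    y = np.zeros(rows, dtype=np.float64)
    for i in range(rows):
        acc = 0.0
        for j in range(cols):
            if t[i, j] == 1:
                acc += x[j]        # weight +1: add activation
            elif t[i, j] == -1:
                acc -= x[j]        # weight -1: subtract activation
            # weight 0: skipped entirely (exploits sparsity)
        y[i] = acc
    return y

t = np.array([[1, 0, -1],
              [0, 1,  1]], dtype=np.int8)
x = np.array([2.0, 3.0, 5.0])
print(ternary_matvec(t, x))        # → [-3.  8.]
```

Because the weight matrices are fixed at inference time, the positions of the zeros are known in advance, so a compiler can pre-compute which additions and subtractions to schedule rather than testing each weight at runtime.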

Technical Excellence

5nm Process Node

Advanced fabrication technology for maximum performance density

1000+ TOPS/W

Industry-leading compute efficiency

Unified Memory

High-bandwidth memory architecture eliminates bottlenecks

TernaryNet: Revolutionary Features

Energy Efficiency

By exploiting ternary weights, our architecture achieves a 10x speed improvement at lower power, delivering an order-of-magnitude saving in energy.

Scalable Architecture

Our modular architecture scales seamlessly from edge devices to massive data center deployments.

Minimal Source Code Changes

Our tools allow users to run their existing applications on our hardware with minimal changes to their source code.