Domain Specific Architectures (DSAs)

Certain applications have enough demand to justify the ultimate optimization technique: designing hardware targeted at that one application. Examples include:

Cryptography accelerators
Compression engines
Blockchain hashers
Neural networks
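Hashing is a good DSA target partly because the kernel never changes. Take a Bitcoin-style miner as an example. It applies SHA-256 twice to an 80-byte block header, and the last 4 bytes of that header are a 32-bit nonce. The miner keeps incrementing the nonce until the hash falls below a difficulty target, and a mining ASIC hard-wires essentially this loop. The sketch below does the same search in software with `hashlib`. The function names are hypothetical. The all-zero header and the easy target are illustrative, not real chain data.

```python
import hashlib
import struct
from typing import Optional, Tuple


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def search_nonce(header_prefix: bytes, target: int,
                 max_nonce: int = 2**32) -> Optional[Tuple[int, int]]:
    """Sweep the 32-bit nonce appended to a 76-byte header prefix.

    Returns (nonce, hash_value) for the first nonce whose double-SHA-256
    hash, read as a little-endian integer, is below `target`, else None.
    """
    if len(header_prefix) != 76:
        raise ValueError("expected the first 76 bytes of an 80-byte header")
    for nonce in range(max_nonce):
        # The nonce occupies the final 4 bytes of the header, little-endian.
        header = header_prefix + struct.pack("<I", nonce)
        value = int.from_bytes(double_sha256(header), "little")
        if value < target:
            return nonce, value
    return None


# Illustrative only: an all-zero prefix and a very easy target (about 1 in 256
# nonces succeed), so this finishes almost immediately in software.
result = search_nonce(bytes(76), target=2**248)
print(result)
```

Real difficulty targets are astronomically smaller, so the same fixed loop must run trillions of times per second. That volume is why it pays to implement it directly in silicon rather than on a CPU.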

In practice, many DSAs target fast-moving application areas, so a trade-off is made between fixed-purpose hardware and some level of programmability. These devices are still not suited to fully general-purpose use the way a CPU is. Devices in this category include:

Network switch processors. This is a big area with a lot of interesting architecture, but in most cases these processors are not suitable for accelerating workloads that are not closely coupled to network routing or filtering.
Intel’s Exascale Dataflow engine (it is hard to tell yet how domain-specific this is)
Vector processors such as the NEC SX-Aurora.

A particularly big growth area for DSAs is neural networks:

Graphcore’s AI processor
Google TPU - detailed description in Hennessy and Patterson.
ARM ML processors
Mythic - an unusual hardware approach that performs inference in the analog domain.

Many other NN accelerators are under development or already on the market; see the Wikipedia article on AI accelerators.
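Hennessy and Patterson describe the TPU's core as a systolic matrix unit. In the first generation this is a 256x256 grid of 8-bit multiply-accumulate cells. Operands pulse from cell to cell each cycle instead of being re-read from memory. The sketch below simulates a tiny systolic array in Python to show that dataflow. It is a simplification: it uses an output-stationary design, where each cell keeps its own partial sum, whereas the first-generation TPU keeps weights stationary in the array. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Cycle-by-cycle simulation of an output-stationary systolic array.

    Processing element (PE) (i, j) owns output C[i][j]. Each cycle it takes an
    A value from its left neighbour and a B value from the neighbour above,
    multiply-accumulates them, and passes both on (right and down).
    Rows of A and columns of B are fed in with a one-cycle skew per row/column
    so that matching operands A[i][s] and B[s][j] meet at PE (i, j).
    """
    n, k, m = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    acc = [[0] * m for _ in range(n)]    # stationary partial sums
    a_reg = [[0] * m for _ in range(n)]  # A operand latched in each PE
    b_reg = [[0] * m for _ in range(n)]  # B operand latched in each PE

    for t in range(n + m + k - 2):       # cycles until the last operands meet
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:                # left edge: skewed feed of row i of A
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else 0
                else:                     # otherwise from the left neighbour
                    a_in = a_reg[i][j - 1]
                if i == 0:                # top edge: skewed feed of column j of B
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else 0
                else:                     # otherwise from the neighbour above
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # one multiply-accumulate per PE per cycle
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b       # all PEs latch simultaneously
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each operand is fetched from memory once at the array's edge and then reused by every cell it passes through. That reuse is what lets a large grid of simple multipliers stay busy without a matching increase in memory bandwidth.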

Another big area is image processing, where DSAs have been around for a long time. Recent progress has been toward making them more programmable and flexible.

Google Pixel Visual Core.
Most mobile SoCs have some level of programmable image processor.

Programming DSAs

Because DSA architectures vary so widely, each device is typically used through a library rather than programmed directly. Examples include TensorFlow for Google’s TPU.

Some DSAs support more general programming approaches, such as Halide for the Google Pixel Visual Core.
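Halide's key idea is to separate the algorithm from the schedule. The algorithm says what value each output pixel has. The schedule sets the loop order, the tiling, and which intermediate stages are stored or recomputed. A pipeline can therefore be retuned for different hardware by changing only the schedule. The sketch below imitates that split in plain Python for a two-stage 3x3 box blur. It does not use Halide's API, and all function names are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]


def clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(v, hi))


# ---- Algorithm: what each pixel is, independent of loop order or storage ----
def blur_x(img: Image, x: int, y: int) -> int:
    w = len(img[0])
    return (img[y][clamp(x - 1, 0, w - 1)] + img[y][x] + img[y][clamp(x + 1, 0, w - 1)]) // 3


def blur_y(bx: Callable[[int, int], int], h: int, x: int, y: int) -> int:
    # bx is however the schedule chooses to provide the first stage.
    return (bx(x, clamp(y - 1, 0, h - 1)) + bx(x, y) + bx(x, clamp(y + 1, 0, h - 1))) // 3


# ---- Schedule 1: breadth-first; store all of blur_x, then row-major blur_y ----
def schedule_breadth_first(img: Image) -> Image:
    h, w = len(img), len(img[0])
    tmp = [[blur_x(img, x, y) for x in range(w)] for y in range(h)]
    return [[blur_y(lambda xx, yy: tmp[yy][xx], h, x, y) for x in range(w)] for y in range(h)]


# ---- Schedule 2: tiled, with blur_x recomputed inline (no intermediate buffer) ----
def schedule_tiled_inline(img: Image, tile: int = 4) -> Image:
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    out[y][x] = blur_y(lambda xx, yy: blur_x(img, xx, yy), h, x, y)
    return out


img = [[(3 * x + 7 * y) % 17 for x in range(10)] for y in range(7)]
assert schedule_breadth_first(img) == schedule_tiled_inline(img, tile=4)
```

Both schedules produce identical images and differ only in memory use and locality. The breadth-first schedule stores a full intermediate image. The inline schedule never allocates that image but recomputes `blur_x` three times per output pixel. Halide's scheduling language exists to choose between trade-offs like these for each target device.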
