
Written by Dr Peter Debenham, Senior Consultant

Evolving silicon choices in the AI age

Only a few years ago, choosing silicon for a processing job seemed simple: use a CPU, use a GPU if your task had a lot of parallel sections, or possibly create a totally bespoke FPGA or ASIC design (much more expensive in engineering effort, but often worthwhile for high-value jobs).

Now the choice is much more difficult. The old options still exist, but they have been joined by new ones such as TPUs (Tensor Processing Units) and their cousins, NPUs (Neural Processing Units). What changed?

The change, of course, is Artificial Intelligence (AI). In the past few years AI has moved on from being a subject discussed dryly in academic journals or portrayed in science fiction where, usually, the AI is out to kill people in various ways (e.g. Ava in Ex Machina (2014), Schwarzenegger's Terminator (1984), or HAL in Kubrick's 2001: A Space Odyssey (1968) – all three great films, by the way).

In the real world, Artificial Intelligence means taking a complicated set of inputs (the pixels of a picture, say) and applying a series of weights and biases to them across a number of layers to produce a simple output ("this picture contains a person and a cat"). Performing any single set of calculations is not usually too slow for modern computers, but away from those dry academic journals most people want to perform a lot of sets of calculations, and perform them fast. People want some kind of real-time response to a changing situation, such as frames from a video camera, and to achieve this they generally prefer something physically small, efficient, and not doubling up as a fan heater – silicon and electricity cost money, after all, even when you are plugged into the mains.
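The layered weights-and-biases arithmetic described above can be sketched in a few lines of NumPy. The layer sizes and random weights here are illustrative placeholders, not a real trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Pass an input vector through a stack of (weights, bias) layers."""
    for W, b in layers:
        x = relu(W @ x + b)  # weighted sum, bias, then non-linearity
    return x

# Illustrative 3-layer network: 784 inputs (e.g. 28x28 pixels) -> 2 outputs
sizes = [784, 128, 32, 2]
layers = [(rng.standard_normal((n_out, n_in)) * 0.01, np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

scores = forward(rng.standard_normal(784), layers)
print(scores.shape)  # two output scores, e.g. "person" and "cat"
```

Every layer is essentially a matrix multiplication, which is why all the accelerators discussed below are built around matrix arithmetic.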

If running a trained AI model is computationally intensive, attempting to train one is worse still. Training is an iterative process. A set of carefully chosen training data is fed through a putative AI model and the accuracy of the resulting output is measured. Based on how well the model performs, changes are made to the model and the process is repeated. Training a large AI model requires a large set of carefully chosen training data and many cycles of the loop: process data, check accuracy, refine model. Essentially, the training process must run the AI model an enormous number of times. Whilst silicon cost and power consumption remain a concern, the big driver here is usually the elapsed time needed to train the model, which can easily be tens of hours or even tens of days. Reducing this is, relatively speaking, worth a lot of electricity and silicon cost, especially where the silicon can be rented from Amazon, Google or similar.
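The process-data, check-accuracy, refine-model loop can be sketched as plain gradient descent on a toy one-parameter-pair model. The data, learning rate and iteration count are invented for illustration; a real model has millions of parameters and each loop iteration is vastly more expensive:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training data: inputs x with targets y drawn from a known line
x = rng.standard_normal(256)
y = 3.0 * x + 0.5 + 0.05 * rng.standard_normal(256)

w, b = 0.0, 0.0   # putative model parameters
lr = 0.1          # learning rate: how big a refinement step to take

for epoch in range(200):
    pred = w * x + b                  # process data through the model
    err = pred - y
    loss = np.mean(err ** 2)          # check accuracy
    w -= lr * 2.0 * np.mean(err * x)  # refine the model using the
    b -= lr * 2.0 * np.mean(err)      # gradient of the squared error

print(round(w, 2), round(b, 2))  # converges towards 3.0 and 0.5
```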

Given the requirement to run AI models in a fast and efficient fashion, what type of processing silicon should be used? For a fixed model, a bespoke FPGA may well be the fastest and least power-hungry option. But the time and cost of designing and implementing one remain considerable, and are unaffordable in most circumstances. Consider, too, that AI development is moving fast enough that a good model today will be a bad model tomorrow, making a bespoke FPGA even less likely to be a good solution.

The Central Processing Unit/Graphics Processing Unit (CPU/GPU) equation remains much as before. A single CPU core can perform a complicated, general set of mathematical operations very fast. Each core in a multi-core CPU can perform these calculations almost totally independently of the other cores. A high-end processor such as the AMD Threadripper Pro 5995WX has 64 cores and, particularly when training an AI, it is possible to devise algorithms which use each of them efficiently. Even a more mainstream, lower-cost and lower-power Intel i7 may have 12 or more cores.
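That per-core independence can be sketched with Python's standard multiprocessing pool: each worker runs a completely separate piece of work on its own core. The `crunch` function here is a stand-in for a real training shard, not any particular framework's API:

```python
from multiprocessing import Pool

def crunch(shard):
    """Stand-in for an independent slice of work (e.g. one data shard)."""
    return sum(i * i for i in range(shard, shard + 10_000))

if __name__ == "__main__":
    # Each worker proceeds at its own pace, touching its own data,
    # with no need to stay in step with the others.
    with Pool(processes=4) as pool:
        results = pool.map(crunch, range(4))
    print(len(results))
```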

Conventional GPUs operate differently to CPUs. Rather than a small number of very powerful cores, they contain vastly more numerous but individually less powerful ones. The Nvidia H100 GPU has 16,896 cores; even a "humble" GeForce home PC graphics card can have over 9,000. But there is a catch. As well as each core being less powerful than a CPU core, GPU cores cannot operate individually. They are grouped together (often in groups of 32 or 64) and each group must operate in lockstep, without using the memory required by other groups. Where software can be written to make efficient use of this arrangement, the system can run much faster than on a CPU thanks to the sheer number of cores. This has been done for many software packages which train common AI model architectures.
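The lockstep behaviour is loosely analogous to array-wide (vectorised) arithmetic: one instruction applied to a whole group of values at once. A NumPy sketch of the same computation done one value at a time versus as a single lockstep operation over a group of 32 "lanes":

```python
import numpy as np

a = np.arange(32, dtype=np.float32)        # one "group" of 32 values
b = np.full(32, 2.0, dtype=np.float32)

# One value at a time: each step is free to differ from the others
scalar = np.empty(32, dtype=np.float32)
for i in range(32):
    scalar[i] = a[i] * b[i] + 1.0

# Lockstep: the same multiply-add applied to all 32 lanes at once
lockstep = a * b + 1.0

print(np.array_equal(scalar, lockstep))  # same answer either way
```

The catch the text describes is exactly this: the lockstep form only wins when every lane genuinely wants to execute the same instruction.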

More recently, Google created a new type of processor designed specifically around the needs of a particular type of machine-learning AI, the convolutional neural network (CNN). This is the Tensor Processing Unit (TPU), which Google announced in 2016, though they had by then been using them in-house. TPU workloads are available as part of Google's cloud offering (Cloud TPU) and use Google's own TensorFlow software. The TPU is a bespoke ASIC designed for high throughput of low-precision calculations, specifically matrix processing. TPUs were originally designed for running already-trained CNN models, where they are more power-efficient per operation than a more general GPU or CPU, but since version 2 they can also be used for training such models. Waymo, for example, uses TPUs to train its self-driving software.
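The kind of low-precision matrix processing a TPU is built for can be illustrated by quantising floating-point weights to 8-bit integers before multiplying. The single per-tensor scale factor used here is a deliberately simplified scheme for illustration, not Google's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

def quantize(x):
    """Map floats to int8 using a single per-tensor scale factor."""
    scale = np.max(np.abs(x)) / 127.0
    return np.round(x / scale).astype(np.int8), scale

W = rng.standard_normal((8, 8)).astype(np.float32)  # illustrative weights
x = rng.standard_normal(8).astype(np.float32)       # illustrative input

Wq, w_scale = quantize(W)
xq, x_scale = quantize(x)

# Cheap integer multiplies with 32-bit accumulation, rescaled to float
y_int8 = (Wq.astype(np.int32) @ xq.astype(np.int32)) * (w_scale * x_scale)
y_fp32 = W @ x  # the full-precision answer, for comparison

print(np.max(np.abs(y_int8 - y_fp32)))  # small quantisation error
```

The integer result agrees with the float one to within quantisation error, which for inference is usually an acceptable trade for the large gains in throughput and power efficiency.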

At around the same time, Nvidia added Tensor Cores to its data-centre GPUs (2017) and its consumer GPUs (2018), again targeting matrix multiply-and-accumulate operations. This adds efficient AI acceleration alongside the other advantages GPUs have over CPUs. The data-centre H100 GPU adds 528 Tensor Cores to its 16,896 CUDA cores. In 2022, Tensor Cores started to appear in lower-power devices such as the Jetson Orin Nano.
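The multiply-and-accumulate operation a Tensor Core performs in hardware is, in effect, D = A x B + C on small matrix tiles, typically with low-precision inputs and a higher-precision accumulator. A NumPy sketch of that operation (float16 inputs with float32 accumulation is one common configuration; the matrix contents are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Small tiles standing in for the fixed-size tiles a Tensor Core consumes
A = rng.standard_normal((4, 4)).astype(np.float16)  # low-precision input
B = rng.standard_normal((4, 4)).astype(np.float16)  # low-precision input
C = np.zeros((4, 4), dtype=np.float32)              # running accumulator

# Fused multiply-accumulate: multiply the cheap float16 operands,
# sum into the wider float32 accumulator to limit rounding error
D = A.astype(np.float32) @ B.astype(np.float32) + C

print(D.shape, D.dtype)
```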

In general, GPUs are preferred to CPUs for running large AI-model training workloads. Often the deciding factor is not which hardware is theoretically "best" but which hardware you have available, either physically in a machine you control or for rent in the cloud. For much of the past few years, those wishing to purchase high-end GPUs have suffered significant backlogs, often many months long, as demand overwhelmed the supply chains. OpenAI used GPUs to train its large-scale AI models such as GPT-3.

Neural Processing Units (NPUs) are similar to Google's TPUs. They again offer hardware designed specifically to accelerate aspects of neural networks and AI. NPUs may be standalone data-centre cards or integrated alongside CPUs, both in PCs (Intel Core Ultra series, AMD Ryzen 8040) and in mobile devices (Qualcomm, Huawei), or even in lower-power edge processors.

Of particular interest to Plextek is adding intelligence "at the edge" for Internet of Things (IoT) devices, where per-unit cost, computing power, battery capacity and communication bandwidth are all tightly limited. Transmitting everything to a more powerful server to do the number crunching is not an option. This means that the ability to run an AI model locally, efficiently and rapidly is necessary for an IoT device to respond to its environment, communicating only as and when needed.

In addition to its sensors, an IoT device typically needs an embedded microcontroller. For moderately complicated devices this is often some version of an Arm Cortex. These are capable of running AI loads but are often too slow or too power-hungry to be practical. Some kind of AI accelerator is then required, both to speed up running the model and to reduce the energy drawn from the limited power budget.

Examples of currently available "at the edge" accelerators include Google's Edge TPU (a smaller version of its data-centre TPUs), Arm's own Helium technology (Armv8.1-M Cortex-M chips such as the Cortex-M85) and NXP's eIQ Neutron NPU (in the Arm Cortex-M33-based MCX N series). At the higher-power end of IoT there are devices such as Nvidia's Jetson Nano boards, incorporating an Arm A57 and a 128-core Nvidia GPU (an older design lacking AI-specific Tensor Cores), or the Orin Nano, containing a more recent Tensor Core-equipped GPU.

How much faster are the accelerators compared to just using the conventional Arm core? Numbers can be hard to find, but NXP and Google publish some public data. Google shows the difference in inference time between an Arm Cortex-A53 core on its own and its development board (Cortex-A53 plus Edge TPU) when running various AI models trained on the ImageNet dataset. The Edge TPU is typically 30+ times faster than the Arm core by itself. This comes with the limitation that the Edge TPU is designed to process only a limited range of AI model types.

NXP does not give precise inference times but shows the "ML Operator Acceleration" of the NPU compared to using the Cortex-M33 core alone for three different typical AI operations. As with the Edge TPU, the acceleration is around 30+ times.

Nvidia makes the point that its GPU solution can process models which the Edge TPU cannot, by showing the frames per second the Nano can manage when running classification and object-detection models, with frequent DNRs (did not report) for the Coral Edge TPU development board. Where both platforms could run a model, their speeds were similar.

Which type of AI processing platform is the best? As the Nvidia blog shows, it depends on precisely what type of AI you are trying to run. Some processing platforms can only handle a restricted set of AI model types. Others accelerate a wide variety of AI processing loads but are tightly coupled to the manufacturer's CPU cores. As the use of AI continues to grow, the only certainty is that the CPU/GPU/TPU/NPU acronym list will keep growing.


