Table of Contents
Section: Firmware and AI
Lecturer: Dr. Nhan Tran (Fermilab, US)
In this module, students will learn how to train an ML algorithm for an experimental physics task using the Keras and TensorFlow software packages. They will be taught to design their algorithm to satisfy latency and throughput requirements while complying with resource constraints. Students will apply quantization-aware training and parameter pruning to compress the model, making it faster and more efficient while maintaining acceptable accuracy. Finally, students will use the hls4ml Python library to deploy the algorithm on a PYNQ-Z2 FPGA development board.
Learning Objectives:
By the end of the course, the participants will be able to:
- Train a neural network using Keras and TensorFlow
- Convert a trained neural network into FPGA firmware using HLS4ML
- Optimize a neural network and its resource utilization for deployment onto an FPGA
- Deploy a neural network onto an FPGA board
- Introduction to Machine Learning on FPGAs
- Basic overview of FPGAs and their underlying structure. Rationale, motivation, and trade-offs of using FPGAs for machine learning.
- Overview of common Neural Network acceleration techniques and hardware, including GPU Acceleration, Systolic Arrays, and Dataflow Architectures.
- Considerations and parameters to tune when implementing a neural network on an FPGA, including parallelism and the trade-off between latency and resource utilization, arbitrary-bitwidth numerical representations, and potential resource bottlenecks.
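To make the arbitrary-bitwidth idea concrete, the toy sketch below mimics how a signed fixed-point type such as Vivado HLS's `ap_fixed<W,I>` (W total bits, I of them, including sign, before the binary point) rounds and saturates a value. The function name is illustrative, not part of any library:

```python
def to_fixed(x: float, width: int = 16, int_bits: int = 6) -> float:
    """Quantize x as a signed fixed-point number with `width` total bits,
    `int_bits` of them (including sign) before the binary point.
    Mimics ap_fixed<width, int_bits> with round-to-nearest and saturation."""
    frac_bits = width - int_bits
    scale = 1 << frac_bits
    # Representable range of the signed fixed-point type.
    lo = -(1 << (int_bits - 1))
    hi = (1 << (int_bits - 1)) - 1.0 / scale
    q = round(x * scale) / scale      # round to the nearest representable step
    return min(max(q, lo), hi)        # saturate on overflow

print(to_fixed(3.14159, 16, 6))   # fine steps of 2**-10
print(to_fixed(3.14159, 8, 6))    # coarse steps of 2**-2 -> 3.25
print(to_fixed(100.0, 8, 6))      # out of range -> saturates at 31.75
```

Halving the total width coarsens the quantization step and shrinks the representable range; choosing these widths per layer is one of the main accuracy-versus-resource knobs on an FPGA.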
- Using HLS4ML to convert a Neural network into FPGA Firmware
- Introduction to using the HLS4ML package, basic configuration, and neural network to firmware conversion. A hands-on walk-through of the model conversion, firmware synthesis, and bitfile generation workflow for a simple physics task.
- Tuning the details of the implemented model, such as parallelism and precision, performing Post-Training Quantization, and determining the desired implementation strategy.
- Advanced configuration of implementation parallelism, parameter precision, and implementation strategy. Overview of how these values can be set at different configuration scopes: model-wide, per layer type, or per named layer.
- Simulation, profiling and evaluation of a model before firmware generation.
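A minimal sketch of the configuration and conversion flow covered in this section. The dictionary keys (`Precision`, `ReuseFactor`, `Strategy`) follow hls4ml's configuration scheme; the API calls shown in the comments (`config_from_keras_model`, `convert_from_keras_model`, `compile`, `predict`, `build`) are from the hls4ml package, and the part number is the PYNQ-Z2's Zynq-7020:

```python
# Hedged sketch of a model-scope hls4ml configuration (keys follow hls4ml's scheme).
hls_config = {
    "Model": {
        "Precision": "ap_fixed<16,6>",  # default fixed-point type for all layers
        "ReuseFactor": 1,               # 1 = fully parallel; larger = fewer DSPs, more latency
        "Strategy": "Latency",          # or "Resource" for large, folded designs
    },
}

# With hls4ml installed and `model` a trained Keras model, the flow would be roughly:
#   import hls4ml
#   config = hls4ml.utils.config_from_keras_model(model, granularity="model")
#   hls_model = hls4ml.converters.convert_from_keras_model(
#       model, hls_config=config, output_dir="my_prj", part="xc7z020clg400-1")
#   hls_model.compile()                  # build the C simulation model
#   y_sim = hls_model.predict(X_test)    # bit-accurate emulation before synthesis
#   hls_model.build(csim=False)          # run HLS synthesis and firmware generation

print(hls_config["Model"]["Strategy"])
```

Comparing `hls_model.predict` output against the original Keras predictions is the pre-synthesis evaluation step mentioned above: it exposes precision problems before any firmware is built.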
- Optimizing your neural network for deployment onto an FPGA
- Overview of common model compression techniques, including Quantization Aware Training (QAT), Parameter Pruning, and Knowledge Distillation.
- A survey of commonly used Quantization Aware Training toolkits, their differences, and when/how to use them, plus an example of performing QAT on a model and how to configure and convert a quantized model using hls4ml.
- A walkthrough of model pruning and how to configure and convert a pruned model with hls4ml, followed by a discussion of combining quantization with pruning, its effects on a model, and an example of converting a quantized and pruned model with hls4ml.
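To make the two compression techniques concrete, here is a toy, library-free sketch of the underlying weight transformations. In the course itself these are done with dedicated tools (e.g. QKeras-style quantizers for QAT and magnitude-based pruning as in TensorFlow's model-optimization toolkit); the function names below are illustrative only:

```python
def quantize(w, bits=6, int_bits=0):
    """Fake-quantize a weight to `bits` total bits (1 sign bit,
    `int_bits` integer bits, the rest fractional), with saturation."""
    frac = bits - int_bits - 1
    scale = 2 ** frac
    hi = 2 ** int_bits - 1.0 / scale
    return max(-2 ** int_bits, min(round(w * scale) / scale, hi))

def prune(weights, sparsity=0.5):
    """Zero the smallest-magnitude weights until `sparsity` of them are zero."""
    k = int(len(weights) * sparsity)
    cutoff = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= cutoff else w for w in weights]

weights = [0.9, -0.05, 0.4, 0.02, -0.7, 0.1]
pruned = prune(weights, sparsity=0.5)        # three smallest magnitudes -> 0
compressed = [quantize(w) for w in pruned]   # survivors snapped to 6-bit grid
print(pruned)
print(compressed)
```

Zeroed weights let hls4ml skip multiplications entirely, and narrow quantized weights shrink the remaining ones, which is why the two techniques compose so well on an FPGA.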
- Deployment and the PYNQ software stack
- Overview of Xilinx’s PYNQ Python API and OS image, and basic usage of PYNQ to interact with, manage, and configure supported devices, such as Xilinx’s Zynq/Zynq UltraScale+ and Alveo devices, through a Python and Jupyter Notebook interface.
- A discussion of how to support the PYNQ API when building a firmware image, its design requirements and considerations, and examples of more complex firmware images with a neural network built in.
- Deployment of an hls4ml-generated firmware image onto a TUL PYNQ-Z2 development board, running neural network inferences on the FPGA accelerator via the PYNQ API and OS, and an example of running the same project on an Alveo device.
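As a sketch of what the on-board inference step looks like: data travels to the accelerator over AXI DMA as a flat, contiguous buffer, and the driver calls shown in the comments assume the example AXI-stream driver that hls4ml generates for its PYNQ backend (the class name `NeuralNetworkOverlay` may differ between versions). The packing helper is a stdlib-only illustration:

```python
import struct

def pack_batch(batch):
    """Flatten a batch of float32 feature vectors into the contiguous
    little-endian byte buffer a DMA transfer expects (illustrative helper)."""
    flat = [x for sample in batch for x in sample]
    return struct.pack(f"<{len(flat)}f", *flat)

buf = pack_batch([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(len(buf))  # 6 floats * 4 bytes = 24

# On the PYNQ-Z2 itself, inference would then look roughly like:
#   from axi_stream_driver import NeuralNetworkOverlay  # driver generated by hls4ml
#   nn = NeuralNetworkOverlay("hls4ml_nn.bit", X_test.shape, y_shape)
#   y_hw, latency, throughput = nn.predict(X_test, profile=True)
```

In practice PYNQ's `allocate` provides the physically contiguous buffers, and the driver handles the DMA transfers; the point here is only that host-side data must be laid out exactly as the firmware's streaming interface expects.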
Prerequisites:
Required for this course: Intermediate experience with the Python programming language, basic understanding of Machine Learning/Neural Networks.