FPT’21 is held as a fully virtual conference using the OnAir Virtual Conference Platform (attendee guide, presenter guide, FAQ). You must register (for free!) to attend. The program is below (subject to change). All times are in New Zealand Daylight Time (NZDT), which is UTC+13.

For each paper, a pre-recorded presentation video will be available beforehand. During the conference, each paper is presented with a live lightning talk (5/3 minutes) followed by plenty of time for questions and discussion.

Abstracts and PDFs of the papers are available in the proceedings of the 20th International Conference on Field-Programmable Technology (FPT’21).


Mon, 6 Dec: Tutorials 1
11:00am-12:00pm Improving hands-on lab sessions with remote FPGAs: a practical case (additional registration recommended)
12:10pm-1:10pm RVfpga-SoC: How to go from a RISC-V Core to a RISC-V SoC
1:20pm-1:50pm Efficient Sharing of FPGA Resources in HLS
2:00pm-3:30pm Introduction to Creating High-Throughput Designs on an FPGA with oneAPI

Tue, 7 Dec: Tutorials 2
11:00am-5:00pm Developing HPC accelerators using Xilinx Vitis Libraries (additional registration required)

Wed, 8 Dec: Main Day 1
11:00am-11:10am Opening Session
11:10am-11:50am Full Paper Session 1
Break
12:00pm-12:45pm Keynote 1
Break
1:00pm-1:50pm Full Paper Session 2
Break
2:00pm-2:50pm Short Paper Session 1

Thu, 9 Dec: Main Day 2
11:00am-11:50am Full Paper Session 3
Break
12:00pm-12:50pm PhD Forum
Break
1:00pm-1:50pm Keynote 2 & TCFPGA Hall of Fame Paper Award
Break
2:00pm-2:50pm Full Paper Session 4
3:00pm-3:30pm Mid-Conference Coffee Break (Networking)

Fri, 10 Dec: Main Day 3
11:00am-11:50am Short Paper Session 2
Break
12:00pm-12:40pm Journal-track Paper Session
Break
1:00pm-1:45pm Keynote 3
Break
2:00pm-2:40pm Full Paper Session 5
2:40pm-3:00pm Closing Session + Best Paper Award
3:00pm-6:00pm Design competition live

Full Details

Keynotes

  1. Prof Kentaro Sano, Team Leader, Processor Research Team, Center for Computational Science, RIKEN, Japan
    Title: FPGA Cluster ESSPER, A Research Platform for Reconfigurable HPC with Supercomputer Fugaku

    Abstract: At RIKEN Center for Computational Science (R-CCS), we have been developing “ESSPER (Elastic and Scalable System for high-PErformance Reconfigurable computing),” a prototype FPGA cluster system targeting reconfigurable HPC. The system is composed of sixteen Intel Stratix 10 SX FPGAs connected by a dedicated 100Gbps inter-FPGA network. We have developed our own Shell (SoC) and its software APIs for the FPGAs, supporting inter-FPGA communication. The FPGA host servers are connected to a 100Gbps InfiniBand switch, which allows distant servers to access the FPGAs remotely through a software-bridged version of Intel’s OPAE FPGA driver, called R-OPAE. Through the 100Gbps InfiniBand network and R-OPAE, ESSPER is connected to Fugaku, the world’s fastest supercomputer, deployed at RIKEN. From tasks running on Fugaku nodes, we can remotely program bitstreams onto the FPGAs using R-OPAE and offload work to application cores embedded in the FPGA shell. In this talk, I will present our achievements, challenges, and future prospects for reconfigurable HPC with FPGAs, especially from a system point of view.

    Bio: Kentaro Sano has been the team leader of the Processor Research Team at RIKEN Center for Computational Science (R-CCS) since 2017, responsible for the research and development of future high-performance processors and systems. He is also a visiting professor with an advanced computing system laboratory at Tohoku University. He received his Ph.D. from the Graduate School of Information Sciences, Tohoku University, in 2000. From 2000 until 2018, he was a Research Associate and an Associate Professor at Tohoku University. He was a visiting researcher at the Department of Computing, Imperial College London, and at Maxeler Technologies in 2006 and 2007. His research interests include data-driven and spatial-parallel processor architectures such as coarse-grain reconfigurable arrays (CGRAs), FPGA-based high-performance reconfigurable computing, high-level synthesis compilers and tools for reconfigurable custom computing machines, and system architectures for next-generation supercomputing based on the data-flow computing model.

  2. Assoc Prof Willem van Straten, Institute for Radio Astronomy and Space Research, Auckland University of Technology, NZ
    Title: Discovery and Analysis of Radio Pulsars

    Abstract: In this talk, I’ll provide an overview of both the challenges and the opportunities associated with the discovery and study of radio pulsars, which push the boundaries of computing, statistics, and fundamental physics. Pulsars are composed of extreme matter that is denser than an atomic nucleus, crushed by gravitational forces surpassed only by those of black holes, and threaded by the strongest magnetic fields in the Universe. We currently see only a small fraction of the observable pulsars in our Galaxy, and the next generation of radio surveys is gearing up to discover a population an order of magnitude larger. This enormous computational challenge requires an energy-efficient solution owing to the sheer volume of the data and the remote locations of radio observatories. Once discovered, pulsars can be used for a wide variety of studies in fundamental physics and astrophysics, such as testing the General Theory of Relativity, constraining the nuclear equation of state, and searching for low-frequency gravitational waves. These experiments motivate significant advances in statistical inference and high-performance computing.

    Bio: Willem van Straten is the Deputy Director of Research in the Institute for Radio Astronomy and Space Research at Auckland University of Technology. After receiving a PhD in Astronomy from Swinburne University of Technology, he undertook post-doctoral and academic staff appointments at the Netherlands Foundation for Research in Astronomy (ASTRON), The Centre for Gravitational Wave Astronomy (The University of Texas), and the Centre for Astrophysics & Supercomputing (Swinburne University of Technology). As part of his research, he led the development of three scientific data analysis software packages that are used by the international community of pulsar astronomers, and he led the design of the pulsar timing instrumentation for the Square Kilometre Array Observatory.

  3. Dr Jiansong Zhang, Computing Technology Lab in Damo Academy, Alibaba Group, China
    Title: FPGA Accelerated Applications in Extended Cloud Computing

    Abstract: In this talk, the speaker will introduce FPGA-accelerated applications in cloud computing and cloud-centric edge computing, drawing on his work experience at two of the largest cloud service providers worldwide. As examples, the speaker will discuss four recent research projects at Alibaba Group: (1) CNN acceleration for City-Brain; (2) text-to-speech acceleration for voice applications; (3) graph neural network acceleration for large-scale recommendation systems; and (4) homomorphic encryption acceleration for privacy-preserving computing.

    Bio: Jiansong Zhang has 15 years of industrial research experience at Alibaba Damo Academy and Microsoft Research. He now leads the China team of the Computing Technology Lab in Alibaba Damo Academy. His research interests are hardware-software co-design and FPGA acceleration in various application domains such as cloud computing, edge computing, communication and networking, and AIoT. He received his PhD in Computer Science and Engineering from the Hong Kong University of Science and Technology, and his Master’s degree in Intelligent and Networked Systems and Bachelor’s degree in Automation from Tsinghua University.

Main Day 1

Main Day 1: Wednesday 8th December 2021
11:00am - 11:10am Opening session
11:10am - 11:50am Full Paper Session 1 (Chair: He Li)
FLOWER: A Comprehensive Dataflow Compiler for High-Level Synthesis, Puya Amiri, Arsène Pérard-Gayot, Richard Membarth, Philipp Slusallek, Roland Leißa and Sebastian Hack
StreamSVD: Low-rank Approximation and Streaming Accelerator Co-design, Zhewen Yu and Christos-Savvas Bouganis
Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator, Martin Ferianc, Zhiqiang Que, Hongxiang Fan, Wayne Luk and Miguel Rodrigues
StreamZip: Compressed Sliding-Windows for Stream Aggregation, Prajith Ramakrishnan Geethakumari and Ioannis Sourdis
12:00pm - 12:45pm Keynote 1 (Chair: Donald Bailey)
Prof Kentaro Sano, Center for Computational Science, RIKEN, Japan
FPGA Cluster ESSPER, A Research Platform for Reconfigurable HPC with Supercomputer Fugaku

1:00pm - 1:50pm Full Paper Session 2 (Chair: Jieru Zhao)
High Performance Lattice Regression on FPGAs via a High Level Hardware Description Language, Nathan Zhang, Matthew Feldman and Kunle Olukotun
StateLink: FPGA System Debugging via Flexible Simulation/Hardware Integration, Sameh Attia and Vaughn Betz BEST PAPER CANDIDATE
A Streaming Hardware Architecture for Real-Time SIFT Feature Extraction, Hector Li Sanchez and Alan George
Algorithm-Hardware Co-Optimization for Energy-Efficient Drone Detection on Resource-Constrained FPGA, Han-sok Suh, Jian Meng, Ty Nguyen, Shreyas Kolala Venkataramanaiah, Vijay Kumar, Yu Cao and Jae-sun Seo
High-Performance Hardware Implementation of CRYSTALS-Dilithium, Luke Beckwith, Duc Nguyen and Kris Gaj
2:00pm - 2:50pm Short Paper Session 1 (Chair: Bruce Sham)
Profiling-Based Control-Flow Reduction in High-Level Synthesis, Austin Liolli, Omar Ragheb and Jason Anderson
Dataflow Systolic Array Implementations of Exploring Dual-Triangular Structure in QR Decomposition Using High Level Synthesis, Siyang Jiang, Hsi-Wen Chen and Ming-Syan Chen
Parallel-Pipeline Fast Walsh-Hadamard Transform Implementation Using HLS, Andres Manjarrés García, Carlos Osorio Quero, Jose Rangel-Magdaleno, Jose Martinez-Carranza and Daniel Durini
Low Precision Networks for Efficient Inference on FPGAs, Ruth Abra, Dmitry Denisenko, Richard Allen, Tim Vanderhoek, Sarah Wolstencroft and Mark Gibson
A Hexagon-Based Honeycomb Routing Architecture for FPGA, Kaichuang Shi, Hao Zhou and Lingli Wang
Characterization of IOBUF-based Ring Oscillators, Julia Burgiel, Daniel Esguerra, Ilias Giechaskiel, Shanquan Tian and Jakub Szefer
Total-ionizing-dose tolerance evaluation of an optoelectronic field programmable gate array VLSI in operation, Hiroshi Ito and Minoru Watanabe

Main Day 2

Main Day 2: Thursday 9th December 2021
11:00am - 11:50am Full Paper Session 3 (Chair: Morteza Biglari-Abhari)
Efficient Physical Page Migrations in Shared Virtual Memory Reconfigurable Computing Systems, Torben Kalkhof and Andreas Koch
Increasing Memory Efficiency of Hash-Based Pattern Matching for High-Speed Networks, Tomáš Fukač, Jiří Matoušek, Jan Kořenek and Lukáš Kekely
Scalable and Flexible High-Performance In-Network Processing of Hash Joins in Distributed Databases, Johannes Wirth, Jaco Hofmann, Lasse Thostrup, Carsten Binnig and Andreas Koch
Tens of gigabytes per second JSON-to-Arrow conversion with FPGA accelerators, Johan Peltenburg, Ákos Hadnagy, Matthijs Brobbel, Robert Morrow and Zaid Al-Ars
A Modular RFSoC-based Approach to Interface Superconducting Quantum Bits, Richard Gebauer, Nick Karcher and Oliver Sander BEST PAPER CANDIDATE
12:00pm - 12:50pm PhD Forum (Chair: Julian Oppermann)
A High-Precision Flexible Symmetry-Aware Architecture for Element-Wise Activation Functions, Xuan Feng, Yue Li, Yu Qian, Jingbo Gao, Wei Cao and Lingli Wang
High-performance pipeline architecture for packet classification accelerator in DPU, Jing Tan, Gaofeng Lv, Yanni Ma and Guanjie Qiao
Parallelized Technology Mapping to General PLBs by Adaptive Circuit Partitioning, Xiaoxi Wang, Moucheng Yang, Zhen Li and Lingli Wang
Resource-saving FPGA Implementation of the Satisfiability Problem Solver: AmoebaSATslim, Yingjie Yan, Hideharu Amano, Masashi Aono, Kaori Okoda, Shingo Fukuda, Kenta Saito and Seiya Kasai
An area-efficient multiply-accumulation architecture and implementations for time-domain neural processing, Ichiro Kawashima, Yuichi Katori, Takashi Morie and Hakaru Tamukoh
Real-time Implementation of Cyclostationary Analysis using FPGAs, Jingyi Li
1:00pm - 1:50pm Keynote 2 (Chair: Oliver Sinnen) & TCFPGA Hall of Fame Paper Award
Assoc Prof Willem van Straten, Institute for Radio Astronomy and Space Research, Auckland University of Technology, NZ
Discovery and Analysis of Radio Pulsars
2:00pm - 2:50pm Full Paper Session 4 (Chair: Sharad Sinha)
Efficient Stride 2 Winograd Convolution Method Using Unified Transformation Matrices on FPGA, Chengcheng Huang, Xiaoxiao Dong, Zhao Li, Tengteng Song, Zhenguo Liu and Lele Dong
A High-Performance and Flexible FPGA Inference Accelerator for Decision Forests Based on Prior Feature Space Partitioning, Thiem Van Chu, Ryuichi Kitajima, Kazushi Kawamura, Jaehoon Yu and Masato Motomura BEST PAPER CANDIDATE
LETA: A Lightweight Exchangeable-Track Accelerator for EfficientNet Based on FPGA, Jingbo Gao, Yu Qian, Yihan Hu, Xitian Fan, Wai-Shing Luk, Wei Cao and Lingli Wang
acSLAM: FPGA Accelerated High-Accuracy SLAM with Heapsort and Parallel Keypoint Extractor, Cheng Wang, Yingkun Liu, Kedai Zuo, Jianming Tong, Yan Ding and Pengju Ren
A Unified Accelerator Design for LiDAR SLAM Algorithms for Low-end FPGAs, Keisuke Sugiura and Hiroki Matsutani

Main Day 3

Main Day 3: Friday 10th December 2021
11:00am - 11:50am Short Paper Session 2 (Chair: Ray Cheung)
AMAH-Flex: A Modular and Highly Flexible Tool for Generating Relocatable Systems on FPGAs, Najdet Charaf, Christoph Tietz, Michael Raitza, Akash Kumar and Diana Goehringer
On the Performance Effect of Loop Trace Window Size on Scheduling for Configurable Coarse Grain Loop Accelerators, Tiago Santos, Nuno Paulino, João Bispo, João M. P. Cardoso and João C. Ferreira
FPGAs as General-Purpose Accelerators for Non-Experts via HLS: The Graph Analysis Example, Pedro Filipe Silva, João Bispo and Nuno Paulino
Energy-efficient FPGA-accelerated LiDAR-based SLAM for Embedded Robotics, Mario Porrmann, Thomas Wiemann, Marc Rothmann, Marco Tassemeier, Marc Eisoldt, Julian Gaal and Marcel Flottmann
Exponential Sine Sweep Measurement Implementation Targeting FPGA Platforms, Alexander Klemd, Patrick Nowak, Piero Rivera Benois, Etienne Gerat, Bernd Klauer and Udo Zölzer
Efficient queue-balancing switch for FPGAs, Philippos Papaphilippou, Kentaro Sano, Boma A. Adhi and Wayne Luk
In-Storage Computation of Histograms with Differential Privacy, Andrei Tosa, Anca Hangan, Gheorghe Sebestyen and Zsolt István
12:00pm - 12:40pm Journal-track Paper Session (Chair: Deming Chen, Editor-in-Chief TRETS)
Papers published in ACM Transactions on Reconfigurable Technology and Systems (TRETS), Issue 15:1
The Strong Scaling Advantage of FPGAs in HPC for N-Body Simulations, Johannes Menzel, Christian Plessl and Tobias Kenter
Rethinking Embedded Blocks for Machine Learning Applications, SeyedRamin Rasoulinezhad, Esther Roorda, Steve Wilton, Philip Leong and David Boland
RWRoute: An Open-source Timing-driven Router for Commercial FPGAs, Yun Zhou, Pongstorn Maidee, Chris Lavin, Alireza Kaviani and Dirk Stroobandt
Design and Evaluation of a Tunable PUF Architecture for FPGAs, Franz-Josef Streit, Paul Krueger, Andreas Becher, Stefan Wildermann and Jürgen Teich
1:00pm - 1:45pm Keynote 3 (Chair: Wei Zhang)
Dr Jiansong Zhang, Computing Technology Lab in Damo Academy, Alibaba Group, China
FPGA Accelerated Applications in Extended Cloud Computing
2:00pm - 2:40pm Full Paper Session 5 (Chair: David Boland)
An Efficient RTL Buffering Scheme for an FPGA-Accelerated Simulation of Diffuse Radiative Transfer, Kazuki Furukawa, Tomoya Yokono, Yoshiki Yamaguchi, Kohji Yoshikawa, Norihisa Fujita, Ryohei Kobayashi, Taisuke Boku and Masayuki Umemura
APIR-DSP: An Approximate PIR-DSP Architecture for Error-Tolerant Applications, Yuan Dai, Simin Liu, Yao Lu, Hao Zhou, Seyedramin Rasoulinezhad, Philip H.W. Leong and Lingli Wang
FastCGRA: A Modeling, Evaluation, and Exploration Platform for Large-Scale Coarse-Grained Reconfigurable Arrays, Su Zheng, Kaisen Zhang, Yaoguang Tian, Wenbo Yin, Lingli Wang and Xuegong Zhou
General Routing Architecture Modelling and Exploration for Modern FPGAs, Jiadong Qian, Yuhang Shen, Hao Zhou and Lingli Wang
2:40pm - 3:00pm Closing Session, including Best Paper Award


Tutorial day 1, Monday 6th December 2021

Improving hands-on lab sessions with remote FPGAs: a practical case, 11:00am - 12:00pm
Presenter: Pablo Orduña, Co-founder & CEO at LabsLand
Additional registration for this tutorial is recommended (free).

Description: During the COVID-19 pandemic, the pre-existing trend toward digitizing educational resources accelerated. A long-standing problem for online training and learning is labs that require real equipment, as is the case with FPGAs. In this course, LabsLand will present its network of remotely accessible laboratories in different fields (hosted at universities in 14 countries), which educational institutions worldwide can use to control real hardware located elsewhere on the Internet without purchasing any equipment. In particular, the course will include a hands-on part using remotely accessible Intel Cyclone V FPGA boards, showing how to write a design in an HDL, synthesize it with Intel Quartus, and observe it running on the real Intel FPGA through a camera and various peripherals.

Bio: Pablo Orduña is co-founder and CEO of LabsLand, a global network of remote laboratories where universities and schools can access laboratories from other institutions. Pablo Orduña received his PhD in Computer Science from the University of Deusto in 2013, during which he was a Visiting Researcher at the Massachusetts Institute of Technology (receiving the MIT TR35 Spain award for the top 10 innovators under 35 in 2012); he also graduated from the Global Solutions Program of Singularity University at NASA Ames. As part of his research at DeustoTech, he has co-authored over 150 scientific articles, participated in large-scale European projects such as Go-Lab (FP7) and Next-Lab (H2020), and served in several professional associations (Vice-Chair of the IEEE Education Society Standardization Committee 2015-2018; Senior Member of IEEE and HKN since 2020; Executive Member of the International Association of Online Engineering 2015-2019).

RVfpga-SoC: How to go from a RISC-V Core to a RISC-V SoC, 12:10pm - 1:10pm
Presenter: Zubair Kakakhel (AZKY Tech Labs), Imagination Worldwide University Programme

Description: RVfpga-SoC is a freely available set of teaching materials developed by a wide range of experts from academia and industry.

There are many RISC-V CPU Cores openly available. These cores usually come with a reference System on Chip (SoC) design as well.

Most available material focuses on the CPU core itself and its architecture. However, little material covers how an SoC is built from basic building blocks.

RVfpga-SoC is a set of teaching materials that focuses on this area. The materials consist of five detailed labs. Lab 1 starts from basic building blocks such as the SweRV EH1 CPU core, bootROM, interconnect, and GPIO controller, and creates a System on Chip design using a block design approach. Lab 2 shows how to run bare-metal code on the new System on Chip. Lab 3 shows an alternative to the block design approach: building the SoC with the FuseSoC build system. Lab 4 runs the Zephyr RTOS on the SoC. Lab 5 shows how to run the TensorFlow Lite Hello World example on the SoC.

The tutorial will give a detailed overview of the RVfpga-SoC package and its labs.

Bio: Zubair Kakakhel received his MSc in Embedded Systems Engineering from the University of Leeds, UK, in 2013. He has since worked as a Linux kernel engineer and team lead for various organizations, including Imagination Technologies, MIPS and Balena. Zubair founded his software consulting firm, AZKY Tech Labs, in 2020. Currently, he and his team are delivering software solutions in a variety of areas using cutting-edge tools and platforms. Zubair is passionate about giving back to the community and sharing what he has learned.

Efficient Sharing of FPGA Resources in HLS, 1:20pm - 1:50pm (pre-recorded)
Presenter: Richard Chamberlain, Principal Systems Engineer, BittWare/Molex

Description: It’s common to have multiple FPGA kernels working independently on shared interfaces or attached memories, including HBM2. Arbitrating access to these “endpoints” can be complex and unpredictable, and simple interleaving or multiplexing is inadequate. BittWare has created a lightweight, configurable crossbar switch IP to alleviate this problem.

The crossbar switch IP is written using Intel oneAPI, so it can be adjusted to fit the application’s requirements, for example by modifying the data path widths or the number of ports. These adjustments require none of the extra shim logic or design restrictions that a typical fixed IP implementation would impose.

Richard will explain BittWare’s Crossbar Switch implementation along with three use cases.

Bio: Richard started his career at MBDA UK before joining Nallatech in 2001. For the last 20 years, he has pioneered the use of FPGAs for HPC and is a trusted industry expert in the field of heterogeneous acceleration. Richard currently works as a Principal Systems Engineer in the applications team at BittWare, part of the Molex group.

Introduction to Creating High-Throughput Designs on an FPGA with oneAPI, 2:00pm - 3:30pm
Presenter: Mike Tucker, Intel

Description: This presentation will start by introducing some basic concepts describing how algorithms written in C++ are mapped to the FPGA using the Intel oneAPI Compiler. We will then look at a typical design flow where reports from the oneAPI Compiler can be used to optimize the design without needing to do a full FPGA compile. Specific areas of the reports that are critical for performance, such as the loop reports, will be emphasized.

Bio: Mike spent much of his career as an Altera FPGA customer designing video processing equipment for the broadcast, streaming, and video production industries. He has been in the High Level Design group in Intel PSG for over 5 years, and is currently focused on writing example designs in oneAPI and helping to improve the performance of the DPC++ compiler.

Tutorial day 2, Tuesday 7th December 2021

Developing HPC accelerators using Xilinx Vitis Libraries, 11:00am-5:00pm
Presenter: Dr Parimal Patel, XUP Senior Systems Engineer
Additional registration for this tutorial is required (free).

Description: This tutorial will introduce the Xilinx Vitis development environment for developing FPGA accelerators for HPC applications using Vitis Accelerated libraries. Vitis supports C++, C and OpenCL. RTL design flows are also supported for experienced hardware developers. Each of these flows will be discussed along with the open-source Xilinx Runtime Library and Vitis open-source accelerated libraries.
The latest cloud and local hardware will be covered, including AWS F1 and the range of Alveo accelerator boards. Topics to be covered:

  • Xilinx Vitis development framework, design flows, and use cases
  • AWS and Alveo boards for FPGA acceleration
  • Demonstration and hands-on experience
    • Vitis development flow
    • Developing, profiling and optimizing applications for FPGA-based platforms
    • Developing a vision accelerator using the Vision library available as part of Xilinx Vitis Accelerated libraries Git repository
    • Developing an accelerator to find the shortest path using the Graph library available as part of Xilinx Vitis Accelerated libraries Git repository

Bio: Parimal received his Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin in 1986 before joining the University of Texas at San Antonio as an Assistant Professor. He was a Full Professor before joining Xilinx. Parimal has always enjoyed teaching and developing new courses. He has been with Xilinx for over 20 years, developing new courses, updating current courses, and delivering workshops worldwide. He is actively engaged in providing training in areas such as High-Level Synthesis, Embedded Systems, DSP Design Flow, Dynamic Partial Reconfiguration, Python Productivity on Zynq (PYNQ), and Accelerated Cloud Computing on AWS with Vitis.