

OpenCL for FPGA enables developers to design FPGAs using a programming model similar to that for processors. Recent works have shown that code optimization at the OpenCL level is important to achieve high computational efficiency. However, existing works either focus primarily on optimizing single kernels or depend solely on channels to design multi-kernel pipelines. In this paper, we propose a source-to-source compiler framework, MKPipe, for optimizing multi-kernel workloads in OpenCL for FPGA. Besides channels, we propose new schemes to enable multi-kernel pipelines.

With FPGAs now being deployed in the cloud and at the edge, there is a need for scalable design methods which can incorporate the heterogeneity present in the hardware and software components of FPGA systems. Moreover, these FPGA systems need to be maintainable and adaptable to changing workloads while improving accessibility for application developers. However, current FPGA systems fail to achieve modularity and support for multi-tenancy due to dependencies between system components and a lack of standardised abstraction layers. To solve this, we introduce a modular FPGA operating system, FOS, which adopts a modular FPGA development flow to allow each system component to be changed and to be agnostic to the heterogeneity of EDA tool versions, hardware and software layers. Further, to dynamically maximise utilisation transparently to the users, FOS employs resource-elastic scheduling to arbitrate the FPGA resources in both the time and spatial domains for any type of accelerator. Our evaluation on different FPGA boards shows that FOS can provide performance improvements in both single-tenant and multi-tenant environments while substantially reducing development time and, at the same time, improving flexibility.
High-level synthesis tools allow programmers to use OpenCL to create FPGA designs. Unfortunately, these tools have a complex compilation process that can take several hours to synthesize a single design. This creates a significant barrier to design optimization, since even experts typically need to test many designs due to the non-obvious interactions between the different optimizations. Thus, understanding the design space and guiding the optimization process are crucial requirements for enabling the widespread adoption of these high-level synthesis tools. However, this requires a significant amount of design space data that is currently unavailable or difficult to generate.

To solve this problem, we present an OpenCL FPGA benchmark suite. We outfitted each benchmark with a range of optimization parameters (or knobs), compiled over 8300 unique designs using the Altera OpenCL SDK, executed them on a Terasic DE5 board, and recorded their corresponding performance and utilization characteristics. We describe the resulting design spaces, and perform a statistical analysis of the optimization configurations which provides valuable architecture insights to FPGA developers. We make the benchmarks and results completely open-source to give opportunities for the community to perform additional analyses and to provide a repository of well-documented designs for follow-on research.
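As a rough illustration of what a knob-based design space looks like, the sketch below enumerates every combination of a few optimization knobs and filters measured designs down to those that are not dominated in both runtime and resource utilization. The knob names and values are hypothetical, and the Pareto filter is only one possible analysis, chosen here for illustration; it is not claimed to be the statistical analysis performed in the benchmark suite abstract.

```python
from itertools import product

# Hypothetical knobs for one benchmark; names and values are illustrative only
# and are not taken from the benchmark suite.
KNOBS = {
    "unroll_factor": [1, 2, 4, 8],
    "simd_width": [1, 2, 4],
    "compute_units": [1, 2],
}


def enumerate_designs(knobs):
    """Yield every knob combination as a dict (the full design space)."""
    names = list(knobs)
    for values in product(*(knobs[name] for name in names)):
        yield dict(zip(names, values))


def pareto_front(points):
    """Keep (runtime, utilization) points that no other point dominates.

    Lower is better on both axes. A point is dominated when another point
    is no worse on both axes and strictly better on at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

In practice each enumerated configuration would be compiled and run on the board, and the measured `(runtime, utilization)` pairs passed to `pareto_front`. Because a single compilation can take hours, the size of the enumerated space (here 4 × 3 × 2 = 24 designs) directly drives the cost of gathering such data.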
