Warp Processors: Self-Optimizing Chips
Warp processors are hybrid system-on-a-chip (SoC) devices that dynamically optimize software by synthesizing hardware for an on-chip FPGA. From a software developer’s point of view, a warp processor initially executes an application like any other microprocessor, but after some period of time the application transparently executes more efficiently, with improved performance and reduced energy consumption. This transparency allows synthesis to be integrated into any existing application development tool flow, so developers can keep using their existing languages and compilers. Warp processors completely hide synthesis from software developers, who often avoid hardware design because register-transfer-level specification is difficult and time-consuming. In addition, because warp processing happens at runtime, it enables optimizations that existing static approaches cannot perform, such as phase-based optimizations.
To perform synthesis at runtime, warp processors have a specialized architecture capable of profiling the executing software, decompiling computation kernels, synthesizing the decompiled kernels, and then mapping, placing, and routing the kernels into an on-chip FPGA. The main challenge in designing warp processors is the design of these CAD tools, which must run in an on-chip environment, a difficult task considering that such tools typically require powerful workstations. We have implemented a complete on-chip CAD tool flow that executes in just several seconds on an ARM microprocessor, resulting in a hardware/software system that is often 10x faster than software execution alone. We are currently extending warp processors to handle multithreaded applications by synthesizing custom accelerators for executing threads. Early results show that multithreaded warp processing can achieve more than 100x speedups compared to software execution on multi-core systems with up to 64 cores.
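The profiling step can be illustrated with a small sketch. The text does not say how the on-chip profiler identifies kernels, so the approach below is an assumption: it counts taken backward branches, since a branch to a lower address usually closes a loop, and reports the most frequently executed loops as candidate kernels. The function name `find_hot_loops` and the trace format are hypothetical and chosen only for illustration.

```python
from collections import Counter


def find_hot_loops(branch_trace, top_n=1):
    """Return the top_n most frequently taken backward branches.

    branch_trace is an iterable of (branch_addr, target_addr) pairs, one
    per taken branch (a hypothetical trace format). Each result is a
    ((loop_start, loop_end), count) pair.
    """
    counts = Counter()
    for branch_addr, target_addr in branch_trace:
        # A taken branch to an equal or lower address is treated as a
        # loop back-edge (an assumption, not the documented method).
        if target_addr <= branch_addr:
            counts[(target_addr, branch_addr)] += 1
    return counts.most_common(top_n)


# Made-up trace: one hot loop, one cold loop, and some forward branches.
trace = (
    [(0x120, 0x100)] * 1000   # hot loop spanning 0x100..0x120
    + [(0x200, 0x1F0)] * 10   # cold loop spanning 0x1F0..0x200
    + [(0x130, 0x180)] * 5    # forward branches, ignored
)
hot = find_hot_loops(trace)
```

In this sketch, `hot` holds the address range of the loop at 0x100..0x120 together with its iteration count. A runtime system along these lines would pass that region on to decompilation and synthesis.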
Synthesis from Software Binaries
Much of my (Greg Stitt) research has focused on one of the enabling technologies of warp processors: synthesis from software binaries. Because the dynamic synthesis performed by warp processors must operate on a software binary rather than on high-level code, the resulting hardware can potentially be much slower, due to the loss of high-level information during software compilation. To make synthesis from software binaries feasible, I have adapted existing decompilation techniques and introduced new techniques to recover the high-level information needed for effective synthesis. Using these techniques, I have shown that for many applications, including a commercial H.264 decoder, synthesis from software binaries can in fact achieve results similar or even identical to those of high-level synthesis approaches. Synthesis from software binaries can also be used independently of warp processors, providing similar transparency advantages for desktop CAD, in addition to supporting synthesis of library code, legacy code, and hand-optimized assembly.
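As a rough illustration of the kind of high-level information that decompilation recovers, the sketch below rebuilds basic blocks from a flat instruction list and flags blocks that end in a backward branch as loops. It is a generic textbook-style example and not the specific techniques referred to above. The toy instruction format, the opcode names, and the helpers `basic_blocks` and `loop_blocks` are all hypothetical, and a real decompiler would typically use dominator analysis rather than simple address order to find loops.

```python
BRANCH_OPS = {"jmp", "br"}  # hypothetical opcodes that transfer control


def basic_blocks(instrs):
    """Split a flat instruction list into basic blocks.

    instrs is a list of (opcode, target_index_or_None) tuples. Returns a
    list of (start, end) index ranges with end exclusive.
    """
    if not instrs:
        return []
    leaders = {0}  # the first instruction always starts a block
    for i, (op, target) in enumerate(instrs):
        if op in BRANCH_OPS:
            leaders.add(target)         # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)      # so does the fall-through
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return list(zip(starts, ends))


def loop_blocks(instrs, blocks):
    """Return the blocks whose final branch jumps back to their own
    start or to an earlier instruction (a simplified loop test)."""
    loops = []
    for start, end in blocks:
        op, target = instrs[end - 1]
        if op in BRANCH_OPS and target <= start:
            loops.append((start, end))
    return loops


# Toy binary: initialise, loop over a load/add body, then return.
program = [
    ("mov", None),   # 0: i = 0
    ("load", None),  # 1: loop body starts here
    ("add", None),   # 2
    ("br", 1),       # 3: conditional branch back to index 1
    ("ret", None),   # 4
]
blocks = basic_blocks(program)
loops = loop_blocks(program, blocks)
```

Here `blocks` splits the program into an entry block, a loop body, and an exit block, and `loops` identifies the loop body. Recovered structure like this is the sort of input a synthesis tool would need in order to apply loop-level optimizations to code that arrived as a binary.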