Intel Nehalem Processor Core Made FPGA Synthesizable

Graham Schelle, Jamison Collins, Ethan Schuchman, Perry Wang, Xiang Zou, Gautham Chinya, Ralf Plate, Thorsten Mattner, Franz Olbrich, Per Hammarlund, Ronak Singhal, Sebastian Steibl, Hong Wang
Intel


Abstract

We present an FPGA-synthesizable version of the Intel Nehalem processor core, synthesized, partitioned, and mapped to a multi-FPGA emulation system consisting of Xilinx Virtex-4 and Virtex-5 FPGAs. To our knowledge, this is the first time a modern, state-of-the-art x86 design with an out-of-order microarchitecture has been made FPGA synthesizable and capable of high-speed, cycle-accurate emulation. Unlike the much simpler Intel Atom core, which was made FPGA synthesizable on a single Xilinx Virtex-5 in a previous endeavor, the Nehalem core's complexity brings aggressive clock gating, double-phase latch RAMs, and RTL constructs that have no true equivalent in FPGA architectures. Despite these challenges, we successfully port the RTL, partition it across 5 FPGAs, and emulate the core at 520 kHz while modifying only 5% of the code. To verify full functionality, we boot Linux and execute standard x86 workloads with all architectural features enabled. We share our experience and methodology for coping with structures from some of the most complex processor designs and making them FPGA synthesizable and emulation ready. Finally, we use this synthesizable processor to compare Nehalem against Atom, Intel's in-order processor, demonstrating a 2-4x performance improvement on workloads that stress Nehalem's out-of-order capabilities.