Recent progress in program analysis has produced tools that can compute upper bounds on a program's use of dynamic memory. This opens the way to using dynamic memory abstraction in high-level synthesis. In this paper, we explain how to design hardware using C programs with malloc() and free(), and we describe in detail a compiler that takes a C program as input and creates a parallel execution model of the program that maps naturally onto a hardware description. As our experiments demonstrate, the generated circuits improve on previous work by a factor of up to 1.9 in clock frequency and by a factor of up to 2.7 in clock cycles. This yields an aggregate performance improvement by a factor of up to 5.