The project
"Automated methodology for production and execution of
data-centric multi-level approximate equivalent applications for
heterogeneous computing platforms" (MIS-5005377) studies the
design of hardware accelerators that trade accuracy for
performance metrics (e.g. maximum operating frequency, application
throughput, energy consumption). Since the failure of
Dennard scaling, energy efficiency has become a first-class design
concern in computer systems. Its potential benefits go beyond reduced
power demands in servers and longer battery life in mobile devices,
since improving energy efficiency has become a requirement due to limits
of device scaling and the well-known "dark silicon" or "power wall"
problem. This project proposes a framework for exploiting the intrinsic
error resilience of a large number of application domains in order to
produce approximate solutions as a design alternative for
energy-efficient system design, trading accuracy for significant energy gains.
In this task, emphasis is placed on the hardware/software co-design of
these accelerators. Unlike well-established solutions, which operate
mainly at the hardware level, the proposed framework applies a multi-level
approximation technique in order to maximize the potential
energy savings of the approximate computing application while keeping the
error controllable and as small as possible. The underlying infrastructure for the
execution of approximate kernels is a state-of-the-art many-accelerator
hardware platform provided by Maxeler. This platform is employed in
various application domains with computationally intensive kernels (e.g.
market analysis, weather forecasting, seismology). The programming
model for this platform assumes that the application kernels selected for
acceleration are developed as Intellectual Property (IP) cores, also
known as DataFlow Engines (DFEs), to which inputs are streamed following a
data-flow approach in order to maximize application throughput. For
the purposes of the proposed framework we employ the Maxeler MPC-X. More
precisely, the platform used contains 8 DFE accelerators, each of
which has 48 GB of DRAM as LMEM. The DFEs are interconnected
using the MaxRing technology. The platform is programmed
with MaxCompiler, which makes optimal use of
the available hardware resources. Through this compiler it is
possible to appropriately configure both the processing cores and
the memory hierarchy of each DFE, based on the inherent requirements of
the application.
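
The accuracy-for-energy trade-off described above can be illustrated with a small software sketch. It is not the project's methodology and does not use any Maxeler API; it only models one possible approximation level (reduced numerical precision) in plain Python. The names `quantize`, `approx_dot` and `narrowest_width` are hypothetical, and the assumption that fewer fractional bits translate into lower energy on the accelerator is stated here for illustration only.

```python
def quantize(x: float, frac_bits: int) -> float:
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def dot(a, b):
    """Reference (full-precision) dot product."""
    return sum(x * y for x, y in zip(a, b))


def approx_dot(a, b, frac_bits):
    """Dot product computed on inputs quantized to `frac_bits` fractional bits."""
    qa = [quantize(x, frac_bits) for x in a]
    qb = [quantize(y, frac_bits) for y in b]
    return dot(qa, qb)


def relative_error(exact, approx):
    """Relative error, guarded against a zero reference value."""
    denom = abs(exact) if exact != 0 else 1.0
    return abs(exact - approx) / denom


def narrowest_width(a, b, max_rel_error, widths=range(2, 25)):
    """Return the first (smallest) width in `widths` whose relative error
    stays within `max_rel_error`, or None if no width qualifies.

    Assumption for illustration: a narrower datapath is cheaper in energy,
    so the smallest acceptable width is the preferred design point.
    """
    exact = dot(a, b)
    for bits in widths:
        if relative_error(exact, approx_dot(a, b, bits)) <= max_rel_error:
            return bits
    return None


if __name__ == "__main__":
    a = [0.3, 0.7]
    b = [0.5, 0.25]
    for bound in (0.05, 0.001):
        print(bound, narrowest_width(a, b, bound))
```

The error bound plays the role of the "controllable error": the designer fixes it, and the search returns the cheapest configuration that respects it. A multi-level approach would search over further knobs (algorithmic, architectural) in the same manner.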