Improving resilience to timing errors by exposing variability effects to software in tightly-coupled processor clusters
Article
Publication Date:
2014
Citation:
Improving resilience to timing errors by exposing variability effects to software in tightly-coupled processor clusters / Rahimi, Abbas; Cesarini, Daniele; Marongiu, Andrea; Gupta Rajesh, K.; Benini, Luca. - In: IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS. - ISSN 2156-3357. - PRINT. - 4:2(2014), pp. 216-229. [10.1109/JETCAS.2014.2315883]
Abstract:
Manufacturing and environmental variations cause
timing errors in microelectronic processors that are typically
avoided by ultra-conservative multi-corner design margins or
corrected by error detection and recovery mechanisms at the
circuit level. In contrast, we present here runtime software support
for cost-effective countermeasures against hardware timing
failures during system operation. We propose a variability-aware
OpenMP (VOMP) programming environment, suitable for
tightly-coupled shared memory processor clusters, that relies
upon modeling across the hardware/software interface. VOMP is
implemented as an extension to the OpenMP v3.0 programming
model that covers various parallel constructs, including task,
sections, and for. Using the notion of work-unit vulnerability
(WUV) proposed here, we capture timing errors caused by
circuit-level variability as high-level software knowledge. WUV
consists of descriptive metadata to characterize the impact of variability
on different work-unit types running on various cores. As
such, WUV provides a useful abstraction of hardware variability
to efficiently allocate a given work-unit to a suitable core for
execution. VOMP enables hardware/software collaboration with
online variability monitors in hardware and runtime scheduling
in software. The hardware provides online per-core characterization
of WUV metadata. This metadata is made available by
carefully placing key data structures in a shared L1 memory
and is used by the VOMP schedulers. Our results show that VOMP
greatly reduces the cost of timing error recovery compared to the
baseline OpenMP schedulers, yielding speedups of 3%–36%
for tasks and 26%–49% for sections. Further, VOMP achieves
energy savings of 2%–46% and 15%–50% for tasks and sections,
respectively.
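The abstract describes per-core WUV metadata that hardware characterizes online and the scheduler reads when it dispatches each work-unit to a suitable core. The following is a minimal Python sketch of that idea, not the VOMP implementation. All of the following are hypothetical and chosen only for illustration: the names WUVTable, expected_cost, and pick_core, the "fft" work-unit type, and the cycle counts. The sketch also assumes that WUV is the observed timing-error rate per (core, work-unit type) pair, that cost is nominal cycles plus error rate times a fixed recovery penalty, and that dispatch is a greedy lowest-cost choice.

```python
# Hypothetical sketch: names, WUV definition, and cost model are assumptions, not VOMP's API.
from dataclasses import dataclass, field


@dataclass
class WUVTable:
    """Assumed WUV metadata: one entry per (core, work-unit type) pair.

    In VOMP this metadata lives in shared L1 memory and is filled in by
    hardware monitors; plain dicts stand in for that storage here.
    """
    runs: dict = field(default_factory=dict)    # (core, wu_type) -> executions observed
    errors: dict = field(default_factory=dict)  # (core, wu_type) -> executions with a timing error

    def record(self, core: int, wu_type: str, had_error: bool) -> None:
        """Online characterization: fold one monitored execution into the table."""
        key = (core, wu_type)
        self.runs[key] = self.runs.get(key, 0) + 1
        self.errors[key] = self.errors.get(key, 0) + int(had_error)

    def wuv(self, core: int, wu_type: str) -> float:
        """Observed timing-error rate; 0.0 for pairs not yet characterized."""
        key = (core, wu_type)
        runs = self.runs.get(key, 0)
        return self.errors.get(key, 0) / runs if runs else 0.0


def expected_cost(table: WUVTable, core: int, wu_type: str,
                  base_cycles: int, recovery_cycles: int) -> float:
    """Assumed cost model: nominal cycles plus error rate times recovery penalty."""
    return base_cycles + table.wuv(core, wu_type) * recovery_cycles


def pick_core(table: WUVTable, idle_cores: list, wu_type: str,
              base_cycles: int, recovery_cycles: int) -> int:
    """Greedy dispatch: send the work-unit to the idle core with the lowest expected cost."""
    return min(
        idle_cores,
        key=lambda c: expected_cost(table, c, wu_type, base_cycles, recovery_cycles),
    )


# Usage with simulated monitor readings: core 2 always fails on "fft" work-units,
# core 1 fails half the time, and core 0 never fails.
table = WUVTable()
for _ in range(10):
    table.record(0, "fft", had_error=False)
    table.record(2, "fft", had_error=True)
table.record(1, "fft", had_error=True)
table.record(1, "fft", had_error=False)

core = pick_core(table, idle_cores=[1, 2, 0], wu_type="fft",
                 base_cycles=1000, recovery_cycles=500)
print(core)  # 0
```

Pairs with no observations get a WUV of 0.0, so this greedy policy optimistically favors cores that have not yet been characterized. That choice is part of the sketch, not something the abstract specifies.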
CRIS Type:
Journal article
Keywords:
Cross-layer variability management; OpenMP; processor clusters; recovery; robust system design; scheduling; timing errors; variations; Electrical and Electronic Engineering
Author list:
Rahimi, Abbas; Cesarini, Daniele; Marongiu, Andrea; Gupta Rajesh, K.; Benini, Luca