Skip to main navigation Skip to search Skip to main content

Maintaining Sanity: Algorithm-based Comprehensive Fault Tolerance for CNNs

  • Jinhyo Jung
  • , Hwisoo So
  • , Woobin Ko
  • , Sumedh Shridhar Joshi
  • , Yebon Kim
  • , Yohan Ko
  • , Aviral Shrivastava
  • , Kyoungwoo Lee
  • Yonsei University
  • Arizona State University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

As the deployment of neural networks in safety-critical applications proliferates, it becomes imperative that they exhibit consistent and dependable performance amidst hardware malfunctions. Several protection schemes have been proposed to protect neural networks, but they suffer from huge overheads or insufficient fault coverage. This paper presents Maintaining Sanity, a comprehensive and efficient protection technique for CNNs. Maintaining Sanity extends the state-of-the-art algorithm-based fault tolerance for CNN, utilizing hamming codes and checkpointing to correct over 99.6% of critical faults with about 72% runtime overhead and minimal memory overhead compared to traditional triple modular redundancy (TMR) techniques.

Original languageEnglish
Title of host publicationProceedings of the 61st ACM/IEEE Design Automation Conference, DAC 2024
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9798400706011
DOIs
StatePublished - 7 Nov 2024
Event61st ACM/IEEE Design Automation Conference, DAC 2024 - San Francisco, United States
Duration: 23 Jun 202427 Jun 2024

Publication series

NameProceedings - Design Automation Conference
ISSN (Print)0738-100X

Conference

Conference61st ACM/IEEE Design Automation Conference, DAC 2024
Country/TerritoryUnited States
CitySan Francisco
Period23/06/2427/06/24

Keywords

  • ABFT
  • CNN
  • Fault Injection
  • Reliability
  • Reliable Machine Learning
  • Soft Errors
  • Transient Faults

Fingerprint

Dive into the research topics of 'Maintaining Sanity: Algorithm-based Comprehensive Fault Tolerance for CNNs'. Together they form a unique fingerprint.

Cite this