Detection and correction of silent errors in the Conjugate Gradient algorithm

Authors : Meurant, Gérard (Speaker)
CIRM (Publisher)

Abstract : Modern supercomputers contain more and more computing elements, which increases the probability of computer errors. Errors that do not stop the computation are called soft errors or silent errors. They can have a negative impact on the output of the code, so it is of interest to be able to detect these silent errors and to correct them.
In this talk we are concerned with the detection and correction of silent errors in the conjugate gradient (CG) algorithm to solve linear systems Ax = b with a symmetric positive definite matrix A. Silent errors in CG may affect or even prevent the convergence of the algorithm. We propose a new way to detect silent errors using a scalar relation that must be satisfied by CG variables,
$\alpha_{k-1}^{2}\,\frac{\left(A p_{k-1}, A p_{k-1}\right)}{\left(r_{k-1}, r_{k-1}\right)}=1+\beta_{k}, \quad (1)$
where the $r_j$ are the residual vectors, the $p_j$ are the descent directions, and
$\alpha_{k-1}=\frac{\left(r_{k-1}, r_{k-1}\right)}{\left(p_{k-1}, A p_{k-1}\right)}$, $\beta_{k}=\frac{\left(r_{k}, r_{k}\right)}{\left(r_{k-1}, r_{k-1}\right)}$
are the coefficients computed in CG.
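As a short check, relation (1) can be derived in exact arithmetic. The sketch below assumes the standard CG recurrences $r_k = r_{k-1} - \alpha_{k-1} A p_{k-1}$ and $p_{k-1} = r_{k-1} + \beta_{k-1} p_{k-2}$ (with $p_0 = r_0$) and the $A$-conjugacy of consecutive directions, none of which is spelled out in the abstract.

```latex
\begin{align*}
(r_k, r_k) &= (r_{k-1}, r_{k-1}) - 2\alpha_{k-1}\,(r_{k-1}, A p_{k-1})
              + \alpha_{k-1}^{2}\,(A p_{k-1}, A p_{k-1}), \\
(r_{k-1}, A p_{k-1}) &= (p_{k-1}, A p_{k-1}) - \beta_{k-1}\,(p_{k-2}, A p_{k-1})
              = (p_{k-1}, A p_{k-1}) = \frac{(r_{k-1}, r_{k-1})}{\alpha_{k-1}}, \\
(r_k, r_k) &= -(r_{k-1}, r_{k-1}) + \alpha_{k-1}^{2}\,(A p_{k-1}, A p_{k-1}), \\
\beta_k = \frac{(r_k, r_k)}{(r_{k-1}, r_{k-1})}
           &= -1 + \alpha_{k-1}^{2}\,\frac{(A p_{k-1}, A p_{k-1})}{(r_{k-1}, r_{k-1})}.
\end{align*}
```

The last line is relation (1) with the squared coefficient $\alpha_{k-1}^{2}$. For $k = 1$ the term in $\beta_{k-1}$ is absent because $p_0 = r_0$, so the relation holds at the first iteration whatever the value of $A p_0$.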
We study how relation (1) is modified in finite precision arithmetic and define a criterion to detect when this relation is not satisfied.
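To make the check concrete, here is a minimal Python sketch that runs plain CG and records, at each iteration, the relative gap between the two sides of relation (1). The function names `dot`, `matvec` and `cg_relation_gaps`, the test matrix and the stopping tolerance are assumptions made for this illustration. The sketch only measures the gap; it does not implement the finite precision criterion defined in the talk.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]

def cg_relation_gaps(A, b, tol=1e-8, max_iter=100):
    """Plain CG from x_0 = 0 that records the relative gap in relation (1).

    Hypothetical helper for illustration only. Returns (x, gaps), where
    gaps[k-1] = |alpha^2 (Ap, Ap)/(r, r) - (1 + beta_k)| / (1 + beta_k).
    """
    x = [0.0] * len(b)
    r = list(b)                      # r_0 = b - A x_0 = b
    p = list(b)                      # p_0 = r_0
    rr = dot(r, r)
    stop = tol * math.sqrt(rr)       # relative stopping threshold (assumed)
    gaps = []
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        # The extra dot product (Ap, Ap) is the only additional cost.
        lhs = alpha * alpha * dot(Ap, Ap) / rr
        gaps.append(abs(lhs - (1.0 + beta)) / (1.0 + beta))
        if math.sqrt(rr_new) <= stop:
            break
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, gaps

# Small SPD test problem (assumed): tridiagonal matrix with 4 on the diagonal.
n = 10
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [float(i + 1) for i in range(n)]
x, gaps = cg_relation_gaps(A, b)
print("iterations:", len(gaps), "largest gap:", max(gaps))
```

On a well-conditioned problem like this one the gap is expected to stay near rounding level, so a fault that changes one side of the relation by much more than that stands out.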
Checking relation (1) involves computing an additional dot product but, as was shown some time ago in [1] and more recently in [2], relation (1) can also be used to introduce more parallelism in the algorithm.
Assuming that the input data $(A, b)$ is not corrupted, we model silent errors by bit flips in the output of some CG steps. When an error is detected in some iteration $k$, we can restore the CG data from iteration $k-2$ and continue the computation safely.
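The fault model and the rollback can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the speaker's implementation: `flip_bit`, `cg_with_rollback`, the ad hoc threshold `check_tol`, the choice to corrupt the largest entry of the updated residual, and the three-entry checkpoint store are all hypothetical.

```python
import math
import struct

def flip_bit(value, bit):
    """Flip one bit of an IEEE 754 double (bit 0 = lowest mantissa bit, 63 = sign)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def cg_with_rollback(A, b, fault=None, tol=1e-8, check_tol=1e-8, max_steps=200):
    """CG that checks relation (1) and rolls back to iteration k-2 on a violation.

    fault = (iteration, bit) injects a single transient bit flip into the
    largest entry of the updated residual (assumed fault model for this demo).
    Returns (x, last_iteration, number_of_rollbacks).
    """
    n = len(b)
    x, r, p = [0.0] * n, list(b), list(b)
    rr = dot(r, r)
    stop = tol * math.sqrt(rr)
    saved = {0: (x, r, p, rr)}        # checkpoints keyed by iteration number
    k, rollbacks = 0, 0
    for _ in range(max_steps):
        k += 1
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if fault is not None and fault[0] == k:
            i = max(range(n), key=lambda j: abs(r_new[j]))
            r_new[i] = flip_bit(r_new[i], fault[1])
            fault = None              # transient: the fault fires only once
        rr_new = dot(r_new, r_new)
        beta = rr_new / rr
        gap = abs(alpha * alpha * dot(Ap, Ap) / rr - (1.0 + beta)) / (1.0 + beta)
        if not gap <= check_tol:      # written this way so that NaN also triggers
            k = max(k - 2, 0)         # restore the data of iteration k-2
            x, r, p, rr = saved[k]
            rollbacks += 1
            continue
        x, r = x_new, r_new
        if math.sqrt(rr_new) <= stop:
            break
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        saved[k] = (x, r, p, rr)
        saved.pop(k - 3, None)        # keep only the three most recent checkpoints
    return x, k, rollbacks

n = 10
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [float(i + 1) for i in range(n)]
x_clean, _, _ = cg_with_rollback(A, b)
x_fault, _, n_rollbacks = cg_with_rollback(A, b, fault=(3, 54))
print("rollbacks:", n_rollbacks)
print("max difference:", max(abs(u - v) for u, v in zip(x_clean, x_fault)))
```

Two limits of this sketch are worth noting. A flip in a low mantissa bit changes the residual norm too little to exceed the threshold, and a sign flip leaves the norm unchanged, so such faults are not caught at the iteration where they occur. In addition, because $p_0 = r_0$, the relation holds at the first iteration for any value of $A p_0$, so a corrupted matrix-vector product there would pass the check.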
Numerical experiments will show the efficiency of this approach.

Keywords : conjugate gradient algorithm; silent error; detection and correction

MSC codes :
65F10 - Iterative methods for linear systems
65F30 - Other matrix algorithms
65F50 - Sparse matrices

    Video information

    Director : Recanzone, Luca
    Language : English
    Publication date : 26/11/2021
    Recording date : 09/11/2021
    Sub-collection : Research talks
    arXiv category : Numerical Analysis
    Field : Numerical Analysis & Scientific Computing ; Computer Science
    Format : MP4 (.mp4) - HD
    Duration : 00:24:30
    Audience : Researchers
    Download : https://videos.cirm-math.fr/2021-11-9_Meurant.mp4

Event information

Event name : Numerical Methods and Scientific Computing / Méthodes numériques et calcul scientifique
Event organizers : Beckermann, Bernhard ; Brezinski, Claude ; da Rocha, Zélia ; Redivo-Zaglia, Michela ; Rodriguez, Giuseppe
Dates : 08/11/2021 - 12/11/2021
Event year : 2021
Conference URL : https://conferences.cirm-math.fr/2431.html

Citation data

DOI : 10.24350/CIRM.V.19829403
Cite this video : Meurant, Gérard (2021). Detection and correction of silent errors in the Conjugate Gradient algorithm. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19829403
URI : http://dx.doi.org/10.24350/CIRM.V.19829403
