
Multi angle

What does back propagation compute?

Authors: Pauwels, Edouard (Conference speaker)
CIRM (Publisher)


    Abstract: We are interested in the nonsmooth analysis of backpropagation as implemented in modern machine learning libraries, such as TensorFlow or PyTorch. First I will illustrate how blind application of differential calculus to nonsmooth objects can be problematic, which calls for a proper mathematical model.
    Then I will introduce a weak notion of generalized derivative, named conservativity, and illustrate how it complies with calculus and optimization for well-structured objects. We provide stability results for empirical risk minimization, similar to those in the smooth setting, for the combination of nonsmooth automatic differentiation, minibatch stochastic approximation and first-order optimization. This is joint work with Jérôme Bolte.
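    A minimal sketch of the kind of pitfall the abstract alludes to (an illustration, not taken from the talk): the function f(x) = relu(x) - relu(-x) equals x everywhere, so its true derivative is 1, yet applying the chain rule composition-wise with the convention relu'(0) = 0, commonly used by autodiff libraries, returns 0 at the kink.

    ```python
    # Hypothetical illustration: blindly applying smooth calculus rules
    # to a nonsmooth program can produce a wrong derivative at a kink.

    def relu(x):
        return max(x, 0.0)

    def relu_prime(x):
        # Backprop rule for ReLU with the usual convention relu'(0) = 0.
        return 1.0 if x > 0 else 0.0

    def f(x):
        # f(x) = relu(x) - relu(-x) simplifies to f(x) = x for all x,
        # so the true derivative is 1 everywhere.
        return relu(x) - relu(-x)

    def f_backprop(x):
        # Chain rule applied term by term, as backpropagation does:
        # d/dx [relu(x) - relu(-x)] = relu'(x) + relu'(-x).
        return relu_prime(x) + relu_prime(-x)

    print(f_backprop(0.0))  # 0.0, although the true derivative is 1
    print(f_backprop(1.0))  # 1.0, correct away from the kink
    ```

    Away from x = 0 the rule-by-rule derivative is exact; the discrepancy is confined to a negligible set, which is the kind of structure the notion of conservativity is designed to capture.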

    Keywords: optimization; non-smooth analysis; machine learning

    MSC codes:
    65K05 - Mathematical programming methods
    65K10 - Optimization and variational techniques
    68T99 - None of the above but in this section

    Additional resources:

      Video information

      Director: Hennenfent, Guillaume
      Language: English
      Publication date: 06/04/2020
      Recording date: 09/03/2020
      Collection: Control Theory and Optimization ; Mathematics in Science and Technology
      Subcollection: Research talks
      Format: MP4
      Domain: Control Theory & Optimization ; Mathematics in Science & Technology
      Duration: 00:37:49
      Audience: Researchers; PhD students; Postdocs
      Download: https://videos.cirm-math.fr/2020-03-09_Pauwels.mp4

    Meeting information

    Meeting name: Optimization for Machine Learning / Optimisation pour l’apprentissage automatique
    Meeting organizers: Boyer, Claire ; d'Aspremont, Alexandre ; Gramfort, Alexandre ; Salmon, Joseph ; Villar, Soledad
    Dates: 09/03/2020 - 13/03/2020
    Meeting year: 2020
    Conference URL: https://conferences.cirm-math.fr/2133.html

    Citation Data

    DOI : 10.24350/CIRM.V.19623303
    Cite this video as: Pauwels, Edouard (2020). What does back propagation compute? CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19623303
    URI : http://dx.doi.org/10.24350/CIRM.V.19623303

