En poursuivant votre navigation sur ce site, vous acceptez l'utilisation d'un simple cookie d'identification. Aucune autre exploitation n'est faite de ce cookie. OK
1

What does back propagation compute?

Bookmarks Report an error
Multi angle
Authors : Pauwels, Edouard (Author of the conference)
CIRM (Publisher )

Loading the player...

Abstract : We are interested in nonsmooth analysis of backpropagation as implemented in modern machine learning librairies, such as Tensorflow or Pytorch. First I will illustrate how blind application of
differential calculus to nonsmooth objects can be problematic, requiring a proper mathematical model.
Then I will introduce a weak notion of generalized derivative, named conservativity, and illustrate how it complies with calculus and optimization for well structured objects. We provide stability results for empirical risk minimization similar as in the smooth setting for the combination of nonsmooth automatic differentiation, minibatch stochastic approximation and first order optimization. This is joint work with Jérôme Bolte.

Keywords : optimization; non-smooth analysis; machine learning

MSC Codes :
65K05 - Mathematical programming methods
65K10 - Optimization and variational techniques
68T99 - None of the above but in this section

Additional resources :
https://www.cirm-math.fr/RepOrga/2133/Slides/presPauwels.pdf

    Information on the Video

    Film maker : Hennenfent, Guillaume
    Language : English
    Available date : 06/04/2020
    Conference Date : 09/03/2020
    Subseries : Research talks
    arXiv category : Optimization and Control ; Machine Learning
    Mathematical Area(s) : Control Theory & Optimization ; Mathematics in Science & Technology
    Format : MP4 (.mp4) - HD
    Video Time : 00:37:49
    Targeted Audience : Researchers
    Download : https://videos.cirm-math.fr/2020-03-09_Pauwels.mp4

Information on the Event

Event Title : Optimization for Machine Learning / Optimisation pour l'apprentissage automatique
Event Organizers : Boyer, Claire ; d'Aspremont, Alexandre ; Gramfort, Alexandre ; Salmon, Joseph ; Villar, Soledad
Dates : 09/03/2020 - 13/03/2020
Event Year : 2020
Event URL : https://conferences.cirm-math.fr/2133.html

Citation Data

DOI : 10.24350/CIRM.V.19623303
Cite this video as: Pauwels, Edouard (2020). What does back propagation compute?. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19623303
URI : http://dx.doi.org/10.24350/CIRM.V.19623303

See Also

Bibliography

  • ANIL, Cem, LUCAS, James, et GROSSE, Roger. Sorting out lipschitz function approximation. arXiv preprint arXiv:1811.05381, 2018. - https://arxiv.org/abs/1811.05381

  • DAVIS, Damek, DRUSVYATSKIY, Dmitriy, KAKADE, Sham, et al. Stochastic subgradient method converges on tame functions. Foundations of computational mathematics, 2018, p. 1-36. - https://doi.org/10.1007/s10208-018-09409-5

  • ABADI, Martín, BARHAM, Paul, CHEN, Jianmin, et al. Tensorflow: A system for large-scale machine learning. In : 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16). 2016. p. 265-283. - https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi

  • GUIDE, A. Hitchhiker's. Infinite dimensional analysis. Springer, 2006. -

  • ATTOUCH, Hedy, GOUDOU, Xavier, et REDONT, Patrick. The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Communications in Contemporary Mathematics, 2000, vol. 2, no 01, p. 1-34. - https://doi.org/10.1142/S0219199700000025

  • AUBIN, Jean-Pierre et CELLINA, Arrigo. Differential inclusions: set-valued maps and viability theory. 1984. -

  • AUBIN, Jean-Pierre et FRANKOWSKA, Hélène. Set-valued analysis. Springer Science & Business Media, 2009. -

  • PASZKE, Adam, GROSS, Sam, CHINTALA, Soumith, et al. Automatic differentiation in pytorch. 2017. - https://dl.acm.org/doi/abs/10.5555/3122009.3242010

  • BENAÏM, Michel. Dynamics of stochastic approximation algorithms. In : Seminaire de probabilites XXXIII. Springer, Berlin, Heidelberg, 1999. p. 1-68. - http://dx.doi.org/10.1007/BFb0096509

  • BENAÏM, Michel, HOFBAUER, Josef, et SORIN, Sylvain. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 2005, vol. 44, no 1, p. 328-348. - https://doi.org/10.1137/S0363012904439301

  • BOLTE, Jérôme, DANIILIDIS, Aris, LEWIS, Adrian, et al. Clarke subgradients of stratifiable functions. SIAM Journal on Optimization, 2007, vol. 18, no 2, p. 556-572. - https://doi.org/10.1137/060670080

  • BOLTE, Jérôme, SABACH, Shoham, et TEBOULLE, Marc. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 2014, vol. 146, no 1-2, p. 459-494. - https://doi.org/10.1007/s10107-013-0701-9

  • BORKAR, Vivek S. Stochastic approximation: a dynamical systems viewpoint. Springer, 2009. -

  • BORWEIN, Jonathan et LEWIS, Adrian S. Convex analysis and nonlinear optimization: theory and examples. Springer Science & Business Media, 2010. -

  • BORWEIN, Jonathan M. et MOORS, Warren B. Essentially smooth Lipschitz functions. Journal of functional analysis, 1997, vol. 149, no 2, p. 305-351. - https://doi.org/10.1006/jfan.1997.3101

  • BORWEIN, Jonathan M. et MOORS, Warren B. A chain rule for essentially smooth Lipschitz functions. SIAM Journal on Optimization, 1998, vol. 8, no 2, p. 300-308. - https://doi.org/10.1137/S1052623496297838

  • BORWEIN, Jonathan, MOORS, Warren, et WANG, Xianfu. Generalized subdifferentials: a Baire categorical approach. Transactions of the American Mathematical Society, 2001, vol. 353, no 10, p. 3875-3893. - http://dx.doi.org/10.1090/S0002-9947-01-02820-3

  • BOTTOU, Léon et BOUSQUET, Olivier. The tradeoffs of large scale learning. In : Advances in neural information processing systems. 2008. p. 161-168. - http://papers.nips.cc/paper/3323-the-tradeoffs-of-large-scale-learning

  • BOTTOU, Léon, CURTIS, Frank E., et NOCEDAL, Jorge. Optimization methods for large-scale machine learning. Siam Review, 2018, vol. 60, no 2, p. 223-311. - https://doi.org/10.1137/16M1080173

  • CASTERA, Camille, BOLTE, Jérôme, FÉVOTTE, Cédric, et al. An Inertial Newton Algorithm for Deep Learning. arXiv preprint arXiv:1905.12278, 2019. - https://arxiv.org/abs/1905.12278

  • CLARKE, Frank H. Optimization and nonsmooth analysis. Siam, 1990. -

  • CHIZAT, Lenaic et BACH, Francis. On the global convergence of gradient descent for over-parameterized models using optimal transport. In : Advances in neural information processing systems. 2018. p. 3036-3046. - http://papers.nips.cc/paper/7567-on-the-global-convergence-of-gradient-descent-for-over-parameterized-models-using-optimal-transport

  • CORLISS, George, FAURE, Christele, GRIEWANK, Andreas, et al. (ed.). Automatic differentiation of algorithms: from simulation to optimization. Springer Science & Business Media, 2002. -

  • CORREA, RAFAEL et JOFRE, A. Tangentially continuous directional derivatives in nonsmooth analysis. Journal of optimization theory and applications, 1989, vol. 61, no 1, p. 1-21. - https://doi.org/10.1007/BF00940840

  • COSTE, M. An introduction to o-minimal geometry, Inst. Rech. Math., Univ. de Rennes, 1999. -

  • DAVIS, Damek, DRUSVYATSKIY, Dmitriy, KAKADE, Sham, et al. Stochastic subgradient method converges on tame functions. Foundations of computational mathematics, 2018, p. 1-36. - https://doi.org/10.1007/s10208-018-09409-5

  • VAN DEN DRIES, Lou, MILLER, Chris, et al. Geometric categories and o-minimal structures. Duke Math. J, 1996, vol. 84, no 2, p. 497-540. -

  • EVANS, L. C. et GARIEPY, R. F. Measure Theory and Fine Properties of Functions, Revised edn. Textbooks in Mathematics. CRC Press, Boca Raton, 2015. -

  • GLOROT, Xavier, BORDES, Antoine, et BENGIO, Yoshua. Deep sparse rectifier neural networks. In : Proceedings of the fourteenth international conference on artificial intelligence and statistics. 2011. p. 315-323. - http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf

  • GRIEWANK, Andreas et WALTHER, Andrea. Evaluating derivatives: principles and techniques of algorithmic differentiation. Siam, 2008. -

  • GRIEWANK, Andreas. On stable piecewise linearization and generalized algorithmic differentiation. Optimization Methods and Software, 2013, vol. 28, no 6, p. 1139-1178. - https://doi.org/10.1080/10556788.2013.796683

  • Griewank A., Walther A., Fiege S. and Bosse T. (2016).
    On Lipschitz optimization based on gray-box piecewise linearization.
    Mathematical Programming, 158(1-2), 383-415. - http://dx.doi.org/10.1007/s10107-015-0934-x

  • IOFFE, A. D. Nonsmooth analysis: differential calculus of nondifferentiable mappings. Transactions of the American Mathematical Society, 1981, vol. 266, no 1, p. 1-56. - http://dx.doi.org/10.1090/S0002-9947-1981-0613784-7

  • IOFFE, Alexander D. Variational analysis of regular mappings. Springer Monographs in Mathematics. Springer, Cham, 2017. - http://dx.doi.org/10.1007/978-3-319-64277-2

  • KAKADE, Sham M. et LEE, Jason D. Provably correct automatic sub-differentiation for qualified programs. In : Advances in neural information processing systems. 2018. p. 7125-7135. - http://papers.nips.cc/paper/7943-provably-correct-automatic-sub-differentiation-for-qualified-programs

  • KURDYKA, Krzysztof. On gradients of functions definable in o-minimal structures. In : Annales de l'institut Fourier. 1998. p. 769-783. - https://doi.org/10.5802/aif.1638

  • KURDYKA, Krzysztof, MOSTOWSKI, Tadeusz, et PARUSINSKI, Adam. Proof of the gradient conjecture of R. Thom. Annals of Mathematics, 2000, vol. 152, no 3, p. 763-792. - http://www.jstor.org/stable/2661354

  • KUSHNER, Harold et YIN, G. George. Stochastic approximation and recursive algorithms and applications. Springer Science & Business Media, 2003. -

  • LECUN, Yann, BENGIO, Yoshua, et HINTON, Geoffrey. Deep learning. nature, 2015, vol. 521, no 7553, p. 436-444. - https://doi.org/10.1038/nature14539

  • LJUNG, Lennart. Analysis of recursive stochastic algorithms. IEEE transactions on automatic control, 1977, vol. 22, no 4, p. 551-575. - https://doi.org/10.1109/TAC.1977.1101561

  • MAJEWSKI, Szymon, MIASOJEDOW, Błażej, et MOULINES, Eric. Analysis of nonsmooth stochastic approximation: the differential inclusion approach. arXiv preprint arXiv:1805.01916, 2018. - https://arxiv.org/abs/1805.01916

  • MOHAMMADI, Bijan et PIRONNEAU, Olivier. Applied shape optimization for fluids. Oxford university press, 2010. -

  • MOULINES, Eric et BACH, Francis R. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In : Advances in Neural Information Processing Systems. 2011. p. 451-459. - http://papers.nips.cc/paper/4316-non-asymptotic-analysis-of-stochastic-approximation-algorithms-for-machine

  • MOREAU, Jean Jacques. Fonctionnelles sous-différentiables. 1963. -

  • MORDUKHOVICH, Boris S. Variational analysis and generalized differentiation I: Basic theory. Springer Science & Business Media, 2006. -

  • PASZKE, Adam, GROSS, Sam, CHINTALA, Soumith, et al. Automatic differentiation in pytorch. 2017. - https://openreview.net/forum?id=BJJsrmfCZ

  • ROBBINS, Herbert et MONRO, Sutton. A stochastic approximation method. The annals of mathematical statistics, 1951, p. 400-407. - https://www.jstor.org/stable/2236626

  • ROCKAFELLAR, Ralph Tyrrell. Convex functions and dual extremum problems. 1963. Thèse de doctorat. Harvard University. -

  • ROCKAFELLAR, Ralph. On the maximal monotonicity of subdifferential mappings. Pacific Journal of Mathematics, 1970, vol. 33, no 1, p. 209-216. - http://dx.doi.org/10.2140/pjm.1970.33.209

  • ROCKAFELLAR, R. T. et WETS, R. J. B. Variational Analysis. 1998. -

  • RUMELHART, David E., HINTON, Geoffrey E., et WILLIAMS, Ronald J. Learning representations by back-propagating errors. nature, 1986, vol. 323, no 6088, p. 533-536. - https://doi.org/10.1038/323533a0

  • SPEELPENNING, Bert. Compiling fast partial derivatives of functions given by algorithms. Illinois Univ., Urbana (USA). Dept. of Computer Science, 1980. - https://doi.org/10.2172/5254402

  • THIBAULT, Lionel. On generalized differentials and subdifferentials of Lipschitz vector-valued functions. Nonlinear Analysis: Theory, Methods & Applications, 1982, vol. 6, no 10, p. 1037-1053. - https://doi.org/10.1016/0362-546X(82)90074-8

  • THIBAULT, Lionel et ZAGRODNY, Dariusz. Integration of subdifferentials of lower semicontinuous functions on Banach spaces. Journal of Mathematical Analysis and Applications, 1995, vol. 189, no 1, p. 33-58. - https://doi.org/10.1006/jmaa.1995.1003

  • THIBAULT, Lionel et ZLATEVA, Nadia. Integrability of subdifferentials of directionally Lipschitz functions. Proceedings of the American Mathematical Society, 2005, p. 2939-2948. - https://www.jstor.org/stable/4097908

  • VALADIER, Michel. Entrainement unilateral, lignes de descente, fonctions lipschitziennes non pathologiques. CRAS Paris, 1989, vol. 308, p. 241-244. -

  • WANG, Xianfu. Pathological Lipschitz function in R (n). 1995. Thèse de doctorat. Theses (Dept. of Mathematics and Statistics)/Simon Fraser University. -



Imagette Video

Bookmarks Report an error