CIRM - Videos & books Library - What does back propagation compute?

Multi angle

Authors : Pauwels, Edouard (Author of the conference)
CIRM (Publisher )

Loading the player...

Abstract : We are interested in nonsmooth analysis of backpropagation as implemented in modern machine learning librairies, such as Tensorflow or Pytorch. First I will illustrate how blind application of
differential calculus to nonsmooth objects can be problematic, requiring a proper mathematical model.
Then I will introduce a weak notion of generalized derivative, named conservativity, and illustrate how it complies with calculus and optimization for well structured objects. We provide stability results for empirical risk minimization similar as in the smooth setting for the combination of nonsmooth automatic differentiation, minibatch stochastic approximation and first order optimization. This is joint work with Jérôme Bolte.

Keywords : optimization; non-smooth analysis; machine learning

MSC Codes :
65K05 - Mathematical programming methods
65K10 - Optimization and variational techniques
68T99 - None of the above but in this section

Additional resources :
https://www.cirm-math.fr/RepOrga/2133/Slides/presPauwels.pdf

Information on the Video

Film maker :

Hennenfent, Guillaume

Language :

Available date :

Conference Date :

Subseries :

arXiv category :

Mathematical Area(s) :

Format :

Video Time :

Targeted Audience :

Download :

https://videos.cirm-math.fr/2020-03-09_Pauwels.mp4

Information on the Event

Event Title : Optimization for Machine Learning / Optimisation pour l'apprentissage automatique
Event Organizers : Boyer, Claire ; d'Aspremont, Alexandre ; Gramfort, Alexandre ; Salmon, Joseph ; Villar, Soledad
Dates : 09/03/2020 - 13/03/2020
Event Year : 2020
Event URL : https://conferences.cirm-math.fr/2133.html

Citation Data

DOI : 10.24350/CIRM.V.19623303
Cite this video as: Pauwels, Edouard (2020). What does back propagation compute?. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19623303
URI : http://dx.doi.org/10.24350/CIRM.V.19623303

[Multi angle] Rank optimality for the Burer-Monteiro factorization / Author of the conference Waldspurger, Irène.
[Multi angle] Bridging the gap between optimal transport and MMD Sinkhorn divergences / Author of the conference Genevay, Aude.
[Post-edited] Learning with differentiable perturbed optimizers / Author of the conference Berthet, Quentin.
[Multi angle] Gradient descent for wide two-layer neural networks / Author of the conference Bach, Francis.

Bibliography

ANIL, Cem, LUCAS, James, et GROSSE, Roger. Sorting out lipschitz function approximation. arXiv preprint arXiv:1811.05381, 2018. - https://arxiv.org/abs/1811.05381

DAVIS, Damek, DRUSVYATSKIY, Dmitriy, KAKADE, Sham, et al. Stochastic subgradient method converges on tame functions. Foundations of computational mathematics, 2018, p. 1-36. - https://doi.org/10.1007/s10208-018-09409-5

ABADI, Martín, BARHAM, Paul, CHEN, Jianmin, et al. Tensorflow: A system for large-scale machine learning. In : 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16). 2016. p. 265-283. - https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi

GUIDE, A. Hitchhiker's. Infinite dimensional analysis. Springer, 2006. -

ATTOUCH, Hedy, GOUDOU, Xavier, et REDONT, Patrick. The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Communications in Contemporary Mathematics, 2000, vol. 2, no 01, p. 1-34. - https://doi.org/10.1142/S0219199700000025

AUBIN, Jean-Pierre et CELLINA, Arrigo. Differential inclusions: set-valued maps and viability theory. 1984. -

AUBIN, Jean-Pierre et FRANKOWSKA, Hélène. Set-valued analysis. Springer Science & Business Media, 2009. -

PASZKE, Adam, GROSS, Sam, CHINTALA, Soumith, et al. Automatic differentiation in pytorch. 2017. - https://dl.acm.org/doi/abs/10.5555/3122009.3242010

BENAÏM, Michel. Dynamics of stochastic approximation algorithms. In : Seminaire de probabilites XXXIII. Springer, Berlin, Heidelberg, 1999. p. 1-68. - http://dx.doi.org/10.1007/BFb0096509

BENAÏM, Michel, HOFBAUER, Josef, et SORIN, Sylvain. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 2005, vol. 44, no 1, p. 328-348. - https://doi.org/10.1137/S0363012904439301

BOLTE, Jérôme, DANIILIDIS, Aris, LEWIS, Adrian, et al. Clarke subgradients of stratifiable functions. SIAM Journal on Optimization, 2007, vol. 18, no 2, p. 556-572. - https://doi.org/10.1137/060670080

BOLTE, Jérôme, SABACH, Shoham, et TEBOULLE, Marc. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 2014, vol. 146, no 1-2, p. 459-494. - https://doi.org/10.1007/s10107-013-0701-9

BORKAR, Vivek S. Stochastic approximation: a dynamical systems viewpoint. Springer, 2009. -

BORWEIN, Jonathan et LEWIS, Adrian S. Convex analysis and nonlinear optimization: theory and examples. Springer Science & Business Media, 2010. -

BORWEIN, Jonathan M. et MOORS, Warren B. Essentially smooth Lipschitz functions. Journal of functional analysis, 1997, vol. 149, no 2, p. 305-351. - https://doi.org/10.1006/jfan.1997.3101

BORWEIN, Jonathan M. et MOORS, Warren B. A chain rule for essentially smooth Lipschitz functions. SIAM Journal on Optimization, 1998, vol. 8, no 2, p. 300-308. - https://doi.org/10.1137/S1052623496297838

BORWEIN, Jonathan, MOORS, Warren, et WANG, Xianfu. Generalized subdifferentials: a Baire categorical approach. Transactions of the American Mathematical Society, 2001, vol. 353, no 10, p. 3875-3893. - http://dx.doi.org/10.1090/S0002-9947-01-02820-3

BOTTOU, Léon et BOUSQUET, Olivier. The tradeoffs of large scale learning. In : Advances in neural information processing systems. 2008. p. 161-168. - http://papers.nips.cc/paper/3323-the-tradeoffs-of-large-scale-learning

BOTTOU, Léon, CURTIS, Frank E., et NOCEDAL, Jorge. Optimization methods for large-scale machine learning. Siam Review, 2018, vol. 60, no 2, p. 223-311. - https://doi.org/10.1137/16M1080173

CASTERA, Camille, BOLTE, Jérôme, FÉVOTTE, Cédric, et al. An Inertial Newton Algorithm for Deep Learning. arXiv preprint arXiv:1905.12278, 2019. - https://arxiv.org/abs/1905.12278

CLARKE, Frank H. Optimization and nonsmooth analysis. Siam, 1990. -

CHIZAT, Lenaic et BACH, Francis. On the global convergence of gradient descent for over-parameterized models using optimal transport. In : Advances in neural information processing systems. 2018. p. 3036-3046. - http://papers.nips.cc/paper/7567-on-the-global-convergence-of-gradient-descent-for-over-parameterized-models-using-optimal-transport

CORLISS, George, FAURE, Christele, GRIEWANK, Andreas, et al. (ed.). Automatic differentiation of algorithms: from simulation to optimization. Springer Science & Business Media, 2002. -

CORREA, RAFAEL et JOFRE, A. Tangentially continuous directional derivatives in nonsmooth analysis. Journal of optimization theory and applications, 1989, vol. 61, no 1, p. 1-21. - https://doi.org/10.1007/BF00940840

COSTE, M. An introduction to o-minimal geometry, Inst. Rech. Math., Univ. de Rennes, 1999. -

VAN DEN DRIES, Lou, MILLER, Chris, et al. Geometric categories and o-minimal structures. Duke Math. J, 1996, vol. 84, no 2, p. 497-540. -

EVANS, L. C. et GARIEPY, R. F. Measure Theory and Fine Properties of Functions, Revised edn. Textbooks in Mathematics. CRC Press, Boca Raton, 2015. -

GLOROT, Xavier, BORDES, Antoine, et BENGIO, Yoshua. Deep sparse rectifier neural networks. In : Proceedings of the fourteenth international conference on artificial intelligence and statistics. 2011. p. 315-323. - http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf

GRIEWANK, Andreas et WALTHER, Andrea. Evaluating derivatives: principles and techniques of algorithmic differentiation. Siam, 2008. -

GRIEWANK, Andreas. On stable piecewise linearization and generalized algorithmic differentiation. Optimization Methods and Software, 2013, vol. 28, no 6, p. 1139-1178. - https://doi.org/10.1080/10556788.2013.796683

Griewank A., Walther A., Fiege S. and Bosse T. (2016).
On Lipschitz optimization based on gray-box piecewise linearization.
Mathematical Programming, 158(1-2), 383-415. - http://dx.doi.org/10.1007/s10107-015-0934-x

IOFFE, A. D. Nonsmooth analysis: differential calculus of nondifferentiable mappings. Transactions of the American Mathematical Society, 1981, vol. 266, no 1, p. 1-56. - http://dx.doi.org/10.1090/S0002-9947-1981-0613784-7

IOFFE, Alexander D. Variational analysis of regular mappings. Springer Monographs in Mathematics. Springer, Cham, 2017. - http://dx.doi.org/10.1007/978-3-319-64277-2

KAKADE, Sham M. et LEE, Jason D. Provably correct automatic sub-differentiation for qualified programs. In : Advances in neural information processing systems. 2018. p. 7125-7135. - http://papers.nips.cc/paper/7943-provably-correct-automatic-sub-differentiation-for-qualified-programs

KURDYKA, Krzysztof. On gradients of functions definable in o-minimal structures. In : Annales de l'institut Fourier. 1998. p. 769-783. - https://doi.org/10.5802/aif.1638

KURDYKA, Krzysztof, MOSTOWSKI, Tadeusz, et PARUSINSKI, Adam. Proof of the gradient conjecture of R. Thom. Annals of Mathematics, 2000, vol. 152, no 3, p. 763-792. - http://www.jstor.org/stable/2661354

KUSHNER, Harold et YIN, G. George. Stochastic approximation and recursive algorithms and applications. Springer Science & Business Media, 2003. -

LECUN, Yann, BENGIO, Yoshua, et HINTON, Geoffrey. Deep learning. nature, 2015, vol. 521, no 7553, p. 436-444. - https://doi.org/10.1038/nature14539

LJUNG, Lennart. Analysis of recursive stochastic algorithms. IEEE transactions on automatic control, 1977, vol. 22, no 4, p. 551-575. - https://doi.org/10.1109/TAC.1977.1101561

MAJEWSKI, Szymon, MIASOJEDOW, Błażej, et MOULINES, Eric. Analysis of nonsmooth stochastic approximation: the differential inclusion approach. arXiv preprint arXiv:1805.01916, 2018. - https://arxiv.org/abs/1805.01916

MOHAMMADI, Bijan et PIRONNEAU, Olivier. Applied shape optimization for fluids. Oxford university press, 2010. -

MOULINES, Eric et BACH, Francis R. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In : Advances in Neural Information Processing Systems. 2011. p. 451-459. - http://papers.nips.cc/paper/4316-non-asymptotic-analysis-of-stochastic-approximation-algorithms-for-machine

MOREAU, Jean Jacques. Fonctionnelles sous-différentiables. 1963. -

MORDUKHOVICH, Boris S. Variational analysis and generalized differentiation I: Basic theory. Springer Science & Business Media, 2006. -

PASZKE, Adam, GROSS, Sam, CHINTALA, Soumith, et al. Automatic differentiation in pytorch. 2017. - https://openreview.net/forum?id=BJJsrmfCZ

ROBBINS, Herbert et MONRO, Sutton. A stochastic approximation method. The annals of mathematical statistics, 1951, p. 400-407. - https://www.jstor.org/stable/2236626

ROCKAFELLAR, Ralph Tyrrell. Convex functions and dual extremum problems. 1963. Thèse de doctorat. Harvard University. -

ROCKAFELLAR, Ralph. On the maximal monotonicity of subdifferential mappings. Pacific Journal of Mathematics, 1970, vol. 33, no 1, p. 209-216. - http://dx.doi.org/10.2140/pjm.1970.33.209

ROCKAFELLAR, R. T. et WETS, R. J. B. Variational Analysis. 1998. -

RUMELHART, David E., HINTON, Geoffrey E., et WILLIAMS, Ronald J. Learning representations by back-propagating errors. nature, 1986, vol. 323, no 6088, p. 533-536. - https://doi.org/10.1038/323533a0

SPEELPENNING, Bert. Compiling fast partial derivatives of functions given by algorithms. Illinois Univ., Urbana (USA). Dept. of Computer Science, 1980. - https://doi.org/10.2172/5254402

THIBAULT, Lionel. On generalized differentials and subdifferentials of Lipschitz vector-valued functions. Nonlinear Analysis: Theory, Methods & Applications, 1982, vol. 6, no 10, p. 1037-1053. - https://doi.org/10.1016/0362-546X(82)90074-8

THIBAULT, Lionel et ZAGRODNY, Dariusz. Integration of subdifferentials of lower semicontinuous functions on Banach spaces. Journal of Mathematical Analysis and Applications, 1995, vol. 189, no 1, p. 33-58. - https://doi.org/10.1006/jmaa.1995.1003

THIBAULT, Lionel et ZLATEVA, Nadia. Integrability of subdifferentials of directionally Lipschitz functions. Proceedings of the American Mathematical Society, 2005, p. 2939-2948. - https://www.jstor.org/stable/4097908

VALADIER, Michel. Entrainement unilateral, lignes de descente, fonctions lipschitziennes non pathologiques. CRAS Paris, 1989, vol. 308, p. 241-244. -

WANG, Xianfu. Pathological Lipschitz function in R (n). 1995. Thèse de doctorat. Theses (Dept. of Mathematics and Statistics)/Simon Fraser University. -

image

Bookmarks Report an error

TRUSTEES

INSTITUTIONAL PARTNERS

Destination de la recherche

Raccourcis

See Also