Leveraging Augmented-Lagrangian Techniques for Differentiating Over Infeasible Quadratic Programs in Machine Learning

Inria - Département d’Informatique de l'École normale supérieure, PSL Research University, France.

Abstract

Optimization layers within neural network architectures have become increasingly popular for their ability to solve a wide range of machine learning tasks and to model domain-specific knowledge. However, designing optimization layers requires careful consideration, as the underlying optimization problems might be infeasible during training. Motivated by applications in learning, control, and robotics, this work focuses on convex quadratic programming (QP) layers. The specific structure of this type of optimization layer can be efficiently exploited for faster computations while still allowing rich modeling capabilities. We leverage primal-dual augmented Lagrangian techniques for computing derivatives of both feasible and infeasible QP solutions. More precisely, we propose a unified approach that tackles the differentiability of the closest feasible QP solutions in a classical ℓ2 sense. We then harness this approach to enrich the expressive capabilities of existing QP layers: we show how differentiating through infeasible QPs during training makes it possible to drive a new range of QP layers towards feasibility at test time. These layers notably demonstrate superior predictive performance on some conventional learning tasks. Additionally, we present alternative formulations that enhance numerical robustness, speed, and accuracy for training such layers. Along with these contributions, we provide an open-source C++ software package, called QPLayer, for differentiating feasible and infeasible convex QPs, which can be interfaced with modern learning frameworks.
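
For reference, the QP setting and the closest-feasible relaxation mentioned above can be sketched as follows (the notation H, g, A, b, C, u and the shift variables are ours, and may differ from the paper's):

% Convex QP handled by the layer
\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2} x^\top H x + g^\top x
\quad \text{s.t.} \quad A x = b, \quad C x \le u

% When no x satisfies the constraints, the smallest l2 shift restoring
% feasibility is computed first ...
(s_b^\star, s_u^\star) \in \operatorname*{arg\,min}_{(s_b,\, s_u)} \ \tfrac{1}{2}\big(\|s_b\|_2^2 + \|s_u\|_2^2\big)
\quad \text{s.t.} \quad \{\, x : A x = b + s_b,\ C x \le u + s_u \,\} \neq \emptyset

% ... and the layer returns (and differentiates through) a solution of the
% shifted, now feasible, QP
\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2} x^\top H x + g^\top x
\quad \text{s.t.} \quad A x = b + s_b^\star, \quad C x \le u + s_u^\star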

Results

Our backward-mode differentiation of convex QP layers has been implemented in C++; we refer to it as QPLayer in what follows. Our code leverages the primal-dual augmented Lagrangian solver ProxQP (Bambade et al., 2022), also written in C++, as its internal QP solver. This section illustrates, through a classic Sudoku learning task, that QPLayer allows relaxing primal feasibility constraints, thereby enabling the training of simplified layers.
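
To make this relaxed-feasibility mode concrete, below is a minimal PyTorch sketch in the spirit of the learn-A Sudoku experiment. It is illustrative only: the class name, dimensions, and data are ours (not the actual experimental setup), and it assumes the proxsuite.torch.qplayer.QPFunction interface with its structural_feasibility flag, argument order (Q, p, A, b, G, l, u), and primal solution as first output.

import torch
from proxsuite.torch.qplayer import QPFunction

class LearnedConstraintQPLayer(torch.nn.Module):
    """QP layer whose equality constraint matrix A is a trainable parameter.

    During training, the learned A typically makes the QP infeasible; with
    structural_feasibility=False, QPLayer returns (and differentiates through)
    the closest feasible solution instead of failing.
    """

    def __init__(self, n, n_eq):
        super().__init__()
        self.A = torch.nn.Parameter(0.1 * torch.randn(n_eq, n, dtype=torch.float64))
        self.register_buffer("Q", 1e-3 * torch.eye(n, dtype=torch.float64))  # small regularizing cost
        self.register_buffer("b", torch.ones(n_eq, dtype=torch.float64))     # Sudoku-style Ax = 1
        self.register_buffer("G", torch.eye(n, dtype=torch.float64))         # box constraint 0 <= x <= 1
        self.register_buffer("l", torch.zeros(n, dtype=torch.float64))
        self.register_buffer("u", torch.ones(n, dtype=torch.float64))
        self.qp = QPFunction(structural_feasibility=False)

    def forward(self, p):
        # Solve min 1/2 x'Qx + p'x  s.t.  Ax = b, 0 <= x <= 1; keep the primal solution.
        return self.qp(self.Q, p, self.A, self.b, self.G, self.l, self.u)[0]

layer = LearnedConstraintQPLayer(n=16, n_eq=8)
p = torch.randn(16, dtype=torch.float64)
x = layer(p)
loss = (x - torch.rand(16, dtype=torch.float64)).pow(2).mean()
loss.backward()  # gradients reach layer.A even when the training-time QP is infeasible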

Figure 1: Deep learning paradigm for the Sudoku solving task.

Figure 2: Test MSE loss of QPLayer, OptNet, and the QPLayer-learn A and OptNet-learn A variants specialized for learning A, together with the violation of the Sudoku constraint Ax = 1.

Figure 3: Test prediction errors over 1000 puzzles for OptNet, QPLayer, QPLayer-learn A, and OptNet-learn A specialized for learning A.

Related Links

This work relies on Proxsuite and has been integrated into Proxsuite as a PyTorch module.

Here is a code example showing how to easily use QPLayer in PyTorch:

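(Minimal sketch, not an official snippet: it assumes the proxsuite.torch.qplayer bindings, with QPs written as min 1/2 x'Qx + p'x subject to Ax = b and l <= Gx <= u; argument order and returned values follow our reading of the ProxSuite Python API and may vary across versions.)

import torch
from proxsuite.torch.qplayer import QPFunction

n, n_eq, n_in = 10, 3, 4
dt = torch.float64

# Random convex QP data: min_x 1/2 x'Qx + p'x  s.t.  Ax = b, l <= Gx <= u.
M = torch.randn(n, n, dtype=dt)
Q = M @ M.T + 1e-3 * torch.eye(n, dtype=dt)       # symmetric positive definite cost
p = torch.randn(n, dtype=dt, requires_grad=True)
A = torch.randn(n_eq, n, dtype=dt, requires_grad=True)
G = torch.randn(n_in, n, dtype=dt)

# Build right-hand sides from a known point x0 so the QP is feasible by construction.
x0 = torch.randn(n, dtype=dt)
b = (A @ x0).detach()
l = G @ x0 - 1.0
u = G @ x0 + 1.0

# Solve the QP and expose the solution to autograd.
x, y, z = QPFunction()(Q, p, A, b, G, l, u)  # primal solution and dual variables

x.sum().backward()  # gradients w.r.t. the QP data via implicit differentiation
print(p.grad.shape, A.grad.shape)

For possibly infeasible QPs, pass structural_feasibility=False to QPFunction, as in the training sketch in the Results section above.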

BibTeX

@inproceedings{bambade2024augmented,
        author    = {Bambade, Antoine and Schramm, Fabian and Taylor, Adrien and Carpentier, Justin},
        title     = {Leveraging Augmented-Lagrangian Techniques for Differentiating Over Infeasible Quadratic Programs in Machine Learning},
        booktitle = {Twelfth International Conference on Learning Representations (ICLR)},
        year      = {2024},
}