Leveraging Augmented-Lagrangian Techniques for Differentiating Over Infeasible Quadratic Programs in Machine Learning

Inria - Département d’Informatique de l'École normale supérieure, PSL Research University, France.

Abstract

Optimization layers within neural network architectures have become increasingly popular for their ability to solve a wide range of machine learning tasks and to model domain-specific knowledge. However, designing optimization layers requires careful consideration, as the underlying optimization problems might be infeasible during training. Motivated by applications in learning, control, and robotics, this work focuses on convex quadratic programming (QP) layers. The specific structure of this type of optimization layer can be efficiently exploited for faster computations while still allowing rich modeling capabilities. We leverage primal-dual augmented Lagrangian techniques for computing derivatives of both feasible and infeasible QP solutions. More precisely, we propose a unified approach that tackles the differentiability of the closest feasible QP solutions in a classical ℓ2 sense. We then harness this approach to enrich the expressive capabilities of existing QP layers: we show how differentiating through infeasible QPs during training makes it possible to drive a new range of QP layers towards feasibility at test time. These layers notably demonstrate superior predictive performance on some conventional learning tasks. Additionally, we present alternative formulations that enhance numerical robustness, speed, and accuracy for training such layers. Along with these contributions, we provide an open-source C++ software package, called QPLayer, for differentiating feasible and infeasible convex QPs, which can be interfaced with modern learning frameworks.
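
For reference, the QP setting and the closest-feasible relaxation mentioned above can be sketched as follows (the notation H, g, A, b, C, u and the shift variables are ours, and may differ from the paper's):

% Convex QP handled by the layer
\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2} x^\top H x + g^\top x
\quad \text{s.t.} \quad A x = b, \quad C x \le u

% When no x satisfies the constraints, the smallest l2 shift restoring
% feasibility is computed first ...
(s_b^\star, s_u^\star) \in \operatorname*{arg\,min}_{(s_b,\, s_u)} \ \tfrac{1}{2}\big(\|s_b\|_2^2 + \|s_u\|_2^2\big)
\quad \text{s.t.} \quad \{\, x : A x = b + s_b,\ C x \le u + s_u \,\} \neq \emptyset

% ... and the layer returns (and differentiates through) a solution of the
% shifted, now feasible, QP
\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2} x^\top H x + g^\top x
\quad \text{s.t.} \quad A x = b + s_b^\star, \quad C x \le u + s_u^\star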

Results

Our backward-mode differentiation of convex QP layers has been implemented in C++; we refer to it as QPLayer in what follows. Our code leverages the primal-dual augmented Lagrangian solver ProxQP (Bambade et al., 2022), also written in C++, as its internal QP solver. This section illustrates, through a classic Sudoku learning task, that QPLayer allows relaxing primal feasibility constraints, thereby enabling the training of simplified layers.
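
To make this relaxed-feasibility mode concrete, below is a minimal PyTorch sketch in the spirit of the learn-A Sudoku experiment. It is illustrative only: the class name, dimensions, and data are ours (not the actual experimental setup), and it assumes the proxsuite.torch.qplayer.QPFunction interface with its structural_feasibility flag, argument order (Q, p, A, b, G, l, u), and primal solution as first output.

import torch
from proxsuite.torch.qplayer import QPFunction

class LearnedConstraintQPLayer(torch.nn.Module):
    """QP layer whose equality constraint matrix A is a trainable parameter.

    During training, the learned A typically makes the QP infeasible; with
    structural_feasibility=False, QPLayer returns (and differentiates through)
    the closest feasible solution instead of failing.
    """

    def __init__(self, n, n_eq):
        super().__init__()
        self.A = torch.nn.Parameter(0.1 * torch.randn(n_eq, n, dtype=torch.float64))
        self.register_buffer("Q", 1e-3 * torch.eye(n, dtype=torch.float64))  # small regularizing cost
        self.register_buffer("b", torch.ones(n_eq, dtype=torch.float64))     # Sudoku-style Ax = 1
        self.register_buffer("G", torch.eye(n, dtype=torch.float64))         # box constraint 0 <= x <= 1
        self.register_buffer("l", torch.zeros(n, dtype=torch.float64))
        self.register_buffer("u", torch.ones(n, dtype=torch.float64))
        self.qp = QPFunction(structural_feasibility=False)

    def forward(self, p):
        # Solve min 1/2 x'Qx + p'x  s.t.  Ax = b, 0 <= x <= 1; keep the primal solution.
        return self.qp(self.Q, p, self.A, self.b, self.G, self.l, self.u)[0]

layer = LearnedConstraintQPLayer(n=16, n_eq=8)
p = torch.randn(16, dtype=torch.float64)
x = layer(p)
loss = (x - torch.rand(16, dtype=torch.float64)).pow(2).mean()
loss.backward()  # gradients reach layer.A even when the training-time QP is infeasible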

Figure 1: Deep learning paradigm for the Sudoku solving task.

Figure 2: Test MSE loss of QPLayer, OptNet, and the QPLayer-learn A and OptNet-learn A variants specialized for learning A, together with the violation of the Sudoku constraint Ax = 1.

Figure 3: Test prediction errors over 1000 puzzles for OptNet, QPLayer, QPLayer-learn A, and OptNet-learn A specialized for learning A.

Related Links

This work relies on Proxsuite and has been integrated into Proxsuite as a PyTorch module.

Here is a code example showing how to easily use QPLayer in PyTorch:

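(Minimal sketch, not an official snippet: it assumes the proxsuite.torch.qplayer bindings, with QPs written as min 1/2 x'Qx + p'x subject to Ax = b and l <= Gx <= u; argument order and returned values follow our reading of the ProxSuite Python API and may vary across versions.)

import torch
from proxsuite.torch.qplayer import QPFunction

n, n_eq, n_in = 10, 3, 4
dt = torch.float64

# Random convex QP data: min_x 1/2 x'Qx + p'x  s.t.  Ax = b, l <= Gx <= u.
M = torch.randn(n, n, dtype=dt)
Q = M @ M.T + 1e-3 * torch.eye(n, dtype=dt)       # symmetric positive definite cost
p = torch.randn(n, dtype=dt, requires_grad=True)
A = torch.randn(n_eq, n, dtype=dt, requires_grad=True)
G = torch.randn(n_in, n, dtype=dt)

# Build right-hand sides from a known point x0 so the QP is feasible by construction.
x0 = torch.randn(n, dtype=dt)
b = (A @ x0).detach()
l = G @ x0 - 1.0
u = G @ x0 + 1.0

# Solve the QP and expose the solution to autograd.
x, y, z = QPFunction()(Q, p, A, b, G, l, u)  # primal solution and dual variables

x.sum().backward()  # gradients w.r.t. the QP data via implicit differentiation
print(p.grad.shape, A.grad.shape)

For possibly infeasible QPs, pass structural_feasibility=False to QPFunction, as in the training sketch in the Results section above.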

BibTeX

@inproceedings{bambade2024augmented,
        author    = {Bambade, Antoine and Schramm, Fabian and Taylor, Adrien and Carpentier, Justin},
        title     = {Leveraging Augmented-Lagrangian Techniques for Differentiating Over Infeasible Quadratic Programs in Machine Learning},
        booktitle = {Twelfth International Conference on Learning Representations (ICLR)},
        year      = {2024},
}