(Communicated by Liqun Qi)
Abstract: We study four interior continuous trajectories defined by matrix differential equations, with the aim of developing new interior point methods for convex semidefinite programming (SDP). By assuming the boundedness of the level set, we establish the optimality and convergence of the first and the second trajectories for linear SDP. For the convex case, we show that, starting from any interior feasible point, the third trajectory converges to an optimal solution that has the maximal rank among all optimal solutions, under the assumptions that an optimal solution exists and that the maximal rank of optimal solutions is one. Finally, we obtain the strongest result under the weakest assumption for the fourth trajectory: by assuming only the existence of an optimal solution, we show that the trajectory converges to an optimal solution that has the maximal rank among all optimal solutions.
This paper is dedicated to Professor Masao Fukushima in celebration of his seventy-fifth birthday. Masao is a founding co-editor of Pacific Journal of Optimization. Under his leadership, the journal has become a flagship journal of the Pacific Optimization Research Activity Group and an important publication platform for optimization researchers worldwide. We have known Masao since the 1990s, and he has been a long-term colleague and close friend of ours. We wish him the best at this glorious moment of his life.
(Communicated by Liqun Qi)
In this paper, we consider a broad class of optimization problems with objective functions that are the sum of a smooth function and a possibly nonsmooth difference of convex (DC) functions. We develop an inexact regularized proximal DC Newton-type method that combines a DC algorithm with the proximal Newton-type method. We prove the global convergence of the method and demonstrate its effectiveness through numerical experiments on large-scale data sets.
(Communicated by Liqun Qi)
In this paper, we investigate the problem of finding a sparse least squares solution to tensor equations, where the tensor involved need not be square. Specifically, let $\mathscr{A}$ be an $m$-th order $l\times n\times \cdots\times n$-dimensional tensor, i.e., $\mathscr{A}=(a_{i_1i_2\cdots i_m})$ with $a_{i_1i_2\cdots i_m}\in \mathbb{R}$ for any $i_1\in \{1,2,\ldots,l\}$ and $i_j\in \{1,2,\ldots,n\}$ with $j\in \{2,3,\ldots,m\}$, and let $b\in \mathbb{R}^l$ be a given vector. We consider the following model:
min $f(x)=\frac{1}{2}\|\mathscr{A} x^{m-1}-b\|^2$
s.t. $\|x\|_0\le k$, (1)
where $\mathscr{A}x^{m-1}\in \mathbb{R}^l$ is defined by
$(\mathscr{A}x^{m-1})_i=\sum_{i_2,\ldots,i_m\in\{1,2,\ldots,n\}}a_{ii_2\cdots i_m}x_{i_2}\cdots x_{i_m},\quad\forall i\in \{1,2,\ldots,l\}.$
When $m=2$, problem (1) reduces to the widely studied sparse linear least squares problem. Problem (1) is a sparse least squares optimization model for solving the tensor equations
$\mathscr{A} x^{m-1}=b$. (2)
When $l=n$, problem (2) reduces to the widely studied (square) tensor equations.
By splitting the involved tensor, we propose two linearized methods, called natural thresholding and natural thresholding pursuit, for solving (1); these are extensions of two natural thresholding algorithms proposed recently by Zhao and Luo for the sparse linear least squares problem. We show the global linear convergence of the proposed methods under two assumptions. The first is that the majorization matrix $M$ of the tensor $\mathscr{A}$ satisfies the restricted isometry property, which is commonly used in the field of sparse optimization. The second is that the function $h(x)=\mathscr{A} x^{m-1}-Mx^{[m-1]}$ is sparse $(m-1)$-power Lipschitz continuous with respect to a solution of (2), which is trivially true when the model we consider reduces to the compressed sensing case. Preliminary numerical results illustrate that the proposed algorithms are effective.
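For concreteness, the tensor–vector product defining model (1) and a projection onto the sparsity constraint $\|x\|_0\le k$ can be sketched in NumPy as follows. This is a generic illustration of the ingredients of the model, not the natural thresholding algorithms themselves; the helper names `tensor_apply` and `hard_threshold` are ours.

```python
import numpy as np

def tensor_apply(A, x):
    """Compute (A x^{m-1})_i = sum_{i2,...,im} a_{i i2...im} x_{i2} ... x_{im}
    for an m-th order l x n x ... x n tensor A and a vector x in R^n."""
    y = A
    for _ in range(A.ndim - 1):
        y = np.tensordot(y, x, axes=([-1], [0]))  # contract the last mode with x
    return y

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest,
    i.e., project onto the set {x : ||x||_0 <= k} (ties broken arbitrarily)."""
    z = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    z[idx] = x[idx]
    return z

# Sanity check: for m = 2 the product reduces to an ordinary matrix-vector product,
# so model (1) becomes the sparse linear least squares problem.
A2 = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
assert np.allclose(tensor_apply(A2, x), A2 @ x)
```

The objective in (1) is then `0.5 * np.linalg.norm(tensor_apply(A, x) - b) ** 2`, and a thresholding-type iteration alternates a descent step on this objective with `hard_threshold`.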
(Communicated by Liqun Qi)
In this paper, we propose a new semidefinite relaxation-based branch-and-bound algorithm for a class of complex quadratic programming (CQP) problems with nonconvex constraints on modulus and phase difference. We derive a sufficient condition for the tightness of the semidefinite relaxation and estimate the gap between the nonconvex constraints and their convex hulls. The numerical results show that the proposed algorithm outperforms the well-known commercial solver Gurobi and an existing global algorithm based on semidefinite relaxation that utilizes the structure of phase constraints when applied to the typical applications of CQP problems, such as discrete and virtual beamforming problems.
Dedicated to Prof. Masao Fukushima for celebrating his 75th birthday.
(Communicated by Liqun Qi)
Second-order necessary or sufficient optimality conditions for nonlinear programming are usually defined by means of the positive (semi-)definiteness of a quadratic form, or a maximum of quadratic forms, over the critical cone. However, algorithms for finding such second-order stationary points are currently unknown. Typically, an algorithm with a second-order property approximates a primal-dual point such that the Hessian of the Lagrangian evaluated at the limit point is, under a constraint qualification, positive semidefinite over the lineality space of the critical cone. This is in general a proper subset of the critical cone, unless one assumes strict complementarity, which gives a weaker necessary optimality condition. In this paper, a new strong sequential optimality condition is suggested and analyzed. Based on this, we propose a penalty algorithm which, under certain conditions, is able to achieve second-order stationarity with positive semidefiniteness over the whole critical cone, which corresponds to a strong necessary optimality condition. Although the algorithm we propose is somewhat of a theoretical nature, our analysis provides a general framework for obtaining strong second-order stationarity.
We dedicate this paper to Professor Masao Fukushima on the occasion of his 75th birthday to express our deep gratitude and send him the best wishes for joy, happiness, and health.
(Communicated by Michel Théra)
We consider optimization problems whose objective and constraint are each the difference of a convex and a weakly convex function. This framework covers a vast family of nonsmooth and nonconvex optimization problems, particularly those involving certain classes of composite and nonconvex value functions. We investigate several stationarity conditions and extend the proximal bundle algorithm of [van Ackooij et al., Comput. Optim. Appl., 78 (2021), pp. 451–490] to compute critical points for problems in this class. Our modifications to that algorithm boil down to a different master program and an original rule for updating the proximal parameter that ensures convergence in this more general setting. Thanks to this new rule, no pre-estimation of the underlying weakly convex moduli is needed, opening the way to deal with optimization problems for which no practical and mathematically sound algorithms exist. Numerical experiments on some nonconvex stochastic problems illustrate the practical performance of the method.
Dedicated to Professor Masao Fukushima.
(Communicated by Jie Sun)
Riemannian stochastic gradient descent (RSGD) is the most basic Riemannian stochastic optimization algorithm and is used in many applications of machine learning.
A Riemannian stochastic optimization problem is defined as follows. Given a data point $z$ in a data domain $Z$, the problem provides a smooth loss function $\ell(\cdot;z):M \rightarrow \mathbb{R}$, where $M$ is a Riemannian manifold. We minimize the expected loss $f:M \rightarrow \mathbb{R}$ defined by
$f(x) := \mathbb{E}_{z \sim \mathcal{D}}\left[\ell(x;z)\right] = \mathbb{E}\left[\ell_\xi(x)\right]$ (1)
where $\mathcal{D}$ is a probability distribution over $Z$, $\xi$ denotes a random variable with distribution function $P$, and $\mathbb{E}[\cdot]$ denotes the expectation taken with respect to $\xi$. We assume that a stochastic first-order oracle (SFO) exists such that, for a given $x \in M$, it returns the stochastic gradient $\mathsf{G}_\xi(x)$ of the function $f$ defined by (1), where the random variable $\xi$ is supported on $\Xi$ independently of $x$. Moreover, we impose two standard conditions, (C1) and (C2), which are assumed in our convergence analyses:
(C1) Let $(x_k)_{k=0}^{\infty} \subset M$ be the sequence generated by the algorithm. For each iteration $k$,
$\mathbb{E}_{\xi_k}[\mathsf{G}_{\xi_k}(x_k)]=\mathrm{grad}f(x_k)$,
where $\xi_0, \xi_1, \cdots$ are independent samples, and the random variable $\xi_k$ is independent of $(x_i)_{i=0}^k$. There exists a nonnegative constant $\sigma^2$ such that
$\mathbb{E}_{\xi_k}[\|\mathsf{G}_{\xi_k}(x_k)-\mathrm{grad}f(x_k)\|^2_{x_k}]\leq \sigma^2$.
(C2) For each iteration $k$, the optimizer samples a batch $B_k$ of size $b$ independently of $k$ and estimates the full gradient $\mathrm{grad}f$ as
$\mathrm{grad}f_{B_k}(x_k) := \frac{1}{b}\sum_{i \in [b]}\mathsf{G}_{\xi_{k,i}}(x_k)$,
where $\xi_{k,i}$ is a random variable generated by the $i$-th sampling in the $k$-th iteration.
A complete simply connected Riemannian manifold of a nonpositive sectional curvature is called a Hadamard manifold. Let $M$ be a Hadamard manifold and $f : M \rightarrow \mathbb{R}$ be a smooth function on $M$. For a positive number $L > 0$, $f$ is said to be geodesically $L$-smooth if for any $x, y \in M$,
$\|{\mathrm{grad}f(x) - \Gamma_y^x(\mathrm{grad}f(y))}\|_x \leq L \|{\mathrm{Exp}^{-1}_x(y)}\|_x$,
where $\Gamma_y^x : T_yM \to T_xM$ is the parallel transport from $y$ to $x$.
In this paper, we use RSGD with a variable batch, as shown in Algorithm 1.
Algorithm 1 Riemannian stochastic gradient descent.
Require: Initial point $x_0 \in M$, step sizes $(\alpha_k)_{k=0}^{\infty} \subset \mathbb{R}_{++}$, batch size $b \in \mathbb{N}$.
Ensure: Sequence $(x_k)_{k=0}^{\infty} \subset M$.
1: $k \leftarrow 0$.
2: loop
3: $\eta_k := - \mathrm{grad}f_{B_k}(x_k) = -b^{-1}\sum_{i \in [b]}\mathsf{G}_{\xi_{k,i}}(x_k)$.
4: $x_{k+1} := \mathrm{Exp}_{x_k}(\alpha_k\eta_k)$.
5: $k \leftarrow k + 1$.
6: end loop
First, we consider the case in which the objective function $f:M\to\mathbb{R}$ is geodesically $L$-smooth. Since calculating the constant $L$ in the definition is often difficult, we also present convergence analyses for functions $f:M \to \mathbb{R}$ that are not $L$-smooth.
We show that the number of steps $K$ needed for an $\varepsilon$-approximation of RSGD is convex and monotone decreasing with respect to the batch size $b$. We also show that the SFO complexity, defined as $Kb$, is convex with respect to $b$ and that there exists a critical batch size at which the SFO complexity is minimized.
Finally, we perform numerical experiments to support our theoretical findings. We consider the Riemannian centroid problem for a set of symmetric positive-definite (SPD) matrices $\{A_i\}_{i=1}^N$. The loss function $f:\mathcal{S}^d_{++}\to\mathbb{R}$ can be expressed as
$f(X) := \frac{1}{N}\sum_{i=1}^N\left\|\log\left(A_i^{-\frac{1}{2}}XA_i^{-\frac{1}{2}}\right)\right\|_F^2$,
where $\mathcal{S}^d_{++}$ denotes the set of $d \times d$ SPD matrices and $\|\cdot\|_F$ is the Frobenius norm.
Applying our theory to RSGD with several batch sizes on a Riemannian stochastic optimization problem over an SPD manifold shows that increasing the batch size improves RSGD performance. A numerical evaluation of the relationship between batch size and RSGD performance provides evidence supporting this theoretical result.
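Algorithm 1 applied to the centroid loss above can be sketched with NumPy/SciPy, assuming the affine-invariant metric on the SPD manifold, for which $\mathrm{Exp}_X(H)=X^{1/2}\exp(X^{-1/2}HX^{-1/2})X^{1/2}$, $\mathrm{Exp}_X^{-1}(A)=X^{1/2}\log(X^{-1/2}AX^{-1/2})X^{1/2}$, and the gradient of the squared distance to $A_i$ is $-2\,\mathrm{Exp}_X^{-1}(A_i)$. The function names are ours, and the step size and batch size are illustrative, not the paper's tuned values.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def spd_exp(X, H):
    # Exponential map Exp_X(H) = X^{1/2} expm(X^{-1/2} H X^{-1/2}) X^{1/2}
    Xh = sqrtm(X).real
    Xih = np.linalg.inv(Xh)
    return Xh @ expm(Xih @ H @ Xih) @ Xh

def spd_log(X, A):
    # Inverse exponential map Log_X(A) = X^{1/2} logm(X^{-1/2} A X^{-1/2}) X^{1/2}
    Xh = sqrtm(X).real
    Xih = np.linalg.inv(Xh)
    return Xh @ logm(Xih @ A @ Xih).real @ Xh

def rsgd_centroid(As, alpha=0.5, b=2, iters=50, seed=0):
    """RSGD (Algorithm 1) for the Riemannian centroid of SPD matrices As."""
    rng = np.random.default_rng(seed)
    X = np.mean(As, axis=0)  # arithmetic mean is SPD: a valid initial point x_0
    for _ in range(iters):
        batch = rng.choice(len(As), size=b, replace=False)  # sample B_k
        # eta_k = -grad f_{B_k}(x_k) = (2/b) sum_{i in B_k} Log_X(A_i)
        eta = (2.0 / b) * sum(spd_log(X, As[i]) for i in batch)
        X = spd_exp(X, alpha * eta)  # x_{k+1} = Exp_{x_k}(alpha_k eta_k)
    return X
```

With `b = len(As)` the stochastic gradient coincides with the full gradient, which is a convenient way to sanity-check the update against known geometric means.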
(Communicated by Liqun Qi)
The minimax problem plays an important role in optimization, global optimization, game theory, and operations research. In [1], optimality conditions were formulated for the maxmin problem. In general, since the maxmin and minimax values are not always equal, the optimality conditions for the two problems might differ. The classical minimax theorem of von Neumann [2] deals with the conditions under which the maxmin and minimax values are equal. In this paper, we derive new optimality conditions for the minimax problem based on Dubovitskii–Milyutin theory (Dubovitskii and Milyutin, USSR Comput. Math. Math. Phys., 5:1–80, 1965).
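The gap between the maxmin and minimax values is easy to see on a finite matrix game restricted to pure strategies; the following quick check (illustrative only, with an arbitrary payoff matrix) exhibits a strict inequality:

```python
import numpy as np

# Payoff matrix of a two-player zero-sum game, pure strategies only.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

maxmin = A.min(axis=1).max()   # max over rows of the row minimum
minimax = A.max(axis=0).min()  # min over columns of the column maximum

# Weak duality maxmin <= minimax always holds, but equality can fail
# in pure strategies; von Neumann's theorem restores it for mixed strategies.
assert maxmin <= minimax
print(maxmin, minimax)  # here: 1.0 < 2.0
```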
References
[1] Enkhbat Rentsen and Enkhbayar Jamsranjav, A Note on the Maxmin Problem, Optimization Letters, 13 (2019), 475–483.
[2] J. von Neumann, Zur Theorie der Gesellschaftsspiele, Math. Ann., 100 (1928), 295–320.
The paper is dedicated to Professor Masao Fukushima on his 75th birthday. I have known Professor Masao Fukushima since 2001, when I first met him at Kyoto University during my JSPS fellowship period. I was invited twice by him to give talks at the seminars of his department. During 2005–2009, my former master's student Mend-Amar Majig from the National University of Mongolia did his Ph.D. under the supervision of Prof. M. Fukushima. We have written joint research papers (M. Majig, B. Barsbold, R. Enkhbat and M. Fukushima, A Global Optimization Approach for Solving Non-Monotone Variational Inequality Problems, Optimization, Vol. 58, No. 7, pp. 871–881, 2009; M. Mend-Amar, R. Enkhbat and M. Fukushima, Evolutionary Algorithm for Generalized Nash Equilibrium Problems, in the book "Optimization, Simulation, and Control", pp. 97–106, Springer, 2013). In 2007, he was a plenary speaker at the second international conference on optimization and optimal control held in Ulaanbaatar, Mongolia. In 2017, he was invited by the School of Mathematics and Computer Sciences of the National University of Mongolia for our joint research. We thank Prof. M. Fukushima very much for his fruitful collaboration with his Mongolian colleagues, congratulate him on the occasion of his 75th birthday, and wish him all success and happiness.
(Communicated by Liqun Qi)
We consider the following constrained minimization problem:
$\left\{\begin{array}{l} \mathrm{minimize}~f(x)\\\mathrm{subject~to}~\langle g^i, x\rangle \leq p_i, i=1,\ldots, l,\\~~~~~~~~~~~~~~~~~~x\in\mathbb{R}^n \end{array}\right. $ (1)
where $g^i \in\mathbb{R}^n$, $p_i\in\mathbb{R}$, $i=1,\ldots,l$, $\langle \cdot, \cdot \rangle$ is an inner product in $\mathbb{R}^n$, and the objective function $f$ is a continuous piecewise linear function represented as a difference of two convex polyhedral (DP) functions as follows:
$f(x) = f_1(x) - f_2(x),$
and
$f_1(x) = \max_{i \in I_1} ~\Big[\langle a^{1i}, x\rangle - b_{1i}\Big],\quad f_2(x) = \max_{j \in I_2} ~\Big[\langle a^{2j}, x\rangle - b_{2j}\Big]$.
Here $I_k=\{1,\ldots,l_k\}$, $l_k \geq 1$, $k = 1,2$, $a^{1i}, a^{2j}\in\mathbb{R}^n$, $b_{1i}, b_{2j}\in\mathbb{R}$, $i \in I_1$, $j \in I_2$. The class of DP functions is a subclass of difference of convex (DC) functions. Problem (1) is called the difference of polyhedral optimization problem.
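A DP objective of this form is evaluated directly from its affine pieces, and the index sets where the two maxima are attained are the raw data entering the codifferentials of $f_1$ and $f_2$. A minimal sketch (the helper names `dp_value` and `active_sets` are ours, not notation from the paper):

```python
import numpy as np

def dp_value(x, A1, b1, A2, b2):
    """f(x) = max_i (<a1_i, x> - b1_i) - max_j (<a2_j, x> - b2_j),
    where the rows of A1 and A2 are the vectors a^{1i} and a^{2j}."""
    return (A1 @ x - b1).max() - (A2 @ x - b2).max()

def active_sets(x, A1, b1, A2, b2, tol=1e-12):
    """Indices where each maximum is attained; these active affine pieces
    are the data from which the codifferentials of f1 and f2 are built."""
    v1 = A1 @ x - b1
    v2 = A2 @ x - b2
    return (np.flatnonzero(v1 >= v1.max() - tol),
            np.flatnonzero(v2 >= v2.max() - tol))

# Example: f(x) = |x| on R, written as a DP function with f2 = 0.
A1 = np.array([[1.0], [-1.0]]); b1 = np.zeros(2)
A2 = np.array([[0.0]]);         b2 = np.zeros(1)
assert dp_value(np.array([2.0]), A1, b1, A2, b2) == 2.0
```

At a kink (e.g. $x=0$ in the example) several pieces are simultaneously active, which is exactly the situation the codifferential-based optimality conditions address.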
First, we consider the unconstrained DP optimization problem, i.e., problem (1) with $l=0$. We study necessary and sufficient local and global optimality conditions for this problem using codifferentials of the functions $f_1$ and $f_2$. Furthermore, applying these codifferentials, we demonstrate how to calculate local descent directions at non-stationary points and global descent directions at local minimizers of the function $f$ that are not global minimizers. Finally, these directions are used to design exact and approximate local and global search methods for solving unconstrained DP optimization problems. We prove finite convergence of these algorithms.
Next, we consider the constrained DP optimization problem (1) and extend results from the unconstrained case to the constrained case. More specifically, using codifferentials of the functions $f_1$ and $f_2$ we formulate local and global optimality conditions for the problem (1), design algorithms based on these conditions and prove finite convergence of these algorithms.
We present illustrative examples to demonstrate the performance of algorithms developed in this paper by considering both bounded and unbounded cases. The proposed global search algorithms contain criteria to identify boundedness and unboundedness of unconstrained and constrained DP optimization problems.
(Communicated by Liqun Qi)
A multi-objective proximal gradient method was proposed recently as a suitable descent method for composite multi-objective optimization problems. However, the method solves subproblems using only Euclidean distances, and it requires the differentiable part of each objective to have a Lipschitz continuous gradient, which limits its application. In this paper, we propose an extension of this method, by using Bregman distances and requiring a less demanding assumption called relative smoothness. We also consider two stepsize strategies: the constant stepsize and the backtracking procedure. In both cases, we prove global convergence in the sense of Pareto stationarity and analyze the convergence rate through a merit function.
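The single-objective analogue of one such Bregman proximal gradient step illustrates how a Bregman distance replaces the Euclidean one in the subproblem. The sketch below uses the entropy kernel on the probability simplex (whose Bregman distance is the KL divergence) with a constant stepsize; it is a generic illustration of the Bregman subproblem, not the multiobjective method of the paper, and the function name is ours.

```python
import numpy as np

def bregman_prox_grad_step(x, grad, alpha):
    """One Bregman proximal gradient step on the probability simplex with the
    entropy kernel phi(x) = sum_i x_i log x_i.  The subproblem
        argmin_y { <grad, y> + (1/alpha) * D_phi(y, x) : y in simplex }
    has the closed-form multiplicative ("mirror descent") update below."""
    y = x * np.exp(-alpha * grad)
    return y / y.sum()

# Minimize the linear function <c, x> over the simplex; its minimizer is the
# vertex corresponding to the smallest coefficient of c.
c = np.array([1.0, 0.2, 3.0])
x = np.full(3, 1.0 / 3.0)
for _ in range(200):
    x = bregman_prox_grad_step(x, c, alpha=1.0)  # grad of <c, x> is c
```

For a linear objective the gradient is constant, so no Lipschitz (or relative smoothness) constant is needed to pick the stepsize; for general objectives that choice is exactly what the constant stepsize and backtracking strategies in the paper govern.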