Generalization of the Legendre transformation
In mathematics and mathematical optimization, the convex conjugate of a function is a generalization of the Legendre transformation which applies to non-convex functions. It is also known as the Legendre–Fenchel transformation, Fenchel transformation, or Fenchel conjugate (after Adrien-Marie Legendre and Werner Fenchel). The convex conjugate is widely used for constructing the dual problem in optimization theory, thus generalizing Lagrangian duality.
Definition

Let $X$ be a real topological vector space and let $X^{*}$ be the dual space to $X$. Denote by
$$\langle \cdot ,\cdot \rangle : X^{*} \times X \to \mathbb{R}$$
the canonical dual pairing, which is defined by $\left\langle x^{*}, x \right\rangle \mapsto x^{*}(x)$.
For a function $f : X \to \mathbb{R} \cup \{-\infty, +\infty\}$ taking values on the extended real number line, its convex conjugate is the function
$$f^{*} : X^{*} \to \mathbb{R} \cup \{-\infty, +\infty\}$$
whose value at $x^{*} \in X^{*}$ is defined to be the supremum:
$$f^{*}\left(x^{*}\right) := \sup \left\{ \left\langle x^{*}, x \right\rangle - f(x) ~\colon~ x \in X \right\},$$
or, equivalently, in terms of the infimum:
$$f^{*}\left(x^{*}\right) := -\inf \left\{ f(x) - \left\langle x^{*}, x \right\rangle ~\colon~ x \in X \right\}.$$
This definition can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.[1]
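Because the supremum ranges over all of $X$, the conjugate of a one-dimensional function can be approximated directly by maximizing over a finite grid. The following is a minimal numerical sketch; the grid bounds, the test function $f(x) = x^{2}$, and the helper name `conjugate_on_grid` are arbitrary illustrative choices, not part of the definition:

```python
import numpy as np

def conjugate_on_grid(f, xs, y):
    """Approximate f*(y) = sup_x (y*x - f(x)) by maximizing over a finite grid xs."""
    return np.max(y * xs - f(xs))

# Example: f(x) = x^2 has conjugate f*(y) = y^2/4 (supremum attained at x = y/2).
xs = np.linspace(-10.0, 10.0, 100001)
f = lambda x: x**2
for y in [-3.0, 0.0, 1.5, 4.0]:
    approx = conjugate_on_grid(f, xs, y)
    exact = y**2 / 4
    print(f"y = {y:5.1f}:  grid {approx:.6f}  vs exact {exact:.6f}")
```

For $f(x) = x^{2}$ the supremum is attained at $x = y/2$, giving the exact value $y^{2}/4$ that the grid maximum approximates.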
Examples

For more examples, see § Table of selected convex conjugates.
The convex conjugate of an affine function $f(x) = \left\langle a, x \right\rangle - b$ is
$$f^{*}\left(x^{*}\right) = \begin{cases} b, & x^{*} = a \\ +\infty, & x^{*} \neq a. \end{cases}$$
The convex conjugate of a power function $f(x) = \frac{1}{p}|x|^{p}$, $1 < p < \infty$, is
$$f^{*}\left(x^{*}\right) = \frac{1}{q}|x^{*}|^{q}, \quad 1 < q < \infty, \quad \text{where } \tfrac{1}{p} + \tfrac{1}{q} = 1.$$
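This Hölder-conjugate pair can be checked numerically in the same grid-based fashion; the exponent $p = 3$, grid bounds, and sample points below are arbitrary illustrative choices:

```python
import numpy as np

p = 3.0
q = p / (p - 1)                       # Hoelder conjugate exponent: 1/p + 1/q = 1

xs = np.linspace(-50.0, 50.0, 200001)
f = np.abs(xs)**p / p

for y in [0.5, 1.0, 2.0, 4.0]:
    approx = np.max(y * xs - f)       # sup_x (y*x - f(x)) on the grid
    exact = abs(y)**q / q
    print(f"x* = {y}:  grid {approx:.6f}  vs |x*|^q/q = {exact:.6f}")
```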
The convex conjugate of the absolute value function $f(x) = \left|x\right|$ is
$$f^{*}\left(x^{*}\right) = \begin{cases} 0, & \left|x^{*}\right| \leq 1 \\ \infty, & \left|x^{*}\right| > 1. \end{cases}$$
The convex conjugate of the exponential function $f(x) = e^{x}$ is
$$f^{*}\left(x^{*}\right) = \begin{cases} x^{*}\ln x^{*} - x^{*}, & x^{*} > 0 \\ 0, & x^{*} = 0 \\ \infty, & x^{*} < 0. \end{cases}$$
The convex conjugate and Legendre transform of the exponential function agree except that the ___domain of the convex conjugate is strictly larger, as the Legendre transform is defined only for positive real numbers.
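For $x^{*} > 0$ the supremum defining $f^{*}$ can be evaluated by first-order stationarity, which supplies the step behind the formula above:
$$\frac{d}{dx}\left(x^{*}x - e^{x}\right) = x^{*} - e^{x} = 0 \iff x = \ln x^{*}, \qquad f^{*}(x^{*}) = x^{*}\ln x^{*} - x^{*}.$$
For $x^{*} = 0$ the supremum of $-e^{x}$ is $0$ (approached as $x \to -\infty$), and for $x^{*} < 0$ the objective $x^{*}x - e^{x}$ is unbounded above as $x \to -\infty$.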
Connection with expected shortfall (average value at risk)
Let $F$ denote the cumulative distribution function of a random variable $X$. Then (integrating by parts),
$$f(x) := \int_{-\infty}^{x} F(u)\,du = \operatorname{E}\left[\max(0, x - X)\right] = x - \operatorname{E}\left[\min(x, X)\right]$$
has the convex conjugate
$$f^{*}(p) = \int_{0}^{p} F^{-1}(q)\,dq = (p - 1)F^{-1}(p) + \operatorname{E}\left[\min(F^{-1}(p), X)\right] = pF^{-1}(p) - \operatorname{E}\left[\max(0, F^{-1}(p) - X)\right].$$
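As a concrete check, take $X$ uniform on $[0, 1]$, so $F(u) = u$ and $F^{-1}(q) = q$; then $f(x) = x^{2}/2$ and $f^{*}(p) = p^{2}/2$ for $0 \leq p \leq 1$. The sketch below estimates $f$ by Monte Carlo and conjugates it on a grid; the sample size, seed, and grid are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=200_000)   # X ~ Uniform(0,1): F(u) = u, F^{-1}(q) = q

xs = np.linspace(0.0, 1.0, 201)
# Monte-Carlo estimate of f(x) = E[max(0, x - X)]  (= x^2/2 for this distribution)
f = np.array([np.mean(np.maximum(0.0, x - X)) for x in xs])

for p in [0.25, 0.5, 0.9]:
    approx = np.max(p * xs - f)           # grid conjugate: sup_x (p*x - f(x))
    exact = p**2 / 2                      # integral of F^{-1}(q) = q from 0 to p
    print(f"p = {p}:  grid {approx:.5f}  vs p^2/2 = {exact:.5f}")
```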
Ordering

A particular interpretation has the transform
$$f^{\text{inc}}(x) := \arg\sup_{t}\, t \cdot x - \int_{0}^{1} \max\{t - f(u), 0\}\, du,$$
as this is a nondecreasing rearrangement of the initial function $f$; in particular, $f^{\text{inc}} = f$ for $f$ nondecreasing.
Properties

The convex conjugate of a closed convex function is again a closed convex function. The convex conjugate of a polyhedral convex function (a convex function with polyhedral epigraph) is again a polyhedral convex function.
Declare that $f \leq g$ if and only if $f(x) \leq g(x)$ for all $x$. Then convex conjugation is order-reversing, which by definition means that if $f \leq g$ then $f^{*} \geq g^{*}$.
For a family of functions $\left(f_{\alpha}\right)_{\alpha}$ it follows from the fact that suprema may be interchanged that
$$\left(\inf_{\alpha} f_{\alpha}\right)^{*}(x^{*}) = \sup_{\alpha} f_{\alpha}^{*}(x^{*}),$$
and from the max–min inequality that
$$\left(\sup_{\alpha} f_{\alpha}\right)^{*}(x^{*}) \leq \inf_{\alpha} f_{\alpha}^{*}(x^{*}).$$
Biconjugate

The convex conjugate of a function is always lower semi-continuous. The biconjugate $f^{**}$ (the convex conjugate of the convex conjugate) is also the closed convex hull, i.e. the largest lower semi-continuous convex function with $f^{**} \leq f$. For proper functions $f$, $f = f^{**}$ if and only if $f$ is convex and lower semi-continuous, by the Fenchel–Moreau theorem.
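Numerically, applying the grid conjugate twice recovers the closed convex hull. A minimal sketch, using the nonconvex double well $f(x) = (x^{2} - 1)^{2}$ as an arbitrary test function; its convex envelope vanishes on $[-1, 1]$ and agrees with $f$ outside:

```python
import numpy as np

xs = np.linspace(-3.0, 3.0, 2001)
f = (xs**2 - 1.0)**2                  # nonconvex double well with minima at x = +-1

ys = np.linspace(-40.0, 40.0, 4001)
f_star   = np.array([np.max(y * xs - f) for y in ys])        # f* on a slope grid
f_bistar = np.array([np.max(x * ys - f_star) for x in xs])   # f** (closed convex hull)

mask = np.abs(xs) <= 1.0
print("max f** on [-1,1]:", f_bistar[mask].max())   # ~ 0: the envelope is flat there
i = np.argmin(np.abs(xs - 2.0))
print("f**(2) ~", f_bistar[i], " vs  f(2) =", (2.0**2 - 1.0)**2)  # agree off the well
```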
Fenchel's inequality
For any function $f$ and its convex conjugate $f^{*}$, Fenchel's inequality (also known as the Fenchel–Young inequality) holds for every $x \in X$ and $p \in X^{*}$:
$$\left\langle p, x \right\rangle \leq f(x) + f^{*}(p).$$
Furthermore, the equality holds if and only if $p \in \partial f(x)$.
The proof follows from the definition of the convex conjugate:
$$f^{*}(p) = \sup_{\tilde{x}} \left\{ \langle p, \tilde{x} \rangle - f(\tilde{x}) \right\} \geq \langle p, x \rangle - f(x).$$
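For instance, for $f(x) = \tfrac{1}{2}x^{2}$ on $\mathbb{R}$ one has $f^{*}(p) = \tfrac{1}{2}p^{2}$, and Fenchel's inequality reads $px \leq \tfrac{1}{2}x^{2} + \tfrac{1}{2}p^{2}$, which is just $(x - p)^{2} \geq 0$; equality holds exactly when $p = x = f^{\prime}(x)$, i.e. $p \in \partial f(x)$.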
Convexity

For two functions $f_{0}$ and $f_{1}$ and a number $0 \leq \lambda \leq 1$ the convexity relation
$$\left((1 - \lambda) f_{0} + \lambda f_{1}\right)^{*} \leq (1 - \lambda) f_{0}^{*} + \lambda f_{1}^{*}$$
holds. The ${}^{*}$ operation is a convex mapping itself.
Infimal convolution
The infimal convolution (or epi-sum) of two functions $f$ and $g$ is defined as
$$\left(f \operatorname{\Box} g\right)(x) = \inf \left\{ f(x - y) + g(y) \mid y \in \mathbb{R}^{n} \right\}.$$
Let $f_{1}, \ldots, f_{m}$ be proper, convex and lower semicontinuous functions on $\mathbb{R}^{n}$. Then the infimal convolution is convex and lower semicontinuous (but not necessarily proper),[2] and satisfies
$$\left(f_{1} \operatorname{\Box} \cdots \operatorname{\Box} f_{m}\right)^{*} = f_{1}^{*} + \cdots + f_{m}^{*}.$$
The infimal convolution of two functions has a geometric interpretation: The (strict) epigraph of the infimal convolution of two functions is the Minkowski sum of the (strict) epigraphs of those functions.[3]
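A classical instance is the Huber function, the infimal convolution of $f(y) = \tfrac{1}{2}y^{2}$ and $g(y) = |y|$. Since $f^{*}(y^{*}) = \tfrac{1}{2}(y^{*})^{2}$ and $g^{*}$ is the indicator of $[-1, 1]$, the identity above predicts $(f \,\Box\, g)^{*}(y^{*}) = \tfrac{1}{2}(y^{*})^{2}$ for $|y^{*}| \leq 1$. A grid-based sketch (the bounds and resolution are arbitrary choices):

```python
import numpy as np

xs = np.linspace(-6.0, 6.0, 1201)
f = lambda t: 0.5 * t**2
g = lambda t: np.abs(t)

# Infimal convolution h = f box g on the grid: h(x) = inf_y f(x-y) + g(y)
h = np.array([np.min(f(x - xs) + g(xs)) for x in xs])   # this is the Huber function

# Conjugates: f*(y) = y^2/2 and g* = indicator of [-1,1], so (f box g)* = y^2/2 there
for y in [-1.0, -0.5, 0.0, 0.7]:
    h_star = np.max(y * xs - h)
    print(f"y = {y:4.1f}:  (f box g)*(y) ~ {h_star:.4f}  vs f*(y)+g*(y) = {0.5*y*y:.4f}")
```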
Maximizing argument
If the function $f$ is differentiable, then its derivative is the maximizing argument in the computation of the convex conjugate:
$$f^{\prime}(x) = x^{*}(x) := \arg\sup_{x^{*}} \langle x, x^{*} \rangle - f^{*}\left(x^{*}\right)$$
and
$$f^{*\prime}\left(x^{*}\right) = x\left(x^{*}\right) := \arg\sup_{x} \langle x, x^{*} \rangle - f(x);$$
hence
$$x = \nabla f^{*}\left(\nabla f(x)\right),$$
$$x^{*} = \nabla f\left(\nabla f^{*}\left(x^{*}\right)\right),$$
and moreover
$$f^{\prime\prime}(x) \cdot f^{*\prime\prime}\left(x^{*}(x)\right) = 1,$$
$$f^{*\prime\prime}\left(x^{*}\right) \cdot f^{\prime\prime}\left(x(x^{*})\right) = 1.$$
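These identities can be checked on the power-function pair from the examples. A sketch for $f(x) = x^{4}/4$ on $x > 0$, whose conjugate is $f^{*}(x^{*}) = \tfrac{3}{4}(x^{*})^{4/3}$; the closed forms below follow from the power-function row of the table, while the specific test points are arbitrary:

```python
import numpy as np

# f(x) = x^4/4 on x > 0, with conjugate f*(y) = (3/4) y^{4/3}  (q = 4/3)
fp   = lambda x: x**3                        # f'
fsp  = lambda y: y**(1.0 / 3.0)              # (f*)'
fpp  = lambda x: 3.0 * x**2                  # f''
fspp = lambda y: (1.0 / 3.0) * y**(-2.0 / 3.0)   # (f*)''

for x in [0.5, 1.0, 2.0]:
    y = fp(x)
    print(f"x = {x}:  (f*)'(f'(x)) = {fsp(y):.6f}   (should be x)")
    print(f"         f''(x) * (f*)''(f'(x)) = {fpp(x) * fspp(y):.6f}   (should be 1)")
```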
Scaling properties

If for some $\gamma > 0$, $g(x) = \alpha + \beta x + \gamma \cdot f(\lambda x + \delta)$, then
$$g^{*}\left(x^{*}\right) = -\alpha - \delta \frac{x^{*} - \beta}{\lambda} + \gamma \cdot f^{*}\left(\frac{x^{*} - \beta}{\lambda\gamma}\right).$$
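A quick numerical check of this scaling rule, with $f(x) = \tfrac{1}{2}x^{2}$ (so $f^{*}(y) = \tfrac{1}{2}y^{2}$) and arbitrary parameter values:

```python
import numpy as np

alpha, beta, gamma, lam, delta = 1.0, -2.0, 0.5, 3.0, 0.25
f      = lambda x: 0.5 * x**2        # f*(y) = y^2/2 in closed form
f_star = lambda y: 0.5 * y**2

g = lambda x: alpha + beta * x + gamma * f(lam * x + delta)

xs = np.linspace(-20.0, 20.0, 200001)
for y in [-1.0, 0.0, 2.0]:
    grid = np.max(y * xs - g(xs))    # direct grid conjugate of g
    formula = (-alpha - delta * (y - beta) / lam
               + gamma * f_star((y - beta) / (lam * gamma)))
    print(f"x* = {y}:  grid {grid:.5f}  vs formula {formula:.5f}")
```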
Behavior under linear transformations

Let $A : X \to Y$ be a bounded linear operator. For any convex function $f$ on $X$,
$$\left(A f\right)^{*} = f^{*} A^{*}$$
where
$$(A f)(y) = \inf\{ f(x) : x \in X,\, Ax = y \}$$
is the preimage of $f$ with respect to $A$ and $A^{*}$ is the adjoint operator of $A$.[4]
A closed convex function $f$ is symmetric with respect to a given set $G$ of orthogonal linear transformations,
$$f(Ax) = f(x) \quad \text{for all } x \text{ and all } A \in G,$$
if and only if its convex conjugate $f^{*}$ is symmetric with respect to $G$.
Table of selected convex conjugates
The following table provides Legendre transforms for many common functions as well as a few useful properties.[5]
| $g(x)$ | $\operatorname{dom}(g)$ | $g^{*}(x^{*})$ | $\operatorname{dom}(g^{*})$ |
|---|---|---|---|
| $f(ax)$ (where $a \neq 0$) | $X$ | $f^{*}\left(\frac{x^{*}}{a}\right)$ | $X^{*}$ |
| $f(x + b)$ | $X$ | $f^{*}(x^{*}) - \langle b, x^{*} \rangle$ | $X^{*}$ |
| $a f(x)$ (where $a > 0$) | $X$ | $a f^{*}\left(\frac{x^{*}}{a}\right)$ | $X^{*}$ |
| $\alpha + \beta x + \gamma \cdot f(\lambda x + \delta)$ | $X$ | $-\alpha - \delta \frac{x^{*} - \beta}{\lambda} + \gamma \cdot f^{*}\left(\frac{x^{*} - \beta}{\gamma\lambda}\right) \quad (\gamma > 0)$ | $X^{*}$ |
| $\frac{\lvert x \rvert^{p}}{p}$ (where $p > 1$) | $\mathbb{R}$ | $\frac{\lvert x^{*} \rvert^{q}}{q}$ (where $\tfrac{1}{p} + \tfrac{1}{q} = 1$) | $\mathbb{R}$ |
| $\frac{-x^{p}}{p}$ (where $0 < p < 1$) | $\mathbb{R}_{+}$ | $\frac{-(-x^{*})^{q}}{q}$ (where $\tfrac{1}{p} + \tfrac{1}{q} = 1$) | $\mathbb{R}_{--}$ |
| $\sqrt{1 + x^{2}}$ | $\mathbb{R}$ | $-\sqrt{1 - (x^{*})^{2}}$ | $[-1, 1]$ |
| $-\log(x)$ | $\mathbb{R}_{++}$ | $-(1 + \log(-x^{*}))$ | $\mathbb{R}_{--}$ |
| $e^{x}$ | $\mathbb{R}$ | $\begin{cases} x^{*}\log(x^{*}) - x^{*} & \text{if } x^{*} > 0 \\ 0 & \text{if } x^{*} = 0 \end{cases}$ | $\mathbb{R}_{+}$ |
| $\log\left(1 + e^{x}\right)$ | $\mathbb{R}$ | $\begin{cases} x^{*}\log(x^{*}) + (1 - x^{*})\log(1 - x^{*}) & \text{if } 0 < x^{*} < 1 \\ 0 & \text{if } x^{*} = 0, 1 \end{cases}$ | $[0, 1]$ |
| $-\log\left(1 - e^{x}\right)$ | $\mathbb{R}_{--}$ | $\begin{cases} x^{*}\log(x^{*}) - (1 + x^{*})\log(1 + x^{*}) & \text{if } x^{*} > 0 \\ 0 & \text{if } x^{*} = 0 \end{cases}$ | $\mathbb{R}_{+}$ |
^ "Legendre Transform" . Retrieved April 14, 2019 .
^ Phelps, Robert (1993). Convex Functions, Monotone Operators and Differentiability (2 ed.). Springer. p. 42 . ISBN 0-387-56715-1 .
^ Bauschke, Heinz H.; Goebel, Rafal; Lucet, Yves; Wang, Xianfu (2008). "The Proximal Average: Basic Theory". SIAM Journal on Optimization . 19 (2): 766. CiteSeerX 10.1.1.546.4270 . doi :10.1137/070687542 .
^ Ioffe, A.D. and Tichomirov, V.M. (1979), Theorie der Extremalaufgaben . Deutscher Verlag der Wissenschaften . Satz 3.4.3
^ Borwein, Jonathan ; Lewis, Adrian (2006). Convex Analysis and Nonlinear Optimization: Theory and Examples (2 ed.). Springer. pp. 50 –51. ISBN 978-0-387-29570-1 .