The Sphere is Weakly Dense in the Ball

Another qual question, this one dealing with Hilbert spaces and the weak topology:

Exercise: Let \(\mathcal{H}\) be an infinite dimensional Hilbert space. Show that the unit sphere \(S := \{x\in\mathcal{H} : \|x\| = 1\}\) is weakly dense in the unit ball \(B := \{x\in\mathcal{H} : \|x\| \le 1\}\).

I find this to be a really surprising and counter-intuitive result. What this exercise asks us to prove is that every point inside of the unit ball in an infinite dimensional Hilbert space is, in some sense, arbitrarily close to the boundary of that ball. This is really striking to me—how can the center of a ball be really close to the boundary of that ball? Because this result is so unexpected, I think it is worth understanding. Indeed, the arguments presented below have helped me to build some better intuition about both infinite dimensional Hilbert spaces and the weak topology.


Since I am preparing for a qual, all of the following is (a) from memory, and (b) not meant to be entirely rigorous. Any errors, of which there are likely a few, are (obviously) my own.

Definitions: A vector space \(V\) over a field \(K\) (usually, we understand \(K\) to be either the real or complex numbers) is a set equipped with two algebraic operations—addition and scalar multiplication—that “get along” nicely.[1]

A normed vector space is a vector space \(V\) equipped with a norm \(\|\cdot\| : V\to[0,\infty)\) which satisfies the properties \(\|v\| = 0\) if and only if \(v = 0\), \(\|\alpha v\| = |\alpha|\|v\|\) for all vectors \(v\) and scalars \(\alpha\), and the triangle inequality: \(\|u+v\| \le \|u\| + \|v\|\). A Banach space is a normed vector space for which the norm is complete, in the sense that if \((v_n)\) is a sequence of vectors such that \(\|v_m-v_n\|\) can be made arbitrarily small by choosing \(m\) and \(n\) large enough, then there is some \(v\in V\) such that \(\|v_n - v\|\) can be made arbitrarily small by choosing \(n\) large enough (i.e. Cauchy sequences in the space converge in the space).

An inner product on a vector space \(V\) is a function \(\langle\cdot,\cdot\rangle : V\times V\to K\) such that if \(u,v,w\in V\) and \(\alpha\in K\), then \(\langle u+v,w\rangle = \langle u,w\rangle + \langle v,w\rangle\), \(\langle \alpha u,v\rangle = \alpha\langle u,v\rangle\), \(\langle u,v\rangle = \overline{\langle v,u\rangle}\), and \(\langle v,v\rangle \ge 0\) with equality if and only if \(v = 0\). If \(u,v\in V\) with \(\langle u,v\rangle = 0\), then we say that \(u\) and \(v\) are orthogonal.[2] A vector space \(V\) equipped with an inner product is called an inner product space. Note that the inner product induces a norm as follows: \(\|v\| = \sqrt{\langle v,v\rangle}\). If \(V\) is a Banach space under this norm, then we call \(V\) a Hilbert space.

A linear functional on a normed vector space \(V\) over \(K\) is a function \(f : V\to K\) such that for any \(u,v\in V\) and \(\alpha \in K\), we have \(f(u+v) = f(u)+f(v)\) and \(f(\alpha u) = \alpha f(u)\). The (operator) norm of a functional is given by \(\|f\|_{op} := \sup_{\|v\|=1} |f(v)|\). If this supremum is finite, we say that \(f\) is a bounded linear functional. The collection of all bounded linear functionals on \(V\) is called the dual space of \(V\), and is usually denoted \(V^*\).

We now require a more general version of the Pythagorean theorem, and a couple of useful inequalities:

Theorem (Pythagorean Theorem): Let \(\mathcal{H}\) be a Hilbert space, and suppose that \(\{\xi_i\}_{i=1}^{n}\subset\mathcal{H}\) is a collection of mutually orthogonal vectors (that is, \(\langle \xi_i,\xi_j\rangle = 0\) whenever \(j\ne i\)). Then \[\left\|\sum_{i=1}^{n} \xi_i\right\|^2 = \sum_{i=1}^{n} \|\xi_i\|^2.\]

Proof: The proof is a computation: \[ \left\|\sum_{i=1}^{n} \xi_i\right\|^2 = \left\langle \sum_{i=1}^{n} \xi_i,\sum_{i=1}^{n} \xi_i\right\rangle = \sum_{i,j=1}^{n} \langle \xi_i,\xi_j\rangle.\] Since the vectors are assumed to be mutually orthogonal, \(\langle\xi_i,\xi_j\rangle\) is zero when \(i\ne j\), and so the final sum becomes \[ \sum_{i,j=1}^{n} \langle \xi_i,\xi_j\rangle = \sum_{i=1}^{n} \langle\xi_i,\xi_i\rangle = \sum_{i=1}^{n} \|\xi_i\|^2, \] which completes the proof.
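As a quick sanity check on the computation above, here is a minimal Python sketch that verifies the identity for three mutually orthogonal vectors in \(\mathbb{C}^3\). The helper names `inner` and `norm` and the particular vectors are my own choices for illustration; the inner product is taken to be linear in the first slot and conjugate linear in the second, matching the definition given earlier.

```python
import math

def inner(u, v):
    """Inner product on C^n: linear in the first slot, conjugate linear in the second."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

# Three mutually orthogonal (but not normalized) vectors in C^3.
xis = [(1, 1j, 0), (1, -1j, 0), (0, 0, 2 - 1j)]

# Confirm that distinct vectors really are orthogonal.
for i, u in enumerate(xis):
    for j, v in enumerate(xis):
        if i != j:
            assert abs(inner(u, v)) < 1e-12

# Left side: squared norm of the sum. Right side: sum of the squared norms.
total = [sum(column) for column in zip(*xis)]
lhs = norm(total) ** 2
rhs = sum(norm(u) ** 2 for u in xis)
print(lhs, rhs)
```

The two printed numbers should agree up to floating point rounding.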

Theorem (Cauchy-Schwarz Inequality): For any \(\xi\) and \(\eta\) in a Hilbert space, \(|\langle \xi,\eta\rangle| \le \|\xi\|\|\eta\|\), with equality if and only if \(\xi\) and \(\eta\) are linearly dependent (i.e. one of them is a scalar multiple of the other).

Proof: If \(\langle \xi,\eta\rangle = 0\), there is nothing to prove. Otherwise, let \[ \lambda = \text{sgn}\langle \xi,\eta\rangle = \frac{\langle \xi,\eta\rangle}{|\langle \xi,\eta\rangle|}, \] and set \(\zeta = \lambda\eta\). Note that \(|\lambda| = 1\), and so \(\|\eta\| = \|\zeta\|\), and that \[ \langle \xi,\zeta \rangle
= \overline{\lambda} \langle \xi,\eta\rangle
= \frac{\langle \eta,\xi\rangle}{|\langle \xi,\eta\rangle|}\langle \xi,\eta\rangle
= |\langle \xi,\eta\rangle|
= \frac{\langle \xi,\eta\rangle}{|\langle \xi,\eta\rangle|}\langle \eta,\xi\rangle
= \lambda\langle\eta,\xi\rangle
= \langle\zeta,\xi\rangle.\] From this, it follows that for any real number \(t\), we have \[0
\le \|\xi - t\zeta\|^2
= \langle \xi-t\zeta,\xi-t\zeta\rangle
= \langle \xi,\xi\rangle - 2t|\langle \xi,\eta\rangle| + t^2\langle \zeta,\zeta\rangle
= \|\xi\|^2 - 2t|\langle \xi,\eta\rangle| + t^2\|\eta\|^2.\] This is a quadratic function in \(t\), which we seek to minimize. From the first derivative test, we see that the vertex of the described parabola occurs when \(t = |\langle\xi,\eta\rangle|\|\eta\|^{-2},\) and we see from the second derivative test that this point must be the global minimum. Substituting this value of \(t\) into the previous expression, we obtain \[0
\le \|\xi\|^2 - 2t|\langle \xi,\eta\rangle| + t^2\|\eta\|^2
= \|\xi\|^2 - 2|\langle\xi,\eta\rangle|^2\|\eta\|^{-2} + |\langle\xi,\eta\rangle|^2\|\eta\|^{-2}
= \|\xi\|^2 - |\langle\xi,\eta\rangle|^2\|\eta\|^{-2}. \] Multiplying through by \(\|\eta\|^{2}\), which is nonzero by the hypothesis that \(\langle \xi,\eta\rangle \ne 0\), we obtain \[ 0 \le \|\xi\|^2\|\eta\|^2 - |\langle\xi,\eta\rangle|^2,\] which is the desired result. Note also that equality holds if and only if \(0 = \|\xi-t\zeta\| = \|\xi - \lambda t \eta\|\), in which case \(\xi\) and \(\eta\) are linearly dependent.
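To make the statement concrete, here is a small Python sketch (my own illustration, with arbitrarily chosen vectors in \(\mathbb{C}^3\)) that computes the gap \(\|\xi\|\|\eta\| - |\langle\xi,\eta\rangle|\) for one independent pair and one dependent pair. The helpers `inner` and `norm` are hypothetical names, not anything from a library.

```python
import math

def inner(u, v):
    """Inner product on C^n: linear in the first slot, conjugate linear in the second."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

xi = (1 + 2j, -1, 3j)
eta = (2, 1j, 1 - 1j)

# Linearly independent pair: the gap should be strictly positive.
gap = norm(xi) * norm(eta) - abs(inner(xi, eta))

# Linearly dependent pair: replace xi by a scalar multiple of eta.
scaled = tuple((2 - 1j) * a for a in eta)
gap_dependent = norm(scaled) * norm(eta) - abs(inner(scaled, eta))

print(gap, gap_dependent)
```

The first gap is positive, while the second vanishes up to rounding, which is the equality case of the theorem.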

Theorem (Bessel’s Inequality): Let \(\{\eta_\alpha\}_{\alpha\in A}\) be an orthonormal set[3] in a Hilbert space \(\mathcal{H}\). Then for any \(\xi \in \mathcal{H}\), \[ \sum_{\alpha\in A} |\langle \xi,\eta_\alpha\rangle|^2 \le \|\xi\|^2. \]

Proof: It is sufficient to show that the result holds for any finite \(F\subseteq A\). For any such set, we have \[\begin{align*} 0
&\le \left\| \xi - \sum_{\alpha\in F}\langle \xi,\eta_{\alpha}\rangle \eta_{\alpha}\right\|^2 \\
&= \left\langle \xi - \sum_{\alpha\in F}\langle \xi,\eta_{\alpha}\rangle\eta_{\alpha}, \xi - \sum_{\alpha\in F}\langle \xi,\eta_{\alpha}\rangle\eta_{\alpha} \right\rangle \\
&= \|\xi\|^2 - 2\Re\left\langle \xi,\sum_{\alpha\in F}\langle \xi,\eta_{\alpha}\rangle \eta_{\alpha}\right\rangle+ \left\|\sum_{\alpha\in F} \langle\xi,\eta_{\alpha}\rangle\eta_{\alpha} \right\|^2 \\
&= \|\xi\|^2 - 2 \sum_{\alpha\in F} |\langle \xi,\eta_{\alpha}\rangle|^2 + \sum_{\alpha\in F} |\langle \xi,\eta_{\alpha}\rangle|^2,
\end{align*}\]

where the middle term results from the conjugate linearity of the inner product, and the third term follows from the Pythagorean theorem (since the \(\eta_{\alpha}\) are all mutually orthogonal). But then we have \[ 0 \le \|\xi\|^2 - \sum_{\alpha\in F} |\langle \xi,\eta_{\alpha}\rangle|^2, \] which is exactly the desired result.
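Here is a short Python sketch of Bessel's inequality for a two-element orthonormal set in \(\mathbb{C}^4\). The vectors and the helper `inner` are my own choices for illustration. The sketch also checks the identity at the heart of the proof, namely that the squared norm of the residual equals \(\|\xi\|^2\) minus the sum of the squared coefficients.

```python
import math

def inner(u, v):
    """Inner product on C^n: linear in the first slot, conjugate linear in the second."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

s = 1 / math.sqrt(2)
etas = [(1, 0, 0, 0), (0, s, s, 0)]  # an orthonormal set (not a basis) in C^4
xi = (3, 1j, -2, 5)

coeffs = [inner(xi, eta) for eta in etas]
bessel_sum = sum(abs(c) ** 2 for c in coeffs)
norm_sq = inner(xi, xi).real

# Residual xi - sum_alpha <xi, eta_alpha> eta_alpha, computed coordinate by coordinate.
projection = [sum(c * eta[k] for c, eta in zip(coeffs, etas)) for k in range(len(xi))]
residual_sq = sum(abs(a - p) ** 2 for a, p in zip(xi, projection))

print(bessel_sum, norm_sq, residual_sq)
```

Because the orthonormal set here does not span the whole space, the inequality is strict in this example.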

There is one other very powerful result that we are going to need:

Theorem (Riesz Representation Theorem): If \(\mathcal{H}\) is a Hilbert space and \(f\in\mathcal{H}^*\), then there is a unique element \(\eta\in\mathcal{H}\) such that for all \(\xi\in\mathcal{H}\) we have \(f(\xi) = \langle \xi,\eta\rangle\). Moreover, the map \(f\mapsto\eta\) is an isometry, i.e. \(\|f\|_{op} = \|\eta\|\).

Intuitively, this result tells us that Hilbert spaces are self-dual, and that we may simply assume that the space of bounded linear functionals on a Hilbert space is, in fact, that Hilbert space. The proof relies on a somewhat technical lemma stating that a Hilbert space can be decomposed into mutually orthogonal subspaces, and is quite beyond the scope of what seems reasonable for a qualifying exam, so I won’t give it here.

It should also be noted that the Gram-Schmidt process from undergraduate linear algebra generalizes quite nicely to Hilbert spaces. Given any linearly independent sequence in a Hilbert space, there is a technique for obtaining an orthonormal sequence with the same “span” (where “span” has a suitably generalized meaning in infinite dimensional Hilbert spaces). While the most general case relies on the axiom of choice, the punchline is that every Hilbert space contains an orthonormal set whose cardinality is the dimension of the space. In particular, an infinite dimensional Hilbert space contains an orthonormal sequence.

Finally, with one more definition, we can get to the main result:

Definition: Let \(\{\xi_n\}\) be a sequence in a Hilbert space \(\mathcal{H}\). We say that \(\xi_n\) converges to \(\xi\) weakly in \(\mathcal{H}\) if for every \(f\in\mathcal{H}^*\) the sequence \(\{f(\xi_n)\}\) converges to \(f(\xi)\) in \(K\). We denote this by \(\xi_n \rightharpoonup \xi\).

The Exercise

Recall the goal:

Exercise: Let \(\mathcal{H}\) be an infinite dimensional Hilbert space. Show that the unit sphere \(S := \{x\in\mathcal{H} : \|x\| = 1\}\) is weakly dense in the unit ball \(B := \{x\in\mathcal{H} : \|x\| \le 1\}\).

We first prove the following lemma:

Lemma: Every orthonormal sequence in an infinite dimensional Hilbert space converges weakly to zero.

Proof: Let \(\{\eta_n\}\) be an orthonormal sequence in an infinite dimensional Hilbert space \(\mathcal{H}\), and let \(f\in\mathcal{H}^*\) be arbitrary. By the Riesz representation theorem, there exists some \(\xi\in\mathcal{H}\) such that \(f(\eta_n) = \langle \eta_n,\xi\rangle\) for all \(n\), and so \(|f(\eta_n)| = |\langle \xi,\eta_n\rangle|\). By Bessel’s inequality, we have \[ \sum_{n=1}^{\infty} |\langle \xi,\eta_n\rangle|^2 \le \|\xi\|^2. \] Since \(\|\xi\|^2 < \infty\), the sum converges, and the terms of a convergent series must tend to zero. In particular, this implies that \(|f(\eta_n)| = |\langle \xi,\eta_n\rangle| \to 0\) as \(n\to \infty\). But this is exactly the desired result.
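The lemma is easy to see in a concrete case. The following Python sketch is my own illustration, assuming the space is \(\ell^2\), the orthonormal sequence is the standard basis \(e_n\), and the functional is represented by \(\xi = (1, 1/2, 1/3, \dots)\), so that \(\langle \xi, e_n\rangle = 1/n\). Only the first `N` coordinates are kept, which is an assumption of the sketch rather than part of the argument.

```python
import math

N = 10_000  # number of coordinates kept (an assumption of this sketch)

# <xi, e_n> = 1/n for xi = (1, 1/2, 1/3, ...) and e_n the n-th standard basis vector.
coefficients = [1.0 / n for n in range(1, N + 1)]

# ||xi||^2 is the sum of 1/n^2 over all n, which equals pi^2 / 6.
norm_sq = math.pi ** 2 / 6

# Bessel: the partial sums of |<xi, e_n>|^2 never exceed ||xi||^2 ...
bessel_sum = sum(c * c for c in coefficients)
print(bessel_sum <= norm_sq)

# ... which forces the individual coefficients to shrink toward zero.
print([coefficients[n - 1] for n in (1, 10, 100, 1000, 10000)])
```

Each \(e_n\) has norm one, so the sequence does not converge to zero in norm, yet every coefficient sequence of this form still tends to zero.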

Now we can finally give a solution to the exercise:

Proof: Let \(\xi\in B\). We want to show that there is a sequence of elements \(\{\xi_n\}\subseteq S\) that converges weakly to \(\xi\). Let \(\{\eta_n\}\) be an orthonormal sequence in \(\{\xi\}^{\perp}\).[4] For each \(n\), define \[ \xi_n := \lambda \eta_n + \xi, \] where \(\lambda\) is chosen so that \(\|\xi_n\| = 1\). Assuming that such a choice is possible, we would obtain from the Pythagorean theorem that \[ 1 = \|\xi_n\|^2 = \lambda^2\|\eta_n\|^2 + \|\xi\|^2 = \lambda^2 + \|\xi\|^2. \] Then \[\lambda = \sqrt{1-\|\xi\|^2}.\] Since \(\|\xi\|^2 \le 1\), it is clear that \(\lambda\) is a nonnegative real number, hence such a choice of \(\lambda\) is possible.

Because I think it is a little unclear, let me reiterate what I have done: I now have a sequence \(\{\xi_n\}\) which lives on the unit sphere, where each \(\xi_n = \xi + \lambda\eta_n\) and \(\lambda\) is a fixed constant. We claim that \(\xi_n \rightharpoonup \xi\).

To see this, note that for any \(f\in\mathcal{H}^*\), we have \[ f(\xi_n) = f(\xi + \lambda\eta_n) = f(\xi) + \lambda f(\eta_n). \] By the lemma, \(f(\eta_n)\to 0\) as \(n\to \infty\), and so we have \(f(\xi_n) \to f(\xi)\) as \(n\to \infty\). But then \(\xi_n\rightharpoonup \xi\), which completes the proof.
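To see the construction in action, here is a Python sketch under some simplifying assumptions of my own: the space is \(\ell^2\) truncated to `N` real coordinates, \(\xi = (1/2, 0, 0, \dots)\), the orthonormal sequence is \(\eta_n = e_n\) for \(n \ge 1\) (zero-indexed, so each is orthogonal to \(\xi\)), and the test functional is the inner product against \(y = (1, 1/2, 1/3, \dots)\). A finite truncation can only show the first `N` terms of the sequence, so this is an illustration and not a proof.

```python
import math

N = 2000  # truncation dimension (an assumption of this sketch)

def inner(u, v):
    """Real inner product on R^N."""
    return sum(a * b for a, b in zip(u, v))

def unit(n):
    """n-th standard basis vector of R^N, zero-indexed."""
    return [1.0 if k == n else 0.0 for k in range(N)]

xi = [0.5] + [0.0] * (N - 1)           # a point strictly inside the unit ball
lam = math.sqrt(1.0 - inner(xi, xi))   # lambda = sqrt(1 - ||xi||^2)
y = [1.0 / (k + 1) for k in range(N)]  # represents one fixed functional

rows = []
for n in (1, 10, 100, 1000):
    xi_n = [a + lam * b for a, b in zip(xi, unit(n))]  # xi + lambda * eta_n
    norm_xi_n = math.sqrt(inner(xi_n, xi_n))           # should be 1
    weak_gap = abs(inner(xi_n, y) - inner(xi, y))      # |f(xi_n) - f(xi)|
    strong_gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi_n, xi)))
    rows.append((n, norm_xi_n, weak_gap, strong_gap))
    print(n, norm_xi_n, weak_gap, strong_gap)
```

Every \(\xi_n\) sits on the unit sphere and the norm distance \(\|\xi_n - \xi\|\) stays fixed at \(\lambda\), while the value of the functional on \(\xi_n\) approaches its value on \(\xi\). That is the sense in which the sphere gets close to interior points.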


[1] By “getting along”, we mean that we can add elements of the set (called vectors) and the addition behaves like addition should (it is associative and commutative), we can multiply elements of the set by elements of the field (scalars) and the multiplication of scalars behaves like multiplication should (it is also associative and commutative), and the scalar multiplication distributes over the vector addition. There are also some other technical requirements, such as the need for an identity element in the space. The easiest non-trivial example is the space of vectors in \(\mathbb{R}^2\). These can be written as tuples \((x,y)\) with \(x,y\in\mathbb{R}\). Vector addition is component-wise, and scalar multiplication works like \(\alpha(x,y) = (\alpha x, \alpha y)\).
[2] Once again, \(\mathbb{R}^2\) provides a good example, with the inner product given by \(\langle (x_1,y_1), (x_2,y_2)\rangle = x_1x_2 + y_1y_2\). The usual notion of orthogonal vectors is recovered in this space.
[3] That is, \(\langle \eta_\alpha,\eta_\beta\rangle = 0\) for any \(\alpha\ne\beta\) and \(\|\eta_\alpha\| = 1\) for all \(\alpha\).
[4] That such a sequence exists is nonobvious, but follows from the fact that the span of \(\xi\) is a closed subspace of \(\mathcal{H}\) of dimension at most one, and so has an orthogonal complement of infinite dimension. This orthogonal complement is also a closed subspace of \(\mathcal{H}\), and so has a (necessarily infinite) orthonormal basis. We can select an orthonormal sequence from that basis.