**Theorem:** Given any two angles \(\alpha, \beta\), we have the identities \[\cos(\alpha + \beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta) \quad\text{and}\quad \sin(\alpha + \beta) = \cos(\alpha)\sin(\beta) + \sin(\alpha)\cos(\beta).\]

*Proof:* It is possible to obtain angle addition formulæ for both sines and cosines with one picture. The argument is as follows: first, we need two angles to add together, say \(\alpha\) and \(\beta\). We are going to draw them so that \(\alpha\), \(\beta\), and \(\alpha+\beta\) all live in quadrant I, though the argument can be generalized to other quadrants without too much additional work (exercise: do this).

Note that the point \(B\) is the point on the unit circle where the ray emanating from the origin with angle \(\alpha+\beta\) intersects, hence \[B = (\cos(\alpha+\beta),\sin(\alpha+\beta));\] we’ll come back to this later. Next, we can construct a line through \(B\) that is perpendicular to the ray emanating from the origin with angle \(\alpha\)—call the intersection \(C\).

Similarly, construct a segment perpendicular to the \(x\)-axis through \(C\), and a segment perpendicular to the \(y\)-axis through \(B\)—these are the points \(D\) and \(E\), respectively.

We finish the figure by extending the segments \(EB\) and \(DC\) until they intersect at a point \(F\). Note that we now have a rectangle \(ODFE\).

The line \(OB\) is a transversal to the parallel lines \(EF\) and \(OD\) (since these are opposite sides of a rectangle), thus the alternate interior angles agree: \(m\angle EBO = m\angle BOD = \alpha+\beta\). Moreover, since \[m\angle FCB + m\angle BCO + m\angle OCD = \pi,\] with \(m\angle BCO = \pi/2\) and \(m\angle OCD = \pi/2 - \alpha\), it follows that \(m\angle FCB = \alpha\). These are labeled in the next figure.

Recall that \(B = (\cos(\alpha+\beta),\sin(\alpha+\beta))\). This gives us the lengths of the segments \(OE\) and \(EB\): namely, \(OE = \sin(\alpha+\beta)\) and \(EB = \cos(\alpha+\beta)\).

Via some right triangle trig identities on \(\triangle OCB\) (in particular, the identities given by the mnemonic “SOH CAH TOA”), we have that \(OC = \cos(\beta)\) and \(BC = \sin(\beta)\).

Using the same trig identities, this time applied to \(\triangle ODC\) (whose hypotenuse is \(OC = \cos(\beta)\)), we have that \(OD = \cos(\alpha)\cos(\beta)\), and \(CD = \sin(\alpha)\cos(\beta)\).

Finally, repeating the same argument on \(\triangle BFC\), we have \(BF = \sin(\alpha)\sin(\beta)\) and \(CF = \cos(\alpha)\sin(\beta)\).

But this figure essentially completes the proof. Since opposite sides of a rectangle have the same length, the vertical sides give us \[\sin(\alpha+\beta) = \cos(\alpha)\sin(\beta) + \sin(\alpha)\cos(\beta),\] and the horizontal sides render \[\cos(\alpha+\beta) + \sin(\alpha)\sin(\beta) = \cos(\alpha)\cos(\beta).\] Note that, after moving the \(\sin(\alpha)\sin(\beta)\) term of the second identity to the right-hand side, these are the proposed angle addition formulæ.

⬛
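As a numerical sanity check on the figure, the segment lengths can be computed for a sample pair of angles and the opposite sides of the rectangle compared. This is only a minimal sketch in Python; the function name `rectangle_sides` and the sample angles are illustrative choices, not part of the proof.

```python
import math

def rectangle_sides(alpha, beta):
    """Return (OE, DC + CF, EB + BF, OD) for first-quadrant angles."""
    # Triangle OCB has hypotenuse OB = 1 and angle beta at O.
    oc, bc = math.cos(beta), math.sin(beta)
    # Triangle ODC has hypotenuse OC and angle alpha at O.
    od, cd = oc * math.cos(alpha), oc * math.sin(alpha)
    # Triangle BFC has hypotenuse BC and angle alpha at C.
    cf, bf = bc * math.cos(alpha), bc * math.sin(alpha)
    # B = (cos(alpha + beta), sin(alpha + beta)) gives EB and OE.
    eb, oe = math.cos(alpha + beta), math.sin(alpha + beta)
    return oe, cd + cf, eb + bf, od

left, right, top, bottom = rectangle_sides(0.4, 0.7)
# Opposite sides of the rectangle ODFE should agree up to rounding.
print(math.isclose(left, right), math.isclose(top, bottom))
```

The two comparisons correspond to the vertical and horizontal sides of the rectangle, respectively.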

**Theorem:** Suppose that an arbitrary triangle has sides of length \(a\), \(b\), and \(c\), with the angle \(\theta\) opposite the side \(c\). Then \[c^2 = a^2 + b^2 - 2ab\cos(\theta).\] Note that if \(\theta\) is a right angle, then \(c\) is the hypotenuse of a right triangle and \(\cos(\theta) = 0\), giving the Pythagorean Theorem. Hence we may regard the Law of Cosines as a generalization of the Pythagorean Theorem with some kind of correction term.

*Proof:* We may assume that the triangle is oriented such that \(\theta\) is at the origin and one of the two “legs” is along the \(x\)-axis in the Cartesian plane. Labeling the vertices such that vertex \(A\) is opposite the side with length \(a\), \(B\) is opposite \(b\), and \(C\) is opposite \(c\), we obtain the figure shown below.

Since the point \(A\) is along the \(x\)-axis, the \(y\)-coordinate of \(A\) is zero, and since the distance from \(A\) to the origin (\(C\)) is the length of the corresponding leg of the triangle, we have that the \(x\)-coordinate of \(A\) is \(b\). That is, \[A = (b,0).\] To determine the coordinates of \(B\), note that the ray from \(C\) through \(B\) must intersect the unit circle at some point \(B'\), as shown below.

From the definitions of sine and cosine on the unit circle, we know that the coordinates of \(B'\) are given by \(B' = (\cos(\theta),\sin(\theta))\). Then, as \(B\) lies on the same ray from the origin as \(B'\), at distance \(a\) rather than \(1\), scaling by \(a\) gives \[B = (a\cos(\theta),a\sin(\theta)).\]

On the one hand, we know that \(d(A,B) = c\) (that is, the distance from \(A\) to \(B\) is \(c\) units). On the other hand, the distance formula tells us that \[d(A,B) = \sqrt{(a\cos(\theta)-b)^2 + (a\sin(\theta))^2} = \sqrt{a^2(\cos(\theta)^2 + \sin(\theta)^2) - 2ab\cos(\theta) + b^2}.\] Applying the Pythagorean Identity \(\cos(\theta)^2 + \sin(\theta)^2 = 1\) and squaring, we obtain \[c^2 = d(A,B)^2 = a^2 - 2ab\cos(\theta) + b^2,\] which is the desired result.

⬛
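The coordinate argument is easy to check numerically. The following is a small sketch, assuming the placement used in the proof (\(C\) at the origin, \(A = (b,0)\)); the function name `third_side` and the sample side lengths are made up for illustration.

```python
import math

def third_side(a, b, theta):
    """Distance from A = (b, 0) to B = (a cos(theta), a sin(theta))."""
    ax, ay = b, 0.0
    bx, by = a * math.cos(theta), a * math.sin(theta)
    return math.hypot(bx - ax, by - ay)

a, b, theta = 3.0, 4.0, math.pi / 3
c = third_side(a, b, theta)
# Compare the squared distance with the right-hand side of the theorem.
print(math.isclose(c**2, a**2 + b**2 - 2 * a * b * math.cos(theta)))
# With theta = pi/2 the correction term vanishes: the 3-4-5 right triangle.
print(math.isclose(third_side(3.0, 4.0, math.pi / 2), 5.0))
```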

I’m not entirely sure how one loaf of bread can consist of only 4.5 servings when a single serving is one ninth of a loaf. On the other hand, fractions are hard, I guess. :\

**Exercise:** Let \(\mathcal{H}\) be an infinite dimensional Hilbert space. Show that the unit sphere \(S := \{x\in\mathcal{H} : \|x\| = 1\}\) is weakly dense in the unit ball \(B := \{x\in\mathcal{H} : \|x\| \le 1\}\).

I find this to be a really surprising and counter-intuitive result. What this exercise asks us to prove is that every point inside of the unit ball in an infinite dimensional Hilbert space is, in some sense, arbitrarily close to the boundary of that ball. This is really striking to me—how can the center of a ball be really close to the boundary of that ball? Because this result is so unexpected, I think it is worth understanding. Indeed, the arguments presented below have helped me to build some better intuition about both infinite dimensional Hilbert spaces and the weak topology.

Since I am preparing for a qual, all of the following is (a) from memory, and (b) not meant to be entirely rigorous. Any errors, of which there are likely a few, are (obviously) my own.

**Definitions:** A *vector space \(V\)* over a field \(K\) (usually, we understand \(K\) to be either the real or complex numbers) is a set equipped with two algebraic operations—addition and scalar multiplication—that “get along” nicely.^{[1]}

A *normed vector space* is a vector space \(V\) equipped with a norm \(\|\cdot\| : V\to[0,\infty)\) which satisfies the properties \(\|v\| = 0\) if and only if \(v = 0\), \(\|\alpha v\| = |\alpha|\|v\|\) for all vectors \(v\) and scalars \(\alpha\), and the triangle inequality: \(\|u+v\| \le \|u\| + \|v\|\). A *Banach space* is a normed vector space that is complete with respect to its norm, in the sense that if \((v_n)\) is a sequence of vectors such that \(\|v_m-v_n\|\) can be made arbitrarily small by choosing \(m\) and \(n\) large enough, then there is some \(v\in V\) such that \(\|v_n - v\|\) can be made arbitrarily small by choosing \(n\) large enough (i.e. Cauchy sequences in the space converge in the space).

An *inner product* on a vector space \(V\) is a function \(\langle\cdot,\cdot\rangle : V\times V\to K\) such that if \(u,v,w\in V\) and \(\alpha\in K\), then \(\langle u+v,w\rangle = \langle u,w\rangle + \langle v,w\rangle\), \(\langle \alpha u,v\rangle = \alpha\langle u,v\rangle\), \(\langle u,v\rangle = \overline{\langle v,u\rangle}\), and \(\langle v,v\rangle \ge 0\) with equality if and only if \(v = 0\). If \(u,v\in V\) with \(\langle u,v\rangle = 0\), then we say that \(u\) and \(v\) are *orthogonal*.^{[2]} A vector space \(V\) equipped with an inner product is called an *inner product space*. Note that the inner product induces a norm as follows: \(\|v\| = \sqrt{\langle v,v\rangle}\). If \(V\) is a Banach space under this norm, then we call \(V\) a *Hilbert space*.

A *linear functional* on a vector space \(V\) over \(K\) is a function \(f : V\to K\) such that for any \(u,v\in V\) and \(\alpha \in K\), we have \(f(u+v) = f(u)+f(v)\) and \(f(\alpha u) = \alpha f(u)\). If \(V\) is a normed vector space, the (operator) norm of a functional is given by \(\|f\|_{op} := \sup_{\|v\|=1} |f(v)|\). If this supremum is finite, we say that \(f\) is a *bounded linear functional*. The collection of all bounded linear functionals on \(V\) is called the *dual space* of \(V\), and is usually denoted \(V^*\).

We now require a more general version of the Pythagorean theorem, and a couple of useful inequalities:

**Theorem (Pythagorean Theorem):** *Let \(\mathcal{H}\) be a Hilbert space, and suppose that \(\{\xi_i\}_{i=1}^{n}\subset\mathcal{H}\) is a collection of mutually orthogonal vectors (that is, \(\langle \xi_i,\xi_j\rangle = 0\) whenever \(j\ne i\)). Then \[\left\|\sum_{i=1}^{n} \xi_i\right\|^2 = \sum_{i=1}^{n} \|\xi_i\|^2.\]*

*Proof:* The proof is a computation: \[ \left\|\sum_{i=1}^{n} \xi_i\right\|^2 = \left\langle \sum_{i=1}^{n} \xi_i,\sum_{i=1}^{n} \xi_i\right\rangle = \sum_{i,j=1}^{n} \langle \xi_i,\xi_j\rangle.\] Since the vectors are assumed to be mutually orthogonal, \(\langle\xi_i,\xi_j\rangle\) is zero when \(i\ne j\), and so the final sum becomes \[ \sum_{i,j=1}^{n} \langle \xi_i,\xi_j\rangle = \sum_{i=1}^{n} \langle\xi_i,\xi_i\rangle = \sum_{i=1}^{n} \|\xi_i\|^2, \] which completes the proof.

⬛

**Theorem (Cauchy-Schwarz Inequality):** *For any \(\xi\) and \(\eta\) in a Hilbert space, \(|\langle \xi,\eta\rangle| \le \|\xi\|\|\eta\|\), with equality if and only if \(\xi\) and \(\eta\) are linearly dependent (i.e. there is some scalar \(\lambda\) such that \(\xi = \lambda\eta\)).*

*Proof:* If \(\langle \xi,\eta\rangle = 0\), there is nothing to prove. Otherwise, let \[ \lambda = \text{sgn}\langle \xi,\eta\rangle = \frac{\langle \xi,\eta\rangle}{|\langle \xi,\eta\rangle|}, \] and set \(\zeta = \lambda\eta\). Note that \(|\lambda| = 1\), and so \(\|\eta\| = \|\zeta\|\), and that \[ \langle \xi,\zeta \rangle = \overline{\lambda} \langle \xi,\eta\rangle = \frac{\langle \eta,\xi\rangle}{|\langle \xi,\eta\rangle|}\langle \xi,\eta\rangle = |\langle \xi,\eta\rangle| = \frac{\langle \xi,\eta\rangle}{|\langle \xi,\eta\rangle|}\langle \eta,\xi\rangle = \lambda\langle\eta,\xi\rangle = \langle\zeta,\xi\rangle.\] From this, it follows that for any real number \(t\), we have \[0 \le \|\xi - t\zeta\|^2 = \langle \xi-t\zeta,\xi-t\zeta\rangle = \langle \xi,\xi\rangle - 2t|\langle \xi,\eta\rangle| + t^2\langle \zeta,\zeta\rangle = \|\xi\|^2 - 2t|\langle \xi,\eta\rangle| + t^2\|\eta\|^2.\] This is a quadratic function in \(t\), which we seek to minimize. From the first derivative test, we see that the vertex of the described parabola occurs when \(t = |\langle\xi,\eta\rangle|\|\eta\|^{-2},\) and we see from the second derivative test that this point must be the global minimum. Substituting this value of \(t\) into the previous inequality, we obtain \[0 \le \|\xi\|^2 - 2t|\langle \xi,\eta\rangle| + t^2\|\eta\|^2 = \|\xi\|^2 - 2|\langle\xi,\eta\rangle|^2\|\eta\|^{-2} + |\langle\xi,\eta\rangle|^2\|\eta\|^{-2} = \|\xi\|^2 - |\langle\xi,\eta\rangle|^2\|\eta\|^{-2}. \] Multiplying through by \(\|\eta\|^{2}\), which is nonzero by the hypothesis that \(\langle \xi,\eta\rangle \ne 0\), we obtain \[ 0 \le \|\xi\|^2\|\eta\|^2 - |\langle\xi,\eta\rangle|^2,\] which is the desired result. Note also that equality holds if and only if \(0 = \|\xi-t\zeta\| = \|\xi - \lambda t \eta\|\) for this value of \(t\), in which case \(\xi\) and \(\eta\) are linearly dependent.

⬛
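The minimization step can be checked on a concrete pair of vectors. The sketch below works in \(\mathbb{C}^3\) with an inner product that is linear in the first slot, matching the convention above; the helper names `inner`, `norm`, and `quadratic`, and the sample vectors, are assumptions made for illustration.

```python
import math

def inner(u, v):
    """Inner product on C^n: linear in the first slot, conjugate-linear in the second."""
    return sum(x * y.conjugate() for x, y in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

def quadratic(xi, eta, t):
    """The function t -> ||xi - t*zeta||^2 from the proof, with zeta = lambda*eta."""
    ip = inner(xi, eta)
    lam = ip / abs(ip)  # assumes <xi, eta> != 0
    return norm([x - t * lam * y for x, y in zip(xi, eta)]) ** 2

xi = [1 + 2j, 3j, -1 + 0j]
eta = [2 + 0j, 1 - 1j, 1j]

ip = abs(inner(xi, eta))
t_min = ip / norm(eta) ** 2  # vertex of the parabola
print(ip <= norm(xi) * norm(eta))
# The minimum value should equal ||xi||^2 - |<xi, eta>|^2 / ||eta||^2.
print(math.isclose(quadratic(xi, eta, t_min),
                   norm(xi) ** 2 - ip ** 2 / norm(eta) ** 2))
```

For these sample vectors, \(\langle\xi,\eta\rangle = -1 + 8i\), \(\|\xi\|^2 = 15\), and \(\|\eta\|^2 = 7\), so the minimum of the quadratic is \(15 - 65/7 = 40/7\).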

**Theorem (Bessel’s Inequality):** *Let \(\{\eta_\alpha\}_{\alpha\in A}\) be an orthonormal set ^{[3]} in a Hilbert space \(\mathcal{H}\). Then for any \(\xi \in \mathcal{H}\), \[ \sum_{\alpha\in A} |\langle \xi,\eta_\alpha\rangle|^2 \le \|\xi\|^2. \]*

*Proof:* It is sufficient to show that the result holds for any finite \(F\subseteq A\). For any such set, we have \[\begin{align*} 0
&\le \left\| \xi - \sum_{\alpha\in F}\langle \xi,\eta_{\alpha}\rangle \eta_{\alpha}\right\|^2 \\
&= \left\langle \xi - \sum_{\alpha\in F}\langle \xi,\eta_{\alpha}\rangle\eta_{\alpha}, \xi - \sum_{\alpha\in F}\langle \xi,\eta_{\alpha}\rangle\eta_{\alpha} \right\rangle \\
&= \|\xi\|^2 - 2\Re\left\langle \xi,\sum_{\alpha\in F}\langle \xi,\eta_{\alpha}\rangle \eta_{\alpha}\right\rangle+ \left\|\sum_{\alpha\in F} \langle\xi,\eta_{\alpha}\rangle\eta_{\alpha} \right\|^2 \\
&= \|\xi\|^2 - 2 \sum_{\alpha\in F} |\langle \xi,\eta_{\alpha}\rangle|^2 + \sum_{\alpha\in F} |\langle \xi,\eta_{\alpha}\rangle|^2,
\end{align*}\]

where the middle term results from the conjugate linearity of the inner product, and the third term follows from the Pythagorean theorem (since the \(\eta_{\alpha}\) are all mutually orthogonal). But then we have \[ 0 \le \|\xi\|^2 - \sum_{\alpha\in F} |\langle \xi,\eta_{\alpha}\rangle|^2, \] which is exactly the desired result.

⬛
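A finite-dimensional instance makes the inequality concrete: the gap between the two sides is exactly the squared norm of what is left after projecting onto the orthonormal set. The following sketch uses \(\mathbb{R}^3\) with the standard inner product; the function name `bessel_gap` and the sample vectors are illustrative assumptions.

```python
import math

def inner(u, v):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))

def bessel_gap(xi, etas):
    """Return (sum of <xi, eta>^2, ||xi - projection||^2) for orthonormal etas."""
    coeffs = [inner(xi, eta) for eta in etas]
    proj = [sum(c * eta[i] for c, eta in zip(coeffs, etas))
            for i in range(len(xi))]
    residual = [x - p for x, p in zip(xi, proj)]
    return sum(c * c for c in coeffs), inner(residual, residual)

s = 1 / math.sqrt(2)
etas = [(s, s, 0.0), (s, -s, 0.0)]  # an orthonormal pair in R^3
xi = (1.0, 2.0, 3.0)

coeff_sum, residual_sq = bessel_gap(xi, etas)
print(coeff_sum <= inner(xi, xi))
# ||xi||^2 splits into the coefficient sum plus the squared residual.
print(math.isclose(inner(xi, xi), coeff_sum + residual_sq))
```

Here the coefficient sum is \(5\) and \(\|\xi\|^2 = 14\); the missing \(9\) is the square of the component of \(\xi\) along the third coordinate axis, which the orthonormal pair does not see.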

There is one other very powerful result that we are going to need:

**Theorem (Riesz Representation Theorem):** *If \(\mathcal{H}\) is a Hilbert space and \(f\in\mathcal{H}^*\), then there is a unique element \(\eta\in\mathcal{H}\) such that for all \(\xi\in\mathcal{H}\) we have \(f(\xi) = \langle \xi,\eta\rangle\). Moreover, the map \(f\mapsto\eta\) is an isometry, i.e. \(\|f\|_{op} = \|\eta\|\).*

Intuitively, this result tells us that Hilbert spaces are self-dual, and that we may simply assume that the space of bounded linear functionals on a Hilbert space is, in fact, that Hilbert space. The proof relies on a somewhat technical lemma stating that a Hilbert space can be decomposed into mutually orthogonal subspaces, and is quite beyond the scope of what seems reasonable for a qualifying exam, so I won’t give it here.

It should also be noted that the Gram-Schmidt process from undergraduate linear algebra generalizes quite nicely to Hilbert spaces. Given any sequence in a Hilbert space, there is a technique for obtaining an orthonormal sequence with the same “span” (where “span” has a suitably generalized meaning in infinite dimensional Hilbert spaces). While the most general case relies on the axiom of choice, the punchline is that Hilbert spaces contain orthonormal sets having the same cardinality as the dimension of the space.

Finally, with one more definition, we can get to the main result:

*Definition:* Let \(\{\xi_n\}\) be a sequence in a Hilbert space \(\mathcal{H}\). We say that \(\xi_n\) *converges to \(\xi\) weakly* in \(\mathcal{H}\) if for every \(f\in\mathcal{H}^*\) the sequence of scalars \(\{f(\xi_n)\}\) converges to \(f(\xi)\). We denote this by \(\xi_n \rightharpoonup \xi\).

Recall the goal:

**Exercise:** Let \(\mathcal{H}\) be an infinite dimensional Hilbert space. Show that the unit sphere \(S := \{x\in\mathcal{H} : \|x\| = 1\}\) is weakly dense in the unit ball \(B := \{x\in\mathcal{H} : \|x\| \le 1\}\).

We first prove the following lemma:

**Lemma:** *Every orthonormal sequence in an infinite dimensional Hilbert space converges weakly to zero.*

*Proof:* Let \(\{\eta_n\}\) be an orthonormal sequence in an infinite dimensional Hilbert space \(\mathcal{H}\), and let \(f\in\mathcal{H}^*\) be arbitrary. By the Riesz representation theorem, there exists some \(\xi\in\mathcal{H}\) such that \(f(\eta_n) = \langle \eta_n,\xi\rangle\) for all \(n\), so that \(|f(\eta_n)| = |\langle \xi,\eta_n\rangle|\). By Bessel’s inequality, we have \[ \sum_{n=1}^{\infty} |\langle \xi,\eta_n\rangle|^2 \le \|\xi\|^2. \] Since \(\|\xi\|^2 < \infty\), the sum converges, and the terms of a convergent series tend to zero. In particular, this implies that \(f(\eta_n) \to 0\) as \(n\to \infty\). But this is exactly the desired result.

⬛

Now we can finally give a solution to the exercise:

*Proof:* Let \(\xi\in B\). We want to show that there is a sequence of elements \(\{\xi_n\}\subseteq S\) that converges weakly to \(\xi\). Let \(\{\eta_n\}\) be an orthonormal sequence in \(\{\xi\}^{\perp}\).^{[4]} For each \(n\), define \[ \xi_n := \lambda \eta_n + \xi, \] where \(\lambda\) is chosen so that \(\|\xi_n\| = 1\). Assuming that such a choice is possible, we would obtain from the Pythagorean theorem that \[ 1 = \|\xi_n\|^2 = \lambda^2\|\eta_n\|^2 + \|\xi\|^2 = \lambda^2 + \|\xi\|^2. \] Then \[\lambda = \sqrt{1-\|\xi\|^2}.\] Since \(\|\xi\|^2 \le 1\), it is clear that \(\lambda\) is a nonnegative real number, hence such a choice of \(\lambda\) is possible.

Because I think it is a little unclear, let me reiterate what I have done: I now have a sequence \(\{\xi_n\}\) which lives on the unit sphere, where each \(\xi_n = \xi + \lambda\eta_n\) and \(\lambda\) is a fixed constant. We claim that \(\xi_n \rightharpoonup \xi\).

To see this, note that for any \(f\in\mathcal{H}^*\), we have \[ f(\xi_n) = f(\xi + \lambda\eta_n) = f(\xi) + \lambda f(\eta_n). \] By the lemma, \(f(\eta_n)\to 0\) as \(n\to \infty\), and so we have \(f(\xi_n) \to f(\xi)\) as \(n\to \infty\). But then \(\xi_n\rightharpoonup \xi\), which completes the proof.

⬛
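To get a feel for the construction, here is a rough numerical sketch in the sequence space \(\ell^2\). It assumes several things that are not in the argument above: vectors are modeled as finitely supported real sequences stored in dictionaries, the orthonormal sequence is taken to be the standard basis vectors \(e_n\) lying outside the support of \(\xi\), and the test functional is the inner product against a truncation of \(y = (1/k)_{k\ge 1}\). The names `inner` and `sphere_point` are hypothetical.

```python
import math

def inner(u, v):
    """Inner product of two finitely supported real sequences stored as dicts."""
    return sum(x * v.get(k, 0.0) for k, x in u.items())

def sphere_point(xi, n):
    """Return xi + lam * e_n, where e_n is a basis vector outside the support of xi."""
    assert n not in xi, "e_n must be orthogonal to xi"
    lam = math.sqrt(1.0 - inner(xi, xi))
    shifted = dict(xi)
    shifted[n] = lam
    return shifted

xi = {1: 0.3, 2: -0.4}                     # ||xi|| = 0.5, inside the unit ball
y = {k: 1.0 / k for k in range(1, 10001)}  # truncation of (1/k), an element of l^2

for n in (10, 100, 1000, 10000):
    xi_n = sphere_point(xi, n)
    # Columns: n, the norm of xi_n, and <xi_n, y> - <xi, y>.
    print(n, round(math.sqrt(inner(xi_n, xi_n)), 12), inner(xi_n, y) - inner(xi, y))
```

Every \(\xi_n\) has norm \(1\), yet the last column equals \(\lambda/n\) and shrinks toward zero: the functional cannot tell \(\xi_n\) from \(\xi\) in the limit, even though \(\|\xi_n - \xi\| = \lambda\) for every \(n\).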

- [1] By “getting along”, we mean that we can add elements of the set (called vectors) and the addition behaves like addition should (it is associative and commutative), we can multiply elements of the set by elements of the field (scalars) and the multiplication of scalars behaves like multiplication should (it is also associative and commutative), and the scalar multiplication distributes over the vector addition. There are also some other technical requirements, such as the need for an identity element in the space. The easiest non-trivial example is the space of vectors in \(\mathbb{R}^2\). These can be written as tuples \((x,y)\) with \(x,y\in\mathbb{R}\). Vector addition is component-wise, and scalar multiplication works like \(\alpha(x,y) = (\alpha x, \alpha y)\).
- [2] Once again, \(\mathbb{R}^2\) provides a good example, with the inner product given by \(\langle (x_1,y_1), (x_2,y_2)\rangle = x_1x_2 + y_1y_2\). The usual notion of orthogonal vectors is recovered in this space.
- [3] That is, \(\langle \eta_\alpha,\eta_\beta\rangle = 0\) for any \(\alpha\ne\beta\) and \(\|\eta_\alpha\| = 1\) for all \(\alpha\).
- [4] That such a sequence exists is nonobvious, but follows from the fact that the span of \(\xi\) is a closed subspace of \(\mathcal{H}\) and so has orthogonal complement of infinite dimension. This orthogonal complement is also a closed subspace of \(\mathcal{H}\), and so has a (possibly infinite) orthonormal basis. We can select an orthonormal sequence from that basis.

Many examples of continuous functions that are not differentiable spring to mind immediately: the absolute value function is not differentiable at zero; a sawtooth wave is not differentiable anywhere that it changes direction; the Cantor function^{[1]} is an example of a continuous function that is not differentiable on an uncountable set, though it does remain differentiable “almost everywhere.”

The goal is to show that there exist functions that are continuous, but that are nowhere differentiable. In fact, what we actually show is that the collection of such functions is, in some sense, quite large and that the set of functions that are differentiable—even if only at a single point—is quite small. First, we need a definition and an important result.

**Definition:** Let \(X\) be a complete metric space,^{[2]} and let \(M \subseteq X\). Then \(M\) is said to be

- *nowhere dense* in \(X\) if the closure of \(M\) has empty interior;^{[3]}
- *meager* in \(X\) if it is the union of countably many nowhere dense sets; and
- *residual* in \(X\) if it is not meager (for instance, if it is the complement of a meager set).

This definition provides a topological notion of what it means for a subset of a metric space to be “small.” Nowhere dense sets are tiny—almost insignificant—subsets, while residual sets are quite large (relative to the ambient space). It is also worth noting that meager sets were originally called “sets of the first category,” and residual sets were originally called “sets of the second category,” leading to the name of the following theorem:

**Baire’s Category Theorem:** *If a metric space \(X\ne\emptyset\) is complete, then it is residual in itself.*

*Proof:* Suppose for contradiction that \(X\) is meager in itself. Then we may write \[X = \bigcup_{k=1}^{\infty} M_k,\] where each \(M_k\) is nowhere dense. As \(M_1\) is nowhere dense in \(X\), its closure has empty interior, and therefore contains no nonempty open sets. But \(X\) does contain at least one nonempty open set, namely \(X\) itself. Hence \(\overline{M}_1\ne X\) and so, since \(\overline{M}_1\) is closed, its complement is both open and nonempty. Choose some \(p_1\in X\setminus \overline{M}_1\). Since \(X\setminus \overline{M}_1\) is open, there exists some \(\varepsilon_1 \in (0,\frac{1}{2})\) such that \(B(p_1,\varepsilon_1)\subseteq X\setminus \overline{M}_1\). Let \(B_1 := B(p_1,\varepsilon_1)\).

Now consider \(\overline{M}_2\). It also has empty interior, and so it contains no open balls. In particular, it does not contain \(B_1\), so \(B_1\setminus\overline{M}_2\) is nonempty and open. Let \(p_2 \in B_1\setminus\overline{M}_2\) and choose \(\varepsilon_2 < \frac{1}{2}\varepsilon_1\) such that \(B_2 := B(p_2,\varepsilon_2) \subseteq B_1\setminus\overline{M}_2\).

Continue this process by induction. That is, for each \(k\in\mathbb{N}\), choose \(B_{k+1}\) to be an open ball of radius \(\varepsilon_{k+1} < \frac{1}{2}\varepsilon_k\) centered at some \(p_{k+1}\), small enough that its closure satisfies \(\overline{B}_{k+1} \subseteq B_k \setminus \overline{M}_{k+1}\). By this construction we have \(\varepsilon_k < 2^{-k}\) for each \(k\). In particular we have from the triangle inequality that \(d(p_m,p_n) \le 2^{-k}\) for all \(m,n > k\), as \(B_m,B_n\subseteq B_{k+1}\). Hence the sequence of points \((p_k)\) is Cauchy in \(X\). Since \(X\) is complete, there exists some \(p\in X\) such that \(p_k\to p\). For any \(k\), the points \(p_m\) with \(m > k\) all lie in \(B_{k+1}\), so \(p\in\overline{B}_{k+1}\subseteq B_k\), which implies that \(p\not\in M_k\) for all \(k\). Hence we have found a point \(p\in X\) such that \(p\not\in \bigcup M_k = X\), which is a contradiction.

⬛

We now get to the main result, which has appeared on qualifying exams a few times in the past:

**Exercise:** Use Baire’s category theorem to prove the existence of continuous, nowhere differentiable functions on the unit interval.

*Solution:* For each natural number \(n\), define the set \[ E_n := \{ f\in C([0,1]) : \exists x_0\in[0,1] \text{ s.t. } |f(x)-f(x_0)| \le n|x-x_0| \ \forall x\in[0,1]\}. \] This is rather a lot of notation. Let’s try to unpack it just a bit: the derivative of \(f\) is defined by the limit \[ f'(x_0) = \lim_{x\to x_0} \frac{f(x)-f(x_0)}{x-x_0}. \] If this limit exists, then near \(x_0\) the difference quotient must be bounded by some number, say \(L_1\). Away from \(x_0\), say for \(|x-x_0|\ge\delta\), the boundedness of the continuous function \(f\) on \([0,1]\) ensures that the difference quotient is bounded by some \(L_2\) (for instance, \(L_2 = 2\|f\|_u/\delta\)). Taking the larger of these two bounds, we have that if \(f\) is differentiable at some point \(x_0\), then \[ \frac{|f(x)-f(x_0)|}{|x-x_0|} \le \max\{L_1,L_2\} \quad\text{for all } x\ne x_0. \] Thus if a function \(f\) is differentiable at any point \(x_0\in[0,1]\), then \(f\) lives in \(E_n\) for every \(n \ge \max\{L_1,L_2\}\).

What we want to show is that each \(E_n\) is nowhere dense in the space of continuous functions \(C[0,1]\) (which is a normed vector space with respect to the uniform norm). Since all of the differentiable functions are contained in \(\bigcup_n E_n\), it would then follow that the set of differentiable functions is contained in a meager subset of \(C[0,1]\). From Baire’s category theorem, we could then conclude that nowhere differentiable functions exist and, indeed, that there is a residual set of nowhere differentiable functions.

To show that each \(E_n\) is nowhere dense, we have to show that the closure of each \(E_n\) has empty interior or, equivalently, if we have an arbitrary function \(f\) in \(\overline{E}_n\), we need to find a continuous function that is “close” to \(f\) in the uniform norm, but which is not contained in \(\overline{E}_n\).

So, what is \(\overline{E}_n\)? We claim that it is just \(E_n\). To see this, suppose that \(\{f_k\}\) is a Cauchy sequence in \(E_n\). First, note that \(C[0,1]\) is complete, hence there is some \(f\in C[0,1]\) such that \(f_k\to f\). Now, since each \(f_k\in E_n\), for each \(k\) there is some \(x_k\in[0,1]\) such that \[ |f_k(x)-f_k(x_k)| \le n|x-x_k| \quad\forall x\in[0,1]. \] The sequence of numbers \(x_k\) is a sequence in \([0,1]\). By the Bolzano-Weierstrass theorem, this sequence has a convergent subsequence, say \(x_{k_j} \to x_0\in[0,1]\). By the uniform convergence of \(f_k\) to \(f\) and the continuity of \(f\), we have \(f_{k_j}(x)\to f(x)\) and \(f_{k_j}(x_{k_j})\to f(x_0)\), and hence for every \(x\in[0,1]\), \[ |f(x)-f(x_0)| = \lim_{j\to\infty} |f_{k_j}(x)-f_{k_j}(x_{k_j})| \le \lim_{j\to\infty} n|x-x_{k_j}| = n|x-x_0|. \] Therefore \(f \in E_n\), and so Cauchy sequences in \(E_n\) converge in \(E_n\), which shows that \(E_n\) is closed.

Now, given a function \(f\in E_n\) how do we find a function \(g\) that is “close” to \(f\), but not in \(E_n\)? That is actually somewhat delicate, and is dealt with in the following lemma:

**Lemma:** *Given \(f\in C[0,1]\), \(n\in\mathbb{N}\), and \(\varepsilon > 0\), there exists a piecewise linear function \(g\) with only finitely many linear pieces such that each linear piece has slope \(\pm 2n\) and \(\|g-f\|_{u} < \varepsilon\).*

*Proof:* Since \(f\) is uniformly continuous, there exists a \(\delta\) such that for any \(x,y\in[0,1]\), if \(|x-y|< \delta\), then \(|f(x)-f(y)|<\varepsilon/2\). Choose \(m\in\mathbb{N}\) such that \(m > 1/\delta\). On the interval \([0,1/m]\), define \(g\) as follows: define the first linear piece of \(g\) by setting \(g(0) = f(0)\) and giving it slope \(2n\) on the interval \([0,\varepsilon/2n]\). On the interval \([\varepsilon/2n, 2\varepsilon/2n]\), let \(g\) have slope \(-2n\). Continue in this manner until the linear piece that intersects a line of slope \(\pm 2n\) through the point \((1/m,f(1/m))\) is constructed, and take \(g\) to be equal to that linear function from the point of intersection to \(1/m\).

Continue this procedure for each interval of the form \([k/m,(k+1)/m]\) for \(k=1,2,\ldots,m-1\). That is, set \(g(k/m) = f(k/m)\) and construct a sawtooth function on the given interval with slope \(\pm 2n\). We claim that \(g\) is a function of the type desired.

We first note that \(g\) is piecewise linear, with each piece having slope \(\pm 2n\); this is explicit in the construction. Moreover, there are only finitely many pieces, since the unit interval was broken into \(m\) subintervals, and each subinterval contains only finitely many linear pieces.^{[4]} Finally, it follows from the choice of \(\delta\) and \(m\), and the triangle inequality that for each \(x\in [0,1]\), we have \[ |g(x) - f(x)| \le \left|g(x) - f\left(\frac{\lfloor mx \rfloor}{m}\right)\right| + \left|f\left(\frac{\lfloor mx \rfloor}{m}\right) - f(x)\right| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \] Hence \(\|g-f\|_u < \varepsilon\).

Therefore \(g\) is exactly the kind of function that we want, and so the proof is complete.

⬛

With the lemma proved, we now have the following: if \(f\in E_n\), then for any \(\varepsilon > 0\), there exists a piecewise linear function \(g\) consisting of finitely many linear pieces each having slope \(\pm 2n\) such that \(\|g-f\|_u < \varepsilon\). But no such \(g\) is in \(E_n\) (near any point \(x_0\), some difference quotient of \(g\) has magnitude \(2n > n\)), and so every neighborhood of \(f\) contains functions that are not in \(E_n\). Thus \(E_n\) contains no nonempty open sets, and is therefore nowhere dense.

Thus far, we have proved that each \(E_n\) is closed and nowhere dense, thus we are ready to apply Baire’s category theorem: since \(C[0,1]\) is residual in itself, it follows that \[ C[0,1] \setminus \bigcup_{n=1}^{\infty} E_n \ne \emptyset. \] But, as noted above, every function that is differentiable anywhere, even if only at a single point, is contained in the union. Therefore \(C[0,1]\) contains at least one nowhere differentiable function (in fact, a residual set of them).

⬛

- [1] I’ll (hopefully) talk more about this later.
- [2] A metric space is a set of points and a way of measuring the distance between those points. A sequence of points in a metric space is said to be Cauchy if the distance between any two (not necessarily consecutive) points in the sequence gets small for points sufficiently deep into the sequence. A metric space is said to be complete if every Cauchy sequence converges to some point in the space. The real numbers are a complete metric space, but the rational numbers are not. (Why?)
- [3] This is a topological notion. In very broad strokes, if \(A\) is a subset of \(X\), then a point \(x\) is in the closure of \(A\) if we can find points of \(A\) that are arbitrarily “close” to \(x\). \(A\) is said to have empty interior if all of the points in \(A\) are arbitrarily “close” to points that are not in \(A\). For instance, the closure of the interval \((0,1)\) in \(\mathbb{R}\) is the interval \([0,1]\), and any finite collection of points in \(\mathbb{R}\) is nowhere dense. (Why?)
- [4] This follows from the Archimedean principle; exact bounds on the number of pieces can be computed, but such a computation is tedious and we are, frankly, a bit lazy.

**Exercise:** For each \(g\in L^1([0,1])\), define a bounded linear functional \(\varphi_g\) by \[ \varphi_g(f) = \int fg. \] We know that the map \(g\mapsto \varphi_g\) is an isometric injection of \(L^1([0,1])\) into \((L^\infty([0,1]))^*\). Show that this map is *not* surjective.

**Explanation:** Basically, we have the result that for any \(1 < p < \infty\), \((L^p)^*\) (the dual of \(L^p\)) is isometric to \(L^q\), where \(1/p + 1/q = 1\). That is, with a little bit of handwaving about identifying spaces via isometries, we can state that if \(p\in(1,\infty)\) and \(1/p + 1/q = 1\), then \((L^p)^*\) is \(L^q\). What we want to show is that this result does not (generally) hold for \(p=\infty\), i.e. that \((L^\infty)^*\) is almost always strictly larger than \(L^1\).

**Solution:** The goal is to construct a bounded linear functional \(\varphi:L^\infty([0,1])\to\mathbb{R}\) such that \(\varphi\ne\varphi_g\) for any \(g\in L^1([0,1])\). We start on a somewhat smaller space: define \(\varphi:C([0,1]) \to \mathbb{R}\) by \(\varphi(f) = f(0)\). We make the claims that (1) \(\varphi\) is linear, and (2) \(\varphi\) is dominated by the \(L^\infty\) norm, i.e. if \(f\in C([0,1])\), then \(\varphi(f)\le\|f\|_{\infty}\).

Claim (1) is straightforward: let \(f,g\in C([0,1])\) and take \(\alpha\in\mathbb{R}\) to be an arbitrary scalar. Then \[ \varphi(f+g) = (f+g)(0) = f(0) + g(0) = \varphi(f) + \varphi(g), \] and \[ \varphi(\alpha f) = (\alpha f)(0) = \alpha f(0) = \alpha \varphi(f), \] therefore \(\varphi\) is a linear functional on \(C([0,1])\), as claimed.

Claim (2) requires us to recall that the \(L^\infty\) norm is equal to the uniform norm on the space of continuous functions. Moreover, the domain \([0,1]\) is compact, and so the uniform norm of a continuous function is the maximum magnitude that the function attains. Hence for any \(f\in C([0,1])\), we have \[ \varphi(f) = f(0) \le |f(0)| \le \max_{x\in[0,1]} |f(x)| = \|f\|_u = \|f\|_{\infty}. \]

With both claims proved, we apply the Hahn-Banach Theorem. Namely, we can extend \(\varphi\) to a bounded linear functional \(\overline{\varphi}\) on \(L^{\infty}([0,1])\) such that \(\overline{\varphi}(f) = f(0)\) for any \(f\in C([0,1])\). Since there is no ambiguity, we will drop the overline and simply write \(\varphi := \overline{\varphi}\).

It remains to show that there is no \(g\in L^1([0,1])\) such that \(\varphi = \varphi_g\). For contradiction, suppose that such \(g\) does exist, and consider the sequence of functions \(\{f_n\}\subset C([0,1])\) defined by \(f_n(x) = \max\{1-nx,0\}\). On the one hand, since each \(f_n\) is continuous, note that \[ \lim_{n\to\infty} \varphi(f_n) = \lim_{n\to\infty} f_n(0) = 1. \] Hence, since \(\varphi = \varphi_g\) by assumption, we have \[ \lim_{n\to\infty} \int f_n g = 1. \tag{*}\]

On the other hand, note that \(f_n\) converges pointwise to the zero function at every \(x\in(0,1]\), that is, almost everywhere. Moreover, \(0\le f_n(x) \le 1\) for all \(x\in[0,1]\), and so \(|(f_n g)(x)| \le |g(x)|\) for all such \(x\). Since \(g\) is integrable, it then follows from the Dominated Convergence Theorem that \[ \lim_{n\to\infty} \int f_n g = \int \lim_{n\to \infty} f_n g = \int 0 = 0.\] This contradicts \((*)\), and so there cannot be any \(g\in L^1([0,1])\) such that \(\varphi = \varphi_g\), which completes the proof.
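To see the Dominated Convergence step at work on a concrete function, take the sample choice \(g(x) = \frac{1}{2\sqrt{x}}\), which is integrable on \([0,1]\) but unbounded near \(0\). This particular \(g\) is only an illustration and plays no role in the proof. Since \(f_n\) vanishes outside \([0,1/n]\), a direct computation gives \[ \int_0^1 f_n g = \int_0^{1/n} \frac{1-nx}{2\sqrt{x}}\,dx = \left[\sqrt{x} - \frac{n}{3}x^{3/2}\right]_0^{1/n} = \frac{1}{\sqrt{n}} - \frac{1}{3\sqrt{n}} = \frac{2}{3\sqrt{n}} \longrightarrow 0. \] So even for a \(g\) that piles up mass near the origin, the integrals \(\int f_n g\) tend to \(0\) rather than to \(f_n(0) = 1\).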


The approach that I have adopted here is fairly naïve, as my knowledge of music theory is rather lacking. That said, here is what I have done so far:

First, I pick a point in the complex plane and generate the orbit of that point. For instance, consider the point (taken arbitrarily) \(0.375+0.285i\). Iterating the Mandelbrot function 64 times gives us the following sequence of complex numbers, which represents the orbit of the original point:

0.38 + 0.28*i 0.43 + 0.50*i 0.31 + 0.72*i -0.04 + 0.74*i -0.17 + 0.22*i 0.35 + 0.21*i 0.46 + 0.43*i 0.39 + 0.68*i 0.07 + 0.82*i -0.29 + 0.40*i 0.30 + 0.05*i 0.46 + 0.32*i 0.49 + 0.58*i 0.28 + 0.85*i -0.27 + 0.77*i -0.14 + -0.14*i 0.38 + 0.32*i 0.41 + 0.53*i 0.27 + 0.72*i -0.07 + 0.67*i -0.07 + 0.19*i 0.34 + 0.26*i 0.43 + 0.46*i 0.34 + 0.68*i 0.03 + 0.75*i -0.19 + 0.33*i 0.30 + 0.16*i 0.44 + 0.38*i 0.42 + 0.62*i 0.17 + 0.81*i -0.25 + 0.56*i 0.13 + 0.01*i 0.39 + 0.29*i 0.45 + 0.51*i 0.31 + 0.74*i -0.07 + 0.75*i -0.18 + 0.18*i 0.38 + 0.22*i 0.47 + 0.45*i 0.39 + 0.71*i 0.03 + 0.84*i -0.33 + 0.33*i 0.37 + 0.07*i 0.51 + 0.34*i 0.52 + 0.63*i 0.25 + 0.94*i -0.45 + 0.76*i 0.00 + -0.39*i 0.22 + 0.29*i 0.34 + 0.41*i 0.32 + 0.57*i 0.16 + 0.65*i -0.02 + 0.49*i 0.14 + 0.26*i 0.33 + 0.36*i 0.35 + 0.52*i 0.23 + 0.65*i 0.01 + 0.59*i 0.03 + 0.29*i 0.29 + 0.30*i 0.37 + 0.46*i 0.30 + 0.62*i 0.07 + 0.66*i -0.05 + 0.38*i 0.23 + 0.24*i

Plotting this renders

To turn this into sound I divided the complex plane into 24 wedges, each of which is π/12 radians wide. Each wedge corresponds to a different note in a chromatic scale, hence each point in the orbit corresponds to a note. For the duration of each note, I used the distance from the origin with points closer to the origin having a longer duration. The exact details can be found in the program used to generate the music: fractal_music.py. You’ll need python and music21 to make this work.
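The mapping described above can be sketched in a few lines of Python. To be clear, this is not the contents of fractal_music.py: it only computes the orbit and a (pitch, duration) pair for each point, and it leaves out music21 entirely. The function names, the base pitch of 48, and the duration formula are all assumptions chosen for illustration; the source only says that there are 24 wedges and that points closer to the origin last longer.

```python
import cmath
import math

def orbit(c, iterations):
    """Return [z_0, ..., z_iterations] with z_0 = c and z_{k+1} = z_k^2 + c."""
    points = [c]
    for _ in range(iterations):
        z = points[-1]
        points.append(z * z + c)
    return points

def wedge(z, wedges=24):
    """Index (0 .. wedges-1) of the wedge of angular width 2*pi/wedges containing z."""
    angle = cmath.phase(z) % (2 * math.pi)  # phase moved into [0, 2*pi)
    return min(int(angle / (2 * math.pi / wedges)), wedges - 1)

def to_note(z, base_pitch=48, longest=2.0):
    """Map a point to (MIDI pitch, duration in beats); both scalings are assumptions."""
    pitch = base_pitch + wedge(z)               # one chromatic step per wedge
    duration = longest / (1.0 + 4.0 * abs(z))   # closer to the origin -> longer
    return pitch, round(duration, 3)

points = orbit(0.375 + 0.285j, 64)
notes = [to_note(z) for z in points]
print(notes[:4])
```

Each pair in `notes` could then be handed to whatever library writes the MIDI file. Swapping the roles of phase and modulus, or feeding in differences of consecutive points, only requires changing `to_note`.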

The result for this sequence is a .midi file (embedded below):

Finally, just for giggles, I typeset this sequence of notes using musescore. The result doesn’t look much like music, but is moderately entertaining.

Obviously, there are a lot of parameters here. In the code that generates these “tunes,” there are two obvious parameters: the initial point (the orbits can be quite different depending on what point is chosen) and the number of iterations (more iterations give longer pieces of “music”). There are also some not-so-obvious changes that could be made. Instead of looking at the points themselves, the note generated could depend on the change in position (i.e. the difference between two consecutive points in an orbit). The interpretation of phase and modulus could be exchanged, or the Cartesian (rather than polar) coordinates of each point could be used in some manner. If anyone plays with this at all, please let me know—I would love to hear the results.

It seems that some poor student in the past was having trouble with his Greek, particularly with the lowercase letter ξ (xi). Since that little guy is the bane of my existence, I couldn’t help but chuckle at the struggles of another.

Mathematics is a field that has been historically dominated by men. This has been a self-fulfilling prophecy: our society often tells young women that math isn’t for women (see, for instance, the Barbie doll who states “Math class is tough!” or the innumerable stories about women being advised to “consider ‘easier’ programs”), leading to women pursuing other courses of study, which in turn leads to an underrepresentation of women in mathematics, thereby reinforcing the perception that women can’t do math. Each of these factors needs to be addressed and I hope that Dr. Mirzakhani’s being honored in this manner will help with the misconception that women cannot do well in mathematics.
