Introduction to Elementary FEM


I have recently been working on a project that involves using the Finite Element Method. Here is a sneak peek.

To begin what is hopefully going to be a series of blog posts, I would like to introduce the basic concept of the Finite Element Method. This note is aimed at roughly undergraduate level in vaguely STEM-ish fields, assuming a solid mathematical background. I tried to be a bit less formal than I would usually write these things; hopefully that makes this note less intimidating and more approachable.

To begin, let's consider solving a simple PDE in 1D. Let $f : (0, 1) \to \mathbb{R}$ be a continuous function. We are looking for a sufficiently differentiable function $u : [0, 1] \to \mathbb{R}$ such that the following holds ($\partial$ denotes the derivative operator; that is, $\partial u$ is the derivative of $u$ and $\partial^2 u$ is the second derivative of $u$):

$$-\partial^2 u = f \qquad\text{and}\qquad u(0) = u(1) = 0.$$

This is nothing special, just finding the second antiderivative of $-f$. The boundary conditions are somewhat unimportant in this case, since one may always add a linear polynomial to the solution to satisfy them. This becomes more interesting in higher dimensions.
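For concreteness, here is one example we can solve by hand (and will reuse later to check the numerics): for $f(x) = \sin(\pi x)$, the function

$$u(x) = \frac{\sin(\pi x)}{\pi^2}$$

satisfies both $-\partial^2 u = \sin(\pi x) = f$ and $u(0) = u(1) = 0$.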

As one may expect, computing the function $u$ may be difficult, entirely depending on the form of $f$. This means we need to look for an approximation.

To start, let's play a game:

First, let's introduce some notation. We are going to define the inner product of two reasonably nice functions $u$ and $v$ on the interval $[0, 1]$ by:

$$\langle u \mid v \rangle = \intop_0^1 u v \, \text{d}x.$$
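In code, this inner product is just a one-dimensional integral, so any quadrature routine does the job. A minimal sketch using scipy (the helper name `inner` is mine):

```python
from scipy.integrate import quad

def inner(u, v):
    """Compute <u | v>, the integral of u(x) * v(x) over [0, 1]."""
    value, _error = quad(lambda x: u(x) * v(x), 0.0, 1.0)
    return value
```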

Let's say that $u$ solves the PDE. Then, let's grab any other twice differentiable function $v$ that is also zero on the boundary, that is, $v(0) = v(1) = 0$ (meaning it could plausibly be a solution of the PDE), and compute the following integral using integration by parts:

$$\begin{align*} 0 = \langle -\partial^2 u - f \mid v \rangle &= \intop_0^1 (-\partial^2 u - f) v \, \text{d}x \\ &= -\intop_0^1 (\partial^2 u) v \, \text{d}x - \intop_0^1 f v \, \text{d}x \\ &= -\underbrace{\left[ (\partial u) v \right]_0^1}_{0} + \intop_0^1 (\partial u) (\partial v) \, \text{d}x - \intop_0^1 f v \, \text{d}x. \end{align*}$$

Rearranging, we get the equality

$$\langle \partial u \mid \partial v \rangle = \intop_0^1 (\partial u) (\partial v) \, \text{d}x = \intop_0^1 f v \, \text{d}x = \langle f \mid v \rangle.$$

It is an exercise in analysis to show that this condition is in fact sufficient: if $u$ satisfies it for every twice differentiable function $v$ that is zero on the boundary, then $u$ is a solution of the PDE. This is called the weak form of the PDE.
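As a quick sanity check, the weak form can be verified numerically for the example from above, $u(x) = \sin(\pi x)/\pi^2$ and $f(x) = \sin(\pi x)$, against an arbitrary test function. A sketch (both sides work out to $4/\pi^3 \approx 0.129$):

```python
import numpy as np
from scipy.integrate import quad

def inner(u, v):
    value, _error = quad(lambda x: u(x) * v(x), 0.0, 1.0)
    return value

du = lambda x: np.cos(np.pi * x) / np.pi  # derivative of the exact solution
f = lambda x: np.sin(np.pi * x)

# Any differentiable test function vanishing at 0 and 1 will do.
v = lambda x: x * (1 - x)
dv = lambda x: 1 - 2 * x

print(inner(du, dv))  # 0.1290...
print(inner(f, v))    # 0.1290...
```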

It is not immediately obvious how this is useful. At first glance, we have converted a problem that requires us to find a function with a certain property into a problem that requires us to find a function satisfying a certain integral equation for every function $v$. This does not seem to be easier at all!

Now comes the main trick: instead of considering the entire space of twice differentiable functions that are zero on the boundary, we are going to use a different, finite dimensional space of functions. This space does not even need to be a subspace of the space of smooth functions; in fact, it usually isn't in practical implementations.

Definition. For the remainder of this post, let

$$0 = x_0 < x_1 < \cdots < x_{n+1} = 1$$

be a partition of the unit interval. Furthermore, let the solution space, denoted by $V$, be the vector space of functions on the unit interval that are

  1. continuous,
  2. piecewise linear, that is, a linear polynomial on every interval $[x_i, x_{i+1}]$,
  3. zero on the boundary, that is, $u(0) = u(1) = 0$ for every $u \in V$.
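A function in $V$ is determined entirely by its values at the interior nodes $x_1, \ldots, x_n$ (the endpoint values are pinned to zero), and evaluating it is plain piecewise linear interpolation, which numpy already provides. A small sketch with a made-up partition and nodal values:

```python
import numpy as np

# A partition of [0, 1] with n = 4 interior nodes.
x = np.linspace(0.0, 1.0, 6)
# Nodal values of some member of V; note the zeros at both ends.
values = np.array([0.0, 0.3, -0.1, 0.5, 0.2, 0.0])

# np.interp evaluates the continuous piecewise linear interpolant.
print(np.interp(0.35, x, values))
```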

A typical function in this space is a continuous zigzag: straight segments joined at the nodes $x_i$, pinned to zero at both endpoints.

In more theoretical texts, the first and second conditions are often abstracted, for instance by considering polynomials of arbitrary degree.

It is worth observing that in practice, we are the ones who choose the partition $(x_1, \ldots, x_n)$. This means that if we have some knowledge about the function $f$, or if we demand better accuracy in some regions, we can increase the density of the partition in those regions.
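For example, grading the mesh toward one endpoint is a one-liner; the exponent here is an arbitrary knob:

```python
import numpy as np

n = 8
x_uniform = np.linspace(0.0, 1.0, n + 2)      # evenly spaced nodes
x_graded = np.linspace(0.0, 1.0, n + 2) ** 2  # nodes clustered near x = 0
```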

Definition. A weak solution is a function $u \in V$ such that, for every $v \in V$, the following holds:

$$\langle \partial u \mid \partial v \rangle = \langle f \mid v \rangle$$

I have smuggled in another trick here: notice that the weak form only involves first derivatives. This is crucial, because our piecewise linear functions are not twice differentiable at the nodes, and wherever their second derivative does exist it is zero, so the strong form could never hold for them. Their first derivatives, on the other hand, are perfectly usable piecewise constant functions, which simplifies the computation later.

Now for the final trick: notice that $V$ is a finite dimensional vector space. This means that we can choose a basis of $V$, say $(v_1, \ldots, v_n)$. We use this basis to write $u$ as a linear combination of the basis functions:

$$u = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n.$$

It is now an easy exercise in linear algebra to show that it is sufficient for the weak solution to satisfy the weak form for every basis function $v_i$ of $V$ instead of every function in $V$: both sides of the weak form are linear in $v$, so if the equality holds on a basis, it holds on all of $V$. That is, we assemble a system of equations:

$$\begin{align*} \alpha_1 \langle \partial v_1 \mid \partial v_1 \rangle + \alpha_2 \langle \partial v_2 \mid \partial v_1 \rangle + \cdots + \alpha_n \langle \partial v_n \mid \partial v_1 \rangle &= \langle f \mid v_1 \rangle, \\ &\;\;\vdots \\ \alpha_1 \langle \partial v_1 \mid \partial v_n \rangle + \alpha_2 \langle \partial v_2 \mid \partial v_n \rangle + \cdots + \alpha_n \langle \partial v_n \mid \partial v_n \rangle &= \langle f \mid v_n \rangle. \end{align*}$$

Or in matrix form:

$$\begin{pmatrix} \langle \partial v_1 \mid \partial v_1 \rangle & \langle \partial v_2 \mid \partial v_1 \rangle & \cdots & \langle \partial v_n \mid \partial v_1 \rangle \\ \langle \partial v_1 \mid \partial v_2 \rangle & \langle \partial v_2 \mid \partial v_2 \rangle & \cdots & \langle \partial v_n \mid \partial v_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \partial v_1 \mid \partial v_n \rangle & \langle \partial v_2 \mid \partial v_n \rangle & \cdots & \langle \partial v_n \mid \partial v_n \rangle \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = \begin{pmatrix} \langle f \mid v_1 \rangle \\ \langle f \mid v_2 \rangle \\ \vdots \\ \langle f \mid v_n \rangle \end{pmatrix}$$

Using fancy terminology, the matrix on the left is the Gram matrix of the inner product $\langle \partial - \mid \partial - \rangle$ on the space $V$ (verifying that this is indeed an inner product is left as an exercise to the reader; more advanced treatments tend to allow more general bilinear forms here). Engineers usually call this matrix the stiffness matrix. Being the Gram matrix of a basis with respect to an inner product, it is symmetric and positive definite, so this system of equations always has a unique solution.

What remains is picking the basis functions. Let's consider several demands we would like to impose on the basis:

  1. The stiffness matrix should be reasonably sparse.
  2. The right hand side terms $\langle f \mid v_i \rangle$ should be easy to compute or at least approximate.
  3. Once we get the solution $(\alpha_1, \ldots, \alpha_n)$, we should be able to compute the function $u$ easily from the coefficients.

The usual basis functions are the hat functions: piecewise linear functions, each equal to one at its own node $x_i$ and zero at every other node.

That is, an increasing linear slope between $x_{i-1}$ and $x_i$ and a decreasing linear slope between $x_i$ and $x_{i+1}$. In more precise terms:

$$v_i(x) = \begin{cases} \frac{x - x_{i-1}}{x_i - x_{i-1}} & \text{if}\quad x_{i-1} \leq x < x_i, \\ \frac{x_{i+1} - x}{x_{i+1} - x_i} & \text{if}\quad x_i \leq x < x_{i+1}, \\ 0 & \text{otherwise}. \end{cases}$$
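Translated directly into code (vectorized over x; the helper name `hat` is mine):

```python
import numpy as np

def hat(x, nodes, i):
    """Evaluate the i-th hat function (1 <= i <= n) on the given partition."""
    left, center, right = nodes[i - 1], nodes[i], nodes[i + 1]
    rising = (x - left) / (center - left)     # slope up on [x_{i-1}, x_i)
    falling = (right - x) / (right - center)  # slope down on [x_i, x_{i+1})
    return np.where((x >= left) & (x < center), rising,
                    np.where((x >= center) & (x < right), falling, 0.0))
```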

These functions have some desirable properties. First, note that if $\left|i - j\right| \geq 2$, then $\langle \partial v_i \mid \partial v_j \rangle = 0$, since the intervals where the two functions are non-zero overlap in at most a single point. This means that the stiffness matrix is tridiagonal, which satisfies our sparsity requirement. There are even specialized algorithms for solving systems of equations with a tridiagonal matrix. It is not that easy in higher dimensions, but even there, methods that work well with sparse matrices are available.
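Better yet, for the hat basis the surviving entries have closed forms. Writing $h_i = x_{i+1} - x_i$, the derivative $\partial v_i$ equals $1/h_{i-1}$ on $(x_{i-1}, x_i)$ and $-1/h_i$ on $(x_i, x_{i+1})$, which gives $\langle \partial v_i \mid \partial v_i \rangle = 1/h_{i-1} + 1/h_i$ and $\langle \partial v_i \mid \partial v_{i+1} \rangle = -1/h_i$. A sketch of the assembly (dense for clarity; a real implementation would use a banded or sparse format):

```python
import numpy as np

def stiffness_matrix(nodes):
    """Assemble the tridiagonal stiffness matrix for the hat basis.

    nodes holds x_0, ..., x_{n+1}; the result is n x n, one row per
    interior node."""
    h = np.diff(nodes)  # element lengths h_i = x_{i+1} - x_i
    n = len(nodes) - 2
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 1.0 / h[i] + 1.0 / h[i + 1]
        if i + 1 < n:
            A[i, i + 1] = A[i + 1, i] = -1.0 / h[i + 1]
    return A
```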

Computing the right hand side terms may be a bit more difficult, depending on the form of $f$. We are looking to evaluate the integral

$$\langle f \mid v_i \rangle = \intop_0^1 f(x) v_i(x) \, \text{d}x = \intop_{x_{i-1}}^{x_{i+1}} f(x) v_i(x) \, \text{d}x.$$

In practice, we use a numerical integration scheme here. An easy approach is to assume that $f \in V$, that is, to linearly interpolate $f$ from its values at $(x_1, \ldots, x_n)$; then we can simply compute the integral directly. An even simpler approach is to exploit the locality of the basis functions: assume that $f$ is constant on $[x_{i-1}, x_{i+1}]$ and compute

$$\langle f \mid v_i \rangle \approx f(x_i) \intop_{x_{i-1}}^{x_{i+1}} v_i(x) \, \text{d}x = f(x_i) \, \frac{x_{i+1} - x_{i-1}}{2},$$

where the remaining integral is simply the area of a triangle with base $x_{i+1} - x_{i-1}$ and height one.

Do note that this, again, splits our right hand side into a part that can be pre-computed for a given partition and a part that depends on the function $f$.
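In code, this approximation of the load vector is a one-liner over the interior nodes (assuming f is vectorized):

```python
import numpy as np

def load_vector(nodes, f):
    """Approximate <f | v_i> by f(x_i) * (x_{i+1} - x_{i-1}) / 2."""
    return f(nodes[1:-1]) * (nodes[2:] - nodes[:-2]) / 2.0
```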

And finally, since the basis functions are set up in a way that only $v_i$ is non-zero at $x_i$, computing the value of the solution at a node is simply:

$$u(x_i) = (\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n)(x_i) = \alpha_i.$$
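Putting the pieces together, here is a minimal end-to-end sketch, using the helpers sketched above, that solves the example $f(x) = \sin(\pi x)$ from the beginning and compares the coefficients against the exact solution $u(x) = \sin(\pi x)/\pi^2$:

```python
import numpy as np

def stiffness_matrix(nodes):
    h = np.diff(nodes)
    n = len(nodes) - 2
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 1.0 / h[i] + 1.0 / h[i + 1]
        if i + 1 < n:
            A[i, i + 1] = A[i + 1, i] = -1.0 / h[i + 1]
    return A

def load_vector(nodes, f):
    return f(nodes[1:-1]) * (nodes[2:] - nodes[:-2]) / 2.0

f = lambda x: np.sin(np.pi * x)
exact = lambda x: np.sin(np.pi * x) / np.pi**2

nodes = np.linspace(0.0, 1.0, 34)  # 32 interior nodes
alpha = np.linalg.solve(stiffness_matrix(nodes), load_vector(nodes, f))

# By the identity above, alpha_i approximates u(x_i) at the interior nodes.
print(np.max(np.abs(alpha - exact(nodes[1:-1]))))  # shrinks as the mesh refines
```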