\[ \begin{align}
V(a,z) &= \max_{a'} u((1+r)a + z - a') + \beta \mathbb{E}\left[ V(a',z') | z \right]\\
\end{align} \]
Suppose you know the values of \(V(a_i,z_j)\), but only on two discrete grids \((\mathcal{A}, \mathcal{Z})\).
In order to perform the \(\max_{a'}\) operation, you need to evaluate \(V(a',z_j)\), where \(a' \notin \mathcal{A}\). How do you go about this task?
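One common answer is interpolation. As a preview, here is a minimal sketch of linear interpolation in Julia; the names `agrid` and `Vj` (the asset grid and the column \(V(\cdot,z_j)\)) are hypothetical placeholders, and real code would also need to handle extrapolation.

```julia
# minimal sketch: evaluate V(a', z_j) off the asset grid by linear interpolation
function interp_V(agrid::AbstractVector, Vj::AbstractVector, ap::Real)
    i = searchsortedlast(agrid, ap)               # grid point just below a'
    i = clamp(i, 1, length(agrid) - 1)            # stay inside the grid (no real extrapolation logic)
    w = (ap - agrid[i]) / (agrid[i+1] - agrid[i]) # weight on the upper neighbor
    return (1 - w) * Vj[i] + w * Vj[i+1]
end

agrid = range(0.0, 10.0, length = 11)
Vj    = log.(1 .+ agrid)      # stand-in for the column V(·, z_j)
interp_V(agrid, Vj, 3.7)      # ≈ log(1 + 3.7)
```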
Approximation Methods
Confronted with a non-analytic function \(f\) (i.e. a function without a closed-form expression, unlike \(\log(x)\)), we need a way to represent \(f\) numerically in a computer.
If your problem is to compute a value function in a dynamic problem, you often don’t have an analytic representation of \(V\).
If you need to compute an equilibrium distribution for your model, you probably cannot tell whether it belongs to one parametric family or another. Hence you need to approximate it.
Approximations use data of some kind which informs us about \(f\). Most commonly, we know the function values \(f(x_i)\) at a corresponding finite set of points \(X = \{x_i\}_{i=1}^N\).
The task of approximation is to take that data and tell us the function value \(f(y)\) at points \(y \notin X\).
To an economist this should sound very familiar: take a dataset, learn its structure, and make predictions.
The only difference is that we can do much better here, because we have more degrees of freedom (we can choose our \(X\) in \(Y=\beta X + \epsilon\)).
Isn’t this Machine Learning?
You should notice that this sounds very similar to what machine learning attempts to do as well: make predictions on unseen data.
So: what is different here?
We focus mostly on in-sample prediction.
Machine learning exerts a lot of computational effort to train a model once very well, then uses the model to make many, many predictions.
We want to estimate structural economic models by changing the parameters of the model often; we would need to train the model very often, and use it make a prediction exactly once.
That said, there are many frontier methods which use deep learning to solve economic models: https://web.stanford.edu/~maliars/Files/JME2021.pdf, https://onlinelibrary.wiley.com/doi/full/10.1111/iere.12575 and references therein.
Some Taxonomy
Local Approximations: approximate the function and its derivatives \(f,f'\) at a single point \(x_0\). Taylor Series: \[ f(x) \approx f(x_0) + (x-x_0)f'(x_0) + \frac{(x-x_0)^2}{2}f''(x_0) + \dots + \frac{(x-x_0)^n}{n!}f^{(n)}(x_0) \] (a small numerical sketch follows this list)
Interpolation or Collocation: find a function \(\hat{f}\) that is a good fit to \(f\), and require that \(\hat{f}\) passes through the points. If we think of there being a residual \(\epsilon_i = f(x_i) - \hat{f}(x_i)\) at each grid point \(i\), this method succeeds in setting \(\epsilon_i=0,\forall i\).
Regression: Minimize some notion of distance (or squared distance) between \(\hat{f}\) and \(f\), without the requirement of pass through.
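To illustrate the local nature of the Taylor approach (first item above), here is a small sketch that expands \(\log(x)\) to second order around \(x_0 = 1\); the function and expansion point are just examples.

```julia
# second-order Taylor approximation of log(x) around x0:
# log(x) ≈ log(x0) + (x - x0)/x0 - (x - x0)^2 / (2 x0^2)
taylor2(x, x0 = 1.0) = log(x0) + (x - x0) / x0 - (x - x0)^2 / (2 * x0^2)

for x in (1.05, 1.5, 2.5)
    println("x = $x  log(x) = $(round(log(x), digits = 4))  taylor2 = $(round(taylor2(x), digits = 4))")
end
# accurate near x0 = 1, deteriorates as we move away from it
```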
Doing Interpolation in Julia
In practice, you will make heavy use of high-quality interpolation packages in Julia.
A list of packages is given at the end.
Nevertheless, function approximation is extremely problem-specific, so sometimes a certain approach does not work for your problem.
This is why we will go through the mechanics of some common methods.
I would like you to know where to start drilling if you need to go and hack somebody else’s code.
As a concrete dataset, consider the well-known iris data: 150 flower samples, each with four measurements and a species label. We could think of each row as a vector: a tuple of input data, like \(x_1 = (5.1,3.5,1.4,0.2)\), and a one-dimensional output \(y_1\), the species label, like “setosa”. Stack the \(x\) to get \(X\) of size (150,4).
How are the different measures distributed?
@chain iris begin
    stack(_, Not(:Species))
    subset(:Species => x -> x .== "Iris-setosa")
    @df density(:value, group = :variable, xlab = "Setosa Measures", lw = 2)
end
Those measurements seem to be on different scales. The SepalLength measure seems to have an average of around 5, whereas PetalWidth is smaller than 1 in most cases.
How can we do mathematical operations on vectors \(x\), like addition and scalar multiplication?
For example, suppose we want to standardize the data in iris. We would like to create two objects (each of them a length-4 vector), the mean and standard deviation (across rows) of all 150 samples.
@chain iris begin
    subset(:Species => x -> x .== "Iris-setosa")
    transform([x => (z -> (z .- mean(z)) ./ std(z)) => x for x in [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]])
    stack(_, Not(:Species))
    @df density(:value, group = :variable, xlab = "Standardized Setosa Measures")
end
What we have done right now, i.e. to transform this data via addition and multiplication into a common space is very important. In fact, it is related to the concept of Vector Spaces.
Vector Spaces
A Vector Space is a structure \((V,F,+,\cdot)\), where:
\(V\) is a set of vectors
\(+: V \times V \mapsto V\) is the addition operation, with
\(x + y = y + x\)
\(x + (y + z ) = (x + y ) + z\)
There is a null vector: \(x + 0 = x, 0 \in V\)
There is an inverse \(-x \in V\): \(x + (-x) = 0\)
\(F\) is a field of scalars: \(\mathbb{R}\) or \(\mathbb{C}\)
\(\cdot : F \times V \mapsto V\) is scalar multiplication, satisfying
\(a(bx) = (ab) x\)
\(a(x + y) = ax + ay\)
\(1x = x\)
Examples of Vector Spaces
\((\mathbb{R}^n,\mathbb{R},+,\cdot)\): the most widely encountered vector space - particularly with \(n=2\).
\(\mathbb{R}[x] = \left\{\sum_{i=0}^n p_i x^i : p_i \in \mathbb{R}, n = 0,1,\dots \right\}\): We can add 2 polynomials: \[(p + q)(x) = p(x) + q(x),\] and also multiply with a scalar \[(cp)(x) = c p(x)\] which makes \((\mathbb{R}[x],\mathbb{R},+,\cdot)\) a valid vector space.
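To make the second example concrete, here is a small sketch representing polynomials by their coefficient vectors; the helper names (`padd`, `pscale`, `evalpoly_coeffs`) are ad hoc, but the check shows that coefficient-wise addition and scaling reproduce \((p+q)(x)\) and \((cp)(x)\).

```julia
# polynomials as coefficient vectors: p(x) = p[1] + p[2]x + p[3]x^2 + ...
evalpoly_coeffs(p, x) = sum(c * x^(i - 1) for (i, c) in enumerate(p))

# vector-space operations on the coefficients
padd(p, q) = [get(p, i, 0.0) + get(q, i, 0.0) for i in 1:max(length(p), length(q))]
pscale(c, p) = c .* p

p = [1.0, 2.0, 3.0]   # 1 + 2x + 3x^2
q = [0.0, -1.0]       # -x
x = 0.7
evalpoly_coeffs(padd(p, q), x) ≈ evalpoly_coeffs(p, x) + evalpoly_coeffs(q, x)  # true
evalpoly_coeffs(pscale(2.5, p), x) ≈ 2.5 * evalpoly_coeffs(p, x)                # true
```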
A Linear Basis
\(V\) could contain infinitely many vectors.
We can reduce this complexity by finding a set of vectors from which to easily generate the other vectors.
Take \(E = \{e_1,\dots,e_n\}\), the set of canonical unit vectors of \(\mathbb{R}^n\) (\(e_i\) has a 1 in position \(i\) and zeros elsewhere). \(E\) is a vector space basis: with addition and scalar multiplication defined, we can create any vector \(x\in V\), i.e. we can span the entire vector space.
\(E\) is like a skeleton for \(V\) (\(\mathbb{R}^n\) in this example).
Linear Combinations
If instead of \(e_i\) we write \(v_i\) for any vector, and \(b_i \in \mathbb{R}\) for a weight, we can write any other vector \[z = \sum_{i=1}^n v_i b_i, \quad v_i \in \mathbb{R}^n\] as a linear combination.
Potentially, there are multiple combinations of vectors which result in the same combination, e.g. for \(v_1 = (1,0),v_2 = (0,1), v_3= (1,1)\), we can have \[(2,1) = 2 v_1 + v_2 = v_1 + v_3\] i.e. there are 2 equivalent ways to generate \((2,1)\) from those vectors.
The set \(S = \{v_1 , v_2 , v_3 \}\) contains redundant information.
Linear Independence
Let \(V\) be a vector space and \(S=\{v_1,\dots,v_n\}\) be a subset of its vectors. \(S\) is said to be linearly dependent if it contains the null vector, or if there is a vector \(v_k\) which can be expressed as a linear combination of the others, \[v_1,\dots,v_{k-1},v_{k+1},\dots,v_n\]
If \(S\) is not linearly dependent, it is linearly independent.
Consider a \(v_k\) which we can write as a linear combination of the other members of \(S\):
\[v_k = \sum_{i=1}^{k-1} b_i v_i + \sum_{i=k+1}^n b_i v_i\] By the definition above, this already makes \(S\) linearly dependent. But look:
Subtracting \(v_k\) on both sides, we get the null vector:
\[0 = \sum_{i=1}^{n} b_i v_i\]
for some weights \(b_i\), where it has to be that \(b_k =-1\). So: we can obtain the null vector from a nontrivial linear combination.
Theorem
Let \(V\) be a vector space and \(S\) be a subset of its vectors.
\(S\) is linearly dependent if and only if the null vector can be generated as a nontrivial linear combination (i.e. a linear combination where not all coefficients are equal to zero)
\(S\) is linearly independent if and only if \(0 = \sum_{i=1}^n x_i v_i\) implies that all coefficients \(x_i = 0\).
Basis Definition
Let \(V\) be a vector space and \(S\) be a subset of its vectors. \(S\) is called a basis of \(V\) if
\(S\) is linearly independent
\(S\) spans \(V\), i.e. \(span(S) = V\)
We obtain the span of a set of vectors \(S\) by creating all possible linear combinations among them. For example, the span of two linearly independent vectors is a plane.
The elements of a basis set are called basis vectors.
Basis Example in R2
The set \(\mathbb{R}^2\) of ordered pairs of real numbers is a vector space \(V\) under componentwise addition and scalar multiplication.
One basis called the standard basis of \(V\) consists of two vectors:
\(e_1 = (1,0)\)
\(e_2 = (0,1)\)
Any vector \(v=(a,b) \in \mathbb{R}^2\) can be uniquely written as \(v = ae_1 + be_2\)
Any other pair of linearly independent vectors like \((1,1)\) and \((-1,2)\) is also a basis of \(\mathbb{R}^2\).
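A quick sketch of the last two points: the coordinates of a vector in the standard basis are just its entries, while its coordinates in the alternative basis \((1,1), (-1,2)\) are found by solving a small linear system.

```julia
using LinearAlgebra

v  = [2.0, 1.0]                    # the vector (a, b) = (2, 1)
e1 = [1.0, 0.0]; e2 = [0.0, 1.0]   # standard basis
b1 = [1.0, 1.0]; b2 = [-1.0, 2.0]  # an alternative basis

v == 2.0 * e1 + 1.0 * e2           # true: coordinates in the standard basis are (a, b)

B = hcat(b1, b2)                   # columns are the alternative basis vectors
c = B \ v                          # coordinates of v in that basis
v ≈ c[1] * b1 + c[2] * b2          # true
```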
What is a Basis function?
Now we know that every vector \(\mathbf{v}\) in a vector space \(V\) can be represented by a linear combination of basis vectors.
Similarly, we can represent every continuous function in a particular function space \(F\) by a linear combination of basis functions.
Blending Functions
Another good name for basis functions is blending functions. We often use a mixture of several basis functions to find an approximation. We will use this machinery now to see how we can approximate a function from several simple component functions (which will be basis functions).
Approximating Functions
Let \(F\) be the space of continuous real-valued functions with domain \(\mathbf{x} \subseteq \mathbb{R}\). \(F\) is a vector space.
Next, we define an inner-product operation on that space: \[
<g,h> = \int_\mathbf{x} g(x) h(x) w(x) dx
\] where functions \(g,h,w \in F\) and \(w\) is a weighting function.
The pair \(\{F, <.,.>\}\) forms an inner-product vector space.
Now we want to approximate a known function \(f:\mathbf{x}\mapsto\mathbb{R}\) in \(\{F, <.,.>\}\)
Let’s define \(\hat{f}(\cdot;c)\) to be our parametric approximation function. We generically define this as \[ \hat{f}(x;c) = \sum_{j=0}^{J-1} c_j \phi_j(x) \] where
\(\phi_j : \mathbb{R}^d \mapsto \mathbb{R}\) is called a basis function, and \(\Phi_J = \{\phi_j\}_{j=0}^{J-1}\).
\(c=\{c_0,c_1,\dots,c_{J-1}\}\) is a coefficient vector.
The integer \(J\) is the order of the interpolation.
Our problem is to choose \((\phi_i,c)\) in some way.
We want to minimize the residual function, i.e. \(\gamma(x,c) \equiv f(x) - \sum_{j=0}^{J-1} c_j \phi_j(x)\)
To find coefficients \(c\), one could for example minimize the squared errors, OLS: \[
c^* = \arg \min \int_\mathbf{x} \gamma(x,c)^2 w(x) dx
\]
In standard OLS, we’d set the weighting function to \(w(x) = 1\)
We will construct a grid of \(M\geq J\) points \(\{x_1,\dots,x_M\}\) within the domain \(\mathbf{x}\), and we will denote the residuals at each grid point by \(\epsilon = \{\epsilon_1,\dots,\epsilon_M\}\), with \(\epsilon_i = f(x_i) - \hat{f}(x_i;c)\).
If we have exactly as many grid points as basis functions, \(M=J\), we can choose \(c\) so that all residuals are zero: that is interpolation (collocation).
If we have more evaluation points than basis functions, \(M>J\), we cannot do that. Instead we can define a loss function and minimize it.
In the case of squared loss, this of course leads to the least squares solution: \[ c = (\Phi'\Phi)^{-1}\Phi' y \] where \(\Phi\) is the \(M\times J\) basis matrix with typical element \(\phi_j(x_i)\), and \(y = (f(x_1),\dots,f(x_M))'\).
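As a concrete sketch of this least-squares step (the monomial basis and the test function are chosen purely for illustration), we can build the \(M \times J\) basis matrix and let a backslash solve do the work:

```julia
# least-squares approximation of f on M nodes with J monomial basis functions (M > J)
f(x) = sin(x) + 0.5x
M, J = 50, 4
xs = range(0, 2π, length = M)          # evaluation nodes
Φ  = [x^j for x in xs, j in 0:J-1]     # M × J basis matrix with typical element ϕ_j(x_i) = x_i^j
y  = f.(xs)
c  = Φ \ y                             # least-squares coefficients (minimize the sum of squared residuals)
fhat(x) = sum(c[j+1] * x^j for j in 0:J-1)
maximum(abs, fhat.(xs) .- y)           # residuals are small but not zero, since M > J
```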
Spectral Methods are such that the basis functions are non-zero over the entire domain of \(f\).
Polynomial interpolation
Chebychev interpolation
Finite Element methods are such that basis functions are non-zero only on a subset of the domain.
Splines
Linear splines, i.e. splines of degree 1, a.k.a. linear approximation
Higher order splines, mainly the cubic spline.
What makes a good Approximation?
The approximation should be arbitrarily accurate as we increase the number of basis functions.
\(\Phi\) should be efficiently (fast) computable. If the \(\phi_j\) are differentiable, we easily get e.g. \(\hat{f}'(x) = \sum_{j=0}^{J-1} c_j \phi_j'(x)\).
\(c\) should be efficiently (fast) computable.
Polynomial Interpolation
What Polynomial to use? What form for \(\Phi\)?
In principle the monomial basis could be used. It is just the power functions of \(x\): \(1,x,x^2,x^3,\dots\)
Stacking this up for each evaluation node gives the Vandermonde matrix \[ \Phi = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{n-1} \end{pmatrix} \]
for the case with \(m\) evaluation nodes \(x_i\), and \(n\) basis functions for each \(x_i\).
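Here is a sketch of that matrix for equally spaced nodes, with its condition number printed for a few sizes; the explosion of the condition number previews the discussion in the next section.

```julia
using LinearAlgebra

# Vandermonde matrix for m = n equally spaced nodes and the monomial basis 1, x, ..., x^(n-1)
vandermonde(x) = [xi^j for xi in x, j in 0:length(x)-1]

for n in (5, 10, 15)
    x = range(-1, 1, length = n)
    @show n, cond(vandermonde(x))   # grows rapidly with n: the monomial basis is nearly collinear
end
```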
Choosing Basis Matrices
If we choose \(\Phi\) so that its elements share a lot of information, it will be hard to invert. This is basically multicollinearity.
So, if the elements share very little information, \(\Phi\) is easier to invert, hence good for us.
For example, if we have \(\Phi\) close to diagonal (i.e. different basis \(\phi_j(x)\) are non-zero at different \(x\)).
Such \(\Phi\) are generated by using orthogonal Polynomials (under our inner product definition).
A family of orthogonal polynomials is a sequence of polynomials in which any two distinct members are orthogonal under an inner product: their inner product is zero under a certain weighting function \(w(x)\).
For example, the Chebyshev polynomials are orthogonal with respect to the weight \(\frac{1}{\sqrt{1-x^2}}\):
for degrees \(n,m\) of that polynomial, \(T_n,T_m\), we have \[
\int_{-1}^1 T_n(x)T_m(x)\,\frac{dx}{\sqrt{1-x^2}}=
\begin{cases}
0 & n\ne m \\ \pi & n=m=0 \\ \frac{\pi}{2} & n=m\ne 0
\end{cases}
\]
Chebyshev Polynomials of the first kind
There is a nice recursion to get \(T_n(x)\): \[\begin{align}
T_0(x) & = 1 \\
T_1(x) & = x \\
T_{n+1}(x) & = 2xT_n(x) - T_{n-1}(x).
\end{align}\]
Code that up and compute \(T_{3}(0.5)\)!
Check you get the same as with the alternative closed form \(T_n(x) = \cos(n \arccos(x))\)
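One possible implementation of the recursion is sketched below; it also supplies a function `T(x, n)` compatible with the orthogonality check that follows.

```julia
# Chebyshev polynomial of the first kind T_n(x) via the recursion above (naive, not optimized)
function T(x, n)
    n == 0 && return 1.0
    n == 1 && return x
    return 2x * T(x, n - 1) - T(x, n - 2)
end

T(0.5, 3)                        # -> -1.0
T(0.5, 3) ≈ cos(3 * acos(0.5))   # true: matches the closed form
```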
Ok, let’s check this orthogonality claim. The formula says that if \(n \neq m\), the integral of the weighted product of the two polynomials over \([-1,1]\) is exactly zero:
# weighting function
w(x) = 1.0 / sqrt(1 - x^2)

using Statistics
using Test

# simple integration works for n not equal m
@show sum(T(x, 0) * T(x, 1) * w(x) for x in range(-0.99, stop = 0.99, length = 100))
@show sum(T(x, 4) * T(x, 5) * w(x) for x in range(-0.99, stop = 0.99, length = 100))
sum((T(x, 0) * T(x, 1) * w(x) for x = range(-0.99, stop = 0.99, length = 100))) = -1.7763568394002505e-15
sum((T(x, 4) * T(x, 5) * w(x) for x = range(-0.99, stop = 0.99, length = 100))) = -4.440892098500626e-15
-4.440892098500626e-15
Chebyshev Nodes
Chebyshev nodes are the roots of the Chebyshev polynomial (i.e. the points where \(T_n(x) = 0\)).
They are defined on the interval \([-1,1]\) as \[ x_k = \cos\left(\frac{2k-1}{2n} \pi\right), \quad k=1,\dots,n \]
which maps to a general interval \([a,b]\) as \[ x_k = \frac{1}{2} (a+b) + \frac{1}{2} (b-a) \cos\left(\frac{2k-1}{2n} \pi\right), \quad k=1,\dots,n \]
Chebyshev nodes are not evenly spaced: there are more points towards the boundaries. You might remember that from Gaussian Integration nodes.
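A small sketch of both node formulas (function names are made up):

```julia
# Chebyshev nodes on [-1, 1], and mapped to a general interval [a, b]
cheb_nodes(n) = [cos((2k - 1) / (2n) * π) for k in 1:n]
cheb_nodes(n, a, b) = (a + b) / 2 .+ (b - a) / 2 .* cheb_nodes(n)

cheb_nodes(5)             # note the clustering towards -1 and 1
cheb_nodes(5, 0.0, 10.0)  # same pattern on [0, 10]
```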
# what's the first derivative of that function at 0.785?
println(f'(0.785))
1.281223714282486
Splines: Piecewise Polynomial Approximation
Splines are a finite element method, i.e. there are regions of the function domain where some basis functions are zero.
As such, they provide a much more flexible framework for approximation than a single high-order polynomial.
Keep in mind that polynomial basis functions are non-zero on the entire domain. Remember the Vandermonde matrix.
Splines bring an element of local approximation back into our framework: what happens to the function at one end of the domain does not matter for what happens at the other end.
Looking back at the previous plot of random data: we are searching for one polynomial to fit all those wiggles. A spline will allow us to design different polynomials in different parts of the domain.
Splines: Basic Setup
The fundamental building block is the knot vector, or breakpoints vector, \(\mathbf{z}\) of length \(p\). An element of \(\mathbf{z}\) is \(z_i\).
\(\mathbf{z}\) is ordered in ascending order.
\(\mathbf{z}\) spans the domain \([a,b]\) of our function, and we have that \(a=z_1,b=z_p\)
A spline is of order k if the polynomial segments are k-th order polynomials.
Literature: [@deboor] is the definitive reference for splines.
Splines: Characterization
Given \(p\) knots, there are \(p-1\) polynomial segments of order \(k\), each characterized by \(k+1\) coefficients, i.e. a total of \((p-1)(k+1)\) parameters.
However, we also require the spline to be continuous and differentiable of degree \(k-1\) at the \(p-2\) interior breakpoints.
Imposing that uses up an additional \(k(p-2)\) conditions.
We are left with \(n = (p-1)(k+1) - k(p-2) = p+k-1\) free parameters.
A spline of order \(k\) with \(p\) knots can thus be written as a linear combination of its \(n = p+k-1\) basis functions.
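As a quick check of this count, take \(p = 7\) knots and a cubic spline (\(k = 3\)), the configuration used in the ApproXD example further below: \[ n = (p-1)(k+1) - k(p-2) = 6 \cdot 4 - 3 \cdot 5 = 9 = p + k - 1. \]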
B-Splines: Definition
We mostly use Basis Splines, or B-Splines.
Here is a recursive definition of a B-Spline (and what is used in ApproXD):
Denote the \(j\)-th basis function of degree \(k\) with knot vector \(\mathbf{z}\) at \(x\) as \(B_j^{k,\mathbf{z}} (x)\)
Again, there are \(n = k + p - 1\) \(B\)’s (where \(p =\) length(\(\mathbf{z}\)))
We can define \(B_j^{k,\mathbf{z}} (x)\) recursively (the Cox-de Boor recursion): \[ B_j^{0,\mathbf{z}}(x) = \begin{cases} 1 & \text{if } z_j \leq x < z_{j+1} \\ 0 & \text{otherwise} \end{cases} \qquad B_j^{k,\mathbf{z}}(x) = \frac{x - z_j}{z_{j+k} - z_j} B_j^{k-1,\mathbf{z}}(x) + \frac{z_{j+k+1} - x}{z_{j+k+1} - z_{j+1}} B_{j+1}^{k-1,\mathbf{z}}(x) \]
Similarly, the Integral is just the sum over the basis functions: \[ \int_a^x B_j^{k,\mathbf{z}} (y) dy = \sum_{i=j}^n \frac{z_i - z_{i-k}}{k} B_{i+1}^{k+1,\mathbf{z}} (x) \]
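Below is a naive sketch of the Cox-de Boor recursion (the function name, knot vector, and degree are made up for illustration). Note that packages extend the knot vector at the boundaries to obtain all \(n = p + k - 1\) basis functions, which this bare-bones version does not do.

```julia
# naive Cox-de Boor evaluation of B_j^{k,z}(x); requires knots z[j], ..., z[j+k+1]
function B(j, k, z, x)
    if k == 0
        return z[j] <= x < z[j+1] ? 1.0 : 0.0
    end
    left  = z[j+k]   == z[j]   ? 0.0 : (x - z[j])     / (z[j+k]   - z[j])   * B(j,     k - 1, z, x)
    right = z[j+k+1] == z[j+1] ? 0.0 : (z[j+k+1] - x) / (z[j+k+1] - z[j+1]) * B(j + 1, k - 1, z, x)
    return left + right
end

z = collect(0.0:0.2:1.0)            # p = 6 knots on [0, 1]
k = 2                               # quadratic pieces
sum(B(j, k, z, 0.5) for j in 1:3)   # -> 1.0: the basis functions sum to one on the interior
```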
B-Splines in Action
Here is a nice illustration of spline basis functions from BSplineKit:
It’s important to realize here that even though we have 17 lines on this plot (i.e. 17 basis functions), they are non-zero only over a small portion of the domain of interest (i.e. \([0,1]\) in this example). This improves stability of the approximation.
How to approximate a function with BSplines? Let’s head over to the great documentation of this package to see an example.
Interpolations.jl assumes that data is uniformly spaced on grid 1:N
Same for multiple dimensions
However, we can scale 1:N to different domains
Finally, we can also supply our own, non-uniform grids.
Can get gradients of interpolations right away.
using Interpolations
import Interpolations: interpolate as itper

A = rand(10, 5)
itp = itper(A, Interpolations.BSpline(Quadratic(Reflect(OnCell()))))
itp(1.9, 4.9)
0.3613597951472
A_x = 1.:2.:40.              # x in [1,40]
A = [log(x) for x in A_x]    # f(x)
itp = itper(A, Interpolations.BSpline(Cubic(Interpolations.Line(OnGrid()))))
sitp = scale(itp, A_x)       # scale x-axis of interpolator
@show sitp(3.)               # exactly log(3.)
@show sitp(3.5)              # approximately log(3.5)
@show itp(3);                # is the 3rd index(!) in A, not the *value* 3
We often have situations where several functions are defined on a common support \(x\).
What is \(f(1.75)\)?
using StaticArrays

x = range(1, stop = 3, length = 200)               # cash on hand
a = Float64[log(1 + j) * i for j in x, i in 1:2]   # cons and save function
b = reinterpret(SVector{2,Float64}, a')[:]
itp = itper(b, Interpolations.BSpline(Quadratic(Reflect(OnCell()))))
@show itp(3)
sitp = scale(itp, x)
@show sitp(3);
v = sitp(1.75)   # get interpolated values for both functions
plot(x, a, labels = ["save" "cons"], yticks = convert(Array{Float64}, v), xlabel = "current cash", lw = 3)
vline!([1.75], lab = "")
hline!([v[1], v[2]], lab = "")
Up to now, most of what we did was in one dimension.
Economic problems often have more dimensions than that.
The number of state variables in your value function is the number of dimensions.
We can readily extend what we learned into more dimensions.
However, we will quickly run into feasibility problems: hello curse of dimensionality.
Tensor Product of univariate Basis Functions: Product Rule
One possibility is to approximate e.g. the 2D function \(f(x,y)\) by \[ \hat{f}(x,y) = \sum_{i=1}^n \sum_{j=1}^m c_{i,j} \phi_i^x(x) \phi_j^y(y) \]
Here \(\phi_i^x\) is the basis function in \(x\) space.
You can see that the coefficients \(c_{i,j}\) are indexed in two dimensions now.
Notice that our initial notation was general enough to encompass this case, as we defined the basis functions as \(\mathbb{R}^d \mapsto \mathbb{R}\). So with the product rule, this mapping is just given by \(\phi_i^x(x) \phi_j^y(y)\).
This formulation requires that we take the product of \(\phi_i^x(x), \phi_j^y(y)\) at all combinations of their indices, as is clear from the summations.
This is equivalent to the tensor product between \(\phi_i^x\) and \(\phi_j^y\).
Computing Coefficients from Tensor Product Spaces
Extending this into \(D\) dimensions, where in each dim \(i\) we have \(n_i\) basis functions, we get \[ \hat{f}(x_1,x_2,\dots,x_D) = \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \dots \sum_{i_D=1}^{n_D} c_{i_1,i_2,\dots,i_D} \phi_{i_1}(x_1) \phi_{i_2}(x_2) \dots \phi_{i_D}(x_D) \]
In vector notation, \[
\hat{f}(x_1,x_2,\dots,x_D) = \left[ \phi_{D}(x_D) \otimes \phi_{D-1}(x_{D-1}) \otimes \dots \otimes \phi_{1}(x_1) \right] c \] where \(c\) is an \(n=\Pi_{i=1}^D n_i\) column vector.
The solution is the interpolation equation as before, \[ \begin{aligned}\Phi c =& y \\
\Phi =& \Phi_D \otimes \Phi_{D-1} \otimes \dots \otimes \Phi_{1} \end{aligned} \]
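A small self-contained sketch of this Kronecker-product construction (using monomial bases and a made-up 2D function; a real application would use Chebyshev or spline bases):

```julia
using LinearAlgebra

n1, n2 = 4, 3
x1 = range(-1.0, 1.0, length = n1)    # nodes in dimension 1
x2 = range( 0.0, 2.0, length = n2)    # nodes in dimension 2
Φ1 = [x^j for x in x1, j in 0:n1-1]   # n1 × n1 basis matrix in dimension 1
Φ2 = [x^j for x in x2, j in 0:n2-1]   # n2 × n2 basis matrix in dimension 2

g(a, b) = exp(a) * (1 + b)            # example function
# stack the nodes so that dimension 1 varies fastest, matching kron(Φ2, Φ1)
y = [g(a, b) for b in x2 for a in x1]

Φ = kron(Φ2, Φ1)                      # (n1*n2) × (n1*n2) tensor-product basis matrix
c = Φ \ y                             # interpolation coefficients

# evaluate the approximation at an off-grid point
phi(x, n) = [x^j for j in 0:n-1]
ghat(a, b) = dot(kron(phi(b, n2), phi(a, n1)), c)
ghat(0.3, 1.1), g(0.3, 1.1)           # close, but not exact (exp is not a cubic polynomial)
```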
The Problem with Tensor Product of univariate Basis Functions
What’s the problem?
Well, solving \(\Phi c = y\) is hard.
If we have as many evaluation points as basis functions in each dimension, i.e. if each single \(\Phi_i\) is a square matrix, \(\Phi\) is of size (n,n).
Inverting this is extremely hard even for moderately sized problems.
Sometimes it’s not even possible to allocate \(\Phi\) in memory.
Here it’s important to remember the sparsity structure of a spline basis function.
using ApproXD

gr()
bs = ApproXD.BSpline(7, 3, 0, 1)   # 7 knots, degree 3 in [0,1]
n = 500
eval_points = collect(range(0, stop = 1.0, length = n))
B = Array(ApproXD.getBasis(eval_points, bs))
ys = [string("Basis", i) for i = size(B)[2]-1:-1:0]
heatmap(eval_points, ys, reverse(B', dims = 1))
This is a cubic spline basis: at most \(k+1=4\) basis functions are non-zero at any \(x\).
High Dimensional Functions: Introducing the Smolyak Grid
This is a modification of the Tensor product rule.
It eliminates points from the full tensor product according to their importance for the quality of the approximation.
The user controls this quality parameter, thereby increasing/decreasing the size of the grid.
[@jmmv] is a complete technical reference for this method.
[@maliar-maliar] chapter 4 is very good overview of this topic, and the basis of this part of the lecture.
The Smolyak Grid in 2 Dimensions
Approximation level \(\mu \in \mathbb{N}\) governs the quality of the approximation.
Start with a unidimensional grid of points \(x\): \[ x = \left\{-1,\frac{-1}{\sqrt{2}},0,\frac{1}{\sqrt{2}},1\right\} \] which are 5 Chebyshev nodes (it’s not important that those are Chebyshev nodes, any grid will work).
Then, we construct all possible 2D tensor products using elements from these nested sets in a table (next slide).
Finally, we select only those elements of the table, that satisfy the Smolyak rule: \[ i_1 + i_2 \leq d + \mu \] where \(i_1,i_2\) are column and row index, respectively, and \(d,\mu\) are the number of dimensions and the quality of approximation.
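The selection rule itself is a one-liner. Here is a sketch for \(d=2\) that enumerates the admissible index pairs \((i_1,i_2)\); the function name is made up.

```julia
# index pairs (i1, i2) kept by the Smolyak rule i1 + i2 <= d + mu, for d = 2
smolyak_pairs(d, mu) = [(i1, i2) for i1 in 1:mu+1, i2 in 1:mu+1 if i1 + i2 <= d + mu]

smolyak_pairs(2, 0)   # -> [(1, 1)]
smolyak_pairs(2, 1)   # -> [(1, 1), (2, 1), (1, 2)]
smolyak_pairs(2, 2)   # six pairs, matching the cases listed below
```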
The Smolyak Grid in 2D: Tensor Table
[@maliar-maliar] table 3: All Tensor Products
Selecting Elements
Denote the Smolyak grid for \(d\) dimensions at level \(\mu\) by \(\mathcal{H}^{d,\mu}\).
if \(\mu=0\) we have \(i_1+i_2\leq 2\). Only one point satisfies this, and \[ \mathcal{H}^{2,0} = \{(0,0)\} \]
if \(\mu=1\) we have \(i_1+i_2\leq 3\). Three cases satisfy this:
\(i_1 = 1, i_2=1 \rightarrow (0,0)\)
\(i_1 = 1, i_2=2 \rightarrow (0,0),(0,-1),(0,1)\)
\(i_1 = 2, i_2=1 \rightarrow (0,0),(-1,0),(1,0)\)
Therefore, the unique elements from the union of all of those is \[ \mathcal{H}^{2,1} = \{(0,0),(-1,0),(1,0),(0,-1),(0,1)\} \]
if \(\mu=2\) we have \(i_1+i_2\leq 4\). Six cases satisfy this:
\(i_1 = 1, i_2=1\)
\(i_1 = 1, i_2=2\)
\(i_1 = 2, i_2=1\)
\(i_1 = 1, i_2=3\)
\(i_1 = 2, i_2=2\)
\(i_1 = 3, i_2=1\)
Therefore, the unique elements from the union of all of those is \[ \mathcal{H}^{2,2} = \left\{(-1,1),(0,1),(1,1),(-1,0),(0,0),(1,0),(-1,-1),(0,-1),(1,-1),\left(\frac{-1}{\sqrt{2}},0\right),\left(\frac{1}{\sqrt{2}},0\right),\left(0,\frac{-1}{\sqrt{2}}\right),\left(0,\frac{1}{\sqrt{2}}\right)\right\} \]
Note that those elements are on the diagonal from top left to bottom right expanding through all the tensor products on table 3.
Size of Smolyak Grids
The Smolyak grid grows much more slowly (polynomially, roughly at order \(d^\mu\)) than the tensor product grid (which grows exponentially in \(d\)).
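To get a feel for the numbers, here is a rough comparison for \(d = 10\). The closed-form Smolyak counts \(2d+1\) (for \(\mu=1\)) and \(2d^2+2d+1\) (for \(\mu=2\)) are the ones commonly reported for this construction and are taken as given here.

```julia
# rough grid-size comparison in d = 10 dimensions (Smolyak counts taken as given, see text)
d = 10
tensor   = 5^d               # full tensor product with 5 points per dimension
smolyak1 = 2d + 1            # Smolyak grid with mu = 1
smolyak2 = 2d^2 + 2d + 1     # Smolyak grid with mu = 2
(tensor, smolyak1, smolyak2) # -> (9765625, 21, 221)
```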
[@maliar-maliar] figure 2: Tensor vs Smolyak in 2D
[@maliar-maliar] figure 4: Tensor vs Smolyak in 2D, number of grid points
Smolyak Polynomials
Corresponding to the construction of grid points, there is the Smolyak way of constructing polynomials.
This works exactly as before. We start with a one-dimensional set of basis functions (again Chebyshev here, again irrelevant): \[ \left\{1,x,2x^2-1,4x^3-3x,8x^4-8x^2+1\right\} \]
Denoting \(\mathcal{P}^{d,\mu}\) the Smolyak polynomial, we follow exactly the same steps as for the grids to select elements of the full tensor product table 5:
[@maliar-maliar] figure 5: All Smolyak Polynomials in 2D
Smolyak Interpolation
This proceeds as in the previous cases:
Evaluate \(f\) at all grid points \(\mathcal{H}^{d,\mu}\).
Evaluate the set of basis functions given by \(\mathcal{P}^{d,\mu}\) at all grid points \(\mathcal{H}^{d,\mu}\).
Solve for the interpolating coefficients by inverting the Basis function matrix.
There is a lot of redundancy in computing the grids the way we did it.
More sophisticated approaches take care not to compute repeated elements.
Smolyak Grids in Julia
There are at least two Julia packages that implement this idea: BasisMatrices.jl and Tasmanian.jl.
# simple interpolation
fi(x) = x.^2 - 0.5 * x.^3
sp = SplineParams(collect(grid1), 0, 3)   # nodes, whether only 2 nodes, degree of spline
sb = Basis(sp)
s, n = BasisMatrices.nodes(sb)
sv = fi(s)
coef, bs_direct = funfitxy(sb, s, sv)     # compute coefficients
plot(fi, 1, 3, label = "truth")
newx = 1 .+ rand(10) * 2
newy = funeval(coef, sb, newx)            # evaluate basis at new points
scatter!(newx, newy, label = "approx")
using PlotlyJS
# multiple dimensions
basis = Basis(SplineParams(25, -1, 1, 2), SplineParams(20, -1, 2, 1))  # quadratic and linear spline
# get nodes
X, x12 = BasisMatrices.nodes(basis)
# function to interpolate
f2(x1, x2) = cos.(x2) ./ exp.(x1)
f2(X::Matrix) = f2.(X[:, 1], X[:, 2])
# f at nodes
y = f2(X)
Ymat = [f2(i, j) for i in x12[1], j in x12[2]]
coefs, bbs = funfitxy(basis, X, y)
ynew = funeval(coefs, basis, X[5:5, :])[1]
using Test
@test maximum(abs, ynew - y[5]) <= 1e-12
plotlyjs()
Plots.surface(x12[1], x12[2], Ymat', alpha = 0.7, cbar = false)
xnew = hcat(-1 .+ rand(10) * 2, -1 .+ rand(10) * 3)
ynew = [funeval(coefs, basis, xnew[i, :]) for i in 1:size(xnew, 1)]
Plots.scatter!(xnew[:, 1], xnew[:, 2], ynew, markersize = 2, markershape = :rect)
# same with the smolyak grid
# multiple dimensions
using LinearAlgebra

sp = SmolyakParams(2, 3, [-1, -1], [1, 2])
basis = Basis(sp)
# get nodes
X, x12 = BasisMatrices.nodes(basis)
# f at nodes
y = f2(X)
Ymat = [f2(i, j) for i in unique(X[:, 1]), j in unique(X[:, 2])]
# map domain into unit hypercube
cube = BasisMatrices.dom2cube(X, basis.params[1])
# evaluate basis matrix
eb = BasisMatrices.build_B(sp.d, sp.mu, cube, sp.pinds)
coef = pinv(eb) * y
xnew = hcat(-1 .+ rand(30) * 2, -1 .+ rand(30) * 3)
cube = BasisMatrices.dom2cube(xnew, sp)
eb = BasisMatrices.build_B(sp.d, sp.mu, cube, sp.pinds)
ynew = eb * coef
scatter(X[:, 1], X[:, 2], y, markersize = 2)
scatter!(xnew[:, 1], xnew[:, 2], ynew, markersize = 2, markershape = :rect)
Tasmanian.jl is a simple wrapper around the Tasmanian sparse grid library.
Simon Scheidegger has many useful resources on his website; in particular, his joint paper with Johannes Brumm features an application of sparse grids to high-dimensional economic problems.