2024-08-11 13:02:16 -04:00

{
"cells": [
{
"cell_type": "markdown",
"id": "134e7f9d",
"metadata": {},
"source": [
"# API 5: Grid"
]
},
{
"cell_type": "markdown",
"id": "2571d531",
"metadata": {},
"source": [
"A key feature of KANs is that they embed splines in neural networks. However, a spline can only approximate a function on a known, bounded region, while the range of activations in a neural network may change during training. The grids therefore have to be updated to track that range. Let's first look at how splines are parametrized."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "2075ef56",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'B_i(x)')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from kan.spline import B_batch\n",
"import torch\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from kan.spline import extend_grid\n",
"\n",
"# consider a 1D example.\n",
"# Suppose we have grid in [-1,1] with G intervals, spline order k\n",
"G = 5\n",
"k = 3\n",
"grid = torch.linspace(-1,1,steps=G+1)[None,:]\n",
"grid = extend_grid(grid, k_extend=k)\n",
"\n",
"# and we have sample range in [-1,1]\n",
"x = torch.linspace(-1,1,steps=1001)[None,:]\n",
"\n",
"basis = B_batch(x, grid, k=k)\n",
"\n",
"for i in range(G+k):\n",
" plt.plot(x[0].detach().numpy(), basis[0,:,i].detach().numpy())\n",
" \n",
"plt.legend(['B_{}(x)'.format(i) for i in np.arange(G+k)])\n",
"plt.xlabel('x')\n",
"plt.ylabel('B_i(x)')"
]
},
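{
"cell_type": "markdown",
"id": "e1a0c001",
"metadata": {},
"source": [
"To make the construction above concrete, here is a minimal NumPy sketch of what the two library calls appear to do. It is written under stated assumptions rather than copied from `kan.spline`: `extend_knots` (a hypothetical helper) pads a uniform grid with `k` extra knots per side at the same spacing, and `bspline_basis` (also hypothetical) evaluates the basis with the standard Cox-de Boor recursion. With $G$ intervals and order $k$, the extended grid has $G+2k+1$ knots, which yields $G+k$ basis functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a0c002",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def extend_knots(grid, k):\n",
"    # hypothetical helper: pad a uniform 1D grid with k extra knots per side, same spacing\n",
"    h = (grid[-1] - grid[0]) / (len(grid) - 1)\n",
"    left = grid[0] - h * np.arange(k, 0, -1)\n",
"    right = grid[-1] + h * np.arange(1, k + 1)\n",
"    return np.concatenate([left, grid, right])\n",
"\n",
"def bspline_basis(x, knots, k):\n",
"    # hypothetical helper: Cox-de Boor recursion, returns shape (len(x), len(knots) - k - 1)\n",
"    x = x[:, None]\n",
"    # order 0: indicator of each half-open knot interval [t_i, t_{i+1})\n",
"    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)\n",
"    for p in range(1, k + 1):\n",
"        left = (x - knots[:-(p + 1)]) / (knots[p:-1] - knots[:-(p + 1)]) * B[:, :-1]\n",
"        right = (knots[p + 1:] - x) / (knots[p + 1:] - knots[1:-p]) * B[:, 1:]\n",
"        B = left + right\n",
"    return B\n",
"\n",
"G_demo, k_demo = 5, 3\n",
"knots = extend_knots(np.linspace(-1, 1, G_demo + 1), k_demo)\n",
"B = bspline_basis(np.linspace(-1, 1, 1001), knots, k_demo)\n",
"print(knots)    # G + 2k + 1 = 12 knots with spacing 0.4\n",
"print(B.shape)  # (1001, 8), i.e. G + k basis functions\n",
"print(np.allclose(B.sum(axis=1), 1.0))  # partition-of-unity check on [-1, 1]"
]
},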
{
"cell_type": "markdown",
"id": "75af662c",
"metadata": {},
"source": [
"There are $G+k$ B-spline basis functions. The spline is a linear combination of these basis functions, $${\\rm spline}(x)=\\sum_{i=0}^{G+k-1} c_i B_i(x).$$ We don't need to worry about the implementation since it is already built into KAN, but let's check that KAN indeed implements this. We initialize a [1,1] KAN, which is simply a 1D spline."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "ccfecd98",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Parameter containing:\n",
"tensor([[[ 0.0781, 0.0073, -0.0178, -0.0140, 0.0396, -0.0596, 0.0312,\n",
" 0.0469]]], requires_grad=True)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.act_fun[0].coef"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "c3461a32",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([1001, 8])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basis[0].shape"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "ac751154",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0.0781, 0.0073, -0.0178, -0.0140, 0.0396, -0.0596, 0.0312, 0.0469]],\n",
" grad_fn=<SelectBackward0>)"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.act_fun[0].coef[0]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "4369a310",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"checkpoint directory created: ./model\n",
"saving model version 0.0\n"
]
},
{
"data": {
"text/plain": [
"tensor(0.0040, grad_fn=<MeanBackward0>)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from kan import KAN\n",
"\n",
"model = KAN(width=[1,1], grid=G, k=k)\n",
"# obtain coefficients c_i\n",
"model.act_fun[0].coef\n",
"assert(model.act_fun[0].coef[0].shape[1] == G+k)\n",
"\n",
"# the model forward\n",
"model_output = model(x[0][:,None])\n",
"\n",
"# spline output\n",
"spline_output = torch.einsum('j,ij->i',model.act_fun[0].coef[0][0], basis[0])[:,None]\n",
"\n",
"torch.mean((model_output - spline_output)**2)"
]
},
{
"cell_type": "markdown",
"id": "82150587",
"metadata": {},
"source": [
"They are not the same. What is happening? Recall that the activation function is modeled as the sum of two parts, a residual function $b(x)$ plus the spline function, i.e., $$\\phi(x)={\\rm scale\\_base}*b(x)+{\\rm scale\\_sp}*{\\rm spline}(x),$$ and by default $b(x)={\\rm silu}(x)=x/(1+e^{-x})$."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "7d76a3c4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(0., grad_fn=<MeanBackward0>)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# residual output\n",
"residual_output = torch.nn.SiLU()(x[0][:,None])\n",
"scale_base = model.act_fun[0].scale_base\n",
"scale_sp = model.act_fun[0].scale_sp\n",
"torch.mean((model_output - (scale_base * residual_output + scale_sp * spline_output))**2)"
]
},
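{
"cell_type": "markdown",
"id": "e1a0c003",
"metadata": {},
"source": [
"As a quick sanity check of the default residual function, the closed form $b(x)=x/(1+e^{-x})$ can be compared against PyTorch's built-in SiLU. This is only an illustrative sketch; the variable names `xs` and `closed_form` are ours."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a0c004",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"xs = torch.linspace(-3, 3, steps=7)\n",
"# closed form of SiLU: x * sigmoid(x) = x / (1 + exp(-x))\n",
"closed_form = xs / (1 + torch.exp(-xs))\n",
"print(torch.allclose(torch.nn.functional.silu(xs), closed_form))"
]
},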
{
"cell_type": "markdown",
"id": "3d72e076",
"metadata": {},
"source": [
"What if the grid does not match the data? For example, the grid is in [-1,1], but the data is in [-10,10] or [-0.5,0.5]. Use `update_grid_from_samples` to adjust the grids to the samples. This grid update applies to all splines in all layers."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "46717e8b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"checkpoint directory created: ./model\n",
"saving model version 0.0\n",
"Parameter containing:\n",
"tensor([[-2.2000, -1.8000, -1.4000, -1.0000, -0.6000, -0.2000, 0.2000, 0.6000,\n",
" 1.0000, 1.4000, 1.8000, 2.2000]])\n",
"Parameter containing:\n",
"tensor([[-22., -18., -14., -10., -6., -2., 2., 6., 10., 14., 18., 22.]])\n"
]
}
],
"source": [
"model = KAN(width=[1,1], grid=G, k=k)\n",
"print(model.act_fun[0].grid) # by default, the grid is in [-1,1]\n",
"x = torch.linspace(-10,10,steps = 1001)[:,None]\n",
"model.update_grid_from_samples(x)\n",
"print(model.act_fun[0].grid) # now the inner grid spans [-10,10]. We add a 0.01 margin in case x has zero variance"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "de04db15",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"checkpoint directory created: ./model\n",
"saving model version 0.0\n",
"Parameter containing:\n",
"tensor([[-2.2000, -1.8000, -1.4000, -1.0000, -0.6000, -0.2000, 0.2000, 0.6000,\n",
" 1.0000, 1.4000, 1.8000, 2.2000]])\n",
"Parameter containing:\n",
"tensor([[-1.1000, -0.9000, -0.7000, -0.5000, -0.3000, -0.1000, 0.1000, 0.3000,\n",
" 0.5000, 0.7000, 0.9000, 1.1000]])\n"
]
}
],
"source": [
"model = KAN(width=[1,1], grid=G, k=k)\n",
"print(model.act_fun[0].grid) # by default, the grid is in [-1,1]\n",
"x = torch.linspace(-0.5,0.5,steps = 1001)[:,None]\n",
"model.update_grid_from_samples(x)\n",
"print(model.act_fun[0].grid) # now the inner grid spans [-0.5,0.5], matching the samples"
]
},
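{
"cell_type": "markdown",
"id": "e1a0c005",
"metadata": {},
"source": [
"The two grids printed above are consistent with a simple rule: place $G+1$ uniformly spaced knots between the smallest and largest sample, then pad with $k$ knots on each side. The sketch below implements that rule in plain PyTorch. It is our reading of the printed output, not the library's code; `uniform_grid_from_samples` is a hypothetical name, and any safety margin is omitted."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a0c006",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"def uniform_grid_from_samples(samples, G, k):\n",
"    # hypothetical helper: samples is a 1D tensor; returns G + 2k + 1 uniform knots\n",
"    lo, hi = samples.min(), samples.max()\n",
"    h = (hi - lo) / G\n",
"    # indices -k..G+k include the k padding knots on each side\n",
"    return lo + h * torch.arange(-k, G + k + 1)\n",
"\n",
"# inner knots span the sample range; the outer k knots on each side are padding\n",
"print(uniform_grid_from_samples(torch.linspace(-0.5, 0.5, 1001), G=5, k=3))\n",
"print(uniform_grid_from_samples(torch.linspace(-10, 10, 1001), G=5, k=3))"
]
},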
{
"cell_type": "markdown",
"id": "e418ca2c",
"metadata": {},
"source": [
"Uniform or non-uniform grid? We consider two options: (1) a uniform grid; (2) an adaptive grid (based on the sample distribution) such that each interval contains (roughly) the same number of samples. The parameter grid_eps interpolates between these two regimes: grid_eps = 1 gives (1), and grid_eps = 0 gives (2). By default we set grid_eps = 1 (uniform grid). Other options are possible but out of scope here."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "d2c4f636",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"checkpoint directory created: ./model\n",
"saving model version 0.0\n",
"Parameter containing:\n",
"tensor([[-2.2000, -1.8000, -1.4000, -1.0000, -0.6000, -0.2000, 0.2000, 0.6000,\n",
" 1.0000, 1.4000, 1.8000, 2.2000]])\n",
"Parameter containing:\n",
"tensor([[-7.4371, -6.0845, -4.7319, -3.3793, -2.0267, -0.6741, 0.6785, 2.0311,\n",
" 3.3837, 4.7363, 6.0889, 7.4415]])\n"
]
}
],
"source": [
"# uniform grid\n",
"model = KAN(width=[1,1], grid=G, k=k)\n",
"print(model.act_fun[0].grid) # by default, the grid is in [-1,1]\n",
"x = torch.normal(0,1,size=(1000,1))\n",
"model.update_grid_from_samples(x)\n",
"print(model.act_fun[0].grid) # now the inner grid spans the sample range [min(x), max(x)] with uniform spacing"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "b9b354c6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"checkpoint directory created: ./model\n",
"saving model version 0.0\n",
"Parameter containing:\n",
"tensor([[-2.2000, -1.8000, -1.4000, -1.0000, -0.6000, -0.2000, 0.2000, 0.6000,\n",
" 1.0000, 1.4000, 1.8000, 2.2000]])\n",
"Parameter containing:\n",
"tensor([[-7.4371, -6.0845, -4.7319, -3.3793, -0.8336, -0.2805, 0.2751, 0.8132,\n",
" 3.3837, 4.7363, 6.0889, 7.4415]])\n"
]
}
],
"source": [
"# adaptive grid based on sample distribution\n",
"model = KAN(width=[1,1], grid=G, k=k, grid_eps = 0.)\n",
"print(model.act_fun[0].grid) # by default, the grid is in [-1,1]\n",
"x = torch.normal(0,1,size=(1000,1))\n",
"model.update_grid_from_samples(x)\n",
"print(model.act_fun[0].grid) # now the inner knots follow the sample distribution, so they cluster near 0"
]
},
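{
"cell_type": "markdown",
"id": "e1a0c007",
"metadata": {},
"source": [
"The interpolation controlled by `grid_eps` can be sketched as a convex combination of a uniform grid and a quantile-based grid. This is a minimal illustration under our own assumptions, not the library's implementation: the helper name `blended_grid` is hypothetical, and the extension by $k$ knots on each side is left out."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a0c008",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"def blended_grid(samples, G, grid_eps):\n",
"    # hypothetical helper: samples is a 1D tensor; returns G + 1 knots on [min, max]\n",
"    s = torch.sort(samples).values\n",
"    # adaptive knots: evenly spaced ranks, so each interval holds roughly equal counts\n",
"    idx = torch.linspace(0, len(s) - 1, G + 1).round().long()\n",
"    adaptive = s[idx]\n",
"    # uniform knots over the same range\n",
"    uniform = torch.linspace(s[0].item(), s[-1].item(), G + 1)\n",
"    return grid_eps * uniform + (1 - grid_eps) * adaptive\n",
"\n",
"torch.manual_seed(0)\n",
"samples = torch.normal(0, 1, size=(1000,))\n",
"print(blended_grid(samples, G=5, grid_eps=1.0))  # uniform spacing\n",
"print(blended_grid(samples, G=5, grid_eps=0.0))  # knots cluster where samples are dense\n",
"print(blended_grid(samples, G=5, grid_eps=0.5))  # halfway between the two"
]
},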
{
"cell_type": "code",
"execution_count": null,
"id": "f7b8f994",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}