GitHub_collection_pykan/tutorials/API_11_create_dataset.ipynb
2024-08-11 13:02:16 -04:00

{
"cells": [
{
"cell_type": "markdown",
"id": "53ff2e87",
"metadata": {},
"source": [
"# API 11: Create dataset"
]
},
{
"cell_type": "markdown",
"id": "25a90774",
"metadata": {},
"source": [
"How to use `create_dataset` from `kan.utils` to build a dataset from a function."
]
},
{
"cell_type": "markdown",
"id": "2f9ae0c7",
"metadata": {},
"source": [
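"Standard way. Index each input column with a list, e.g. `x[:,[0]]`, so that `f` returns a tensor of shape `(n_samples, 1)`."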
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3e2b9f8b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([1000, 1])"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from kan.utils import create_dataset\n",
"\n",
"f = lambda x: x[:,[0]] * x[:,[1]]\n",
"dataset = create_dataset(f, n_var=2)\n",
"dataset['train_label'].shape"
]
},
{
"cell_type": "markdown",
"id": "877956c9",
"metadata": {},
"source": [
"Lazier way. It is easy to forget the inner brackets and write `x[:,[0]]` as `x[:,0]`, which makes `f` return a 1D tensor. This used to cause problems in training (the loss would not go down). `create_dataset` now detects this simplification automatically and produces the correct behavior."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b14dd4a2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([1000, 1])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f = lambda x: x[:,0] * x[:,1]\n",
"dataset = create_dataset(f, n_var=2)\n",
"dataset['train_label'].shape"
]
},
{
"cell_type": "markdown",
"id": "60230da4",
"metadata": {},
"source": [
"Laziest way. If you also want to drop the colon, i.e., write `x[:,0]` as `x[0]`, you can, but you need to pass `f_mode='row'`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e764f415",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([1000, 1])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f = lambda x: x[0] * x[1]\n",
"dataset = create_dataset(f, n_var=2, f_mode='row')\n",
"dataset['train_label'].shape"
]
},
{
"cell_type": "markdown",
"id": "8e1f1732",
"metadata": {},
"source": [
"If you already have x (inputs) and y (outputs) and only want to partition them into train/test sets, use `create_dataset_from_data`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "accf900a",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from kan.utils import create_dataset_from_data\n",
"\n",
"x = torch.rand(100,2)  # 100 samples, 2 input features\n",
"y = torch.rand(100,1)  # 100 samples, 1 output\n",
"dataset = create_dataset_from_data(x, y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c45062a8",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}