{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Time Series Forecasting with FinTorch and NeuralForecast\n",
    "\n",
    "In this tutorial, we will demonstrate how to use the [FinTorch](https://github.com/AI4FinTech/FinTorch) Python package to perform time series forecasting on the `AuctionDataset`, which is related to the Kaggle competition [Trading at the Close](https://www.kaggle.com/competitions/optiver-trading-at-the-close/overview). \n",
    "\n",
    "We will utilize state-of-the-art models from the [NeuralForecast](https://github.com/Nixtla/neuralforecast) package, including `NHITS`, `BiTCN`, `NBEATS`, and `NBEATSx`, to forecast auction data.\n",
    "\n",
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/AI4FinTech/FinTorch/blob/main/docs/tutorials/tradingattheclose/tradingattheclose.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install fintorch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Note:** For GPU acceleration, install the GPU-compatible version of PyTorch.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: colab Kaggle setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To download the dataset from Kaggle in Colab, you need to set your kaggle username and kaggle secret."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First configure the `KAGGLE_USERNAME` and `KAGGLE_SECRET` in colab.\n",
    "![Colab secrets](colabsecrets.png)\n",
    "\n",
    "Next, make the secrets available as environment variables as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import userdata\n",
    "import os\n",
    "\n",
    "os.environ[\"KAGGLE_KEY\"] = userdata.get('KAGGLE_KEY')\n",
    "os.environ[\"KAGGLE_USERNAME\"] = userdata.get('KAGGLE_USERNAME')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset Background\n",
    "\n",
    "### Trading at the Close\n",
    "\n",
    "Stock exchanges are dynamic environments where every second counts, and the final moments of the trading day are particularly critical. On the Nasdaq Stock Exchange, the trading day concludes with the **Nasdaq Closing Cross** auction. This process determines the official closing prices for securities listed on the exchange, serving as key indicators for investors and analysts in evaluating market performance.\n",
    "\n",
    "Approximately 10% of Nasdaq's average daily volume occurs during this closing auction. The auction provides true price and size discovery, determining benchmark prices for index funds and various investment strategies. Market makers play a crucial role in this process by consolidating information from both the traditional order book and the auction book during the last ten minutes of trading.\n",
    "\n",
    "### The Challenge\n",
    "\n",
    "In this tutorial, we aim to develop a model capable of predicting the closing price movements for hundreds of Nasdaq-listed stocks using data from both the order book and the closing auction. Accurate predictions can enhance market efficiency and accessibility, especially during the intense final moments of trading.\n",
    "\n",
    "### Understanding the Order Book and Auction Mechanics\n",
    "\n",
    "#### Order Book\n",
    "\n",
    "The **order book** is an electronic ledger of buy (bid) and sell (ask) orders for a specific security, organized by price levels. It displays the interest of buyers and sellers, helping market participants gauge supply and demand.\n",
    "\n",
    "In continuous trading:\n",
    "\n",
    "- **Best Bid**: The highest price a buyer is willing to pay.\n",
    "- **Best Ask**: The lowest price a seller is willing to accept.\n",
    "- Orders are matched when the bid price meets or exceeds the ask price.\n",
    "\n",
    "#### Auction Order Book\n",
    "\n",
    "The **auction order book** differs from the continuous trading order book:\n",
    "\n",
    "- Orders are collected over a predefined timeframe but are not immediately matched.\n",
    "- The auction culminates at a specific time, matching orders at a single price known as the **uncross price**.\n",
    "- The goal is to maximize the number of matched shares.\n",
    "\n",
    "Key terms:\n",
    "\n",
    "- **Uncross Price**: The price at which the maximum number of shares can be matched.\n",
    "- **Matched Size**: The total number of shares matched at the uncross price.\n",
    "- **Imbalance**: The difference between the number of buy and sell orders that remain unmatched at the uncross price.\n",
    "\n",
    "#### Combining Order Books\n",
    "\n",
    "Merging the traditional order book with the auction book provides a comprehensive view of market interest across price levels. This combined book aids in better price discovery, allowing for a more accurate equilibrium price when the auction uncrosses.\n",
    "\n",
    "Additional terms:\n",
    "\n",
    "- **Near Price**: The hypothetical uncross price of the combined book, provided by Nasdaq five minutes before the closing auction.\n",
    "- **Far Price**: The hypothetical uncross price based solely on the auction book.\n",
    "- **Reference Price**: An indicator of the fair price, calculated based on the near price and bounded by the best bid and ask prices.\n",
    "\n",
    "### The Data\n",
    "\n",
    "The dataset we use is sourced from the Kaggle competition [Trading at the Close](https://www.kaggle.com/competitions/optiver-trading-at-the-close/overview). It provides comprehensive data on the Nasdaq Closing Cross auction, including both order book and auction book data.\n",
    "\n",
    "For a detailed exploration of the dataset and the auction mechanisms, please refer to this excellent notebook: [Optiver Trading at the Close Introduction](https://www.kaggle.com/code/tomforbes/optiver-trading-at-the-close-introduction).\n",
    "\n",
    "In this tutorial, we prepare training and test sets that can be directly used with the NeuralForecast library, formatted as Polars DataFrames for efficient processing. The `AuctionDataset` is implemented as a PyTorch dataset, which returns tensors through the `__getitem__` method when iterating over the dataset. This design allows seamless integration with PyTorch-based models and facilitates efficient data handling during model training and evaluation.\n",
    "\n",
    "### Objective\n",
    "\n",
    "Our objective is to use state-of-the-art neural forecasting models to predict the closing prices of stocks. By accurately forecasting these prices, we contribute to improved market efficiency and provide valuable insights into the supply and demand dynamics during the critical closing moments of trading."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing Libraries\n",
    "\n",
    "First, import the necessary libraries and set up the environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "from pathlib import Path\n",
    "\n",
    "import polars as pl\n",
    "import torch\n",
    "from neuralforecast import NeuralForecast\n",
    "from neuralforecast.models import NBEATS, NHITS, BiTCN, NBEATSx\n",
    "from neuralforecast.losses.numpy import mae, mse\n",
    "\n",
    "from fintorch.datasets.auctiondata import AuctionDataset\n",
    "\n",
    "# Set up logging\n",
    "logging.basicConfig(level=logging.INFO)\n",
    "torch.set_float32_matmul_precision(\"medium\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the AuctionDataset\n",
    "\n",
    "Load the `AuctionDataset` from FinTorch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Load auction data\n"
     ]
    }
   ],
   "source": [
    "# Define the data path\n",
    "data_path = Path(\"~/.fintorch_data/auctiondata-optiver/\").expanduser()\n",
    "\n",
    "# Load the auction data\n",
    "auction_data = AuctionDataset(data_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining Model Parameters\n",
    "\n",
    "Set common parameters for the models:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_size = 30          # Number of past time steps used for prediction\n",
    "days = 3                 # Number of days to forecast\n",
    "steps_per_day = 55       # Number of steps per day\n",
    "horizon = days * steps_per_day  # Forecast horizon\n",
    "max_steps = 10           # Max training steps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initializing the Models\n",
    "\n",
    "Initialize the models with the defined parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Seed set to 1\n",
      "Seed set to 1\n",
      "Seed set to 1\n",
      "Seed set to 1\n"
     ]
    }
   ],
   "source": [
    "# Initialize the models\n",
    "models = [\n",
    "    NHITS(\n",
    "        input_size=input_size,\n",
    "        h=horizon,\n",
    "        futr_exog_list=[\"wap\", \"bid_price\", \"ask_price\"],\n",
    "        scaler_type=\"robust\",\n",
    "        max_steps=max_steps,\n",
    "    ),\n",
    "    BiTCN(\n",
    "        input_size=input_size,\n",
    "        h=horizon,\n",
    "        futr_exog_list=[\"wap\", \"bid_price\", \"ask_price\"],\n",
    "        scaler_type=\"robust\",\n",
    "        max_steps=max_steps,\n",
    "    ),\n",
    "    NBEATS(\n",
    "        input_size=input_size,\n",
    "        h=horizon,\n",
    "        max_steps=max_steps,\n",
    "    ),\n",
    "    NBEATSx(\n",
    "        input_size=input_size,\n",
    "        futr_exog_list=[\"wap\", \"bid_price\", \"ask_price\"],\n",
    "        h=horizon,\n",
    "        max_steps=max_steps,\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing the Data\n",
    "\n",
    "Select relevant columns from the training data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define validation and test sizes\n",
    "val_size = horizon  # Validation set size\n",
    "test_size = horizon  # Test set size\n",
    "\n",
    "# Select necessary columns\n",
    "train_df = auction_data.train.select(\n",
    "    [\n",
    "        \"y\",\n",
    "        \"ds\",\n",
    "        \"unique_id\",\n",
    "        \"wap\",\n",
    "        \"imbalance_size\",\n",
    "        \"imbalance_buy_sell_flag\",\n",
    "        \"reference_price\",\n",
    "        \"matched_size\",\n",
    "        \"bid_price\",\n",
    "        \"ask_price\",\n",
    "        \"ask_size\",\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performing Cross-Validation\n",
    "\n",
    "Create a `NeuralForecast` object and perform cross-validation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/marcel/Documents/research/FinTorch/.conda/lib/python3.11/site-packages/neuralforecast/common/_base_model.py:346: UserWarning: val_check_steps is greater than max_steps, setting val_check_steps to max_steps.\n",
      "  warnings.warn(\n",
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
      "\n",
      "  | Name         | Type          | Params | Mode \n",
      "-------------------------------------------------------\n",
      "0 | loss         | MAE           | 0      | train\n",
      "1 | padder_train | ConstantPad1d | 0      | train\n",
      "2 | scaler       | TemporalNorm  | 0      | train\n",
      "3 | blocks       | ModuleList    | 3.2 M  | train\n",
      "-------------------------------------------------------\n",
      "3.2 M     Trainable params\n",
      "0         Non-trainable params\n",
      "3.2 M     Total params\n",
      "12.763    Total estimated model params size (MB)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1:  43%|████▎     | 3/7 [00:00<00:00,  5.71it/s, v_num=8, train_loss_step=2.440, train_loss_epoch=2.320, valid_loss=7.210]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "`Trainer.fit` stopped: `max_steps=10` reached.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1:  43%|████▎     | 3/7 [00:00<00:00,  5.70it/s, v_num=8, train_loss_step=2.440, train_loss_epoch=2.320, valid_loss=7.210]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicting DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 27.58it/s] \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
      "\n",
      "   | Name          | Type          | Params | Mode \n",
      "---------------------------------------------------------\n",
      "0  | loss          | MAE           | 0      | train\n",
      "1  | padder_train  | ConstantPad1d | 0      | train\n",
      "2  | scaler        | TemporalNorm  | 0      | train\n",
      "3  | lin_hist      | Linear        | 80     | train\n",
      "4  | drop_hist     | Dropout       | 0      | train\n",
      "5  | net_bwd       | Sequential    | 5.4 K  | train\n",
      "6  | lin_futr      | Linear        | 64     | train\n",
      "7  | drop_futr     | Dropout       | 0      | train\n",
      "8  | net_fwd       | Sequential    | 8.6 K  | train\n",
      "9  | drop_temporal | Dropout       | 0      | train\n",
      "10 | temporal_lin1 | Linear        | 496    | train\n",
      "11 | temporal_lin2 | Linear        | 2.8 K  | train\n",
      "12 | output_lin    | Linear        | 49     | train\n",
      "---------------------------------------------------------\n",
      "17.4 K    Trainable params\n",
      "0         Non-trainable params\n",
      "17.4 K    Total params\n",
      "0.070     Total estimated model params size (MB)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1:  43%|████▎     | 3/7 [00:00<00:00,  4.95it/s, v_num=10, train_loss_step=2.800, train_loss_epoch=2.320, valid_loss=6.510]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "`Trainer.fit` stopped: `max_steps=10` reached.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1:  43%|████▎     | 3/7 [00:00<00:00,  4.94it/s, v_num=10, train_loss_step=2.800, train_loss_epoch=2.320, valid_loss=6.510]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicting DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 25.55it/s] "
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "  | Name         | Type          | Params | Mode \n",
      "-------------------------------------------------------\n",
      "0 | loss         | MAE           | 0      | train\n",
      "1 | padder_train | ConstantPad1d | 0      | train\n",
      "2 | scaler       | TemporalNorm  | 0      | train\n",
      "3 | blocks       | ModuleList    | 2.9 M  | train\n",
      "-------------------------------------------------------\n",
      "2.9 M     Trainable params\n",
      "64.5 K    Non-trainable params\n",
      "2.9 M     Total params\n",
      "11.663    Total estimated model params size (MB)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1:  43%|████▎     | 3/7 [00:00<00:00,  5.93it/s, v_num=12, train_loss_step=7.140, train_loss_epoch=7.030, valid_loss=6.290]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "`Trainer.fit` stopped: `max_steps=10` reached.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1:  43%|████▎     | 3/7 [00:00<00:00,  5.91it/s, v_num=12, train_loss_step=7.140, train_loss_epoch=7.030, valid_loss=6.290]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicting DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 29.18it/s] \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
      "\n",
      "  | Name         | Type          | Params | Mode \n",
      "-------------------------------------------------------\n",
      "0 | loss         | MAE           | 0      | train\n",
      "1 | padder_train | ConstantPad1d | 0      | train\n",
      "2 | scaler       | TemporalNorm  | 0      | train\n",
      "3 | blocks       | ModuleList    | 3.8 M  | train\n",
      "-------------------------------------------------------\n",
      "3.7 M     Trainable params\n",
      "64.5 K    Non-trainable params\n",
      "3.8 M     Total params\n",
      "15.257    Total estimated model params size (MB)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1:  43%|████▎     | 3/7 [00:00<00:00,  5.62it/s, v_num=14, train_loss_step=7.750, train_loss_epoch=7.600, valid_loss=6.450]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "`Trainer.fit` stopped: `max_steps=10` reached.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1:  43%|████▎     | 3/7 [00:00<00:00,  5.61it/s, v_num=14, train_loss_step=7.750, train_loss_epoch=7.600, valid_loss=6.450]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicting DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 28.11it/s] \n"
     ]
    }
   ],
   "source": [
    "# Create NeuralForecast object\n",
    "nf = NeuralForecast(models=models, freq=\"10s\")\n",
    "\n",
    "# Perform cross-validation\n",
    "Y_hat_df = nf.cross_validation(\n",
    "    df=train_df,\n",
    "    val_size=val_size,\n",
    "    test_size=test_size,\n",
    "    n_windows=None,  # Uses expanding window if None\n",
    ").to_pandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluating the Models\n",
    "\n",
    "Compute MSE and MAE for each model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Model: NHITS\n",
      "MSE: 103.54132703286024\n",
      "MAE: 7.068754453999026\n",
      "\n",
      "Model: BiTCN\n",
      "MSE: 100.66011143269779\n",
      "MAE: 6.521046070610383\n",
      "\n",
      "Model: NBEATS\n",
      "MSE: 63.95072834895031\n",
      "MAE: 5.594107221445525\n",
      "\n",
      "Model: NBEATSx\n",
      "MSE: 73.7288435229088\n",
      "MAE: 5.9910440166960015\n"
     ]
    }
   ],
   "source": [
    "# List of model names\n",
    "model_names = [\"NHITS\", \"BiTCN\", \"NBEATS\", \"NBEATSx\"]\n",
    "\n",
    "# Number of unique series\n",
    "n_series = len(auction_data.train[\"unique_id\"].unique())\n",
    "\n",
    "# Iterate over models to compute metrics\n",
    "for model_name in model_names:\n",
    "    # Extract true values and predictions\n",
    "    y_true = Y_hat_df.y.values\n",
    "    y_hat = Y_hat_df[model_name].values\n",
    "\n",
    "    # Reshape arrays\n",
    "    y_true = y_true.reshape(n_series, -1, horizon)\n",
    "    y_hat = y_hat.reshape(n_series, -1, horizon)\n",
    "\n",
    "    # Compute metrics\n",
    "    mse_score = mse(y_true, y_hat)\n",
    "    mae_score = mae(y_true, y_hat)\n",
    "\n",
    "    # Print results\n",
    "    print(f\"\\nModel: {model_name}\")\n",
    "    print(f\"MSE: {mse_score}\")\n",
    "    print(f\"MAE: {mae_score}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Making Predictions on the Test Set\n",
    "\n",
    "Prepare the test data and make predictions:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicting DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 37.01it/s] \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicting DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 34.14it/s] \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicting DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 33.87it/s] "
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "HPU available: False, using: 0 HPUs\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Predicting DataLoader 0: 100%|██████████| 7/7 [00:00<00:00, 36.93it/s] \n"
     ]
    }
   ],
   "source": [
    "# Prepare the future dataframe\n",
    "fcsts_df = nf.make_future_dataframe()\n",
    "\n",
    "# Select columns from test data\n",
    "selected_data = auction_data.test.select(\n",
    "    [\n",
    "        \"unique_id\",\n",
    "        \"ds\",\n",
    "        \"wap\",\n",
    "        \"imbalance_size\",\n",
    "        \"imbalance_buy_sell_flag\",\n",
    "        \"reference_price\",\n",
    "        \"matched_size\",\n",
    "        \"bid_price\",\n",
    "        \"ask_price\",\n",
    "        \"ask_size\",\n",
    "        \"row_id\",\n",
    "        \"time_id\",\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Add 'y' column filled with zeros due to a requirement in NeuralForecast\n",
    "selected_data = selected_data.with_columns(pl.lit(0).alias(\"y\"))\n",
    "\n",
    "# Make predictions\n",
    "fcsts_df = nf.predict(futr_df=selected_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Displaying Predictions\n",
    "\n",
    "Join predictions with test data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predictions on the test dataset by all models:\n",
      "shape: (33_000, 17)\n",
      "┌───────────┬───────────┬───────────┬───────────┬───┬───────────┬─────────────┬─────────┬─────┐\n",
      "│ NHITS     ┆ BiTCN     ┆ NBEATS    ┆ NBEATSx   ┆ … ┆ ask_size  ┆ row_id      ┆ time_id ┆ y   │\n",
      "│ ---       ┆ ---       ┆ ---       ┆ ---       ┆   ┆ ---       ┆ ---         ┆ ---     ┆ --- │\n",
      "│ f32       ┆ f32       ┆ f32       ┆ f32       ┆   ┆ f64       ┆ str         ┆ i64     ┆ i32 │\n",
      "╞═══════════╪═══════════╪═══════════╪═══════════╪═══╪═══════════╪═════════════╪═════════╪═════╡\n",
      "│ 1.783648  ┆ -0.212747 ┆ 0.895329  ┆ 1.31668   ┆ … ┆ 9177.6    ┆ 478_0_0     ┆ 26290   ┆ 0   │\n",
      "│ 0.22948   ┆ 6.136395  ┆ -0.696216 ┆ 0.21183   ┆ … ┆ 19692.0   ┆ 478_0_1     ┆ 26290   ┆ 0   │\n",
      "│ 13.327369 ┆ -3.136321 ┆ 12.290155 ┆ 10.873125 ┆ … ┆ 34955.12  ┆ 478_0_2     ┆ 26290   ┆ 0   │\n",
      "│ 0.986083  ┆ -1.599667 ┆ 1.061661  ┆ 1.216592  ┆ … ┆ 10314.0   ┆ 478_0_3     ┆ 26290   ┆ 0   │\n",
      "│ 1.251347  ┆ -0.632016 ┆ 0.878551  ┆ 0.20115   ┆ … ┆ 7245.6    ┆ 478_0_4     ┆ 26290   ┆ 0   │\n",
      "│ …         ┆ …         ┆ …         ┆ …         ┆ … ┆ …         ┆ …           ┆ …       ┆ …   │\n",
      "│ 2.138978  ┆ 3.584202  ┆ 0.083392  ┆ 0.929887  ┆ … ┆ 319862.4  ┆ 480_540_195 ┆ 26454   ┆ 0   │\n",
      "│ -10.24681 ┆ -1.241484 ┆ -0.229928 ┆ -0.039628 ┆ … ┆ 93393.07  ┆ 480_540_196 ┆ 26454   ┆ 0   │\n",
      "│ -2.383823 ┆ -0.391682 ┆ 1.629578  ┆ -0.340136 ┆ … ┆ 180038.32 ┆ 480_540_197 ┆ 26454   ┆ 0   │\n",
      "│ 3.550156  ┆ 0.998026  ┆ 0.4129    ┆ 2.423398  ┆ … ┆ 669893.0  ┆ 480_540_198 ┆ 26454   ┆ 0   │\n",
      "│ -7.250549 ┆ -1.062331 ┆ -0.050534 ┆ -0.207352 ┆ … ┆ 300167.56 ┆ 480_540_199 ┆ 26454   ┆ 0   │\n",
      "└───────────┴───────────┴───────────┴───────────┴───┴───────────┴─────────────┴─────────┴─────┘\n"
     ]
    }
   ],
   "source": [
    "# Join predictions with test data\n",
    "fcsts_df = fcsts_df.join(selected_data, on=[\"unique_id\", \"ds\"], how=\"right\")\n",
    "\n",
    "# Display predictions\n",
    "print(\"Predictions on the test dataset by all models:\")\n",
    "print(fcsts_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "In this tutorial, we demonstrated how to use FinTorch and NeuralForecast to perform time series forecasting on auction data. We initialized multiple models, performed cross-validation, evaluated their performance, and made predictions on the test set.\n",
    "\n",
    "Feel free to experiment with different models and parameters to improve forecasting accuracy."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}