{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# GraphBEAN: A Powerful Tool for Anomaly Detection on Bipartite Graphs\n", "\n", "## Introduction\n", "This tutorial provides a hands-on introduction to the GraphBEAN model, a novel graph neural network architecture for unsupervised anomaly detection on bipartite node-and-edge-attributed graphs. The model was introduced in the paper \"Interaction-Focused Anomaly Detection on Bipartite Node-and-Edge-Attributed Graphs\" by Fathony et al. (2023), and we have implemented it as part of our FinTorch project. Our implementation also generalizes GraphBEAN from bipartite networks to k-partite networks.\n", "\n", "GraphBEAN addresses the limitations of existing anomaly detection models, which typically focus on homogeneous graphs or neglect rich edge information. It follows an autoencoder-like approach: a customized encoder-decoder structure encodes the node attributes, the edge attributes, and the underlying graph structure into low-dimensional latent representations. These representations are then used to reconstruct the original graph, and the reconstruction errors are used to identify anomalous edges and nodes: elements that the model reconstructs poorly are flagged as likely anomalies.\n", "\n", "This tutorial walks through the core concepts of GraphBEAN with a practical example on the Elliptic++ dataset. You will learn how to:\n", "* Load and explore bipartite node-and-edge-attributed graph data.\n", "* Define and train a GraphBEAN model using PyTorch Lightning.\n", "* Analyze and interpret anomaly detection results.\n", "\n", "By the end, you should be able to apply GraphBEAN to other problems involving bipartite graphs, such as fraud detection in financial transactions, malicious activity detection in network security, or anomaly detection in user-item interaction networks."
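, "\n", "The next cell isolates the scoring idea by computing a per-edge reconstruction error with plain PyTorch. It is a generic sketch, not GraphBEAN's actual scoring code. The edge features and their \"reconstructions\" are hypothetical random tensors, and one edge (index 4) is deliberately corrupted so that it stands out. Mean squared error is only one possible choice of error measure." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "\n", "torch.manual_seed(0)  # reproducible toy data\n", "\n", "# Hypothetical edge attributes (6 edges x 3 features) and a near-perfect reconstruction of them\n", "edge_attr = torch.randn(6, 3)\n", "edge_attr_hat = edge_attr + 0.05 * torch.randn(6, 3)\n", "edge_attr_hat[4] += 2.0  # simulate an edge that the model fails to reconstruct\n", "\n", "# Anomaly score per edge: mean squared reconstruction error over its features\n", "scores = ((edge_attr - edge_attr_hat) ** 2).mean(dim=1)\n", "print(scores)\n", "print(\"Most anomalous edge:\", scores.argmax().item())"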
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install FinTorch" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fintorch" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from packaging import version\n", "\n", "# Install PyTorch Geometric's compiled extensions from the wheel index matching the installed PyTorch build\n", "def install_pyg_and_dependencies():\n", "    !pip install pyg-lib -f https://data.pyg.org/whl/torch-{torch.__version__}.html\n", "    !pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-{torch.__version__}.html\n", "\n", "# Detect the PyTorch version; compare parsed versions, because as plain strings \"1.9.0\" >= \"1.13.0\" is True\n", "if version.parse(torch.__version__) >= version.parse(\"1.13.0\"):\n", "    print(\"PyTorch version 1.13.0 or newer detected. Installing PyG and dependencies...\")\n", "    install_pyg_and_dependencies()\n", "else:\n", "    print(\"PyTorch version is older than 1.13.0. PyG might not work correctly. Please upgrade PyTorch, or install PyG with `pip install torch_geometric`.\")\n", "\n", "# Verify installation\n", "try:\n", "    import torch_geometric\n", "    print(f\"PyTorch Geometric successfully installed. Version: {torch_geometric.__version__}\")\n", "except ImportError:\n", "    print(\"PyTorch Geometric not found. Installation might have failed.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by importing the required libraries: PyTorch Lightning to streamline training, PyTorch Geometric for its `TransformerConv` graph convolution layer, and the FinTorch modules that load the Elliptic++ dataset and provide the GraphBEAN model. We then create an instance of `EllipticppDataModule`, passing the edge type `(\"wallets\", \"to\", \"transactions\")` to define the dataset's bipartite structure: \"wallets\" and \"transactions\" are the node types, and \"to\" is the edge type that connects them. The data module takes care of data loading, splitting, and creating the data loaders used for training."
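, "\n", "To make the wallets-to-transactions structure concrete before loading the real data, the next cell builds a tiny hand-made graph with PyTorch Geometric's `HeteroData` container. It is only a sketch: the sizes and random features are made up. We also assume, rather than guarantee, that the data module represents its graph in a similar `HeteroData` form. `metadata()` returns the node types and edge types that define the bipartite structure." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch_geometric.data import HeteroData\n", "\n", "# Toy bipartite graph: hypothetical sizes and random features, for illustration only\n", "toy = HeteroData()\n", "toy[\"wallets\"].x = torch.randn(3, 4)  # 3 wallets with 4 features each\n", "toy[\"transactions\"].x = torch.randn(5, 6)  # 5 transactions with 6 features each\n", "\n", "# Edges point from wallets (row 0 of edge_index) to transactions (row 1)\n", "toy[\"wallets\", \"to\", \"transactions\"].edge_index = torch.tensor([[0, 0, 1, 2, 2], [0, 1, 2, 3, 4]])\n", "toy[\"wallets\", \"to\", \"transactions\"].edge_attr = torch.randn(5, 2)  # 2 features per edge\n", "\n", "print(toy)\n", "print(toy.metadata())  # (list of node types, list of edge types)"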
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import lightning as L\n", "from torch_geometric.nn.conv import TransformerConv\n", "\n", "from fintorch.datasets.ellipticpp import EllipticppDataModule\n", "from fintorch.models.graph.graphbean.graphBEAN import GraphBEANModule" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we prepare the dataset by initializing the data module and displaying the resulting graph, which shows the node types, their attributes, and the edge connections. We then retrieve the graph's metadata for a high-level overview of the relationships within the bipartite graph." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Example data module for the Elliptic++ dataset, whose wallets-to-transactions graph is bipartite\n", "data_module = EllipticppDataModule((\"wallets\", \"to\", \"transactions\"))\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Start download from HuggingFace...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/7 [00:00