{ "cells": [ { "cell_type": "markdown", "id": "d421f136", "metadata": { "id": "d421f136" }, "source": [ "# Logistic regression using HEaaN.ml\n", "\n", "In this documentation, we introduce default detection model using lending club dataset from Kaggle (https://www.kaggle.com/datasets/ethon0426/lending-club-20072020q1)." ] }, { "cell_type": "markdown", "id": "164febb8", "metadata": { "id": "164febb8" }, "source": [ "## 1. Data import" ] }, { "cell_type": "markdown", "id": "603a72c4", "metadata": { "id": "603a72c4" }, "source": [ "We import lending club data extracted from original set downloaded from Kaggle. In this exercise, we only use 32,768 observations and selected some sensitive features. Let's import the provided data." ] }, { "cell_type": "code", "execution_count": 1, "id": "e35a2025", "metadata": { "id": "e35a2025" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "id": "cfa9b3ac", "metadata": { "id": "cfa9b3ac", "outputId": "ef0f6120-9aab-4e69-ec37-09a1a8869006" }, "outputs": [], "source": [ "df = pd.read_csv('lc.csv')" ] }, { "cell_type": "code", "execution_count": 3, "id": "3f183ae0", "metadata": { "id": "3f183ae0", "outputId": "4c7d8625-00ae-48b4-cfdf-f2f1ac76001e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 32768 entries, 0 to 32767\n", "Data columns (total 15 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 loan_amnt 32768 non-null float64\n", " 1 annual_inc 32768 non-null float64\n", " 2 term 32768 non-null object \n", " 3 int_rate 32768 non-null object \n", " 4 installment 32768 non-null float64\n", " 5 grade 32768 non-null object \n", " 6 sub_grade 32768 non-null object \n", " 7 purpose 32768 non-null object \n", " 8 loan_status 32768 non-null object \n", " 9 dti 32743 non-null float64\n", " 10 last_fico_range_high 32768 non-null float64\n", " 11 last_fico_range_low 32768 non-null float64\n", " 12 total_acc 32768 non-null float64\n", " 13 delinq_2yrs 32768 non-null float64\n", " 14 emp_length 30518 non-null object \n", "dtypes: float64(8), object(7)\n", "memory usage: 3.8+ MB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "id": "-e7_kVk9I1QN", "metadata": { "id": "-e7_kVk9I1QN" }, "source": [ "## 2. Data preprocessing" ] }, { "cell_type": "markdown", "id": "ENNr6eY7DCFY", "metadata": { "id": "ENNr6eY7DCFY" }, "source": [ "Before conducting the analysis, let's perform some data preprocessing. First, Let's transform int_rate as float variable." ] }, { "cell_type": "code", "execution_count": 4, "id": "hIeCWEIiDCaK", "metadata": { "id": "hIeCWEIiDCaK" }, "outputs": [], "source": [ "df['int_rate'] = df['int_rate'].apply(lambda x: float(x.strip('%')) / 100)" ] }, { "cell_type": "markdown", "id": "67ddb4d6", "metadata": { "id": "67ddb4d6" }, "source": [ "There are several ways to deal with missing observations in variables. In this example, we will drop the missing observations." ] }, { "cell_type": "code", "execution_count": 5, "id": "8a722e13", "metadata": { "id": "8a722e13" }, "outputs": [], "source": [ "df = df.dropna()" ] }, { "cell_type": "markdown", "id": "c63b7bae", "metadata": { "id": "c63b7bae" }, "source": [ "We can apply a log transformation to the 'annual_inc' and 'loan_amnt' variables using either the numpy or math modules in Python. To apply the transformation, you can use the following code: np.log(df\\['annual_inc'\\]) and np.log(df\\['loan_amnt'\\]). 
 { "cell_type": "markdown", "id": "c63b7bae", "metadata": { "id": "c63b7bae" }, "source": [ "We can apply a log transformation to the 'annual_inc' and 'loan_amnt' variables using either the numpy or math modules in Python, e.g. np.log(df\['annual_inc'\]) and np.log(df\['loan_amnt'\]). Once the transformation is applied, it is good practice to check that it succeeded by inspecting the transformed data." ] },
 { "cell_type": "code", "execution_count": 6, "id": "277cc9e2", "metadata": { "id": "277cc9e2", "outputId": "1b39f538-aa01-4f22-b752-18d22e1752d7" }, "outputs": [], "source": [ "df['log_inc'] = np.log(df['annual_inc'])\n", "df['log_loan_amnt'] = np.log(df['loan_amnt'])" ] },
 { "cell_type": "markdown", "id": "c9833b97", "metadata": { "id": "c9833b97" }, "source": [ "One useful variable transformation technique is one-hot encoding. We have several categorical variables, such as term, grade, sub_grade, and purpose of the loan. For example, let's transform the purpose of the loan into one-hot encoded variables. Note that one of the one-hot encoded variables is omitted from the analysis because of exact collinearity." ] },
 { "cell_type": "code", "execution_count": 7, "id": "_XNFQfx8E-ka", "metadata": { "id": "_XNFQfx8E-ka" }, "outputs": [], "source": [ "from sklearn.preprocessing import LabelEncoder\n", "le = LabelEncoder()" ] },
 { "cell_type": "code", "execution_count": 8, "id": "dab5ea69", "metadata": { "id": "dab5ea69", "outputId": "14aaa3ce-070e-4f71-9051-c48313965e40" }, "outputs": [], "source": [ "df['purpose_trans'] = le.fit_transform(df['purpose'])\n", "dummies = pd.get_dummies(df['purpose_trans'], prefix='purpose')\n", "df = pd.concat([df, dummies], axis=1)" ] },
 { "cell_type": "markdown", "id": "0e39cc2e", "metadata": { "id": "0e39cc2e" }, "source": [ "The following cells apply the same transformation to the remaining categorical variables." ] },
 { "cell_type": "code", "execution_count": 9, "id": "affdb112", "metadata": { "id": "affdb112", "outputId": "a5c3929a-8f69-4546-e7e8-1a3e3387fe18" }, "outputs": [], "source": [ "df['term_trans'] = le.fit_transform(df['term'])\n", "dummies_term = pd.get_dummies(df['term_trans'], prefix='term')\n", "df = pd.concat([df, dummies_term], axis=1)" ] },
 { "cell_type": "code", "execution_count": 10, "id": "95fe48d3", "metadata": { "id": "95fe48d3", "outputId": "9d4a39e5-78a1-4c23-ab6f-e76e793df9bf" }, "outputs": [], "source": [ "df['grade_trans'] = le.fit_transform(df['grade'])\n", "dummies_grade = pd.get_dummies(df['grade_trans'], prefix='grade')\n", "df = pd.concat([df, dummies_grade], axis=1)" ] },
 { "cell_type": "markdown", "id": "efc10083", "metadata": { "id": "efc10083" }, "source": [ "You can delete variables for several reasons. For example, to avoid the collinearity problem in the regression analysis, we omit one of the one-hot encoded categories. (A one-step alternative is sketched after the next cell.)" ] },
 { "cell_type": "code", "execution_count": 11, "id": "e23c7e12", "metadata": { "id": "e23c7e12" }, "outputs": [], "source": [ "df = df.drop(['purpose_0'], axis=1)\n", "df = df.drop(['term_0'], axis=1)\n", "df = df.drop(['grade_0'], axis=1)" ] },
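 { "cell_type": "markdown", "id": "onehot-note", "metadata": {}, "source": [ "As a side note, the LabelEncoder step, pd.get_dummies, and the manual column drops can usually be collapsed into a single call. A minimal sketch (`df_onehot` is a hypothetical name, not used below):" ] },
 { "cell_type": "code", "execution_count": null, "id": "onehot-code", "metadata": {}, "outputs": [], "source": [ "# Illustrative one-step alternative: encode and drop the first category at once,\n", "# which removes the exactly collinear column directly.\n", "df_onehot = pd.get_dummies(df, columns=['purpose', 'term', 'grade'], drop_first=True)" ] },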
] }, { "cell_type": "code", "execution_count": 12, "id": "c73050ef", "metadata": { "id": "c73050ef", "outputId": "7e74949b-2ffc-4166-8342-80cc2854d1bb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 30516 entries, 0 to 32767\n", "Data columns (total 40 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 loan_amnt 30516 non-null float64\n", " 1 annual_inc 30516 non-null float64\n", " 2 term 30516 non-null object \n", " 3 int_rate 30516 non-null float64\n", " 4 installment 30516 non-null float64\n", " 5 grade 30516 non-null object \n", " 6 sub_grade 30516 non-null object \n", " 7 purpose 30516 non-null object \n", " 8 loan_status 30516 non-null object \n", " 9 dti 30516 non-null float64\n", " 10 last_fico_range_high 30516 non-null float64\n", " 11 last_fico_range_low 30516 non-null float64\n", " 12 total_acc 30516 non-null float64\n", " 13 delinq_2yrs 30516 non-null float64\n", " 14 emp_length 30516 non-null object \n", " 15 log_inc 30516 non-null float64\n", " 16 log_loan_amnt 30516 non-null float64\n", " 17 purpose_trans 30516 non-null int64 \n", " 18 purpose_1 30516 non-null uint8 \n", " 19 purpose_2 30516 non-null uint8 \n", " 20 purpose_3 30516 non-null uint8 \n", " 21 purpose_4 30516 non-null uint8 \n", " 22 purpose_5 30516 non-null uint8 \n", " 23 purpose_6 30516 non-null uint8 \n", " 24 purpose_7 30516 non-null uint8 \n", " 25 purpose_8 30516 non-null uint8 \n", " 26 purpose_9 30516 non-null uint8 \n", " 27 purpose_10 30516 non-null uint8 \n", " 28 purpose_11 30516 non-null uint8 \n", " 29 purpose_12 30516 non-null uint8 \n", " 30 purpose_13 30516 non-null uint8 \n", " 31 term_trans 30516 non-null int64 \n", " 32 term_1 30516 non-null uint8 \n", " 33 grade_trans 30516 non-null int64 \n", " 34 grade_1 30516 non-null uint8 \n", " 35 grade_2 30516 non-null uint8 \n", " 36 grade_3 30516 non-null uint8 \n", " 37 grade_4 30516 non-null uint8 \n", " 38 grade_5 30516 non-null uint8 \n", " 39 grade_6 30516 non-null uint8 \n", "dtypes: float64(11), int64(3), object(6), uint8(20)\n", "memory usage: 5.5+ MB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "id": "Bz2_gpRpDLCy", "metadata": { "id": "Bz2_gpRpDLCy" }, "source": [ "Since our objective is to predict whether a loan is well-paid or not, let's create a binary variable called 'bad'. We can define 'bad=0' if the loan_status is either 'Fully Paid' or 'Current', and 'bad=1' if the loan_status is not well-paid." ] }, { "cell_type": "code", "execution_count": 13, "id": "20d7d359", "metadata": { "id": "20d7d359" }, "outputs": [], "source": [ "df['bad'] = df['loan_status'].apply(lambda x: 0 if x == 'Fully Paid' or x == 'Current' else 1)" ] }, { "cell_type": "markdown", "id": "0e22cf69", "metadata": { "id": "0e22cf69" }, "source": [ "Now, we can construct a binary logistic regression model to predict the probability of a loan transaction being classified as \"bad\" based on various independent variables such as borrower credit score, loan amount, debt-to-income ratio, etc." 
] }, { "cell_type": "code", "execution_count": 14, "id": "332d6910", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 167 }, "id": "332d6910", "outputId": "7f1d912c-1895-4e13-d183-67ffa35bae10" }, "outputs": [], "source": [ "y = df['bad'].to_numpy()" ] }, { "cell_type": "markdown", "id": "abcf7f77-bca3-4d73-bbf1-8e5b5a0c4d1e", "metadata": {}, "source": [ "Normalize 'log_inc' and 'log_loan_amnt' to ensure that the weight of each feature is evenly distributed." ] }, { "cell_type": "code", "execution_count": 15, "id": "61defc27-aa32-4e4c-87a2-d27bc10388fd", "metadata": {}, "outputs": [], "source": [ "df['log_inc'] /= df['log_inc'].max()\n", "df['log_loan_amnt'] /= df['log_loan_amnt'].max()" ] }, { "cell_type": "code", "execution_count": 16, "id": "8323649a", "metadata": { "id": "8323649a" }, "outputs": [], "source": [ "X = df[['log_inc','log_loan_amnt','purpose_1','purpose_2','purpose_3','purpose_4','purpose_5','purpose_6','purpose_7','purpose_8','purpose_9','purpose_10','purpose_11','purpose_12','purpose_13','term_1','grade_1','grade_2','grade_3','grade_4','grade_5','grade_6']].to_numpy()" ] }, { "cell_type": "markdown", "id": "6ipe9bl1KnAX", "metadata": { "id": "6ipe9bl1KnAX" }, "source": [ "Divide data into training data and inference data." ] }, { "cell_type": "code", "execution_count": 17, "id": "9nasIJUjKuu2", "metadata": { "id": "9nasIJUjKuu2" }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0)" ] }, { "cell_type": "markdown", "id": "e1ab62f0", "metadata": { "id": "e1ab62f0" }, "source": [ "Let's import HEaaN-SDK to conduct logistic regression." ] }, { "cell_type": "code", "execution_count": 18, "id": "ZbS10J32Fiou", "metadata": { "id": "ZbS10J32Fiou" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "HEaaN-SDK uses CUDA v11.5 (> v11.2)\n" ] } ], "source": [ "import heaan_sdk\n", "\n", "context = heaan_sdk.Context(\n", " parameter=heaan_sdk.HEParameter.from_preset(\"FGb\"),\n", " key_dir_path=\"./keys\",\n", " load_keys=\"all\",\n", " generate_keys=True,\n", ")" ] }, { "cell_type": "markdown", "id": "fc3deebc", "metadata": {}, "source": [ "Let us set hyperparameters first and then encrypt train data." ] }, { "cell_type": "code", "execution_count": 19, "id": "mVr0RGUk07nT", "metadata": { "id": "mVr0RGUk07nT" }, "outputs": [], "source": [ "num_epoch = 10\n", "learning_rate = 1.0\n", "batch_size = 1024\n", "optimizer = \"sgd\"\n", "lr_scheduler = \"constant\"\n", "activation = \"sigmoid_wide\"\n", "classes = sorted([val for val in df['bad'].unique()])\n", "num_feature = len(X[0])\n", "unit_shape = (batch_size, context.num_slots // batch_size)" ] }, { "cell_type": "code", "execution_count": 20, "id": "7ec92e24", "metadata": {}, "outputs": [], "source": [ "train_data = heaan_sdk.ml.preprocessing.encode_train_data(context, X_train, y_train, unit_shape, dtype=\"classification\", path=\"./training\")\n", "train_data.encrypt()" ] }, { "cell_type": "markdown", "id": "c8faccd2", "metadata": {}, "source": [ "Now, create model and encrypt that." 
] }, { "cell_type": "code", "execution_count": 21, "id": "ucSv6iWI1CsJ", "metadata": { "id": "ucSv6iWI1CsJ" }, "outputs": [], "source": [ "model = heaan_sdk.ml.LogisticRegression(context, unit_shape, num_feature, classes, path=\"./model\")\n", "model.encrypt()" ] }, { "cell_type": "markdown", "id": "47819a7f", "metadata": {}, "source": [ "If GPU is available, send model to GPU." ] }, { "cell_type": "code", "execution_count": 22, "id": "cf97a6c1", "metadata": {}, "outputs": [], "source": [ "model.to_device()" ] }, { "cell_type": "markdown", "id": "19d19978", "metadata": {}, "source": [ "To train data, use `fit()` of model." ] }, { "cell_type": "code", "execution_count": 23, "id": "e-BnVsg-1Sk0", "metadata": { "id": "e-BnVsg-1Sk0" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Epoch 0: 100%|██████████| 24/24 [00:42<00:00, 1.78s/it]\n", "Epoch 1: 100%|██████████| 24/24 [00:30<00:00, 1.25s/it]\n", "Epoch 2: 100%|██████████| 24/24 [00:29<00:00, 1.24s/it]\n", "Epoch 3: 100%|██████████| 24/24 [00:30<00:00, 1.25s/it]\n", "Epoch 4: 100%|██████████| 24/24 [00:29<00:00, 1.22s/it]\n", "Epoch 5: 100%|██████████| 24/24 [00:29<00:00, 1.22s/it]\n", "Epoch 6: 100%|██████████| 24/24 [00:29<00:00, 1.22s/it]\n", "Epoch 7: 100%|██████████| 24/24 [00:29<00:00, 1.22s/it]\n", "Epoch 8: 100%|██████████| 24/24 [00:29<00:00, 1.22s/it]\n", "Epoch 9: 100%|██████████| 24/24 [00:29<00:00, 1.22s/it]\n" ] } ], "source": [ "model.fit(\n", " train_data,\n", " lr=learning_rate,\n", " num_epoch=num_epoch,\n", " batch_size=batch_size,\n", " optimizer=optimizer,\n", " lr_scheduler=lr_scheduler,\n", " activation=activation,\n", ")" ] }, { "cell_type": "markdown", "id": "a2fa7539", "metadata": {}, "source": [ "Now the training is over. Let's decrypt the trained model and look at it." ] }, { "cell_type": "code", "execution_count": 24, "id": "9WLYiSusHoWq", "metadata": { "id": "9WLYiSusHoWq" }, "outputs": [], "source": [ "model.to_host()\n", "model.decrypt()" ] }, { "cell_type": "code", "execution_count": 25, "id": "_m257y3G1Zzk", "metadata": { "id": "_m257y3G1Zzk" }, "outputs": [ { "data": { "text/plain": [ "========== model description ==========\n", "path: model\n", "epoch_state: 10\n", "theta: [[-2.49783586 1.59195499 -0.2623614 -0.3699792 -0.00835056 -0.48050553\n", " -0.84698687 -0.10585963 -0.47300105 -0.09100378 -0.3519928 -0.10423002\n", " 0.08120644 -0.27560361 -0.18446809 -0.16446498 0.81136428 1.39605765\n", " 1.75279313 2.18702234 2.54425704 2.97494707 -2.3168639 ]]\n", "=======================================" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model" ] }, { "cell_type": "code", "execution_count": 26, "id": "Tcp00__RJAPX", "metadata": { "id": "Tcp00__RJAPX" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456789...13141516171819202122
0-2.4978361.591955-0.262361-0.369979-0.008351-0.480506-0.846987-0.10586-0.473001-0.091004...-0.275604-0.184468-0.1644650.8113641.3960581.7527932.1870222.5442572.974947-2.316864
\n", "

1 rows × 23 columns

\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 \\\n", "0 -2.497836 1.591955 -0.262361 -0.369979 -0.008351 -0.480506 -0.846987 \n", "\n", " 7 8 9 ... 13 14 15 16 \\\n", "0 -0.10586 -0.473001 -0.091004 ... -0.275604 -0.184468 -0.164465 0.811364 \n", "\n", " 17 18 19 20 21 22 \n", "0 1.396058 1.752793 2.187022 2.544257 2.974947 -2.316864 \n", "\n", "[1 rows x 23 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.to_dataframe()" ] }, { "cell_type": "markdown", "id": "RspK5HS6K7KQ", "metadata": { "id": "RspK5HS6K7KQ" }, "source": [ "For the inference, encrypt inference data." ] }, { "cell_type": "code", "execution_count": 27, "id": "r4hqFSiAK-WX", "metadata": { "id": "r4hqFSiAK-WX" }, "outputs": [], "source": [ "test_data_feature = heaan_sdk.HEMatrix.encode_encrypt(context, X_test, unit_shape)" ] }, { "cell_type": "markdown", "id": "b5526f5d", "metadata": {}, "source": [ "If GPU is available, send test data to GPU." ] }, { "cell_type": "code", "execution_count": 28, "id": "39fc8f7f", "metadata": {}, "outputs": [], "source": [ "test_data_feature.to_device()" ] }, { "cell_type": "markdown", "id": "32e0d02e", "metadata": {}, "source": [ "To inference data, use `predict()` of model." ] }, { "cell_type": "code", "execution_count": 29, "id": "a6a138a2", "metadata": {}, "outputs": [], "source": [ "output_binary = model.predict(test_data_feature)" ] }, { "cell_type": "markdown", "id": "1de06e0b", "metadata": {}, "source": [ "Let's decrypt the inference and look at the result of model performance." ] }, { "cell_type": "code", "execution_count": 30, "id": "7cfd6d23", "metadata": {}, "outputs": [], "source": [ "output_binary.to_host()\n", "output_arr_binary = output_binary.decrypt_decode()" ] }, { "cell_type": "markdown", "id": "123590fd-4b33-4b10-9773-25abc8e37a64", "metadata": {}, "source": [ "The output consists of values before conversion to probability." ] }, { "cell_type": "code", "execution_count": 31, "id": "wGII6dP1LakD", "metadata": { "id": "wGII6dP1LakD" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test accuracy: 86.50%\n" ] } ], "source": [ "threshold = 0.5\n", "probs = 1 / (1 + np.exp(-output_arr_binary))\n", "probs = probs.squeeze()\n", "preds = probs > threshold\n", "correct_cnt = (preds == y_test).sum()\n", "acc = correct_cnt / len(y_test)\n", "print(f\"Test accuracy: {acc * 100: .2f}%\")" ] }, { "cell_type": "markdown", "id": "4695ae08", "metadata": {}, "source": [ "According to the result, the out-of-sample test of our model has 86.50% accuracy. This means that it can distinguish whether a borrower causes a bad transaction with an accuracy of about 86.50%." ] } ], "metadata": { "colab": { "collapsed_sections": [ "164febb8" ], "provenance": [] }, "kernelspec": { "display_name": "heaansdk-230317", "language": "python", "name": "heaansdk-1dd88bf-2e47166" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }