Tutorial • heaan.sdk.R

01_Basic_Operatation

Step 1. Import and load heaan.sdk.r

# load heaan_sdk
library(tidyverse)
library(heaan.sdk.R)
import_heaan_sdk()

Step 2. Create a context

A context is an object containing information for homomorphic encryption. The context contains a parameter for HEaaN, keys for homomorphic encryption, and homevaluator of HeaaN for homomorphic operations.

A parameter determines accuracy and latency for homomorphic operations (latency is also depends on the device). HEaaN gives several parameter preset. In the tutorial, we use “FGb” parameter.

Homomorphic encryption keys are generated based on the parameter. If a ciphertext is encrypted by a keypack, then the ciphertext should be only operated by “that” keypack. Otherwise, the result becomes irrelevant.

The homevaluator of HEaaN ins a calculation tool for homomorphic operation. Every homomorphic operations except encryption and decryption require homevaluator.

A context contains everything for using HEaaN. So almost all objects of HEaaN_SDK require a context.

A context receives a configuration parameter, whether to create new keys for homomorphic encryption, and which type of keys to be loaded for usage. The options of load_keys are as follows:

    If load_keys is "all", load both public keys and secret key.
    If load_keys is "pk", load public keys only.
    If load_keys is "enc", load encryption key only.
    If load_keys is "enc_dec", load encryption key and decryption key(secret key).
    If load_keys is "dec", load decryption key(secret key) only.

# set key_dir_path
key_dir_path <- "./keys"

# set parameter
params <- heaan_sdk.HEParameter("FGb")

# init context and load all keys
context <- heaan_sdk.Context(
    params,
    key_dir_path = key_dir_path,
    load_keys = "all",
    generate_keys = TRUE
)

Step 3. Input data

An HESeries is a one-dimensional array corresponds to a numeric vector and can also be created from a numeric vector, and can also be converted back to a numeric vector. HESeries.from_series() generates an HESeries from a numeric vector, and to_series() of an HESeries makes a numeric vector from data of the HESeries.

An HESeries created by HESeries.from_series() is initially in plaintext at the beginning. It is remained in plaintext until encryption command is executed.

series <- c(1L, 2L, 3L, 4L)

# vector to HESeries
column <- HESeries.from_series(context, series)

# HESeries to vector
res <- column %>% to_series()

print(res)

Use +, -, and * to perform addition, subtraction, and multiplication, respectively.

series1 <- c(1, 2, 3, 4, 5)
series2 <- c(2, 3, 4, 2, 3)

column1 <- HESeries.from_series(context, series1)
column2 <- HESeries.from_series(context, series2)

series_1 <- column1 %>% to_series()
series_2 <- column2 %>% to_series()

add_column <- column1 + column2
print(add_column %>% to_series())

sub_column <- column1 - column2
print(sub_column %>% to_series())

mult_column <- column1 * column2
print(mult_column %>% to_series())

Step 4. Encrypt data and perform homomorphic operations

Recall that an HESeries created by HESeries.from_series() is in plaintext at the beginning. Use encrypt() to encrypt the HESeries. If you want to decrypt an encrypted HESeries, perform decrypt().

series <- c(1L, 2L, 3L, 4L)
column <- HESeries.from_series(context, series)

# encrypt
column %>% encrypt()

# decrypt
column %>% decrypt()

Let’s encrypt two different HESeries.

series1 <- c(1, 2, 3, 4, 5)
series2 <- c(2, 3, 4, 2, 3)

column1 <- HESeries.from_series(context, series1)
column2 <- HESeries.from_series(context, series2)

column1 %>% encrypt()
column2 %>% encrypt()

Now we are going to add two encrypted HESeries. The way to add them is the same as for plaintext. Put + between the two HESeries. If you want to see the result, you need to decrypt it first.

add_column <- column1 + column2
add_column %>% decrypt()
result <- add_column %>% to_series()

print(result)

Note that the two HESeries, column1 and column2, are still encrypted. This is because we did not decrypt either column1 or column2. The encrypted property of an HESeries, indicates whether it is encrypted or not.

column1$encrypted()

Let’s decrypt column1.

column1 %>% decrypt()

Then column1 is ciphertext but column2 is plaintext.

print(paste("Is column1 encrypted?", column1$encrypted))
print(paste("Is column2 encrypted?", column2$encrypted))

Let’s calculate the multiplication of column1 and column2. Even if column2 is plaintext, the result is encrypted because column1 is encrypted. So we need to decrypt the result if we want to see the result.

mult_column <- column1 * column2
print(paste("Is mult_column encrypted? ", mult_column$encrypted))

mult_column %>% decrypt()
mul <- mult_column %>% to_series()

print(mul)

You can set an HESeries to an element of the HEFrame.

c <- c(1, 2, 1, 1, 2)
new_col <- HESeries.from_series(context, c, name = "C")

hf %>% mutate(new_col, col_name = "C")
hf$info
hf %>% to_frame()

Step 5. Input data using HEFrame

An HEFrame is a two-dimensional data type corresponding to a data frame. It can also be created from a data frame, and can be converted back to a data frame. HEFrame.from_frame() generates an HEFrame from a data frame, and to_frame() of an HEFrame makes a data frame from data of the HEFrame.

An HEFrame created by HEFrame.from_frame() is in plaintext at the beginning.

df <- data.frame(A = c(1L, 2L, 3L, 4L, 5L), B = c(2L, 3L, 4L, 2L, 3L))

# data frame to HEFrame
hf <- HEFrame.from_frame(context, df)

# HEFrame to data frame
res <- hf %>% to_frame()

print(res)

info of an HEFrame shows information of the HEFrame.

hf$info

You can select the column of an HEFrame using column name. The type of columns of an HEFrame is the HESeries class.

column_a <- hf["A"]

print(paste("Which type is the column: ", column_a$col_type))
column_a %>% to_series()

c <- c(1, 2, 1, 1, 2)
new_col <- HESeries.from_series(context, c, name = "C")

hf %>% mutate(new_col, col_name = "C")
hf$info
hf %>% to_frame()

Step 6. Encrypt HEFrame and perform homomorphic operations.

HEaaN.STAT provides encrypt() and decrypt() of an HEFrame, just like the case of HESeries. An HEFrame is the gathering of HESeries, and encrypt() or decrypt() of an HEFrame encrypt or decrypt all HESeries in the HEFrame. If it needs to encrypt or decrypt some specific HESeries, use encrypt() or decrypt() of the HESeries.

# encrypt
hf %>% encrypt()

# decrypt
hf %>% decrypt()

Basic operations can be done using columns and the results can be set as new columns.

hf %>% mutate(hf["A"] + hf["B"], col_name = "A+B")
hf %>% mutate(hf["A"] - hf["B"], col_name = "A-B")
hf %>% mutate(hf["A"] * hf["B"], col_name = "A*B")

The results can be seen after the HEFrame is decrypted.

hf %>% decrypt()

hf %>% to_frame()

02_Stats

Obtaining Statistical Values of Encrypted Data using HEaaN-SDK

Step 1. Import HEaaN-SDK and create Context and HEFrame

First import heaan_sdk and others.

# load heaan_sdk
library(tidyverse)
library(heaan.sdk.R)
import_heaan_sdk()

Create a context. The option of generate_keys is set to False since we are going to use the keys which are created in Tutorial 01_basic_operations. Change the option to TRUE if you want to generate new homomorphic encryption keys.

# set key_dir_path
key_dir_path <- "./keys"

# set parameter
params <- heaan_sdk.HEParameter("FGb")

# init context and load all keys
context <- heaan_sdk.Context(
    params,
    key_dir_path = key_dir_path,
    load_keys = "all",
    generate_keys = FALSE
)

Let’s create an HEFrame from data. We use iris data. We will treat features as numeric and the species as category.

data(iris)
tb <- iris
tb$Species <- as.factor(tb$Species)
str(tb)

# from data frame to HEFrame (encode)
hf <- HEFrame.from_frame(context, tb)

# encrypt
hf %>% encrypt()

info of an HEFrame shows information of the HEFrame.

hf$info

Step 2. Descriptive statistics

Let’s calculate basic statistics of hf[“Sepal.Length”]

# sum, avg, var of clumn `Sepal.Length`
sum <- hf["Sepal.Length"] %>% sum()
avg <- hf["Sepal.Length"] %>% mean()
var <- hf["Sepal.Length"] %>% var()

To see the result values, it is neccessary to decrypt those.

sum %>% decrypt()
avg %>% decrypt()
var %>% decrypt()

print("SUM")
print(sum %>% to_series())

print("AVG")
print(avg %>% to_series())

print("VAR")
print(var %>% to_series())

HEaaN-SDK provides various statistical functions. Let’s calculate standard deviation, skewness, and kurtosis of hf[“Sepal.Length”].

# standard deviation
sd <- hf["Sepal.Length"] %>% sd()

# decrypt
sd %>% decrypt()

print("STANDARD DEVIATION")
print(sd %>% to_series())

# skewness
skewness <- hf["Sepal.Length"] %>% skewness()

# decrypt
skewness %>% decrypt()

print("SKEWNESS")
print(skewness %>% to_series())

# kurtosis
kurtosis <- hf["Sepal.Length"] %>% kurtosis()

# decrypt
kurtosis %>% decrypt()

print("KURTOSIS")
print(kurtosis %>% to_series())

Step 3. Analysis on two varaibles

Now let’s see how to calculate covariance and correlation between two encrypted column, hf[“Sepal.Length”], and hf[“Sepal.Width”].

# covariance
cov <- hf["Sepal.Length"] %>% cov(hf["Sepal.Width"])

# decrypt
cov %>% decrypt()

print("COVARIANCE")
print(cov %>% to_series())

# Pearson correlation coefficient
corr <- hf["Sepal.Length"] %>% corr(hf["Sepal.Width"])

# decrypt
corr %>% decrypt()

print("CORRELATION")
print(corr %>% to_series())

Step 4. Categorical data Analysis

Let’s calculate statistics of hf[“Sepal.Length”] and hf[“Sepal.Width”] with categorical column hf[“Species”].

species <- hf["Species"]
species_setosa <- hf["Species"] == "setosa"
species_virginica <- hf["Species"] == "virginica"

Let’s compute statistics with respect to the setosa and virginica.

number_of_iris <- species %>% count()
number_of_setosa <- species_setosa %>% count()
number_of_virginica <- species_virginica %>% count()

print("number of iris")
print(number_of_iris %>% decrypt() %>% to_series())

print("number of setosa")
print(number_of_setosa %>% decrypt() %>% to_series())

print("number of virginica")
print(number_of_virginica %>% decrypt() %>% to_series())

# Average of sepal length
setosa_sepal_length <- hf["Sepal.Length"] %>% filter(species_setosa) %>% mean()
virginica_sepal_length <- hf["Sepal.Length"] %>% filter(species_virginica) %>% mean()

# decrypt
setosa_sepal_length %>% decrypt()
virginica_sepal_length %>% decrypt()

print("The average of setosa's Sepal Length")
print(setosa_sepal_length %>% to_series())

print("The average of virginica's Sepal Length")
print(virginica_sepal_length %>% to_series())

# Average of sepal width
setosa_sepal_width <- hf["Sepal.Width"] %>%
                      filter(species_setosa) %>%
                      mean()
virginica_sepal_width <- hf["Sepal.Width"] %>%
                      filter(species_virginica) %>%
                      mean()

# decrypt
setosa_sepal_width %>% decrypt()
virginica_sepal_width %>% decrypt()

print("The average of setosa's Sepal Width")
print(setosa_sepal_width %>% to_series())

print("The average of virginica's Sepal Width")
print(virginica_sepal_width %>% to_series())

# Correlation of setosa's length and width
setosa_length <- hf["Sepal.Length"] %>% filter(species_setosa)
setosa_width <- hf["Sepal.Width"] %>% filter(species_setosa)

corr_setosa <- setosa_length %>% corr(setosa_width)

corr_setosa %>% decrypt()

print("CORRELATION of length and width")
print(corr_setosa %>% to_series())

Step 5. Confidence interval and hypothesis test

HEaaN-SDK provides the calculation of confidence intervals and hypothesis test. Let’s compute confidence interval of mean of “Sepal.Length” of setosa. and test 5.84 could be mean of “Sepal.Length”.

Let’s compute the 95% interval of mean of “Sepal.Length” of iris.

ttest <- t_test(hf["Sepal.Length"], mu = 5.84, conf.level = 0.95)
res <- ttest$statistic
confidence_interval <- ttest$conf.int

# t- value and degressof freedom
t_value <- (res %>% decrypt() %>% to_series())[1]
df <- (res %>% decrypt() %>% to_series())[2]

# confidence interval
conf_int <- confidence_interval %>% decrypt() %>% to_series()

print("T value")
print(t_value)

print("Degrees of freedom")
print(df)

print("Confidence Interval")
print(conf_int)

03_Logistic Regression

We introduce how to train and inference a Logistic Regression model on Encrypted data using HEaaN-SDK. For this, import heaan_sdk.

Step 1. Import HEaaN-SDK and Create a Context

library(tidyverse)
library(heaan.sdk.R)
import_heaan_sdk()

As before, create a context, load public keypack that we generated in Tutorial 01, and generate Homomorphic Evaluators.

# set test directories
test_dir <- "lr_tutorial"
key_dir_path <- "./keys"
train_data_path <- file.path(test_dir, "train_data")
model_path <- file.path(test_dir, "model")
test_data_path <- file.path(test_dir, "test_data")
report_path <- file.path(test_dir, "report")

# set parameter
params <- heaan_sdk.HEParameter("FGb")

# init context and load all keys
context <- heaan_sdk.Context(
    params,
    key_dir_path = key_dir_path,
    load_keys = "all",
    generate_keys = TRUE
)

Step 2. Preprocessing Data

For this tutorial, we use the iris dataset.

    data(iris)
    # use only two species of iris
    species <- sample(c("setosa", "versicolor", "virginica"), 2)
    tb <- iris[iris$Species %in% species, ]
    tb$species <- ifelse(tb$Species == species[1], 0, 1)
    # split data into train and test sets
    split <- sample(nrow(tb), 0.8 * nrow(tb))
    train <- tb[split, ]
    test <- tb[-split, ]
    test <- test[sample(nrow(test)), ]
    X_train <- train[, 1:4]
    y_train <- as.numeric(train[, 6])
    X_test <- test[, 1:4]
    y_test <- as.numeric(test[, 6])

# set hyperparameter
classes <- unique(y_train)
batch_size <- 32L
unit_shape <- c(batch_size,
                as.integer(reticulate::py_to_r(context$num_slots) / batch_size))
activation <- "sigmoid_wide"
num_epoch <- 20L
learning_rate <- 1
optimizer <- "sgd"

# encode train data
train_data <- encode_train_data(
    context, X_train, y_train,
    unit_shape <- unit_shape,
    dtype <- "classification",
    path <- train_data_path
)

Step 3. Training Model

Set a Logistic Regression model to train.

# set model and initialize model parameters
model <- hml_logit(
    context,
    unit_shape = unit_shape,
    num_feature = ncol(X_train),
    classes = unique(y_train),
    path = model_path
)
model$fit(
    train_data,
    num_epoch = num_epoch,
    lr = learning_rate,
    batch_size = batch_size,
    optimizer = optimizer,
    activation = activation
)

Step 4. Predict

Let’s encrypt test data and predict it with out trained model

test_data_feature <- encode_encrypt(
    context,
    X_test,
    unit_shape = unit_shape
)

# predict on CPU
output <- model %>% predict(test_data_feature)

output_arr <- output %>% decrypt_decode()

probs <- 1 / (1 + exp(-output_arr))
probs <- as.vector(probs)
auc <- as.double(pROC::roc(y_test, probs)$auc)
print("Test auc")
print(auc)