Tutorial.Rmd
# load heaan_sdk
library(tidyverse)
library(heaan.sdk.R)
import_heaan_sdk()
A context is an object containing information for homomorphic encryption. The context contains a parameter for HEaaN, keys for homomorphic encryption, and the homevaluator of HEaaN for homomorphic operations.
A parameter determines the accuracy and latency of homomorphic operations (latency also depends on the device). HEaaN provides several parameter presets. In this tutorial, we use the “FGb” parameter.
Homomorphic encryption keys are generated based on the parameter. If a ciphertext is encrypted with a keypack, it should only be operated on with that same keypack. Otherwise, the result is meaningless.
The homevaluator of HEaaN is the calculation tool for homomorphic operations. Every homomorphic operation except encryption and decryption requires the homevaluator.
A context contains everything needed to use HEaaN, so almost all objects of HEaaN_SDK require a context.
A context receives a configuration parameter, a flag indicating whether to create new keys for homomorphic encryption, and the type of keys to load. The options of load_keys are as follows:
If load_keys is "all", load both the public keys and the secret key.
If load_keys is "pk", load the public keys only.
If load_keys is "enc", load the encryption key only.
If load_keys is "enc_dec", load the encryption key and the decryption key (secret key).
If load_keys is "dec", load the decryption key (secret key) only.
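The option list above can be restated as a small lookup table. The sketch below is plain R and does not call HEaaN-SDK; the names `load_keys_options` and `keys_loaded()` are hypothetical and exist only to summarize the list.

```r
# Hypothetical lookup table restating the load_keys options listed above.
# Illustration only; this is not part of HEaaN-SDK.
load_keys_options <- list(
  all     = c("public keys", "secret key"),
  pk      = c("public keys"),
  enc     = c("encryption key"),
  enc_dec = c("encryption key", "decryption key (secret key)"),
  dec     = c("decryption key (secret key)")
)

# Return what a given option loads, failing early on an unknown option.
keys_loaded <- function(load_keys) {
  load_keys <- match.arg(load_keys, names(load_keys_options))
  load_keys_options[[load_keys]]
}

print(keys_loaded("enc_dec"))
```

Validating the option name up front makes a typo such as "encdec" fail immediately instead of surfacing later as a missing key.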
# set key_dir_path
key_dir_path <- "./keys"
# set parameter
params <- heaan_sdk.HEParameter("FGb")
# init context and load all keys
context <- heaan_sdk.Context(
params,
key_dir_path = key_dir_path,
load_keys = "all",
generate_keys = TRUE
)
An HESeries is a one-dimensional array that corresponds to a numeric vector. It can be created from a numeric vector and converted back to one. HESeries.from_series() generates an HESeries from a numeric vector, and to_series() of an HESeries makes a numeric vector from the data of the HESeries.
An HESeries created by HESeries.from_series() is initially in plaintext. It remains in plaintext until it is explicitly encrypted.
series <- c(1L, 2L, 3L, 4L)
# vector to HESeries
column <- HESeries.from_series(context, series)
# HESeries to vector
res <- column %>% to_series()
print(res)
Use +, -, and * to perform addition, subtraction, and multiplication, respectively.
series1 <- c(1, 2, 3, 4, 5)
series2 <- c(2, 3, 4, 2, 3)
column1 <- HESeries.from_series(context, series1)
column2 <- HESeries.from_series(context, series2)
series_1 <- column1 %>% to_series()
series_2 <- column2 %>% to_series()
add_column <- column1 + column2
print(add_column %>% to_series())
sub_column <- column1 - column2
print(sub_column %>% to_series())
mult_column <- column1 * column2
print(mult_column %>% to_series())
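For comparison, the same element-wise operations on ordinary R vectors are shown below. This is a base R sketch with no HEaaN-SDK calls; the variable names `add_plain`, `sub_plain`, and `mult_plain` are introduced here for illustration. While the HESeries are in plaintext, their results converted with to_series() would be expected to agree with these values.

```r
# Element-wise arithmetic on plain numeric vectors (base R, no HEaaN-SDK).
series1 <- c(1, 2, 3, 4, 5)
series2 <- c(2, 3, 4, 2, 3)

add_plain  <- series1 + series2   # 3 5 7 6 8
sub_plain  <- series1 - series2   # -1 -1 -1 2 2
mult_plain <- series1 * series2   # 2 6 12 8 15

print(add_plain)
print(sub_plain)
print(mult_plain)
```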
Recall that an HESeries created by HESeries.from_series() is in plaintext at the beginning. Use encrypt() to encrypt the HESeries. To decrypt an encrypted HESeries, use decrypt().
series <- c(1L, 2L, 3L, 4L)
column <- HESeries.from_series(context, series)
# encrypt
column %>% encrypt()
# decrypt
column %>% decrypt()
Let’s encrypt two different HESeries.
series1 <- c(1, 2, 3, 4, 5)
series2 <- c(2, 3, 4, 2, 3)
column1 <- HESeries.from_series(context, series1)
column2 <- HESeries.from_series(context, series2)
column1 %>% encrypt()
column2 %>% encrypt()
Now we are going to add two encrypted HESeries. The way to add them is the same as for plaintext: put + between the two HESeries. If you want to see the result, you need to decrypt it first.
add_column <- column1 + column2
add_column %>% decrypt()
result <- add_column %>% to_series()
print(result)
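HEaaN is based on the CKKS scheme, which computes on approximate numbers, so a decrypted result usually differs from the exact value by a small error. A minimal base R sketch of a tolerance-based comparison follows. The helper `is_close()`, the tolerance value, and the simulated `decrypted` vector are assumptions for illustration and are not SDK output.

```r
# Compare a decrypted result with the expected plaintext result
# using an absolute tolerance instead of exact equality.
is_close <- function(actual, expected, tol = 1e-4) {
  length(actual) == length(expected) &&
    all(abs(actual - expected) <= tol)
}

expected <- c(1, 2, 3, 4, 5) + c(2, 3, 4, 2, 3)

# Simulated stand-in for the decrypted sum converted with to_series():
# the exact sums plus a small made-up error (NOT real SDK output).
decrypted <- expected + c(2e-7, -1e-7, 3e-7, 0, -2e-7)

print(identical(decrypted, expected))  # FALSE: exact comparison is too strict
print(is_close(decrypted, expected))   # TRUE: equal within the tolerance
```

The appropriate tolerance depends on the parameter preset and on the depth of the computation, so the default used here is only a placeholder.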
Note that the two HESeries, column1 and column2, are still encrypted, because we did not decrypt either of them. The encrypted property of an HESeries indicates whether it is encrypted or not.
column1$encrypted
Let’s decrypt column2. Then column1 is ciphertext but column2 is plaintext.
column2 %>% decrypt()
print(paste("Is column1 encrypted?", column1$encrypted))
print(paste("Is column2 encrypted?", column2$encrypted))
Let’s calculate the multiplication of column1 and column2. Even if column2 is plaintext, the result is encrypted because column1 is encrypted. So we need to decrypt the result if we want to see the result.
mult_column <- column1 * column2
print(paste("Is mult_column encrypted? ", mult_column$encrypted))
mult_column %>% decrypt()
mul <- mult_column %>% to_series()
print(mul)
An HESeries can be set as a column of an HEFrame.
An HEFrame is a two-dimensional data type corresponding to a data frame. It can be created from a data frame and converted back to one. HEFrame.from_frame() generates an HEFrame from a data frame, and to_frame() of an HEFrame makes a data frame from the data of the HEFrame.
An HEFrame created by HEFrame.from_frame() is in plaintext at the beginning.
df <- data.frame(A = c(1L, 2L, 3L, 4L, 5L), B = c(2L, 3L, 4L, 2L, 3L))
# data frame to HEFrame
hf <- HEFrame.from_frame(context, df)
# HEFrame to data frame
res <- hf %>% to_frame()
print(res)
The info property of an HEFrame shows information about the HEFrame.
hf$info
You can select a column of an HEFrame by its column name. Each column of an HEFrame is an HESeries.
HEaaN.STAT provides encrypt() and decrypt() for an HEFrame, just as for an HESeries. An HEFrame is a collection of HESeries, and encrypt() or decrypt() of an HEFrame encrypts or decrypts all HESeries in the HEFrame. To encrypt or decrypt only specific HESeries, use encrypt() or decrypt() of those HESeries.
Basic operations can be done using columns and the results can be set as new columns.
hf %>% mutate(hf["A"] + hf["B"], col_name = "A+B")
hf %>% mutate(hf["A"] - hf["B"], col_name = "A-B")
hf %>% mutate(hf["A"] * hf["B"], col_name = "A*B")
The results can be seen after the HEFrame is decrypted.
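As a plaintext reference for the three derived columns, the same computation on an ordinary data frame is sketched below in base R. It uses no HEaaN-SDK calls; the decrypted HEFrame converted with to_frame() would be expected to hold approximately these values.

```r
# Plain data-frame counterpart of the three derived columns above (base R).
df <- data.frame(A = c(1L, 2L, 3L, 4L, 5L), B = c(2L, 3L, 4L, 2L, 3L))

# [[ ]] is used because the new column names are not syntactic R names.
df[["A+B"]] <- df$A + df$B
df[["A-B"]] <- df$A - df$B
df[["A*B"]] <- df$A * df$B

print(df)
```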
Obtaining Statistical Values of Encrypted Data using HEaaN-SDK
First import heaan_sdk and others.
# load heaan_sdk
library(tidyverse)
library(heaan.sdk.R)
import_heaan_sdk()
Create a context. The option generate_keys is set to FALSE since we are going to use the keys created in Tutorial 01_basic_operations. Change the option to TRUE if you want to generate new homomorphic encryption keys.
# set key_dir_path
key_dir_path <- "./keys"
# set parameter
params <- heaan_sdk.HEParameter("FGb")
# init context and load all keys
context <- heaan_sdk.Context(
params,
key_dir_path = key_dir_path,
load_keys = "all",
generate_keys = FALSE
)
Let’s create an HEFrame from data. We use the iris data set, treating the features as numeric and the species as a category.
# load the iris data set; Species is already a factor in R
data(iris)
tb <- iris
# from data frame to HEFrame (encode)
hf <- HEFrame.from_frame(context, tb)
# encrypt
hf %>% encrypt()
The info property of an HEFrame shows information about the HEFrame.
hf$info
Let’s calculate basic statistics of hf[“Sepal.Length”].
# sum, avg, var of column `Sepal.Length`
sum <- hf["Sepal.Length"] %>% sum()
avg <- hf["Sepal.Length"] %>% mean()
var <- hf["Sepal.Length"] %>% var()
To see the result values, it is necessary to decrypt them.
sum %>% decrypt()
avg %>% decrypt()
var %>% decrypt()
print("SUM")
print(sum %>% to_series())
print("AVG")
print(avg %>% to_series())
print("VAR")
print(var %>% to_series())
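To sanity-check the decrypted values, the same statistics can be computed on the plaintext data with base R. This sketch uses no HEaaN-SDK calls. Whether the SDK's variance divides by n - 1 or by n is not stated in this tutorial, so both conventions are shown; the `ref_*` names are introduced here for illustration.

```r
# Plaintext reference values for Sepal.Length (base R, no HEaaN-SDK).
data(iris)
x <- iris$Sepal.Length

ref_sum <- sum(x)
ref_avg <- mean(x)
ref_var <- var(x)   # sample variance: divides by n - 1

# Population variance (divides by n), in case that convention is in use.
n <- length(x)
ref_var_pop <- ref_var * (n - 1) / n

print(c(sum = ref_sum, avg = ref_avg, var = ref_var, var_pop = ref_var_pop))
```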
HEaaN-SDK provides various statistical functions. Let’s calculate standard deviation, skewness, and kurtosis of hf[“Sepal.Length”].
# standard deviation
sd <- hf["Sepal.Length"] %>% sd()
# decrypt
sd %>% decrypt()
print("STANDARD DEVIATION")
print(sd %>% to_series())
# skewness
skewness <- hf["Sepal.Length"] %>% skewness()
# decrypt
skewness %>% decrypt()
print("SKEWNESS")
print(skewness %>% to_series())
# kurtosis
kurtosis <- hf["Sepal.Length"] %>% kurtosis()
# decrypt
kurtosis %>% decrypt()
print("KURTOSIS")
print(kurtosis %>% to_series())
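Base R has sd() but no skewness or kurtosis function, so a moment-based plaintext reference is sketched below. The helper names are hypothetical, and the definitions chosen (population moments, non-excess kurtosis) are an assumption; the SDK may use a different convention, in which case the values differ by a known correction factor or by the constant 3.

```r
# Moment-based reference implementations (base R has no skewness/kurtosis).
central_moment <- function(x, k) mean((x - mean(x))^k)

# Population skewness: third central moment over variance^(3/2).
skewness_ref <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Non-excess kurtosis: fourth central moment over variance^2.
kurtosis_ref <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

data(iris)
x <- iris$Sepal.Length
print(sd(x))            # sample standard deviation (n - 1 denominator)
print(skewness_ref(x))  # 0 for perfectly symmetric data
print(kurtosis_ref(x))  # equals 3 for a normal distribution
```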
Now let’s see how to calculate covariance and correlation between two encrypted columns, hf[“Sepal.Length”] and hf[“Sepal.Width”].
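As a plaintext reference for these two quantities, base R provides cov() and cor(). The sketch below does not call HEaaN-SDK; it only shows the values an encrypted computation on the same two columns should approximate, and the `ref_*` names are introduced here for illustration.

```r
# Plaintext covariance and Pearson correlation of two iris columns (base R).
data(iris)
len <- iris$Sepal.Length
wid <- iris$Sepal.Width

ref_cov <- cov(len, wid)   # sample covariance (n - 1 denominator)
ref_cor <- cor(len, wid)   # Pearson correlation

# Pearson correlation is the covariance scaled by both standard deviations.
stopifnot(isTRUE(all.equal(ref_cor, ref_cov / (sd(len) * sd(wid)))))

print(c(cov = ref_cov, cor = ref_cor))
```

Over the whole data set the correlation is slightly negative, which is worth keeping in mind when comparing it with the per-species correlation.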
Let’s calculate statistics of hf[“Sepal.Length”] and hf[“Sepal.Width”] with categorical column hf[“Species”].
species <- hf["Species"]
species_setosa <- hf["Species"] == "setosa"
species_virginica <- hf["Species"] == "virginica"
Let’s compute statistics for setosa and virginica.
number_of_iris <- species %>% count()
number_of_setosa <- species_setosa %>% count()
number_of_virginica <- species_virginica %>% count()
print("number of iris")
print(number_of_iris %>% decrypt() %>% to_series())
print("number of setosa")
print(number_of_setosa %>% decrypt() %>% to_series())
print("number of virginica")
print(number_of_virginica %>% decrypt() %>% to_series())
# Average of sepal length
setosa_sepal_length <- hf["Sepal.Length"] %>% filter(species_setosa) %>% mean()
virginica_sepal_length <- hf["Sepal.Length"] %>% filter(species_virginica) %>% mean()
# decrypt
setosa_sepal_length %>% decrypt()
virginica_sepal_length %>% decrypt()
print("The average of setosa's Sepal Length")
print(setosa_sepal_length %>% to_series())
print("The average of virginica's Sepal Length")
print(virginica_sepal_length %>% to_series())
# Average of sepal width
setosa_sepal_width <- hf["Sepal.Width"] %>%
filter(species_setosa) %>%
mean()
virginica_sepal_width <- hf["Sepal.Width"] %>%
filter(species_virginica) %>%
mean()
# decrypt
setosa_sepal_width %>% decrypt()
virginica_sepal_width %>% decrypt()
print("The average of setosa's Sepal Width")
print(setosa_sepal_width %>% to_series())
print("The average of virginica's Sepal Width")
print(virginica_sepal_width %>% to_series())
# Correlation of setosa's length and width
setosa_length <- hf["Sepal.Length"] %>% filter(species_setosa)
setosa_width <- hf["Sepal.Width"] %>% filter(species_setosa)
corr_setosa <- setosa_length %>% corr(setosa_width)
corr_setosa %>% decrypt()
print("CORRELATION of length and width")
print(corr_setosa %>% to_series())
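The per-species statistics above can be cross-checked on the plaintext data with logical indexing in base R. This sketch uses no HEaaN-SDK calls, and all variable names in it are introduced here for illustration.

```r
# Plaintext reference for the per-species statistics (base R).
data(iris)
is_setosa    <- iris$Species == "setosa"
is_virginica <- iris$Species == "virginica"

# Row counts: whole data set, then each selected species.
counts <- c(iris = nrow(iris),
            setosa = sum(is_setosa),
            virginica = sum(is_virginica))
print(counts)

# Group means of sepal length and sepal width.
mean_len <- c(setosa = mean(iris$Sepal.Length[is_setosa]),
              virginica = mean(iris$Sepal.Length[is_virginica]))
mean_wid <- c(setosa = mean(iris$Sepal.Width[is_setosa]),
              virginica = mean(iris$Sepal.Width[is_virginica]))
print(mean_len)
print(mean_wid)

# Correlation of setosa's sepal length and width.
ref_corr_setosa <- cor(iris$Sepal.Length[is_setosa],
                       iris$Sepal.Width[is_setosa])
print(ref_corr_setosa)
```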
HEaaN-SDK provides the calculation of confidence intervals and hypothesis tests. Let’s compute the 95% confidence interval of the mean of “Sepal.Length” of iris and test whether 5.84 could be that mean.
ttest <- t_test(hf["Sepal.Length"], mu = 5.84, conf.level = 0.95)
res <- ttest$statistic
confidence_interval <- ttest$conf.int
# t-value and degrees of freedom (decrypt once, then read both entries)
stat <- res %>% decrypt() %>% to_series()
t_value <- stat[1]
df <- stat[2]
# confidence interval
conf_int <- confidence_interval %>% decrypt() %>% to_series()
print("T value")
print(t_value)
print("Degrees of freedom")
print(df)
print("Confidence Interval")
print(conf_int)
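The same one-sample t-test can be run on the plaintext column with base R's t.test() as a reference. Note that in the base R result the degrees of freedom are stored in `$parameter`, whereas the code above reads both values from `ttest$statistic`. The `*_ref` names are introduced here for illustration.

```r
# Plaintext one-sample t-test on Sepal.Length (base R stats package).
data(iris)
ref <- t.test(iris$Sepal.Length, mu = 5.84, conf.level = 0.95)

t_ref  <- unname(ref$statistic)   # t value
df_ref <- unname(ref$parameter)   # degrees of freedom, n - 1
ci_ref <- as.numeric(ref$conf.int)
p_ref  <- ref$p.value

print(t_ref)
print(df_ref)
print(ci_ref)
print(p_ref)
```

Because 5.84 lies inside the 95% interval, the test does not reject the hypothesis that the mean of Sepal.Length is 5.84.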
We introduce how to train a logistic regression model on encrypted data and run inference with it using HEaaN-SDK. For this, import heaan_sdk.
library(tidyverse)
library(heaan.sdk.R)
import_heaan_sdk()
As before, create a context, load the public keypack that we generated in Tutorial 01, and generate the homomorphic evaluators.
# set test directories
test_dir <- "lr_tutorial"
key_dir_path <- "./keys"
train_data_path <- file.path(test_dir, "train_data")
model_path <- file.path(test_dir, "model")
test_data_path <- file.path(test_dir, "test_data")
report_path <- file.path(test_dir, "report")
# set parameter
params <- heaan_sdk.HEParameter("FGb")
# init context and load all keys
context <- heaan_sdk.Context(
params,
key_dir_path = key_dir_path,
load_keys = "all",
generate_keys = TRUE
)
For this tutorial, we use the iris dataset.
data(iris)
# use only two species of iris
species <- sample(c("setosa", "versicolor", "virginica"), 2)
tb <- iris[iris$Species %in% species, ]
tb$species <- ifelse(tb$Species == species[1], 0, 1)
# split data into train and test sets
split <- sample(nrow(tb), 0.8 * nrow(tb))
train <- tb[split, ]
test <- tb[-split, ]
test <- test[sample(nrow(test)), ]
X_train <- train[, 1:4]
y_train <- as.numeric(train[, 6])
X_test <- test[, 1:4]
y_test <- as.numeric(test[, 6])
# set hyperparameter
classes <- unique(y_train)
batch_size <- 32L
unit_shape <- c(batch_size,
as.integer(reticulate::py_to_r(context$num_slots) / batch_size))
activation <- "sigmoid_wide"
num_epoch <- 20L
learning_rate <- 1
optimizer <- "sgd"
# encode train data
train_data <- encode_train_data(
context, X_train, y_train,
unit_shape = unit_shape,
dtype = "classification",
path = train_data_path
)
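The unit_shape computed above has two entries whose product equals the number of slots. The arithmetic is sketched below with a hypothetical slot count; the value 32768 is an assumption for illustration, and the real value is read from context$num_slots.

```r
# Hypothetical slot count for illustration only; the real value is read
# from context$num_slots and depends on the chosen parameter preset.
num_slots  <- 32768L
batch_size <- 32L

unit_shape <- c(batch_size, as.integer(num_slots / batch_size))
print(unit_shape)   # 32 1024

# The two dimensions together account for every slot exactly once.
stopifnot(prod(unit_shape) == num_slots)
```

With this layout, batch_size must divide the slot count evenly; otherwise as.integer() truncates the quotient and the product falls short of num_slots.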
Set a Logistic Regression model to train.
# set model and initialize model parameters
model <- hml_logit(
context,
unit_shape = unit_shape,
num_feature = ncol(X_train),
classes = unique(y_train),
path = model_path
)
model$fit(
train_data,
num_epoch = num_epoch,
lr = learning_rate,
batch_size = batch_size,
optimizer = optimizer,
activation = activation
)
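For intuition about what training does, a plaintext logistic regression is sketched below in base R. It is not the SDK's implementation and differs from the encrypted training above in several assumed ways: it uses full-batch gradient descent rather than mini-batch SGD, the exact sigmoid rather than the "sigmoid_wide" activation, standardized features, a smaller learning rate, and a fixed pair of species with no train/test split. The function names are hypothetical.

```r
# Plaintext logistic regression trained by full-batch gradient descent.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Mean negative log-likelihood of labels y under weights w.
log_loss <- function(w, X, y) {
  p <- sigmoid(X %*% w)
  eps <- 1e-12  # guard against log(0)
  -mean(y * log(p + eps) + (1 - y) * log(1 - p + eps))
}

fit_logit <- function(X, y, num_epoch = 100L, lr = 0.5) {
  w <- rep(0, ncol(X))
  for (epoch in seq_len(num_epoch)) {
    # Gradient of the mean log loss: X^T (sigmoid(Xw) - y) / n
    grad <- t(X) %*% (sigmoid(X %*% w) - y) / nrow(X)
    w <- w - lr * as.vector(grad)
  }
  w
}

data(iris)
tb <- iris[iris$Species %in% c("setosa", "versicolor"), ]
X <- cbind(1, scale(as.matrix(tb[, 1:4])))  # intercept + standardized features
y <- ifelse(tb$Species == "setosa", 0, 1)

w <- fit_logit(X, y)
loss_start <- log_loss(rep(0, ncol(X)), X, y)
loss_end   <- log_loss(w, X, y)
accuracy   <- mean((sigmoid(X %*% w) > 0.5) == y)
print(c(loss_start = loss_start, loss_end = loss_end, accuracy = accuracy))
```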
Let’s encrypt the test data and predict on it with our trained model.
test_data_feature <- encode_encrypt(
context,
X_test,
unit_shape = unit_shape
)
# predict on CPU
output <- model %>% predict(test_data_feature)
output_arr <- output %>% decrypt_decode()
probs <- 1 / (1 + exp(-output_arr))
probs <- as.vector(probs)
auc <- as.double(pROC::roc(y_test, probs)$auc)
print("Test auc")
print(auc)
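The last steps above turn the decrypted model output into probabilities with the sigmoid and then score them with the AUC. A minimal base R sketch of both steps follows, using the rank-sum (Mann-Whitney) formula in place of pROC. The helper `auc_rank()` is hypothetical, and the logits and labels are made up for illustration; they are not model output.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# AUC via the rank-sum (Mann-Whitney) formula; ties get average ranks.
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up logits and labels for illustration (NOT model output).
logits <- c(-2.0, -0.5, 0.3, 1.2, 2.5, -1.0)
labels <- c(0, 0, 1, 1, 1, 0)

probs <- sigmoid(logits)
print(auc_rank(labels, probs))   # 1: every positive outranks every negative
```

Because the sigmoid is monotonic, the AUC computed on the probabilities equals the AUC computed on the raw logits.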