
Homomorphic Encryption

Building Privacy-Preserving ML Models with Zama Concrete ML

Learn how to convert existing Scikit-Learn and PyTorch models into FHE-compatible versions for secure inference on fully encrypted datasets.

Security · Advanced · 15 min read

The Trust Gap: Why Homomorphic Encryption Matters

In traditional cloud computing, data security is usually addressed in two states: in transit and at rest. We use robust encryption protocols like TLS to protect information as it travels and AES to secure it while it sits on a disk. However, a significant vulnerability appears the moment that data needs to be processed by a machine learning model on a remote server.

To perform a calculation, a server must typically decrypt the data into its raw plaintext form within system memory. This creates a window of exposure in which a malicious insider, a compromised kernel, or a sophisticated side-channel attack could exfiltrate sensitive user information. Fully Homomorphic Encryption (FHE) closes this window by allowing mathematical operations to be performed directly on the encrypted ciphertext.

The fundamental promise of FHE is that a client can send encrypted data to a server, the server can run a model on that data, and the result remains encrypted throughout the entire lifecycle. Only the client who holds the private key can ever see the final prediction. This architecture effectively transforms the cloud into a zero-trust environment where the service provider processes data they cannot see.
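The core idea, computing on data the server cannot read, can be illustrated with a deliberately insecure toy scheme (a one-time pad over integers modulo N, not real FHE and not safe for any actual use): ciphertexts can be added, yet only the key holder can decrypt the sum.

```python
# Toy illustration of additive homomorphism (NOT secure, NOT real FHE):
# "encryption" is a one-time pad over integers modulo N.
import secrets

N = 2**32

def encrypt(m, key):
    return (m + key) % N

def decrypt(c, key):
    return (c - key) % N

# Client encrypts two values with fresh keys
k1, k2 = secrets.randbelow(N), secrets.randbelow(N)
c1, c2 = encrypt(10, k1), encrypt(32, k2)

# Server adds ciphertexts without ever seeing 10 or 32
c_sum = (c1 + c2) % N

# Only the client, who knows k1 + k2, can decrypt the result
result = decrypt(c_sum, (k1 + k2) % N)
print(result)  # 42
```

Real FHE schemes support both addition and multiplication on ciphertexts, which is what makes general model evaluation possible; this sketch only shows the additive half of the idea.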

Homomorphic encryption represents the final frontier of data privacy, shifting the security burden from the network perimeter to the mathematical structure of the data itself.

Bridging the Gap Between Math and Machine Learning

Applying FHE to machine learning is not a straightforward task because FHE schemes are mathematically restricted. Most efficient schemes only natively support additions and multiplications of integers or fixed-precision polynomials. This is a far cry from the complex floating-point operations and non-linear activation functions that modern neural networks rely on.

To bridge this gap, we must translate our familiar ML models into a format that the FHE compiler can understand. This involves a process called quantization where continuous floating-point weights are mapped to discrete integer values. By discretizing the model, we ensure that the computations stay within the bounds of the FHE scheme without requiring excessive memory or processing time.
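The mapping from floats to integers can be sketched in a few lines of plain Python. This is a minimal uniform-quantization example, not Concrete ML's actual implementation; the function names and the [-1, 1] range are illustrative choices.

```python
# Minimal sketch of uniform quantization: map floats in a known range
# to n-bit integer codes and back, as an FHE compiler does for weights.
def quantize(values, n_bits, lo, hi):
    levels = 2**n_bits - 1
    scale = (hi - lo) / levels
    return [round((v - lo) / scale) for v in values], scale

def dequantize(q_values, scale, lo):
    return [q * scale + lo for q in q_values]

weights = [-0.73, 0.05, 0.42, 0.98]
q, scale = quantize(weights, n_bits=8, lo=-1.0, hi=1.0)
recovered = dequantize(q, scale, lo=-1.0)

print(q)          # integer codes in [0, 255]
print(recovered)  # close to the originals, within half a scale step
```

The FHE circuit then operates only on the integer codes; the scale and offset are bookkeeping that lets the client interpret the decrypted result.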

Migrating Scikit-Learn Models to the FHE Domain

For many developers, the easiest entry point into secure computation is through the Scikit-Learn ecosystem. Frameworks like Concrete ML provide drop-in replacements for standard estimators such as Logistic Regression, Linear Regression, and Random Forests. These wrappers encapsulate the complex logic of key management and circuit compilation behind a familiar interface.

The migration process typically starts with a standard training phase using plaintext data. Since FHE only impacts the inference stage, you can train your model using your existing pipelines and datasets. Once the model is trained, it undergoes a transformation into an FHE-compatible version that uses integer arithmetic to simulate the original logic.

Secure Logistic Regression for Patient Diagnostics:

```python
from concrete.ml.sklearn import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# 1. Load a realistic healthcare dataset
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 2. Instantiate the FHE-compatible model
# n_bits determines the precision of the quantization
model = LogisticRegression(n_bits=8)

# 3. Standard training on plaintext data
model.fit(X_train, y_train)

# 4. Compile the model into an FHE circuit
# This step generates the cryptographic parameters and keys
model.compile(X_train)

# 5. Execute inference on encrypted data
# The input is encrypted before the model logic ever touches it
y_pred_fhe = model.predict(X_test, fhe="execute")

print(f"FHE Prediction for first sample: {y_pred_fhe[0]}")
```

The Role of Quantization and Bit-Width

In the code example above, the n_bits parameter is the most critical configuration for your FHE model. It defines how many bits are used to represent the weights and activations of your model after quantization. Choosing a value that is too low can lead to a significant drop in accuracy, while a value that is too high increases the complexity of the FHE circuit.

Most real-world scenarios find a sweet spot between 4 and 8 bits. At 8 bits, a Logistic Regression model can often match the accuracy of its 32-bit floating-point counterpart to within a fraction of a percent. This trade-off matters because the cost of an FHE circuit grows with its multiplicative depth and grows steeply, roughly exponentially for table lookups, with the bit-width of the integers being processed.
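The accuracy side of this trade-off is easy to see numerically. The sketch below (plain Python, not Concrete ML) measures the worst-case rounding error that quantization introduces at different bit-widths over a fixed range:

```python
# Sketch: how bit-width affects quantization error on a set of weights.
# A lower n_bits means a coarser grid and a larger worst-case error.
def max_quantization_error(values, n_bits, lo=-1.0, hi=1.0):
    scale = (hi - lo) / (2**n_bits - 1)
    return max(abs(round((v - lo) / scale) * scale + lo - v) for v in values)

weights = [i / 1000 - 0.5 for i in range(1000)]  # floats in [-0.5, 0.5)
errs = {n: max_quantization_error(weights, n) for n in (2, 4, 8)}
for n_bits, err in sorted(errs.items()):
    print(f"{n_bits}-bit max error: {err:.5f}")
```

Each additional bit halves the grid spacing, so the worst-case error drops geometrically while the FHE circuit cost climbs, which is exactly why the 4-to-8-bit band is where most deployments land.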

Translating PyTorch Neural Networks for Secure Inference

While linear models are effective for simple tabular data, more complex tasks like image recognition or natural language processing require the power of PyTorch. Converting a PyTorch model to FHE involves a more nuanced workflow compared to Scikit-Learn. The model must be compiled through an ONNX intermediary to map the computation graph to FHE primitives.

One of the primary challenges in deep learning for FHE is the handling of non-linear activation functions like ReLU or Sigmoid. In Zama's Concrete ML, these are handled via Programmable Bootstrapping, which uses Look-Up Tables to evaluate any univariate function on encrypted data. This allows for a much broader range of architectures than traditional polynomial approximations.
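The look-up-table idea behind programmable bootstrapping can be shown in plaintext. This sketch omits all cryptography and uses illustrative names; the point is only that any univariate function over n-bit inputs reduces to a precomputed table, which TFHE can then index with an encrypted value:

```python
# Plaintext sketch of the LUT idea behind programmable bootstrapping:
# a univariate function over n-bit inputs becomes a precomputed table.
n_bits = 4

# Precompute ReLU over all signed 4-bit inputs in [-8, 7]
relu_table = [max(0, v) for v in range(-8, 8)]

def relu_via_lut(x):
    # Index the table by the quantized input (offset by the minimum)
    return relu_table[x + 8]

print([relu_via_lut(x) for x in (-5, -1, 0, 3, 7)])  # [0, 0, 0, 3, 7]
```

Because the table can encode any function, the same mechanism covers Sigmoid, GELU, or arbitrary activations, which is why TFHE-based compilers are not limited to polynomial approximations.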

Converting a PyTorch CNN to FHE:

```python
import torch
import torch.nn as nn
from concrete.ml.torch.compile import compile_torch_model

# Define a simple CNN for MNIST-style classification
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3)
        self.fc = nn.Linear(4 * 26 * 26, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv(x))
        x = x.view(x.size(0), -1)
        return self.fc(x)

torch_model = SimpleCNN()
# Assume torch_model is already trained

# Representative dataset of shape (batch, channels, height, width),
# used for calibration during quantization
representative_data = torch.randn(100, 1, 28, 28)

# Compile the PyTorch model to FHE
# The compiler traces the graph and optimizes for TFHE operations
fhe_model = compile_torch_model(
    torch_model,
    representative_data,
    n_bits=4,
)

# Simulate or execute the encrypted inference
sample_input = torch.randn(1, 1, 28, 28)
encrypted_output = fhe_model.forward(sample_input.numpy(), fhe="execute")
```

Quantization-Aware Training (QAT)

For deep neural networks, simply quantizing a pre-trained model after the fact can lead to significant errors. A better approach is Quantization-Aware Training, where the model is trained with the knowledge that its weights will eventually be discretized. This allows the gradient descent process to find weights that are robust to the precision loss inherent in 4-bit or 8-bit representations.

During QAT, the forward pass simulates quantization noise, forcing the network to learn features that remain stable even when rounded. When you finally export this model to FHE, the gap between the plaintext simulation and the encrypted result is virtually non-existent. This technique is essential for high-accuracy applications like secure medical imaging or confidential credit scoring.
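The "fake quantization" at the heart of QAT can be sketched without any deep learning framework. This toy forward pass (illustrative names, a single dot product standing in for a layer) stores weights as floats but uses them at 4-bit precision, so any loss computed from its output already reflects the deployed precision; real QAT additionally uses a straight-through estimator so gradients flow through the rounding:

```python
# Sketch of QAT's "fake quantization": the forward pass rounds weights
# to the n-bit grid so training sees the precision FHE will deliver.
def fake_quantize(w, n_bits, lo=-1.0, hi=1.0):
    scale = (hi - lo) / (2**n_bits - 1)
    return round((w - lo) / scale) * scale + lo

def forward(weights, inputs, n_bits):
    # Weights are stored as floats but used at quantized precision
    return sum(fake_quantize(w, n_bits) * x for w, x in zip(weights, inputs))

weights = [0.31, -0.62, 0.08]
inputs = [1.0, 0.5, 2.0]
full_precision = sum(w * x for w, x in zip(weights, inputs))
qat_output = forward(weights, inputs, n_bits=4)
print(full_precision, qat_output)  # close, despite 4-bit weights
```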

Operational Constraints and Performance Optimization

While FHE is technologically revolutionary, it comes with significant operational overhead that developers must manage carefully. Encrypted computations are several orders of magnitude slower than their plaintext counterparts. A prediction that takes milliseconds on a standard CPU might take seconds or even minutes when executed within an FHE circuit.

The primary driver of latency is the accumulation of cryptographic noise. Every time you perform a multiplication, the noise level within the ciphertext increases. If the noise exceeds a certain threshold, the data can no longer be decrypted correctly, necessitating a costly operation called bootstrapping to refresh the ciphertext and reset the noise levels.
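The interaction between noise growth and bootstrapping can be caricatured with a toy simulation. The doubling rule and the threshold below are illustrative stand-ins, not real TFHE parameters, but they show why circuit depth drives the number of bootstraps:

```python
# Toy simulation of the noise budget: each homomorphic multiplication
# roughly doubles the noise, and a costly bootstrap resets it before
# it crosses the decryption threshold.
NOISE_LIMIT = 100.0

def run_circuit(num_multiplications, initial_noise=1.0):
    noise, bootstraps = initial_noise, 0
    for _ in range(num_multiplications):
        if noise * 2 > NOISE_LIMIT:
            noise = initial_noise  # bootstrap: refresh the ciphertext
            bootstraps += 1
        noise *= 2
    return bootstraps

print(run_circuit(20))  # a deep circuit needs periodic bootstrapping
print(run_circuit(5))   # a shallow circuit may need none
```

Since each bootstrap is expensive, the optimizations below all aim either to slow the noise growth or to get more useful work out of each refresh.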

  • Circuit Depth: Minimize the number of sequential multiplications to reduce noise growth and the frequency of bootstrapping.
  • Batching: Process multiple encrypted inputs simultaneously to improve the throughput of the server, even if individual latency remains high.
  • Hardware Acceleration: Utilize specialized FHE libraries that leverage GPUs or FPGAs to speed up polynomial arithmetic.
  • Key Size: Larger security parameters increase the size of the public and private keys, which can impact network bandwidth between client and server.

The Client-Server Handshake

Implementing FHE in production requires a specific architecture for key management. The client generates a secret key for decryption and a set of public evaluation keys that allow the server to perform operations. These evaluation keys can be quite large, sometimes reaching hundreds of megabytes depending on the complexity of the model circuit.

You must ensure that your deployment infrastructure can handle the transfer of these large keys and the encrypted payloads. Most production patterns involve the client sending the evaluation keys once per session and then streaming encrypted inference requests. This minimizes the setup overhead while maintaining the strict privacy guarantees that FHE provides.
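The session pattern above can be sketched with stand-in classes. Everything here (`Session`, `Server`, the byte payloads) is hypothetical scaffolding, not a real library API; it only models the cost structure of sending large evaluation keys once and streaming small encrypted requests afterward:

```python
# Hedged sketch of the FHE session flow: evaluation keys are uploaded
# once per session, then encrypted inference requests are streamed.
from dataclasses import dataclass, field

@dataclass
class Session:
    eval_keys: bytes            # sent once; can be hundreds of MB
    requests: list = field(default_factory=list)

class Server:
    def __init__(self):
        self.sessions = {}

    def open_session(self, client_id, eval_keys):
        # One-time setup cost: store the client's evaluation keys
        self.sessions[client_id] = Session(eval_keys)

    def infer(self, client_id, encrypted_input):
        # Per-request cost: run the circuit with the stored keys;
        # here we just return a placeholder "encrypted" result
        self.sessions[client_id].requests.append(encrypted_input)
        return b"ciphertext-result"

server = Server()
server.open_session("alice", eval_keys=b"\x00" * 1024)  # keys sent once
for payload in (b"enc-1", b"enc-2", b"enc-3"):          # stream requests
    result = server.infer("alice", payload)

print(len(server.sessions["alice"].requests))  # 3
```

At the time of writing, Concrete ML ships deployment helpers along these lines (a client object that encrypts and decrypts, and a server object that evaluates the circuit), so in practice you would build on those rather than roll your own.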

Conclusion: The Future of Privacy-Preserving AI

Homomorphic encryption is transitioning from a theoretical cryptographic curiosity to a practical tool for software engineers. By leveraging frameworks like Concrete ML, developers can now protect sensitive data without needing a PhD in abstract algebra. The ability to convert existing Scikit-Learn and PyTorch models ensures that the massive progress in AI remains accessible even in highly regulated environments.

As hardware acceleration for FHE continues to mature, we can expect the performance gap to narrow further. This will unlock new use cases in decentralized finance, private genomic research, and confidential collaborative learning. The core mental shift is moving from a world where we trust the cloud provider to a world where we trust the mathematics of encryption.

When designing your next secure application, consider FHE not just as an encryption layer, but as a fundamental architectural choice. By implementing secure inference today, you are future-proofing your systems against the increasing demands for data sovereignty and user privacy. The tools are ready, the models are compatible, and the path to zero-trust AI is now clear.
