
REDCap VMs: Adding application secure VM support to tiCrypt

· 6 min read
Alin Dobra
CEO & Co-founder

REDCap is a widely used platform for building and managing online surveys and databases. It serves two fundamentally different roles:

  1. Data collection: REDCap collects responses from patients, research participants, or other users through its survey interface, storing submissions in a secure database.
  2. Data management and analysis: Authorized users access the REDCap server and database directly to clean, analyze, and report on collected data, often exporting it to other tools for further processing.

This article focuses on enabling external data collection while securing the REDCap server and database with tiCrypt. Data management and analysis are straightforward since authorized users can work inside secure VMs in tiCrypt. The overarching goal is to achieve CMMC Level 2 compliance with reasonable effort and no security or compliance risks.

tiCrypt Functional Architecture

· 8 min read
Tera Insights Team
Documentation Writer

tiCrypt is composed of several distinct components that communicate over tightly controlled channels. This article walks through the functional architecture of a tiCrypt deployment, explaining how requests flow from the user's workstation through the backend and into secure compute nodes. For infrastructure planning and deployment sizing, see Understanding tiCrypt Infrastructure.

Secure HPC Batch Processing with tiCrypt and SLURM

· 11 min read
Thomas Samant
Senior Partner

High-performance computing clusters are shared environments by nature. Dozens or hundreds of researchers submit jobs to the same nodes, access the same file systems, and rely on the same scheduler. That model works well for performance but creates serious challenges when the job or the data is sensitive.

The scope of what counts as sensitive has expanded. Controlled unclassified information (CUI), protected health information (PHI), and export-controlled research have long required special handling. Now AI workflows introduce new categories: training datasets contain proprietary or licensed content that cannot be exposed to other tenants or administrators. Trained models are intellectual property, not disposable artifacts. The training code itself may encode trade secrets, novel architectures, or fine-tuning techniques that are as valuable as the data they process. On a shared HPC cluster, all of these are visible to system administrators and potentially to other users.

This article explains how tiCrypt integrates with SLURM through a dual-scheduler architecture that separates resource allocation from secure execution.

Bare-Metal Support: Running tiCrypt+Slurm on Physical Hardware

· 9 min read
Alin Dobra
CEO & Co-founder

Motivation

Currently, tiCrypt only supports Secure Virtual Machines as compute nodes for Local Slurm. While this provides strong security guarantees, it may not be the best fit for all workloads since it adds virtualization overhead in the form of:

  • High startup cost: it takes 20-30 seconds to create and provision a VM as a Slurm node for Local Slurm.
  • Virtualization cost: virtual machines add a 5-10% overhead to most computational tasks.
  • Degraded high-performance networking: virtualization reduces network throughput and increases latency, which is particularly damaging for MPI jobs.

To address these issues, we are planning to add support for bare-metal nodes in Local Slurm. This article explores the design and implementation of this new feature.

Existing Architecture

The Global Slurm manages overall cluster resources and ensures fairness among projects and users. The Local Slurm securely executes jobs inside secure enclaves. Users submit jobs to the Local Slurm, which transforms them into global jobs while keeping the details of execution (code and data) secret from the Global Slurm. The execution steps are as follows:

  1. The user submits a job to the Local Slurm, specifying the resources needed and the code to execute.
  2. The Local Slurm job is intercepted (using a Lua plugin) by ticrypt-vm-controller and transformed into a request to the tiCrypt Backend for a global job.
  3. The tiCrypt Backend creates a Global Slurm job (marked with the account of the user and the project).
  4. The Global Slurm schedules the job on a compute node and starts it.
  5. ticrypt-host-controller, the tiCrypt agent running on the compute nodes, receives the job (via the job submission script) and starts a secure VM for the job. The tiCrypt backend is notified of the VM creation to ensure it is handed over to the ticrypt-vm-controller for provisioning.
  6. The ticrypt-vm-controller starts slurmd on the VM, incorporating it into the Local Slurm cluster. The job then executes inside the secure VM under slurmd.
  7. Once the job finishes, ticrypt-vm-controller is notified by the Lua plugin and destroys the VM, removing it from the Local Slurm cluster.
  8. The Global Slurm job finishes, and the resources are released.

The existing architecture is illustrated in the figure below:

Slurm Architecture Diagram
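
The eight execution steps above can be sketched as a small state machine. This is only an illustrative model, not tiCrypt code: the class names, fields, and stage names below are hypothetical. It highlights the one property the text insists on, namely that the request sent toward the Global Slurm carries accounting and resource information but not the job's code or data.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Stage(Enum):
    SUBMITTED_LOCAL = auto()    # step 1: user submits to the Local Slurm
    GLOBAL_REQUESTED = auto()   # step 2: Lua plugin hands off to ticrypt-vm-controller
    GLOBAL_CREATED = auto()     # step 3: backend creates the Global Slurm job
    GLOBAL_RUNNING = auto()     # step 4: Global Slurm starts the job on a node
    VM_STARTED = auto()         # step 5: ticrypt-host-controller starts the secure VM
    RUNNING_IN_VM = auto()      # step 6: slurmd joins the Local Slurm, job runs
    VM_DESTROYED = auto()       # step 7: VM is destroyed after the job finishes
    RELEASED = auto()           # step 8: Global Slurm job ends, resources freed


ORDER = list(Stage)  # Enum iteration follows definition order


@dataclass
class GlobalJobRequest:
    """What the Global Slurm may see: account, project, and resources only."""
    account: str
    project: str
    cpus: int


@dataclass
class LocalJob:
    user: str
    project: str
    cpus: int
    script: str                                 # stays on the Local Slurm side
    stage: Stage = Stage.SUBMITTED_LOCAL
    history: List[Stage] = field(default_factory=list)

    def to_global_request(self) -> GlobalJobRequest:
        # The script is deliberately not copied into the global request.
        return GlobalJobRequest(self.user, self.project, self.cpus)

    def advance(self) -> Stage:
        """Move to the next stage; stages cannot be skipped or repeated."""
        idx = ORDER.index(self.stage)
        if idx + 1 >= len(ORDER):
            raise RuntimeError("job already released")
        self.history.append(self.stage)
        self.stage = ORDER[idx + 1]
        return self.stage
```

Calling `advance()` seven times walks a job from submission to release; the `GlobalJobRequest` produced along the way has no `script` attribute at all.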

Design of the New Feature

Challenges of Running on Bare-metal Nodes

While it might be tempting to use the traditional Slurm execution model on bare-metal nodes, job security would be significantly compromised. The following issues would arise:

  • Job isolation: without VMs, jobs would run directly on the host OS, which means that they would not be isolated from each other. This could lead to security issues, especially if one job is compromised and can access the data of another job.
  • Access to data: the current execution model makes data available only inside the secure VMs. Specifically, the VM provisioning sets up a VPN (based on StrongSwan) and mounts the data from the controlling VM running the Local Slurm controller. Since all the communication, including the file accesses, goes through the VPN, the data is secured from the infrastructure. A job running directly on the host would have no equivalent protection.
  • Protection from infrastructure and admins: without VMs, jobs would be exposed to the infrastructure and admins, which could lead to security issues, especially if the infrastructure is compromised.

Key Ideas for the New Feature

The following key ideas address these issues while preserving security:

  • Use of containers: instead of running jobs directly on the host OS, we can use containers to provide isolation between jobs. This would allow us to run jobs on bare-metal nodes while still providing some level of isolation. Specifically, we want to use Apptainer/Singularity, which is a container platform designed for HPC environments.
  • Use of the whole node: instead of sharing the node between multiple jobs, we can dedicate the whole node to a single job. This would provide better isolation and security, as well as better performance since there would be no contention for resources between jobs. This is a perfect model for MPI jobs, which require high-performance networking and low latency. However, it is extremely wasteful for small jobs.
  • Auto-provisioned security: to ensure that the jobs are still secure, automatic mechanisms need to be put in place to provision the security. No manual or admin control mechanism should be used, as such mechanisms could compromise security.
  • tiCrypt integration: The ticrypt-vm-controller and Local Slurm cannot tell the difference between a VM and a bare-metal container. As long as the container integrates the networking, tiCrypt registration and provisioning steps, the rest of the architecture is compatible with the new feature.
  • Networking for containers: to ensure integration with existing architecture, the br-secure network bridge used for the VMs should also be used for the containers. This would allow the containers to communicate with the Local Slurm controller and to access the data securely.

Overall Implementation Plan

The following components need to be enhanced to implement the new feature:

  1. Container image: a new container image needs to be created that includes slurmd, the ticrypt-vm-controller, StrongSwan and the software required to run Slurm jobs. This container image should be based on the existing VM images used for Local Slurm to ensure compatibility.
  2. Container provisioning: the ticrypt-host-controller needs to be enhanced to detect when the Slurm job requires --exclusive access to a node and to start a container instead of a VM. The container should be started so that it takes over the whole node.
  3. Networking: the container needs to be configured to use the br-secure network bridge to ensure secure communication with the Local Slurm controller and access to data.
  4. Container execution: the ticrypt-host-controller needs to:
    • Execute the container with capabilities for network configuration, NFS mounts, and "fake-root" to allow user creation.
    • Use overlays to allow changes mandated by user management and Slurm job execution.
    • Write overlay data to a temporary encrypted drive (in case sensitive data is written to the overlay) and delete it after the job finishes. Do not save the encryption key to ensure the drive cannot be recovered after deletion.
    • Stop the container once the Global Slurm job finishes, similar to how VMs are stopped in the existing architecture.
    • Enhance the local job-tracking database to track both VMs and containers.
  5. Information propagation: The --exclusive flag needs to be propagated from the Local Slurm job submission to the Global Slurm via ticrypt-vm-controller and the tiCrypt Backend.
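
A rough sketch of how items 2, 4, and 5 of the plan could fit together is shown below. Everything here is an assumption made for illustration: the `GlobalJob` record, the helper names, the image and overlay paths, and the particular capability list are hypothetical, and the real ticrypt-host-controller may be structured quite differently. The Apptainer options used (`--fakeroot`, `--overlay`, `--add-caps`) are existing Apptainer flags, but whether this exact combination works on a given site depends on how Apptainer is configured there. The command is only built, never executed.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GlobalJob:
    job_id: int
    exclusive: bool   # propagated from the Local Slurm submission (plan item 5)


def choose_backend(job: GlobalJob) -> str:
    """Whole-node (--exclusive) jobs get a container; all others keep the VM path."""
    return "container" if job.exclusive else "vm"


def apptainer_command(image: str, overlay: str, caps: List[str]) -> List[str]:
    """Build (but do not run) an Apptainer command line for the node container."""
    return [
        "apptainer", "exec",
        "--fakeroot",                   # "fake-root" so users can be created inside
        "--overlay", overlay,           # writable layer on the temporary encrypted drive
        "--add-caps", ",".join(caps),   # capabilities for networking and NFS mounts
        image,
        "slurmd", "-D",                 # run slurmd in the foreground
    ]


class EphemeralKey:
    """Encryption key held only in memory and never written to disk.

    Once destroy() is called, nothing remains that could decrypt the overlay.
    """

    def __init__(self) -> None:
        self._key: Optional[bytes] = secrets.token_bytes(32)

    def use(self) -> bytes:
        if self._key is None:
            raise RuntimeError("key already destroyed")
        return self._key

    def destroy(self) -> None:
        self._key = None
```

For example, `choose_backend(GlobalJob(1, exclusive=True))` selects the container path, and `apptainer_command("slurm-node.sif", "/mnt/enc/overlay.img", ["CAP_NET_ADMIN", "CAP_SYS_ADMIN"])` yields the argument list that a process launcher would receive. The capability names are real Linux capabilities, but the minimal set actually required is something the design still has to determine.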

Security Considerations

Using containers instead of VMs does reduce the security guarantees, but the design choices outlined above mitigate the risks:

  • Blocking outgoing traffic: the container will use OpenVSwitch-based networking set up by the ticrypt-host-controller, using the br-secure bridge exclusively. This network is already set up to prevent data exfiltration, since it restricts masquerading to allowed IP+Port combinations only. This is equivalent to the Libvirt solution for VMs, in which Libvirt is told to use br-secure rather than set up its own bridge+dnsmasq+masquerading rules. This is, by far, the most critical security measure, since it prevents data exfiltration even if the container is compromised or if the user tries to exfiltrate data.
  • Whole-node containers: dedicating the whole node to a single job provides better isolation and security since there are no other jobs running at the same time. By scrubbing the node between jobs (mostly using the overlays on encrypted drives that get destroyed after the job finishes), we can ensure that no data is left on the node after the job finishes.
  • IPsec (StrongSwan) inside the container: running the VPN inside the container should provide some level of protection against the host and infrastructure as well.
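
The "allowed IP+Port combinations" rule from the first bullet can be illustrated with a small allowlist check. This is a conceptual sketch only: in the design described here the filtering is enforced by the masquerading rules on br-secure, not by Python code, and the addresses and ports below are invented for the example.

```python
from ipaddress import ip_address, ip_network
from typing import List, NamedTuple


class AllowRule(NamedTuple):
    network: str   # destination in CIDR notation
    port: int      # destination port


# Hypothetical allowlist; a real deployment would define its own entries.
ALLOWED: List[AllowRule] = [
    AllowRule("10.20.0.5/32", 6817),
    AllowRule("10.20.1.0/24", 2049),
]


def egress_allowed(dst_ip: str, dst_port: int, rules: List[AllowRule] = ALLOWED) -> bool:
    """Return True only if (dst_ip, dst_port) matches an explicit allow rule.

    Anything not listed is denied, which is what blocks exfiltration even
    when the workload inside the container is hostile.
    """
    addr = ip_address(dst_ip)
    return any(addr in ip_network(r.network) and dst_port == r.port for r in rules)
```

The important design property is default deny: a destination must match both the address and the port of some rule, so an allowed host reached on an unlisted port is still rejected.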

Aspects for which mitigation is difficult include:

  • Access to the NFS data share: like the VMs, the containers will mount the NFS data share from the controlling VM. This mount is visible in the host OS, so, in principle, admins can access it. There are ways to make the mount "invisible" to the host OS, but they are probably not a full mitigation.

Note: Exposure to system admins is much higher with containers than with VMs, but the risk can be mitigated, and this solution delivers better performance. Since no user accounts are needed on the host, at least rootkit-style attacks are not possible.

Implementation Challenges

Most of the work is straightforward, but a few areas require extra care:

  • Networking: Ports from the br-secure OpenVSwitch bridge must be used to provision networking for the containers. The OVS-LINK tool should serve as inspiration for this.
  • Capabilities: The container needs the right Linux capabilities to configure networking, mount NFS drives, and create users. These must be carefully scoped so the container runs as the ticrypt user while still having the permissions needed to execute jobs and access data. Running as root is a fallback but undesirable from a security standpoint.
  • StrongSwan and hardware-assisted encryption: StrongSwan can offload encryption to network hardware, ensuring the VPN does not add a performance penalty. This will likely require careful implementation and experimentation.
  • MPI integration: MPI is tricky with containers, especially when networking is virtualized. Careful configuration and testing are needed to ensure MPI jobs run efficiently.
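
For the networking challenge, one common way to give a container a port on an OpenVSwitch bridge is a veth pair: one end is moved into the container's network namespace and the other is added to the bridge. The sketch below only generates the command lines and does not run them. It is a generic approach under stated assumptions, not a description of how the OVS-LINK tool works; the function names, interface names, and namespace name are hypothetical.

```python
from typing import List

IFNAME_MAX = 15  # Linux interface names are limited to 15 characters


def _check_ifname(name: str) -> None:
    if not name or len(name) > IFNAME_MAX:
        raise ValueError(f"invalid interface name: {name!r}")


def ovs_attach_commands(bridge: str, host_if: str, cont_if: str, netns: str) -> List[List[str]]:
    """Commands to connect a network namespace to an OVS bridge via a veth pair."""
    _check_ifname(host_if)
    _check_ifname(cont_if)
    return [
        # Create the veth pair on the host.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", cont_if],
        # Move the container end into the container's network namespace.
        ["ip", "link", "set", cont_if, "netns", netns],
        # Plug the host end into the bridge (br-secure in this article).
        ["ovs-vsctl", "add-port", bridge, host_if],
        ["ip", "link", "set", host_if, "up"],
    ]


def ovs_detach_commands(bridge: str, host_if: str) -> List[List[str]]:
    """Commands to remove the port again once the job has finished."""
    _check_ifname(host_if)
    return [
        ["ovs-vsctl", "--if-exists", "del-port", bridge, host_if],
        # Deleting one end of a veth pair removes its peer as well.
        ["ip", "link", "del", host_if],
    ]
```

For instance, `ovs_attach_commands("br-secure", "veth-j42-h", "veth-j42-c", "job42")` returns four commands that a privileged helper could execute in order. Running them requires root and an existing namespace, which is why the sketch stops at building the argument lists.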

Conclusions

Adding bare-metal support to Slurm+tiCrypt is made surprisingly straightforward by an existing architecture that is already flexible and built on standard components. The challenges are real but manageable with careful design. The new feature will deliver better performance for demanding workloads, especially MPI jobs, while preserving a strong security posture. We look forward to seeing how it performs in practice.

Usage Reporting and Forensics in tiCrypt Audit

· 12 min read
Thomas Samant
Senior Partner

Why Audit Is Not Optional

Compliance frameworks like CMMC 2.0 Level 2 do not treat audit logging as a best practice. They treat it as a requirement. The Audit and Accountability (AU) domain of CMMC, mapped directly from NIST SP 800-171 Revision 2, defines nine controls that govern how systems must create, retain, protect, correlate, and report on audit records. These controls exist for a reason: without a trustworthy audit trail, there is no forensics, no accountability, and no way to prove that CUI was actually protected.

tiCrypt was designed from the start with the assumption that every action must be recorded and that records must be resistant to tampering. This article explains how tiCrypt's audit system works, what it captures, how it supports forensic investigation, and how it maps to the CMMC AU controls that organizations are assessed against.

The New tiCrypt Network Architecture Based on OpenVSwitch

· 13 min read
Alin Dobra
CEO & Co-founder

Motivation

The "traditional" LibVirt networking is based on Linux bridges. This architecture is simple yet effective for providing networking connectivity to VMs. If the VMs run on a single server, this architecture is sufficient. However, if the VMs run on multiple servers, the Linux bridge architecture becomes more complex and less efficient. Specifically, in the case of tiCrypt, it creates the following issues:

  • Host network isolation: The Linux bridge network is confined to the host it is defined on. The network can be extended using routing, but this creates significant complexity.
  • IP management complexity: IP assignment becomes very difficult since each host must have its own IP range.
  • Control of external access: tiCrypt needs to control external access to the VMs, and this is more difficult to achieve with Linux bridges since firewall rules must be defined on each host.
  • External proxied access: tiCrypt needs to provide external proxied access to the VMs. This is accomplished by mapping port ranges on the host to the possible VM IPs (port 22). Such mapping rules pollute the firewall rules on the hosts.
  • VM migration: The Linux bridge architecture does not support VM migration. This is a planned feature for tiCrypt.
  • Proxy performance: The Linux bridge solution forces the use of "software proxying" for external access to VMs. This is much slower than a firewall-based solution, which in turn requires a unified network architecture across hosts.
  • Rigid network integration: Libvirt, when using the Linux bridge architecture, supports only a few setups (nat, route, and open). This makes it difficult to deal with custom firewall rules on the hosts and the backend.

How tiCrypt Isolates Virtual Machines at the Network Level

· 7 min read
Thomas Samant
Senior Partner

Secure virtual machines in tiCrypt run in near-complete isolation from each other and from the surrounding environment. This isolation is the foundation of tiCrypt's security model. Every network pathway into or out of a VM is tightly controlled, authenticated, and encrypted, with no exceptions.

This post explains the mechanisms that make this possible: proxy-mediated communication, application port tunneling, VM-level network isolation, and controlled access to external licensing servers.

Understanding tiCrypt Infrastructure: Components, Connectivity, and Deployment Options

· 7 min read
Thomas Samant
Senior Partner

Planning a tiCrypt deployment starts with understanding the infrastructure that powers it. This guide walks through the core components, how they connect, and the deployment architectures available, from a lightweight demo system to a full-scale production environment with batch processing. For a detailed walkthrough of how traffic flows between these components, see tiCrypt Functional Architecture.

Note: This guide covers infrastructure planning and setup. The tiCrypt installation and software deployment process is covered separately.

Why tiCrypt Uses MFA: But Never Trusts It

· 5 min read
Thomas Samant
Senior Partner

Security isn't just about having the right tools. It's about how you use them.

Multi-Factor Authentication has become a cornerstone of modern cybersecurity. Whether you're chasing CMMC compliance, meeting NIST standards, or simply trying to keep bad actors out, MFA is table stakes. Duo, Shibboleth, NetID — these tools are everywhere, and for good reason: they work.

So why does tiCrypt refuse to trust them?

Getting Data Into the Enclave: tiCrypt's Ingress Methods Explained

· 6 min read
Thomas Samant
Senior Partner

tiCrypt's security model is designed to protect data once it's inside the secure enclave. But in practice, the first question administrators and researchers ask is more immediate: how does data get in?

tiCrypt supports several ingress methods, each built for a different set of constraints, including dataset size, whether the sender has tiCrypt credentials, where the data needs to land, and who owns the process. This post breaks down each option and when to use it.