7. Gaps, Innovation, and Development

7.1. Introduction to Gaps, Innovation, and Development

Functional gaps exist between available open source technology and the requirements of this Reference Architecture (RA) or the Reference Model (RM). This chapter details these gaps and proposes solutions. Where appropriate, it identifies "upstream" community projects as targets for development effort.

7.1.1. Gap Template

Related requirements: List requirement references (e.g., abc.xyz.00) from RA2 or RM addressed by this gap.

Baseline project: Identify the upstream project (e.g., Kubernetes) where the gap exists. If none, state “none”.

7.1.2. Container Run-Time Interfaces Towards NFVI Resources

This gap concerns the southbound interface from the container to the infrastructure resources provided by the IaaS, for example the type of network interface presented to a running container.

7.1.3. Multitenancy and Workload Isolation with Kubernetes

Related requirements: e.man.004, sec.ci.008, sec.wl.005, sec.wl.006

Baseline project: Kubernetes

Gap description: Kubernetes currently lacks robust multitenancy, preventing secure infrastructure resource sharing among untrusted tenants. This poses security risks when separating workloads by category (e.g., production vs. non-production). Tenant network segmentation is also needed, while maintaining central administration. Beyond security, this creates operational challenges. Deploying many CNFs in one cluster risks version and configuration conflicts, and software lifecycle management problems. Lack of isolation increases cascading failure risk.

Proposals & Resolution: Kubernetes does not have to be deployed as a single cluster. Industry case studies (Alibaba Cloud Blog: What Can We Learn from Twitter's Move to Kubernetes [169], YouTube: Kubernetes Failure Stories, or: How to Crash Your Cluster - Henning Jacobs [170], CNCF Blog: Demystifying Kubernetes as a service – How Alibaba cloud manages 10,000s of Kubernetes clusters [171]) and CNCF surveys show that organizations increasingly run multiple clusters. A multi-cluster approach addresses both the security and the software lifecycle management challenges described above.

Without in-cluster multitenancy, separate clusters provide necessary CNF isolation (based on vendor, environment, category, or independent lifecycles). Co-locating similar CNFs allows for simultaneous upgrades, while separate clusters accommodate independent upgrades for CNFs with different versions, configurations, and dependencies.

However, managing numerous clusters at scale poses significant operational challenges if done manually. Operators should carefully consider their multi-cluster management strategy, including application management.
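The isolation criteria above can be sketched as a simple grouping step. This is only an illustration, assuming that vendor, environment, and upgrade cadence together form the isolation key; the `Cnf` class, the `plan_clusters` function, and all CNF names are hypothetical and not part of any Kubernetes API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnf:
    """Hypothetical inventory record for one CNF."""
    name: str
    vendor: str
    environment: str  # e.g. "production" or "non-production"
    lifecycle: str    # hypothetical label for the upgrade cadence


def plan_clusters(cnfs):
    """Group CNFs so that each cluster only hosts CNFs sharing an isolation key."""
    clusters = defaultdict(list)
    for cnf in cnfs:
        key = (cnf.vendor, cnf.environment, cnf.lifecycle)
        clusters[key].append(cnf.name)
    return dict(clusters)


# Illustrative inventory; every value here is made up.
inventory = [
    Cnf("upf-a", "vendor-1", "production", "quarterly"),
    Cnf("smf-a", "vendor-1", "production", "quarterly"),
    Cnf("upf-test", "vendor-1", "non-production", "quarterly"),
    Cnf("ims-b", "vendor-2", "production", "monthly"),
]

plan = plan_clusters(inventory)
for key, names in plan.items():
    print(key, names)
```

CNFs that share a key land in the same cluster and can be upgraded together, while each distinct key produces another cluster to manage, which is where the operational cost of the multi-cluster approach comes from.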

7.1.4. Kubernetes as a VM-based VNF Orchestrator

Related requirements: None.

Baseline project: Kubernetes, Kubevirt

Gap description: Kubernetes and a CRI-compliant runtime should support VNF execution without altering VNF architecture or deployment artifacts.

7.1.5. Native Multiple Network Interfaces on Pods

Related requirements: Virtual Network Interface Specifications (Chapter 4 of Reference Model for Cloud Infrastructure (RM) [1])

Baseline project: Kubernetes

Gap description: Kubernetes lacks native support for multiple Pod interfaces, requiring a CNI multiplexer (like GitHub: Multus-CNI [34]). Network service implementation (Network Policies, Ingress, Egress, Load Balancers) depends on the multiplexer and its CNI plugins, leading to inconsistency.

Status: A Kubernetes Enhancement Proposal (KEP) (Google Docs: KEP: MultiNetwork podNetwork object [172]) aims to add native support for multiple Pod interfaces.
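Until native support exists, the multiplexer approach can be sketched as follows. The example builds, as plain Python dictionaries, a Multus `NetworkAttachmentDefinition` and a Pod that requests secondary interfaces through the `k8s.v1.cni.cncf.io/networks` annotation. The network names, the `eth1` host interface, the subnet, and the image name are assumptions chosen for illustration, and the helper functions are hypothetical.

```python
import json


def network_attachment_definition(name, master_interface, subnet):
    """Build a Multus NetworkAttachmentDefinition wrapping a macvlan CNI config."""
    cni_config = {
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": master_interface,  # host interface; "eth1" below is an assumption
        "mode": "bridge",
        "ipam": {"type": "host-local", "subnet": subnet},
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name},
        # Multus expects the delegate CNI configuration as a JSON string.
        "spec": {"config": json.dumps(cni_config)},
    }


def pod_with_secondary_networks(pod_name, image, networks):
    """Build a Pod manifest that asks Multus for extra interfaces via annotation."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "annotations": {"k8s.v1.cni.cncf.io/networks": ",".join(networks)},
        },
        "spec": {"containers": [{"name": "app", "image": image}]},
    }


nad = network_attachment_definition("signaling-net", "eth1", "192.168.10.0/24")
pod = pod_with_secondary_networks(
    "cnf-example", "example.invalid/cnf:latest", ["signaling-net", "oam-net"]
)
print(json.dumps(nad, indent=2))
print(json.dumps(pod, indent=2))
```

The secondary interfaces exist only in the annotation and the multiplexer's custom resource, outside the core Pod API, which is why Kubernetes network services such as Network Policies do not apply to them consistently.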

7.1.6. Dynamic Network Management

Related requirements: inf.ntw.03 (Kubernetes Architecture Requirements)

Baseline project: Kubernetes

Gap description: Kubernetes lacks an API for managing network services such as VPNs. A CNI plugin (GitHub: Multus-CNI [34]) or integration with SDN controllers (using NETCONF, for example) is needed to expose APIs for network services and to connect VPNs (e.g., L3VPN) to CNFs on demand.

7.1.7. Control Plane Efficiency

Related requirements: None

Baseline project: Kubernetes

Gap description: Multi-site/availability zone deployments often utilize multiple Kubernetes clusters for security, multitenancy, fault tolerance, resilience, and latency. This creates Kubernetes control plane node overhead. More efficient multi-cluster operation is needed to meet non-functional requirements.
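The overhead can be made concrete with a short calculation. This is a rough sketch that assumes three dedicated control plane nodes per cluster, a common high-availability layout that this RA does not mandate; the function name and the node counts are hypothetical.

```python
def control_plane_overhead(clusters, workers_per_cluster, control_nodes_per_cluster=3):
    """Return (control plane node count, control plane share of all nodes).

    Assumes every cluster runs its own dedicated control plane nodes.
    """
    control_nodes = clusters * control_nodes_per_cluster
    total_nodes = control_nodes + clusters * workers_per_cluster
    return control_nodes, control_nodes / total_nodes


# The same 30 worker nodes, deployed as one cluster or as ten small clusters.
single = control_plane_overhead(clusters=1, workers_per_cluster=30)
many = control_plane_overhead(clusters=10, workers_per_cluster=3)
print("one cluster: ", single)
print("ten clusters:", many)
```

Under these assumptions the number of control plane nodes grows linearly with the number of clusters, so small edge or per-tenant clusters spend a much larger share of their nodes on control plane functions.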

7.1.8. Interoperability with VRF-based Networking

Related requirements: None

Baseline project: Kubernetes

Gap description: L3 VRFs/VPNs are commonly used for traffic separation (signaling, charging, LI, O&M). CNFs must interoperate with existing network elements, which requires connecting Kubernetes Pods to L3 VPNs (currently only possible via Multus). However, Kubernetes does not handle the network orchestration involved (connecting the interface to a gateway router that terminates the L3 VPN), and no production-grade open source solution exists for it.

Note

While possible with IaaS, this creates an undesirable dependency between Kubernetes workload orchestration and IaaS infrastructure orchestration.

7.1.9. Hardware Topology-Aware Huge Pages

Related requirements: infra.com.cfg.004 and infra.com.cfg.002 (Virtual Compute Profiles, Chapter 5 of Reference Model for Cloud Infrastructure (RM) [1]).

Baseline project: Kubernetes

Gap description: The Kubernetes Memory Manager, introduced as an alpha feature in v1.21, targets this gap. See Management of Memory and Huge Pages Resources for details.
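As a minimal sketch of the workload side, the example below builds a Pod manifest that requests 2 MiB huge pages and checks that it would fall into the Guaranteed QoS class, which is the class the Memory Manager's Static policy acts on. The Pod name, image, and sizes are assumptions for illustration, and `is_guaranteed` is a hypothetical helper that only approximates the Kubernetes QoS rules for CPU and memory.

```python
def hugepage_pod(name, image, hugepages="256Mi", memory="512Mi", cpu="2"):
    """Build a Pod manifest requesting 2Mi huge pages with requests equal to limits."""
    resources = {"hugepages-2Mi": hugepages, "memory": memory, "cpu": cpu}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                # Identical requests and limits place the Pod in the Guaranteed class.
                "resources": {"requests": dict(resources), "limits": dict(resources)},
                "volumeMounts": [{"name": "hugepage", "mountPath": "/hugepages"}],
            }],
            "volumes": [{"name": "hugepage", "emptyDir": {"medium": "HugePages"}}],
        },
    }


def is_guaranteed(pod):
    """Approximate QoS check: CPU and memory limits set, requests equal to limits."""
    for container in pod["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        requests = container.get("resources", {}).get("requests", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                return False
            # An unset request defaults to the limit.
            if requests.get(resource, limits[resource]) != limits[resource]:
                return False
    return True


pod = hugepage_pod("dpdk-example", "example.invalid/dpdk-app:latest")
print("Guaranteed QoS:", is_guaranteed(pod))
```

Whether the huge pages are actually allocated from the NUMA node that also hosts the Pod's CPUs and devices depends on the kubelet's Memory Manager and Topology Manager configuration, which this sketch does not cover.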

7.1.10. User Namespaces in Kubernetes

Related requirements: e.man.004 (Cloud Infrastructure Management Capabilities, Chapter 4 of Reference Model for Cloud Infrastructure (RM) [1]), inf.ntw.03 (Platform and Access Requirements, Chapter 2 of Reference Architecture (RA1) for OpenStack based cloud infrastructure [55])

Baseline project: Kubernetes

Gap description: Kubernetes has historically lacked a mechanism to isolate the user within a container from the host’s user, meaning a container running as root (UID 0) was also seen as root by the host kernel. This creates a significant security risk.

Status: This gap is being addressed by the User Namespaces feature, which is Beta and enabled by default as of Kubernetes v1.33. It remaps the users inside the Pod to a different, unprivileged range of user IDs on the node, so a container can run as root internally while being mapped to an unprivileged user on the host. This limits what a process that escapes the container can do and reduces the scope for privilege escalation. Although the feature is still Beta, its default-on status makes it a viable option for strengthening workload isolation. This RA recommends its adoption to satisfy security requirements for multitenancy.
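The remapping can be illustrated with a small sketch. A Pod opts in by setting `hostUsers: false` in its spec; the mapping function below models the idea of shifting container user IDs into a per-Pod host range. The Pod name, image, host base ID, and the range length of 65536 are assumptions for illustration, and `host_uid` is a hypothetical helper, not a Kubernetes API.

```python
def user_namespaced_pod(name, image):
    """Build a Pod manifest that opts in to a user namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "hostUsers": False,  # do not share the host's user namespace
            "containers": [{"name": "app", "image": image}],
        },
    }


def host_uid(container_uid, host_base, length=65536):
    """Map a UID inside the Pod to the UID seen by the host kernel.

    Assumes a contiguous mapping of `length` IDs starting at `host_base`.
    """
    if not 0 <= container_uid < length:
        raise ValueError("UID is outside the range mapped into the Pod")
    return host_base + container_uid


pod = user_namespaced_pod("isolated-example", "example.invalid/app:latest")
# Hypothetical host range assigned to this Pod, starting at 131072.
print("container root maps to host UID", host_uid(0, host_base=131072))
print("container UID 1000 maps to host UID", host_uid(1000, host_base=131072))
```

Because UID 0 inside the Pod corresponds to a high, unprivileged UID on the node, a process that escapes the container does not hold root privileges on the host.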