6 Min. Reading Time
NVIDIA Vera Rubin (NVL576) is in full production. AWS, Google Cloud, and Microsoft Azure are already deploying the new architecture. CIOs who still base their AI infrastructure roadmaps for 2026/2027 on Hopper are planning with inference cost curves that are off by up to a factor of 10, and in the wrong direction: they overestimate what inference will cost.
The Essentials at a Glance
What is NVIDIA Vera Rubin? Vera Rubin (internally NVL576) is NVIDIA’s successor architecture to the Blackwell generation, named after the astronomer Vera Rubin. The NVL576 combines 576 Vera Rubin tensor cores with NVIDIA’s new NVLink interconnect technology and is optimized for inference workloads, i.e., running trained AI models in production. According to NVIDIA, it delivers 10 times better tokens-per-watt efficiency than the H100 (Hopper).
The relevant number for CIOs is not GPU performance in FLOPS but the price per million output tokens in production. On H100, GPT-4-like inference costs between $8 and $15 per 1 million output tokens, depending on utilization and cloud provider. Vera Rubin brings this down to around $0.80 to $1.50, a factor of 10 cheaper.
Token Cost Comparison (Inference, Cloud, 70B Model Equivalent)
H100 (Hopper, 2023): ~$10 per 1M output tokens
B200 (Blackwell, 2025): ~$3 per 1M output tokens
Vera Rubin (2026): ~$1 per 1M output tokens
What this means for business cases: A company that currently spends $50,000 per month on AI inference on cloud H100 capacity would pay around $5,000 on Vera Rubin for the same token volume, assuming the full factor of 10. An internal AI assistant platform that did not look profitable on H100 could work on Vera Rubin. Make-or-buy decisions about a company's own on-prem AI servers shift significantly towards the cloud.
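The arithmetic behind this example can be sketched in a few lines of Python. The per-token prices are the approximate figures from the comparison above; the function name and the assumption that token volume stays constant across hardware generations are illustrative choices, not part of any provider's pricing model.

```python
# Approximate prices per 1M output tokens (USD), as listed in the comparison above.
PRICE_PER_M_TOKENS = {
    "H100": 10.0,
    "B200": 3.0,
    "Vera Rubin": 1.0,
}


def projected_monthly_cost(current_cost, current_gpu, target_gpu):
    """Scale a monthly bill by the ratio of per-token prices.

    Assumes the token volume stays the same when moving between platforms.
    """
    return current_cost * PRICE_PER_M_TOKENS[target_gpu] / PRICE_PER_M_TOKENS[current_gpu]


if __name__ == "__main__":
    # The $50,000 monthly H100 bill is the article's example figure.
    for gpu in PRICE_PER_M_TOKENS:
        cost = projected_monthly_cost(50_000, "H100", gpu)
        print(f"{gpu}: ${cost:,.0f} per month")
```

In practice, cheaper tokens tend to increase usage, so a constant-volume projection is better read as a lower bound on the bill than as a forecast.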
Q1/Q2 2026 – Production Starts
NVIDIA begins volume production of Vera Rubin NVL576. Google Cloud and AWS receive first dedicated allocations for their own internal workloads.
Q2 2026 – Enterprise Preview
AWS, Google Cloud, and Azure open Vera Rubin capacities for strategic enterprise customers in private preview. DACH region availability in Frankfurt and Amsterdam is top priority.
Q3 2026 – On-Demand (Planned)
On-demand availability for all enterprise customers. Pricing will be based on current NVIDIA production costs and is expected to be significantly below current H100 spot prices.
Cloud-first strategy gains ground
On-prem risks misinvestment
The pragmatic CIO position for 2026: freeze on-prem AI server investments based on H100/H200 until Vera Rubin on-prem availability is clear. Pre-book cloud inference capacity for Vera Rubin (Reserved Instances) if your own inference usage is predictable. Ask managed service providers whose pricing is still calculated on a Hopper basis about their Vera Rubin roadmap.
Fact sources: NVIDIA GTC 2026, AWS re:Invent Pre-Announcement April 2026, Google Cloud Blog, Microsoft Azure AI Infrastructure Blog.
AWS, Google Cloud, and Azure are planning on-demand availability for Q3 2026. Frankfurt and Amsterdam as EU regions are the top priority for DACH rollout. Private preview access can be requested for strategic enterprise customers starting from Q2 2026 through their respective account managers.
The 10x figure comes from NVIDIA’s internal benchmarks for inference workloads under optimal conditions. Real-world production numbers will be lower: a 5-7x cost reduction compared to H100 is a more realistic expectation for production workloads. Even at 5x, this remains a strategically significant difference for infrastructure budget planning.
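To see how much the planning figure depends on the assumed factor, the following sketch applies the 5x, 7x, and 10x reductions to the $50,000 monthly H100 bill used as an example earlier in the article. The scenario labels and the function name are made up for illustration.

```python
def cost_after_reduction(baseline_cost, factor):
    """Return the cost that remains after an N-fold cost reduction."""
    if factor <= 0:
        raise ValueError("reduction factor must be positive")
    return baseline_cost / factor


# Reduction factors named in the text; the labels are illustrative.
SCENARIOS = {
    "realistic, lower end": 5,
    "realistic, upper end": 7,
    "vendor benchmark": 10,
}

if __name__ == "__main__":
    baseline = 50_000  # monthly H100 inference spend in USD (article's example)
    for label, factor in SCENARIOS.items():
        remaining = cost_after_reduction(baseline, factor)
        saving_pct = 100 * (baseline - remaining) / baseline
        print(f"{label} ({factor}x): ${remaining:,.0f} per month, {saving_pct:.0f}% saved")
```

The spread between the scenarios is small in relative terms: a 5x reduction already removes 80% of the bill, and 10x removes 90%.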
Not categorically. H100 infrastructure ordered today and going into production in Q4 2026 still has 2-3 years of productive use before Vera Rubin reaches parity in the on-prem segment. Training workloads are less affected than inference. The deciding question is what the GPU capacity is needed for: for scaling inference, pausing investment until Vera Rubin makes sense; for training, H100 can still be justifiable.
TCO analyses that use H100 cloud costs as their baseline systematically underestimate the attractiveness of the cloud from 2027 onwards. Anyone currently conducting an AI infrastructure analysis should include Vera Rubin cloud prices as a scenario. Standalone on-prem AI investments with a project volume above EUR 5 million should be explicitly analyzed with this factor in mind.
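A scenario comparison of this kind could look roughly like the sketch below. Only the H100 price per million tokens, the 5x and 10x factors, and the 5 million project threshold come from the article. The token volume and planning horizon are hypothetical placeholders, and on-prem operating costs, utilization, and currency conversion are deliberately ignored.

```python
# Approximate H100 price per 1M output tokens, taken from the article.
H100_PRICE = 10.0

# Hypothetical planning inputs: replace with your own figures.
MONTHLY_VOLUME_M_TOKENS = 20_000   # assumed: 20 billion output tokens per month
HORIZON_MONTHS = 36                # assumed: three-year planning horizon
ON_PREM_PROJECT_COST = 5_000_000   # article's threshold; currency conversion ignored


def cloud_cost(price_per_m_tokens, volume_m_tokens, months):
    """Total cloud inference spend over the horizon at a flat per-token price."""
    return price_per_m_tokens * volume_m_tokens * months


def compare_scenarios():
    """Return {scenario: (cloud_total, cloud_is_cheaper)} per price assumption."""
    scenarios = {
        "H100 baseline": H100_PRICE,
        "Vera Rubin at 5x": H100_PRICE / 5,
        "Vera Rubin at 10x": H100_PRICE / 10,
    }
    results = {}
    for name, price in scenarios.items():
        total = cloud_cost(price, MONTHLY_VOLUME_M_TOKENS, HORIZON_MONTHS)
        results[name] = (total, total < ON_PREM_PROJECT_COST)
    return results


if __name__ == "__main__":
    for name, (total, cheaper) in compare_scenarios().items():
        verdict = "cloud cheaper" if cheaper else "on-prem cheaper"
        print(f"{name}: {total:,.0f} vs {ON_PREM_PROJECT_COST:,} -> {verdict}")
```

With these placeholder inputs the verdict flips: on the H100 baseline the on-prem project looks cheaper, while under either Vera Rubin scenario the cloud does. A real analysis would need actual volumes and full on-prem operating costs before drawing that conclusion.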
AMD MI350 and MI400 are coming as competition but are not yet in full production. Google TPU v6 (Trillium) is already in production but is offered only through Google Cloud. AWS Trainium 3 and Inferentia 3 are specialized for training and inference, respectively, but cannot run existing CUDA workloads without porting. For DACH companies that are not already tied to a specific chip platform, Vera Rubin via cloud is the most pragmatic option in 2026.
Source title image: Pexels / panumas nikhomkhai (px:17489157)