The Proteus Protocol: How to roll out AI-ready data centers faster than your competitors for next to nothing
A novel take on how to convert existing cloud & compute facilities for AI workloads - with a blueprint for a modular AI PCIe Expansion Blade running a custom firmware
I'd forgive you if you'd never heard of Proteus before.
In Greek mythology, he was a sea god known for his ability to shape-shift. For anyone who delves deeper into the challenges data center operators face at present, there is always one roadblock that seems insurmountable:
Time.
We can come up with clever engineering solutions for the power bottleneck: SMRs, gas turbine efficiencies, renewables.
We can come up with novel ideas to improve the cooling: hybrid solutions, glycol-based liquid cooling, supercritical fluids.
And we can also solve the regulation and site strategy problems: through careful consideration, liaison with stakeholders, and many other workarounds.
The problem is never the capacity or the capability - the problem always remains the speed at which it can be realised.
Ultimately, the timeline for delivery cannot be realistically compressed any more than it is at present. The rollout of even the most creative of solutions is often dependent on many other industries, infrastructure, and factors requiring significant planning and expense before those parts can be implemented.
I have full confidence that, through collaboration and innovation, all of the major hurdles the industry currently faces regarding power, cooling, and site selection will continue to be overcome permanently. However, unfortunately for operators, the reality is that demand for these facilities exists right now and will remain for the foreseeable future.
So today's question: If the timeline for new facility rollout is the main problem, is there a method by which we can convert existing Cloud or Compute data centers to be suitable for running AI tasks?
Like Proteus, my solution involves a bit of shape-shifting: utilizing the core infrastructure that’s already there, but layering a protocol onto it, which I’ll loosely refer to as the AI Migration Acceleration Kit (AIMAK).

AIMAK - The elevator pitch
If there's one challenge that is universally accepted with converting existing cloud or compute hardware to be suitable for running AI tasks, it's the practice of converting CPU-centric racks to run GPU-centric tasks.
This is not designed to replace the hyperscale AI facilities that are already being planned, but to assist in providing more bandwidth for simpler AI tasks requiring less power. It is accepted that intense image or video rendering will still remain unlikely for this application - but such traffic can be routed to the already operational facilities designed for AI.
My AIMAK solution features three core components to be implemented properly:
1. Hardware: Modular AI Accelerator Nodes
Plug-and-Play GPU Expansion Blades or PCIe Add-on Cards with:
NVIDIA L4, A2, or AMD MI300-class GPUs (low power, high AI performance)
NVMe storage for model caching.
High-bandwidth PCIe Gen4/5 or Infiniband/Ethernet (100G+) connectivity.
These blades connect into standard servers or 1U/2U chassis (a quick detection check is sketched after this list).
Optional: FPGA-based acceleration (for low-latency AI inference).
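As a rough illustration of how little tooling the hardware side needs, here is a minimal sketch of the post-install sanity check mentioned above. It assumes a Linux host with pciutils (lspci) available, and nothing about it is AIMAK-specific - it simply confirms that a freshly fitted blade's GPU and on-board NVMe are visible on the PCIe bus before the node is handed back to the scheduler.

```python
#!/usr/bin/env python3
"""Rough post-install sanity check: confirm the host can see a GPU and an
NVMe device on the PCIe bus. Assumes a Linux host with pciutils (lspci)
installed; the string matching is deliberately loose."""
import subprocess

def pci_lines() -> list:
    # One line per PCI function, e.g. "01:00.0 3D controller: NVIDIA ..."
    out = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

def main() -> None:
    lines = pci_lines()
    gpus = [l for l in lines
            if ("NVIDIA" in l or "AMD" in l)
            and ("3D controller" in l or "VGA compatible controller" in l
                 or "Display controller" in l)]
    nvme = [l for l in lines if "Non-Volatile memory controller" in l]
    print(f"Accelerators visible: {len(gpus)}")
    print(f"NVMe devices visible: {len(nvme)}")

if __name__ == "__main__":
    main()
```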
2. Software: AI Virtualization & Orchestration Layer
Lightweight hypervisor or container orchestration (Kubernetes, K3s, or Nomad).
AI workload scheduler that detects available acceleration and dispatches accordingly (sketched below).
GPU virtualization (NVIDIA vGPU, AMD MxGPU) to allow multi-tenancy on each GPU blade.
Monitoring, usage metering, and fallback to CPU when necessary.
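To make the scheduler bullet concrete, here is a minimal sketch in plain Python of the detect-and-dispatch idea with CPU fallback. The run_on_gpu and run_on_cpu helpers are hypothetical stand-ins for whatever actually serves the workload; a real deployment would express the same policy through the orchestrator (Kubernetes device plugins and node selectors) rather than host-side code.

```python
"""Minimal sketch: probe for a usable accelerator and fall back to CPU when
none is found. run_on_gpu / run_on_cpu are hypothetical placeholders."""
import shutil
import subprocess

def gpu_available() -> bool:
    # Cheap probe: nvidia-smi is present and reports at least one GPU.
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout

def run_on_gpu(job: str) -> str:
    return f"{job}: dispatched to GPU blade"      # placeholder execution path

def run_on_cpu(job: str) -> str:
    return f"{job}: running on CPU fallback"      # placeholder execution path

def dispatch(job: str) -> str:
    return run_on_gpu(job) if gpu_available() else run_on_cpu(job)

print(dispatch("sentiment-inference-batch-01"))
```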
3. Network Fabric Upgrades (If required)
Implement RoCE or Infiniband-compatible fabrics as needed for high-speed interconnects (depends on AI workload demands - a quick capability probe is sketched below).
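Before spending on new fabric, it is worth checking what the existing NICs already support. The sketch below is a Linux-only probe: RDMA-capable devices (InfiniBand or RoCE NICs registered with the kernel verbs stack) appear under /sys/class/infiniband, so an empty result is a reasonable hint that an upgrade really is required.

```python
"""Quick probe (Linux only): RDMA-capable NICs registered with the kernel
verbs stack show up under /sys/class/infiniband."""
from pathlib import Path

def rdma_devices() -> list:
    root = Path("/sys/class/infiniband")
    return sorted(p.name for p in root.iterdir()) if root.is_dir() else []

print("RDMA-capable devices:", rdma_devices() or "none found")
```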
Why this approach?
Simply put: it’s about the best performance we can squeeze out for the lowest cost. But there are a few extra considerations:
It’s fine to cost a bit, but it shouldn't be a lot - this is designed to be a temporary upgrade to help meet demand until more suitable permanent facilities are constructed.
It needs to be quick - if it takes forever to implement, then frankly we may as well all wait until the permanent facilities are constructed. So it needs to roll out as fast as people can fit GPUs in a server rack and reboot them.
It needs to work within the constraints of the existing infrastructure - which means all of the existing power capacity and cooling performance needs to be able to serve this use. If everything needs to be upgraded, then again - we may as well wait for the permanent facility.
AI traffic is becoming simpler - as people shift away from Google and onto ChatGPT for search enquiries - and this architecture will still be adequate for LLM inference (e.g., OpenAI or Meta LLaMA-based models), computer vision at scale (retail, surveillance, manufacturing), speech recognition and translation, and even lightweight training for edge AI models.
Supports multi-cloud or hybrid environments
Scalable and reversible
I’m sold. Hit me with the hardware design
Any of the above GPU designs could be easily adjusted to suit this use case better at the production stage. A few ideas:
Half-Height Modular Heat Sink Kit
Redesign cooling with a swappable passive heat sink kit that fits both:
Standard 2U servers with existing airflow (high-density fins).
Edge/1U deployments (compact design).
This increases flexibility without needing SKU splits and reduces field engineering time.
NVMe Storage on Board
Many legacy servers have limited PCIe lanes or slow storage, so a high-speed M.2 NVMe slot on the card is the fastest storage we could deploy for the least amount of GPU PCB real estate.
This will also create a local cache for AI models, embeddings, or datasets - useful for inference workloads where disk I/O is the bottleneck (a caching sketch follows below).
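A minimal sketch of that caching behaviour follows. The NVMe mount point and the model URL are placeholders rather than any real layout; the only point being made is that a miss costs one fetch, while every later load is local NVMe I/O.

```python
"""Sketch of the on-blade cache idea: serve models from the local NVMe volume
and only pull from central storage on a miss. Paths and URLs are placeholders."""
import urllib.request
from pathlib import Path

CACHE_DIR = Path("/mnt/nvme0/model-cache")   # hypothetical NVMe mount point

def cached_model(name: str, url: str) -> Path:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local = CACHE_DIR / name
    if not local.exists():
        # Cache miss: one slow fetch; every subsequent load is local NVMe I/O.
        urllib.request.urlretrieve(url, local)
    return local

# Example (placeholder URL):
# path = cached_model("llama-3-8b.onnx", "https://models.example/llama-3-8b.onnx")
```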
Front IO Bracket for QuickSwap
Backplane access in some traditional racks can be hard, so a redesign of the front IO will help with access and support field maintenance without a full server teardown.
It doesn’t have to be anything too extensive - redesign the PCIe bracket to allow front-facing access to status LEDs, reset, or a firmware update port (e.g., USB-C).
What the above should achieve is:
More efficient resource usage in multi-app environments.
Lower risk of power/cooling overrun in legacy data centers.
Faster time-to-inference with preloaded models or local caching.
Field-friendly maintenance for temporary deployments.
And that shouldn’t add more than about 5% to the typical bill of materials per unit.

Firm up that firmware outline for me…
Not my area of expertise (at all), and I may have this completely wrong. But loosely, my thoughts are:
Multi-Instance GPU (MIG)-Lite
Introduce a simplified MIG-Lite feature via firmware update, for better utilization in retrofitted cloud servers where no single app maxes out the GPU:
Allow hardware-level partitioning into smaller inference instances (e.g., 2 or 4 logical GPUs); the allocation bookkeeping is sketched after this list.
Ideal for small batch AI inference in multi-tenant environments.
Reduces the risk of idle GPU capacity when workloads are light.
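Since MIG-Lite itself is hypothetical, the sketch below only shows the host-side bookkeeping an orchestrator might wrap around it: a card exposing two or four logical instances, tenants acquiring and releasing them, and nothing handed out once every partition is busy.

```python
"""Host-side sketch of handing out the logical instances a MIG-Lite-style
firmware would expose. The firmware interface is the hypothetical part;
this only illustrates the bookkeeping."""
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class MigLiteGpu:
    partitions: int = 4                  # e.g. four logical inference instances
    in_use: set = field(default_factory=set)

    def acquire(self) -> int | None:
        # Give the tenant the first free logical instance, if any remain.
        for idx in range(self.partitions):
            if idx not in self.in_use:
                self.in_use.add(idx)
                return idx
        return None                      # every partition is busy

    def release(self, idx: int) -> None:
        self.in_use.discard(idx)

gpu = MigLiteGpu(partitions=2)
print(gpu.acquire(), gpu.acquire(), gpu.acquire())   # -> 0 1 None
```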
Dynamic Power Profiles
Add firmware-level adaptive power and thermal management profiles optimized for constrained power/cooling in older racks. Many legacy racks were not designed for AI-class thermal loads - this lets the GPU fit better within existing envelopes:
“Legacy Rack Mode” (e.g., power cap at 60–70W instead of 72W peak) - a host-side equivalent is sketched after this list.
Aggressive clock gating when workloads are sparse.
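The firmware profile is the proposal; purely as a point of reference, a comparable cap can already be applied from the host today with nvidia-smi's power-limit option, as sketched below. It needs root, the GPU has to accept the requested value, and 65 W is only an example figure for a 72 W-class card.

```python
"""Host-side reference for the 'Legacy Rack Mode' idea: cap the GPU's power
limit with nvidia-smi (requires root; the GPU must accept the value)."""
import subprocess

def apply_legacy_rack_mode(gpu_index: int = 0, watts: int = 65) -> None:
    # -i selects the GPU, -pl sets the software power limit in watts.
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
                   check=True)

if __name__ == "__main__":
    apply_legacy_rack_mode()
```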
Boot-time Resource Negotiation
Develop firmware that can negotiate GPU availability at boot based on system-level BIOS calls, to ensure safe and automated integration without manual intervention:
Automatically disable or scale down the GPU if power or cooling is marginal (decision logic sketched below).
Report availability to the host OS via SMBIOS or ACPI tables.
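To be clear about what "negotiation" means here, the sketch below illustrates the decision rules only; real firmware would make this call in the UEFI/BIOS environment and publish the outcome through SMBIOS or ACPI tables, and the thresholds shown are made-up placeholders.

```python
"""Illustration of the boot-time negotiation rules only - the thresholds are
made-up placeholders, and real firmware would publish the result via
SMBIOS/ACPI rather than returning a string."""

def negotiate(power_headroom_w: float, inlet_temp_c: float) -> str:
    # Rack clearly cannot support the card: disable it outright.
    if power_headroom_w < 40 or inlet_temp_c > 40:
        return "disabled"
    # Marginal conditions: derate (i.e. fall back to the Legacy Rack Mode cap).
    if power_headroom_w < 75 or inlet_temp_c > 32:
        return "derated"
    return "full"

print(negotiate(power_headroom_w=60, inlet_temp_c=30))   # -> derated
```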
Pre-installed Model Support
Add secure firmware-level pre-loading of common open-source AI models (ONNX, TensorRT) into onboard memory to speed up deployment for temporary AI use cases without sophisticated DevOps (a loading sketch follows this list):
Reduce cold-start times for inference workloads.
Optional encrypted model store in firmware.
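On the consuming side, a sketch of what loading a firmware-staged model might look like is below. The mount point and checksum are hypothetical, and onnxruntime is assumed to be installed on the host; the value is simply that the model bytes are already local, so the first inference doesn't wait on a download.

```python
"""Sketch of consuming a firmware-staged model. The mount point and checksum
are hypothetical placeholders; onnxruntime is assumed to be installed."""
import hashlib
from pathlib import Path

import onnxruntime as ort

MODEL_STORE = Path("/mnt/gpu-flash/models")          # hypothetical onboard store
EXPECTED_SHA256 = "<published-checksum-goes-here>"   # placeholder value

def load_preinstalled(name: str) -> ort.InferenceSession:
    blob = (MODEL_STORE / name).read_bytes()
    if hashlib.sha256(blob).hexdigest() != EXPECTED_SHA256:
        raise ValueError("model store integrity check failed")
    # Build the session straight from the local bytes - no network cold start.
    return ort.InferenceSession(blob)
```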
And there’s my take on an AIMAK.
Utilize everything you’ve got, add a few cheap items, and you’ve acquired a new AI data center to offer your customers, who are chomping at the bit for increased capacity.
TH
You may also find these of interest:
AWS is perfectly positioned for a hostile takeover of conventional data centers in a way that no one could predict
A Hot Take, from Bits to Beets: how data centers could be growing our lunch for free
I'll be your server today: A novel wafer design for a better data center solution