The inference platform for the real world

Compile once, deploy everywhere.

Inference infrastructure is broken

Homogeneous assumptions

Most platforms assume identical GPUs in a single cluster. Real deployments have mixed CPUs, GPUs, and NPUs across machines.

Fragile pipelines

Multi-model pipelines break silently in production. Type mismatches between models surface only after burning compute.

Scattered tooling

Separate tools for compilation, orchestration, serving, and monitoring. Each seam is a place things break.

From models to production

Models

PyTorch, ONNX, NNEF

Compile

Target hardware natively

Compose

Pipeline builder with validation

Serve

Typed API endpoint

Built for production inference

Heterogeneous Execution

Run on the hardware you actually have. CPUs, GPUs, NPUs, or a mix across machines. Compile once, deploy everywhere.

Typed Pipelines

Compose multi-model pipelines with strong type guarantees on inputs and outputs. Catch integration errors before models load into memory, not after.
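The idea can be illustrated with a small, self-contained Python sketch (the class and function names here are illustrative, not the actual Orthos SDK): each stage declares an input/output contract, and composition fails at definition time, before any weights would be loaded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    """A pipeline stage with a declared input/output type contract."""
    name: str
    in_type: str
    out_type: str

def compose(*stages: Stage) -> list[Stage]:
    """Type-check adjacent stages at definition time, before any model loads."""
    for a, b in zip(stages, stages[1:]):
        if a.out_type != b.in_type:
            raise TypeError(
                f"{a.name} outputs {a.out_type!r} but {b.name} expects {b.in_type!r}"
            )
    return list(stages)

detector = Stage("detector", in_type="img", out_type="f32[]")
classifier = Stage("classifier", in_type="f32[]", out_type="str")

pipeline = compose(detector, classifier)   # OK: contracts line up

try:
    compose(classifier, detector)          # str -> img mismatch, caught immediately
except TypeError as e:
    print(e)
```

The point of the design: a contract mismatch costs a raised exception at build time, not a failed batch after GPUs have spun up.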

Distributed by Default

Computation routes across nodes automatically. Split graphs across a network of heterogeneous accelerators without manual orchestration.
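As a conceptual sketch only (not the actual Orthos scheduler), routing across a heterogeneous fleet can be thought of as constraint-aware, cost-based placement. Device names, supported ops, and cost figures below are made up for illustration.

```python
# Hypothetical fleet: each device supports certain ops at some relative cost.
FLEET = {
    "cpu-0": {"ops": {"resize", "matmul"}, "cost": 1.0},
    "gpu-0": {"ops": {"matmul", "conv"}, "cost": 0.1},
    "npu-0": {"ops": {"conv"}, "cost": 0.05},
}

def place(stages: dict[str, tuple[str, float]]) -> dict[str, str]:
    """Map each stage (name -> (required op, GFLOPs)) to the cheapest
    device that supports that op."""
    placement = {}
    for name, (op, gflops) in stages.items():
        candidates = [d for d, spec in FLEET.items() if op in spec["ops"]]
        placement[name] = min(candidates, key=lambda d: gflops * FLEET[d]["cost"])
    return placement

placement = place({
    "preprocess": ("resize", 0.2),   # only the CPU supports resize
    "detector": ("conv", 40.0),      # NPU wins on conv cost
    "classifier": ("matmul", 5.0),   # GPU beats CPU on matmul
})
print(placement)
```

Each stage lands on the device that can run it cheapest, with no manual assignment in the pipeline definition.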

Declarative SDK

Define inference products, not just model calls. Chain models, add transformations, expose typed endpoints. Built in Rust for performance, with bindings for Python and more.

How it works

01

Define

Bring your models in PyTorch, ONNX, or NNEF. Declare your pipeline, types, and transformations.

02

Compose

Chain models into pipelines declaratively. Add pre/post-processing and custom logic between stages.
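A hedged sketch of the declarative style in plain Python (the names are illustrative, not the SDK's API): model stages and pre/post-processing transforms are chained into a single callable pipeline.

```python
from typing import Callable

def chain(*steps: Callable) -> Callable:
    """Compose steps left-to-right into a single callable pipeline."""
    def pipeline(x):
        for step in steps:
            x = step(x)
        return x
    return pipeline

# Stand-ins for models; a real pipeline would wrap compiled model calls.
normalize = lambda pixels: [p / 255.0 for p in pixels]     # pre-processing
fake_model = lambda feats: sum(feats)                      # "model" stage
to_label = lambda score: "cat" if score > 1.0 else "dog"   # post-processing

infer = chain(normalize, fake_model, to_label)
print(infer([128, 255, 64]))   # pixels in, label out
```

Custom logic between stages is just another step in the chain, so the whole pipeline remains a single declared object.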

03

Validate

Pipelines are type-checked before any model loads into memory. Input/output contracts are enforced at definition time.

04

Compile

Compile ahead of time for distributed multi-machine workloads, at runtime for single-node execution, or use our compilation service for large graphs and complex infrastructure.

05

Deploy

Expose your pipeline as a typed inference endpoint. Orthos handles distribution across your hardware fleet automatically.
