Write scalar.
Execute parallel.
Ark is a compute-native language that treats tensors, cost, and distributed state as first-class primitives. You write clear intent — the compiler produces efficient kernels and deterministic dispatch.
__global__ void matrixMul(float *A, float *B, float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int i = 0; i < N; i++) {
            sum += A[row * N + i] * B[i * N + col];
        }
        C[row * N + col] = sum;
    }
}
// + 100 lines of memory management...

// Matrix Multiplication
fn[gpu] matmul(a: Tensor<f32, 2>, b: Tensor<f32, 2>) -> Tensor<f32, 2> {
    // Ark handles tiling, memory layout, and kernel dispatch.
    return a @ b;
}

Forget malloc and cudaMemcpy.
High-performance kernels shouldn’t require hand-rolled pointer arithmetic and launch tuning. Ark keeps the code simple, while the compiler + runtime handle placement and scheduling.
- Zero-cost tensor abstractions (no manual launch params)
- Compile-time verification of shape + constraints
- Deterministic kernels across heterogeneous GPUs
- Placement hints are explicit and readable
let y = matmul(a, b) @runtime preset("prod");
let y = matmul(a, b) @runtime { target: "gpu:0" };

Implicit parallelism
Write code as if it runs sequentially. The compiler analyzes dependencies and parallelizes operations across GPU cores.
- Dependency analysis
- Auto-tiling + fusion
- Predictable scheduling
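The dependency analysis above can be sketched in a few lines. This is a hypothetical illustration of the idea, not Ark's actual compiler: a `schedule` function (name invented here) walks an op-dependency graph and groups operations with no unmet dependencies into stages that could dispatch concurrently.

```python
# Illustrative sketch of dependency-based scheduling; not Ark's real API.
def schedule(ops):
    """ops: dict mapping op name -> list of dependency names.
    Returns a list of stages; ops within a stage are independent
    and could run in parallel."""
    remaining = dict(ops)
    done = set()
    stages = []
    while remaining:
        # An op is ready once all of its dependencies have executed.
        ready = [op for op, deps in remaining.items()
                 if all(d in done for d in deps)]
        if not ready:
            raise ValueError("cycle in dependency graph")
        stages.append(sorted(ready))
        done.update(ready)
        for op in ready:
            del remaining[op]
    return stages

# y = matmul(a, b); z = relu(c); out = add(y, z)
print(schedule({"y": [], "z": [], "out": ["y", "z"]}))
# -> [['y', 'z'], ['out']]
```

Here `y` and `z` share no data, so they land in the same stage; `out` waits for both.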
Resource aware
The type system and compiler track VRAM usage and constraints. If your workload won't fit, you find out before dispatch.
- VRAM estimation
- Constraint propagation
- Fail-fast deployment
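To make the fail-fast idea concrete, here is a minimal sketch of what a compile-time VRAM check might look like. The function names and the assumption that all three matmul operands stay resident at once are illustrative, not Ark's actual cost model:

```python
# Hypothetical fail-fast VRAM estimate for a dense f32 matmul (A@B=C).
def matmul_vram_bytes(m, k, n, dtype_bytes=4):
    # A is m*k, B is k*n, C is m*n; assume all three are resident.
    return (m * k + k * n + m * n) * dtype_bytes

def check_fits(m, k, n, vram_bytes):
    # Raise before dispatch instead of failing on-device mid-run.
    need = matmul_vram_bytes(m, k, n)
    if need > vram_bytes:
        raise MemoryError(f"workload needs {need} bytes, device has {vram_bytes}")
    return need

# A 16384 x 16384 f32 matmul needs ~3 GiB, so it fits on an 8 GiB device:
print(check_fits(16384, 16384, 16384, 8 * 1024**3))  # 3221225472
```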
Hardware agnostic
Target CUDA, ROCm, and future backends from one codebase. Ark serves as a universal IR with deterministic lowering.
- Multi-backend lowering
- Stable IR
- Portable artifacts
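The shape of multi-backend lowering can be sketched as a table from IR ops to backend emitters. Everything here is illustrative — the backend names come from the text above, but the emitted strings and the `lower` function are invented for this sketch:

```python
# Illustrative sketch of lowering one IR op to multiple backends.
LOWERINGS = {
    "cuda": lambda op: f"launch {op}_kernel<<<grid, block>>>(...)",
    "rocm": lambda op: f"hipLaunchKernelGGL({op}_kernel, ...)",
}

def lower(op, backend):
    # One codebase in, backend-specific dispatch out; unknown
    # backends fail loudly instead of silently miscompiling.
    try:
        return LOWERINGS[backend](op)
    except KeyError:
        raise ValueError(f"no lowering for backend {backend!r}")

print(lower("matmul", "cuda"))
# -> launch matmul_kernel<<<grid, block>>>(...)
```

Adding a future backend means adding one entry to the table; the IR and the source program stay unchanged.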
The toolchain
A pipeline built for correctness first, then ruthless optimization.
Ready to port your kernels?
Install the compiler, run the tour, then deploy your first kernel to the grid.