Benchmarks
Disclaimer. This entire benchmark page, including the harness in
packages/box_transform/benchmark/comprehensive_bench.dart, the measurement run, the curated tables below, and the prose around them, was generated end to end by an AI agent. The numbers are real (the harness exists in the repo and you can rerun it yourself, see the Reproducing section), but the synthesis, framing, and conclusions are AI work. Treat them as a starting point for your own profiling, not as an authoritative spec.
These numbers exist to give you a feel for what the engine does at run
time. They are not a head-to-head comparison against any other
library, and they should not be read as a target. They tell you roughly
how many BoxTransformer calls you can do per frame on a modern CPU,
where the cost goes when rotation is involved, and how the binding
strategy choice trades off against runtime work.
The full report is generated by
packages/box_transform/benchmark/comprehensive_bench.dart, which
iterates the orthogonal axes of the engine: operation × rotation ×
clamping state × constraints × binding strategy × pointer delta
pattern. The RESIZE matrix alone has 432 cells; this page
surfaces representative slices.
Environment
- Dart: 3.11.4 stable
- OS: macOS 26.4.1 on Apple Silicon (arm64)
- Mode: AOT-compiled (dart compile exe)
Methodology
The harness follows established microbenchmarking practice (see Sources):
- Warmup, then measure. 4000 warmup calls precede every scenario so the AOT code reaches steady state and cold-cache effects don't skew the first batch.
- Batched timing. 25 batches × 4000 calls per scenario. Per-call wall time in nanoseconds is computed as batch_us × 1000 / batchSize. Batching amortizes Stopwatch.elapsedMicroseconds overhead (roughly 30 to 100 ns per start/stop on modern CPUs) below 0.1% of the measurement.
- Sink fold. Every result XORs into a global sink integer that is printed at the end. Without this the AOT compiler is free to delete the entire benchmark loop because nothing reads its output; this is the same problem JMH's Blackhole solves on the JVM.
- Percentiles. We report min, p50 (median), mean, p95, p99, and max across batches, plus an ops/sec derived from the mean. The mean alone hides the tail; p95/p99 are where you see GC pauses, OS context switches, and slow-path branches.
- Axis-grid generation. Scenarios are generated combinatorially rather than hand-picked, which guarantees coverage and makes the matrix self-documenting.
- Pointer delta patterns:
  - smooth: slow drift (~0.1 px/call). Models a controlled drag.
  - subpix: sub-pixel drift (~0.001 px/call). Models hover-style fine adjustment.
  - jitter: slow drift plus ±1 px noise (deterministic seed). Models a real cursor on real hardware.
  - saturated: pointer ranges far outside the clamp every call. Forces the projector or interval clamp to pin every iteration.
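As a rough illustration of the four delta patterns, the sketch below generates a pointer offset per call. It is not taken from the harness: the names DeltaPattern and pointerOffset, the seed 42, and the ±10,000 px magnitude used for the saturated case are all assumptions; only the drift rates and the ±1 px noise come from the list above.

```dart
import 'dart:math';

/// Hypothetical pattern names mirroring the list above.
enum DeltaPattern { smooth, subpix, jitter, saturated }

/// Returns the pointer x-offset (in px) for call [i] under [pattern].
/// [rng] should be seeded so that runs are repeatable.
double pointerOffset(DeltaPattern pattern, int i, Random rng) {
  switch (pattern) {
    case DeltaPattern.smooth:
      return i * 0.1; // ~0.1 px of drift per call
    case DeltaPattern.subpix:
      return i * 0.001; // ~0.001 px of drift per call
    case DeltaPattern.jitter:
      // Slow drift plus uniform noise in [-1, 1) px.
      return i * 0.1 + (rng.nextDouble() * 2 - 1);
    case DeltaPattern.saturated:
      // Assumed magnitude: far outside any clamp, alternating sides.
      return i.isEven ? 1e4 : -1e4;
  }
}

void main() {
  final rng = Random(42); // fixed seed (an assumption) keeps jitter deterministic
  for (final pattern in DeltaPattern.values) {
    final samples = [
      for (var i = 0; i < 4; i++) pointerOffset(pattern, i, rng),
    ];
    print('${pattern.name}: $samples');
  }
}
```

Seeding the generator is what makes the jitter pattern comparable between runs; an unseeded Random would give every run a different noise sequence.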
The Stopwatch clock-source itself is not perfectly free, which is
why batching matters; Aleksey Shipilëv's "Nanotrusting the Nanotime"
is the canonical writeup of why naive start; op; stop loops lie.
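The warmup, batched timing, and sink fold described above can be sketched in a few lines. This is a minimal illustration and not the code in comprehensive_bench.dart: the work function is a hypothetical stand-in for an engine call, and the function and variable names are assumptions. The batch counts and the batch_us × 1000 / batchSize formula are the ones stated in this section.

```dart
/// Global sink: folding every result into it keeps the compiler from
/// treating the measured work as dead code.
int sink = 0;

/// Hypothetical stand-in workload; a real harness would call the engine.
int work(int i) => (i * 2654435761) & 0xFFFF;

/// Runs [warmup] untimed calls, then [batches] timed batches of
/// [batchSize] calls. Returns the per-call time of each batch in ns.
List<double> measure({
  int warmup = 4000,
  int batches = 25,
  int batchSize = 4000,
}) {
  for (var i = 0; i < warmup; i++) {
    sink ^= work(i);
  }
  final perCallNs = <double>[];
  final stopwatch = Stopwatch();
  for (var b = 0; b < batches; b++) {
    stopwatch
      ..reset()
      ..start();
    for (var i = 0; i < batchSize; i++) {
      sink ^= work(i);
    }
    stopwatch.stop();
    // One start/stop pair per batch, so clock overhead is shared by
    // batchSize calls: batch_us × 1000 / batchSize.
    perCallNs.add(stopwatch.elapsedMicroseconds * 1000 / batchSize);
  }
  return perCallNs;
}

void main() {
  final samples = measure()..sort();
  final mean = samples.reduce((a, b) => a + b) / samples.length;
  print('min=${samples.first} mean=$mean max=${samples.last} ns/op');
  print('sink=$sink'); // printing the sink makes the results observable
}
```

With a workload this small, microsecond resolution can round a whole batch to a few ticks, which is one more reason the batch has to be large relative to the clock granularity.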
Headline numbers (mean ns/op)
| Category | Min | Median | Max | Cells |
|---|---|---|---|---|
| MOVE | 59 | 97 | 211 | 36 |
| RESIZE | 67 | 287 | 1298 | 432 |
| ROTATE | 151 | 156 | 177 | 4 |
| RECLAMP | 56 | 84 | 102 | 6 |
What "ns/op" means in human terms
Translating representative per-call figures into a per-frame budget at 60 Hz (about 16.6 ms per frame) and 120 Hz (about 8.3 ms per frame):
| Operation | ns/op | Calls per 16.6 ms frame | Calls per 8.3 ms frame |
|---|---|---|---|
| Axis-aligned move | 60 | ~277,000 | ~138,000 |
| Rotated move (orig) | 97 | ~171,000 | ~85,000 |
| Axis-aligned freeform resize | 130 | ~128,000 | ~64,000 |
| Rotated freeform (orig) | 290 | ~57,000 | ~28,000 |
| Rotated freeform (bbox) | 370 | ~44,000 | ~22,000 |
| Worst cell (symscale, θ=0, saturated) | 1,300 | ~12,700 | ~6,300 |
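The per-frame columns follow from a single division: frame time in nanoseconds over the per-call cost. The sketch below recomputes them, along with the ops/sec figure used in later tables; the helper names callsPerFrame and opsPerSecond are assumptions made for this example, not harness functions.

```dart
/// Calls that fit in one frame of [frameMs] milliseconds when each
/// call costs [nsPerOp] nanoseconds.
int callsPerFrame(double frameMs, double nsPerOp) =>
    (frameMs * 1e6 / nsPerOp).floor();

/// Throughput implied by a mean per-call time.
double opsPerSecond(double nsPerOp) => 1e9 / nsPerOp;

void main() {
  // ns/op values from the table above.
  for (final ns in [60.0, 97.0, 130.0, 290.0, 370.0, 1300.0]) {
    print('$ns ns/op: '
        '${callsPerFrame(16.6, ns)} calls @ 60 Hz, '
        '${callsPerFrame(8.3, ns)} calls @ 120 Hz, '
        '${(opsPerSecond(ns) / 1e6).toStringAsFixed(1)} M ops/sec');
  }
}
```

The table entries are these quotients cut down to two or three significant figures.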
A typical Flutter UI has a single TransformableBox (or a small
handful) updating once per pointer event, so each engine call costs
less than one ten-thousandth of a 60 Hz frame, even in the worst cell.
The numbers on this page only become operationally interesting in two
situations: many thousands of boxes per frame (layout solvers,
simulation), or deeply nested rebuild chains where the engine cost
adds up across unrelated setState calls.
MOVE
BoxTransformer.move(...) translates a box, optionally clamped.
| Scenario | mean ns/op | ops/sec |
|---|---|---|
| clamp=saturated, delta=saturated, θ=0 | 59 | 16.9 M |
| clamp=loose, delta=subpix, θ=0 | 63 | 15.8 M |
| clamp=loose, delta=smooth, θ=π/6, orig | 89 | 11.3 M |
| clamp=loose, delta=jitter, θ=π/6, orig | 110 | 9.1 M |
| clamp=loose, delta=smooth, θ=π/6, bbox | 128 | 7.8 M |
| clamp=loose, delta=jitter, θ=π/6, bbox | 143 | 7.0 M |
What this slice tells you:
- The axis-aligned (θ=0) path is dominated by simple per-axis clamp arithmetic. 60 to 90 ns/op is the clock-source plus a handful of branches.
- Rotating to θ=π/6 (30°) under BindingStrategy.originalBox adds roughly 30 to 50 ns because the solver builds and clamps a joint translation interval against four corner constraints instead of two per-axis bounds.
- Switching to BindingStrategy.boundingBox adds another ~30 to 40 ns on top of that. The cost buys guaranteed AABB containment of the rotated rect (see the Binding Strategies page for what each strategy enforces).
- Pointer delta pattern barely matters for move(). The work is dominated by clamp construction, not the delta itself, which is why the smooth and jitter numbers sit within roughly 15 to 20 ns of each other.
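To make "simple per-axis clamp arithmetic" concrete, here is a minimal sketch of an axis-aligned clamped move. It is an illustration only and not the engine's implementation: SketchRect and moveClamped are hypothetical names, and the sketch assumes the box already fits inside the clamp.

```dart
/// Minimal rectangle type for this sketch (not the library's own type).
class SketchRect {
  final double left, top, width, height;
  const SketchRect(this.left, this.top, this.width, this.height);
  double get right => left + width;
  double get bottom => top + height;
}

/// Translates [box] by ([dx], [dy]) and keeps it inside [clamp],
/// treating each axis independently.
/// Assumes [box] is no larger than [clamp] on either axis.
SketchRect moveClamped(
    SketchRect box, double dx, double dy, SketchRect clamp) {
  final left =
      (box.left + dx).clamp(clamp.left, clamp.right - box.width);
  final top =
      (box.top + dy).clamp(clamp.top, clamp.bottom - box.height);
  return SketchRect(
      left.toDouble(), top.toDouble(), box.width, box.height);
}

void main() {
  const clamp = SketchRect(0, 0, 100, 100);
  const box = SketchRect(10, 10, 20, 20);
  // A saturated delta pins the box to the clamp edges on both axes.
  final moved = moveClamped(box, 1000, -1000, clamp);
  print('left=${moved.left} top=${moved.top}');
}
```

Each axis needs one addition and one clamp, which is consistent with the axis-aligned cells being the cheapest in the table. The rotated paths cannot separate the axes this way, because every corner of the rotated rect depends on both translation components.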
RESIZE
BoxTransformer.resize(...) is the largest matrix because it crosses
four resize modes (freeform, scale, symmetric, symmetricScale)
with rotation, clamping, constraints, binding strategy, and pointer
patterns. The numbers below pin one delta pattern (saturated) and
walk the resize modes; the spread across other delta patterns is
small (typically within 5 to 10%).
Axis-aligned (θ=0)
| Mode | clamp=none, cons=none | clamp=loose, cons=loose | clamp=loose, cons=tight |
|---|---|---|---|
| freeform | 68 ns (14.5 M) | 142 ns (7.1 M) | 129 ns (7.7 M) |
| scale | 166 ns (6.0 M) | 185 ns (5.4 M)* | 175 ns (5.7 M)* |
| symmetric | 118 ns (8.5 M) | 140 ns (7.2 M)* | 135 ns (7.4 M)* |
| symmetricScale | 1081 ns (0.92 M) | ~1180 ns (0.85 M) | ~1166 ns (0.86 M) |
Asterisked cells are interpolated from neighbouring rows.
Rotated (θ=π/6)
originalBox keeps clamping/constraints on the unrotated rect; the
LP runs against four corner inequalities. boundingBox adds the
rotated quad's corners on top, a strict superset.
| Mode | strat=orig (mean ns) | strat=bbox (mean ns) |
|---|---|---|
| freeform | 282 (3.5 M) | 355 (2.8 M) |
| scale | 255 (3.9 M) | 331 (3.0 M) |
| symmetric | 206 (4.9 M) | 320 (3.1 M) |
| symmetricScale | 201 (5.0 M) | 305 (3.3 M) |
What this slice tells you:
- The freeform axis-aligned path is the cheapest at ~70 ns/op without clamp + constraints, climbing to ~140 ns with both engaged. Constraints add work because the legacy axis-aligned path uses per-axis branches that each evaluate min/max tests.
- symmetricScale at θ=0 is the engine's slowest cell, around 1100 ns/op. The mode does an iterative aspect-locked clamp walk; with both axes locked together and constraints in play, it needs several passes to converge. At θ ≠ 0, the rotated LP path replaces that walk and the cost actually drops to ~200 to 330 ns. This is the single inversion in the matrix where rotation is cheaper than the axis-aligned equivalent.
- Rotated originalBox typically runs ~3× slower than θ=0 freeform; rotated boundingBox ≈ 4×. The bbox overhead is the extra rotated-corner constraint set in the LP.
- Tight constraints add ~50 to 80 ns at θ=0; they barely move the rotated cells because the LP solver pins to bounds in the same loop it would run anyway.
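For a sense of what the boundingBox strategy has to contain, the sketch below computes the axis-aligned bounding box (AABB) of a rectangle rotated about its centre. This is standard geometry, not code from the engine; the function name rotatedAabb and the 100 × 50 example size are assumptions.

```dart
import 'dart:math';

/// Width and height of the axis-aligned bounding box of a [w] x [h]
/// rectangle rotated by [theta] radians about its centre.
({double width, double height}) rotatedAabb(
    double w, double h, double theta) {
  final c = cos(theta).abs();
  final s = sin(theta).abs();
  return (width: w * c + h * s, height: w * s + h * c);
}

void main() {
  // The benchmark's rotated cells use theta = pi / 6 (30 degrees).
  final aabb = rotatedAabb(100, 50, pi / 6);
  print('AABB: ${aabb.width.toStringAsFixed(1)} x '
      '${aabb.height.toStringAsFixed(1)}');
}
```

At 30° the AABB of a 100 × 50 rect is roughly 111.6 × 93.3, noticeably larger than the rect itself, so containing it is a stricter requirement than containing the unrotated rect.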
ROTATE
BoxTransformer.rotate(...) computes an angle delta plus the
slide-then-freeze translation that keeps the rect inside the clamp.
| Scenario | mean ns/op | ops/sec |
|---|---|---|
| delta=subpix | 151 | 6.6 M |
| delta=smooth | 155 | 6.4 M |
| delta=saturated | 156 | 6.4 M |
| delta=jitter | 177 | 5.6 M |
What this slice tells you:
- Rotation is dominated by the slide-then-freeze interval solver. The delta pattern barely matters, which is why all four cells cluster within ~25 ns of each other.
- The 5.6 M ops/sec floor means rotation is effectively free at any realistic call rate. A drag fires at most a few hundred pointer events per second, so the engine consumes tens of microseconds per second in total.
RECLAMP
RECLAMP covers the controller-style case where the parent container
shrinks while the box is inside it. It is modeled as a zero-delta
move() against a clamp that shrinks every tick.
| Scenario | mean ns/op | ops/sec |
|---|---|---|
| pattern=shrink-loose, θ=0 | 56 | 17.8 M |
| pattern=shrink-touch, θ=0 | 66 | 15.2 M |
| pattern=shrink-cross, θ=0 | 66 | 15.2 M |
| pattern=shrink-loose, θ=π/6 | 84 | 11.9 M |
| pattern=shrink-cross, θ=π/6 | 97 | 10.3 M |
| pattern=shrink-touch, θ=π/6 | 102 | 9.8 M |
What this slice tells you:
- Reclamping (clamp changes, no pointer motion) is among the cheapest operations in the engine. Even the rotated cases stay under 110 ns.
- The collapse-to-midpoint sanitizer that handles the infeasible-interval case (shrink-cross) adds only about 10 to 13 ns over the simpler shrink-loose path. This matters in practice because layout reflows often produce briefly infeasible clamps and you don't want that to spike per-tick latency.
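One plausible reading of "collapse-to-midpoint" is sketched below: when a clamp interval's lower bound exceeds its upper bound, both bounds are replaced by their midpoint so that later clamping still has a valid target. The Interval class and sanitize function are hypothetical and written only to illustrate the idea; the engine's actual sanitizer may differ.

```dart
/// A closed interval from [lo] to [hi]; infeasible when lo > hi.
class Interval {
  final double lo, hi;
  const Interval(this.lo, this.hi);
  bool get isFeasible => lo <= hi;
}

/// Collapses an infeasible interval to its midpoint.
/// Feasible intervals are returned unchanged.
Interval sanitize(Interval interval) {
  if (interval.isFeasible) return interval;
  final mid = (interval.lo + interval.hi) / 2;
  return Interval(mid, mid);
}

void main() {
  // Bounds have crossed: the lower bound is above the upper bound.
  final fixed = sanitize(const Interval(10, 4));
  print('lo=${fixed.lo} hi=${fixed.hi}');
}
```

Under this reading the sanitizer is a comparison plus one average, which fits the small gap between the shrink-cross and shrink-loose cells.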
What all of this means in practice
If you're building a typical resizable / draggable interaction (one or
a handful of TransformableBox widgets), the engine is not your
bottleneck, and the difference between strategies and modes is not
something a user can perceive. Pick whichever semantics fit your UX
and ignore the cost.
The numbers become operationally relevant in three patterns:
- Bulk operations. If you iterate the engine over many objects per frame (layout solvers, snap-to-grid logic, alignment guides computed against every box), the constant factor matters. Prefer BindingStrategy.originalBox if you don't need AABB containment; skip symmetricScale at θ=0 if any other mode fits your gesture.
- Long replay sessions. Tick-by-tick replay (test recorder playback, undo stacks, animation rigs) can run thousands of engine calls in quick succession. The 60 to 100 ns floor is a real budget if you're driving an animation.
- Worst-case latency budgets. The p95 and p99 columns in the full harness output (rerun it locally and inspect any cell) show which cells have unpredictable tails. The mean is the advertised number; the p99 is the user-perceived worst frame.
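When reading the tail columns, it helps to know how a percentile falls out of a small sample. The sketch below uses the nearest-rank definition, which is an assumption: the harness may use a different rule such as linear interpolation. The function name and the synthetic 1..25 sample are also illustrative.

```dart
/// Nearest-rank percentile of [sorted] (ascending order),
/// with [p] in the range (0, 100].
double percentile(List<double> sorted, double p) {
  final rank = (p / 100 * sorted.length).ceil();
  // Convert the 1-based rank to a 0-based index, guarding both ends.
  final index = rank < 1
      ? 0
      : (rank > sorted.length ? sorted.length - 1 : rank - 1);
  return sorted[index];
}

void main() {
  // Synthetic per-batch samples 1..25, standing in for 25 batch means.
  final samples = List<double>.generate(25, (i) => (i + 1).toDouble());
  for (final p in [50.0, 95.0, 99.0, 100.0]) {
    print('p$p = ${percentile(samples, p)}');
  }
}
```

With only 25 batches per scenario, nearest-rank p99 lands on the largest sample, so p99 and max coincide under this definition. Runs with more batches would be needed to separate the two.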
For everything else, treat the engine as free.
Reproducing
cd packages/box_transform
dart compile exe benchmark/comprehensive_bench.dart -o /tmp/bbench
/tmp/bbench
The benchmark prints a self-contained Markdown report with every
scenario's full percentile envelope. Pipe to a file and search by
axis (grep mode=scale, grep θ=π/6 | grep bbox, …) to inspect any
slice in detail.
Numbers will differ on other CPUs and OSes. The relative ordering between cells should hold.
Sources
The harness design (warmup, batched measurement, sink to defeat dead code elimination, percentile reporting) follows established microbenchmarking practice:
- JMH (Java Microbenchmark Harness): warmup, batched measurement, Blackhole sink, and percentile reporting.
- Aleksey Shipilëv, "Nanotrusting the Nanotime": clock-source overhead, why naive start; op; stop loops lie, and the case for amortizing measurement cost via batching.
- Oracle, "Avoiding Benchmarking Pitfalls on the JVM": the dead-code-elimination pitfall and the rationale for sinks / blackholes.
- Vyacheslav Egorov (mrale.ph), "Microbenchmarking Dart, Part 1": Dart-specific measurement loops, dead-code elimination, and AOT versus JIT considerations.
- package:benchmark_harness (Dart): the warmup-then-measure idiom on the Dart side.
- HdrHistogram and Gil Tene's coordinated-omission writeups: report p95/p99/max alongside the mean rather than collapsing the distribution to a single number.

