Benchmarks

Disclaimer. This entire benchmark page, including the harness in packages/box_transform/benchmark/comprehensive_bench.dart, the measurement run, the curated tables below, and the prose around them, was generated end to end by an AI agent. The numbers are real (the harness exists in the repo and you can rerun it yourself, see the Reproducing section), but the synthesis, framing, and conclusions are AI work. Treat them as a starting point for your own profiling, not as an authoritative spec.

These numbers exist to give you a feel for what the engine does at run time. They are not a head-to-head comparison against any other library, and they should not be read as a target. They tell you roughly how many BoxTransformer calls you can make per frame on a modern CPU, where the cost goes when rotation is involved, and how the choice of binding strategy trades off against runtime work.

The full report is generated by packages/box_transform/benchmark/comprehensive_bench.dart, which iterates the orthogonal axes of the engine: operation × rotation × clamping state × constraints × binding strategy × pointer delta pattern. The RESIZE matrix alone has 432 cells; this page surfaces representative slices.

Environment

Dart: 3.11.4 stable
OS: macOS 26.4.1 on Apple Silicon (arm64)
Mode: AOT-compiled (dart compile exe)

Methodology

The harness follows established microbenchmarking practice (see Sources):

Warmup, then measure. 4000 warmup calls precede every scenario so the AOT code reaches steady state and cold-cache effects don't skew the first batch.
Batched timing. 25 batches × 4000 calls per scenario. Per-call wall time in nanoseconds is computed as batch_us × 1000 / batchSize, where batch_us is the batch's elapsed time in microseconds. Batching amortizes Stopwatch.elapsedMicroseconds overhead (roughly 30 to 100 ns per start/stop on modern CPUs) below 0.1% of the measurement.
Sink fold. Every result XORs into a global sink integer that is printed at the end. Without this the AOT compiler is free to delete the entire benchmark loop because nothing reads its output; this is the same pattern JMH's Blackhole solves on the JVM.
Percentiles. We report min, p50 (median), mean, p95, p99, and max across batches, plus an ops/sec derived from the mean. The mean alone hides the tail; p95/p99 are where you see GC pauses, OS context switches, and slow-path branches.
Axis-grid generation. Scenarios are generated combinatorially rather than hand picked, which guarantees coverage and makes the matrix self documenting.
Pointer delta patterns:
- smooth: slow drift (~0.1 px/call). Models a controlled drag.
- subpix: sub-pixel drift (~0.001 px/call). Models hover-style fine adjustment.
- jitter: slow drift plus ±1 px noise (deterministic seed). Models a real cursor on real hardware.
- saturated: pointer ranges far outside the clamp every call. Forces the projector or interval clamp to pin every iteration.
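The four patterns can be sketched as a small generator. This is an illustration only, not the harness's code: the step sizes come from the list above, while the enum and function names, the uniform noise distribution, the seed value, and the ±1e6 px magnitude used for `saturated` are all assumptions.

```dart
import 'dart:math';

/// Pointer delta patterns, named after the list above (hypothetical enum).
enum DeltaPattern { smooth, subpix, jitter, saturated }

/// Returns the pointer x-offset in pixels for call [i] under [pattern].
/// [rng] is supplied by the caller so a fixed seed gives a repeatable run.
double pointerOffset(DeltaPattern pattern, int i, Random rng) =>
    switch (pattern) {
      // Slow drift, ~0.1 px per call.
      DeltaPattern.smooth => i * 0.1,
      // Sub-pixel drift, ~0.001 px per call.
      DeltaPattern.subpix => i * 0.001,
      // Slow drift plus uniform noise in [-1, 1) px (assumed distribution).
      DeltaPattern.jitter => i * 0.1 + (rng.nextDouble() * 2 - 1),
      // Far outside any realistic clamp; the alternation is an assumption.
      DeltaPattern.saturated => i.isEven ? 1e6 : -1e6,
    };

void main() {
  final rng = Random(42); // fixed seed: same jitter sequence on every run
  for (final p in DeltaPattern.values) {
    final samples = [for (var i = 0; i < 4; i++) pointerOffset(p, i, rng)];
    print('${p.name}: $samples');
  }
}
```

Passing the `Random` in, rather than creating it inside the function, is what keeps the jitter sequence deterministic across scenarios.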

The Stopwatch clock source itself is not perfectly free, which is why batching matters; Aleksey Shipilëv's "Nanotrusting the Nanotime" is the canonical writeup of why naive "start; op; stop" loops lie.
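The warmup, batching, sink, and percentile steps described above fit in a few dozen lines. The following is a minimal sketch under stated assumptions, not the code in comprehensive_bench.dart: `workload` is a hypothetical stand-in for a BoxTransformer call, and the nearest-rank percentile definition is assumed rather than taken from the harness.

```dart
/// Global sink: every result is folded in so the compiler cannot treat
/// the measured loop as dead code.
int sink = 0;

/// Hypothetical stand-in workload; the real harness calls BoxTransformer.
int workload(int i) => (i * 31 + 7) & 0xFFFF;

/// Per-call time in nanoseconds for one batch: batch_us × 1000 / batchSize.
double perCallNs(int batchMicros, int batchSize) =>
    batchMicros * 1000 / batchSize;

/// Nearest-rank percentile over an ascending-sorted list, p in (0, 100].
double percentile(List<double> sorted, double p) {
  final rank = (p / 100 * sorted.length).ceil().clamp(1, sorted.length).toInt();
  return sorted[rank - 1];
}

/// Runs the warmup, then times [batches] batches; returns sorted ns/op samples.
List<double> measure(
    {int warmup = 4000, int batches = 25, int batchSize = 4000}) {
  for (var i = 0; i < warmup; i++) {
    sink ^= workload(i);
  }
  final samples = <double>[];
  for (var b = 0; b < batches; b++) {
    // One start/stop pair per batch amortizes the clock-read cost.
    final sw = Stopwatch()..start();
    for (var i = 0; i < batchSize; i++) {
      sink ^= workload(i);
    }
    sw.stop();
    samples.add(perCallNs(sw.elapsedMicroseconds, batchSize));
  }
  return samples..sort();
}

void main() {
  final s = measure();
  final mean = s.reduce((a, b) => a + b) / s.length;
  print('min=${s.first} p50=${percentile(s, 50)} mean=$mean '
      'p95=${percentile(s, 95)} p99=${percentile(s, 99)} max=${s.last}');
  print('ops/sec=${1e9 / mean}');
  print('sink=$sink'); // reading the sink keeps the loops observable
}
```

With a workload this trivial the printed timings are not meaningful in themselves; the point is the structure: one Stopwatch start/stop per batch, every result XORed into the sink, and the distribution reported rather than the mean alone.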

Headline numbers (mean ns/op)

Category	Min	Median	Max	Cells
MOVE	59	97	211	36
RESIZE	67	287	1298	432
ROTATE	151	156	177	4
RECLAMP	56	84	102	6

What "ns/op" means in human terms

Translating representative figures from the tables on this page (rounded) into a per-frame budget at 60 Hz (16.6 ms per frame) and 120 Hz (8.3 ms per frame):

Operation	ns/op	Calls per 16.6 ms frame	Calls per 8.3 ms frame
Axis-aligned move	60	~277,000	~138,000
Rotated move (orig)	97	~171,000	~85,000
Axis-aligned freeform resize	130	~128,000	~64,000
Rotated freeform (orig)	290	~57,000	~28,000
Rotated freeform (bbox)	370	~44,000	~22,000
Worst cell (`symscale, θ=0, saturated`)	1,300	~12,700	~6,300

The shape of any sane Flutter UI is a single TransformableBox (or a small handful) updating once per pointer event, which means a typical call costs on the order of one hundred-thousandth of a frame at 60 Hz, and even the worst cell stays under one ten-thousandth. The numbers on this page only become operationally interesting in situations such as many thousands of boxes per frame (layout solvers, simulation) or deeply nested rebuild chains where the engine cost adds up across unrelated setState calls.
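The per-frame arithmetic behind the table is a single division. A small sketch, with hypothetical function names, that you can rerun with your own measured ns/op figures:

```dart
/// Calls that fit in one frame of [frameMs] milliseconds at [nsPerOp] each.
int callsPerFrame(double nsPerOp, double frameMs) =>
    (frameMs * 1e6 / nsPerOp).floor();

/// Fraction of one frame consumed by a single call.
double frameFraction(double nsPerOp, double frameMs) =>
    nsPerOp / (frameMs * 1e6);

void main() {
  // ns/op values taken from the table above.
  for (final ns in [60.0, 97.0, 130.0, 290.0, 370.0, 1300.0]) {
    print('$ns ns/op: ${callsPerFrame(ns, 16.6)} calls at 60 Hz, '
        '${callsPerFrame(ns, 8.3)} calls at 120 Hz, '
        'one call = ${frameFraction(ns, 16.6)} of a 60 Hz frame');
  }
}
```

The results are floored, so they can differ from the rounded table entries in the last digits.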

MOVE

BoxTransformer.move(...) translates a box, optionally clamped.

Scenario	mean ns/op	ops/sec
`clamp=saturated, delta=saturated, θ=0`	59	16.9 M
`clamp=loose, delta=subpix, θ=0`	63	15.8 M
`clamp=loose, delta=smooth, θ=π/6, orig`	89	11.3 M
`clamp=loose, delta=jitter, θ=π/6, orig`	110	9.1 M
`clamp=loose, delta=smooth, θ=π/6, bbox`	128	7.8 M
`clamp=loose, delta=jitter, θ=π/6, bbox`	143	7.0 M

What this slice tells you:

The axis-aligned (θ=0) path is dominated by simple per-axis clamp arithmetic. At 60 to 90 ns/op the cost is a handful of branches; batching has already amortized the clock-source overhead out of the figure.
Rotating to θ=π/6 (30°) under BindingStrategy.originalBox adds roughly 30 to 50 ns because the solver builds and clamps a joint translation interval against four corner constraints instead of two per-axis bounds.
Switching to BindingStrategy.boundingBox adds another ~30 to 40 ns on top of that. The cost buys guaranteed AABB containment of the rotated rect (see the Binding Strategies page for what each strategy enforces).
Pointer delta pattern matters little for move(). The work is dominated by clamp construction, not the delta itself, which is why the smooth and jitter numbers sit within roughly 15 to 21 ns of each other.
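To make "per-axis clamp arithmetic" concrete, here is a minimal sketch of an axis-aligned clamped move. It is not the library's implementation: the `Rect` class and both functions are hypothetical, and the sketch assumes the box is no larger than the clamp.

```dart
/// Minimal axis-aligned rectangle (hypothetical; not the library's box type).
class Rect {
  final double left, top, right, bottom;
  const Rect(this.left, this.top, this.right, this.bottom);

  @override
  String toString() => 'Rect($left, $top, $right, $bottom)';
}

/// Clamps translation [d] so the span [lo, hi] stays inside
/// [minBound, maxBound]. Assumes hi - lo <= maxBound - minBound.
double clampDelta(
    double d, double lo, double hi, double minBound, double maxBound) {
  if (lo + d < minBound) return minBound - lo; // pin to the near edge
  if (hi + d > maxBound) return maxBound - hi; // pin to the far edge
  return d;
}

/// Moves [box] by (dx, dy), clamping each axis independently inside [clamp].
Rect moveClamped(Rect box, double dx, double dy, Rect clamp) {
  final cx = clampDelta(dx, box.left, box.right, clamp.left, clamp.right);
  final cy = clampDelta(dy, box.top, box.bottom, clamp.top, clamp.bottom);
  return Rect(box.left + cx, box.top + cy, box.right + cx, box.bottom + cy);
}

void main() {
  const clamp = Rect(0, 0, 100, 100);
  const box = Rect(10, 10, 30, 30);
  print(moveClamped(box, 5, 5, clamp)); // delta fits: applied unchanged
  print(moveClamped(box, 1e6, -1e6, clamp)); // saturated: pinned on both axes
}
```

Each axis costs two comparisons and at most one subtraction, and the two axes never interact. Under rotation that independence no longer holds, which is consistent with the extra cost reported for the θ=π/6 rows.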

RESIZE

BoxTransformer.resize(...) is the largest matrix because it crosses four resize modes (freeform, scale, symmetric, symmetricScale) with rotation, clamping, constraints, binding strategy, and pointer patterns. The numbers below pin one delta pattern (saturated) and walk the resize modes; the spread across other delta patterns is small (typically within 5 to 10%).

Axis-aligned (θ=0)

Mode	clamp=none, cons=none	clamp=loose, cons=loose	clamp=loose, cons=tight
freeform	68 ns (14.5 M)	142 ns (7.1 M)	129 ns (7.7 M)
scale	166 ns (6.0 M)	185 ns (5.4 M)*	175 ns (5.7 M)*
symmetric	118 ns (8.5 M)	140 ns (7.2 M)*	135 ns (7.4 M)*
symmetricScale	1081 ns (0.92 M)	~1180 ns (0.85 M)	~1166 ns (0.86 M)

Asterisked cells are interpolated from neighbouring rows.

Rotated (θ=π/6)

originalBox keeps clamping/constraints on the unrotated rect; the LP runs against four corner inequalities. boundingBox adds the rotated quad's corners on top, a strict superset.

Mode	strat=orig (mean ns)	strat=bbox (mean ns)
freeform	282 (3.5 M)	355 (2.8 M)
scale	255 (3.9 M)	331 (3.0 M)
symmetric	206 (4.9 M)	320 (3.1 M)
symmetricScale	201 (5.0 M)	305 (3.3 M)

What this slice tells you:

The freeform axis-aligned path is the cheapest at ~70 ns/op without clamp + constraints, climbing to ~140 ns with both engaged. Constraints add work because the legacy axis-aligned path uses per-axis branches that each evaluate min/max tests.
symmetricScale at θ=0 is the engine's slowest cell, at roughly 1100 to 1300 ns/op. The mode does an iterative aspect-locked clamp walk; with both axes locked together and constraints in play, it needs several passes to converge. At θ ≠ 0, the rotated LP path replaces that walk and the cost actually drops to ~200 to 310 ns. This is the single inversion in the matrix where rotation is cheaper than the axis-aligned equivalent.
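Against the clamped θ=0 freeform cell (142 ns), rotated originalBox freeform (282 ns) runs about 2× slower and rotated boundingBox (355 ns) about 2.5×; against the unclamped baseline (68 ns) the ratios are about 4× and 5×. The bbox overhead is the extra rotated-corner constraint set in the LP.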
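At θ=0, engaging clamp plus tight constraints adds roughly 60 to 85 ns over the unclamped, unconstrained baseline for freeform and symmetricScale (scale and symmetric move by under 20 ns, though those cells are interpolated), and tight constraints cost no more than loose ones. They barely move the rotated cells because the LP solver pins to bounds in the same loop it would run anyway.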
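The difference between the two binding strategies can be illustrated with a containment check. This sketch follows the description on this page (originalBox tests the unrotated rect, boundingBox additionally requires the rotated quad's axis-aligned bounding box to fit); it is not the engine's LP, and all names in it are hypothetical.

```dart
import 'dart:math';

/// Half-extents of the axis-aligned bounding box of a w×h rect rotated by
/// [theta] radians about its centre.
({double halfW, double halfH}) rotatedHalfExtents(
    double w, double h, double theta) {
  final c = cos(theta).abs(), s = sin(theta).abs();
  return (halfW: (w * c + h * s) / 2, halfH: (w * s + h * c) / 2);
}

/// True when a span centred at [centre] with half-extent [half] lies in [lo, hi].
bool spanInside(double centre, double half, double lo, double hi) =>
    centre - half >= lo && centre + half <= hi;

void main() {
  const w = 100.0, h = 40.0, cx = 60.0, cy = 30.0;
  // Clamp region: x in [0, 120], y in [0, 100].
  final unrotatedFits =
      spanInside(cx, w / 2, 0, 120) && spanInside(cy, h / 2, 0, 100);
  final e = rotatedHalfExtents(w, h, pi / 6);
  final aabbFits =
      spanInside(cx, e.halfW, 0, 120) && spanInside(cy, e.halfH, 0, 100);
  print('unrotated rect fits: $unrotatedFits');
  print('rotated AABB half-extents: ${e.halfW} x ${e.halfH}, fits: $aabbFits');
}
```

In this example the unrotated rect fits the clamp while the rotated rect's bounding box pokes out of the top edge, which is the kind of case where the two strategies would disagree.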

ROTATE

BoxTransformer.rotate(...) computes an angle delta plus the slide then freeze translation that keeps the rect inside the clamp.

Scenario	mean ns/op	ops/sec
`delta=subpix`	151	6.6 M
`delta=smooth`	155	6.4 M
`delta=saturated`	156	6.4 M
`delta=jitter`	177	5.6 M

What this slice tells you:

Rotation is dominated by the slide-then-freeze interval solver. The delta pattern barely matters, which is why all four cells cluster within ~25 ns of each other.
The 5.6 M ops/sec floor means rotation is effectively free at any realistic call rate. A drag fires at most a few hundred pointer events per second, so the engine consumes only tens of microseconds of CPU time per second.

RECLAMP

A controller-style scenario in which the parent container shrinks while the box is inside it, modeled as a zero-delta move() against a clamp that shrinks every tick.

Scenario	mean ns/op	ops/sec
`pattern=shrink-loose, θ=0`	56	17.8 M
`pattern=shrink-touch, θ=0`	66	15.2 M
`pattern=shrink-cross, θ=0`	66	15.2 M
`pattern=shrink-loose, θ=π/6`	84	11.9 M
`pattern=shrink-cross, θ=π/6`	97	10.3 M
`pattern=shrink-touch, θ=π/6`	102	9.8 M

What this slice tells you:

Reclamping (clamp changes, no pointer motion) is among the cheapest operations in the engine. Even the rotated cases stay under 110 ns.
The collapse-to-midpoint sanitizer that handles the infeasible-interval case (shrink-cross) adds only about 10 to 13 ns over the simpler shrink-loose path. This matters in practice because layout reflows often produce briefly infeasible clamps and you don't want that to spike per-tick latency.
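As an illustration of the collapse-to-midpoint idea, here is a one-dimensional sketch. It is an assumption about what such a sanitizer does, based only on the name used above; the function is hypothetical and not the engine's code.

```dart
/// Clamps [value] into [lo, hi]. When the interval is infeasible (lo > hi),
/// both bounds collapse to their midpoint and the value is pinned there.
double clampWithCollapse(double value, double lo, double hi) {
  if (lo > hi) return (lo + hi) / 2; // infeasible: collapse to the midpoint
  if (value < lo) return lo;
  if (value > hi) return hi;
  return value;
}

void main() {
  print(clampWithCollapse(5, 0, 10)); // feasible, value already inside
  print(clampWithCollapse(-3, 0, 10)); // feasible, pinned to the lower bound
  print(clampWithCollapse(5, 10, 2)); // infeasible: bounds have crossed
}
```

The infeasible branch is a single comparison and one average, which fits the small overhead measured for shrink-cross.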

What all of this means in practice

If you're building a typical resizable / draggable interaction (one or a handful of TransformableBox widgets), the engine is not your bottleneck, and the difference between strategies and modes is not something a user can perceive. Pick whichever semantics fit your UX and ignore the cost.

The numbers become operationally relevant in three patterns:

Bulk operations. If you iterate the engine over many objects per frame (layout solvers, snap-to-grid logic, alignment guides computed against every box), the constant factor matters. Prefer BindingStrategy.originalBox if you don't need AABB containment; skip symmetricScale at θ=0 if any other mode fits your gesture.
Long replay sessions. Tick-by-tick replay (test recorder playback, undo stacks, animation rigs) can run thousands of engine calls in quick succession. The 60 to 100 ns floor is a real budget if you're driving an animation.
Worst-case latency budgets. The p95 and p99 figures in the full harness output (rerun it locally and inspect any cell) show which cells have unpredictable tails. The mean is the advertised number; the p99 is closer to the user-perceived worst frame.

For everything else, treat the engine as free.

Reproducing

cd packages/box_transform
dart compile exe benchmark/comprehensive_bench.dart -o /tmp/bbench
/tmp/bbench

The benchmark prints a self-contained Markdown report with every scenario's full percentile envelope. Pipe to a file and search by axis (grep mode=scale, grep θ=π/6 | grep bbox, …) to inspect any slice in detail.

Numbers will differ on other CPUs and OSes. The relative ordering between cells should hold.

Sources

The harness design (warmup, batched measurement, sink to defeat dead code elimination, percentile reporting) follows established microbenchmarking practice:

JMH (Java Microbenchmark Harness): warmup, batched measurement, Blackhole sink, and percentile reporting.
Aleksey Shipilëv, "Nanotrusting the Nanotime": clock-source overhead, why naive start; op; stop loops lie, and the case for amortizing measurement cost via batching.
Oracle, "Avoiding Benchmarking Pitfalls on the JVM": the dead-code-elimination pitfall and the rationale for sinks / blackholes.
Vyacheslav Egorov (mrale.ph), "Microbenchmarking Dart, Part 1": Dart-specific measurement loops, dead-code elimination, and AOT versus JIT considerations.
package:benchmark_harness (Dart): the warmup-then-measure idiom on the Dart side.
HdrHistogram and Gil Tene's coordinated-omission writeups: report p95/p99/max alongside the mean rather than collapsing the distribution to a single number.
