
KumoMTA optimization & tuning

Optimizing KumoMTA is not about sending faster — it is about more mail reaching the inbox, sustainably. We refine shaping beyond the community baseline, tune Traffic Shaping Automation, size queues, spool and memory for your real load, and keep the Lua hot path light — measuring every change against a baseline instead of guessing.


KumoMTA optimization is the work of getting more mail to the inbox from an engine that already runs, without sending faster: refining per-provider shaping beyond the community baseline, tuning Traffic Shaping Automation, sizing egress pools, queues, the ready queue and the RocksDB spool to real load, lifting the OS limits underneath, and keeping the Lua hot path light. Every change is measured against a baseline of deferrals, latency, queue depth and seed-list placement, so improvement is a number you can check rather than a claim.

In short

→ Optimization targets inbox placement, not raw speed — a single tuned node already pushes tens of millions of messages an hour to a sink; receivers are the constraint.
→ The highest-return levers are usually per-provider connection concurrency, TSA rules and pool design — rarely anything exotic.
→ Concurrency, not message rate, is what several large providers police most closely; cutting connections often clears deferrals a rate change would not.
→ RocksDB spool on fast local storage, separated from logs and OS, is the production default; flat on-disk spool ties throughput to filesystem speed.
→ Changes go in one at a time, validated before they load and hot-reloaded, each measured against the baseline before the next — and written down.

KumoMTA is a precision engine, and like every precision engine it amplifies whatever it is given: good decisions and bad ones, at line speed. The difference between a park that delivers beautifully and one that limps is almost never the hardware — the project's own benchmarks show a single big node pushing tens of millions of messages per hour when pointed at a sink — it is how the layers above the engine are tuned: shaping against each provider's tolerance, automation that reacts the right way, queues and spool sized for the real load, policy code that stays out of its own way. Optimization closes the distance between what your configuration currently does and what your reputation would permit. We run KumoMTA in production daily, we deploy it for senders moving serious volume, and this page describes what we move, in which order, and with what evidence — because tuning without measurement is opinion, and opinion is expensive at a million messages an hour.

What does optimizing KumoMTA actually mean?

Serious tuning is a data discipline, and KumoMTA is unusually generous with data: structured delivery logs, a rich bounce taxonomy, and Prometheus metrics on everything from queue depths to spool memtables, ready to graph in Grafana. The first thing we do is exploit that: fix a baseline of your deferral rate per provider, delivery latency, queue depth and drain time, bounce mix, and real inbox placement measured with seed lists. Only then does anything get touched, and every change is evaluated against that baseline so we know whether it helped, hurt or merely felt productive. The engine tells you exactly how every receiver is treating you, response by response; the difference between a blocked sender and a trusted one is largely how carefully that feedback is read. Tuning by baseline turns optimization from a box of tricks into engineering, and it protects you too: if a change underperforms, the numbers expose it and we revert, instead of assuming it worked.
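To make the baseline concrete, here is a minimal Python sketch that computes a per-provider deferral rate from delivery logs. It assumes JSONL records that carry a `type` field (values such as `Delivery`, `TransientFailure` and `Bounce`) and a grouping field that defaults to `site`. Check those names against your own log records. In practice you may also want to map raw site names to a provider label first. The function name is hypothetical and is not part of KumoMTA.

```python
import json
from collections import Counter, defaultdict

# Outcome record types; names assumed from KumoMTA's JSON log records,
# confirm them against your own logs before trusting the numbers.
DELIVERED, DEFERRED, BOUNCED = "Delivery", "TransientFailure", "Bounce"
OUTCOMES = (DELIVERED, DEFERRED, BOUNCED)


def deferral_baseline(lines, key="site"):
    """Return {group: deferral_rate} from an iterable of JSONL log lines.

    deferral_rate = transient failures / (deliveries + transient failures + bounces)
    Records of any other type (receptions, expirations, ...) are ignored.
    """
    counts = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        kind = record.get("type")
        if kind in OUTCOMES:
            counts[record.get(key, "unknown")][kind] += 1
    # A group only exists once it has at least one outcome, so the sum is never zero.
    return {
        group: c[DEFERRED] / sum(c[k] for k in OUTCOMES)
        for group, c in counts.items()
    }
```

Run it over a fixed window, such as the last seven days, and store the result. Each later change is then read against that snapshot, not against memory.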

The levers, in the order they should move

There is a correct order to touch things, and skipping it causes half the trouble we get called for. Before any talk of volume or speed, the foundation has to hold: authentication passing and aligned, suppression actually suppressing, shaping at least at the community baseline, automation wired, and logs visible. Only on top of that does it make sense to refine limits, resize queues and chase throughput — and the refinement happens one lever at a time, measured between moves, never ten edits in one afternoon that make cause and effect unknowable. Raising rates on top of a broken DKIM alignment, or widening connections without automation to catch the pushback, is how a well-meant tuning session becomes a reputation incident. Boring as it sounds, sequence is the safety system. We work bottom-up, and we do not scale a layer until the one beneath it has proven stable under the new settings.

Beyond the community baseline: shaping that fits your reputation

A fresh deployment rightly starts from the community shaping rules — limits distilled by operators running real volume, polite enough for infrastructure nobody knows yet. But the baseline is a floor, not a destination: its defaults are deliberately conservative, on the order of a handful of connections and modest per-minute rates for unknown destinations, because they are written for strangers. An established sender with months of clean history is leaving delivery on the table at stranger settings. Refinement means adjusting, per provider, the four values that govern a path — connection limit, connection rate, deliveries per connection and message rate — to what your reputation has actually earned, then watching the response. The engine makes this honest work: shaping files validate before they load, a resolver shows exactly which rules apply to any destination domain, and changes hot-reload without interrupting delivery. We move these values in steps, provider by provider, and the deferral curve tells us when we have found each receiver's current ceiling — which is a moving target, and which is why this layer is tuned, not set.
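The stepping described above can be sketched as a small controller. This is a minimal Python illustration. It assumes you already measure a deferral rate per provider for each window. The target, step and bounds are hypothetical values chosen for the example, not KumoMTA defaults, and the engine does not run this loop for you. A proposed value would still go through shaping validation and a hot reload like any other edit.

```python
def next_step(current, deferral_rate, target=0.02, step=1, floor=1, ceiling=50):
    """Propose the next value for one shaping knob, e.g. a provider's connection limit.

    - deferrals above target: back off one step
    - deferrals comfortably below target (under half of it): try one step up
    - in between: hold, because the path is near its current ceiling
    The result is clamped to [floor, ceiling] so one bad reading cannot run away.
    """
    if deferral_rate > target:
        proposal = current - step
    elif deferral_rate < target / 2:
        proposal = current + step
    else:
        proposal = current
    return max(floor, min(ceiling, proposal))
```

Moving one knob per provider per window keeps cause and effect readable. If two values move at once, a change in the deferral curve can no longer be attributed.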

Connections or message rate: which one do providers police?

The least understood distinction in provider behavior is between how fast you send and how many connections you hold open at once. Intuition says speed is what receivers police; in practice, several of the largest providers watch concurrent connections more jealously than message pace, and crossing the connection ceiling is the quickest way to harvest temporary failures. When a path runs into "too many connections" language, the lever to move is concurrency for that provider — not the message rate that most people reflexively lower, leaving the actual cause untouched. KumoMTA expresses both independently per path, which makes the fix surgical once the diagnosis is right. Getting this one distinction correct, provider by provider, routinely removes a startling share of deferrals in one move; it is among the first things we check, because it is cheap to fix and expensive to keep wrong.
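A rough Python sketch of that diagnosis follows. It classifies the deferral text, and when the wording only mentions rate but every allowed connection is already open, it suspects concurrency first. The regex patterns are hypothetical placeholders to replace with the responses in your own logs. The returned strings mirror the shaping value names used on this page but are only labels here.

```python
import re

# Hypothetical patterns: build the real list from responses in your own logs.
CONCURRENCY = re.compile(r"too many (concurrent )?connections|connection limit", re.I)
RATE = re.compile(r"rate limit|too many messages|sending rate", re.I)


def lever_for(response, open_conns, connection_limit):
    """Suggest which shaping value to move for one deferral response."""
    if CONCURRENCY.search(response):
        return "connection_limit"
    if RATE.search(response):
        # Rate-limit wording while every allowed connection is open is often
        # a concurrency problem in disguise: try fewer connections first.
        return "connection_limit" if open_conns >= connection_limit else "max_message_rate"
    return "investigate"
```

The concurrency check runs first because some responses mention both connections and rates. In that case the explicit connection language is the stronger signal.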

Traffic Shaping Automation as a tuning instrument

Most teams treat TSA as plumbing — wire the daemon, load the community rules, done. We treat it as the most valuable tuning surface in the engine. The automation history is a record of every time a provider pushed back and what the rules did about it: which regex fired, how often, what got suspended or throttled and for how long. Reading that record tells you where your shaping is still too eager, which paths live close to their ceiling, and which rules are over-reacting — suspending two hours where a thirty-minute rate reduction would have kept mail moving. Tuning here means refining triggers and thresholds so automation distinguishes a real storm from a sneeze, choosing config-adjustment actions over blunt suspensions where the response language permits, and adding rules for the provider responses your own logs show you receiving. Done well, TSA becomes institutional memory: every lesson a receiver ever taught you, encoded as policy that reacts in seconds at 3 a.m. instead of waiting for a human at 9.
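To reason about rules outside the engine, it helps to model the logic in code: a regex on the provider response selects an action that stays in force for a duration, and the first match wins. The Python sketch below is not TSA's configuration format. The patterns, action labels and durations are hypothetical. It also tallies how often each rule would fire over a history of responses, which is one way to spot a rule that reacts too often or too bluntly.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Rule:
    pattern: str          # regex matched against the provider's response text
    action: str           # hypothetical labels: "reduce_rate" or "suspend"
    duration: timedelta   # how long the reaction stays in force


# Order matters: the first matching rule wins.
RULES = [
    Rule(r"4\.7\.28|rate limit", "reduce_rate", timedelta(minutes=30)),
    Rule(r"blocked|blacklist", "suspend", timedelta(hours=2)),
]


def match(response):
    """First rule whose pattern matches the response text, or None."""
    return next((r for r in RULES if re.search(r.pattern, response, re.I)), None)


def react(response, now):
    """Return (action, expires_at) for the matching rule, or None if nothing fires."""
    rule = match(response)
    return None if rule is None else (rule.action, now + rule.duration)


def firing_counts(responses):
    """Tally how often each rule would fire over a history of responses."""
    return Counter(rule.pattern for rule in map(match, responses) if rule is not None)
```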

The KumoMTA tuning stack, bottom to top

Optimization works bottom to top: the OS layer and spool decide raw throughput, the ready queue and pools decide how cleanly traffic spreads, shaping and TSA decide how each provider is treated, and inbox placement — measured per provider with seed lists — is the goal the whole stack serves. No layer is tuned until the one beneath it is stable.

How many sending IPs does a KumoMTA park actually need?

Nearly every optimization surfaces the same question: does the park have the right number of IPs for its volume? Both directions of wrong are common. Too few addresses for the load concentrates pressure until each one reads as an aggressive sender; too many dilutes traffic until none builds recognizable reputation, because an IP that sends droplets never becomes known. The right count comes out of your real numbers — daily volume, provider mix, how steady the flow is across the week — and the fix lives in the egress source and pool design: enough identities that no path runs hot, few enough that each accumulates history, with streams of different nature kept on separate pools so transactional reputation is never hostage to a promotional campaign's bad day. Resizing this layer often returns more than any rate change, because it treats the cause rather than the symptom: a well-divided park delivers better at the same total volume, with less shaping effort.
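The two failure directions can be turned into a bracket. The lower bound comes from the busiest hour, so no IP runs hot. The upper bound comes from the minimum daily volume each IP needs to stay recognizable. The Python sketch below assumes you pick a per-IP hourly ceiling and a per-IP daily minimum from your own history. Both are hypothetical inputs, not published provider limits. Run it once per stream, since transactional and promotional traffic belong on separate pools.

```python
import math


def ip_count(daily_volume, peak_hour_share, per_ip_hourly_ceiling, min_per_ip_daily):
    """Return (low, high) bounds on sending IPs for one stream.

    low:  enough IPs that the peak hour stays under each IP's hourly ceiling
    high: few enough that every IP still sends at least min_per_ip_daily
    If high would fall below low, the stream is too peaky for its volume;
    low is returned for both and the real fix is pacing the peak.
    """
    peak_hour = daily_volume * peak_hour_share
    low = math.ceil(peak_hour / per_ip_hourly_ceiling)
    high = max(low, daily_volume // min_per_ip_daily)
    return low, high
```

For example, one million messages a day with 10% landing in the busiest hour, a 30,000 per hour ceiling and a 50,000 per day minimum brackets the pool at 4 to 20 addresses.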

Queues: retries, age and what depth is telling you

Queue behavior is the engine's thermometer, and tuning it has two sides. The mechanical side is retry policy: intervals that grow sensibly between attempts, a maximum age after which a message expires rather than haunting the spool, both set per queue so a sluggish provider gets patience and a dead address does not. The interpretive side matters more: a queue that grows without draining is a signal, not a nuisance, and the reflex of raising rates to "empty it" is precisely backwards — it presses harder on whatever upstream brake caused the growth. We read depth together with the automation history and the provider responses to find the cause: a suspension still active, a ceiling crossed, a retry pattern too eager. KumoMTA's per-queue visibility makes this diagnosis fast once the dashboards exist; part of the optimization is leaving you with those dashboards, so the next anomaly is legible in minutes instead of mysterious for days.
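To see what a retry policy actually does to a message, list the attempt times it produces. The Python sketch below assumes a common shape: each wait doubles from an initial interval, is capped at a maximum, and the message expires once the next attempt would fall past its maximum age. Check your engine's queue configuration reference for the exact semantics. The values in the example are hypothetical.

```python
from datetime import timedelta


def retry_schedule(initial, maximum, max_age):
    """Offsets from the first attempt at which retries would happen.

    Assumes each wait doubles, is capped at `maximum`, and that the message
    expires once the next attempt would land beyond `max_age`.
    """
    offsets, elapsed, wait = [], timedelta(0), initial
    while elapsed + wait <= max_age:
        elapsed += wait
        offsets.append(elapsed)
        wait = min(wait * 2, maximum)
    return offsets
```

With a 20-minute start, a 2-hour cap and a 6-hour maximum age, retries land at 20, 60, 140 and 260 minutes, and the message expires before a fifth attempt. That is a reasonable shape for a sluggish provider and far too much patience for a dead address.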

How should the KumoMTA ready queue and memory be sized?

Between the spool and the wire sits the ready queue — messages staged in memory for imminent delivery — and its sizing is a genuine trade. The project's guidance, which matches what we see in production, is to keep the per-site ready limit modest by default and raise it specifically for your top handful of destinations by egress rate: the providers that take most of your volume get the deep staging that keeps their connections fed, while the long tail stays lean. Oversizing everything looks generous and squanders memory; undersizing the big paths starves throughput exactly where it counts. The same layer includes the engine's memory protection: under real pressure KumoMTA shrinks message bodies back to spool, purges its caches, and — at zero headroom — stops accepting new injections until it recovers. That behavior is a feature, but if you ever see it, sizing was wrong upstream. We tune ready limits against your traffic distribution and watch the memory metrics, so the protection exists and never fires.
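The "modest by default, deep for the top few" rule is simple to express. This minimal Python sketch ranks sites by observed egress rate and gives only the top N a larger ready limit. The default, boosted value and N are hypothetical numbers for illustration, not KumoMTA defaults. Real values should come from your traffic distribution and the memory metrics.

```python
def ready_limits(egress_per_min, top_n=5, default=1024, boosted=8192):
    """Assign a per-site ready-queue limit: modest by default, deep for the busiest sites.

    egress_per_min: {site: messages per minute actually leaving for that site}
    """
    ranked = sorted(egress_per_min, key=egress_per_min.get, reverse=True)
    top = set(ranked[:top_n])
    return {site: boosted if site in top else default for site in egress_per_min}
```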

Does the KumoMTA spool affect throughput?

When a high-volume park feels slow, the bottleneck is rarely the CPU — it is storage, because the engine persists every message for durability and that honesty has an I/O price. KumoMTA offers two spool kinds, and the choice matters: the plain local-disk spool writes each message as files, coupling performance tightly to filesystem speed, while the RocksDB spool combines in-memory buffering with write-ahead logging and asynchronous flushing — near the speed of the dangerous spool-in-RAM shortcuts, without their data-loss exposure on a crash. Our production stance is RocksDB for most senders, on fast local storage kept separate from logs and the OS, with the write-buffer and parallelism parameters set for the machine rather than left at defaults. Where plain disk spooling is the right call, the filesystem details stop being pedantry: mount options and storage class show up directly in messages per second. It is unglamorous tuning that decides whether the park absorbs its biggest send of the year or chokes on it.

Reading the engine before changing it


# Which provider is throttling, and is it connections or rate? (D=delivered, T=transient failures, C=open connections, Q=queued)
$ kcli queue-summary --by-site
SITE              D       T    C      Q   last-response
gmail.com      18204    910   10    642   421 4.7.28 rate limited
outlook.com    12050    120    4      0   250 OK

# Confirm which rule actually applies to the throttled route
$ kcli resolve-shaping --domain gmail.com
connection_limit=10 max_message_rate=300/min  ← concurrency ceiling hit

# Cut concurrency (not rate), hot-reload, no delivery gap
$ kcli set-shaping --domain gmail.com --connection-limit 6 --reload
applied · 642 queued draining · deferrals falling

A real diagnosis: queue-summary --by-site shows Gmail rate-limited with 10 open connections and 642 queued; resolve-shaping confirms the concurrency ceiling, not the message rate, is the cause; cutting connection-limit and hot-reloading drains the queue without sending any faster. Reading the engine before touching it is the whole method.

What is the real bottleneck once the engine is tuned?

A stock Linux install is tuned for general-purpose work, not for holding tens of thousands of concurrent SMTP conversations, so part of the work happens below the engine. That means kernel settings that let new outbound connections reuse sockets lingering in TIME_WAIT instead of exhausting the ephemeral port range, file-descriptor ceilings that match the connection design, and network buffers sized for sustained egress. The project's own performance work is blunt about where the ceiling sits once engine and OS are tuned: the network adapter, not the software. That inverts a common instinct, since teams shop for cores when they should audit their kernel tunables and their NIC. We bring the OS layer up to the engine's level as a standard part of optimization, because a tuned KumoMTA on an untuned kernel is a sports engine bolted to bicycle wheels.
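One quick host-side check is whether the file-descriptor ceiling covers the connection design. The Python sketch below runs on the host and is not part of KumoMTA. It assumes one descriptor per outbound connection plus listeners, with 25% headroom for spool, log and other files. That headroom figure is an assumption. Note that the limit that matters is the one applied to the kumod service, for example a systemd unit's `LimitNOFILE`, which can differ from the limit of the shell you run this in.

```python
import resource


def fd_budget(max_connections, listeners=0, headroom=0.25):
    """Descriptors needed: one per outbound connection plus listeners,
    with headroom for spool, log and miscellaneous files (assumed 25%)."""
    return int((max_connections + listeners) * (1 + headroom))


def check_nofile(max_connections, listeners=0):
    """Compare this process's RLIMIT_NOFILE soft limit against the budget."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    need = fd_budget(max_connections, listeners)
    ok = soft == resource.RLIM_INFINITY or soft >= need
    return {"soft": soft, "hard": hard, "need": need, "ok": ok}
```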

Lua policy: keeping the hot path light

Configuration as code is KumoMTA's superpower and its one self-inflicted risk: your policy runs inside the delivery path, so slow policy is slow mail. Optimizing this layer means auditing what executes per message and making it cheap. Lookups that hit databases or HTTP on every message get memoized through the engine's caching so the answer is fetched once and reused; signing and DNS already cache, and we make sure nothing in custom code defeats that. Heavy work moves out of the hot path entirely — the modern pattern, which recent releases made first-class, is webhook and analytics processing running as a separate long-lived consumer of the compressed JSONL logs, with its own retries and failure modes, so an analytics outage can never slow delivery. The test we apply is simple: nothing in the per-message path should block on anything outside the machine. Policy that passes that test disappears from the performance picture, which is exactly where policy belongs.
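Memoization is the core of keeping lookups off the hot path, and the pattern is small. The sketch below is Python for illustration only. In KumoMTA policy you would use the engine's own Lua caching, not hand-roll a cache. Here `lookup` is a hypothetical stand-in for a database or HTTP call, and the injectable clock exists only so the expiry can be tested.

```python
import time
from functools import wraps


def memoize(ttl_seconds, clock=time.monotonic):
    """Cache results per argument tuple for ttl_seconds (no size bound, for brevity)."""
    def decorator(fn):
        cache = {}

        @wraps(fn)
        def wrapper(*args):
            now = clock()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]          # fresh entry: no external call
            value = fn(*args)          # miss or expired: fetch once, then reuse
            cache[args] = (now, value)
            return value
        return wrapper
    return decorator
```

Usage: decorate the slow lookup once, and every message for the same key inside the TTL reuses the first answer. A production cache also needs a capacity bound, which the engine's own caching provides.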

Peaks: the moment that finds every weakness

A park is measured at its peaks. Plenty of installs cruise on the daily average and stumble when the big campaign lands — exactly when delivery matters most — because peaks surface what the average hides: spool I/O saturating, ready queues thrashing against memory limits, a provider ceiling crossed by the burst, automation suspending paths mid-send. Optimizing for peaks means sizing for the worst hour rather than the mean — spool headroom, memory budget, connection capacity — and pacing the send itself, spreading injection across a window the receivers will tolerate instead of releasing everything at once and letting the queues sort out the wreckage. KumoMTA gives you the pacing controls; the discipline of using them is operational. A park that only survives quiet days is not optimized, it is lucky, and luck has a habit of running out on the night of the year's biggest send.
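Pacing on the injection side can start as simply as an even split of the send across a window, one batch per minute. This Python sketch is a hypothetical helper for whatever injects into KumoMTA. Egress shaping still governs what leaves per provider. The window length is the real decision and should match what your receivers tolerate.

```python
def injection_plan(total, window_minutes):
    """Spread `total` messages evenly over `window_minutes`, one batch per minute.

    Batches differ by at most one message, and their sum is exactly `total`.
    """
    base, extra = divmod(total, window_minutes)
    return [base + (1 if minute < extra else 0) for minute in range(window_minutes)]
```

Spreading a million-message campaign over two hours gives batches of 8,333 or 8,334 per minute instead of one burst that the queues then have to absorb.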

How does warm-up fit into KumoMTA optimization?

Warm-up is where the most mythology circulates, so we keep it plain: new IPs climb in steps over weeks, engaged recipients first, pace governed by provider response rather than a rigid calendar — and the same patience applies to re-warming an IP that sat idle or took a reputation hit. What optimization adds is enforcement: the ramp is expressed as shaping limits the engine obeys, instead of campaign-side discipline that erodes under deadline pressure, and the automation acts as the safety rail that slows a path the moment a step triggers pushback. Rushed or skipped warm-up remains the most common reason a technically clean park delivers badly for months; encoded as policy, the ramp survives the humans being busy. We write the plan, set the limits, and let the providers' own responses tell us when each step has been accepted.
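Expressed as limits, a ramp is a list of daily caps that only advances when the previous step was accepted. The Python sketch below assumes a geometric ramp, with a hold whenever a day showed provider pushback. The start, target and growth factor are hypothetical. Each cap would then be written into the shaping limits for that IP rather than left to campaign tooling.

```python
def warmup_caps(start, target, factor, pushback_days, days):
    """Daily volume caps for one new IP.

    The cap grows by `factor` each day until it reaches `target`. If day d is
    in pushback_days (the provider pushed back that day), the next day's cap
    holds at the same value instead of growing.
    """
    caps, cap = [], start
    for day in range(days):
        caps.append(cap)
        if day not in pushback_days:
            cap = min(target, int(cap * factor))
    return caps
```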

Suppression hygiene and the signals you send back

A dirty list sabotages the best-tuned engine, so optimization always audits what happens to mail that comes back. Hard bounces must suppress immediately and permanently; retrying dead addresses is one of the fastest ways to teach providers you are careless. Soft failures deserve patience proportional to their language. Feedback-loop complaints must land on the suppression list automatically, with no human in the loop, because mailing someone who pressed the spam button is the single most corrosive signal in the business. The 0.30% spam-rate ceiling that Gmail and Yahoo set for bulk senders is crossed by bad list discipline long before bad configuration. We verify that the loops are registered, the events flow, and the suppression actually holds across re-imports, where the quiet failure mode is a cleaned address sneaking back in through the next upload. The engine sends signals about you with every message it does or does not retry; tuning makes sure those signals say careful operator.
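The re-import leak and the complaint ceiling can both be checked mechanically. This Python sketch filters an upload against the suppression list after normalizing case and whitespace, and compares a complaint count with the 0.30% figure above. Lowercasing the whole address is a simplifying assumption. Strictly, the local part may be case-sensitive, but in practice suppression should err toward blocking. The function names are hypothetical.

```python
def normalize(address):
    """Canonical form for comparison (assumption: whole address is case-insensitive)."""
    return address.strip().lower()


def filter_upload(upload, suppressed):
    """Drop any uploaded address already on the suppression list."""
    blocked = {normalize(a) for a in suppressed}
    return [a for a in upload if normalize(a) not in blocked]


def complaint_rate_ok(complaints, delivered, ceiling=0.003):
    """True while complaints / delivered stays below the ceiling (0.30% by default)."""
    return delivered > 0 and complaints / delivered < ceiling
```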

Engagement is the goal the engine cannot reach for you

Providers stopped asking only whether your mail authenticates and started asking whether anyone wants it: opens, replies, deletions-without-reading and spam markings now weigh more in placement than any header. Honest optimization therefore looks past the engine at what is sent and to whom — active recipients prioritized, long-dormant ones rested or removed, cadence that respects attention instead of strip-mining it. The engine helps at the margins: pacing deliveries into the windows where your audience actually reads, keeping transactional and promotional reputations apart so one stream's fatigue never taxes the other. But a perfectly tuned KumoMTA firing at a dead list still delivers to spam, because no configuration compensates for sending mail nobody asked for. We say this plainly in every engagement: when the ceiling is the list, more tuning is the wrong purchase, and we would rather point at the real constraint than invoice for moving screws that are already tight.
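Prioritizing active recipients and resting dormant ones starts with a segmentation by last engagement. The Python sketch below uses 90-day and 365-day windows. Those are hypothetical thresholds, and the right ones depend on your sending cadence and the engagement data you actually hold.

```python
from datetime import date


def segment(last_engaged, today, active_days=90, dormant_days=365):
    """Split recipients into send / rest / remove by days since last engagement.

    last_engaged: {address: date of last open, click or reply}
    """
    out = {"send": [], "rest": [], "remove": []}
    for addr, last in last_engaged.items():
        age = (today - last).days
        if age <= active_days:
            out["send"].append(addr)
        elif age <= dormant_days:
            out["rest"].append(addr)
        else:
            out["remove"].append(addr)
    return out
```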

What is KumoMTA optimization not?

Clearing expectations saves everyone time. Optimization is not raising every limit to maximum — that is the opposite, and it reliably makes delivery worse as receivers punish the push. It is not a trick that undoes years of neglect or a damaged reputation in an afternoon; those recover with behavior over weeks, and we say so. It is not operating at the edge of the rules: we tune permission-based sending on legitimate infrastructure, because placement built on shortcuts collapses on its own schedule. And it is not change for the sake of billing — if your park already runs near its ceiling, we tell you that, hand you the numbers proving it, and point at whichever constraint is actually binding, even when it is not an engine we can tune. Optimization is simply closing the gap between your real delivery and what your configuration and reputation permit. No more, and no less.

A before-and-after, concretely

A typical case, to make the method tangible. A sender arrives with Gmail deferrals climbing and queues that stop draining at night; the team has been raising send rates to empty them, which deepens the hole. The baseline shows the real story: concurrency to Gmail sits above what their reputation currently tolerates, and the automation rules — stock, never reviewed — respond to the rate-limit language with a two-hour suspension that turns every pushback into a queue mountain. We move three things, one at a time, measuring between: drop concurrent connections for that path, replace the blunt suspension with a thirty-minute rate reduction matched to the response text, and split the transactional stream onto its own pool so receipts stop queueing behind promotions. Deferrals halve inside a week; queues drain within the daily window; seed-list placement climbs and holds. Not one change sent faster. Every change sent with more control, and the numbers — not adjectives — carried the conclusion.

Placement, measured per provider

Tuning without measuring where mail actually lands is adjusting in the dark, so the work includes real placement: seed-list tests at the major providers showing inbox, promotions tab or spam folder, path by path, cross-read with the postmaster dashboards the receivers themselves publish and with your own engagement data. Measured this way, every configuration change maps to a consequence you can see — whether widening Gmail's connections moved its placement, whether the pool split recovered transactional delivery, whether the retuned automation cut the deferral rate — instead of dissolving into a general impression that things feel better. Per-provider reading also catches the divergence cases that aggregate numbers hide: a change that helps three receivers and quietly costs you a fourth. Without this layer, optimization is testimony; with it, it is a loop of adjust, measure, confirm — and the confirmed gains are the only kind that compound.
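Per-provider reading reduces to two steps: compute an inbox rate per provider from seed results, then compare before and after a change to catch the provider that got worse while the aggregate improved. This Python sketch assumes seed results arrive as (provider, folder) pairs. That input shape and the 5-point tolerance are assumptions for illustration.

```python
from collections import Counter


def inbox_rates(results):
    """results: iterable of (provider, folder) pairs, folder e.g. "inbox", "promotions", "spam"."""
    totals, inbox = Counter(), Counter()
    for provider, folder in results:
        totals[provider] += 1
        if folder == "inbox":
            inbox[provider] += 1
    return {p: inbox[p] / totals[p] for p in totals}


def divergence(before, after, tolerance=0.05):
    """Providers whose inbox rate dropped by more than `tolerance` after a change."""
    return sorted(p for p in before if p in after and before[p] - after[p] > tolerance)
```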

What we track to know it worked

The metrics that matter are few and the engine surfaces all of them. Temporary deferrals per provider, which read as how hard receivers are braking you. Hard-bounce share, which reads as list health. Delivery latency — injection to acceptance — which lengthens before most problems become visible anywhere else. Queue depth and drain time, the thermometer of the whole machine. And above all, measured placement: inbox versus spam, per provider, from tests rather than inference. These numbers trend before they break, which is the practical value of watching them: a creeping latency or a rising deferral curve announces a ceiling or a reputation problem while it is still cheap to correct. Optimization means moving these curves the right way and keeping them there; the baseline-and-after discipline means you never have to take our word for whether it happened.
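Catching a curve while it is still creeping can be as simple as comparing recent medians with baseline medians. The Python sketch below assumes metrics where higher is worse, such as injection-to-acceptance latency or deferral rate. It assumes samples are already exported from your metrics store. The 1.5x ratio is a hypothetical alerting threshold.

```python
from statistics import median


def early_warnings(baseline, recent, ratio=1.5):
    """Metric names whose recent median exceeds the baseline median by `ratio` times.

    baseline, recent: {metric_name: [samples]} for metrics where higher is worse.
    Metrics missing or empty on either side are skipped.
    """
    return sorted(
        name for name, samples in recent.items()
        if samples and baseline.get(name)
        and median(samples) > ratio * median(baseline[name])
    )
```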

Every change, written down

Tuning production without a trail is how parks become haunted, so we document as we go: each change, the reason, the value before, the value after, in a log that lives with the configuration — which, this being KumoMTA, is already in version control where a change log belongs. The record serves three masters. Safety: an underperforming adjustment reverts precisely instead of by archaeology. Continuity: when your team takes the park over, or we return months later, the current state explains itself. And trust: you see exactly what was done and what each step bought, rather than receiving back a different machine and a shrug. The discipline costs minutes per change and converts optimization from a black box into an auditable piece of engineering.
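One lightweight way to keep that trail is a JSONL change log committed next to the configuration, one record per change. The record fields below follow the list above: what changed, why, the value before and after, and the measured result once known. The format and names are hypothetical, a sketch rather than a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Change:
    when: str                 # ISO 8601 timestamp
    target: str               # what moved, e.g. "gmail.com connection_limit"
    before: str
    after: str
    reason: str
    result: str = "pending"   # filled in once the next measurement window closes


def to_line(change):
    """Serialize one change as a JSONL line."""
    return json.dumps(asdict(change)) + "\n"


def append_change(path, change):
    """Append one change record to the log file kept alongside the configuration."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_line(change))
```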

How the engagement runs

The shape is consistent. We baseline first: deferrals, latency, queues, bounce mix, placement, plus a read of your shaping, automation history, pools, spool and policy code. We identify the biggest levers — most often concurrency, automation rules and pool design before anything exotic — and apply them in steps, bottom-up, measuring between, with shaping validated before it loads and everything hot-reloaded so delivery never pauses. We finish with the change log, the dashboards, and the before-and-after against the original baseline. If you want it as a project, it ends there, documented and handed over; if you want the tuning to stay tuned, it folds into managed KumoMTA, where limits breathe with your reputation as routine maintenance. Either way, what we deliver is measurable placement — and if the honest finding is that your ceiling lives outside the engine, the deliverability audit is where we chase it.

The starting point costs nothing: the free 25-point audit reads your KumoMTA and your sending and tells us where the recoverable delivery is hiding — in the shaping, the automation, the pools, the spool, or somewhere no engine tuning will reach. From there we optimize with data, moving the biggest levers first and proving every step against your own baseline. And if what you actually need is a deployment built right from the start, that is the setup service — this one is for making a running engine earn its keep.


Frequently asked questions

Is optimization the same as the setup service?

No. Setup builds a production deployment from zero: policy, shaping, warm-up, monitoring. Optimization takes a KumoMTA that already runs and closes the gap between what it does and what it could do — refining shaping against your accumulated reputation, tuning automation, queues, spool and memory, and proving the gain with numbers. They often follow each other, but you can buy either alone: if your install was rushed or inherited, optimization is usually where the recoverable delivery lives.

How much will our delivery improve?

It depends on the starting point, so we refuse to promise numbers blind. A park still running stock defaults at real volume has a lot of margin; one already tuned has less. What we do instead is fix a baseline before touching anything — deferrals, latency, queue depth, placement — apply changes in steps and measure again, so the improvement is a verifiable fact rather than a sales line. If the margin looks small, we say so before you spend.

Do you change things in production?

Yes, with method. Changes go in one at a time, each measured before the next, and shaping edits are validated with the project tooling before they load — KumoMTA ships a validator precisely so a typo never reaches a live queue. Most tuning is also hot-reloadable, which means adjustments apply without restarts or delivery gaps. Slow, observable, reversible: that is how production tuning stays safe.

Is faster always better with KumoMTA?

No, and the engine itself proves the point: benchmarks show a single large node processing tens of millions of messages per hour when delivery is pointed at a sink. Real receivers are the constraint, not the engine. Optimizing for raw output chases a number that impresses on a dashboard and comes back as deferrals; optimizing for control — right pace per provider, automation that yields when a receiver pushes back — is what actually raises inbox placement. Speed is the vanity metric; sustained placement pays the bills.

Our queues keep growing — is that what you fix?

It is one of the most common reasons we are called, and the fix is rarely "send faster". A growing queue is a signal: a provider braking you, an automation rule suspending a path, a retry policy out of tune, or spool and memory pressure upstream. We diagnose which it is from the engine’s own metrics and logs, correct the cause, and leave you with the dashboards that make the next anomaly legible at a glance.

One-off project or ongoing?

Either. As a one-off, we tune the park, document every change with its before-and-after numbers, and hand it over. As part of managed KumoMTA, tuning becomes routine maintenance: limits breathe with your reputation, automation rules evolve with provider behavior, and the park never drifts far enough to need rescue. The first repairs; the second keeps it repaired.

More mail in the inbox — measured, not promised.

We tune KumoMTA against your own baseline and prove every change with numbers. Start with the free 25-point audit, no strings attached.