Our platform team is eight engineers supporting a 14-million-customer bank. People ask how that is possible without everyone being on fire. The short answer is we say no well, we automate the boring things, and we refuse to be heroes. The longer answer is this post.
The first thing an eight-person team has to do is decide what it does not do. We maintain the orchestration runtime. We maintain the bot authoring platform. We maintain the observability surface. We maintain the deploy pipeline. We do not maintain individual bots, individual BPMN processes, or individual flows. Those are owned by the teams that build them.
This is harder than it sounds. Teams want us to own their thing because owning things is a chore. Our job is to give them the tools so that owning their thing is not a chore, and then decline to take it on when they ask.
The second thing is to push everything repeatable into tooling. If a platform engineer is manually doing something more than twice a quarter, we automate it. Every time.
Our deploy pipeline handles bot deploys, BPMN deploys, and flow deploys the same way. One command. Same rollback. Same gates. Platform engineers do not review deploys. The pipeline reviews deploys.
Our observability surface is the same across all three authoring experiences. Platform engineers do not teach teams how to find their logs. The teams find their logs.
Our onboarding tooling is thorough enough that a new bot author is productive within two weeks without sitting next to a platform engineer. This took us eighteen months to get right. It has paid back many times over.
The hero engineer is a bug, not a feature. If you need a specific person to ship a specific thing safely, that thing is not production-ready, regardless of how elegant the code is.
Any one of our eight platform engineers can be on call. Any one can ship a hotfix to any part of the platform. We do not have specialists. We have generalists who have picked up deep knowledge in two or three areas.
This is a deliberate choice and it costs us something. A specialist is faster at their specialty. A generalist is slower at any one thing but faster across the whole system. For a team of eight supporting a bank, we think generalists win. We might not make the same choice at team of thirty.
One-on-ones. Code review. Postmortems. Career conversations. Weekly platform reviews where we look at the last seven days of incidents together and discuss patterns. These are human rituals and we keep them human.
The reason is that these are the rituals that let us automate everything else. If we lose the postmortems, we lose the pattern recognition that tells us what to automate next. If we lose the weekly review, the team drifts. A team of eight is small enough that it can drift without anyone noticing, and it can drift in ways that take six months to unwind.
We hire for three things, in order: judgment, writing, and engineering. Judgment because eight people supporting a bank have to make calls constantly about what to prioritize, what to drop, and what to defer, and those calls compound. Writing because the team communicates asynchronously across three time zones and bad writing slows everyone down. Engineering because we are, in the end, an engineering team.
We do not hire for narrow specialty. We hire for people who can pick up a specialty in three months and teach it to the team in six.
Growing fast. We have grown from six to eight engineers in three years. That is on purpose. If we doubled the team next year, we would spend twelve months figuring out how to work at that size before we got anything else done. Small teams compound slowly. We have made peace with that.
Long-term strategy documents. Our planning horizon is one quarter ahead. We do rough sketches of the next two quarters. Beyond that, we assume we will be wrong and we focus on making the current quarter solid.
Being prestigious. Our team is not famous. Our technology is not featured in industry awards. None of our engineers are conference keynote speakers. This is fine. The bank runs. Our incidents are rare. Our engineers are happy. Prestige is a nice-to-have for a platform team. Reliability is the brief.