
3 February '26
When a service breaks, the first question is simple. Who owns this?
In many 50-engineer organizations, finding that answer takes too long. People search Slack. Alerts bounce between teams. Engineers guess. Meanwhile, the issue stays live. This is not a scale problem. It is a clarity problem.
At this size, service ownership breaks in predictable ways. You have enough services to lose track of responsibility, but not enough structure to enforce it. Ownership lives in people’s heads. Documentation exists, but no one fully trusts it. Teams change faster than the documentation does. On-call rotations drift from reality. Over time, accountability fades.
Good service ownership fixes one thing. It lets you answer, fast and with confidence, who is accountable for a service right now. Accountability does not mean who built it or who last touched the code. It means one team owns the service today. You know how to reach them. You know how escalation works if they do not respond. You know when ownership was last reviewed. If any of this is unclear, ownership is broken.
Service ownership needs a system, not a document. An internal developer portal works because ownership lives next to the service itself. Engineers already use it, which makes it the source of truth by default. When the portal is trusted, engineers stop asking in Slack and start acting.
You do not need much metadata to make this work. In fact, more fields usually make things worse. At minimum, you need to know the owning team, the primary contact, the escalation path, the runtime environment, and the last ownership review date. If you cannot keep these few facts current, you do not really have ownership.
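To show how little data this really is, here is one hypothetical way to model those five facts in Python. The class name `OwnershipRecord`, the field names, and the `checkout-api` example service are illustrative assumptions, not a Moai schema. The sketch also flags any fact that is missing.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional

@dataclass
class OwnershipRecord:
    """Hypothetical minimal ownership record: one field per required fact."""
    service: str                        # runtime service name, not the repo
    owning_team: Optional[str]          # exactly one team, never a list
    primary_contact: Optional[str]      # who gets paged first
    escalation_path: Optional[str]      # e.g. a policy name or link
    runtime_environment: Optional[str]  # where the service actually runs
    last_reviewed: Optional[date]       # date of the last ownership review

    def missing_fields(self) -> list[str]:
        """Return the names of facts that are empty (None or "")."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]

record = OwnershipRecord(
    service="checkout-api",
    owning_team="payments",
    primary_contact="payments-oncall",
    escalation_path=None,
    runtime_environment="prod-eu",
    last_reviewed=date(2025, 11, 3),
)
print(record.missing_fields())  # ['escalation_path']
```

Keeping `owning_team` a single string rather than a list is a deliberate choice in this sketch: it makes shared ownership impossible to record in the first place.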
Speed matters more than completeness. Ownership must appear where engineers already look. It should show up on the service page, inside alerts, and close to the code. Engineers should not search or scroll. They should glance and move on. If finding the owner takes effort, the system has failed.
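One way to put ownership inside alerts is to attach the owner to each alert before it is routed. This is a minimal sketch that assumes a hypothetical catalog (a dict keyed by service name) and alerts carrying `service` and `summary` fields. Real alerting tools have their own payload formats, so treat `enrich_alert` as illustrative only.

```python
def enrich_alert(alert: dict, catalog: dict[str, dict]) -> dict:
    """Return a copy of the alert with ownership shown up front in the summary."""
    owner = catalog.get(alert["service"])
    enriched = dict(alert)  # do not mutate the caller's alert
    if owner is None:
        # No record: say so loudly instead of silently routing nowhere.
        enriched["summary"] = f"[NO OWNER ON RECORD] {alert['summary']}"
        enriched["owner"] = None
    else:
        enriched["summary"] = f"[{owner['team']} / page {owner['contact']}] {alert['summary']}"
        enriched["owner"] = owner
    return enriched

catalog = {"checkout-api": {"team": "payments", "contact": "payments-oncall"}}
alert = {"service": "checkout-api", "summary": "p99 latency above 2s"}
print(enrich_alert(alert, catalog)["summary"])
# [payments / page payments-oncall] p99 latency above 2s
```

Putting the owner in the first characters of the summary means an engineer can glance at the notification and act without opening anything else.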
Keeping ownership current is the hardest part. Reorgs, attrition, and shifting scope break ownership faster than outages do. Good intentions do not solve this. Pressure does. Tie ownership confirmation to deploy rights so updates happen as part of normal work. Review ownership on a fixed cadence and do it asynchronously. Make stale ownership visible and uncomfortable. Silent decay is worse than missing data.
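Tying ownership confirmation to deploy rights can be as simple as a pre-deploy check that refuses to ship a service whose last ownership review is too old. The sketch below assumes a hypothetical 90-day review cadence and a hypothetical `deploy_allowed` function that a CI pipeline would call. The limit and messages are placeholders to adapt.

```python
from datetime import date, timedelta
from typing import Optional

MAX_REVIEW_AGE = timedelta(days=90)  # hypothetical cadence: roughly one quarter

def deploy_allowed(last_reviewed: Optional[date], today: date,
                   max_age: timedelta = MAX_REVIEW_AGE) -> tuple[bool, str]:
    """Gate a deploy on a recent ownership confirmation; return (ok, reason)."""
    if last_reviewed is None:
        return False, "ownership has never been reviewed"
    age = today - last_reviewed
    if age > max_age:
        return False, f"ownership review is {age.days} days old (limit {max_age.days})"
    return True, "ownership confirmed"

ok, reason = deploy_allowed(date(2025, 10, 1), date(2026, 2, 3))
print(ok, reason)  # False ownership review is 125 days old (limit 90)
```

Returning a reason along with the decision matters. A blocked deploy that says why is what makes stale ownership visible, and confirming ownership becomes the quickest way to unblock it.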
Escalation deserves special care. During incidents, no one reads paragraphs. Escalation must be simple and predictable. Page the primary contact. Wait a defined amount of time. Escalate to the next layer. Declare an incident. Consistency reduces mistakes when stress is high.
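Writing the escalation policy down as data, rather than prose, keeps it predictable under stress. This sketch models the steps above as an ordered list with acknowledgement timeouts. The `Step` class, the `current_step` function, and the 10 and 15 minute waits are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    action: str        # what responders do at this step
    wait_minutes: int  # how long to wait for an acknowledgement before moving on

# Hypothetical policy: the timeouts are placeholders, not recommendations.
POLICY: tuple[Step, ...] = (
    Step("page primary contact", 10),
    Step("escalate to next layer", 15),
    Step("declare incident", 0),
)

def current_step(minutes_unacknowledged: int,
                 policy: tuple[Step, ...] = POLICY) -> str:
    """Return the action that should be active after N minutes with no ack."""
    deadline = 0
    for step in policy:
        deadline += step.wait_minutes
        if minutes_unacknowledged < deadline:
            return step.action
    # Every wait has expired: stay on the final step.
    return policy[-1].action

print(current_step(12))  # escalate to next layer
```

Because every service can share the same shape of policy, responders do not have to relearn escalation for each team in the middle of an incident.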
Many teams fall into the same traps. They assign multiple owners and create ambiguity. They let platform teams own everything, which hides real accountability. They define ownership by repo instead of by runtime service. They rely on manual updates with no enforcement. In each case, ownership turns into documentation instead of operational data.
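Some of these traps can be caught mechanically. The sketch below flags services with zero or several recorded owners, plus platform teams that own an outsized share of the catalog. The catalog shape (runtime service name mapped to a list of teams), the `find_ownership_traps` name, and the 30% threshold are all assumptions for illustration. Repo-versus-runtime mistakes and missing enforcement still need process, not a lint.

```python
from collections import Counter

def find_ownership_traps(catalog: dict[str, list[str]],
                         platform_teams: set[str],
                         max_share: float = 0.3) -> list[str]:
    """Flag shared or missing ownership and platform teams acting as catch-alls.

    catalog maps each runtime service name to the list of teams recorded as owners.
    """
    problems = []
    for service, owners in sorted(catalog.items()):
        if not owners:
            problems.append(f"{service}: no owner")
        elif len(owners) > 1:
            problems.append(f"{service}: {len(owners)} owners {owners}, pick one")
    # Count every service each team appears on, shared or not.
    counts = Counter(team for owners in catalog.values() for team in owners)
    total = len(catalog)
    for team in sorted(platform_teams):
        share = counts[team] / total if total else 0.0
        if share > max_share:
            problems.append(f"{team}: owns {share:.0%} of services, accountability may be hidden")
    return problems

catalog = {
    "checkout-api": ["payments"],
    "search": ["discovery", "platform"],
    "ingress": ["platform"],
    "queue": ["platform"],
    "auth": [],
}
for problem in find_ownership_traps(catalog, platform_teams={"platform"}):
    print(problem)
```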
You do not need a big rollout to start. Pick a small set of critical services. Define ownership using a minimal set of fields. Surface that data where alerts already point. Then simulate an incident and see how fast you find the owner. The gaps will be obvious.
Clear service ownership saves time, but the deeper value is behavioral. When ownership is visible, teams take responsibility. Engineers stop debating and start fixing. Incidents resolve faster. Trust improves.
If service ownership feels harder than it should be, take a look at Moai. Moai gives you a clear, shared place to define service ownership, contacts, and escalation paths, and to keep them current as teams change. Watch the demo video to see how it works and how it can help your organization.
Geert P. Thiemens
The Moai team