Key Takeaways
- Flow tends to slow down over time for many reasons; in particular, software gets harder and harder to change unless it is proactively cared for
- Many organizations have addressed the issue of flow dropping off, and are able to sustain a high pace of delivery over the course of years through incentives, architecture, and platforms
- Incentivising good technical practices is essential to sustaining flow, otherwise code becomes harder and more expensive to change
- Aligning teams and software with business domains increases team empowerment and reduces coupling, both supporting greater and more sustainable flow
- Infrastructure is often a major blocker to flow - modern platforms with a great DX preclude many infrastructure blockers and help to achieve sustainable fast flow
The ubiquity and growing popularity of Agile over the past 20 years symbolizes the significance that technology companies place on delivering software with a fast flow of changes. There is a pressure to get as much work done as quickly as possible. Yet there is an equal amount of frustration. Why does everything take so long? Why does it take three months to put a text box on a web page? Or more specifically, why did it take one day at the start of the project and now two years later it takes three months? Have our software engineers stopped working hard? Are we using the wrong agile framework? Do we need to hire some consultants?
This isn’t a problem that all organizations face, however. Organizations in many different industries have been able to sustain a fast flow of changes over long periods of time. Fundamentally, they understand that systems naturally get more complex and harder to manage over time, so they proactively invest in reducing complexity so that teams can continue to deliver at pace.
There isn’t a single silver bullet to managing complexity. These organizations address both the social and the technical (the socio-technical) aspects of reducing complexity: incentivising good technical practices to keep code maintainable, architecting systems to minimize dependencies and maximize team motivation, and leveraging platforms to preclude whole categories of infrastructure blockers. These things aren’t easy to just copy-paste from one organization to another. Done half-heartedly, they can exacerbate the problems affecting the flow of value.
Flow tends to get slower over time
In my experience, some of the biggest contributors to declining flow are:
- Poorly maintained software systems
- Dependencies between teams
- Too much work in progress combined with a lack of focus
- Unmotivated employees (often accompanied by high turnover of talented employees)
Just a few weeks ago a head of marketing expressed his frustration to me, asking “Why does it take us three months to add a few text boxes to a web page?” When I spoke to the engineering team he was alluding to, I learned they were responsible for a codebase older than five years (which had changed hands multiple times); they had three engineers working on the team, and they had 10 stakeholders all putting work in their backlog.
The motivation of the team was quite interesting. They were good engineers, they did want to apply good practices, and they did care about helping all of their stakeholders. On the other hand, they felt they were in an impossible situation. I think it was particularly difficult because the leaders of the organization did not appreciate the amount of technical debt the team was dealing with or the effort needed to get it under control.
From my background as a developer, I would also call out extraneous cognitive load, in the form of infrastructure challenges, as being an especially painful blocker to flow. Over the years, it’s often been a struggle just to get code into production, even with the rise of the cloud over the past 15 years.
Behaviors that lead to unsustainable software
A focus on delivering features at all costs, especially when coupled with a culture of arbitrary deadlines, will always result in unsustainable software. Keeping software maintainable requires a healthy balance of delivering new work and keeping the code healthy. We’ve known this for years, but despite that it’s still a major problem. In my experience, the desire to deliver more work faster puts too much emphasis on short-term delivery at the expense of code and architectural maintainability. As a result, the cost of change increases continuously over time, especially as dependencies and coupling grow.
The way out of this trap is to invest more time and effort in keeping systems in a healthier state to keep the cost of change constant. It does require more than just allocating time to code health, though. For example, knowledge of how to design and refactor code is essential, but there still seem to be a lot of developers who don’t even know what mob programming, TDD, and DDD are.
Socio-technical thinking
I don’t want to try and hype it up into something big. I simply use the term socio-technical thinking to bring a balance in companies where there is a lack of attention from management on building a culture where employees have a strong sense of purpose, autonomy, and mastery. When I look at teams that I consider to be high-performing and those that aren’t, this is what’s different.
For example: some companies think that employees working long hours is a sign that they are committed and more productive. But really, when people are tired and burned out they produce low quality code that actually makes flow worse. Socio-technical thinking is simply making sure we think carefully about how social decisions affect technology, and technical decisions affect the social needs of team members.
I started to use the term socio-technical when I was a principal engineer at Salesforce in 2016. I was one of the people responsible for helping to map out a system and identify architectural boundaries. But I kept running into the same problem: however we wanted to shape the software boundaries, we were going to have to change the way teams were currently structured, otherwise there would be multiple teams partially owning every codebase. The idea that our organization structure and software architecture can be designed independently didn’t make sense to me and I have held that opinion since.
I kept looking for a word that collectively meant “the architecture of the software and the architecture of the teams” so that I could describe what I was doing, but there didn’t seem to be one. Pieter Hintjens was using the term “social architecture,” so I thought that socio-technical was a good fit.
Since then I have realised that other people are using socio-technical to apply the same type of thinking not just to architecture, but other aspects of software engineering, like incentivising and rewarding teams (a social activity) in a way that results in more maintainable software (a technical activity).
Incentivise sustainable technical practices
When I worked at 7digital back in 2012, I always felt encouraged to invest in quality. The whole company was passionate about pair programming, test-driven development, and continuous delivery. And we all knew it was the right thing because we had proof: every team was deploying to production multiple times per day.
The CTO himself was a big fan of these practices, in addition to setting WIP limits. We weren’t loaded with work and pushed to hit arbitrary deadlines; we were encouraged to take our time and invest in quality. One of my favourite initiatives was allowing people to go and spend time working with other teams to learn from them and understand their parts of the system. I remember having a great pairing session with a senior engineer who completely changed my opinions about using exceptions for control flow, and that in turn made me challenge all of my opinions.
We also had a lot of time during working hours to improve our ways of working, including two days per month to learn new things. It was like having a birthday every month. I always loved the excitement of deciding how I would spend those two days. I spent time learning things like Erlang, the Riak database, and the actor model in Scala.
The motivation alone was worth it because it helped me to be more productive all of the time, but there were more direct benefits for 7digital too. We were encouraged to write up and share what we learned during our two days, and that created opportunities for all of the developers to come together and discuss if the things we had learned would be useful during our day-to-day work.
Decouple teams to increase flow
Firstly, I have to point out that we can’t load up the system with too much work in progress. If we do that then no amount of decoupling is going to prevent dependencies between teams that catastrophically block flow.
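The effect of too much work in progress can be made concrete with Little's Law, which says that in a stable system the average lead time equals the average work in progress divided by the average throughput. The sketch below is only an illustration: the function name and the numbers are hypothetical and do not come from any team described in this article.

```python
def average_lead_time_weeks(work_in_progress: int, throughput_per_week: float) -> float:
    """Little's Law: average lead time = average WIP / average throughput.

    Only holds as an average for a reasonably stable system.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput_per_week must be positive")
    return work_in_progress / throughput_per_week


# Hypothetical team that finishes 2 items per week on average.
focused = average_lead_time_weeks(work_in_progress=6, throughput_per_week=2)
overloaded = average_lead_time_weeks(work_in_progress=24, throughput_per_week=2)

print(focused)     # 3.0 weeks per item on average
print(overloaded)  # 12.0 weeks per item on average
```

With throughput unchanged, quadrupling the amount of work in progress quadruples the average wait for any single item, before any cross-team dependency is even considered.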
But, assuming we don’t overload the system with work, then how we structure our teams and architect our software can have a big impact on flow.
Primarily, the challenge is aligning teams and software with business domains. Business domains are areas of the business where we gain customer knowledge and build expertise to help us develop capabilities, like ETA Calculation which was a particularly important domain for a large Austrian transportation company. This domain was where they developed capabilities for predicting when trucks would arrive at certain legs of the journey, and accuracy was extremely important. The company had a variety of other business domains like Orders, Journey Planning, and Special Offers.
Benefits of a domain-aligned socio-technical architecture
If we shape the domain boundaries right, groups of related business concepts that change together will belong together and there will be fewer social and technical dependencies.
Shaping good domain boundaries isn’t always a trivial task. When you stay high-level, you can easily fool yourself into thinking something is a sensible domain like the “customer domain” (this is usually something which connects to everything about the customer and results in a very tightly coupled system). I recommend using techniques like Event Storming and Value Stream Mapping to really get into the details of how your business works before attempting to define domain boundaries.
Event Storming is a technique where you map out user journeys and business processes using sticky-notes. There aren’t too many rules; it’s a lo-fi technique that increases participation because the learning curve is very small. There is one rule though: processes are mapped out using domain events which represent something happening in the domain and are phrased in past tense, for example, ETA Calculated, Order Placed, Claim Rejected, and so on.
Event Storming is a collaborative discovery and modelling technique. This means that it works best when you invite people from different teams and people with a mix of skills like engineers, product managers, and designers. By bringing a diverse group together you can build a much deeper picture of the domain, which means that when it comes to identifying your domain boundaries you are making decisions based on a larger amount of information with fewer blind spots.
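Once the sticky notes are on the wall, one way to reason about candidate boundaries is to treat the timeline as data. The sketch below is a minimal illustration and not part of the Event Storming technique itself: the `DomainEvent` type, the event names other than Order Placed and ETA Calculated, and the idea of counting boundary crossings as a rough proxy for handoffs between teams are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainEvent:
    name: str    # phrased in past tense, e.g. "Order Placed"
    domain: str  # the candidate domain this sticky note was sliced into


def group_by_domain(timeline):
    """Collect event names under their candidate domain, keeping timeline order."""
    groups = defaultdict(list)
    for event in timeline:
        groups[event.domain].append(event.name)
    return dict(groups)


def boundary_crossings(timeline):
    """Count adjacent events in the timeline that belong to different domains."""
    return sum(1 for a, b in zip(timeline, timeline[1:]) if a.domain != b.domain)


# Hypothetical slice of a transportation process.
timeline = [
    DomainEvent("Order Placed", "Orders"),
    DomainEvent("Order Confirmed", "Orders"),
    DomainEvent("Journey Planned", "Journey Planning"),
    DomainEvent("ETA Calculated", "ETA Calculation"),
    DomainEvent("ETA Updated", "ETA Calculation"),
]

print(group_by_domain(timeline))
print(boundary_crossings(timeline))  # 2
```

Trying a different slicing of the same timeline and comparing the number of crossings gives a quick, if crude, signal about which boundaries keep related events together.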
I actually created a Miro board where you can explore an example Event Storm and play around with slicing it up into domains. It’s free to use and there is no marketing: Slicing an Event Storm into Domains Kata.
Platforms and infrastructure
When I worked on a government project with HMRC in 2015, all it took was a few lines of configuration and a few button clicks for my team to have a brand new microservice with environments, deployment pipelines, and everything else we needed to get the code into production and support it, like metrics, monitoring, and logging.
It’s now 2022 and I still see a lot of companies force their developers to jump through loads of unnecessary hoops just to get code into production, eating up a chunk of their time and cognitive load which could be better spent identifying unmet user needs and improving products.
Well-designed platforms, like HMRC’s MDTP, remove all of the unnecessary distractions for developers and allow them to focus all of their time on gaining business domain expertise and continuously delivering product enhancements.
Enabling fast flow with digital platforms
Developer Experience (DX) is key to designing good platforms. One of my clients built a platform that forced all of their teams to create and maintain multiple huge Kubernetes files to get their code into production. This ended up becoming almost a full-time job for one person in the team who spent a big chunk of time learning Kubernetes and how they were using it. This is not a good DX in 2022.
A good DX optimises every developer interaction with the platform. Great documentation, slick paved roads, and fully self-service platform capabilities are some of the key enablers. In order to achieve this, the Platform Team needs the mindset of wanting to help developers by creating a great DX.
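To illustrate what a paved road can look like from the developer's side, here is a minimal sketch of a self-service request in which the team supplies only a couple of fields and the platform's defaults decide the rest. Everything in it is hypothetical: `ServiceRequest`, `provision_plan`, and the list of steps are invented for this example and do not describe MDTP or any other real platform's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequest:
    """The only things a team has to tell the (hypothetical) platform."""
    name: str
    team: str
    environments: tuple = ("staging", "production")  # platform default


def provision_plan(request: ServiceRequest) -> list:
    """Expand a short request into the steps the platform would carry out."""
    steps = [f"create repository for {request.name} owned by {request.team}"]
    steps += [f"create {env} environment" for env in request.environments]
    steps += [
        "create deployment pipeline",
        "enable metrics and monitoring",
        "enable logging",
    ]
    return steps


for step in provision_plan(ServiceRequest(name="eta-calculator", team="eta-team")):
    print(step)
```

The design choice being illustrated is that the defaults live in the platform, so teams do not have to write or maintain the underlying infrastructure definitions themselves.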
At one of my clients, the Ops engineers almost staged a protest when they heard that they were going to build a platform and that their role was to help developers be successful. They had always seen themselves as being there to prevent developers from doing stupid things by locking things down, and they had an air of superiority. That’s definitely not compatible with building a great DX and an effective digital platform.
A good platform team measures and continuously tries to improve DX. How quickly can teams get a new service to production? How can the platform team reduce the number of support tickets being created? How can it help teams to improve deployment frequency and uptime?
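Two of these questions are straightforward to turn into numbers if the platform records timestamps. The sketch below assumes the platform can provide when a service was created and when each production deployment happened; the function names and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def time_to_first_deploy(created_at: datetime, first_deployed_at: datetime) -> timedelta:
    """How long a new service took to reach production for the first time."""
    return first_deployed_at - created_at


def deployments_per_week(deploy_times, window_start: datetime, window_end: datetime) -> float:
    """Average number of production deployments per week within [start, end)."""
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    count = sum(1 for t in deploy_times if window_start <= t < window_end)
    weeks = (window_end - window_start) / timedelta(weeks=1)
    return count / weeks


# Hypothetical data: one deployment per day for two weeks.
start = datetime(2022, 1, 3)
deploys = [start + timedelta(days=i) for i in range(14)]

print(time_to_first_deploy(datetime(2022, 1, 3, 9), datetime(2022, 1, 3, 11)))
print(deployments_per_week(deploys, start, start + timedelta(weeks=2)))
```

Tracking these values per team over time shows whether changes to the platform are actually improving the experience of the developers who use it.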
Adopting socio-technical thinking
In the past couple of years I’ve become a big fan of Jonathan Smart’s Better Value Sooner Safer Happier (BVSSH). I think this is a great tool that all organizations can apply to help them balance social, technical, and business needs in everything they do. “Sooner” recognises the need to deliver work faster, but “better” is there to emphasise the importance of quality, which results in sustainable fast flow, while “happier” makes the social needs of every individual a first-class concern in the decision-making process.
At first it will be necessary to be deliberate. Whenever making a decision, use the BVSSH diagram to studiously consider each aspect of BVSSH. Over time, this will then naturally become ingrained into your thinking patterns. The perseverance will be worth it.
Conclusion
Organizations that make an ongoing commitment to reduce the complexity of their systems are rewarded with sustainable fast flow. The rate at which new product enhancements can be added remains high over a long period of time. For organizations suffering from a drop-off in flow, they can be an inspiration, in particular in the ways that they incentivize good practices, architect teams and software, and leverage platforms.
There are no copy-paste solutions, though. Each of these three areas requires attention to both the social and technical aspects which takes time, effort, and even more time. But companies in all kinds of industries have proven it’s achievable – you’re not trying to do the impossible or something academic that works in theory but hasn’t been proven in the real world.
If you persevere you’ll be rewarded, and one day somebody will be writing an article about how awesome and life changing it was to be part of an amazing engineering culture that you helped to build.