April 3, 2020
Designing software that maximizes developer productivity – Kirsten Westeinde | #LeadDevNewYork


Thanks! Hello, everyone! What an honor it
is to be up here amongst this amazing speaker lineup. So Shopify is also a sponsor to this
conference, so I’m gonna just for one hot sec take off my speaker hat and put on my
sponsor hat to tell you about Shopify. So we’re the leading cloud-based multichannel
commerce platform, and we are hiring, so if any of what you hear about today is interesting
to you, please check out our careers page or come and find me afterwards. I joined Shopify
over six years ago now. At the time, we had fewer than 200 employees and were serving
around 70,000 merchants with a relatively straightforward online store and point of
sale offering. Now, fast-forward to today: we have 4,000 employees, are serving upwards
of 800,000 merchants all over the world, including some as big as Kylie Jenner, and our product
offering is a lot more complex. So we’ve learned a lot along the way, and I want to share some
of those learnings with you. We’re all here to grow our technical leadership
skills. As everyone here knows, being a good technical leader encompasses a lot. It includes
many aspects of people management, like having good one on ones and helping identify your
career ladder, but also some technical aspects, like having an opinion on how your team architects
and builds the software, and being able to course correct if the current solution is
no longer best serving the product or the team. These technical solutions can affect
your team’s productivity and happiness, so getting it right can result in a virtuous
cycle. At some point in the not so distant past, Shopify was a massive monolith. It was
responsible for billing merchants, customer checkout, managing developers’ apps, creating
orders, handling shipping, updating products, and a lot more. All of this was being powered
by the same Shopify core codebase. So Shopify is built using Ruby on Rails, and started
similar to how all Rails projects start, with the rails new command. He started with a
zipped version of Rails that was emailed to him by DHH right after creating it. Since
then, it has evolved into one of the largest Ruby on Rails codebases in existence.
So the architectural ideas in this talk can
be applied to a lot of different programming languages, but I am gonna be giving some concrete
examples to help drive the points home, and they’re gonna be given within the context
of Ruby on Rails applications. For those of you not familiar with Ruby on Rails, all of
the code is globally accessible, meaning that you can call anything from anywhere without
having to worry about explicitly saying that you have to depend on it to get access to
it. What this meant for us at Shopify was that, as an example, the code that was used
to calculate shipping rates could use parts of the code that was used to calculate tax
rates and vice versa. Even though they were powering different parts of the product and
arguably should not be sharing code. There are a lot of benefits to monoliths.
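To make the earlier point about global accessibility concrete, here is a minimal Ruby sketch. The class names, rates, and helper are all invented for illustration and are not Shopify's code; the only thing it is meant to show is that one class can call into another without declaring any dependency on it.

```ruby
# Hypothetical classes, invented for illustration only. In a Rails app these
# would live in separate files, all loaded into one global namespace.

class TaxCalculator
  RATE_PERCENT = 13 # assumed flat rate for the sketch

  def self.tax_for(amount_cents)
    amount_cents * RATE_PERCENT / 100
  end

  # A helper intended for tax code only, yet nothing stops others from calling it.
  def self.round_to_nickel(amount_cents)
    ((amount_cents + 2) / 5) * 5
  end
end

class ShippingRateCalculator
  BASE_RATE_CENTS = 499

  # Shipping code reaches straight into the tax helper: no import,
  # no declared dependency, and no error.
  def self.rate_for(weight_kg)
    TaxCalculator.round_to_nickel(BASE_RATE_CENTS + weight_kg * 120)
  end
end

puts ShippingRateCalculator.rate_for(2)
```

If someone later changes how the tax helper rounds, the shipping result changes too, which is the kind of hidden coupling described above.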
They’ve been getting an especially bad rap in the tech scene recently, so I do want to
emphasize that while it’s no longer the best solution for us at Shopify, it was for a very
long time. Even though this talk is called deconstructing the monolith, I would actually
still recommend that new companies and new products start off with a monolith or with
a minimal design. Let me explain why. In the beginning, the number one priority is getting
your product to market, and it’s not worth spending a lot of time getting your product
to be perfect. Even if you did spend time up front, there’s no way that you can get
it right, because you don’t have all the information about your product and problem domain. So
start with no design and move as fast as you can to add the required functionality. However,
a time will come where it becomes slower and slower to add the same amount of incremental
functionality. Martin Fowler refers to this as crossing the design payoff line. The point where
the lines cross in the diagram in front of me denotes the perfect time to design. But
unfortunately no one is gonna come knock at your door and say: Hey, you’re at your design
payoff line. And unfortunately, attempting to design and rearchitect too soon or too
late both have associated downsides. There’s no magical way to figure this out,
unfortunately. I would love to tell you that, but I’ll share with you some of the symptoms that
told us we had passed our design payoff line in the hopes that it might help you identify
when you hit yours. We realized we had to rearchitect in early 2016 when the cons of
our monolithic system began to outweigh the pros. Specifically, there were a couple of
things that served as trip wires for us. The tests were so slow to run that a lot of people
just fully opted out of running them locally. They were also extremely painful to write,
due to how entangled our objects were. Making a seemingly innocuous change could trigger
a cascade of failures: if the calculation of tax rates is using shipping code, then
changing how we update the shipping could break the tests for the taxes, and it wouldn’t
be super clear why. And one of the main cons was that developing
in Shopify required a lot of context to make seemingly simple changes. It was taking new
developers way too long to onboard onto teams, because they not only had to understand the
code owned by their team but also the code owned by many other teams to be able to make
seemingly simple changes. So all of these challenges were affecting our developer productivity
and happiness. We realized that all of the things we liked about our monolith were a
result of the code living in one place, and all of the issues we were experiencing were
a direct result of lack of boundaries between distinct functionality in our code. So it
was clear we needed to decrease the coupling between different domains, and now the question
was how. One solution that is very trendy in the industry
is microservices. It is an option that we explored but one that we quickly ruled out.
While microservices would address the problems we were experiencing, they would also bring
with them a whole nother suite of problems. We would have to maintain multiple different
test and deployment pipelines and take on infrastructural overhead for each service,
and would have to get creative about getting access to the data we need, because it’s a
lot harder for a bunch of microservices to share a database than it is for a monolith.
And since each service is deployed independently, communicating with them means making calls
across the network, which adds latency, decreases reliability, and increases vulnerability by
increasing the surface area that could be attacked. Additionally, large refactors across all the
services could be really tedious, involving PRs to each service and potentially having
to coordinate deploys rather than making the change in one place in a single codebase. So it’s
not to say that microservices are never the solution. It just was not the solution for
us at this time. Especially since transitioning from our current monolith to microservices
might mean having to start over from scratch because of how intertwined all of our different
functionality was, and we weren’t super keen on scrapping 10 years of work at this point.
So where did that leave us? We wanted a solution that increased modularity
without increasing the number of deployment units. This would allow us to get the advantages
of both monoliths and microservices without so many of the downsides. Once it was clear
that we needed to make a change, a small but mighty team of engineering leaders set out to
find a solution to the problem. We wanted to be data-informed when coming up with the
solution, because we wanted to ensure that we were solving the problem that we actually
had, not just the anecdotally reported one. So we sent out a survey to all of the developers
working at this time to identify the main pain points. The results of the survey informed
the decision to split up our codebase. The project was originally named “break core up
into multiple pieces.” This is what happens when you let developers name things. But it
eventually evolved into being called componentization. So componentizing our codebase is our version
of implementing a modular monolith, where all the code that powers an application lives
in the same codebase and is deployed to the same place, but there are strictly enforced
boundaries between the different domains. This all sounds like a nice idea, but what
does it actually mean? The approach that we took can loosely be broken
into these three steps: Reorganizing our code, isolating dependencies, and enforcing the
boundaries. I’m sure this makes it sound really easy, so I’m gonna take a second here to say
that it absolutely is not. I don’t want to artificially get anyone’s hopes up. We’ve
been working on this for over two years now, with a ton of very smart people, and we have
made a lot of progress, but we still do have a ways to go. So let’s dive into each step.
The first issue we chose to address is the organization of our code. At this time, our
code was organized like a typical Rails application, which is by software concept. So models, views,
and controllers. We wanted to reorganize it by real world concepts, things like orders,
shipping, inventory, and billing, in an attempt to make it easier to locate code, to locate
people who understand the code, and to understand the individual pieces of the system on their
own. We decided to split the code into components,
where each component would be structured as its own mini-Rails app. We eventually want
to take this one step further and namespace each component as a Ruby module. The hope
was that this new organization would highlight areas that are unnecessarily coupled, and
boy, did it ever! Coming up with the initial list of components was no easy task, though.
It involved a lot of research and input from stakeholders in each area of the company.
We did this by listing every Ruby class, around 6,000 in total at the time, in a massive spreadsheet,
and manually labeling which component each belonged in, and we went over this many, many
times, with stakeholders from all over the company looking at it. Once we finally completed
it, we achieved the move in one big bang PR which we made with scripts from this spreadsheet.
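As a rough idea of what generating a move from a labeled spreadsheet could look like, here is a small Ruby sketch. The CSV column names, the sample rows, and the components/<name>/app/... target layout are assumptions made for this example, not the actual script or layout used.

```ruby
require "csv"

# Hypothetical spreadsheet export: one row per class, labeled with a component.
# Column names and paths are assumptions made for this sketch.
LABELS_CSV = <<~CSV
  class_name,current_path,component
  Order,app/models/order.rb,orders
  OrdersController,app/controllers/orders_controller.rb,orders
  ShippingRate,app/models/shipping_rate.rb,shipping
CSV

# Assumed target layout: each component is its own mini Rails app, so the
# original app/... path is kept beneath components/<component>/.
def planned_moves(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    from = row["current_path"]
    to = File.join("components", row["component"], from)
    [from, to]
  end
end

planned_moves(LABELS_CSV).each do |from, to|
  # A real script would perform the move; this sketch only prints the plan.
  puts "git mv #{from} #{to}"
end
```

Printing the plan first, rather than moving files directly, keeps the sketch safe to run and makes the result easy to review against the spreadsheet.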
Even though no code was changed, it still touched the entire codebase and was potentially
very risky if done incorrectly. The failures that might occur would come from our code
not being able to find object definitions, which would result in runtime errors.
Our codebase is well tested, so by running
through it locally without failures, as well as on staging, we were able to make sure nothing
was missed. We did it in one PR to minimize disruption to developers: there was only one
day of rebase hell rather than many days of it. The next step was isolating
dependencies by decoupling components from one another. Each component defined a clean
dedicated interface with domain boundaries expressed through a public API and took ownership
of its associated data. The componentization team couldn’t achieve this for the whole codebase.
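For readers less familiar with Ruby, one way a component can expose a public API while hiding its internals is sketched below. This uses Ruby's private_constant and invented names; it is an illustration of the idea, not a description of the patterns the componentization team actually provided.

```ruby
# Hypothetical component; all names and rates are invented for illustration.
module Shipping
  # Internal detail: other components should not reference this directly.
  class RateTable
    RATES_CENTS = { "standard" => 499, "express" => 1499 }.freeze

    def lookup(service)
      RATES_CENTS.fetch(service)
    end
  end
  private_constant :RateTable

  # Public API: the only entry point other components are meant to call.
  def self.rate_for(service:)
    RateTable.new.lookup(service)
  end
end

puts Shipping.rate_for(service: "express")

begin
  Shipping::RateTable # referencing the internal class from outside
rescue NameError => e
  puts "blocked: #{e.message}"
end
```

Calls through Shipping.rate_for succeed, while reaching for the internal class from outside the module raises a NameError, so the boundary is checked by the language rather than by convention.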
Doing so required experts from each business domain, so each dev team took responsibility
to do this for the components they owned. What the componentization team did do was
provide the patterns to use, as well as the tools needed to complete the task.
One tool that we rely on very heavily for
our componentization efforts is Wedge, which we built in-house to track the progress towards
each goal of isolation. Wedge builds up a call graph and uses it to determine which
of these cross-component things are okay and which are violating. As a rule of thumb, associations
and inheritance across domain boundaries are always violating and calls are violating whenever
anything is accessed through anything other than its public API. Wedge reports an overall
score and a list of violations per component. We’re able to use this for a bit of healthy
competition, which helps get devs to work on this technical debt. Teams have managed
to make it a source of pride to have the most isolated component, and we’re able to use
this tool to enforce standards. Each component has to reach at least 50% isolation by a certain
date. We aim to eventually Open Source this tool once we’ve worked out the kinks internally,
but it’s specific to Ruby. Once each team has achieved 100% isolation,
we want to take this one step further and enforce boundaries programmatically. The high
level plan is to have each component only load other components that it has explicitly
depended on. This would result in run time errors if the component tried to use code
that it hadn’t said it was dependent upon. A nice benefit to having the dependencies
explicitly stated is that it will allow us to build up a dependency graph we can use
to visualize if we have any circular dependencies or dependencies that may have been accidental.
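A small Ruby sketch of that idea follows: components declare what they depend on, a check answers whether one component may use another, and the declared graph is scanned for cycles. The component names and the ComponentGraph class are hypothetical; cycle detection uses TSort from Ruby's standard library.

```ruby
require "tsort"

# Hypothetical dependency declarations: component name => components it has
# explicitly said it depends on. Names are invented for this sketch.
class ComponentGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # True only if `from` has explicitly declared a dependency on `to`.
  def allowed?(from, to)
    from == to || @dependencies.fetch(from, []).include?(to)
  end

  # Groups of two or more components that depend on each other in a cycle.
  def cycles
    strongly_connected_components.select { |group| group.size > 1 }
  end

  # TSort hooks: how to enumerate nodes and each node's children.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end
end

graph = ComponentGraph.new({
  "orders"   => ["shipping", "taxes"],
  "shipping" => ["taxes"],
  "taxes"    => ["shipping"] # possibly accidental: creates a cycle
})

puts graph.allowed?("orders", "taxes") # declared, so permitted
puts graph.allowed?("taxes", "orders") # never declared, so not permitted
p graph.cycles
```

In a real system the allowed? check would be wired into loading so that an undeclared use fails at runtime; here it is only a predicate, which is enough to show how explicit declarations make both the rule and the graph checkable.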
We’ve learned that it’s important to use your software to enforce the ideas of how you want
your software to be built. People with context might leave and people without it will always
be joining. And even the most well-intentioned people can be lazy, so just asking nicely
will never be good enough. I’ve seen this done through everything from failing tests
to bots that will come in and shame any PRs that violate the rules. How you do it is not
important. It’s just important that it is done in whatever way makes sense for your
team and your company culture. All this work has made it easier to extract
things out that don’t make sense to live in the core codebase, because we are believers
in service-oriented architecture. It also helps us swap out components for newly designed
ones much more easily. There’s a blog post coming out about how we swapped out
our old tax engine for a new one quite easily due to these efforts. The progress that we’ve
seen so far tells us that it is absolutely worthwhile work to do. Interestingly enough,
we mirrored a move towards a more separated codebase in how we organized our company.
In 2017, we split into product and service lines with clear leadership and ownership
over certain parts of the product. This was done in an attempt to minimize context needed
to be effective at Shopify. And also to allow each product group to feel empowered to make
the decisions they need to move quickly. Good software architecture is a constantly
evolving task, and the correct solution for your app absolutely depends on the scale that
you’re operating at. In my opinion, monoliths, modular monoliths, and service-oriented
architecture fall on an evolutionary scale as your app increases in complexity. If you
choose to go in the microservices direction, transitioning first to a modular monolith
will make this much less painful. Each architecture will be appropriate for a different sized
team and app and will be separated by periods of pain and suffering. When you do start to
experience many of the pain points we were seeing before we rearchitected, that’s when
you know you’ve outgrown your current solution and it’s time to look for the next one.
A well designed system can increase the joy
of developing in it and be a great tool for speeding up your team’s productivity. In case
I haven’t made it obvious, this is a topic that I love to talk about. I find it extra
interesting because of my experiences on all of the different ends of the architectural
spectrum. In the past couple of years at Shopify, I’ve worked on a bunch of different teams
and projects. I worked in the core codebase back when it was a massive monolith, on a
team that extracted a service out of core before we started these componentization efforts,
and I can tell you it was a lot harder than it would be today, and I worked on a team
made up of 30 services, which we boiled down to three components, each deployed on a different
run time. I wrote a blog post on this with more technical implementation details, so
if that’s of interest to you, give it a read, and if you have any questions, come find me
at my office hours. Thank you so much for listening!
