April 5, 2020
Communicating and documenting architectural decisions – David Ayers | #LeadDevNewYork

DAVID: How is everyone doing? Good! All right. So I’m here to talk about communicating and
documenting architectural decisions. Something that as we all know is hard. It’s got the word “documenting” in it, so
no one likes to do it. So this is a little bit about me. She talked about me. I’ve been doing this for a long time. I spent 15 years at the Container Store helping
them build up all of their systems. So a lot of the experience I’m gonna talk
about here is gonna be based on what we did there, and how we built up our teams and how
we communicated and built practices that helped everyone understand what we’re doing. So we do live in interesting times, right? The world has changed. You look at what’s happened in the last five
years, in the last ten years, in the last 15 years, about how we build systems… And everything’s changed. We’ve got Agile practitioners telling us how
we should build things, and we should have decision making pushed down to the teams. We should let architectures emerge from the
ground and magically show up and everything is gonna work together and it’s all gonna
be great. Right? You should delegate those responsibilities
to the last responsible moment. Right? But we still have jobs to do as technology
leads. Right? We have to maintain the health and welfare
of our system. Liz talked earlier about understanding and
dealing with production problems. How do you do that if you’ve got 15 different
teams building things 15 different ways, with no guardrails around how they make their decisions? So I’ve struggled with this a lot, and I’ve
really tried to figure out: How do you balance those two things? How do you balance giving people freedom to
make decisions and to choose tools that work best for the way they do work, versus how
do you then make sure that those systems are all maintainable and observable, as you go
into it? So I’m gonna start with an example. Something we ran into at the Container Store. So… We had a team. Right? And so we were all about letting teams kind
of make their decisions. This was very early on in the process. This was probably 8 years ago. And they’re like… We want to use Mongo! We’re like… Cool. That sounds awesome, right? They were… Schemaless, did their development, super fast,
got everything done, and it’s great. Cool. Let’s go to the ops guys and get this… And they’re like… You did what? We use Oracle here. What are you doing with this Mongo thing? That was a failure. This is pre-DevOpsDays and pre-understanding
how you do these things. They made this decision, built this piece
of software, and operations had no ability to support it. You have to run Mongo in a three-node cluster. They hadn’t provisioned anything. This isn’t a tale about DevOps. It’s a tale about communication. There was no communication. No standard. The team did what they wanted with no ability
to support it. So I’m gonna talk about three techniques that
we kind of settled on. We didn’t invent any of these in particular,
but I’m gonna talk about the ways we use them, and hopefully that’ll be useful for some of
you, as you go through your journeys at your companies. So the first one is: Lightweight architectural
decision records, the second one is enterprise architecture guilds, and the third one is
building reference implementations. So I’ll start with lightweight architectural
decision records. So this is something that everyone asks themselves,
right? What in the crap were we thinking? Right? Now, we make decisions all the time. All of us do. And we care about them. This isn’t saying that you made a bad decision. Because everyone cares about the decision. Everyone is on white boards and there are
pros and cons and there’s arguments and there’s all kinds of great discussion around this
stuff. Right? But what we aren’t good at is socializing
those decisions. Socializing just means understanding that
the group — we’re a social group — that those decisions are socialized and talked
about even amongst the team and the organization at large. Whoever participated in those conversations,
they understand why you made the decision. But no one else does. And we’re not good at recording them. Right? No one is particularly good at documentation. So one of the things that you can do is try
to make that as easy as possible. As developers, we’re lazy. We want to do as little work as possible and
automate as much as we can. So
you can look in the codebase and see… Oh, they decided to use React or Angular or
whatever. So you can see what the architectural decisions
are. But what you can’t see is the whys, and the
context about what led you to that decision. So architectural decision records — Michael
Nygard is an author and speaker. He came up with this idea about documenting
your decisions in a lightweight way. And doing it along with the code. So it’s a way to document, and it’s a place
to put the documentation. There’s a special case for decisions that
span across multiple projects. That wouldn’t make sense to put in a particular
codebase. So I’ll talk about those at the end of this. So all too often, the decisions we make aren’t
well documented, and if they are, they’re written in some shared document, or, even
worse, on some wiki page. You may all have great wikis. Raise your hand if you have a great wiki. Right? That’s good. Good job. Good job. Wikis normally super suck. And nobody is good at maintaining them. You go… I’m gonna search on architectural things. And you find six pages. You’re like… Cool. One of them is right. Right? So the idea is that you put the architectural
decisions right in the codebase. It’s kind of where the rubber meets the road. So that really resonated with me, because
as a developer, with a developer background, I like to go to something new, check out the
code, and see what’s what. Right? So having a place where I can see all the
decisions that were made for a particular project, along with the code I’ve checked
out, to me really resonated. So the real question is: Who cares? Right? Why is it important to document this stuff? So for me, at least, and I’ve mentioned this,
the important thing is to capture the context around the decisions. It’s not the decisions themselves. Although that’s good to know. It’s really that context. So when someone is faced with a decision,
our developers and ourselves are faced with decisions every day, so we can do one of two
things. We can say: I’d like to make a change, and
I can either blindly accept the original decision, I can say: Cool. They knew what they were doing. I’m just gonna accept that. But the thing is, at this moment, the context
may have changed. Something about the way that decision was
made might not be relevant anymore. Right? Or even worse… They can just change it. Whee! (laughter) And that might be okay. Right? Again, the context may have changed. It might be the right time to change that
decision. But not having any understanding of why you
made those decisions in the beginning might lead you to make a change that is a bad change,
because all of those things are still in play. So I’ll talk about real quickly what an ADR
looks like. So an ADR is a text document, and it’s usually
written in markdown. You create one ADR per file. So the idea is that these things are short
and to the point. And usually they’re named something like this. So they’re numbered, so if you have 26 decisions,
you’ll have 26 documents. Usually the file name is as descriptive as
you can make a file name. And there’s not really limitations, necessarily. You don’t have to make your file names really
short anymore. And then you save them in a folder in your
project. So this is an example. There’s an ADR folder at the top level. This is a Maven project. So in that, you have these documents. Right? So this is an example of what markdown — excuse
me. Of what an ADR looks like. And this is a real thing, by the way. There’s some tooling out there that helps
you generate ADRs, and if you go to my website at the end, you can see a link to this talk. One of these is an ADR that says you’re gonna
generate your ADRs in markdown. Anyway, so this is a real thing. So the parts of an ADR. There’s the title. And it’s usually as simple as you can say
it. Markdown format. Git for version control. LDAP for authentication, Angular for frontend
development. Something like that. And there’s a status. So the idea is that decisions aren’t static. They change over time. In the original proposal, this was something
like… Proposed, accepted, superseded, and deprecated. So you would have this directory, and among
them would be things in various statuses. And the original idea is that these things
were immutable. So if you changed a decision, you introduce
the second document that changed that decision. You mark this one as deprecated. In practice, we found that was a little cumbersome. Right? You’re in source control, after all. If you want to change something, just change
it. And you have a record of it. So we ended up getting rid of superseded and
deprecated. If we did deprecate something completely,
we would move it to an archive directory. And again, the point here is: Adopt these
things for how your work style works best. So the context. Again, this is, to me, the most important
part. Right? You describe these things in a factual way. Right? There’s gonna be biases. Everyone has, like, biases, and you make one
decision or another because of biases. So record those things. And try to record them in a way that’s non-threatening. You don’t want to make someone feel bad, because
you didn’t choose their option. It shouldn’t be like… Ha-ha, your thing sucked. We chose this one instead. Try to record them in a factual way. And the other thing is: If there’s tension,
everything has tension. And tension isn’t a bad thing. Where you have ideas and tension are where
you get creativity. But record that tension. And then there’s the decision itself. Usually something like: We will do X. And that’s a simple statement of what the
decision is. And then the last thing is consequences. For good or ill. Every decision is gonna have something good
about it. And maybe something bad about it. Record those things. Again, these are like… Like Liz said, I think, these are cookies
to your future self. Right? You want to leave these notes to your future
self, so you can say: Why did we do this? Oh, that’s why we did it. So again, these are designed to be short documents. And this is one of the things that, like,
people don’t like documentation, because they think… Oh my gosh. I’m gonna have to write books and books and
books. This is as simple as it can be from a documentation
standpoint. So one of the things that we found is we started
using ADRs on projects, and it was great. We started documenting things. They were especially awesome with greenfield
projects, because you can literally record every decision you make, from the inception
of the project to the end. But that doesn’t mean you can’t start with
a project that’s been around for 10 years. The best time to do it is now. Well, the best time to do it is ten years
ago, but if not, the best time is now. So we found that there’s decisions that we’re
making, that were overarching. They weren’t specific to one project or another. So what we ended up doing is creating a specific
git repo called technical architecture. And it contained all of the decisions that
we made, that applied to all of the teams. Things like: We’re gonna use git for version
control. We’re gonna use a particular format for our
commits. We’re gonna use Docker for deployments. Again, anything that sort of spans across
multiple projects. That repo is also a place where we put things
like architectural diagrams for how things fit together. So it was a great place… If someone wanted to now… Hey, what’s up? How does this company work? They can start there, as a new developer,
and see all the decisions they made that led them to today. So the nice thing about this is that most
people use some sort of pull or merge request workflow. And these fit right into that. Right? You can make… You can propose a new architectural decision
by way of a pull request. And you can have robust conversation on that,
and if you’re using something like GitHub, all that conversation is recorded. So not only are you recording the context
in the final ADR, but you’re recording the context and the discussions that led to that
decision as part of this process. So I think ADRs are awesome. I think everyone should use them. I’m in a new role now at Leslie’s Poolmart. So I’m in the process of introducing these
into the company. The other thing I found is that as I kept
in touch with my friends at the Container Store, they’ve almost taken the “A” out of
architectural decision records. Because if you think about it, this is a good
way to just record decisions. It doesn’t have to be architectural. The genesis of this was architectural decisions,
but they can be used to record any kind of decision, because all of these things are
useful. So you don’t have to just believe me. I think ThoughtWorks is here. They probably showed this, but they do their
technology radar, and last year, they said: Everyone should adopt lightweight architectural
decision records. So it’s a great technique. And you should do it. So the next thing I’m gonna talk about is
enterprise architecture guilds. So we’ve adopted new approaches. We’ve built cross functional teams that have
two pizzas when they get together. I’m not sure. It’s an Amazon thing. By the way, what kind of team eats only two
pizzas? Isn’t the serving size at least half a pizza? So the teams are four people? Anyway… So we give these teams free rein to choose
whatever technology they want. And then enterprise architects are all sad. Does anyone here actually have the role of
enterprise architect in your company? Zero people? So that’s something that’s not common anymore,
necessarily. Is even having a discipline around enterprise
architecture. So that’s a role we fill, or it’s a role that
no one fills in an organization. And I think that’s a mistake. Right? I think it’s important to have. Which makes it hard to make decisions. It’s really hard in a large group like that
to make decisions. So we came up with sort of three different
ways to make decisions in a group like that. The first one was — and actually, this is
a common theme. The first thing was — someone would suggest
something new. Then we would say: Cool. Go write an ADR. Right? And they would get together with maybe three
or four like-minded people. And produce an ADR, and bring it back to the
guild. The second thing is: We would form short-term
special interest groups. So often there’s a decision that’s complex. That might take some time to do. An example of this is when we looked at adopting
Docker as our single method for deploying all the things. Like, that’s complex, right? There’s a bunch of problems you need to solve. This is before Kubernetes. So we were doing Docker Swarm. But there were still a bunch of complexities
around how you build and deploy Docker things, right? So we set up a short-term SIG. They met every week, for probably six to eight
weeks. They wrote several ADRs. They built some reference implementations,
which I’m gonna talk about next. And they proved how we should do Docker things. And there came a point where… That was done. Right? So it was a short-term SIG. They had come up with their recommendations,
and they dissolved as a SIG. And the last thing that was part of the architecture
guild was long running special interest groups. These are things like… In our environment, we had three main languages
that we used. So we had Java, Spring Boot, generally speaking,
we had Ruby on Rails, and we had Node, as ways to build things. So each of those had a special interest group
around it. For things like… Code standards, and linting and, like, how
do you — in the Java world, how do you keep up with the stuff that’s going on with Java? With releases every three months or every
six months? How do you do that as an organization? So we delegated the responsibility to the
Java SIG as an example. So some companies… Doesn’t sound like many in here… Have a thing called an architectural review
board. This might be my view, coming from a more
enterprise company, as opposed to a product or startup. So we essentially gave the responsibility
of architecture review board to the guild. At the beginning, I was nervous about this. We had some control over the environment. Especially after the Mongo incident. We tried to be thoughtful about what we allowed
into the infrastructure. I went into this, and I sort of retained… So you have to approach it from multiple different
angles to know — there’s a new way to do database deployments. There’s a brand-new method where everything
is automated. You should do it that way. So each of these meetings had a set agenda. So it went like this: At the beginning of
every meeting, there were reports from each special interest group on work that they had
done. That’s the short term and the long term special
interest groups. There was discussion on: Has this particular
short-term SIG reached its end of life? Is there still more discovery that needs to
be done around this topic? And then review and discuss open ADRs. And we kind of do the Gladiator — you get
a thumbs up or thumbs down, and if there were enough thumbs down, that usually meant there
was additional discussion that needed to be had, or additional ideas. And lastly, open discussion. It was a forum for letting people talk about
things that were on their mind. Right? This is something like… Everyone thinks about new technologies. Everyone thinks about what’s coming up next,
or new ways to work. This was a great venue for allowing that discussion
to happen. And again, it was filtered down to where the
discussion got interesting enough. We would form a short-term SIG, or tell people:
Think about some ADRs and bring them back to the group, and we’ll think about incorporating
them into our infrastructure. So the last thing is building reference implementations. So the company I work for, Container Store,
we set up Docker as our deployment environment. We’re using Vault for secret management, Consul
for discovery and config management. Our first implementation of this was with
Java and Spring Boot. That’s an incredibly complex environment. So think about what that means. I’m a new developer. I’m gonna go work here. Here’s all of the things you need to learn
in order to work in our environment. Right? Each one of those is complex in its own right. So what we decided to do was start building
reference implementations that baked in our best practices. We spent a lot of time thinking about how
Docker should work in our environment. And how Docker should work with Vault. And how Docker should work with Consul. So we had a reference implementation of a
Spring Boot app that had all of those things in it. It had a Dockerfile, it had a Gradle build,
and it essentially means that you can take that and you can start building your code
without necessarily understanding all of these things. Now, you have to be careful too. Because you can’t treat this stuff like… Oh, it’s all just magic and I don’t need to
understand how this stuff happens. But if you at least give someone a road map
and show them how it works, it’s a lot easier for them to get started than if you give them
nothing. Right? And it really was a way to enhance developer
productivity, and to ensure best practices around things like linting rules. And static code analysis. And security. Right? We had all of these things baked into this
reference implementation. So you didn’t have to worry about how you
were gonna do observability. Because it already had logging and application
performance monitoring baked into the base thing you got for free. So everything you needed to just start solving
the business problem and not worry about futzing with the technical details. So in the real world, it worked. Right? We saw this really accelerated — we were
at a time then when we were standing up a lot of new microservices. Sorry, we did microservices. (laughter) We were standing up a lot of new microservices
and sort of deconstructing some monoliths, and so setting up a new thing and getting
it running was a matter of cloning a git repo, deleting the .git directory, resubmitting
to repo, and running one build. And you had a thing that was deployed to your
test or your dev environment. So from a productivity standpoint, imagine
what that’s like, if that’s the simplest you need to do, to stand up an application. It was pretty great. And then over time, as I said, we ended up
with — we added Ruby on Rails as a second way to build things. And before Docker, all these things had their
own unique snowflake deployments. Right? How you deployed Rails, how you
deployed Node, and how you deployed Java were three separate processes. So Docker allowed us to deploy those in one
way, and with three reference implementations, which had the same things baked into them,
it allowed us to be productive in any of our approved language stacks. So it’s important that you realize that when
you do something like this, there’s an implicit commitment to maintain these things. Right? Reference implementation, documentation, anything
else you do is going to get stale, unless you put effort into it. The good thing was, for us, we sort of — a
lot of these things — and you can probably see this — a lot of these things sort of
start to reinforce each other. So the Java reference implementation
for Spring Boot was owned by the Java SIG. Let’s say we wanted to adopt something new,
like a new version of Spring Boot, or maybe we wanted to switch from Consul to ZooKeeper
or something else for service discovery. You do that in the reference implementation
first. So it also became a playground for trying
new things and ensuring they’re gonna work with the rest of our environment. So it was really self-reinforcing. It wasn’t hard to get people to maintain these
applications, because they wanted to maintain them so that they got that stuff for free
the next time they built something new. Right? So the benefits were really way beyond just
all the things I’ve talked about. Right? Again, this is another way you can socialize
your best practices and your architectural decisions. Someone wants to know how Vault and how secrets
are injected into our Docker containers. You go look at it. Right? And as developers, you’re much more likely
to understand something by going and seeing what the code does than necessarily reading
some document about it. So we do have a document about it. We talk about using Vault for secret management. There’s an ADR for that. And then you can go look at a reference implementation
in Java or Rails or Node that shows how secrets are injected into the application, and how
the application gets a handle to a secret it needs to log onto a database or to call
a web service. So it really helps with that. Again, that socialization aspect. It gives people a place to go see how those
decisions were realized in code. So I’ve got about three minutes left. So I had this extra slide in here, in case
I was early. This is an architectural technique that is
also good, I think, at building and helping socialize how you’re gonna build something. It’s especially useful when you’re starting
something from scratch. Has anyone heard of either of these techniques? A few people. Right? So the idea is — and I’ve heard different
names for it. I always called it a tracer bullet. It’s also been called a steel thread. And essentially what it means is that: As
you begin to build something, you build the simplest possible thing that exercises all
of the components. Right? So you’re building — in the steel thread
analogy, you’re building this thread through the whole thing that becomes the basis for
your entire application. So your frontend talks to your backend, which
talks to your database, or talks to your web service. Whatever the parts of your system are, and
it logs to your logging system, and you can see it in your application performance monitoring
tools. It’s the simplest possible thing that executes
all of the paths. And think about how that helps with the socialization
aspect. Each of those things is an architectural decision. Now you’ve got an implementation that shows
how each of those things connect. So socializing it amongst the team, now they
can see how they connect. And it’s especially useful when you’re dealing
with a project that has multiple teams involved. Each of them can now see how those connections
are made and make sure that they all work. So that when you’re building things later
— it’s like the worst thing ever. Probably no one does this anymore. Right? Where the frontend team is over here, building
the frontend and the backend team is over here, building the backend, and they’re trying
to build that train to the middle, and hopefully they meet? Is that familiar to anyone, maybe? I know people still do that, right? So this is a way… I mean, if you still have multiple teams doing
it, at least you do it as soon as possible, and maybe if it’s a dumb, unstyled frontend,
talking to a minimal backend, that talks to maybe one database table. But again, it gives you that… Something to build on. And now all the interactions you’re building,
you’re building on top of proven ways to talk through these systems. So thanks for your time! So as I said, we are really at a crossroads
with how we do development. Right? The old ways are changing. And I’m running across this now in my new
role, where I’ve got a ton of legacy systems that I’ve got to bring up to speed. So these things are very much on the top of
my mind. So I think hopefully these things that I’ve
given you are just some additional tools you can put in your tool box and adopt for the
way you use it. So here’s the git repo that has these slides
for the talk. You can also get them on my website. IAmAGiantNerd.com. So thanks so much! (applause)
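
The ADR layout described in the talk (one short markdown file per decision, numbered, with a descriptive file name, saved in a folder at the top level of the project, and made up of a title, status, context, decision, and consequences) can be sketched roughly as follows. This is a minimal Python sketch and not the tooling mentioned in the talk: the `Adr` class, the `write_adr` helper, the four-digit zero-padded numbering, and the lowercase `adr` folder name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
import re
import tempfile


@dataclass
class Adr:
    """One architectural decision record (hypothetical helper type)."""
    number: int
    title: str
    status: str        # e.g. "Proposed" or "Accepted"
    context: str       # the facts, biases, and tensions behind the decision
    decision: str      # usually phrased as "We will do X."
    consequences: str  # what follows from the decision, for good or ill

    def filename(self) -> str:
        # Numbered, descriptive file name; the zero-padded scheme is an assumption.
        slug = re.sub(r"[^a-z0-9]+", "-", self.title.lower()).strip("-")
        return f"{self.number:04d}-{slug}.md"

    def to_markdown(self) -> str:
        # The five parts named in the talk, rendered as markdown headings.
        return (
            f"# {self.number}. {self.title}\n\n"
            f"## Status\n\n{self.status}\n\n"
            f"## Context\n\n{self.context}\n\n"
            f"## Decision\n\n{self.decision}\n\n"
            f"## Consequences\n\n{self.consequences}\n"
        )


def write_adr(adr: Adr, project_root: Path, folder: str = "adr") -> Path:
    """Write one ADR per file into a folder at the top level of the project."""
    target_dir = project_root / folder
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / adr.filename()
    path.write_text(adr.to_markdown(), encoding="utf-8")
    return path


if __name__ == "__main__":
    example = Adr(
        number=1,
        title="Use Git for version control",
        status="Accepted",
        context="Teams need one shared way to version and review changes.",
        decision="We will use Git for version control.",
        consequences="Decisions can be proposed and discussed through pull requests.",
    )
    # Write into a throwaway directory so the sketch has no side effects.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_adr(example, Path(tmp)).name)
```

Because each record is just a text file in the repository, a new decision can be proposed through the same pull or merge request workflow as any other change, which is where the surrounding discussion gets recorded.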
