The Fear of Deployment Factor

Kyrylo Yefimenko

Published in

Talkdesk Engineering

7 min read · Dec 4, 2020

Nothing in life is to be feared. It is only to be understood.
— Marie Curie

Introduction

Everyone wants to deploy fast and frequently. This can be looked at as a maximization problem where one wants to maximize two variables: speed and frequency.

Achieving the best results in frequency is quite simple: it is enough to deploy more often. But imagine that the deployment is extremely complex and dangerous each time it is performed. The overall results would be favorable for neither the clients nor the Engineer performing it.

Therefore, it is also important to maximize the deployment speed variable. Here, speed defines how fast a deployment is made available to clients, with great confidence that no issue will occur.

Suppose you want to deploy an application, all the external factors for your deployment are in an excellent state, and no issue will occur. Then the deployment time (in human hours) depends on only a few factors:

The number of humans involved in the deployment.
The time an app takes to be deployed and to be fully available in production.
The time required to perform smoke tests.
The amount of constant monitoring time required after the deployment to be confident everything is fine.

All the above multiplied by the fear of deployment factor.
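
As a rough illustration, the factors above can be combined into a single estimate. The sketch below assumes the simplest possible reading: the three durations are summed, multiplied by the number of people involved, and then scaled by the fear of deployment factor. The class name, the field names, and the formula itself are illustrative assumptions, not something this article prescribes.

```python
from dataclasses import dataclass


@dataclass
class DeploymentEstimate:
    """Hypothetical container for the factors listed above (durations in hours)."""

    humans: int                # people involved in the deployment
    rollout_hours: float       # time until the app is fully available in production
    smoke_test_hours: float    # time required to perform smoke tests
    monitoring_hours: float    # constant monitoring after the deployment
    fear_factor: float = 1.0   # assumed multiplier; 1.0 means full confidence

    def human_hours(self) -> float:
        """Return the estimated cost in human hours (assumed formula)."""
        if self.fear_factor < 1:
            raise ValueError("fear_factor is assumed to be at least 1")
        elapsed = self.rollout_hours + self.smoke_test_hours + self.monitoring_hours
        return self.humans * elapsed * self.fear_factor


calm = DeploymentEstimate(humans=2, rollout_hours=0.5, smoke_test_hours=0.25, monitoring_hours=1.25)
scared = DeploymentEstimate(humans=2, rollout_hours=0.5, smoke_test_hours=0.25, monitoring_hours=1.25, fear_factor=3)
print(calm.human_hours())    # 4.0
print(scared.human_hours())  # 12.0
```

With identical rollout, smoke test, and monitoring times, the only thing that triples the cost in this toy model is the fear factor, which is why the rest of the article focuses on lowering it.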

The Fear of Deployment Factor

It is fascinating how we can connect the probability of something going wrong with an emotion the Engineer might feel — fear!

Those two do not always correlate, as fearing to deploy doesn’t necessarily mean that there is a higher probability of something going wrong. But if the fear is of the right type, the correlation between the two is clear.

For the sake of example, let’s consider that the factor goes from 1 up to 100. At 1 — the Engineer is very confident in the deployment. At 100 — the Engineer is absolutely sure they will have to work a bit more enthusiastically, stressfully, and cry in the end.

Don’t forget that the real Engineers don’t cry: they blame everything on the interns. Unfortunately, sometimes there are no interns around. In such cases, it is better to lower the fear of deployment factor as much as possible.

The easiest solution to avoid fear is being ignorant, but let’s not go down that path.

Three Types of Fear of Deployment

Fear of missing lunch (by far the most important one).
Fear by incompetence (easily improved with practice).
Fear of changelog (the one that triggers the correlation between the fear and the probability of having an issue during or after the deployment).

The first two are important, but they are easy to take care of and are not worth our time. Let’s rather focus on a more interesting and complex type of fear — the fear of changelog.

The Fear of Changelog

When deploying, you must think of how hard it will be to roll back a deployment if things don’t go as planned.

But what if, instead of thinking of how hard it is to roll back a deployment, one goes to the root cause and thinks of how to diminish the necessity to perform a rollback at all?

Some core pillars support the Engineer’s sanity during and after the deployment. The main idea is not to pray for success but to act to succeed. That means that to have a successful deployment, it is necessary to correctly prepare for it.

The deployment preparation mostly happens during the development phase, and it is important to think about the future you and make your life easier:

Practice defensive programming. Prepare yourself for the known bugs and be proactive about the unknown. Some issues are overlooked even during the testing phase. For example, you might test the creation of a new table with CREATE TABLE in the QA environment, where the table does not exist yet. This will successfully create the desired table. Moving to the production environment, you miss the fact that the table already exists there and, therefore, the CREATE TABLE command will fail during the deployment. In this case, a more defensive approach with CREATE TABLE IF NOT EXISTS is advisable.
Review the Pull Requests by seeing and not just looking. It is never acceptable to skip a review or to slack during one (be that because of how respected the author is, or just because you don’t feel like it). Everyone makes mistakes, and they should be caught during the review stage, not later in production. It is best to see the Pull Request by checking the branch out and testing the introduced code.
Have an extensive test suite that you can trust and rely upon that tackles various levels of the test pyramid and uses multiple testing strategies, each with its own purpose. It will boost confidence that the deployment will go as intended.
Introduce the QA phase into the development cycle. Having tests in the code is not enough to make sure everything works fine. Having full code coverage doesn’t mean the application has no issues. The QA phase, preferably executed by a specialized QA Engineer in a pre-production environment equivalent to the production one, will open your eyes to issues you might not have even thought about as a developer. The tests for those unforeseen issues should later be implemented in the code.
Prioritize introducing small changes whenever possible. The smaller the change, the easier it is to understand which part of the application is going to be affected. This allows the Engineer to understand, during the deployment, whether an issue is due to the introduced change or not, and to resolve it faster.
Analyze what might go wrong before the deployment. A great idea would be to create documentation with notes on what should be done before, during, and after the deployment. This documentation can range from a checklist to be consulted before the deployment, to a feature rollout plan that should be reviewed as if it were a piece of code, to an elaborate rollback plan in case something goes wrong. For example, you might want to validate that some environment variables are set in production before performing a deployment. A checklist helps track such prerequisites and is very useful when the Engineer executing the deployment is not 100% familiar with the introduced changes. On the other hand, having a complete rollback plan in case of an issue will surely relieve some fear and give some confidence to the Engineer performing the deployment.
Deploy when the application has less traffic. Needing to do this should be considered a smell: if an application can’t handle a deployment during the high-traffic hours, then a random restart of that application in that timeframe might also result in unexpected issues. Nonetheless, if that is the case, deploying during the low-traffic hours will not only put less load on the cluster hosting the application, but will also be less impactful for the clients if something goes wrong. This situation shouldn’t be treated as normal and must be resolved.
Deploy a feature gradually. If a feature could cause issues, it should be made available gradually. This results in, once again, fewer clients being impacted if something goes wrong, and it can be achieved through various approaches. For simple features, using feature flags would be enough. As for the more complex features, you could use a more elaborate approach like canary deployment along with the blue-green technique. If one thinks that this step is just an unnecessary loss of time, then it is suggested to try skipping it, crying (and losing 10 years of life due to stress), and then starting to do it.
Introduce backward-compatible features. Imagine having an API as a service and changing some part of it when no client is ready. That change would result in the service you are providing being broken. Always think about clients when introducing a change. Assume someone still depends on the currently released version until proven otherwise. This is necessary to make sure everything remains operable after a deployment.
Have deep observability of the application that is being deployed. Before the deployment, it is important to identify which metrics could be affected and give them some extra attention. Seeing how a deployment is impacting customer metrics will allow the Engineer to decide faster whether to roll back or perform other fixing operations, hence producing less negative impact.
Finally, warn others of the deployment. Needing to do this should be considered another smell of your application: instead of relying on teammates, you should rely on correctly configured monitoring and alerts. Still, it is certainly possible that the monitoring performed after the deployment does not suffice and misses some part of the application. In such cases, teammates and even other teams are a great source of information. They will warn the Engineer of an issue only if they actually know who to warn. Therefore, having global deployment awareness is highly suggested.
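
To make the defensive programming example concrete, here is a minimal sketch of the CREATE TABLE scenario. It uses Python's built-in sqlite3 module with in-memory databases standing in for the QA and production environments. The table name, the columns, and the run_migration helper are made up for illustration, and your own database engine may report the failure differently.

```python
import sqlite3

NAIVE = "CREATE TABLE deployments (id INTEGER PRIMARY KEY, name TEXT)"
DEFENSIVE = "CREATE TABLE IF NOT EXISTS deployments (id INTEGER PRIMARY KEY, name TEXT)"


def run_migration(conn: sqlite3.Connection, statement: str) -> bool:
    """Run one DDL statement; return True on success, False if the database rejects it."""
    try:
        conn.execute(statement)
        return True
    except sqlite3.OperationalError:
        return False


# "QA": the table does not exist yet, so the naive statement works.
qa = sqlite3.connect(":memory:")
qa_naive_ok = run_migration(qa, NAIVE)

# "Production": the table is already there.
prod = sqlite3.connect(":memory:")
prod.execute(NAIVE)
prod_naive_ok = run_migration(prod, NAIVE)          # fails: table already exists
prod_defensive_ok = run_migration(prod, DEFENSIVE)  # succeeds: nothing to create

print(qa_naive_ok, prod_naive_ok, prod_defensive_ok)  # True False True
```

Keep in mind that IF NOT EXISTS only skips the creation. It does not verify that the existing table has the columns you expect, so it lowers the fear, but does not replace knowing what is actually in production.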
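
The checklist item about environment variables is one of the easiest to automate. The following is a small sketch, assuming the list of required variables is maintained by hand. The function name and the variable names are hypothetical, and a fake environment is passed in so the example does not depend on the machine it runs on.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(required: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name)]


# Hypothetical variable names and a fake environment, for illustration only.
required = ["DATABASE_URL", "FEATURE_FLAGS_ENDPOINT", "ALERTING_WEBHOOK"]
fake_production = {"DATABASE_URL": "postgres://example", "ALERTING_WEBHOOK": ""}

print(missing_env_vars(required, fake_production))
# ['FEATURE_FLAGS_ENDPOINT', 'ALERTING_WEBHOOK']
```

Called without the second argument, the function reads the real environment, so it could run as a pre-deployment step that stops the rollout when the returned list is not empty.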
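
For the gradual rollout with feature flags, one common approach (an assumption here, not necessarily what any particular feature flag service does) is to hash each client into a stable bucket from 0 to 99 and enable the feature only for buckets below the current rollout percentage. The flag name and client identifiers below are invented for the example.

```python
import hashlib


def rollout_bucket(flag: str, client_id: str) -> int:
    """Map a (flag, client) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{client_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(flag: str, client_id: str, rollout_percent: int) -> bool:
    """Return True if the client falls inside the current rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(flag, client_id) < rollout_percent


clients = [f"client-{n}" for n in range(1000)]
enabled_at_10 = {c for c in clients if is_enabled("new-dialer", c, 10)}
enabled_at_50 = {c for c in clients if is_enabled("new-dialer", c, 50)}

# Raising the percentage only adds clients; nobody who had the feature loses it.
print(enabled_at_10 <= enabled_at_50)  # True
```

Because the bucket depends only on the flag and the client, a client sees consistent behavior between requests, and setting the percentage back to 0 switches the feature off for everyone without a new deployment.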

Conclusion

Following the recommendations above does not guarantee that there will be no issue during or after the deployment. Nonetheless, it lowers the probability of that happening. It also means that if an issue does occur, the Engineer can take care of it swiftly.

Be aware that the more deployments you perform, the more confident you become. And we all know that overconfidence blurs out the risks and you might get too comfortable, allowing your eyes to miss something. Hence, always following the mentioned ideas, regardless of the Engineer’s experience, is highly advised.

To conclude, to kill the fear of deployment, you should just follow the best practices imposed for this context.
