Software upgrade checklist in production

Software upgrades are an essential part of keeping production environments up to date and free of known bugs. However, an upgrade can be a complex and challenging process that requires careful planning and execution. This blog post outlines a software upgrade checklist for a production environment that can help the upgrade process go smoothly.

Pre-Upgrade Checklist:

  • Identify the scope of the upgrade: This includes the versions you are upgrading from and to. Check the supported upgrade path and the adopted upgrade policy. Note any exceptions on the upgrade path.

  • Check the prerequisites: See the compatibility matrix for infrastructure versions, such as the database, secret manager, service mesh, Kubernetes cluster and Docker runtime. It is extremely important to take these versions into consideration, because a mismatch can lead to erroneous behaviour post-deployment.

  • Assess the impact of the upgrade: This includes identifying any potential risks or issues that may arise during or after the upgrade, as well as any downtime required to perform it. Read the What’s New, Release Notes and Important Changes. Schedule downtime, if needed, for major changes such as a schema change or database migration.

  • Develop a rollback plan: Prepare for the case where the upgrade fails or causes issues in the production environment. The plan should outline the steps required to revert to the previous version of the software.

  • Notify stakeholders: This includes end-users, regulators, and internal stakeholders such as the IT team and management. Assign a person in charge (PIC) for each task in the checklist. Create a Slack channel or Zoom call for real-time communication.

  • Check configuration: Look for any new fields that need to be configured and check whether their default values make sense in your environment. Update the config files to match the new version and review that all the values are correct.

  • Access to secrets: Ensure that the required secrets, such as the root database password, are present in the secret manager. If the password has been changed, get the latest one from the database team and update it.

  • Conduct a dry run: Perform the upgrade process in a non-production environment to identify any potential issues or risks.
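
The prerequisite check in the list above can be partly automated. The following is a minimal sketch in Python, assuming the compatibility matrix can be written as an inclusive minimum and maximum version per component. The component names and version numbers are hypothetical and not taken from any real product, and the parser only handles plain dotted numeric versions (no pre-release suffixes).

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version string such as '1.27.3' into (1, 27, 3)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def find_incompatible(
    installed: Dict[str, str],
    matrix: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Return one message per component outside its supported range."""
    problems = []
    for component, (minimum, maximum) in matrix.items():
        if component not in installed:
            problems.append(f"{component}: version unknown")
            continue
        current = parse_version(installed[component])
        if not parse_version(minimum) <= current <= parse_version(maximum):
            problems.append(
                f"{component}: {installed[component]} not in [{minimum}, {maximum}]"
            )
    return problems


# Hypothetical matrix for the target release (inclusive min/max versions).
COMPATIBILITY_MATRIX = {
    "postgresql": ("13.0", "15.99"),
    "kubernetes": ("1.25.0", "1.28.99"),
}

# Hypothetical versions collected from the environment before the upgrade.
INSTALLED = {"postgresql": "12.9", "kubernetes": "1.27.3"}

if __name__ == "__main__":
    for problem in find_incompatible(INSTALLED, COMPATIBILITY_MATRIX):
        print("INCOMPATIBLE", problem)
```

Running a check like this in both the dry-run environment and production helps confirm that the two environments actually match before the upgrade window.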

During Upgrade Checklist:

  • Perform the upgrade: Begin the upgrade process by following the steps outlined in the documentation. Monitor the upgrade process closely to identify any issues or errors that may arise.

  • Health check: Once the upgrade is completed, check the system status. If there are any unexpected issues, such as pods crash-looping, raise a production issue and contact the relevant team member.
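
As an illustration of the health check above, here is a minimal Python sketch that flags crash-looping pods. It assumes a Kubernetes deployment and that the pod list has been exported as JSON, for example with `kubectl get pods --all-namespaces -o json`. The function name and the sample data are hypothetical.

```python
import json
from typing import List


def crash_looping_pods(pod_list_json: str) -> List[str]:
    """Return 'namespace/name' for pods with a container in CrashLoopBackOff."""
    pods = json.loads(pod_list_json).get("items") or []
    failing = []
    for pod in pods:
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                meta = pod.get("metadata", {})
                failing.append(f"{meta.get('namespace', 'default')}/{meta.get('name')}")
                break  # one failing container is enough to flag the pod
    return failing


# Hypothetical, heavily trimmed sample of a pod list in JSON form.
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"namespace": "prod", "name": "api-0"},
            "status": {"containerStatuses": [{"state": {"running": {}}}]},
        },
        {
            "metadata": {"namespace": "prod", "name": "worker-1"},
            "status": {"containerStatuses": [
                {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}
            ]},
        },
    ]
})

if __name__ == "__main__":
    for pod in crash_looping_pods(SAMPLE):
        print("CRASH LOOP", pod)
```

A non-empty result is the signal to raise a production issue and contact the relevant team member.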

Post-Upgrade Checklist:

  • Perform any manual post-upgrade steps: Follow the post-upgrade instructions in the documentation, if there are any. Common tasks include garbage collection and removing unused resources.

  • Verify system functionality: Verify that the upgraded system is functioning correctly and that all data and configurations have been migrated correctly.

  • Perform a sanity check: Conduct user acceptance testing to ensure that end-users are satisfied with the upgraded system and that it meets all requirements.

  • Monitor the metrics data: Use a dashboard, such as Grafana, to look for any abnormal behaviour. In addition, check the logs for any error messages.

  • (For database upgrades only) Verify the data: Compare hashes of the schema columns, compare query results (generate the results -> sort -> hash -> compare), and compare row counts before and after the upgrade.

  • Update documentation: Update all relevant internal documentation to reflect the changes made during the upgrade process.

  • Conduct a post-implementation review: Conduct a review to evaluate the success of the upgrade process and identify any areas for improvement.
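
The database check in the list above (query results -> sort -> hash -> compare, plus row counts) can be sketched as follows. This is a minimal Python example using in-memory SQLite databases as stand-ins for the pre-upgrade and post-upgrade databases. The `accounts` table, its rows and the helper names are hypothetical. With a real database you would use its own driver, and you would need to make sure that both sides render values identically before hashing.

```python
import hashlib
import sqlite3
from typing import Iterable, Tuple


def result_fingerprint(conn: sqlite3.Connection, query: str) -> Tuple[int, str]:
    """Return (row count, SHA-256 of the sorted rows) for a query."""
    rows = conn.execute(query).fetchall()
    # Sort a text form of each row so that row order does not affect the hash.
    lines = sorted(repr(row) for row in rows)
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return len(rows), digest


def make_demo_db(rows: Iterable[Tuple[int, str]]) -> sqlite3.Connection:
    """Build a throwaway in-memory database holding a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    return conn


QUERY = "SELECT id, owner FROM accounts"

# Same data inserted in a different order: fingerprints should match.
before = make_demo_db([(1, "alice"), (2, "bob")])
after = make_demo_db([(2, "bob"), (1, "alice")])
# A row lost during migration: fingerprints should differ.
broken = make_demo_db([(1, "alice")])

if __name__ == "__main__":
    print("match:", result_fingerprint(before, QUERY) == result_fingerprint(after, QUERY))
    print("match:", result_fingerprint(before, QUERY) == result_fingerprint(broken, QUERY))
```

Comparing the row count alongside the hash makes a mismatch easier to diagnose: a differing count points to missing or duplicated rows, while an equal count with a differing hash points to changed values.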

Conclusion

Moving to a new software version in a production environment is an important process and a checklist can help the upgrade run smoothly. This checklist should be customised to meet the specific needs and requirements of each system and should be updated regularly to reflect changes in the production environment or software version being used.