LEP/ReleaseFeaturesWhenTheyAreDone


Rationale

We currently tie together two unrelated things:

This causes many problems for us, including:

We want to decouple these two things, deliver features when they are ready (neither earlier nor later), and streamline and simplify our code deployment, database changes and general maintenance processes.

Stakeholders

Launchpad developers: Various mails have been sent to launchpad-dev. Changes to the development process affect developers directly, so their needs must be met.

Launchpad users: It is hard to have a discussion with all the users who are affected. Users generally want Launchpad to be fast and reliable, and if this LEP succeeds they will get both more often.

Constraints

Out-of-scope

Nice-to-have

Implementation

The end state we want to arrive at is:

This may take some time to completely achieve, so we are staging the implementation.

In progress - Stage 0 - Stop using edge for 'unreleased features'

To disentangle the two concerns (deployment and releasing) we need features in the code base to be enabled at runtime rather than at deployment time. This will be accomplished by LEP/FeatureFlags. Nothing should depend on the 'is_edge' check. The feature flags facility is now available in db-devel, and from 10.09 it should be used for all changes that the developer does not want given to users *immediately*.
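The difference between deploying a feature and releasing it can be sketched as follows. This is a minimal illustration only, not the LEP/FeatureFlags API: the `FlagStore` class, the `render_page` function and the flag name `example.new_ui.enabled` are all hypothetical.

```python
# Hypothetical sketch: none of these names come from Launchpad itself.
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """In-memory stand-in for flag settings that can be changed at runtime."""
    values: dict = field(default_factory=dict)

    def set(self, name, value):
        self.values[name] = value

    def get(self, name, default=None):
        return self.values.get(name, default)


def render_page(flags):
    # The decision is made per request from runtime state, not from which
    # server (edge or production) happens to be running the code.
    if flags.get("example.new_ui.enabled"):
        return "new UI"
    return "old UI"


flags = FlagStore()
before = render_page(flags)   # flag unset: code is deployed but not released
flags.set("example.new_ui.enabled", True)
after = render_page(flags)    # same deployed code, feature now released
```

The same code runs on every server in both calls; only the flag setting changes, which is what removes the need for an 'is_edge' check.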

DONE - Stage 1 - Remove appserver rollout downtime

RT 40685 (deploy icing to apache before updating appservers) will fix the downtime experienced by some users during appserver-only rollouts.

DONE - Stage 2 - QA all code

This involves setting up a QA environment on the staging server that runs the 'stable' branch against the production database schema. Rather than deploying 'stable tip', we will start deploying a nominated revision of stable: the highest revision for which it and every earlier commit have been QA'd. A QA failure requires the failing revision to be reverted, and any revisions landed since the failing one must also be QA'd OK or reverted.
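As a rough sketch of how the nominated revision could be chosen, assume each revision on stable carries a single QA status in landing order. The function name, the status strings and the revision numbers below are hypothetical; the real process is described in MergeWorkflow.

```python
def highest_deployable_revision(qa_status):
    """Return the highest revision such that it and every earlier revision
    in the list is QA'd OK, or None if the first revision is not OK.

    qa_status: list of (revno, status) pairs in landing order, where
    status is one of the illustrative strings "ok", "untested", "failed".
    """
    deployable = None
    for revno, status in qa_status:
        if status != "ok":
            # Anything at or after the first non-OK revision cannot be
            # nominated, even if later revisions passed QA.
            break
        deployable = revno
    return deployable


stable = [(101, "ok"), (102, "ok"), (103, "untested"), (104, "ok")]
nominated = highest_deployable_revision(stable)
```

Here revision 104 passed QA but cannot be deployed until 103 is either QA'd OK or reverted, so 102 is the nominated revision.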

See MergeWorkflow for the QA process details. This stage permits no-downtime DB patches to be applied within a release cycle, as long as they are blessed as such, and code that depends on them is landed after the patch has been applied.

In progress - Stage 3 - remove 'edge'

With all code QA'd, we will deploy to all appservers whenever we deploy, rather than to edge; the edge appservers will be repurposed as production appservers, and the edge sites turned into redirects to production. We considered using the edge hostname to switch on many or all in-progress features, but decided against it: it would make testing interactions in the daily QA process significantly more complex, and we are aiming for as simple as possible here.

These bugs affect zero-downtime deployments to appservers:

We need to add a feature flag for recipes, and a team-based scope, before we can disable edge.
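One way a team-based scope could behave is sketched below. This is an assumption for illustration, not the LEP/FeatureFlags design: the rule format, the `team:` and `default` scope names, the team name and the flag name are all hypothetical.

```python
def flag_value(rules, flag, user_teams):
    """Return the value of the first rule for `flag` whose scope matches.

    rules: list of (flag, scope, value) tuples in priority order.
    Scopes (illustrative): "team:<name>" matches members of that team,
    "default" matches everyone.
    """
    for rule_flag, scope, value in rules:
        if rule_flag != flag:
            continue
        if scope == "default":
            return value
        if scope.startswith("team:") and scope[len("team:"):] in user_teams:
            return value
    return None  # no rule applies to this flag


rules = [
    ("example.recipes.enabled", "team:example-beta-testers", True),
    ("example.recipes.enabled", "default", False),
]
for_member = flag_value(rules, "example.recipes.enabled", {"example-beta-testers"})
for_other = flag_value(rules, "example.recipes.enabled", set())
```

With rules like these, a feature can be switched on for members of one team on the production hostname while staying off for everyone else.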

Finally we need to test the behaviour of redirects on old launchpadlib clients.

Stage 4 - iterate on deployment friction

To reduce the complexity of our environment we want all the servers running the same revision, but we have some areas that are hard to deploy to, or cause downtime at the moment.

The following RT tickets will improve this:

Some bugs in the LP codebase will also help, but are less strictly needed:

There may be other issues, but we will discover these if/when a deployment goes wrong, and feed them back into the process as high/critical bugs.

Success

When we can update the db schema without rolling out features under development, and the Launchpad developers haven't gone mad from crazy process changes.

Better quality features released to production.

Thoughts?

LEP/ReleaseFeaturesWhenTheyAreDone (last edited 2011-04-05 14:47:16 by flacoste)