Diff for "LEP/FeatureFlags"


Differences between revisions 11 and 12
Revision 11 as of 2010-07-12 15:25:11
Size: 7030
Editor: mbp
Comment:
Revision 12 as of 2010-07-12 15:30:54
Size: 7363
Editor: mbp
Comment:
Line 141: Line 141:
 * How should these be tested? Perhaps we want a small number of tests that try flipping the flag and checking both ways works?

 * How to edit? One big textarea? How about races?

 * Which scope matches? Explicit ordering? Most-specific? Require no overlaps?
Line 143: Line 149:
The API is: {{{config(name) => value}}}. The API is: {{{config(name, context) => value}}}.

Can/should we get the context implicitly?
Line 157: Line 165:
  {{{user_subset=0,10/registry_layout_new=True}}} (give users with id%10==0 a new layout to see how they like it)   * {{{user_subset=0,10/registry_layout_new=True}}} (give users with id%10==0 a new layout to see how they like it)
Line 159: Line 167:
  {{{sitewide_message=Going down for an upgrade, should be back in 10m}}}
  {{{sitewide_countdown_time=20101220T00:00}}} (show "in %d minutes" based on this)
  * {{{sitewide_message=Going down for an upgrade, should be back in 10m}}}

 * {{{sitewide_countdown_time=20101220T00:00}}} (show "in %d minutes" based on this)

The purpose of this template is to help us get ReadyToCode on features or tricky bugs as quickly as possible. See also LaunchpadEnhancementProposalProcess.

Dynamic Configuration

Launchpad has a registry of configuration options that can be changed by admins through the web ui, without restarting Launchpad.

As a Launchpad developer/operator
I want to turn features on and off without a heavyweight deployment
so that I can more adroitly test and deploy new features (A/B testing, long-running closed betas, etc.)
and so that I can recover from emergencies by cutting off problem features

As a Launchpad user
I want you to tell me about impending downtime through the web ui
so that I can plan not to be using Launchpad when it's offline/readonly.

  • This is not a mechanism for general per-user configuration
  • This is not a complete replacement for static configuration or other parts of the deployment process
  • This only affects new code that specifically uses it; it doesn't magically affect existing code
  • This may become a replacement for things that are currently done through SQL queries or configuration files
  • This is site-wide, not per-appserver.
  • This does not yet replace the readonly-mode flag (implemented as a special file on disk) because it's special.

  • This embraces "feature flags" and more, such as site-wide notifications.

Scenarios:

  • Dark launches (aka embargoes: land code first, turn it on later)
  • Closed betas
  • Scram switches (omg daily builds are killing us, make it stop)
  • Soft/slow launch (let just a few users use it and see what happens)
  • Site-wide notification

Rationale

  • We want people to land features faster, and to deploy more often. Controlling when a feature is exposed to users, separately from when its code lands, may help.
  • This could support things like site-wide notifications which would help our users by warning them when Launchpad's about to go offline.

  • Some developers are interested in A/B testing of UIs and this would help with that too.
  • Some other sites find this very useful: see http://www.scribd.com/doc/16877392/10-Deploys-Per-Day-Dev-and-Ops-Cooperation-at-Flickr

  • At the 2010-02 team leads meeting there was enthusiastic support for feature flags, but work on them has stalled.
  • Doing configuration changes through a branch, merge, landing and deploy is hugely expensive compared to changing a setting in a web ui.
  • We've had problems with configuration being inadvertently set inconsistently across different servers.
  • Provides visibility into the system.

Stakeholders

Who really cares about this feature? When did you last talk to them?

  • LOSAs
  • Launchpad devs
  • Design group?
  • Architect and product strategist
  • Curtis
  • mthaddon
  • Gary

Constraints and Requirements

Must

  • A function that can be called from a template or other code, that tells you the value of a configuration item.
  • The function must be very cheap to call so that it does not cause performance problems even if it's called several times per request.
  • Feature flags can be used to hide or disable some user interface items.
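
One way to meet the "very cheap to call" requirement is to load every flag once per request and answer later lookups from memory. The following is only a sketch under that assumption; FlagCache, load_all and fake_store are hypothetical names, not part of any existing Launchpad API.

```python
class FlagCache:
    """Loads all flags once, then answers lookups from memory."""

    def __init__(self, load_all):
        # load_all is a hypothetical callable returning {name: value};
        # in a real system it might run a single database query.
        self._load_all = load_all
        self._values = None
        self.loads = 0  # counts how often the backing store was hit

    def get(self, name, default=None):
        if self._values is None:
            self._values = dict(self._load_all())
            self.loads += 1
        return self._values.get(name, default)


def fake_store():
    # Stand-in for the real storage backend.
    return {"registry_layout_new": "True"}


cache = FlagCache(fake_store)
first = cache.get("registry_layout_new")
second = cache.get("registry_layout_new")
missing = cache.get("no_such_flag", "")
```

Creating one such cache per request would keep repeated checks in templates cheap, at the cost of a flag change only becoming visible on the next request.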

Nice to have

  • Configuration scopes:
    • "on edge"
    • "for authenticated/unauthenticated users"
    • "in readonly mode"
    • "for x% of users"
    • "for users in the beta group"
  • Configuration that can be changed while in readonly mode.
  • Configuration is validated before it is applied: e.g. if something must be an IP address, we won't let the admin commit a change that makes it invalid.
  • Log of changes that were made, when, and by whom.
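
The validation and change-log items above could be combined roughly as follows. This is a minimal sketch, not a proposed implementation: the option name mail_relay_address, the set_option function and the in-memory store and log are all assumptions made for illustration. Only the standard library ipaddress and datetime modules are used.

```python
import ipaddress
from datetime import datetime, timezone


def validate_ip(value):
    # ipaddress.ip_address raises ValueError for anything that is not
    # a valid IPv4 or IPv6 address.
    ipaddress.ip_address(value)


# Hypothetical option name, used only for this illustration.
VALIDATORS = {"mail_relay_address": validate_ip}


def set_option(store, change_log, name, value, who):
    """Validate the new value, record the change, then apply it."""
    validator = VALIDATORS.get(name)
    if validator is not None:
        validator(value)  # a ValueError here aborts the change
    # Record when, who, which option, the old value and the new value.
    change_log.append(
        (datetime.now(timezone.utc), who, name, store.get(name), value))
    store[name] = value


store, change_log = {}, []
set_option(store, change_log, "mail_relay_address", "10.0.0.1", "a_losa")
try:
    set_option(store, change_log, "mail_relay_address", "not-an-ip", "a_losa")
    rejected = False
except ValueError:
    rejected = True
```

Because validation runs before anything is written, a rejected change leaves both the store and the log untouched.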

Must not

  • Reduce test coverage by encouraging us to multiply scenarios excessively.
  • Cause entanglement by having the same feature flag checked at many points in the code.

Subfeatures

Other LaunchpadEnhancementProposals that form a part of this one.

  • Site Wide Notification (to be written)

Workflows

What are the workflows for this feature?

Change configuration

LOSA goes to https://launchpad.net/+config where they see a simple web form allowing them to edit the configuration.

Developers can go there to see but not edit the configuration.

Normal users are not allowed to see it.

Provide mockups for each workflow.

Success

How will we know when we are done?

  • You can check flags in code or templates.
  • You can change the configuration.
  • People do actually change the configuration.

How will we measure how well we have done?

  • Adoption of feature flags.
  • Developers and LOSAs report satisfaction with the facility and it becomes a standard practice.

Thoughts?

Put everything else here. Better out than in.

  • Perhaps needs a better name than "dynamic configuration" that's not confusable with static configuration.
  • As a general rule, each switch should be checked only once or only a few times in the codebase. We don't want to disable the same thing in the ui, the model, and the database.
  • Obviously it would be better not to ever have planned downtime. But...
  • Would this have helped with daily builds, or other things?
  • If we want to unify the edge and production appservers, this may help.
  • Having useful differences across edge and lpnet seems to imply having at least that level of scoping from the beginning.
  • Could get an interesting feedback loop between oops_per_second vs config changes.
  • How should these be tested? Perhaps we want a small number of tests that try flipping the flag and checking that both ways work?
  • How to edit? One big textarea? How about races?
  • Which scope matches? Explicit ordering? Most-specific? Require no overlaps?

Implementation

The API is: config(name, context) => value.

Can/should we get the context implicitly?
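
To make the question concrete, here is a sketch of config(name, context) in which the context may be passed explicitly or picked up from a thread-local set at the start of a request. Everything beyond the config(name, context) signature is an assumption: the RULES list, the set_request_context helper, the "lpnet" server value and the first-match-wins ordering are illustrative only, and the proposal leaves the matching policy open.

```python
import threading

_current = threading.local()

# Hypothetical rules: (scope, name, value). An empty scope matches any context.
RULES = [
    ({"server": "edge"}, "soyuz_build_from_branch", "True"),
    ({}, "soyuz_build_from_branch", "False"),
]


def set_request_context(context):
    # Would be called once per request, e.g. by the publisher.
    _current.context = context


def config(name, context=None):
    """Return the value of the first rule for name whose scope matches."""
    if context is None:
        # Implicit context: fall back to whatever the request set up.
        context = getattr(_current, "context", {})
    for scope, rule_name, value in RULES:
        if rule_name == name and all(
                context.get(key) == want for key, want in scope.items()):
            return value
    return None


explicit_edge = config("soyuz_build_from_branch", {"server": "edge"})
explicit_lpnet = config("soyuz_build_from_branch", {"server": "lpnet"})
set_request_context({"server": "edge"})
implicit = config("soyuz_build_from_branch")
unknown = config("no_such_flag")
```

The implicit form keeps call sites short, but it hides a dependency on request state, which may make flags harder to use from scripts and tests.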

Should this live in a Zope utility? Is "config" confusable with other names, and if so what should we call it instead?

The flags are named with the same syntax as Python identifiers. All punctuation is reserved so that we can try scope selectors like server=edge/user_group=beta/soyuz_build_from_branch=True.
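
A selector string of this shape could be parsed along the following lines. This is a sketch under the assumption that "/" separates segments and the final segment is always the flag itself; parse_rule is a hypothetical name, and a value that itself contains "/" would need some escaping scheme that is not covered here.

```python
def parse_rule(text):
    """Split 'scope=val/.../name=value' into (scopes, name, value)."""
    *scope_parts, flag_part = text.split("/")
    scopes = {}
    for part in scope_parts:
        key, sep, val = part.partition("=")
        if not sep or not key.isidentifier():
            raise ValueError("bad scope selector: %r" % part)
        scopes[key] = val
    name, sep, value = flag_part.partition("=")
    # Flag names follow the syntax of Python identifiers.
    if not sep or not name.isidentifier():
        raise ValueError("bad flag assignment: %r" % flag_part)
    return scopes, name, value


scoped = parse_rule("server=edge/user_group=beta/soyuz_build_from_branch=True")
unscoped = parse_rule("registry_layout_new=True")
```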

The value is a Unicode string.

We will add a machine-readable registry of known names, with a help string and a description of the type to be stored in them. (A little like the Mailman admin interface but much simpler.)
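
Such a registry might look something like this. It is a sketch only: FlagInfo, register and the field names are hypothetical, and the type is kept as a free-text description because the proposal does not say how types would be expressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagInfo:
    name: str
    type_description: str
    help_text: str


REGISTRY = {}


def register(name, type_description, help_text):
    # Names must look like Python identifiers; punctuation is reserved.
    if not name.isidentifier():
        raise ValueError("invalid flag name: %r" % name)
    if name in REGISTRY:
        raise ValueError("flag already registered: %r" % name)
    REGISTRY[name] = FlagInfo(name, type_description, help_text)
    return REGISTRY[name]


register("sitewide_message", "text",
         "Message shown to all users at the top of every page.")
register("registry_layout_new", "boolean",
         "Show the new registry layout.")
known_names = sorted(REGISTRY)
```

An admin page could iterate over REGISTRY to show the help string next to each editable value.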

The values are stored in a database table, with two columns: name, value. (If we add scope selectors we'll add a third column, so you can quickly pull out all the rows possibly relevant to the name.) This means you perhaps can't change it while we're in readonly mode. Later we can split it to a separate replicated database, or to some non-sql database.
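
The two-column table could be sketched as below. SQLite from the Python standard library is used here purely as a stand-in so the example is self-contained; the table name feature_flag and the helper functions are assumptions, not an agreed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_flag (name TEXT PRIMARY KEY, value TEXT NOT NULL)")


def set_flag(conn, name, value):
    # INSERT OR REPLACE is SQLite syntax; another database would need
    # its own upsert form.
    conn.execute(
        "INSERT OR REPLACE INTO feature_flag (name, value) VALUES (?, ?)",
        (name, value))


def get_flag(conn, name):
    row = conn.execute(
        "SELECT value FROM feature_flag WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


set_flag(conn, "registry_layout_new", "False")
set_flag(conn, "registry_layout_new", "True")
current = get_flag(conn, "registry_layout_new")
absent = get_flag(conn, "no_such_flag")
```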

More examples:

  • user_subset=0,10/registry_layout_new=True (give users with id%10==0 a new layout to see how they like it)

  • sitewide_message=Going down for an upgrade, should be back in 10m

  • sitewide_countdown_time=20101220T00:00 (show "in %d minutes" based on this)
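
The user_subset and sitewide_countdown_time examples could be evaluated roughly as follows. This sketch assumes that "0,10" means "remainder, modulus" (consistent with id%10==0 above) and that the timestamp has the form YYYYMMDDTHH:MM with no timezone; both function names are hypothetical.

```python
from datetime import datetime


def in_user_subset(user_id, selector):
    # "0,10" is read as: user_id % 10 == 0.
    remainder, modulus = (int(part) for part in selector.split(","))
    return user_id % modulus == remainder


def minutes_until(timestamp, now):
    # Parses e.g. "20101220T00:00"; never returns a negative number.
    target = datetime.strptime(timestamp, "%Y%m%dT%H:%M")
    return max(0, int((target - now).total_seconds() // 60))


sees_new_layout = in_user_subset(20, "0,10")
sees_old_layout = not in_user_subset(23, "0,10")
remaining = minutes_until("20101220T00:00", datetime(2010, 12, 19, 23, 40))
message = "in %d minutes" % remaining
```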

LEP/FeatureFlags (last edited 2011-02-18 15:32:17 by flacoste)