Diff for "LEP/FeatureFlags"


Differences between revisions 71 and 72
Revision 71 as of 2011-02-15 07:16:07
Size: 13716
Editor: mbp
Comment:
Revision 72 as of 2011-02-18 15:32:17
Size: 13700
Editor: flacoste
Comment: Updated link to FeatureFlags

Feature Flags

Launchpad Enhancement Proposal

/!\ NOTE: This LEP is now implemented, and stored here for reference. It is not guaranteed to be up to date. See FeatureFlags for information about how to use feature flags.

(previously called Dynamic Configuration)

Goal state: Launchpad has a registry of configuration options that admins can change through the web UI without restarting Launchpad.

As a Launchpad developer/operator
I want to turn features on and off without a heavyweight deployment
so that I can more adroitly test and deploy new features (A/B testing, long-running closed betas, etc.)
and so that I can recover from emergencies by cutting off problem features

As a Launchpad user
I want you to tell me about impending downtime through the web UI
so that I can plan not to be using Launchpad when it's offline/readonly.

  • This is not a mechanism for general per-user configuration
  • This is not a complete replacement for static configuration or other parts of the deployment process
  • This only affects new code that specifically uses it; it doesn't magically affect existing code
  • This may become a replacement for things that are currently done through SQL queries or configuration files
  • This is site-wide, not per-appserver.
  • This does not yet replace the readonly-mode flag (implemented as a special file on disk), because that flag is a special case.

  • This embraces "feature flags" and more, such as site-wide notifications.

Implementation status

See https://bugs.edge.launchpad.net/launchpad-foundations/+bugs?field.tag=feature-flags

  • Database changes landed
  • Flags exist and can be checked in code or in TAL
  • Changing the rules requires an SQL statement run by a sysadmin: insert into featureflag (scope, priority, flag, value) values ('pageid:BugTask:+index', 0, 'memcache', 'disabled');

  • Actually used by memcached, etc
  • Relevant flags and scopes shown in a comment

Open issues / future work

  • Web UI to show and edit them: http://pad.lv/616631 (in progress)

  • Need better infrastructure for writing tests that depend on features, e.g. a "with feature_flags(...): ..." block: http://pad.lv/645768

  • Need a "default" scope that's always on http://pad.lv/650903

  • Need clearer namespace-based way of defining scope selectors
  • API naming is a bit inconsistent and should be cleaned up
  • Flag naming and value interpretation also inconsistent and should be documented/cleaned up
  • Show how to use them in cron jobs or other non-webapp code
  • Use flags for site-wide notifications
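A test helper along the lines of the "with feature_flags(...)" idea above could be a context manager that overrides flags only for the duration of a block. This is a minimal sketch: it assumes lookups go through a module-level override dictionary, and both `_overrides` and this stand-in `getFeatureFlag` are hypothetical, not the real lp.services.features code. Flag names contain dots, so the sketch takes a dict rather than keyword arguments.

```python
from contextlib import contextmanager

# Hypothetical override store; a real implementation would consult the
# feature rules for the current request instead.
_overrides = {}


def getFeatureFlag(name):
    """Stand-in lookup: return the overridden value, or None if unset."""
    return _overrides.get(name)


@contextmanager
def feature_flags(flags):
    """Set the given {name: value} flags only inside the `with` block."""
    saved = dict(_overrides)
    _overrides.update(flags)
    try:
        yield
    finally:
        # Restore the previous state even if the block raises.
        _overrides.clear()
        _overrides.update(saved)


with feature_flags({'code.branchmergequeue': 'on'}):
    assert getFeatureFlag('code.branchmergequeue') == 'on'
assert getFeatureFlag('code.branchmergequeue') is None
```

Saving and restoring the whole dictionary lets these blocks nest, with an inner block temporarily shadowing an outer one.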

Scenarios

  • Dark launches (aka embargoes: land code first, turn it on later)
  • Closed betas
  • Scram switches (omg daily builds are killing us, make it stop)
  • Soft/slow launch (let just a few users use it and see what happens)
  • Site-wide notification
  • Show an 'alpha', 'beta' or 'new!' badge next to a UI control, then later turn it off without a new rollout
  • Control page timeouts (or other resource limits) either per page id, or per user group
  • Set resource limits (e.g. an address space cap) for jobs.

Rationale

  • We want people to land features faster, and to deploy more often. Having control over when features are generally exposed separately from landing the code may help. See MergeWorkflowDraft.

  • This could support things like site-wide notifications which would help our users by warning them when Launchpad's about to go offline.

  • Some developers are interested in A/B testing of UIs, and this would help with that too.
  • Some other sites find this very useful: see http://www.scribd.com/doc/16877392/10-Deploys-Per-Day-Dev-and-Ops-Cooperation-at-Flickr

  • At the 2010-02 team leads meeting there was enthusiastic support for feature flags, but the work has since stalled.
  • Doing configuration changes through a branch, merge, landing and deploy is hugely expensive, compared to changing a web ui.
  • We've had problems with configuration being inadvertently set inconsistently across different servers.
  • Flags make the system's current configuration visible.

Stakeholders

Who really cares about this feature? When did you last talk to them?

  • LOSAs
  • Launchpad devs
  • Design group?
  • Architect and product strategist
  • Curtis
  • mthaddon
  • Gary

Constraints and Requirements

Must

  • A function that can be called from a template or other code, that tells you the value of a configuration item.
  • The function must be very cheap to call so that it does not cause performance problems even if it's called several times per request. (It should do at most one database query, of reasonable size, per request that cares about configuration.)
  • Feature flags can be used to hide or disable some user interface items.
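One way to meet the "at most one query per request" requirement is to load every applicable flag lazily on first use and answer later lookups from memory. A minimal sketch, assuming a hypothetical `load_all_flags` callable that stands in for the single database query and returns {name: value} for the request's active scopes:

```python
from typing import Callable, Dict, Optional


class RequestFlagCache:
    """Per-request flag lookups backed by one lazy bulk load (hypothetical)."""

    def __init__(self, load_all_flags: Callable[[], Dict[str, str]]):
        self._load_all_flags = load_all_flags
        self._values: Optional[Dict[str, str]] = None

    def get(self, name: str) -> Optional[str]:
        # Only the first lookup runs the query; a request that never
        # checks a flag never touches the database.
        if self._values is None:
            self._values = dict(self._load_all_flags())
        return self._values.get(name)


calls = []


def fake_query():
    calls.append(1)
    return {'code.branchmergequeue': 'on'}


cache = RequestFlagCache(fake_query)
cache.get('code.branchmergequeue')
cache.get('code.incremental_diffs.enabled')
assert len(calls) == 1  # two lookups, one query
```

Creating a fresh cache per request means a rule change is picked up on the next request without any invalidation logic.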

Nice to have

  • Configuration scopes:
    • "on edge"
    • "for authenticated/unauthenticated users"
    • "in readonly mode"
    • "for x% of users"
    • "for users in the beta group"
    • "before/after date D"
  • Configuration that can be changed while in readonly mode.
  • Configuration is validated before it is applied: e.g. if a value must be an IP address, we won't let the admin commit a change that makes it invalid.
  • Log of changes that were made, when, and by whom.
  • A machine-readable registry of known names, with a help string and a description of the type to be stored in them. (A little like the Mailman admin interface but much simpler.)
  • Hardcoded access controls on the configuration rules used on production systems: they might mention the existence of for example commercial user teams that should be private. Therefore, only ~admins and ~launchpad-dev can see the current flag rules, and only ~admins can edit them. Anyone who sees a page can see, in an html comment, which scopes and flags were used in rendering the page.
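The registry and validation items could be combined: each known flag carries a help string and a validator that rejects bad values before a rule is saved. A minimal sketch; the `FlagInfo` type, the validators and the `example.proxy.address` flag are hypothetical, and only code.branchmergequeue is a flag named on this page.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FlagInfo:
    name: str
    help: str
    validate: Callable[[str], None]  # raises ValueError on a bad value


def valid_on_off(value: str) -> None:
    if value not in ('on', 'off'):
        raise ValueError(f"expected 'on' or 'off', got {value!r}")


def valid_ip(value: str) -> None:
    ipaddress.ip_address(value)  # raises ValueError if not an IP address


REGISTRY: Dict[str, FlagInfo] = {
    info.name: info
    for info in [
        FlagInfo('code.branchmergequeue',
                 'Turn on the use of branch merge queues.', valid_on_off),
        FlagInfo('example.proxy.address',
                 'Hypothetical flag holding an IP address.', valid_ip),
    ]
}


def check_rule(flag: str, value: str) -> None:
    """Raise ValueError unless `flag` is registered and `value` is valid."""
    info = REGISTRY.get(flag)
    if info is None:
        raise ValueError(f'unknown flag {flag!r}')
    info.validate(value)
```

An admin form would call check_rule on every edited rule and refuse to commit if any call raises, for example for check_rule('example.proxy.address', '10.0.0.300').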

Must not

  • Reduce test coverage by having code paths that are only hit when certain variables are set, with no tests that set those variables. (Using bzr-style scenario multiplication may help.)
  • Cause entanglement by having the same feature flag checked at many points in the code.
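bzr-style scenario multiplication generates a separate test per scenario. A lighter stand-in from the standard library is unittest's subTest, which runs the same assertions once per flag value so that neither code path goes untested. In this sketch `render_badge` is a hypothetical function under test, loosely modelled on the soyuz.build_from_branch.badge example on this page.

```python
import unittest


def render_badge(flag_value):
    """Hypothetical code under test: show a badge only if the flag is set."""
    return flag_value if flag_value else ''


class RenderBadgeTest(unittest.TestCase):
    # (flag value, expected output): one scenario per code path.
    scenarios = [('beta', 'beta'), (None, '')]

    def test_both_paths(self):
        for flag_value, expected in self.scenarios:
            with self.subTest(flag_value=flag_value):
                self.assertEqual(render_badge(flag_value), expected)
```

Run it with "python -m unittest"; a failure is reported separately for each flag value.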

Subfeatures

Other LaunchpadEnhancementProposals that form a part of this one.

  • Site Wide Notification (to be written)

Workflows

What are the workflows for this feature?

Change configuration

A LOSA goes to https://launchpad.net/+feature-rules where they see a simple web form allowing them to edit the configuration.

Anyone else can see the configuration but cannot change it. (Perhaps we should hide it from people other than developers, but since they can see the source this may not matter...)

Provide mockups for each workflow.

Success

Bugs are at: feature-flags

How will we know when we are done?

  • You can check flags in code or templates.
  • You can change the configuration.
  • People do actually change the configuration.

How will we measure how well we have done?

  • Adoption of feature flags.
  • Developers and LOSAs report satisfaction with the facility and it becomes a standard practice.

Thoughts?

Put everything else here. Better out than in.

  • As a general rule, each switch should be checked only once, or only a few times, in the codebase. We don't want to disable the same thing in the UI, the model, and the database.
  • Obviously it would be better not to ever have planned downtime. But...
  • Would this have helped with daily builds, or other things?
  • If we want to unify the edge and production appservers, this may help.
  • Having useful differences between edge and lpnet seems to imply having at least that level of scoping from the beginning.
  • Could get an interesting feedback loop between oops_per_second vs config changes.
  • How should these be tested? Perhaps we want a small number of tests that try flipping the flag and checking both ways works?
  • How to edit? One big textarea? How about races?
  • Which scope matches? Explicit ordering? Most-specific? Require no overlaps?
  • Perhaps you'll accumulate an ever-increasing inventory of configuration options that are never used, and that will break if they are used. Perhaps a switch that has not been changed in the last year should be considered for removal.
  • Arguably we should couple together "this feature is only for beta users" with "this feature has a beta badge next to it", but perhaps it's simpler at this level and more flexible to just have separate flags for the two of them.
  • Should document a naming convention that explains what feature or thing this flag affects, and what kind of effect it has.
  • We need to create a culture in which people actually add and use flags; as part of our incident analysis we should consider whether adding a flag might have helped.

Implementation

Developer APIs to control things via flags

  • Python code: features.getFeatureFlag(name) => value

  • TAL code: <div tal:condition="features/name">hello world!</div>

The authoritative reference is the features package in the source tree: see FeatureController api docs and linked pages.

Internal details

Any particular request can be in several scopes, perhaps set(global, edge_server, beta_user, override). These can be inferred from the URL, the server static configuration, the user's group membership, perhaps other things.

The value for any flag is the highest-priority setting for any relevant scope. If no setting is found, the value is None.
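The resolution rule can be written as a small pure function. This sketch assumes rules are (scope, priority, flag, value) tuples, matching the columns of the featureflag insert shown under Implementation status; the priorities in the example rules are made up.

```python
from typing import Iterable, Optional, Set, Tuple

Rule = Tuple[str, int, str, str]  # (scope, priority, flag, value)


def resolve_flag(rules: Iterable[Rule], active_scopes: Set[str],
                 flag: str) -> Optional[str]:
    """Value of the highest-priority rule for `flag` in an active scope."""
    matching = [(priority, value)
                for scope, priority, name, value in rules
                if name == flag and scope in active_scopes]
    if not matching:
        return None  # no applicable rule: the default is None
    return max(matching, key=lambda pv: pv[0])[1]


rules = [
    ('default', 0, 'memcache', 'enabled'),
    ('pageid:BugTask:+index', 1, 'memcache', 'disabled'),
]
resolve_flag(rules, {'default'}, 'memcache')  # 'enabled'
resolve_flag(rules, {'default', 'pageid:BugTask:+index'}, 'memcache')  # 'disabled'
```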

If the scope set is not passed to getFlag, the web server computes it from the request object. In other places, such as jobs or the code host, we need to pass in some other object with similar information. New scopes can be added by writing a Python function that says whether a particular scope is active.
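Adding scopes as Python predicates might look like the sketch below. `RequestInfo`, the "edge." host-name test and the `launchpad-beta-testers` team name are assumptions for illustration; a cron job or the code host would build a `RequestInfo` from its own configuration instead of from a web request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set


@dataclass(frozen=True)
class RequestInfo:
    """Hypothetical summary of what scope predicates may inspect."""
    host: str
    teams: FrozenSet[str] = frozenset()


# Each scope is just a function saying whether it applies right now.
SCOPES: Dict[str, Callable[[RequestInfo], bool]] = {
    'default': lambda info: True,
    'edge_server': lambda info: info.host.startswith('edge.'),
    'beta_user': lambda info: 'launchpad-beta-testers' in info.teams,
}


def active_scopes(info: RequestInfo) -> Set[str]:
    scopes = {name for name, applies in SCOPES.items() if applies(info)}
    # Combined scope: set only when both of its parts are active.
    if {'edge_server', 'beta_user'} <= scopes:
        scopes.add('edge_server_beta_user')
    return scopes
```

Because scopes live only in code, adding one is a matter of adding an entry to the dictionary; nothing in the database has to change.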

A configuration variable can be defined at most once per configuration scope. Each setting has a priority, so we can choose a single definition when several match. Priority must be unique across all flags (useful to know when crafting new rules).
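Both constraints in this paragraph can be checked before a set of rules is saved. A minimal sketch over (scope, priority, flag, value) tuples; `check_rules` is a hypothetical helper, not part of the Launchpad code.

```python
from typing import Iterable, Tuple

Rule = Tuple[str, int, str, str]  # (scope, priority, flag, value)


def check_rules(rules: Iterable[Rule]) -> None:
    """Raise ValueError if a flag repeats in a scope or a priority repeats."""
    seen_pairs = set()
    seen_priorities = set()
    for scope, priority, flag, _value in rules:
        if (scope, flag) in seen_pairs:
            raise ValueError(f'{flag!r} is defined twice in scope {scope!r}')
        if priority in seen_priorities:
            raise ValueError(f'priority {priority} is used more than once')
        seen_pairs.add((scope, flag))
        seen_priorities.add(priority)
```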

For any particular scope set it is a single SQL query to get the full environment of settings, something like:

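  -- PostgreSQL: DISTINCT ON keeps the first row per flag name,
  -- which the ORDER BY makes the highest-priority one.
  select distinct on (flag.name) flag.name, flag.value
  from flag
  where flag.scope_id in ${active_scopes}
  order by flag.name, flag.priority desc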

or, to get just one flag:

  select flag.value
  from flag
  where flag.scope_id in ${active_scopes}
    and flag.name = ${flag}
  order by flag.priority desc
  limit 1
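The single-query lookup can be tried out with Python's built-in sqlite3 module. SQLite has no DISTINCT ON, so this sketch picks the highest-priority row per flag with a correlated subquery instead; it uses the featureflag table and columns from the insert statement under Implementation status, and the rows are made-up examples.

```python
import sqlite3
from typing import Dict, Iterable


def load_flags(conn: sqlite3.Connection,
               active_scopes: Iterable[str]) -> Dict[str, str]:
    """Return {flag: value} using one query over the active scopes."""
    scopes = sorted(active_scopes)
    if not scopes:
        return {}
    # Only '?' placeholders are interpolated; scope names stay parameters.
    marks = ', '.join('?' for _ in scopes)
    sql = f"""
        SELECT f.flag, f.value FROM featureflag AS f
        WHERE f.scope IN ({marks})
          AND f.priority = (
              SELECT MAX(g.priority) FROM featureflag AS g
              WHERE g.flag = f.flag AND g.scope IN ({marks}))
    """
    return dict(conn.execute(sql, scopes + scopes).fetchall())


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE featureflag '
             '(scope TEXT, priority INTEGER, flag TEXT, value TEXT)')
conn.executemany('INSERT INTO featureflag VALUES (?, ?, ?, ?)', [
    ('default', 0, 'memcache', 'enabled'),
    ('pageid:BugTask:+index', 1, 'memcache', 'disabled'),
    ('edge_server', 2, 'soyuz.build_from_branch.run_jobs', 'True'),
])
```

The subquery relies on priorities being unique, as required above; otherwise two rows for the same flag could tie.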

The scopes are not stored in the database: they're just defined by whatever the Python code looks up. (This avoids needing to keep the celebrities in sync with the code, though we may have to expose/document the available scopes in some other way.)

Flag names are dotted Python identifiers of the form APP.FEATURE.EFFECT. Values are Unicode strings.
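A convention check for names might split on dots and use str.isidentifier. This is a sketch of the APP.FEATURE.EFFECT convention only: applied strictly, it would reject some flags listed at the end of this page, such as code.branchmergequeue (two parts) and soyuz.derived-series-ui.enabled (hyphens).

```python
def is_conventional_flag_name(name: str) -> bool:
    """True if `name` is exactly three dotted Python identifiers."""
    parts = name.split('.')
    return len(parts) == 3 and all(part.isidentifier() for part in parts)
```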

We will define the following scopes:

 override  -- can be used to mask out anything

 edge_server_beta_user -- set only when both are true

 production_server -- one of these is chosen based on the url or static config
 edge_server
 staging_server
 dev_server

 beta_user -- set based on group membership; we can add more specific beta groups later

 default -- lowest priority and always set; used when we want a non-None default

Examples:

scope | name | value | explanation
edge_server_beta_user | soyuz.build_from_branch.ui_visible | True
default | soyuz.build_from_branch.badge | beta | show "beta" icon next to the UI
edge_server | soyuz.build_from_branch.run_jobs | True
production_server | notification.global.message | Going down for an upgrade, should be back in 10m
production_server | notification.global.countdown_time | 20101220T00:00 | show "in %d minutes" based on this

Once the soyuz.build_from_branch.ui_visible feature is stable, we would set it to True in the default scope. Perhaps later we would make it unconditionally enabled.

Application code will normally just need to do something like this:

  from lp.services.features import getFeatureFlag

  ...

  if getFeatureFlag('thing.enabled'):
      pass  # code that should only run while the flag is set

  print(getFeatureFlag('other.thing'))  # a string value, or None if unset
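Because flag values are strings (or None when no rule applies), a bare truth test such as "if getFeatureFlag('thing.enabled'):" treats any non-empty value as enabled, including 'off' or 'False'. One way to make interpretation explicit is a small helper; `flag_is_on` and its set of accepted spellings are hypothetical, chosen to cover values used on this page such as 'on' and 'True'.

```python
from typing import Optional

# Hypothetical convention: these spellings (case-insensitive) mean "on".
_ON_VALUES = frozenset({'on', 'true', 'enabled', '1'})


def flag_is_on(value: Optional[str]) -> bool:
    """Interpret a flag value explicitly instead of by string truthiness."""
    if value is None:
        return False
    return value.strip().lower() in _ON_VALUES


assert bool('off')            # a bare truth test would call this enabled
assert not flag_is_on('off')  # the helper does not
```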

See also

Scopes and Flags that can be used

Obsoleted by https://launchpad.net/+feature-info.

Flags

flag | values | controls
code.branchmergequeue | 'on' | Will turn on the use of branch merge queues.
soyuz.derived-series-ui.enabled | 'on' | Will enable the current development derived series UI.
code.incremental_diffs.enabled | boolean | Enables the display of incremental diffs in merge proposals.

LEP/FeatureFlags (last edited 2011-02-18 15:32:17 by flacoste)