Diff for "Code/RemoveHostedArea"

Differences between revisions 5 and 14 (spanning 9 versions)
Revision 5 as of 2010-03-17 03:08:38
Size: 2801
Editor: mwhudson
Comment:
Revision 14 as of 2010-04-27 14:49:23
Size: 4415
Editor: jml
Comment:
Line 1: Line 1:
The purpose of this template is to help us get ReadyToCode on features or tricky bugs as quickly as possible. See also LaunchpadEnhancementProposalProcess.
Line 7: Line 5:
'''On Launchpad:''' ''hyperlink to a blueprint, normally'' '''On Launchpad:''' ''https://blueprints.edge.launchpad.net/launchpad-code/+spec/remove-hosted-area''
Line 20: Line 18:
''Why are we doing this now?''

No particular reason for right now. It's something we've been wanting to do since forever.

''What value does this give our users? Which users?''

It mainly wastes fewer resources, and so benefits our sysadmins.
Doing this mainly wastes fewer resources, and so benefits our sysadmins.
Line 33: Line 25:
''Who really cares about this feature? When did you last talk to them?''

Admins, I guess. I haven't talked to them for a while.
 * Canonical Admins, I guess. I haven't talked to them for a while.
Line 39: Line 29:
''What MUST the new behaviour provide?'' No functional change.
Line 41: Line 31:
''What MUST it not do?'' Use less disk space than current implementation.
Line 55: Line 46:
''Put everything else here. Better out than in.''
Line 59: Line 48:
 * I guess we don't care too much. We should have some kind of monitoring for abuse, but there shouldn't be a security risk here: the branch data is only served over http (or some ssh-backed protocol), not https, and the interesting Launchpad cookies are marked secure.
Line 60: Line 51:
 * We have to be careful about opening branches. We can probably do this by making it easy to open branches the safe way (i.e. by having IBranch.getBzrBranch DTRT).
Line 61: Line 53:
The puller should change to not actually pull any revisions. Something like the puller still needs to process hosted branches so that the scanner runs and various other fields get updated. Also the stacked-on URL may need massaging. Places that open branches for writing:

 * codehosting, sort of -- the launchpad specific stuff is all at the transport level though
 * the puller
 * createmergeproposaljob -- the bundle -> branch + merge proposal stuff
 * the translations export to branch stuff

Places that open for reading:

 * all of the above
 * codebrowse
 * translations import
 * scanner
 * other places??

Although there would be no need to have a puller for hosted branches, in the sense that there will be no revisions to pull, there are still some things that need doing: the stacked-on URL may need massaging and certainly needs recording, a scanner job needs creating, and a few fields like last_mirrored_id may need updating (if anything still cares after this work is done). ''However,'' we can probably do this immediately, in the codehosting process itself and by extending the XML-RPC call the puller currently makes to trigger a puller run.
Line 64: Line 71:

In general, the changes will make hosted branches less like mirrored and imported branches, but I think the current similarities are a bit artificial.
Line 70: Line 79:

The branch-distro.py script would get much simpler and likely faster, and it would no longer clog up the puller for 6 hours after it runs.

reclaim_branch_space will be simpler too.

Lots of places that set up branches for integration testing should become easier to understand.

It's hard to come up with a way of doing this work that can be landed in small-ish branches. I think a sane implementation plan is to have a pipeline that works component by component, probably in this order: modify codehosting, modify puller, fix fallout.

Requirements from talking with the strategist:

1) talk to IS, starting with spm
2) testing plan
3) no regression on error display
4) make sure MP code doesn't fall over on broken branches

Remove the hosted area in codehosting

Remove the distinction between the hosted and mirrored area

On Launchpad: https://blueprints.edge.launchpad.net/launchpad-code/+spec/remove-hosted-area

As a developer using Launchpad's code hosting
I want changes I make to branches over bzr+ssh to be available over HTTP as soon as I make them
so that interacting with Launchpad involves less waiting

As a LOSA
I want codehosting to not use twice as much disk space as it needs to
so that I don't have to buy and install more disks so often

Rationale

Doing this mainly wastes fewer resources, and so benefits our sysadmins.

Although this is mostly an architectural change, it should make Launchpad simpler to use by removing an obscure concept users need to understand.

Stakeholders

  • Canonical Admins, I guess. I haven't talked to them for a while.

Constraints

No functional change.

Use less disk space than the current implementation.

Success

How will we know when we are done?

When we can delete the /srv/bazaar.launchpad.net/push-branches directory on crowberry.

How will we measure how well we have done?

This is a pretty binary thing :-)

Thoughts?

How do we prevent abuse of Launchpad as a file hosting service?

  • I guess we don't care too much. We should have some kind of monitoring for abuse, but there shouldn't be a security risk here: the branch data is only served over http (or some ssh-backed protocol), not https, and the interesting Launchpad cookies are marked secure.

What do we do if someone uploads a branch reference to, or a branch stacked on, somewhere cheeky?

  • We have to be careful about opening branches. We can probably do this by making it easy to open branches the safe way (i.e. by having IBranch.getBzrBranch DTRT).
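
As a rough illustration of what "opening branches the safe way" could mean, here is a minimal sketch that checks every hop (the branch itself, any branch reference, any stacked-on location) against a policy before following it. Everything in it is an assumption made for illustration: the host codehost.example, the names resolve_safely and open_branch_safely, the hop limit, and the lookup callable that stands in for reading branch metadata. It is not how IBranch.getBzrBranch is implemented.

```python
from urllib.parse import urljoin, urlsplit

# Assumed policy: only follow links that stay on one scheme and host.
# Both values are placeholders, not real Launchpad configuration.
SAFE_SCHEME = "http"
SAFE_HOST = "codehost.example"
MAX_HOPS = 5  # arbitrary limit to stop reference/stacking loops


class UnsafeBranchURL(Exception):
    """Raised when a branch, reference or stacked-on URL is not allowed."""


def resolve_safely(url, base):
    """Resolve `url` relative to `base` and reject anything off-policy."""
    resolved = urljoin(base, url)
    parts = urlsplit(resolved)
    if parts.scheme != SAFE_SCHEME or parts.hostname != SAFE_HOST:
        raise UnsafeBranchURL(resolved)
    return resolved


def open_branch_safely(url, lookup):
    """Follow references and stacked-on links from `url`, checking each hop.

    `lookup(url)` is a hypothetical stand-in for reading branch metadata.
    It returns a dict with optional 'reference' and 'stacked_on' keys.
    Returns the list of URLs that were judged safe to open, in order.
    """
    seen = []
    current = resolve_safely(url, "%s://%s/" % (SAFE_SCHEME, SAFE_HOST))
    while current is not None:
        if current in seen or len(seen) >= MAX_HOPS:
            raise UnsafeBranchURL("loop or too many hops at %s" % current)
        seen.append(current)
        info = lookup(current)
        next_url = info.get("reference") or info.get("stacked_on")
        current = resolve_safely(next_url, current) if next_url else None
    return seen
```

The point of routing every caller through one function like this is that the places listed below would not each need their own checks.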

Places that open branches for writing:

  • codehosting, sort of -- the launchpad specific stuff is all at the transport level though
  • the puller
  • createmergeproposaljob -- the bundle -> branch + merge proposal stuff
  • the translations export to branch stuff

Places that open for reading:

  • all of the above
  • codebrowse
  • translations import
  • scanner
  • other places??

Although there would be no need to have a puller for hosted branches, in the sense that there will be no revisions to pull, there are still some things that need doing: the stacked-on URL may need massaging and certainly needs recording, a scanner job needs creating, and a few fields like last_mirrored_id may need updating (if anything still cares after this work is done). However, we can probably do this immediately, in the codehosting process itself and by extending the XML-RPC call the puller currently makes to trigger a puller run.
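
To make the bookkeeping concrete, here is a minimal sketch of what such an extended call might do once a push finishes. The function name branch_changed, its parameters, and the in-memory FakeDatabase are all hypothetical; only the three tasks themselves (record the stacked-on URL, update last_mirrored_id, create a scanner job) come from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BranchRecord:
    """Stand-in for the database row of a hosted branch."""
    unique_name: str
    stacked_on_url: Optional[str] = None
    last_mirrored_id: Optional[str] = None


@dataclass
class FakeDatabase:
    """In-memory stand-in for the real database, for illustration only."""
    branches: Dict[str, BranchRecord] = field(default_factory=dict)
    scan_jobs: List[str] = field(default_factory=list)


def branch_changed(db, unique_name, stacked_on_url, last_revision_id):
    """Hypothetical handler run when codehosting sees a push complete.

    Does the work a puller run used to do for hosted branches, minus
    copying any revisions.
    """
    branch = db.branches[unique_name]
    # Record the stacked-on URL (an empty string means "not stacked").
    branch.stacked_on_url = stacked_on_url or None
    # Keep the field up to date in case anything still reads it.
    branch.last_mirrored_id = last_revision_id
    # Queue the branch for scanning, once.
    if unique_name not in db.scan_jobs:
        db.scan_jobs.append(unique_name)
    return branch
```

Because all three steps happen in one call made at push time, nothing has to wait for a separate puller run before the scanner sees the new revisions.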

The above-mentioned stacked-on URL massaging means that, for the first time, we'll be directly editing data the user has uploaded. I don't think this is a big issue though.

In general, the changes will make hosted branches less like mirrored and imported branches, but I think the current similarities are a bit artificial.

The fact that the mirrored area contains a copy of the branch has occasionally allowed us to easily recover from the version in the hosted area getting trashed somehow. With only one copy, we would give up that safety net.

The codehosting vfs won't actually change that much, although perhaps it makes sense to rename hosted_transport and mirror_transport to ro_transport and rw_transport or something.

Various parts of Launchpad that wait until the branch has been pulled before doing things should be changed to not do that.

The branch-distro.py script would get much simpler and likely faster, and it would no longer clog up the puller for 6 hours after it runs.

reclaim_branch_space will be simpler too.

Lots of places that set up branches for integration testing should become easier to understand.

It's hard to come up with a way of doing this work that can be landed in small-ish branches. I think a sane implementation plan is to have a pipeline that works component by component, probably in this order: modify codehosting, modify puller, fix fallout.

Requirements from talking with the strategist:

1) talk to IS, starting with spm
2) testing plan
3) no regression on error display
4) make sure MP code doesn't fall over on broken branches

Code/RemoveHostedArea (last edited 2010-04-27 14:49:23 by jml)