Code/RemoveHostedArea

Remove the hosted area in codehosting

Remove the distinction between the hosted and mirrored area

On Launchpad: https://blueprints.edge.launchpad.net/launchpad-code/+spec/remove-hosted-area

As a developer using Launchpad's code hosting
I want changes I make to branches over bzr+ssh to be available over HTTP as soon as I make them
so that interacting with Launchpad involves less waiting

As a LOSA
I want codehosting to not use twice as much disk space as it needs to
so that I don't have to buy and install more disks so often

Rationale

Doing this mainly wastes fewer resources (chiefly disk space), and so benefits our sysadmins.

Although this is mostly an architectural change, it should make Launchpad simpler to use by removing an obscure concept users need to understand.

Stakeholders

Constraints

No functional change.

Use less disk space than the current implementation.

Success

How will we know when we are done?

When we can delete the /srv/bazaar.launchpad.net/push-branches directory on crowberry.

How will we measure how well we have done?

This is a pretty binary thing :-)

Thoughts?

How do we prevent abuse of Launchpad as a file hosting service?

What do we do if someone uploads a branch reference to, or a branch stacked on, somewhere cheeky?

Places that open branches for writing:

Places that open for reading:

Although hosted branches would no longer need a puller, in the sense that there will be no revisions to pull, some things still need doing: the stacked-on URL may need massaging and certainly needs recording, a scanner job needs creating, and a few fields like last_mirrored_id may need updating (if anything still cares after this work is done). However, we can probably do all of this immediately, in the codehosting process itself and by extending the XML-RPC call the puller currently makes to trigger a puller run.
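To make the bookkeeping above concrete, here is a minimal sketch of what an extended "branch changed" call might record in one step. Everything in it is an assumption for illustration: BranchRecord, FakeBranchBackend and branch_changed are hypothetical names, and an in-memory object stands in for the real XML-RPC endpoint and database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BranchRecord:
    """Hypothetical stand-in for the database row describing a branch."""
    unique_name: str
    stacked_on_url: Optional[str] = None
    last_mirrored_id: Optional[str] = None


@dataclass
class FakeBranchBackend:
    """Hypothetical in-memory stand-in for the XML-RPC endpoint."""
    branches: Dict[str, BranchRecord] = field(default_factory=dict)
    scan_jobs: List[str] = field(default_factory=list)

    def branch_changed(self, unique_name, stacked_on_url, last_revision_id):
        """Do, in one call, what a puller run would otherwise have done."""
        branch = self.branches[unique_name]
        # Record the (already massaged) stacked-on URL.
        branch.stacked_on_url = stacked_on_url
        # Update fields that other code may still read.
        branch.last_mirrored_id = last_revision_id
        # Queue a scanner job for the branch.
        self.scan_jobs.append(unique_name)
        return branch
```

In this sketch, codehosting would make the call once after a write finishes, for example `backend.branch_changed("~user/project/trunk", None, "rev-id-1")`, rather than asking for a puller run.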

The above-mentioned stacked-on URL massaging means we'll be directly editing the data the user has uploaded for the first time. I don't think this is a big issue though.
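As a rough illustration of what the massaging might look like, the sketch below rewrites an absolute stacked-on URL that points at this Launchpad instance into a host-relative path, and returns None for anything pointing elsewhere. The function name, the host set and the None-means-reject convention are all assumptions made for the example, not the existing puller behaviour. Relative locations such as "../trunk" are not handled here and are also returned as None.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: the hosts that count as "this Launchpad instance".
LAUNCHPAD_HOSTS = {"bazaar.launchpad.net"}


def massage_stacked_on_url(url: str) -> Optional[str]:
    """Return a host-relative stacked-on path, or None if not acceptable.

    Hypothetical helper: not part of Launchpad or bzrlib.
    """
    parts = urlsplit(url)
    if not parts.scheme and not parts.netloc:
        # No scheme or host: accept only paths that are already host-relative.
        return url if url.startswith("/") else None
    if parts.hostname in LAUNCHPAD_HOSTS:
        # Same instance, any scheme: keep just the path.
        return parts.path or None
    # Stacked on somewhere else entirely.
    return None
```

With these assumptions, `massage_stacked_on_url("bzr+ssh://bazaar.launchpad.net/~user/project/trunk")` gives `/~user/project/trunk`, while a URL on an unrelated host gives None, leaving the caller to decide how to report it.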

In general, the changes will make hosted branches less like mirrored and imported branches, but I think the current similarities are a bit artificial.

The fact that the mirrored area contains a copy of the branch has occasionally allowed us to easily recover from the version in the hosted area getting trashed somehow.

The codehosting vfs won't actually change that much, although perhaps it makes sense to rename hosted_transport and mirror_transport to ro_transport and rw_transport or something.

Various parts of Launchpad that wait until the branch has been pulled before doing things should be changed to not do that.

The branch-distro.py script would get much simpler and likely faster, and it wouldn't clog up the puller for 6 hours after it runs.

reclaim_branch_space will be simpler too.

Lots of places that set up branches for integration testing should become easier to understand.

It's hard to come up with a way of doing this work that can be landed in small-ish branches. I think a sane implementation plan is to have a pipeline that works component by component, probably in this order: modify codehosting, modify puller, fix fallout.

Requirements from talking with the strategist:

1) talk to IS, starting with spm
2) testing plan
3) no regression on error display
4) make sure MP code doesn't fall over on broken branches

Code/RemoveHostedArea (last edited 2010-04-27 14:49:23 by jml)