Diff for "Collections"


Differences between revisions 1 and 10 (spanning 9 versions)
Revision 1 as of 2010-08-01 20:39:28
Size: 3855
Editor: james-w
Comment:
Revision 10 as of 2019-10-04 11:27:35
Size: 8140
Editor: cjwatson
Comment: update for git
Line 3: Line 3:
Collections are a way to manipulating a group of objects as one. Collections are a way of selecting a group of object based on some criteria, and then
either just getting the objects, or possibly manipulating the set
as one.
Line 6: Line 7:
of objects according to some criteria, and secondly manipulating methods, which may
act on the current set.
of objects according to some criteria, and secondly manipulation methods, which manipulate
each of the objects in the current set.
Line 11: Line 12:
  * '''BranchCollection''' - '''lp.code.{model,interfaces}.branchcollection'''
  * '''TranslationTemplatesCollection''' - '''lp.translations.{model,interfaces}.potemplate'''
  * '''ArchiveCollection''' - '''lp.soyuz.{model,interfaces}.archivecollection'''
  * '''`BranchCollection`''' - '''lp.code.{model,interfaces}.branchcollection'''
  * '''`TranslationTemplatesCollection`''' - '''lp.translations.{model,interfaces}.potemplate'''
  * '''`ArchiveCollection`''' - '''lp.soyuz.{model,interfaces}.archivecollection'''
Line 27: Line 28:
In '''lp.services.database.collection''' there is a base class you can use for creating In [[https://git.launchpad.net/launchpad/tree/lib/lp/services/database/collection.py|lp.services.database.collection]] there is a base class you can use for creating
Line 38: Line 39:
    """A collection of `Foo`."""
Line 51: Line 52:
with appropriate tests (see '''lp.soyuz.tests.test_archivecollection''' for some inspiration). with appropriate tests (see `lp.soyuz.tests.test_archivecollection` for some inspiration).
Line 61: Line 62:
Line 63: Line 63:
    """Collection of `Foo`."""
    def select(*args):
        """See `Collection`."""
Line 65: Line 68:
with the methods you want on the interface. Ensure that one of the methods you
put on the interface is
with the methods you want on the interface.
Line 68: Line 70:
{{{
    def select(*args):
}}}
The `select` method, or something like it, has to be there, since it's how you retrieve a Storm `ResultSet` with the objects and/or columns you want from the collection. Instead of a select method, you might wish to have multiple methods to get different kinds of objects. For example, `IBranchCollection` has `getBranches()` and `getMergeProposals()`, where the latter returns all merge proposals associated with the collection of branches.
Line 72: Line 72:
As this is what will be used to get the ResultSet.

Once you have that you can add an Interface for getting a utility to get all Foos
Once you have that, add a marker interface for getting a utility to get all `Foo`s
Line 85: Line 83:
Next comes the zcml: Next comes the ZCML:
Line 111: Line 109:
The arguments to select are the same as the first argument to Store.find(). The arguments to select are the same as the first argument to `Store.find()`.
Line 118: Line 116:
For instance you could add adapters such that For instance you could add adapters such that:
Line 121: Line 119:
IFooColllection(product) IFooCollection(product)
Line 124: Line 122:
returned you a FooCollection for all the Foos associated with that product. returned you a `FooCollection` for all the Foos associated with that product.
Line 126: Line 124:
Please fill in the details of how to do this here if you do it. To do so, define a function that takes the original object and returns an `IFooCollection`, e.g.:
{{{
def product_to_foo_collection(product):
    return getUtility(IAllFoos).inProduct(product)
}}}

And then add something like this to the relevant ZCML:
{{{
  <adapter
      for="lp.registry.interfaces.product.IProduct"
      provides="lp.app.interfaces.foocollection.IFooCollection"
      factory="lp.app.adapters.branchcollection.product_to_foo_collection"/>
}}}
Line 130: Line 140:
Please fill in the details of how to join tables here if you work it out. Two `Collection` methods help you join other tables into a collection: `joinInner` which creates a run-of-the-mill inner join and `joinOuter` which adds in the new table using an outer (or "left") join. They both work like:
{{{
    joined_collection = base_collection.joinInner(Person, Person.id == Foo.owner)
    joined_collection = base_collection.joinOuter(Person, Person.id == Foo.owner)
}}}

(Of course the "outer" case means that the `Person` will be `None` if there is no `Person.id` matching `Foo.owner`. The "inner" case will just filter out `Foo` items that don't have an owner.)


== Custom Selects ==

The `select` method returns a `ResultSet` of `Foo` by default:
{{{
    num_foos = all_foos.select().count()
    print "There are %d foo(s)." % num_foos
    if num_foos > 0:
        print "The oldest foo is %s." % all_foos.select().order_by(Foo.id)[0]
}}}

However you can select any combination of columns and objects that are in the query. The default is to select `Foo` objects, but you can ask for more (or different) data when you invoke `select`. Each `select` will create a new `ResultSet` so each will be executed separately.
{{{
    foos_and_owners = all_foos.innerJoin(Person, Person.id == Foo.owner)
    for foo, owner_name in foos_and_owners.select(Foo, Person.name):
        print "Foo #%d is owned by %s." % (foo.id, owner_name)
}}}


== Optimization ==

As you know most Foos are publicly accessible, but a few are private. Finding private Foo objects that are visible to the current user is expensive:
{{{
        def visibleTo(self, user):
            """Restrict to `Foo`s that `user` can see."""
            Owner = ClassAlias(Person)
            with_owner = self.joinOuter(Owner, Owner.id == Foo.owner)
            with_user = with_owner.joinOuter(
                TeamParticipation,
                TeamParticipation.team_id == Owner.id)
            return with_user.refine(
                Or(
                    # Return Foos that are public, or are owned by
                    # "user," or are owned by teams that "user" is in.
                    Foo.is_private == False,
                    TeamParticipation.person_id == user.id))
}}}

This is a big "performance pattern" in Launchpad. There are really two queries in here: a narrow-but-deep one that only looks at public `Foo` and gets of results, and a wide-but-shallow one that needs to join in other tables and check further details for the few private `Foo`.

You can speed this up treating these two as separate collections. Since each refinement on a collection creates a new one and leaves the old one intact, it's easy to re-use the common parts between both:
{{{
    interesting_foos = all_foos.refineOneWay().refineAnotherWay()
    public_foos = interesting_foos.isPublic(True)
    private_foos = interesting_foos.isPublic(False).visibleTo(user)
    return Union(public_foos.select(), private_foos.select())
}}}

(Of course this also leaves a lot of dead wood in `visibleTo` that you can cut to make it faster: you no longer need the `Or` and the joins can become inner joins).

One example of this is in [[https://git.launchpad.net/launchpad/tree/lib/lp/code/model/branchcollection.py|lp.code.model.branchcollection]]. Look for the `visibleByUser` method.

Collections

Collections are a way of selecting a group of objects based on some criteria, and then either just getting the objects or manipulating the set as one.

They have two types of methods: restriction methods, which narrow the set of objects according to some criteria, and manipulation methods, which act on each of the objects in the current set.

Examples

  • BranchCollection - lp.code.{model,interfaces}.branchcollection

  • TranslationTemplatesCollection - lp.translations.{model,interfaces}.potemplate

  • ArchiveCollection - lp.soyuz.{model,interfaces}.archivecollection

Example use

all_branches = getUtility(IAllBranches)
my_branches = all_branches.ownedBy(me)
branches_i_can_see = all_branches.visibleByUser(me)
merge_proposals_on_my_branches = my_branches.getMergeProposals()
my_branch_objects = my_branches.getBranches()
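
To make the restrict-then-fetch pattern concrete, here is a minimal in-memory sketch. It is not Launchpad's implementation (the real base class builds Storm queries); ToyCollection, Branch and the sample data are hypothetical stand-ins. It only illustrates the key property: each restriction returns a new collection and leaves the original untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Hypothetical stand-in for a database-backed branch."""
    name: str
    owner: str
    private: bool = False


class ToyCollection:
    """In-memory imitation of a collection; not the Launchpad class."""

    def __init__(self, items, conditions=()):
        self._items = tuple(items)
        self._conditions = tuple(conditions)

    def refine(self, condition):
        # Return a *new* collection with one more condition attached.
        return ToyCollection(self._items, self._conditions + (condition,))

    def ownedBy(self, owner):
        return self.refine(lambda branch: branch.owner == owner)

    def visibleByUser(self, user):
        return self.refine(
            lambda branch: not branch.private or branch.owner == user)

    def getBranches(self):
        # Conditions are only evaluated when results are requested.
        return [
            item for item in self._items
            if all(condition(item) for condition in self._conditions)]


all_branches = ToyCollection([
    Branch("trunk", "me"),
    Branch("secret", "you", private=True),
    Branch("fix-bug", "you"),
])
my_branches = all_branches.ownedBy("me")
branches_i_can_see = all_branches.visibleByUser("me")

print([b.name for b in my_branches.getBranches()])
print([b.name for b in branches_i_can_see.getBranches()])
# The starting collection is unaffected by the refinements above.
print(len(all_branches.getBranches()))
```

Because refine never mutates its receiver, all_branches can be reused as the starting point for any number of differently restricted collections.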

Creating a collection

In lp.services.database.collection there is a base class you can use for creating your own collection.

In lp.app.model.foocollection add:

from lp.app.model.foo import Foo
from lp.services.database.collection import Collection


class FooCollection(Collection):
    """A collection of `Foo`."""
    starting_table = Foo

This is the basic collection.

You can then add methods to it, such as:

    def ownedBy(self, owner):
        return self.refine(Foo.owner == owner)

Add appropriate tests for each such method (see lp.soyuz.tests.test_archivecollection for some inspiration).

Once you have an object with the methods that will be useful to you, you need to add an interface and a utility.

In lp.app.interfaces.foocollection add the following:

from zope.interface import Interface

class IFooCollection(Interface):
    """Collection of `Foo`."""
    def select(*args):
        """See `Collection`."""

along with the other methods you want to expose on the interface.

The select method, or something like it, has to be there, since it's how you retrieve a Storm ResultSet with the objects and/or columns you want from the collection. Instead of a select method, you might wish to have multiple methods to get different kinds of objects. For example, IBranchCollection has getBranches() and getMergeProposals(), where the latter returns all merge proposals associated with the collection of branches.

Once you have that, add a marker interface, which is used to look up a utility that provides the collection of all Foos:

class IAllFoos(IFooCollection):
    """Get all foos."""

You can add other marker interfaces here if you wish to provide other entry points, for instance if it is very common to be interested in all foos of a particular type or status.

Next comes the ZCML:

    <securedutility
        class="lp.app.model.foocollection.FooCollection"
        provides="lp.app.interfaces.foocollection.IAllFoos">
        <allow
            interface="lp.app.interfaces.foocollection.IAllFoos" />
    </securedutility>

    <class class="lp.app.model.foocollection.FooCollection">
        <allow
            interface="lp.app.interfaces.foocollection.IAllFoos" />
    </class>

This means that you can call getUtility(IAllFoos) to start working with a collection.

Using the collection

from zope.component import getUtility

all_foos = getUtility(IAllFoos)
foos = all_foos.ownedBy(person).withStatus(status).select()

The arguments to select are the same as the first argument to Store.find().

Adding adapters

It is possible to add adapters for objects of interest to get a collection initialized as appropriate.

For instance, you could add an adapter such that:

IFooCollection(product)

returns a FooCollection for all the Foos associated with that product.

To do so, define a function that takes the original object and returns an IFooCollection, e.g.:

def product_to_foo_collection(product):
    return getUtility(IAllFoos).inProduct(product)

And then add something like this to the relevant ZCML:

  <adapter
      for="lp.registry.interfaces.product.IProduct"
      provides="lp.app.interfaces.foocollection.IFooCollection"
      factory="lp.app.adapters.foocollection.product_to_foo_collection"/>

Adding Joins

Two Collection methods help you join other tables into a collection: joinInner, which creates a run-of-the-mill inner join, and joinOuter, which adds the new table using an outer (or "left") join. Both are called the same way:

    joined_collection = base_collection.joinInner(Person, Person.id == Foo.owner)
    joined_collection = base_collection.joinOuter(Person, Person.id == Foo.owner)

(Of course the "outer" case means that the Person will be None if there is no Person.id matching Foo.owner. The "inner" case will just filter out Foo items that don't have an owner.)
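
The difference between the two joins can be seen at the SQL level. The following sketch uses the standard-library sqlite3 module with a hypothetical two-table schema (foo and person); it is not how Collection builds its queries, only an illustration of which rows each kind of join keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE foo (id INTEGER PRIMARY KEY, owner INTEGER);
    INSERT INTO person VALUES (1, 'alice');
    INSERT INTO foo VALUES (10, 1);     -- owned by alice
    INSERT INTO foo VALUES (11, NULL);  -- no owner
""")

# Inner join: foo 11 has no matching person, so it is filtered out.
inner_rows = conn.execute("""
    SELECT foo.id, person.name FROM foo
    JOIN person ON person.id = foo.owner
    ORDER BY foo.id
""").fetchall()

# Outer (left) join: foo 11 is kept, with None in place of the person.
outer_rows = conn.execute("""
    SELECT foo.id, person.name FROM foo
    LEFT JOIN person ON person.id = foo.owner
    ORDER BY foo.id
""").fetchall()

print(inner_rows)  # [(10, 'alice')]
print(outer_rows)  # [(10, 'alice'), (11, None)]
```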

Custom Selects

The select method returns a ResultSet of Foo by default:

    num_foos = all_foos.select().count()
    print("There are %d foo(s)." % num_foos)
    if num_foos > 0:
        print("The oldest foo is %s." % all_foos.select().order_by(Foo.id)[0])

However, you can select any combination of columns and objects that are in the query. The default is to select Foo objects, but you can ask for more (or different) data when you invoke select. Each call to select creates a new ResultSet, so each one is executed as a separate query.

    foos_and_owners = all_foos.joinInner(Person, Person.id == Foo.owner)
    for foo, owner_name in foos_and_owners.select(Foo, Person.name):
        print("Foo #%d is owned by %s." % (foo.id, owner_name))

Optimization

Suppose that most Foos are publicly accessible, but a few are private. Finding the private Foo objects that are visible to the current user is expensive:

    def visibleTo(self, user):
        """Restrict to `Foo`s that `user` can see."""
        Owner = ClassAlias(Person)
        with_owner = self.joinOuter(Owner, Owner.id == Foo.owner)
        with_user = with_owner.joinOuter(
            TeamParticipation,
            TeamParticipation.team_id == Owner.id)
        return with_user.refine(
            Or(
                # Return Foos that are public, or are owned by
                # "user," or are owned by teams that "user" is in.
                Foo.is_private == False,
                TeamParticipation.person_id == user.id))

This is a big "performance pattern" in Launchpad. There are really two queries in here: a narrow-but-deep one that only looks at public Foo and gets lots of results, and a wide-but-shallow one that needs to join in other tables and check further details for the few private Foo.

You can speed this up by treating these two as separate collections. Since each refinement on a collection creates a new one and leaves the old one intact, it's easy to re-use the common parts between both:

    interesting_foos = all_foos.refineOneWay().refineAnotherWay()
    public_foos = interesting_foos.isPublic(True)
    private_foos = interesting_foos.isPublic(False).visibleTo(user)
    return public_foos.select().union(private_foos.select())

(Of course, this also leaves a lot of dead wood in visibleTo that you can cut to make it faster: you no longer need the Or, and the joins can become inner joins.)

One example of this is in lp.code.model.branchcollection. Look for the visibleByUser method.
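
The equivalence of the two approaches can be checked on a small data set. This is a sketch using the standard-library sqlite3 module; the schema, the sample rows and the assumption that every person has a teamparticipation row for themselves are all hypothetical, chosen only to mirror the visibleTo example. The split query also drops the join against person, since teamparticipation can be joined on foo.owner directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE teamparticipation (team_id INTEGER, person_id INTEGER);
    CREATE TABLE foo (
        id INTEGER PRIMARY KEY, owner INTEGER, is_private INTEGER);
    -- Person 1 is the user, 2 is a team the user is in, 3 is a stranger.
    INSERT INTO person VALUES (1), (2), (3);
    -- Assumption: everyone participates in themselves.
    INSERT INTO teamparticipation VALUES (1, 1), (2, 2), (3, 3), (2, 1);
    INSERT INTO foo VALUES
        (100, 3, 0), (101, 1, 1), (102, 2, 1), (103, 3, 1);
""")
user_id = 1

# One query: outer joins plus an OR across public and private rows.
combined = [row[0] for row in conn.execute("""
    SELECT DISTINCT foo.id FROM foo
    LEFT JOIN person AS owner ON owner.id = foo.owner
    LEFT JOIN teamparticipation AS tp ON tp.team_id = owner.id
    WHERE foo.is_private = 0 OR tp.person_id = ?
""", (user_id,))]

# Two simpler queries: no OR, and an inner join only on the private side.
split = [row[0] for row in conn.execute("""
    SELECT foo.id FROM foo WHERE foo.is_private = 0
    UNION
    SELECT foo.id FROM foo
    JOIN teamparticipation AS tp ON tp.team_id = foo.owner
    WHERE foo.is_private = 1 AND tp.person_id = ?
""", (user_id,))]

print(sorted(combined))
print(sorted(split))
```

Both queries return the public Foo plus the two private ones the user can reach; whether the split form is actually faster depends on the database and the data, so measure before committing to it.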

Collections (last edited 2019-10-04 11:27:35 by cjwatson)