Collections

Collections are a way of selecting a group of objects based on some criteria, and then either retrieving those objects or manipulating the set as a whole.

They have two types of methods: restrict methods, which reduce the set of objects according to some criteria, and manipulation methods, which act on each of the objects in the current set.
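The distinction between the two kinds of method can be sketched with a toy, in-memory model. This is not Launchpad's Collection class: ToyCollection, its dictionary-based objects and the markReviewed method are all hypothetical names used only for illustration. The point it shows is that restrict methods return a new, narrower collection, while manipulation methods act on whatever the current set contains.

```python
class ToyCollection:
    """Illustrative stand-in for a collection (NOT Launchpad's Collection)."""

    def __init__(self, objects, predicates=()):
        self._objects = list(objects)
        self._predicates = tuple(predicates)

    def refine(self, predicate):
        """Return a NEW collection with one more restriction; self is unchanged."""
        return ToyCollection(self._objects, self._predicates + (predicate,))

    # Restrict methods: narrow the set and return a collection.
    def ownedBy(self, owner):
        return self.refine(lambda obj: obj["owner"] == owner)

    def withStatus(self, status):
        return self.refine(lambda obj: obj["status"] == status)

    # Retrieval: hand back the objects that pass every restriction.
    def select(self):
        return [o for o in self._objects
                if all(p(o) for p in self._predicates)]

    # Manipulation method (hypothetical): acts on each object in the set.
    def markReviewed(self):
        for obj in self.select():
            obj["reviewed"] = True


branches = [
    {"name": "trunk", "owner": "alice", "status": "active"},
    {"name": "fix-123", "owner": "bob", "status": "active"},
    {"name": "old-idea", "owner": "alice", "status": "abandoned"},
]
all_branches = ToyCollection(branches)
alices_active = all_branches.ownedBy("alice").withStatus("active")
names = [b["name"] for b in alices_active.select()]
alices_active.markReviewed()
```

Note that all_branches is untouched by the chained calls, so it can be reused as the starting point for other restrictions.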

Examples

Example use

all_branches = getUtility(IAllBranches)
my_branches = all_branches.ownedBy(me)
branches_i_can_see = all_branches.visibleByUser(me)
merge_proposals_on_my_branches = my_branches.getMergeProposals()
my_branch_objects = my_branches.getBranches()

Creating a collection

In lp.services.database.collection there is a base class you can use for creating your own collection.

In lp.app.model.foocollection, add:

from lp.app.model.foo import Foo
from lp.services.database.collection import Collection


class FooCollection(Collection):
    """A collection of `Foo`."""
    starting_table = Foo

This is the basic collection.

You can then add methods to it, such as:

    def ownedBy(self, owner):
        return self.refine(Foo.owner == owner)

with appropriate tests (see lp.soyuz.tests.test_archivecollection for some inspiration).

Once you have an object with the methods that will be useful to you, you need to add an interface and a utility.

In lp.app.interfaces.foocollection add the following:

from zope.interface import Interface

class IFooCollection(Interface):
    """Collection of `Foo`."""
    def select(*args):
        """See `Collection`."""

with the methods you want on the interface.

The select method, or something like it, has to be there, since it's how you retrieve a Storm ResultSet with the objects and/or columns you want from the collection. Instead of a select method, you might wish to have multiple methods to get different kinds of objects. For example, IBranchCollection has getBranches() and getMergeProposals(), where the latter returns all merge proposals associated with the collection of branches.

Once you have that, add a marker interface, which is used to look up a utility that returns all Foos:

class IAllFoos(IFooCollection):
    """Get all foos."""

You can add other marker interfaces here if you wish to provide other entry points, for instance if it is very common to be interested in all foos of a particular type or status.

Next comes the ZCML:

    <securedutility
        class="lp.app.model.foocollection.FooCollection"
        provides="lp.app.interfaces.foocollection.IAllFoos">
        <allow
            interface="lp.app.interfaces.foocollection.IAllFoos" />
    </securedutility>

    <class class="lp.app.model.foocollection.FooCollection">
        <allow
            interface="lp.app.interfaces.foocollection.IAllFoos" />
    </class>

This means that you can call getUtility(IAllFoos) to start working with a collection.

Using the collection

all_foos = getUtility(IAllFoos)
foos = all_foos.ownedBy(person).withStatus(status).select()

The arguments to select are the same as what you would pass as the first argument to Store.find(): the classes and/or columns you want back.

Adding adapters

It is possible to add adapters for objects of interest to get a collection initialized as appropriate.

For instance you could add adapters such that:

IFooCollection(product)

returns a FooCollection for all the Foos associated with that product.

To do so, define a function that takes the original object and returns an IFooCollection, e.g.:

def product_to_foo_collection(product):
    return getUtility(IAllFoos).inProduct(product)

And then add something like this to the relevant ZCML:

  <adapter
      for="lp.registry.interfaces.product.IProduct"
      provides="lp.app.interfaces.foocollection.IFooCollection"
      factory="lp.app.adapters.foocollection.product_to_foo_collection"/>

Adding Joins

Two Collection methods help you join other tables into a collection: joinInner, which creates a run-of-the-mill inner join, and joinOuter, which adds in the new table using an outer (or "left") join. Both are called the same way:

    joined_collection = base_collection.joinInner(Person, Person.id == Foo.owner)
    joined_collection = base_collection.joinOuter(Person, Person.id == Foo.owner)

(Of course the "outer" case means that the Person will be None if there is no Person.id matching Foo.owner. The "inner" case will just filter out Foo items that don't have an owner.)
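The difference between the two join types can be seen in plain SQL. The following is a minimal sketch using Python's standard sqlite3 module rather than Storm or Launchpad's Collection; the foo and person tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE foo (id INTEGER PRIMARY KEY, owner INTEGER);
    INSERT INTO person VALUES (1, 'alice');
    INSERT INTO foo VALUES (10, 1);     -- owned by alice
    INSERT INTO foo VALUES (11, NULL);  -- no owner
""")

# Inner join: rows without a matching person are filtered out.
inner_rows = conn.execute(
    "SELECT foo.id, person.name FROM foo "
    "JOIN person ON person.id = foo.owner "
    "ORDER BY foo.id").fetchall()

# Outer (left) join: every foo is kept; the person side is None
# where there is no match.
outer_rows = conn.execute(
    "SELECT foo.id, person.name FROM foo "
    "LEFT JOIN person ON person.id = foo.owner "
    "ORDER BY foo.id").fetchall()

print(inner_rows)
print(outer_rows)
```

The inner join drops the ownerless foo entirely, while the outer join keeps it and pairs it with None.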

Custom Selects

The select method returns a ResultSet of Foo by default:

    num_foos = all_foos.select().count()
    print("There are %d foo(s)." % num_foos)
    if num_foos > 0:
        print("The oldest foo is %s." % all_foos.select().order_by(Foo.id)[0])

However you can select any combination of columns and objects that are in the query. The default is to select Foo objects, but you can ask for more (or different) data when you invoke select. Each select will create a new ResultSet so each will be executed separately.

    foos_and_owners = all_foos.joinInner(Person, Person.id == Foo.owner)
    for foo, owner_name in foos_and_owners.select(Foo, Person.name):
        print("Foo #%d is owned by %s." % (foo.id, owner_name))

Optimization

Suppose that most Foos are publicly accessible, but a few are private. Finding private Foo objects that are visible to the current user is expensive:

    def visibleTo(self, user):
        """Restrict to `Foo`s that `user` can see."""
        Owner = ClassAlias(Person)
        with_owner = self.joinOuter(Owner, Owner.id == Foo.owner)
        with_user = with_owner.joinOuter(
            TeamParticipation,
            TeamParticipation.team_id == Owner.id)
        return with_user.refine(
            Or(
                # Return Foos that are public, or are owned by
                # "user," or are owned by teams that "user" is in.
                Foo.is_private == False,
                TeamParticipation.person_id == user.id))

This is a big "performance pattern" in Launchpad. There are really two queries in here: a narrow-but-deep one that only looks at public Foo and gets lots of results, and a wide-but-shallow one that needs to join in other tables and check further details for the few private Foo.

You can speed this up by treating these two as separate collections. Since each refinement on a collection creates a new one and leaves the old one intact, it's easy to re-use the common parts between both:

    interesting_foos = all_foos.refineOneWay().refineAnotherWay()
    public_foos = interesting_foos.isPublic(True)
    private_foos = interesting_foos.isPublic(False).visibleTo(user)
    return public_foos.select().union(private_foos.select())

(Of course this also leaves a lot of dead wood in visibleTo that you can cut to make it faster: you no longer need the Or, and the joins can become inner joins.)

One example of this is in lp.code.model.branchcollection. Look for the visibleByUser method.
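The split-and-union pattern described in this section can be illustrated in plain SQL. The sketch below uses Python's standard sqlite3 module, not Storm or Launchpad code, and its schema is an assumption: a foo table with is_private and owner columns, and a teamparticipation table in which every person also participates in themselves. It checks that one combined query (outer join plus OR) and the union of a simple public query with a narrower private query return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, is_private INTEGER, owner INTEGER);
    CREATE TABLE teamparticipation (team INTEGER, person INTEGER);
    -- Person 1 is the user, 2 is a team containing the user,
    -- 3 is somebody else. Everyone participates in themselves.
    INSERT INTO teamparticipation VALUES (1, 1), (2, 2), (2, 1), (3, 3);
    INSERT INTO foo VALUES
        (1, 0, 3),  -- public
        (2, 1, 1),  -- private, owned by the user
        (3, 1, 2),  -- private, owned by the user's team
        (4, 1, 3);  -- private, owned by somebody else
""")
user_id = 1

# One query doing everything: outer join plus an OR condition.
combined = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT foo.id FROM foo "
    "LEFT JOIN teamparticipation tp ON tp.team = foo.owner "
    "WHERE foo.is_private = 0 OR tp.person = ?", (user_id,)))

# Two simpler queries: public rows need no join at all, and the
# private rows can use an inner join with no OR.
split = sorted(row[0] for row in conn.execute(
    "SELECT foo.id FROM foo WHERE foo.is_private = 0 "
    "UNION "
    "SELECT foo.id FROM foo "
    "JOIN teamparticipation tp ON tp.team = foo.owner "
    "WHERE foo.is_private = 1 AND tp.person = ?", (user_id,)))

print(combined)
print(split)
```

On a toy table both forms are instant; any speed-up in practice depends on the database's query planner and the real data, so measure before and after.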

Collections (last edited 2019-10-04 11:27:35 by cjwatson)