Diff for "StormMigrationGuide"

Not logged in - Log In / Register

Differences between revisions 3 and 4
Revision 3 as of 2011-01-19 22:57:52
Size: 20427
Editor: lifeless
Comment: note that we can't use storm.base.Storm
Revision 4 as of 2020-03-25 14:39:38
Size: 20410
Editor: cjwatson
Comment: update for modern LP tree layout; mention StormBase
Deletions are marked like this. Additions are marked like this.
Line 15: Line 15:
from canonical.launchpad.webapp.interfaces import (
    IStoreSelector, MAIN_STORE, AUTH_STORE,
    MASTER_FLAVOR, SLAVE_FLAVOR, DEFAULT_FLAVOR)
from lp.services.database.interfaces import (
    DEFAULT_FLAVOR,
    
IStoreSelector,
   
MAIN_STORE,
    )
Line 135: Line 137:
 * Subclass from `storm.base.Storm` instead of `canonical.database.sqlbase.SQLBase`.
   '''You must hook in a storm_invalidate hook for CachedProperty if you use storm.base.Storm'''. We should add a canonical.database.storm.LPStorm base class instead.
 * Subclass from `lp.services.database.stormbase.StormBase` instead of `lp.services.database.sqlbase.SQLBase`. (Subclassing `storm.base.Storm` also works in most cases, but `StormBase` adds a `storm_invalidate` hook for cached properties.)
Line 250: Line 251:
To perform direct SQL queries, we previously used the `cursor()` function from `canonical.database.sqlbase` to get a cursor on the connection being used by SQLObject. These uses should be converted to use `Store.execute()`, which will make sure pending changes have been flushed to the database first in order to stay consistent. To perform direct SQL queries, we previously used the `cursor()` function from `lp.services.database.sqlbase` to get a cursor on the connection being used by SQLObject. These uses should be converted to use `Store.execute()`, which will make sure pending changes have been flushed to the database first in order to stay consistent.
Line 269: Line 270:
 6. Change the class to derive from `storm.base.Storm` instead of `SQLBase`.  6. Change the class to derive from `lp.services.database.stormbase.StormBase` instead of `SQLBase`.
Line 336: Line 337:
canonical.database.sqlbase.block_implicit_flushes function decorator. lp.services.database.sqlbase.block_implicit_flushes function decorator.
Line 385: Line 386:
  * Try canonical.database.enumcol.DBEnum.   * Try lp.services.database.enumcol.DBEnum.

This guide explains how certain SQLObject concepts map to equivalent Storm concepts. It expects a level of familiarity in how SQLObject works (or at least how it is used in Launchpad). It is not a full tutorial on how to use Storm either – see https://storm.canonical.com/Tutorial for that.

Differences

Now that we've landed the Storm code, Launchpad is running on top of Storm's SQLObject compatibility layer. This is not the end of the story though, since we want to move to using the native Storm API. Due to the way the compatibility layer is structured it is possible to start using many of Storm's native APIs right away, so this will be a gradual process rather than a single big change as with the first stage.

Connections

With SQLObject, each database class has a connection associated with it which is used for loading objects and performing queries. With Storm, the equivalent concept is a Store. Unlike SQLObject, stores are bound to instances rather than their classes. This means that a single class can be used to refer to objects in multiple databases (or to objects in the same database over different DB connections, as you might want to do in tests).

There are two main ways to access the main store. One is explicitely via the IStoreSelector utility:

  •    1 from lp.services.database.interfaces import (
       2     DEFAULT_FLAVOR,
       3     IStoreSelector,
       4     MAIN_STORE,
       5     )
       6 
       7 store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
    

Use the master flavor if you need to update the objects. Use the slave flavor to offload a search to a replica database and don't mind the search being made on data a few seconds out of date. Use the default flavor if you don't need to make changes, but need an up to date copy of the database (eg. most views, as the object you are viewing might just have been created) - Launchpad will choose an appropriate flavor.

The other method is from an existing object:

  •    1 from storm.store import Store
       2 
       3 store = Store.of(some_object)
    

The second form is often more convenient, and is preferred if you don't need to make updates and want them to play nicely with objects from an unknown store (eg. passed in via your method parameters).

Utility methods and Stores

If you are writing a utility method like MailingListSet.get, use the default store. Utility methods can't know whether the caller will be writing to objects it retrieves. But the default choice makes pretty good guesses about whether your operation needs the master store. It uses the master store:

  • If you are in a non-web context, like a batch job
  • If you are doing a POST (which means your overall operation may write)
  • If you are doing a GET, but have recently written (which means the slaves may not have your latest changes).

So the only times you'll run into trouble are if:

  • a GET operation writes to the database
  • a GET operation relies on data that was written to the database by another GET
  • a GET operation relies on data that was written to the database by another browser instance.

We plan to address these issues better once we're using Python 2.5 and its support for with statements / context management.

Adding Objects

/!\ Note: this section applies to classes that are not defined using the compatibility layer. Classes using the compatibility layer continue to provide the SQLObject behaviour.

With SQLObject, a default constructor is added to database classes that inserts a new row in the database that takes column values as keyword arguments. Storm does not provide a default constructor, so classes will need to add one.

Furthermore, Storm does not add the object to the database on instantiation: that must be done separately. There are two ways that an object can be added to a store. It can be added directly:

  •    1 store.add(object)
    

Or you can link it to an existing object, which will add it to that object's store:

  •    1 owner = getUtility(IPersonSet).getByEmail('test@canonical.com')
       2 product = Product(owner=owner) # product added to owner's store
    

Removing Objects

Objects can be removed from the database using the Store.remove method. To remove an object from its store, you can use:

  •    1 from storm.store import Store
       2 
       3 Store.of(some_object).remove(some_object)
    

Getting Objects by ID

The equivalent of SQLObject's Class.get() method is Store.get. It takes the class and the primary key of the object as arguments:

  •    1 store = getUtility(IZStorm).get('main')
       2 person = store.get(Person, 42)
    

Querying Objects

The equivalent of SQLObject's select, selectBy, selectOne, selectOneBy, selectFirst and selectFirstBy methods is Store.find(). It acts quite similar to the equivalent SQLObject methods, and the following are equivalent:

  •    1 result = store.find(Person, displayname='Some guy')
       2 result = store.find(Person, Person.displayname == 'Some guy')
       3 result = store.find(Person, "person.displayname = 'Some guy'")
    

Note that the ".q." bit is not required in the second example. The first two versions are preferred to direct SQL since they allow Storm to determine which tables are being used in the query automatically. As with SQLObject, no query is issued when executing find(): that is delayed until you try to access the result set.

The behaviour of selectOne and selectFirst are covered by the one and first methods on the result set. You can chain them with the find call if it is appropriate:

  •    1 # Raises NotOneError if there is more than one item in the result set
       2 person = store.find(Person, displayname='Some guy').one()
       3 
       4 # Raises UnorderedError if the result set has no order
       5 person = store.find(Person, displayname='Some guy').first()
       6 
       7 # Like first() but doesn't complain about unordered result sets
       8 person = store.find(Person, displayname='Some guy').any()
    

Result sets can be indexed, sliced and iterated over as with SQLObject. An ordering can be applied to the result set with the order_by method:

  •    1 result.order_by(Person.name, Person.id)
    

Unlike SQLObject, the ordering is applied to the result set rather than creating another one. The method does return the result set though, to make it possible to chain the calls when constructing a result set. Similar to SQLObject, a table can specify the default ordering for results with the __storm_order__ class attribute.

See the storm.store.ResultSet doc strings and the Storm tutorial for more details on what is possible.

Defining Tables

Some of the primary differences between SQLObject and Storm database class definitions are:

  • Subclass from lp.services.database.stormbase.StormBase instead of lp.services.database.sqlbase.SQLBase. (Subclassing storm.base.Storm also works in most cases, but StormBase adds a storm_invalidate hook for cached properties.)

  • Use the __storm_table__ attribute to set the table name instead of _table.

  • The primary key must be defined explicitly. This will usually look like:
    •    1 id = Int(primary=True)
      
  • The class should have a constructor if appropriate (some classes like BugSubscription may not need one). Note that the constructor should not usually add the object to a store -- leave that for a FooSet.new() method, or let it be inferred by a relation. BarryWarsaw: what if there is no FooSet or relation? See question below.

  • Default result set ordering should be set using the __storm_order__ property rather than _defaultOrder.

  • Use the column definition classes are found in storm.properties, and do not use the Col suffix. In general, they will follow Python's type naming conventions rather than SQL's (e.g. TimeDelta rather than Interval).

  • There is no equivalent of alternateID=True. The Store.find() method provides equivalent functionality to the byColumnName methods generated by this argument.

  • To specify that a column can not contain NULLs, use allow_none=False rather than notNull=True. Note that if NULLs are found in such columns, NoneError will be raised.

  • If no default is specified for a column, the database default will be used. So default=DEFAULT or similar can be removed.

  • Be sure your table has a PRIMARY KEY constraint defined, otherwise your id column will not get set automatically and you will get an IntegrityError from PostgreSQL.

Foreign Key References

The equivalent of SQLObject's ForeignKey class is Reference. A Storm Reference property creates a relationship between a local column and a remote column. Unlike ForeignKey, it does not implicitly create the FK column. So the following definitions are equivalent:

  •    1 # SQLObject
       2 owner = ForeignKey(foreignKey='Person', dbName='owner')
       3 
       4 # Storm
       5 ownerID = Int('owner')
       6 owner = Reference(ownerID, 'Person.id')
    

The columns can be passed directly to Reference(), or can be passed as strings that are looked up on first use.

The Reference class is also used to replace SQLObject's SingleJoin class:

  •    1 # SQLObject
       2 import_job = SingleJoin('CodeImportJob', joinColumn='code_importID')
       3 
       4 # Storm
       5 import_job = Reference(id, 'CodeImportJob.code_importID', on_remote=True)
    

Reference Sets

The SQLMultipleJoin and SQLRelatedJoin classes are replaced by Storm's ReferenceSet:

  •    1 # SQLObject
       2 subscriptions = SQLMultipleJoin('QuestionSubscription', joinColumn='question')
       3 subscribers = SQLRelatedJoin('Person',
       4     joinColumn='question', otherColumn='person',
       5     intermediateTable='QuestionSubscription', orderBy='name')
       6 
       7 # Storm
       8 subscriptions = ReferenceSet(id, QuestionSubscription.questionID)
       9 subscribers = ReferenceSet(id, QuestionSubscription.questionID,
      10                            QuestionSubscription.personID, Person.id,
      11                            order_by=Person.name)
    

While the SQLObject properties return plain result sets, the Storm properties return BoundReferenceSet objects. Some differences include:

  • add(obj) and remove(obj) methods are provided for adding and removing objects from the set. These are roughly equivalent to the automatic addFoo() and removeFoo() methods that SQLObject generates. For reference sets that join through a third table, Storm will take care of inserting and deleting rows as needed.

  • A find() method is provided for searching for objects within the reference set. This behaves a lot like Store.find() without the first argument.

Property Setters / Validators

SQLObject provided two ways of controlling how variables were set:

  1. magic _set_columnName() methods.

  2. the validator argument on column definitions.

Storm does not support magic methods but does have validators (albeit in a simpler form than SQLObject). A validator is a function that takes (object, attr_name, new_value) as arguments and returns the value that should be set. This allows validation to be performed on the new value (by raising an exception on bad values), and transformation of the value if appropriate (by returning something other than new_value).

A validator can be set for a column with the validator argument in the column definition.

You may notice some uses of storm_validator in code using the compatibility layer. As the compatibility layer does not implement the either of the SQLObject validation APIs, this was done to allow use of Storm validators without completely rewriting the definitions.

Prejoins

Storm's equivalent of prejoins is tuple finds. To select all products that are part of launchpad-project and their owners, we can do:

  •    1 launchpad_project = store.find(Project, name='launchpad-project')
       2 result = store.find((Product, Person),
       3                     Product.project == launchpad_project,
       4                     Product.owner == Person.id)
    

Iterating over this result will give us (product, person) tuples. The above case performs an inner join, so is not appropriate for cases where the foreign key linking the tables can be NULL. In those cases, a slightly different syntax is needed:

  •    1 from storm.expr import LeftJoin
       2 
       3 result = store.using(LeftJoin(Product, Project, Product.project == Project.id)).find(
       4     (Product, Project))
    

This result set will return (product, project) tuples, with project set to None where appropriate.

If you need to select a table multiple times, it is necessary to alias it. For example:

  •    1 from storm.info import ClassAlias
       2 
       3 Driver = ClassAlias(Person, 'driver')
       4 result = store.using(LeftJoin(Product, Driver, Product.driverID == Driver.id), Person).find(
       5     (Product, Person, Driver), Product.owner == Person.id)
    

This result set will return (product, owner, driver) tuples.

Direct SQL Queries

To perform direct SQL queries, we previously used the cursor() function from lp.services.database.sqlbase to get a cursor on the connection being used by SQLObject. These uses should be converted to use Store.execute(), which will make sure pending changes have been flushed to the database first in order to stay consistent.

This method returns a result object with get_one and get_all methods that act like a cursor's fetchone and fetchall methods. It also supports iteration.

  •    1 result = store.execute("SELECT name FROM person ORDER BY name")
       2 names = result.get_all()
    

Migration Plan

A good order to migrate code is:

  1. Convert column properties to use the Storm syntax. This should be a no-op change, and not affect external code.
  2. Convert ForeignKey() definitions to an appropriate pair of Int() and Reference() definitions.

  3. Convert sync(), syncUpdate(), destroySelf(), etc calls to Storm equivalents.

  4. Convert uses of Class.select*() to use find(). Note that you lose prejoins support here, so use tuple finds as appropriate. Change queries to use Storm expressions rather than sqlbuilder expressions.

  5. Convert SQLMultipleJoin and SQLRelatedJoin to ReferenceSet(). As this changes the API of the class a bit, it will probably require changes external to the class.

  6. Change the class to derive from lp.services.database.stormbase.StormBase instead of SQLBase.

This list is roughly ordered based on the locality of changes and based on dependencies between changes.

For new code, consider using native Storm API from the start, rather than continuing to use the compatibility layer.

Tips on Converting Tests

From "Tips in converting tests to Storm", May 30, 2008,

Below are some tips on writing Storm code for Launchpad.  I won't go
too deep into the Storm API, and instead concentrate on some of the
differences between SQLObject and Storm's SQLObject compatibility
layer.

1. Storm is stricter with respect to the types it accepts in various
situations.  Most of the cases where this has caused problems in tests
have indicated problems in Launchpad or its tests.  Below are a number
of the common problems I've encountered:

(a) The SQLObject EnumCol accepts values other than enumeration values
on the Python side.  With the upgrade to Storm, things are a bit
stricter, and the correct enumeration values need to be passed in.

I found a few cases where some code had an enumeration value and then
passed in item.value.  In a few tests, the numeric constants were
being used.

This affects both creating/updating objects and building queries.

(b) When assigning to a foreign key attribute, you need to assign an
object of the right type.  It seems that SQLObject would accept any
class for such assignments.  And provided a row from the correct table
existed with the same ID existed you'd get no complaints from the
database.  There were a few tests that had bugs like this.

(c) SQLObject lets you assign a result set to a foreign key reference
when creating or updating an object, while Storm does not.

This might sound like a missing feature, except for the fact that
SQLObject seems to treat result sets as NULLs when generating SQL.  So
any new errors caused by this are genuine errors.

2. Storm flushes changes to the database implicitly before various
operations.  In general, this is good since you don't need to remember
to flush changes before running select().  That said, we have a number
of cases where we have code that relies on changes not being flushed
to disk.  Some examples include:

* the code to warn about assigning bugs to non developers transitioned
to the new assignee and then checked to see if the new assignee had
any bugs assigned to them.  As Storm flushed the change in assignee,
it always looked like the user had assigned bugs.  Switching the order
of these two operations fixed the bugs.

* Some of the PPA tests would set a PPA to private and then set the
buildd password.  Database constraints require that private PPAs have
a password, so in some tests where a flush occurred between the two
operations an IntegrityError was raised.  Reordering the two
statements fixed the problem.

Of course, there are cases where it is useful to have implicit flushes
turned off.  There is an API to block implicit flushes, and I've made
use of it for our security policy (which could otherwise introduce
flushes to almost any attribute access) and most event subscribers.
I've done this in most cases with the
lp.services.database.sqlbase.block_implicit_flushes function decorator.

3. Storm flushes some changes later than SQLObject.  Namely inserts or
deletes to the database.  Furthermore, the order that objects are
added in a single flush is not defined.  This exhibits itself in two
ways:

* If inserting a row would cause an integrity error, that error will
occur at flush time rather than object construction time.  Tests for
such failures need to explicitly flush the object.

* Some tests would create a number of objects in a single flush group
and expect them to have IDs in the same order.  Such tests need to
either take this into account or add explicit flushes to preserve the
ordering.

* In cases where objects are created that reference each other in a
loop, a manual flush will be needed before closing the loop.
Otherwise Storm won't know what order to insert them in.

4. sqlbuilder expressions do not yield SQL from str().  Storm uses
quite a different method to convert sql expression objects to SQL
statements, and this is a result of that.  There were a few cases of
code that took a builder expression and substituted it into a string
to form a larger query.  I've generally fixed cases like this by
converting the string expression to builder objects.

Constant expressions (e.g. UTC_NOW) will still work with sqlvalues(),
but not when substituted directly.


James.

Questions

12-Aug-2008

  • Some of our ForeignKey columns had notNull=True but Storm's Reference class does not accept allow_none=False keyword argument.

  • How to actually convert a UtcDateTimeCol to a DateTime? For now, I'm using a DateTime with tzinfo=pytz.timezone('UTC') keyword arg. Also, does default=UTC_NOW still work?

    bigjools: use default_factory=datetime.utcnow instead.

  • Can I still use EnumCol, or is there a better way to hook up with our DBEnums?

    • Try lp.services.database.enumcol.DBEnum.

03-Oct-2008

  • I'm still confused about the right way to add an object to a store. If I'm using native Storm APIs (as all new code should, right?) should I add a

    Store.add() call my database object's __init__()? That seems to be the most straightforward translation of the SQLObject compatibility layer. And if the answer is "yes", then how do I get the Store to use? I could use Store.of(someobj).add(self) but someobj might not be in the right store. I could use the getUtility() trick, but it seems wrong that a database module should be importing an interface from webapp.

StormMigrationGuide (last edited 2020-03-25 14:39:38 by cjwatson)