[web-devel] Join support in persistent

Michael Snoyman michael at snoyman.com
Sun Apr 3 06:19:24 CEST 2011


On Sun, Apr 3, 2011 at 6:40 AM, Greg Weber <greg at gregweber.info> wrote:

> That is wonderful- application joins or database joins. Lets compare to
> Rails:
>
> selectOneMany [AuthorIsPublicEq True] [AuthorNameAsc] [EntryIsPublicEqTrue]
> [EntryPublishedDesc] EntryAuthorEq
>
> Author.where(:isPublic => true).order("name").includes(:entries) &
> Entry.where(:isPublic => true).order("published DESC")
>
>
How does Rails handle when there are two join keys? For example:

Person
  name String
Entry
  author PersonId
  editor PersonId
  title String


> Note that a Rails query is lazy and the & is combining the queries.
> However, when there are no filtering criteria on the association, Rails
> prefers to perform 2 queries- one to retrieve the authors, and then one to
> retrieve the entries based on the author ids:
>
> SELECT "entries".* FROM "entries" WHERE ("entries".author_id IN
> (51,1,78,56,64,84,63,60))
>
> Originally rails always did do a SQL level join, but then decided to switch
> to preferring app-level, largely because it reduced the number of Ruby
> objects that needed to be allocated, resulting in much better performance
> for large data sets [1].
>
>
Regarding lazy: we could definitely go that route of allowing laziness via
unsafeInterleaveIO, but it's very contrary to the normal workings of
Yesod/Persistent. If we want constant space, we should use enumerators I
think.

As far as the performance concern: do you think we need to be worried about
the large number of allocated objects? My guess is that Haskell won't have
the same concerns here, but I could be wrong.


> It appears that persistent.Join is instead performing an n + 1 query- one
> query per each author. We should avoid these kinds of queries, and then
> there will not be much point to an outer join in the db.
>
>
Good point, we can easily fix that. However, can you clarify what you mean
by outer join here? Why would there be need for an outer join?

* Correction: It's not so simple to add, at least not efficiently. We would
need to have a function like this:

selectOneMany :: (PersistEntity one, PersistEntity many, PersistBackend m,
Eq (Key one))
              => [Filter one]  -> [Order one]
              -> [Filter many] -> [Order many]
              -> ([Key one] -> Filter many)
              -> (many -> Key one)
              -> m [((Key one, one), [(Key many, many)])]
selectOneMany oneF oneO manyF manyO inFilt' getKey = do
    x <- selectList oneF oneO 0 0
    let inFilt = inFilt' $ map fst x
    y <- selectList (inFilt : manyF) manyO 0 0
    return $ map (go y) x
  where
    go manys one@(key, _) = (one, filter (\x -> getKey (snd x) == key)
manys)

I'm not convinced that there will be any performance enhancement here. Most
likely, when dealing with small datasets, this will pay off due to the
savings in the number of bytes transmitted with the server. But for large
datasets, the O(m * n) complexity (number of ones times number of manys)
will hurt us. I'd prefer to optimize for larger datasets. Plus, I think the
current API is much nicer.

If you can think of a better approach than this, let me know. But remember
that there's no way to know the sort order of the keys of the one table.


> Looking at the behavior of Rails for joins, I don't like how it decides
> between types of joins. The sql produced by Rails is not in fact identical
> to the persistent one: it will do an outer join with the entry filtering in
> a WHERE clause, not as part of the JOIN conditions.
> If we are to support joins it needs to be very apparent which type of join
> is performed.
>
>
As far as I know, every database on the planet these days treat these two as
identical for performance reasons:

    SELECT * from a, b where a.foo = b.bar
    SELECT * from a INNER JOIN b ON a.foo = b .bar

I went the INNER JOIN route because I've always had a preference for it. But
*outer* join will be a very different beast. If you look in the runtests.hs
file, it specifically relies on the fact that we're doing an inner join. An
outer join would mean that *all* authors appear in the output set, while an
inner join will only include authors with entries.

I agree that we should make this clear in the docs.


>
> selectOneMany doesn't have an offset and limit. If we added it we end up
> with queries like this:
>
> selectOneMany [] [] [] [] EntryAuthorEq 0 0
>
>
I purposely avoided offset and limit for now, since I'm not exactly certain
how it should be applied. Should it offset/limit the number of ones? The
number of manys? The total number of manys for all ones, or the number of
manys per one?


> This function with 5+ required arguments is somewhat awkward/difficult to
> use and to read. Rails is composable in a readable way because it copied
> haskellDB. I would like to get away from the empty optional arguments.
>

OK, how about this:

data SelectOneManyArgs one many = SelectOneManyArgs { oneFilter :: [Filter]
} ...

defaultOneManyArgs :: (Key one -> Filter many) -> SelectOneManyArgs one many

But I'd like to have shorter names somehow.


> I am all for adding these changes for now, I just hope we can move to a
> more composable API in the future.
> I thought the API that Aur came up with was a better effort in that
> direction, although there are definitely practical issues with it.
>
>
I definitely think we should continue exploring the design space to see if
we can come up with better solutions. But I have a sneaking suspicion that
if we want to have fully customizable queries that allow arbitrary joining
and selecting individual fields, we're going to end up with some kind of SQL
syntax inside of Template Haskell. I'm all for making something like that...
but it's not Persistent.

I think Persistent's goal should *not* be to handle every possible query you
can ever imagine. It should handle the common cases efficiently, with
type-safety and a simple API. It should also allow people to easily drop
down to something more low-level- possibly sacrificing type safety- when the
need arises. And over time, as we get more user experience feedback, we can
push the boundary farther of what Persistent handles out-of-the-box.

If we get to the point where 95% of queries people perform can be handled
with an out-of-the-box function, and for the 5% people need to write some
SQL (or MongoDB backend code, or Redis...), I think we'll have hit our
target.

Michael


> [1]
> http://akitaonrails.com/2008/05/25/rolling-with-rails-2-1-the-first-full-tutorial-part-2
>
> Greg Weber
>
> On Sat, Apr 2, 2011 at 2:50 PM, Michael Snoyman <michael at snoyman.com>wrote:
>
>> Hey all,
>>
>> After a long discussion with Aur Saraf, I think we came up with a good
>> approach to join support in Persistent. Let's review the goals:
>>
>> * Allow for non-relational backends, such as Redis (simple key-value
>> stores)
>> * Allow SQL backends to take advantage of the database's JOIN abilities.
>> * Not force SQL backends to use JOIN if they'd rather avoid it.
>> * Keep a simple, straight-forward, type-safe API like we have
>> everywhere else in Persistent.
>> * Cover the most common (say, 95%) of use cases out-of-the-box.
>>
>> So our idea (well, if you don't like it, don't blame Aur...) is to
>> provide a separate module (Database.Persist.Join) which provides
>> special functions for the most common join operations. To start with,
>> I want to handle a two-table one-to-many relationship. For
>> demonstration purposes, let's consider a blog entry application, with
>> entities Author and Entry. Each Entry has precisely one Author, and
>> each Author can have many entries. In Persistent, it looks like:
>>
>> Author
>>    name String Asc
>>    isPublic Bool Eq
>> Entry
>>    author AuthorId Eq
>>    title String
>>    published UTCTime Desc
>>    isPublic Bool Eq
>>
>> In order to get a list of all entries along with their authors, you
>> can use the newly added[1] selectOneMany function:
>>
>>    selectOneMany [AuthorIsPublicEq True] [AuthorNameAsc]
>> [EntryIsPublicEqTrue] [EntryPublishedDesc] EntryAuthorEq
>>
>> This will return a value of type:
>>
>>    type AuthorPair = (AuthorId, Author)
>>    type EntryPair = (EntryId, Entry)
>>    [(AuthorPair, [EntryPair])]
>>
>> In addition to Database.Persist.Join, there is also a parallel module
>> named Database.Persist.Join.Sql, which has an alternative version of
>> selectOneMany that is powered by a SQL JOIN. It has almost identical
>> semantics: the only catch comes in when you don't fully specify
>> ordering. But then again, if you don't specify ordering in the first
>> place the order of the results is undefined, so it really *is*
>> identical semantics, just slightly different behavior.
>>
>> Anyway, it's almost 1 in the morning, so I hope I haven't rambled too
>> much. The basic idea is this: Persistent 0.5 will provide a nice,
>> high-level approach to relations. I'll be adding more functions to
>> these modules as necessary, and I'd appreciate input on what people
>> would like to see there.
>>
>> Michael
>>
>> [1]
>> https://github.com/snoyberg/persistent/commit/d2b52a6a7b7a6af6234315492f24f821a0ea7ce4#diff-2
>>
>> _______________________________________________
>> web-devel mailing list
>> web-devel at haskell.org
>> http://www.haskell.org/mailman/listinfo/web-devel
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/web-devel/attachments/20110403/312396f9/attachment-0001.htm>


More information about the web-devel mailing list