Chapter 18. ESN - Event Subscription Notification

ESN - Event/Subscription/Notification - works like this:

Lots of things on the site fire events. An event is defined by the tuple:

       (journalid, eventtype, e_arg1, e_arg2)

Where journalid is the primary journal the event took place in (the journal a new post appears in, or the journal a comment is on). In some cases the choice is less obvious. For a befriending, for example, two journals (users) are involved: pick whichever side of the relationship is more important and make that the journalid for that event type.

e_arg1 and e_arg2 are eventtype-defined integers, and that's all you have to completely describe the event that took place. Since that's not much space, most eventtypes use one or both fields to point to another database record that fully describes the event, e.g., e_arg1 being the primary key of the new post or comment.
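For concreteness, here is a hypothetical sketch of the tuple for a new-post event; the values and the jitemid comment are illustrative assumptions, not the real LJ schema:

    # Hypothetical example event: "new post in journal 1234" (illustrative values).
    my $event = {
        journalid => 1234,   # the journal the new post landed in
        etypeid   => 2,      # numeric id of the eventtype (e.g. "new post")
        arg1      => 5678,   # eventtype-defined: assume the post's primary key (jitemid)
        arg2      => 0,      # unused by this eventtype
    };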

People then subscribe to events, subject to privacy/stalking rules: just because an event fires does not mean it is subscribable. A subscription has its own arg1/arg2, but those s_arg1/s_arg2 have nothing to do with the e_arg1/e_arg2.
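A subscription record can be sketched the same way; the field names here are illustrative only, the point being that the subscription's args are interpreted per eventtype and independently of the event's args:

    # Hypothetical subscription: "tell user 42 about new posts in journal 1234".
    my $subscription = {
        userid    => 42,     # the subscriber
        etypeid   => 2,      # which eventtype to watch
        journalid => 1234,   # which journal to watch (a wildcard value could mean "any")
        arg1      => 0,      # s_arg1: eventtype-defined, unrelated to e_arg1
        arg2      => 0,      # s_arg2: eventtype-defined, unrelated to e_arg2
    };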

How events get processed asynchronously by TheSchwartz (a reliable async job system):

  1. The web context logs one job, "LJ::Worker::FiredEvent", with params:

         journalid, etypeid, arg1, arg2

     (just enough to recreate the event; see the sketch below)

  2. An async worker picks it up and ultimately has to create a new job for each matching subscription, over all clusters.
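A minimal sketch of the web-context half of step 1, using TheSchwartz's standard client API. The DSN, credentials, and the exact layout of the job argument are assumptions; only the funcname and the four params come from the step above.

    use strict;
    use warnings;
    use TheSchwartz;
    use TheSchwartz::Job;

    # Assumed job-queue handle; the real site wraps this in its own helpers.
    my $client = TheSchwartz->new(
        databases => [ { dsn => 'dbi:mysql:theschwartz', user => 'lj', pass => 'secret' } ],
    );

    # The four values that completely describe the event.
    my ($journalid, $etypeid, $arg1, $arg2) = (1234, 2, 5678, 0);

    # Log exactly one job -- just enough to recreate the event later.
    $client->insert( TheSchwartz::Job->new(
        funcname => 'LJ::Worker::FiredEvent',
        arg      => [ $journalid, $etypeid, $arg1, $arg2 ],
    ) );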

Logically, this can be split into the following steps:

foreach cluster,

  1. find all subscriptions for that journalid/etypeid (including wildcards)

  2. filter those down to those that match

  3. enqueue a job for each matching subscription to fire its notification (the whole loop is sketched below)
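In code, the per-cluster logic looks roughly like the following hypothetical sketch, in which clusters(), load_subs(), sub_matches(), and enqueue_process_sub() are illustrative stand-ins rather than real LJ functions:

    use strict;
    use warnings;

    # Slow-path processing of one event, spelled out per the three steps above.
    sub process_event_slowly {
        my ($journalid, $etypeid, $arg1, $arg2) = @_;

        for my $cid ( clusters() ) {
            # 1. all subscriptions on this cluster for journalid/etypeid,
            #    including wildcard subscriptions
            my @candidates = load_subs($cid, $journalid, $etypeid);

            # 2. filter down to the ones that actually match this event
            my @matches = grep { sub_matches($_, $etypeid, $arg1, $arg2) } @candidates;

            # 3. enqueue one job per matching subscription to fire its notification
            enqueue_process_sub($_, $journalid, $etypeid, $arg1, $arg2) for @matches;
        }
    }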

But we take some fast paths. Given the following steps:

[FiredEvent] -> [FindSubsPerCluster] -> [FilterSubs] -> [ProcessSub]

We can often skip from FiredEvent directly to N x ProcessSub jobs if a) all clusters are up, and b) N is small. TheSchwartz has a "replace_with" operation that atomically marks a job as complete while N other jobs take its place; we use it to expand the original single FiredEvent job into N ProcessSub jobs. But N is potentially huge (hundreds of thousands or more), and in those non-fast paths we split the work into parts at a much slower rate, using all four steps above rather than jumping from FiredEvent to ProcessSub. Also, if any cluster is down, we always split the job into one FindSubsPerCluster job per cluster.
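Putting that together, a hypothetical sketch of how the FiredEvent worker could pick between the two paths. The threshold, the cluster-health check, the helper functions, and the ProcessSub arg layout are all assumptions; only the job names and the replace_with behavior come from the description above.

    package LJ::Worker::FiredEvent;
    use strict;
    use warnings;
    use base 'TheSchwartz::Worker';
    use TheSchwartz::Job;

    # Assumed cutoff for "N is small"; the real threshold is not given here.
    my $FAST_PATH_MAX = 100;

    sub work {
        my ($class, $job) = @_;
        my ($journalid, $etypeid, $arg1, $arg2) = @{ $job->arg };

        # all_clusters_up(), quick_matching_subs(), and clusters() are
        # hypothetical helpers, not the real LJ functions.
        if ( all_clusters_up() ) {
            my @subs = quick_matching_subs($journalid, $etypeid, $arg1, $arg2);
            if ( @subs and @subs <= $FAST_PATH_MAX ) {
                # Fast path: atomically replace this one FiredEvent job with
                # one ProcessSub job per matching subscription.
                return $job->replace_with( map {
                    TheSchwartz::Job->new(
                        funcname => 'LJ::Worker::ProcessSub',
                        arg      => [ $_->{userid}, $_->{subid}, $journalid, $etypeid, $arg1, $arg2 ],
                    )
                } @subs );
            }
        }

        # Slow path (N too big, or any cluster down): fan out one
        # FindSubsPerCluster job per cluster; later stages filter in batches.
        return $job->replace_with( map {
            TheSchwartz::Job->new(
                funcname => 'LJ::Worker::FindSubsPerCluster',
                arg      => [ $_, $journalid, $etypeid, $arg1, $arg2 ],
            )
        } clusters() );
    }

    1;

The key property either way is that replace_with keeps the expansion atomic: either the FiredEvent job stays queued, or it is marked complete and all of its replacement jobs exist.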

So the different paths:

Using 5,000 for $MAX_FILTER_AT_A_TIME