Monday 7 December 2015

Event Queues Part Uno

WHY SHOULD I KEEP AS FEW RECORDS AS POSSIBLE?

The CMS Tuning Guide [Chapter 2.3] recommends keeping fewer than 1000 records in the Event Queue in an ideal world.

The more records there are in the EventQueue table, the more time and resources SQL Server needs to find fresh rows raised by other instances.

Since every Sitecore instance runs this lookup query every 2 seconds, keeping obsolete entries in the database can overload SQL Server.

A typical symptom is SQL timeout errors in the Sitecore logs:

Message: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

Source: Sitecore.Kernel
   at Sitecore.Data.DataProviders.Sql.DataProviderCommand.ExecuteReader()
.....
   at Sitecore.Eventing.EventQueue.ProcessEvents(Action`2 handler)
   at Sitecore.Eventing.EventProvider.RaiseQueuedEvents()

WHO CLEANS UP ?

Sitecore ships with the Sitecore.Tasks.CleanupEventQueue agent, defined in web.config.

It can remove records that are at least one day old.

A more aggressive cleanup policy was implemented in 7.2 Update-4 [ref. #392673]. It lets you specify the number of minutes to keep. Yippee!

One can always configure a custom SQL Agent job to clean up the Event Queue manually, but the stock Sitecore mechanism does its job pretty well.
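To illustrate the difference between the two retention policies, here is a minimal Python sketch. It is not the agent's code: the function name, the row layout and the sample dates are made up for the example, and only the "keep N minutes" idea comes from the text above.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def remove_old_rows(rows: List[Dict], now: datetime, minutes_to_keep: int) -> List[Dict]:
    """Return only the rows created within the last `minutes_to_keep` minutes."""
    cutoff = now - timedelta(minutes=minutes_to_keep)
    return [row for row in rows if row["Created"] >= cutoff]


# Hypothetical queue content, for illustration only.
now = datetime(2015, 12, 7, 12, 0)
rows = [
    {"Id": 1, "Created": datetime(2015, 12, 6, 11, 0)},   # more than a day old
    {"Id": 2, "Created": datetime(2015, 12, 7, 9, 0)},    # three hours old
    {"Id": 3, "Created": datetime(2015, 12, 7, 11, 50)},  # ten minutes old
]

kept_one_day = remove_old_rows(rows, now, minutes_to_keep=24 * 60)
kept_half_hour = remove_old_rows(rows, now, minutes_to_keep=30)

print([row["Id"] for row in kept_one_day])    # [2, 3]
print([row["Id"] for row in kept_half_hour])  # [3]
```

With a one-day policy the three-hour-old row stays in the table even though every instance has probably processed it long ago; a minutes-based policy removes it.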

AM I AFFECTED?

A handy SQL query helps to answer the question:

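-- One row per Sitecore instance: last processed stamp, pending events, processing delay.
SELECT
    SUBSTRING(p.[Key], 9, 100) AS [Instance],
    -- Properties keeps the stamp as a decimal number; convert it to the binary form SQL Server shows.
    CONVERT(BINARY(8), CAST(CAST(p.[Value] AS NVARCHAR(10)) AS INT)) AS [LastProcessedStamp],
    -- Number of events created after the last processed one (pending events).
    (SELECT COUNT(*) FROM [EventQueue]
      WHERE [Stamp] > CONVERT(INT, CAST(p.[Value] AS NVARCHAR(10)))) AS [TODO],
    (CASE WHEN (q.[Created] IS NULL) THEN
        -- No EventQueue row matches the stored stamp: fall back to the time span covered by the whole queue.
        CONVERT(VARCHAR(24), (SELECT MAX([Created]) - MIN([Created]) FROM [EventQueue]), 20)
    ELSE
        -- Newest event creation time minus last processed event creation time.
        CONVERT(VARCHAR(24), (SELECT TOP(1) [Created] FROM [EventQueue] ORDER BY [Stamp] DESC) - q.[Created], 20)
    END) AS [ProcessingDelay],
    SUBSTRING(q.[EventType], 0, CHARINDEX(',', q.[EventType])) AS [LastEventType],
    q.[InstanceName] AS [RaisedByInstance],
    q.[UserName] AS [RaisedByUser],
    q.[Created] AS [RaisedTime],
    q.[InstanceData] AS [LastEventData],
    q.[Id] AS [LastEqID]
FROM [Properties] p
-- The WHERE clause below drops rows without a Properties match, so a LEFT JOIN is sufficient.
LEFT JOIN [EventQueue] q
    ON q.[Stamp] = CONVERT(BINARY(8), CAST(CAST(p.[Value] AS NVARCHAR(10)) AS INT))
WHERE p.[Key] LIKE 'EQStamp%'
ORDER BY [TODO]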

Sitecore keeps the last processed event stamp in the Properties table as a decimal number, whereas SQL Server stores the stamp as a TimeStamp (shown as hex).
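The relation between the two representations can be sketched in a few lines of Python. This assumes the stamp is displayed as an 8-byte big-endian value, which is what CONVERT(BINARY(8), ...) produces in the query above; the helper names are made up for the example, and 566174 is the sample Properties value quoted in the "Event Queues. Why ? How ? When ?" post.

```python
def stamp_to_hex(value: int) -> str:
    """Format a decimal stamp the way an 8-byte SQL timestamp is displayed."""
    return "0x" + value.to_bytes(8, byteorder="big").hex().upper()


def hex_to_stamp(text: str) -> int:
    """Convert a hex stamp (with or without the 0x prefix) back to a decimal number."""
    return int(text, 16)


print(stamp_to_hex(566174))                # 0x000000000008A39E
print(hex_to_stamp("0x000000000008A39E"))  # 566174
```

This is handy when you want to look up a Properties value in the EventQueue table by hand, or the other way round.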

The query above tries to find the details of the last processed event, and the number of events that were created later (pending events).

The ProcessingDelay column is a good indication of how long it takes for a published item to appear on the live site. It is calculated as the time span between the creation time of the newest event in the queue and the creation time of the last processed event.

WHY COULD A DELAY APPEAR ?

Common sense rule - events are raised faster than they are processed.

Publishing or content editing can be performed in several threads, whereas event processing is done in a single Heartbeat thread.


It can also take some time to find & remove a modified item entry from large Sitecore caches that are accessed by many threads simultaneously.

Sitecore servers are expected to need some time to process all the events during massive publish operations.
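A back-of-the-envelope model shows how quickly a backlog builds up. It is a rough Python sketch that assumes constant rates; the function names and all the numbers are invented for illustration and are not measurements.

```python
def pending_after(seconds: float, raised_per_sec: float, processed_per_sec: float,
                  backlog: float = 0.0) -> float:
    """Pending events after `seconds`, assuming constant raise and processing rates."""
    return max(0.0, backlog + (raised_per_sec - processed_per_sec) * seconds)


def drain_seconds(backlog: float, processed_per_sec: float) -> float:
    """Time needed to work through a backlog once no new events are raised."""
    return backlog / processed_per_sec


# Hypothetical numbers: a one-minute publish raises 500 events per second,
# while the single processing thread handles 200 events per second.
backlog = pending_after(60, raised_per_sec=500, processed_per_sec=200)
print(backlog)                      # 18000.0
print(drain_seconds(backlog, 200))  # 90.0
```

In this made-up scenario the live site keeps catching up for a minute and a half after the publish has finished.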


HOW TO FIGHT BACK ?

Raise fewer events, write less data, stop storing legacy data in the EventQueue.

In the next articles I will show basic techniques for eliminating processing delays.

Wednesday 2 December 2015

Sitecore Heartbeat


What is Heartbeat ?

Heartbeat is a vital Sitecore CMS timer that lets subscribed code be executed periodically.


Whenever it rings, a list of due subscribers is built & invoked one-by-one.

If any of the subscribers takes ages to execute, all the others have nothing to do but wait.


By whom is it used ?

[Screenshot: usages found via ILSpy]
Core CMS features rely on the mechanism:

  • Indexing
  • Publishing
  • Sitecore Jobs
  • Event Queue processing

However, custom code is welcome to use it as well. You can either subscribe to the static 'Sitecore.Services.Heartbeat.Beat' event, or create a new instance of the 'Sitecore.Services.AlarmClock' class, which internally uses the Beat event as well.

How is it implemented?


A thread named "Heartbeat" is created during CMS start. You might have noticed that some Sitecore log entries start with 'Heartbeat' - those were written from this background timer thread.


It does the following in an endless loop:

  1. Sleep a few seconds ( specified by 'HeartbeatInterval' setting )
  2. Get all subscribers for 'Heartbeat.Beat' event
  3. Invoke them one-by-one
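The three steps above can be modelled with a short Python sketch. It is a toy model, not Sitecore's implementation: the class, the handler names and the fake clock are all made up, and the loop is limited to a few beats so that the example terminates. The fake clock lets the example show timing without really sleeping.

```python
import time
from typing import Callable, List


class Heartbeat:
    """Toy single-threaded timer that invokes its subscribers on every beat."""

    def __init__(self, interval: float, sleep: Callable[[float], None] = time.sleep):
        self.interval = interval
        self.sleep = sleep
        self.subscribers: List[Callable[[], None]] = []

    def subscribe(self, handler: Callable[[], None]) -> None:
        self.subscribers.append(handler)

    def run(self, beats: int) -> None:
        for _ in range(beats):                      # the real loop never ends
            self.sleep(self.interval)               # 1. sleep for the interval
            for handler in list(self.subscribers):  # 2. get all subscribers
                handler()                           # 3. invoke them one by one


# Fake clock: "sleeping" just advances a counter.
clock = [0.0]

def fake_sleep(seconds: float) -> None:
    clock[0] += seconds

log: List[float] = []

def slow_subscriber() -> None:
    fake_sleep(10)        # pretends to work for 10 seconds on the heartbeat thread

def process_event_queue() -> None:
    log.append(clock[0])  # records when it finally got its turn

heartbeat = Heartbeat(interval=2, sleep=fake_sleep)
heartbeat.subscribe(slow_subscriber)
heartbeat.subscribe(process_event_queue)
heartbeat.run(beats=3)

print(log)  # [12.0, 24.0, 36.0]
```

With a 2-second interval the second handler would normally run at 2, 4 and 6 seconds; one slow neighbour stretches that to 12, 24 and 36.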

Possible pitfalls

If execution of a subscriber takes a while, all other subscribers must wait.

As a result, vital CMS mechanisms (Event Queues, Indexing) have to wait for their turn as well =\

You can observe the situation by filtering Sitecore logs by the 'Heartbeat' text -> gaps might appear in the time panel.

So if you are tempted to create a new 'AlarmClock' that executes heavy logic, please don't do it.

Consider creating a Sitecore Job instead.

Tuesday 1 December 2015

Event Queues. Why ? How ? When ?


CMS documentation:
Event Queues [ aka EQs ] are needed to ensure data and cache coherence and support communication between Sitecore instances.

In simple words - all the data changes are there.

No matter if you do publishing, content editing, or any other modifications, Sitecore will record those changes into Event Queue, and spread them among other instances.

WHY? 

Use-case:
  • Andy visits panda site page. He sees 'Panda in zoo' article title
  • Sitecore renders the requested page:
        • fetches needed items from DB
        • puts them to cache




  • If Bob visits the same panda page, Sitecore would NOT read data from the database, as all the needed info is already in the cache!

  • So far so good. But here comes the content change:
    • Content author Leo modifies items for panda page, so that article title is 'Panda is Free'
    • Leo publishes changes, and data gets modified in web database
    • Leo goes to live site and requests panda page
    • He sees 'Panda is Free' now
    How come ? Sitecore cached all the item data needed to build the page. As a result Bob's request was served without a single SQL query fired. . .
    However Leo saw modified content, not cached one.


    The event queue mechanism makes it possible to apply changes made on one instance to the others.
    In our example, the live server processed the changes introduced by the publishing instance and removed obsolete data from its cache before Leo's request.

    HOW? WHEN?

    By default, each Sitecore instance reads fresh events raised by other instances every 2 seconds and applies them locally.

    All Sitecore content databases have their own 'EventQueue' table, which stores the following data:

    • InstanceData - huggeee JSON document with Event details !
    • EventType (e.g. item saved, property changed, publish ended)
    • Who is responsible for doing that ( UserName )
    • Which Sitecore instance did that ( InstanceName )
    • When it was done ( Created )
    • Stamp ( monotonically increasing value )

    The stamp last processed by the previous iteration is stored in the Properties table (key-value storage) under the key EQStamp_InstanceName ( e.g. EQSTAMP_WS-NMI2-SC72REV140526 ) as a plain number (e.g. 566174).

    Sitecore CMS reads fresh events with a SQL query similar to:

    SELECT * FROM EventQueue WHERE InstanceName!=CurrentInstance AND STAMP>lastProcessed
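The same polling logic can be modelled in Python. This is only a sketch of the idea: the data class, the function, the instance names (CM, CD1) and the rule that the stored stamp advances to the newest event handled are assumptions made for illustration, not Sitecore's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueuedEvent:
    stamp: int
    instance_name: str
    event_type: str


def read_fresh_events(queue: List[QueuedEvent], properties: Dict[str, int],
                      current_instance: str) -> List[QueuedEvent]:
    """Return events raised by other instances after the last processed stamp."""
    key = "EQStamp_" + current_instance
    last_processed = properties.get(key, 0)
    fresh = sorted(
        (e for e in queue
         if e.instance_name != current_instance and e.stamp > last_processed),
        key=lambda e: e.stamp,
    )
    if fresh:
        # Assumption: remember the newest stamp we have handled.
        properties[key] = fresh[-1].stamp
    return fresh


# Hypothetical queue content and instance names.
queue = [
    QueuedEvent(1, "CM", "item saved"),
    QueuedEvent(2, "CD1", "item saved"),
    QueuedEvent(3, "CM", "publish ended"),
]
properties = {"EQStamp_CD1": 1}

first_poll = read_fresh_events(queue, properties, "CD1")
second_poll = read_fresh_events(queue, properties, "CD1")

print([e.stamp for e in first_poll])   # [3]
print(second_poll)                     # []
print(properties["EQStamp_CD1"])       # 3
```

The second poll returns nothing because the stored stamp has moved past every event raised by the other instances.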

    HOW LONG SHOULD AN EVENT QUEUE ROW BE KEPT ?

    Once all the Sitecore instances have processed a given event, it becomes useless.
    It makes sense to remove event queue rows shortly after they have been processed, to make SQL Server's life simpler.
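One way to reason about which rows are already useless is to compare their stamps with the slowest instance. The Python sketch below assumes that every instance stores its last processed stamp as described above and that a row is safe to drop once its stamp is not greater than the smallest of those values; the function name and the sample numbers are invented for the example.

```python
from typing import Dict, List


def removable_stamps(queue_stamps: List[int], last_processed: Dict[str, int]) -> List[int]:
    """Stamps that every known instance has already moved past."""
    if not last_processed:
        return []                           # no instance information: remove nothing
    slowest = min(last_processed.values())  # the instance that is furthest behind
    return [stamp for stamp in queue_stamps if stamp <= slowest]


# Hypothetical values, for illustration only.
last_processed = {"CM": 566174, "CD1": 566170, "CD2": 566172}
queue_stamps = [566168, 566169, 566170, 566171, 566173, 566174]

print(removable_stamps(queue_stamps, last_processed))  # [566168, 566169, 566170]
```

The slowest instance (CD1 here) decides how much of the queue still has to be kept.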

    WHAT PROBLEMS COULD BE CAUSED BY EQ?

    If events are raised faster than they are processed, new content shows up with a delay.

    If a lot of already processed events are kept in the EventQueue table, SQL performance drops, and new events are processed more slowly.

    In the next chapter I will show basic ways of troubleshooting these kinds of issues.