Thursday, 28 January 2016

Event Queues Part Duo

IMPROVING EVENT QUEUE PROCESSING SPEED

The number one problem is a high number of records in the EventQueue table. As we already know, once an event has been processed by all the Sitecore instances, it becomes useless and can be removed.

If the number of records gets too high, the cleanup procedure can fail with a timeout exception, causing MSSQL to bend the knee.

Determine event statistics by type

The following SQL query shows 'hot' event types and highlights which optimizations to prioritize:

SELECT MIN([Created]) AS [Earliest],
       MAX([Created]) AS [Latest],
       DATEDIFF(day, MIN([Created]), MAX([Created])) AS [Days between min and max],
       COUNT(1) AS [Total],
       [EventType]
  FROM [EventQueue]
 GROUP BY [EventType]

'PropertyChangedRemoteEvent' is usually the winner for the core database.
Let's see how we can reduce these numbers.

Reducing number of events

Switching the index property store to the file system

The Sitecore Content Search engine stores indexing metadata inside the database property store.

Content Search needs reliable storage for its metadata (e.g. the last indexed item) to avoid re-indexing data that has already been indexed.

This data does not need to be shared or synced between servers, so there is no need to sync changes to it either.

Using database properties could be costly in terms of performance in this case.
The article 'Sitecore Content Search may cause performance issues due to excessive updates of the EventQueue table' gives a solution for changing the property store.

The #420602 performance optimization (not raising events when code is executed under EventDisabler) is addressed in 7.2 Update-5.

Logging in to Sitecore interfaces with the 'Remember Me' flag

If the 'Remember Me' flag is not selected when logging in to Sitecore interfaces, a sliding expiration policy is applied, and the user ticket is prolonged on every request.

Since the Sitecore Client security mechanism has to track and share information about simultaneously logged-in users between servers, it has to use properties as well =\

Every ticket prolongation updates a value inside the Properties table and thus provokes a 'property changed' event.

Performance optimization #443748, introduced in CMS 7.2 Update 6, noticeably reduces the number of database property calls.

To sum up, always select the 'Remember Me' flag.

Reducing number of publish operations

Every publish operation produces 'publish:begin' and 'publish:end' events, and also publishes the 'languages' items.

Even when there are no actual content items to publish, a set of events is produced and the system language items are published.

If you are not going to add more languages to your solution, you can comment out the 'AddLanguagesToQueue' processor inside the publish pipeline.

You can also consider reducing the frequency of PublishingAgent executions.

Write less data into EventQueue

MSSQL performs better when rows hold less data.

Optimization #422510, which allows you to specify which changes are added to the event data, is available from CMS 7.2 Update 3.

In short - whenever an item is saved, a list of modified fields with their values is added to the EventQueue.

If you update a field holding 500 KB of HTML text, that text is serialized and sent to the database.

Needless to say, the database server would appreciate it if we wrote less data.
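To get a feel for the difference in row size, here is a rough Python sketch. It is only an illustration: the JSON layout, the `event_payload` helper, and the item id are made up for this example and are not the real Sitecore event format.

```python
import json


def event_payload(item_id, changes, include_values=True):
    """Serialize a hypothetical 'item saved' event body to JSON.

    `changes` maps field names to new values. The structure is invented
    for illustration and does not mirror Sitecore's InstanceData.
    """
    if include_values:
        fields = [{"field": name, "value": value} for name, value in changes.items()]
    else:
        # Keep only the names of the modified fields, drop the values.
        fields = [{"field": name} for name in changes]
    return json.dumps({"item": item_id, "changes": fields})


# A small field plus a field with roughly 500 KB of HTML text.
changes = {"Title": "Panda is Free", "Body": "<p>" + "x" * 500_000 + "</p>"}

full = event_payload("item-1", changes)
slim = event_payload("item-1", changes, include_values=False)

print(len(full), "characters with field values")
print(len(slim), "characters with field names only")
```

The row that carries the field values is larger than the HTML itself, while the names-only row stays well under a hundred characters.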

I will write a separate article describing in more detail how to configure this.

Aggressive cleanup policy

The stock 'CleanupEventQueue' agent was improved to perform cleanup in a more aggressive way starting from 7.2 Update 4 (#392673).

For prior CMS versions one can use a reworked stock cleanup task.

How to pick the optimal interval

The interval to keep should be longer than the longest-running operation that uses the EventQueue (e.g. the onPublishEnd indexing strategy).

For example, content indexes are populated with freshly published data when publish:end is raised.

One should keep the EventQueue rows produced by publishing until indexing is over on all servers.
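The rule of thumb above can be written down as a tiny Python helper. This is just a sketch: the `interval_to_keep` name, the sample durations, and the safety factor of 2 are my assumptions, not Sitecore settings. Measure your own longest operation before picking a value.

```python
import math
from datetime import timedelta


def interval_to_keep(operation_durations, safety_factor=2.0):
    """Return the number of whole minutes to keep EventQueue rows.

    Takes the longest observed operation that relies on the EventQueue
    and multiplies it by an (assumed) safety factor.
    """
    longest = max(operation_durations)
    minutes = longest.total_seconds() * safety_factor / 60
    return math.ceil(minutes)


# Hypothetical measurements: three index rebuilds triggered by publish:end.
observed = [timedelta(minutes=12), timedelta(minutes=45), timedelta(minutes=3)]

print(interval_to_keep(observed), "minutes to keep")
```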

Wednesday, 27 January 2016

How to share information between Sitecore Instances

What is the challenge?

Let's say one wants a set of Sitecore instances to modify the same data (key-value pairs), e.g. the number of people who have filled in a form, the last task execution time, and so on.
The storage must be persistent, so that an application recycle does not cause any data loss.
The storage must handle high load, so in-memory caching is required.

Can we achieve that without customization?
Can we just use OOB Sitecore functionality?

Technical sketch

  • It makes sense to create a key-value table inside the database.
  • The caching layer must respect data changes and have a defined size limit.
  • All interactions should go through a database provider, so that one can switch the database engine and create tests if needed.


How to do it inside Sitecore?

Sitecore CMS has a Database Properties mechanism that encapsulates all of the above:

Sitecore.Configuration.Factory.GetDatabase("core").Properties["customKey"] = value;

This mechanism is used internally, e.g. to maintain Sitecore Client user tickets as well as publishing and indexing metadata.

The Database Properties logic lives in the 'Sitecore.Data.DataProviders.Sql.SqlDataProvider' class ('GetPropertyCore', 'SetPropertyCore', and 'RemovePropertyCore' methods), so please feel free to check the exact implementation with any reverse-engineering tool.


Implementation details

Each Sitecore database has a 'Properties' table that represents the key-value storage.

Interactions with the database are done through a DataProvider (defined in web.config under the dataProviders node).

A 'database property changed' event is added to the EventQueue once a property value is changed. As a result, other Sitecore instances evict the modified property from their cache and reload it from the database on the next call. The following SQL can be used to check what is written into the EQ:

SELECT * FROM [EventQueue] WHERE [InstanceType] LIKE '%PropertyChangedRemoteEvent%'

The property cache size is controlled by the hidden 'Caching.DefaultPropertyCacheSize' setting and equals 500KB by default.
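The flow described in this section can be modelled in a few lines of Python. This is a toy simulation, not Sitecore code: `SharedDatabase`, `Instance`, and all the method names are invented for the example, and the cache has no size limit here.

```python
class SharedDatabase:
    """Stands in for one database: a Properties table plus an EventQueue table."""

    def __init__(self):
        self.properties = {}
        self.event_queue = []  # rows of (stamp, instance_name, property_key)
        self.reads = 0         # how many times the Properties table was read


class Instance:
    """One (simulated) Sitecore instance with its own in-memory property cache."""

    def __init__(self, name, db):
        self.name = name
        self.db = db
        self.cache = {}
        self.last_stamp = 0

    def set_property(self, key, value):
        self.db.properties[key] = value
        self.cache[key] = value
        # Every change costs one extra EventQueue row.
        stamp = len(self.db.event_queue) + 1
        self.db.event_queue.append((stamp, self.name, key))

    def get_property(self, key):
        if key not in self.cache:
            self.db.reads += 1
            self.cache[key] = self.db.properties.get(key)
        return self.cache[key]

    def process_events(self):
        """Evict properties changed by other instances since the last run."""
        for stamp, raised_by, key in self.db.event_queue:
            if stamp > self.last_stamp:
                if raised_by != self.name:
                    self.cache.pop(key, None)
                self.last_stamp = stamp


db = SharedDatabase()
cm, cd = Instance("CM", db), Instance("CD", db)

cm.set_property("formsFilled", 1)
print(cd.get_property("formsFilled"))  # loaded from the database, then cached
cm.set_property("formsFilled", 2)
print(cd.get_property("formsFilled"))  # still the cached value
cd.process_events()
print(cd.get_property("formsFilled"))  # evicted, so reloaded from the database
print(len(db.event_queue), "EventQueue rows for two property changes")
```

Until the second instance processes the queue it keeps serving the stale cached value, which is exactly why the event is needed.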

Summary

Sitecore provides key-value storage synchronized across all instances out of the box.
The price of forwarding a modification from one instance to the others is an extra row in the EventQueue.
Frequent property modifications can produce a large number of EventQueue entries, so please use the feature wisely.

Monday, 7 December 2015

Event Queues Part Uno

WHY SHOULD I KEEP AS FEW RECORDS AS POSSIBLE?

The CMS Tuning Guide [Chapter 2.3] recommends keeping fewer than 1000 records in the Event Queue in an ideal world.

The more records there are in the EventQueue table, the more time and resources it takes SQL Server to find fresh rows raised by other instances.

Given that the SQL query that reads fresh events is executed every 2 seconds by every Sitecore instance, keeping obsolete entries in the database could lead to MSSQL overheating.

A symptom would be SQL timeout errors in the Sitecore logs:

Message: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

Source: Sitecore.Kernel
   at Sitecore.Data.DataProviders.Sql.DataProviderCommand.ExecuteReader()
.....
   at Sitecore.Eventing.EventQueue.ProcessEvents(Action`2 handler)
   at Sitecore.Eventing.EventProvider.RaiseQueuedEvents()

WHO CLEANS UP?

Sitecore ships with the Sitecore.Tasks.CleanupEventQueue agent defined in web.config.

It can remove records that are at least one day old.

A more aggressive cleanup policy was implemented in 7.2 Update-4 [ref. #392673]. It allows you to specify the number of minutes to keep. Yippee!

Although one can always configure a custom SQL Agent job to clean up the Event Queue manually, the stock Sitecore mechanism does its job pretty well.

AM I AFFECTED?

A handy SQL query helps to answer the question:

SELECT SUBSTRING(p.[Key],9,100) AS [Instance], CONVERT(BINARY(8), CAST(CAST(p.[Value] AS NVARCHAR(10)) AS int )) AS [LastProcessedStamp],
(SELECT COUNT(*) FROM [EventQueue] WHERE [Stamp] > CONVERT(INT, CAST(p.[Value] AS NVARCHAR(8)))) AS [TODO],
(CASE WHEN (q.[Created] is null) THEN
(
CONVERT(VARCHAR(24),(SELECT MAX([Created])-MIN([Created]) FROM EventQueue),20)
)
ELSE
CONVERT(VARCHAR(24),(SELECT top(1) [Created] AS TopCreated FROM EventQueue order by [Stamp] desc) - (q.[Created]),20)
end) AS [ProcessingDelay],
 SUBSTRING(q.[EventType],0, CHARINDEX(',',q.[EventType])) AS [LastEventType],
 q.[InstanceName] as [RaisedByInstance],
 q.[UserName] as [RaisedByUser],
 q.[Created] as [RaisedTime],
 q.[InstanceData] as [LastEventData],
 q.[Id] as [LastEqID]

FROM  Properties p 
FULL join EventQueue q
ON q.[Stamp] = CONVERT(BINARY(8), CAST(CAST(p.[Value] AS NVARCHAR(10)) AS int ))
WHERE p.[Key] LIKE 'EQStamp%'
order by TODO

Sitecore keeps the last processed event stamp in the Properties table as a decimal number, whereas SQL Server stores the stamp as a TimeStamp (shown as hex).

The SQL query finds the details of the last processed event and the number of events that were created after it (pending events).

The ProcessingDelay column is a good measure of how much time it would take for a published item to appear on the live site. It is calculated as the time span between the creation time of the newest event in the queue and the creation time of the last processed event.
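The decimal-versus-hex difference mentioned above is easy to reproduce outside SQL. The Python sketch below converts a decimal stamp into the zero-padded, big-endian form that an 8-byte binary value is displayed in, and back. The helper names are mine, and the sample value 566174 is simply an example stamp.

```python
def stamp_to_hex(decimal_stamp):
    """Render a decimal stamp as an 8-byte, big-endian hex string."""
    return "0x" + int(decimal_stamp).to_bytes(8, byteorder="big").hex().upper()


def hex_to_stamp(hex_stamp):
    """Convert the hex form back into the plain number kept in Properties."""
    return int(hex_stamp, 16)


as_hex = stamp_to_hex("566174")
print(as_hex)
print(hex_to_stamp(as_hex))
```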

WHY COULD A DELAY APPEAR?

Common-sense rule - events are raised faster than they are processed.

Publishing or content editing can be performed in several threads, whereas event processing is done in a single Heartbeat thread.

It can also take some time to find and remove a modified item's entries from large Sitecore caches that are accessed by many threads simultaneously.

It is expected that Sitecore servers need some time to process all the events during massive publish operations.

HOW TO FIGHT BACK?

Raise fewer events, write less data, and stop storing legacy data in the EventQueue.

In the next articles I will show basic techniques for eliminating processing delays.

Wednesday, 2 December 2015

Sitecore Heartbeat


What is Heartbeat?

Heartbeat is a vital Sitecore CMS timer that enables subscribed code to be executed periodically.

Whenever it rings, a list of due subscribers is built and invoked one by one.

If any subscriber takes ages to execute, all the others have nothing to do but wait.


By whom is it used?

[Screenshot: usages found via ILSpy]
Core CMS features rely on the mechanism:

  • Indexing
  • Publishing
  • Sitecore Jobs
  • Event Queue processing

However, custom code is welcome to use it as well. You can either subscribe to the static 'Sitecore.Services.Heartbeat.Beat' event or create a new instance of the 'Sitecore.Services.AlarmClock' class, which internally uses the Beat event as well.

How is it implemented?


A thread named "Heartbeat" is created during CMS startup. You might have noticed that some Sitecore log entries start with 'Heartbeat' - those were raised by the background timer.

It does the following in an endless loop:

  1. Sleep for a few seconds (specified by the 'HeartbeatInterval' setting)
  2. Get all subscribers for 'Heartbeat.Beat' event
  3. Invoke them one-by-one

Possible pitfalls

If the execution of one subscriber takes a while, all the other subscribers must wait.

As a result, vital CMS mechanisms (Event Queues, Indexing) have to wait for their invocation as well =\
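The loop and its pitfall can be sketched in Python with a simulated clock instead of real sleeping. Everything here is a toy model: the `Heartbeat` class, the subscriber names, and the durations (a 2-second interval, a 30-second heavy subscriber) are assumptions chosen for the illustration, not Sitecore internals.

```python
class Heartbeat:
    """A toy single-threaded timer that invokes subscribers one by one."""

    def __init__(self, interval):
        self.interval = interval
        self.subscribers = []  # (name, duration in simulated seconds)
        self.clock = 0         # simulated time, in seconds
        self.started = []      # (name, time at which the subscriber started)

    def subscribe(self, name, duration):
        self.subscribers.append((name, duration))

    def beat(self):
        # 1. Sleep for the configured interval.
        self.clock += self.interval
        # 2. Get all subscribers and 3. invoke them one by one.
        for name, duration in list(self.subscribers):
            self.started.append((name, self.clock))
            # The single thread is busy until this subscriber returns.
            self.clock += duration


heartbeat = Heartbeat(interval=2)
heartbeat.subscribe("HeavyCustomLogic", duration=30)
heartbeat.subscribe("EventQueueProcessing", duration=1)

for _ in range(2):
    heartbeat.beat()

event_queue_starts = [t for name, t in heartbeat.started if name == "EventQueueProcessing"]
print(event_queue_starts)
```

With the heavy subscriber in place, event queue processing starts more than 30 simulated seconds apart instead of every couple of seconds.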

You can observe this by filtering the Sitecore logs with the 'Heartbeat' text filter: gaps may appear in the time panel.

Thus, if you intend to create a new 'AlarmClock' that executes heavy logic, please don't do it.

Consider creating a Sitecore Job instead.

Tuesday, 1 December 2015

Event Queues. Why ? How ? When ?


CMS documentation:
Event Queues [ aka EQs ] are needed to ensure data and cache coherence and support communication between Sitecore instances.

In simple words - all the data changes are there.

Whether you publish, edit content, or make any other modification, Sitecore records those changes in the Event Queue and spreads them to the other instances.

WHY? 

Use-case:
  • Andy visits a page on the panda site. He sees the article title 'Panda in zoo'
  • Sitecore renders the requested page:
        • fetches the needed items from the DB
        • puts them into the cache
  • If Bob visits the same panda page, Sitecore does NOT read data from the database, as all the needed info is in the cache!

  • So far so good. But here comes the content change:
    • Content author Leo modifies the items for the panda page so that the article title is 'Panda is Free'
    • Leo publishes the changes, and the data gets modified in the web database
    • Leo goes to the live site and requests the panda page
    • He sees that 'Panda is Free' now
    How come? Sitecore cached all the item data needed to build the page. As a result, Bob's request was served without a single SQL query being fired...
    However, Leo saw the modified content, not the cached one.


    The Event Queue mechanism allows changes made on one instance to be applied on the others.
    In our example, the live server had processed the changes introduced by the publishing instance and removed the obsolete data from its cache before Leo's request.

    HOW? WHEN?

    By default, every 2 seconds each Sitecore instance reads fresh events raised by other instances and applies them locally.

    All Sitecore content databases have their own 'EventQueue' table, which stores the following data:

    • InstanceData - a huggeee JSON document with event details!
    • EventType (e.g. item saved, property changed, publish ended)
    • Who is responsible for doing that ( UserName )
    • Which Sitecore instance did that ( InstanceName )
    • When it was done ( Created )
    • Stamp ( monotonically increasing value )

    The last stamp processed by the previous iteration is stored in the Properties table (key-value storage) under the key EQStamp_InstanceName (e.g. EQSTAMP_WS-NMI2-SC72REV140526) as a plain number (e.g. 566174).

    Sitecore CMS reads fresh events with a SQL query similar to this one:

    SELECT * FROM [EventQueue] WHERE [InstanceName] <> @CurrentInstance AND [Stamp] > @LastProcessed
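The same filter can be sketched in Python to show how the last processed stamp moves forward on each polling iteration. This is a simplified model under my own assumptions: `EventRow`, `read_fresh_events`, and `poll_once` are invented names, the queue is a plain list, and the instance names and stamps are sample values.

```python
from dataclasses import dataclass


@dataclass
class EventRow:
    stamp: int
    instance_name: str
    event_type: str


def read_fresh_events(event_queue, current_instance, last_processed):
    """Rows raised by other instances with a stamp above the last processed one."""
    fresh = [
        row for row in event_queue
        if row.instance_name != current_instance and row.stamp > last_processed
    ]
    return sorted(fresh, key=lambda row: row.stamp)


def poll_once(event_queue, current_instance, last_processed, handler):
    """Apply fresh events locally and return the new last processed stamp."""
    for row in read_fresh_events(event_queue, current_instance, last_processed):
        handler(row)
        last_processed = row.stamp
    return last_processed


queue = [
    EventRow(566173, "CM", "item saved"),
    EventRow(566174, "CM", "publish ended"),
    EventRow(566175, "CD1", "property changed"),
    EventRow(566176, "CM", "item saved"),
]

handled = []
new_stamp = poll_once(queue, "CD1", 566174, lambda row: handled.append(row.event_type))
print(handled, new_stamp)
```

In this run the instance named CD1 skips its own row and the rows it has already processed, handles the one remaining event, and remembers that event's stamp for the next iteration.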

    HOW LONG SHOULD AN EVENT QUEUE ROW BE KEPT?

    Once all the Sitecore instances have processed a given event, it becomes useless.
    It makes sense to remove Event Queue rows shortly after they have been processed to make SQL Server's life simpler.

    WHAT PROBLEMS COULD BE CAUSED BY EQ?

    If events are fired faster than they are processed, a delay in showing new content can be introduced.

    If a lot of already processed events are stored in the EventQueue table, SQL performance drops and new events are processed more slowly.

    In the next chapter I will show basic ways of troubleshooting these kinds of issues.