Friday 18 March 2016

Profiling Sitecore CMS

Sitecore CMS fetches information/items from underlying database, and caches it locally to speed up further reads, thus the overall system performance depends on:
  1. Database Performance
  2. Sitecore Cache Sizes
  3. CPU power ( how fast can processor find data in caches )
  4. The volume of data processed by target operation
In this chapter we`ll take a look on needed steps to find the cause of CMS slowness in the particular case.

Scenario

Lets` take a look on the scenario 'Workbox' application loading is slow.
Content editors complain that it takes a few years to open a Workbox. Since their life is too short, they cannot wait for that long =\
We are going to profile .NET Sitecore code to find which code takes too much of Content Editors life.

Action plan

This are general steops we are going to do:
  • Configure DotTrace Performance Profiler Tracing Session
  • Configure a data collector set in performance monitor 
  • Capture a full memory dump file of the Sitecore process before the test ( via Task Manager )
  • Start data collector set captuing & DotTrace Profile
  • Reproduce an issue ( Opening workbox )
  • Get Snapshot'n'Wait
  • Stop recording performance counters
  • Capture a full memory dump file of the Sitecore process after the test

Configuring DotTrace Profiler

Lets select 'Local -> IIS -> Tracing mode' in the demo.

DotTrace configuration

Important

  1. 'Start immediately' MUST be UNCHECKED.
  2. The shorter profile capturing session is, the less noise from other Sitecore activities would be captured -> the more accurate data we get
  3. A same scenario is to be captured at least 2 times in a different profiles :
    1. First call would hit 'cold' caches without data so it is expected to be quite slow =\
    2. Second call would hit caches with needed data, and is expected to be faster

Configure a data collector set

Extended counter list
The general steps to create a data collector set manually can be found in 'Create a Data Collector Set Manually' article.

We are about to select following counters:
  • Processor Information (_Total) \ Processor Time - see the overall CPU load by all processes on the target machine
  • Process(w3wp):
    • % Processor Time - how much CPU is used by Sitecore Process
    • Virtual Bytes 
    • Thread Count
    • % User Time
Those are main ones for now, however you can select more counters for better vision ( as per image)

Preparing for a test

Login to Sitecore Shell using test user. When user is logged in to Sitecore, capture a full memory snapshot of w3wp.exe process. We will later use it to :
  • Measure initial Sitecore Cache Size state
  • Find runing/scheduled Sitecore Jobs
  • Find most CPU-consuming threads
  • Objects by type distribution 

Starting profiling


The video shows exact steps needed:


Things to remember

  • It is important to start profiling immediately before reproducing an issue
  • Stop profiling immediately after the reproduction
  • A same scenario is to be profiled at least 2 times ( with cold/empty and filled caches )

Finishing test

After the testing session is over, lets collect more data:
  1. Another full memory dump file. It will be used later to compare with base one
  2. Following details via Sitecore Support Package Generator:
    1. Sitecore logs that cover the session
    2. Sitecore configuration files
    3. .NET framework files — required for working with process memory dumps
    4. Machine's hardware configuration
    5. Windows event logs

Results

The workbox opened in ~15 seconds, whereas a vast majority of time spend for resolving item security:
High number of operations is detected

The security resolution for a single item takes less than 3 ms, whereas overall amount of time is caused by huge number of security resolutions.

What can be done

Improving server hardware

  • Data collector set shows that memory usage is low, and does not hit number of physical RAM installed, thus improving server RAM would not have desired affect
  • Improve CPU
    • Data collector does not show high CPU =\
    • Lets assume more powerful CPU can reduce operation time from 3 ms to 2.5, so instead of 15 seconds, we will get 12.5 seconds. This is not the affect we are looking for.
  • Give more processor time to a thread
    • We can try to reduce the number of threads in Sitecore App, however in this scenario the number of threads is low, and will unlikely bring the desired affect

Reducing the number of security resolutions

Since the timing per one security resolution looks low, we can focus on reducing number of security resolutions. A possible option would be to rewrite SQL query to return only first 100 items ordered by date in workflow.

In next chapters we will take a closer look on analyzing memory dump files, and checking the cache usage statistics & distribution of objects by type.

Monday 7 March 2016

Profiling item fetch database performance


Any schema-change decision must be driven by data/statistics analysis.
One needs to test same functionality before, and after implementing the suggested change to see the real outcome.

A SQL Server Profiler provides statistics on how much resources were spent to process the SQL query.

An analysis of the statistics would actually show if any improvements were introduced, and if it is worth implementing the change.

Test scenario

  1. Create database backup
  2. Rebuild SQL Indexes to ensure low fragmentation rates
  3. Use stock Sitecore Testing DB performance page to collect the baseline first
  4. Implement database schema changes
  5. Ensure that SQL indexes are in a healthy state
  6. Restart SQL, and Sitecore to clean all cache layers
  7. Use stock Sitecore Testing DB performance page to collect second profile

Configuring Profiler

The SQL Server Profiler is a super-powerful tool that allows to capture dozen of various metrics.
We`ll use only base ones ( CPU, Reads, Writes, and Duration) in this session, though.
  1. Start SQL Server Profiler application
  2. Select 'New Trace' ( CTRL+N hotkey )
  3. Open 'Event Selection' tab and apply filters as per screenshot
  4. Ensure SQL Indexes on Items, and Field-related tables are in a healthy state, and start the test
Profiler configuration
After the test is finished stop tracing, and save performance trace as a table:

Once data is saved, implement needed database changes, restart both Sitecore, and MSSQL Server.

Now we can repeat same set of operations, and record a new profile.

Comparing resulting profiles

A handly SQL query can be used to get aggregated statistics:

SELECT 'Optimized' as 'Where', 
      count(1) as TotalUp
      ,sum(op.[CPU]) as TotalCPU
      ,sum(op.[Reads]) as TotalReads
 FROM [master].[dbo].[big_Optimized] as op
 WHERE op.CPU IS NOT NULL

The results in my local environment are:
  • 35% less reads
  • 25 % less CPU
Please keep in mind that test results much depend on variety of data, and custom logic implemented in solution.

An end decision must be made after a few rounds of testing of following functionality:
  • Publishing
  • Live site navigation
  • Content Editing
The best performance optimization is to cache as much data inside Sitecore Caches as possible, thereby minimizing the number of requests to database.

We`ll cover Sitecore Caches configuration in next article

Sunday 6 March 2016

Speeding up item loading in Sitecore

Execution plan analysis

An execution plan for selecting an item by ID would have:
  1. IndexSeek for field-related tables, and further RID lookup ( or DAC_Clustered Key lookup).
  2. An ugly execution plan =\
  3. A long SQL statement ( at least 1000 characters )
  4. A sort operation (could be moved to Sitecore) costs ~10 %

Taking into account:
  1. Sitecore fetches items by ID only ( neither language, nor version is applied as a filter )
  2. Sitecore issues far more SQL read than write statements
  3. Relational database resources are typically more expensive in terms of scalability
There seems to be some room for improvements.

Possible improvements

  • Changing index types to clustered for Unversioned, Versioned, and Shared fields tables
    • NonClustered Index Seek with Key lookup inner join would be replaced with clustered index seek
    • Execution plan becomes much straightforward
  • Reworking items table indexes
    • Making ParentID index clustered to order data
    • OR making ID clustered index to enable fastest item fetch
  • Moving sort operation to Sitecore side
  • Using View instead of firing long ugly SQL all the time
Query plan after modifying indexes

Concerns

  • Inserts would take more time
    • The difference in single item save operation would not be noticeable for Content Editors, since a vast majority of total item save time is spend in 'item:saved' event handlers.
  • Publishing would be slower
    • Publish reads an item from both source and target databases first before making a decision whether to insert/update/remove item version. 
    • Having much faster reads could even improve the overall publishing speed.

Conclusion

A series of tests are required to get the understanding if it is worth to introduce those improvements in particular case.

In the next chapter we`ll take a look on basic technique how to perform SQL Profiling of Sitecore GetItem opearition.

How does Sitecore read item data ?

Where is content stored ?

All the item content/data is stored inside Shared, Unversioned, and Versioned field tables depending on field sharing type.
Items table has metadata about an item (IDnameparentIDtemplate).
Sitecore unions item metadata with actual fields content to construct an item instance.

How an item is read ?

'Sitecore.Data.SqlServer.SqlServerDataProvider.LoadItemDefinitions' method is responsible fetching data.
It raises a SQL query that unions aforementioned tables by configurable ItemID condition:
Item-related tables
SELECT [ItemId], [Order], [Version], [Language], [Name], [Value], [FieldId], [MasterID], [ParentID]
FROM (
SELECT [Id] as [ItemId], 0 as [Order], 0 as [Version], '' as [Language], [Name], '' as [Value], [TemplateID] as [FieldId], [MasterID], [ParentID]   FROM [Items] {A}
UNION ALL
SELECT [ParentId] as [ItemId], 1 as [Order], 0 as [Version], '' as [Language], NULL as [Name], '', NULL, NULL, [Id]   FROM [Items] {B} 
UNION ALL
SELECT [ItemId], 2 as [Order], 0 AS [Version], '' as [Language], NULL as [Name], [Value], [FieldId], NULL, NULL   FROM [SharedFields] {C}
UNION ALL
SELECT [ItemId], 2 as [Order], 0 AS [Version],       [Language], NULL as [Name], [Value], [FieldId], NULL, NULL   FROM [UnversionedFields] {C}
UNION ALL
SELECT [ItemId], 2 as [Order],      [Version],       [Language], NULL as [Name], [Value], [FieldId], NULL, NULL FROM [VersionedFields] {C})
as temp
WHERE [ItemId] IN (ItemID condition {D} ) ORDER BY [ItemId], [Order] ASC, [Language] DESC, [Version] DESC

Description

Black part {D} sets a condition to fetch items.
Examples:

  • ItemID = GUID in case single item fetch
  • ParentID=GUID in case loading item children
  • TemplateID in GUIDs in case prefetching items by template on CMS start
Red statement {A} selects data about currently read item

Yellow part {B} reads child IDs for the item
Blue part {C} gets item fields itself
Green part orders read rows by itemID, and orders them by Order column:

Rows order

Notes

  1. A whole item is read at once ( all versions in all languages ) from database when requesting one item version. It is highly NOT recommended to have many item versions in database ( Details in 6.2.3 chapter of CMS Performance Tuning Guide )
  2. Children IDs are read during item fetch as well
  3. The Sitecore performance is much determined by performance of the SQL query
In the next chapter we will take a closer look on how to improve the performance of the query.