Friday 18 March 2016

Profiling Sitecore CMS

Sitecore CMS fetches information/items from underlying database, and caches it locally to speed up further reads, thus the overall system performance depends on:
  1. Database Performance
  2. Sitecore Cache Sizes
  3. CPU power ( how fast can processor find data in caches )
  4. The volume of data processed by target operation
In this chapter we`ll take a look on needed steps to find the cause of CMS slowness in the particular case.

Scenario

Lets` take a look on the scenario 'Workbox' application loading is slow.
Content editors complain that it takes a few years to open a Workbox. Since their life is too short, they cannot wait for that long =\
We are going to profile .NET Sitecore code to find which code takes too much of Content Editors life.

Action plan

This are general steops we are going to do:
  • Configure DotTrace Performance Profiler Tracing Session
  • Configure a data collector set in performance monitor 
  • Capture a full memory dump file of the Sitecore process before the test ( via Task Manager )
  • Start data collector set captuing & DotTrace Profile
  • Reproduce an issue ( Opening workbox )
  • Get Snapshot'n'Wait
  • Stop recording performance counters
  • Capture a full memory dump file of the Sitecore process after the test

Configuring DotTrace Profiler

Lets select 'Local -> IIS -> Tracing mode' in the demo.

DotTrace configuration

Important

  1. 'Start immediately' MUST be UNCHECKED.
  2. The shorter profile capturing session is, the less noise from other Sitecore activities would be captured -> the more accurate data we get
  3. A same scenario is to be captured at least 2 times in a different profiles :
    1. First call would hit 'cold' caches without data so it is expected to be quite slow =\
    2. Second call would hit caches with needed data, and is expected to be faster

Configure a data collector set

Extended counter list
The general steps to create a data collector set manually can be found in 'Create a Data Collector Set Manually' article.

We are about to select following counters:
  • Processor Information (_Total) \ Processor Time - see the overall CPU load by all processes on the target machine
  • Process(w3wp):
    • % Processor Time - how much CPU is used by Sitecore Process
    • Virtual Bytes 
    • Thread Count
    • % User Time
Those are main ones for now, however you can select more counters for better vision ( as per image)

Preparing for a test

Login to Sitecore Shell using test user. When user is logged in to Sitecore, capture a full memory snapshot of w3wp.exe process. We will later use it to :
  • Measure initial Sitecore Cache Size state
  • Find runing/scheduled Sitecore Jobs
  • Find most CPU-consuming threads
  • Objects by type distribution 

Starting profiling


The video shows exact steps needed:


Things to remember

  • It is important to start profiling immediately before reproducing an issue
  • Stop profiling immediately after the reproduction
  • A same scenario is to be profiled at least 2 times ( with cold/empty and filled caches )

Finishing test

After the testing session is over, lets collect more data:
  1. Another full memory dump file. It will be used later to compare with base one
  2. Following details via Sitecore Support Package Generator:
    1. Sitecore logs that cover the session
    2. Sitecore configuration files
    3. .NET framework files — required for working with process memory dumps
    4. Machine's hardware configuration
    5. Windows event logs

Results

The workbox opened in ~15 seconds, whereas a vast majority of time spend for resolving item security:
High number of operations is detected

The security resolution for a single item takes less than 3 ms, whereas overall amount of time is caused by huge number of security resolutions.

What can be done

Improving server hardware

  • Data collector set shows that memory usage is low, and does not hit number of physical RAM installed, thus improving server RAM would not have desired affect
  • Improve CPU
    • Data collector does not show high CPU =\
    • Lets assume more powerful CPU can reduce operation time from 3 ms to 2.5, so instead of 15 seconds, we will get 12.5 seconds. This is not the affect we are looking for.
  • Give more processor time to a thread
    • We can try to reduce the number of threads in Sitecore App, however in this scenario the number of threads is low, and will unlikely bring the desired affect

Reducing the number of security resolutions

Since the timing per one security resolution looks low, we can focus on reducing number of security resolutions. A possible option would be to rewrite SQL query to return only first 100 items ordered by date in workflow.

In next chapters we will take a closer look on analyzing memory dump files, and checking the cache usage statistics & distribution of objects by type.

No comments:

Post a Comment