Dear Experts,
We are currently facing issues in our production environment and need some guidance.
Issue:
Our Microsoft CRM 2011 web application goes down at some point within a specific window between 11 AM and 3 PM. Each outage lasts only 2-3 minutes, and the application then comes back up on its own.
Analysis:
1. In our analysis so far, we see a query being logged as a warning every few minutes. The query is as below:
Query execution time of 22.4 seconds exceeded the threshold of 10 seconds. Thread: 361; Database: ORG_MSCRM; Server:SomeName\inst05; Query: select
DISTINCT top 51 "am_accountreference0".am_building as "am_building"
XXXXXXX
(("am_accountreference0".statecode = 0 and "am_accountreference0".am_postcode like 'g72%')) order by
"am_accountreference0".am_addressline1 asc
, "am_accountreference0".am_accountreferenceId asc.
2. We have seen Current Connections reach 2500 [as seen in PerfMon] when CRM slows down and eventually goes down on one of the servers.
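To correlate the warnings from point 1 with the 11 AM - 3 PM window, the fields of each warning can be pulled out with a small script. The following is only a minimal Python sketch: the regular expression is an assumption based on the single warning line quoted above, and the function name parse_slow_query_warning is hypothetical. The real trace format may carry extra fields that this pattern does not handle.

```python
import re
from typing import Optional

# Pattern inferred from the one warning line quoted in the post (an assumption,
# not a documented CRM trace format).
SLOW_QUERY_RE = re.compile(
    r"Query execution time of (?P<seconds>\d+(?:\.\d+)?) seconds "
    r"exceeded the threshold of (?P<threshold>\d+(?:\.\d+)?) seconds\.\s*"
    r"Thread:\s*(?P<thread>\d+);\s*"
    r"Database:\s*(?P<database>[^;]+);\s*"
    r"Server:\s*(?P<server>[^;]+);\s*"
    r"Query:\s*(?P<query>.*)",
    re.DOTALL,  # the query text spans several lines
)


def parse_slow_query_warning(text: str) -> Optional[dict]:
    """Return the fields of a slow-query warning, or None if the text does not match."""
    match = SLOW_QUERY_RE.search(text)
    if match is None:
        return None
    return {
        "seconds": float(match.group("seconds")),
        "threshold": float(match.group("threshold")),
        "thread": int(match.group("thread")),
        "database": match.group("database").strip(),
        "server": match.group("server").strip(),
        # Collapse line breaks and repeated spaces so identical queries group together.
        "query": " ".join(match.group("query").split()),
    }


sample = (
    "Query execution time of 22.4 seconds exceeded the threshold of 10 seconds. "
    "Thread: 361; Database: ORG_MSCRM; Server:SomeName\\inst05; Query: select\n"
    'DISTINCT top 51 "am_accountreference0".am_building as "am_building"'
)
print(parse_slow_query_warning(sample))
```

Grouping the parsed records by server and by the normalised query text should show whether the same "top 51" query accounts for most of the warnings during the outage window.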
Architecture
We have 2 front-end servers and 2 application servers, and the database is in clustered mode. Approximately 6000 users use the application. This is a Microsoft CCA based application, so if CRM goes down, the CCA application also goes down, resulting in desktop fatal errors.
Actions Taken
We have so far taken the following actions to resolve the issue, but without much success.
Changed the CRMAppPool recycle config from the regular interval of 29 hours to an out-of-hours recycle at midnight
Applied the same CRMAppPool recycle config changes to the new server P0004, matching P0001 and P0002
Added one new front-end server to the load balancer
Changed the CRMAppPool worker processes from 2 to the default of 1 and shifted the CRMAppPool recycle by 1 hour, to 1 AM
Tested CRMDiagnostics and other tools suggested by Microsoft on test environments, in readiness to apply them in live
Changed the CRMAppPool config to overlap the worker processes. This allows a seamless handover between worker processes when the existing worker process is recycled
Further Course of Actions Planned
WCF compression is currently not enabled on the production boxes. We are planning to enable it as early as possible.
The registry settings OLEDBTimeout and ExtendedTimeout are missing on the production boxes. The plan is to create an entry for OLEDBTimeout and set its value to 120 seconds. Is this recommended?
We have read an article that says to set State to 1 in the ScaleGroupOrganizationMaintenanceJobs table of the config database for OperationType 30. State is currently 0 on the production box. Do we need to update this to 1?
http://social.microsoft.com/Forums/en-US/b13aa3fe-7fa4-4a9f-be9b-4410c9fd1c7d/async-service-running-shrinkdatabase-command?forum=crmdeployment
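For the OLEDBTimeout entry mentioned above, the planned change could be prepared as a .reg file and reviewed before anything is applied. The following is only a sketch in Python. The key path HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSCRM is an assumption (it is where CRM server settings are commonly kept) and should be verified on the actual servers. The helper name build_reg_file is hypothetical. The value 120 is the planned figure from this post, and whether it is advisable is still the open question.

```python
def build_reg_file(key_path: str, values: dict) -> str:
    """Build the text of a .reg file that sets DWORD values under one registry key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, number in values.items():
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD: {number}")
        # .reg files store DWORDs as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\r\n".join(lines) + "\r\n"


# Assumed location of the CRM server settings; confirm before use.
MSCRM_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSCRM"

# 120 is the value planned in the post, taken here to be in seconds.
reg_text = build_reg_file(MSCRM_KEY, {"OLEDBTimeout": 120})
print(reg_text)
```

The script only generates text and changes nothing on the machine, so the output can be checked on a test environment first.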
We would appreciate further guidance on the issue and on whether we should go ahead with the planned actions above. Our main challenge right now is to get the system stable and to eliminate the warning message logged for the long-running queries [top 51].
Many thanks in advance!