Last week one of our customers called and said that their app was working slow! I took on this as I was curious what was going on and because I felt like this is going to be an interesting one. You know how it is with performance tuning – there is always something that can surprise you!
The DB server was running with 16 CPUs, 32 gigs of memory, 32bit OS and “holds” 5 SQL Server instances(I know you think PROD and TEST environments cannot be on one physical server – think again). No hardware, OS or driver changes in the last month, so I said to myself – let’s take a look at some performance monitor counters and this is what I saw for our problematic PROD instance:
That’s what I was talking you about – I was sure that I will see something I have never even seen. Anyway, I jumped inside SSMS just to see that the instance was configured with 4GB of Max Server Memory setting, but with no AWE enabled for some reason and so it was actually able to work with just a bit more than 1.5GB of memory. Great! 1.5 gigs for the PROD environment that was holding 4 databases with total size of 300GB – no wonder we have PLE of 2 or 3 almost all the time. This is a very busy system, I have to mention.
So this was my first finding. I will skip some of my other findings for the other instances. For now, it is important for you to know that I was planning to reconfigure the memory used by all instances so to give enough of the 32 gigs to all of them(the server was using just 12 of its 32 gigs for some reason). Did that – now the server was utilising all of the memory that it had – 6 gigs were just left for the OS and other apps. However, after the memory reconfiguration all of the instances were working just fine except for our friend – the important PROD instance. What was really, really interesting to me was to see that after I gave it 10 gigs as Max Server Memory(and it was using them) our PLE was again 2 or 3. So again – 10x memory, but the same PLE. Now, I got even more curious.
After a bit more investigation, I captured animals like this one below:
This animal in particular though was even doing a Table Scan on a table that was 95GB in size, thus causing 89% of the execution plan cost. At that moment I thought I figured it out – I just need to do some proper indexing on this table and everything will be just fine. Noooooot that fast! It turned out that(I am skipping some investigation here) in all of the 4 databases that the app was working with every single table had no indexes what so ever – not even non-clustered ones! So we were actually doing table scans every…single…time!
That’s the end of my story actually. I am now waiting for the developers to contact me and work together with them on fixing this(I will first ask them some questions though). I will be tweeting about this, so you can follow the progress there! Until then – enjoy every single time you see the Table Scan Operator and remember – it can always be worse!
I am out.