In the past year, I have spent quite a lot of my time playing with and speaking about Distributed Replay – a feature in SQL Server 2012 and above that almost no one talks about, but one that can be a great help if you want to:
- Test application compatibility – imagine that you will be migrating or upgrading to a newer version of SQL Server – you definitely want to know whether or not your app will continue to work once you make the switch to the new version of the product.
- Do a performance or stress test – you want to see how your application will behave with more hardware thrown at it or with just a few more indexes.
- Forecast – wouldn’t it be great if you could see whether or not the current or the new server will be able to cope with twice as many users?
If any of those three are important to you, or you are directly responsible for any of them (do I have to say all of them, DBAs?), you can find a great blog post on how to set up a Distributed Replay environment here and start playing with it. The idea of this blog post, though, is to give you some more information about the technology (and I will continue to update this blog post) by answering all of the questions I was asked whenever I presented this feature at SQLSaturday Sofia, SQLSaturday Kharkov, SQLSaturday Slovenia and internally for my teammates. Let’s get started!
How do I capture my application workload? – By using Profiler’s predefined template “TSQL_Replay”. This template contains the minimum set of events that Distributed Replay needs in order to do the actual replay. If you exclude any of them, you will not be able to preprocess your trace file(s). However, remember to capture the workload with a server-side trace rather than with Profiler’s GUI!
Can I add more events to the trace file? – yes, you can. Those, however, will be removed during the preprocessing phase because Distributed Replay does not need them for the replay.
What transport protocol is Distributed Replay using in order to replay my workload? – TCP/IP.
So Distributed Replay is using Profiler traces. Isn’t Profiler deprecated? – Yes, it is. From what I heard though, Microsoft will make DReplay work with Extended Events in the upcoming releases.
And can I use DReplay to replay my workload against SQL Azure Database? – No, because Distributed Replay only works with Windows Authentication, and SQL Azure Database (or whatever the name of this service will be at the moment you are reading this) works only with SQL Authentication (for now). This does not mean that your app and the workload that you capture cannot be generated by a SQL Login. It just means that when you fire up Distributed Replay and specify the target SQL Server instance, your Distributed Replay Clients can only authenticate by using Windows Authentication, and that is not supported in the “cloud” world.
What will happen if Microsoft decides to kill this feature as it’s obviously not that famous? – From what MSFT is saying, DReplay will be The Feature for replaying mission critical workloads for the future, so I think we should not worry about this at the current moment.
Why is there no GUI? – I am also asking the same! There is a GUI that the guys from SolidQ built some time ago, but I would not use that one for any production work. However, I was told that a GUI is coming with the future releases, so we have to wait and see…
What other tools can we use for replaying our workload? – Profiler and RML Utilities.
Why shouldn’t I use any other technology for replaying my app workload, e.g., RML Utilities or just the built-in replay functionality inside Profiler? – DReplay is especially powerful whenever you want to replay a huge trace file, meaning 40, 50 or even more GB of trace. If you use any of the other tools for that, you will face some problems: the tools themselves will become the bottleneck. In addition to that, there are quite interesting parameters that you can play around with in DReplay that are not available in either SQL Profiler or RML Utilities.
What edition of SQL Server should I use in order to be able to replay my workload? – Almost all of the editions of SQL Server 2012 support Distributed Replay. However, if you want to use its full capabilities (up to 16 clients), you should install the Distributed Replay controller from Enterprise edition media (the Developer edition does not work for this). Otherwise you will be limited to just 1 Distributed Replay Client.
What are the versions of SQL Server we can use Distributed Replay with? – As a minimum, you can capture your workload from SQL Server 2005 and replay it against SQL Server 2008. SQL Server 2000 is not supported whatsoever. If you capture your workload from SQL Server 2008, you can replay it on another SQL Server 2008 or higher, but you can’t go backwards and replay it against an instance of 2005. All other scenarios are also supported: from 2008 to 2008 R2, from 2008 to 2012, etc.
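The version rules above can be written down as a small check. This is only a sketch of my reading of those rules, not something DReplay ships with: the function name `can_replay`, the constants and the version list are all made up for illustration, and the list only contains the versions mentioned in this post.

```python
# Hypothetical helper that encodes the rules from the answer above.
# Versions are listed from oldest to newest; extend the list as needed.
VERSION_ORDER = ["2000", "2005", "2008", "2008 R2", "2012"]

MIN_CAPTURE = "2005"  # oldest version you can capture from
MIN_TARGET = "2008"   # oldest version you can replay against


def can_replay(source: str, target: str) -> bool:
    """Return True if a trace captured on `source` may be replayed on `target`."""
    rank = {version: i for i, version in enumerate(VERSION_ORDER)}
    if source not in rank or target not in rank:
        raise ValueError("unknown SQL Server version")
    return (
        rank[source] >= rank[MIN_CAPTURE]      # SQL Server 2000 is out
        and rank[target] >= rank[MIN_TARGET]   # target must be 2008 or newer
        and rank[target] >= rank[source]       # never replay backwards
    )


print(can_replay("2005", "2008"))  # capture on 2005, replay on 2008
print(can_replay("2008", "2005"))  # going backwards
```

The first call should be allowed and the second rejected, which matches the “you can’t go backwards” rule.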
Can I replay against a database snapshot of this database? – Yes, you can. However, you should make sure that the snapshot has the name of the database that you captured the trace from. Otherwise the replay will not be able to find a database with that name. So a possible approach is to rename the database on the target instance and then create a snapshot of it with the name that the replay expects.
Can I replay against a “virtual” database that was created by using a third-party tool like ApexSQL Restore? – Yes, you can.
What’s the difference between the 2 possible replay modes – stress and synchronisation? – When you issue a replay in “synchronisation” mode, the controller makes sure that all the events are replayed in the same order (same event sequence) in which they were captured. In “stress” mode, though, the only sequence the controller preserves is the one you specified with the StressScaleGranularity parameter. So if you set this parameter to “SPID”, the controller will follow the event sequence within each SPID that it replays, but not the event sequence of the whole trace.
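To make the ordering difference more tangible, here is a tiny model of it. It is a sketch and not how the controller is implemented: the `Event` type, both function names and the sample trace are invented for this example, and real replays obviously deal with timing and connections as well.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    seq: int    # position of the event in the captured trace
    spid: int   # session that issued the event
    text: str   # statement text (placeholder values here)


def synchronization_order(events):
    """Synchronisation mode: one global stream, in captured order."""
    return sorted(events, key=lambda e: e.seq)


def stress_streams_by_spid(events):
    """Stress mode with SPID granularity: one independent stream per SPID.

    Order is preserved only inside each stream, not across streams.
    """
    streams = defaultdict(list)
    for event in sorted(events, key=lambda e: e.seq):
        streams[event.spid].append(event)
    return dict(streams)


trace = [
    Event(1, 51, "BEGIN TRAN"),
    Event(2, 52, "SELECT 1"),
    Event(3, 51, "UPDATE t SET c = 1"),
    Event(4, 52, "SELECT 2"),
    Event(5, 51, "COMMIT"),
]

print([e.seq for e in synchronization_order(trace)])
for spid, stream in stress_streams_by_spid(trace).items():
    print(spid, [e.seq for e in stream])
```

In the second output the events of SPID 51 and SPID 52 each keep their own order, but nothing says whether event 2 runs before or after event 3.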
Is the workload actually multiplied when it’s being distributed to the clients or is it actually divided? – The workload is never multiplied when the replay is issued with Distributed Replay. It is divided among the clients, and the controller tries to split the captured workload almost equally between all available clients in the distributed environment. How the workload will be divided depends mostly on the StressScaleGranularity parameter in the DReplay.Exe.Replay configuration file that you can find on your DRController system. This parameter accepts two possible values: SPID or connection. If you choose SPID, the controller will try to divide the workload among all the clients you have installed based on SPIDs and how much work each of them is “doing”. So if you have 6 sessions that you want to replay, you have 3 clients, and all of those 6 sessions have around the same number of events to be replayed, the DReplay controller will divide your workload so that each of the 3 clients replays the events from 2 sessions. The same applies if you set the parameter to “connection”, but this time it will be 2 connections per client.
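The 6 sessions and 3 clients example can be played through in a few lines. Please treat this as an illustration only: I do not know the exact algorithm the controller uses, so the greedy “heaviest session goes to the least loaded client” approach below is my assumption, and `divide_sessions`, the client names and the event counts are all made up.

```python
import heapq


def divide_sessions(event_counts, client_names):
    """Split sessions across clients so the event totals stay roughly equal.

    Assumed strategy (not confirmed for DReplay): take the heaviest session
    first and hand it to the client with the smallest load so far.
    """
    # Heap entries are (events assigned so far, tie-breaker index, client name).
    heap = [(0, i, name) for i, name in enumerate(client_names)]
    heapq.heapify(heap)
    assignment = {name: [] for name in client_names}
    ordered = sorted(event_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for session, count in ordered:
        load, index, name = heapq.heappop(heap)
        assignment[name].append(session)
        heapq.heappush(heap, (load + count, index, name))
    return assignment


# Six sessions (SPIDs) with about the same number of events each.
sessions = {51: 1000, 52: 1000, 53: 1000, 54: 1000, 55: 1000, 56: 1000}
clients = ["client1", "client2", "client3"]

for client, spids in divide_sessions(sessions, clients).items():
    print(client, spids)
```

With equal session sizes every client ends up with 2 sessions, and no session is ever replayed twice, which is the “divided, never multiplied” behaviour described above.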
Is there a way to multiply the workload? – No, at least not with DReplay for now. You can do this with RML Utilities by using the “-n” parameter. With Distributed Replay you just replay the trace file you have.
How much disk space do the Controller and each of the Clients take? – The Controller takes about 160 MB and each of the Clients will cost you around 2 GB.
Can I have a DRController and DRClient on a single box? – Yes, you can.
Can I have more than one DRController or DRClient on the same box? – No. One controller per machine and one client per machine.
Can I issue more than one replay at the same time? – No, a controller cannot handle more than 1 replay at the same time. Even if you schedule one of the replays to happen on two of the four Distributed Replay Clients (imagine you have installed 4 clients for a moment) and you configure the other replay to work with the other 2 clients that are actually not doing anything at the moment, you will still get an error “saying” that the DReplay Controller is busy.
With Distributed Replay, am I actually changing the data on the server that I replay against? In other words, if I have DML statements, are they going to be committed? – Yes.
Can I skip the preprocessing phase and just replay the .trc files directly via Distributed Replay? – No. There is no way you can skip this phase and still be able to use the replay functionality.
Can I use Extended Events to capture and replay my application workload? – Partially, yes. That means that you can convert your trace to an Extended Events session, but you cannot use the workload captured that way with DReplay to actually replay it. The explanation why that’s true is in Jonathan’s blog post. I highly recommend that you also read its comment section.
Can I use Resource Governor to limit the replay session? – Yes, you can. However, you have to write your classifier function so that it “monitors” for sessions started by the login of the DReplay Client, and not by your own account.
Let me know if you have any other questions, comments or ideas that we should test and I will be happy to blog about them. Stay tuned as I will update this blog post with new findings around Distributed Replay.