Record and Playback: Does it really work?
Many tools that help develop load tests, like LoadRunner and JMeter, provide a simple record-and-playback mechanism, using either a proxy server or a browser plugin. All you do is traverse the web application as a normal user would. Your interactions with the application are captured and used to create a playback script or code. Voilà! You have a test case. Run the required number of emulated users, each executing this script, and your workload is ready. Or is it really?
If all your users act like a linear computer program executing at a fixed pace, your recorded script may work. But the truth is that human beings rarely follow a single path, let alone follow it on a predetermined schedule. Your users will make one of the many choices available to them on your site, at the pace they desire.
Two factors need to be taken into account when modeling user behavior:
- The decision tree of which option to choose at any particular point (the Operation Mix).
- The time taken to move on to the next operation (the Timing).
The rest of this article addresses the operation mix, data generation, and other issues involved in record and playback. As operation timing is a fairly independent topic in its own right, it will be covered in a separate article.
Operation Mix
Tools differ in the way they create a workload from the recorded actions. The primary difference is in how they create an Operation Mix, i.e., the proportion of the various types of operations (aka requests) that the test makes.
- Fixed Sequence: This is the simplest method, in which each emulated user submits the exact same sequence of recorded operations. It may be the simplest, but it is obviously also the most flawed. In a real application, users seldom traverse the site in exactly the same sequence, so this mix creates a very artificial workload.
- Flat Mix: In this method, the test developer identifies the types of operations (either during the record session, by pausing between operations, or by editing the generated script). The workload then consists of randomly selecting an operation, with all operations given equal probability. Some tools go a step further and allow the probabilities to be changed (e.g., operation1 executes 50% of the time, operation2 executes 20% of the time, etc.). In either case, this method is extremely flawed because some of the generated sequences may make no sense at all from the application's perspective. Websites are never navigated at random. In many sites, one needs to log in first to perform certain operations. In other cases, certain operations must follow a sequence (e.g., shopping cart -> checkout -> shipping options -> payment). As such, this method completely fails to create a correct web workload.
- Flat Sequence Mix: Tools that use this method (both LoadRunner and JMeter do) allow the user to record multiple use cases (referred to as scenarios). Each use case is then treated as a fixed sequence, and the overall mix is created by specifying a different probability for each sequence. Although this method is more realistic than the previous two, it can quickly become unwieldy as the number of scenarios increases: because every choice point multiplies the number of distinct paths, the number of scenarios needed to cover realistic navigation grows rapidly, quickly exasperating the test developer.
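To make the flat sequence mix concrete, here is a minimal Python sketch. It is not how LoadRunner or JMeter implement it internally; the scenario names, page lists, and weights are hypothetical. Each emulated user picks one recorded sequence by weight and then replays it verbatim.

```python
import random

# Hypothetical recorded scenarios: each is a fixed sequence of page requests.
SCENARIOS = {
    "browse": ["home.html", "products.html", "products.html"],
    "contact": ["home.html", "contact.html"],
    "buy": ["home.html", "login.html", "products.html", "cart.html", "checkout.html"],
}

# Hypothetical per-scenario probabilities (random.choices normalizes the weights).
WEIGHTS = {"browse": 0.6, "contact": 0.1, "buy": 0.3}


def next_scenario(rng: random.Random) -> list[str]:
    """Pick one recorded scenario by weight and return its fixed page sequence."""
    names = list(SCENARIOS)
    chosen = rng.choices(names, weights=[WEIGHTS[n] for n in names], k=1)[0]
    return SCENARIOS[chosen]


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(next_scenario(rng))
```

However many times this runs, every path is one of the three recorded sequences; covering another variation means recording (and weighting) yet another scenario.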
The fact is that web application navigation is best represented by a state diagram, and the best way to model this navigation is with a stochastic model in which the next page depends only on the current one (in effect, a Markov chain). This model is known as MatrixMix in Faban and is best created algorithmically, not by record and playback. An example of such a mix is given below; each row gives the probabilities of moving from that page to each page. The first row states that if the user is currently on the home page, the probability of going to the products page is 80% and to the contact page is 20%.
| From | To home.html | To products.html | To contact.html |
|---|---|---|---|
| home.html | 0% | 80% | 20% |
| products.html | 20% | 39% | 41% |
| contact.html | 60% | 19% | 21% |
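A driver for such a matrix can be sketched in a few lines. This is not Faban's API but a minimal, self-contained Python sketch of the same idea, assuming the three pages and the probabilities from the table above; the names `PAGES`, `TRANSITIONS`, `next_page` and `session` are hypothetical. Each step consults only the current page's row, which is what makes it a Markov chain.

```python
import random

PAGES = ["home.html", "products.html", "contact.html"]

# Transition probabilities from the table: row = current page,
# column = next page (same order as PAGES). Each row sums to 1.0.
TRANSITIONS = [
    [0.00, 0.80, 0.20],  # from home.html
    [0.20, 0.39, 0.41],  # from products.html
    [0.60, 0.19, 0.21],  # from contact.html
]


def next_page(current: str, rng: random.Random) -> str:
    """Choose the next page using only the current page's row (one Markov step)."""
    row = TRANSITIONS[PAGES.index(current)]
    return rng.choices(PAGES, weights=row, k=1)[0]


def session(start: str, length: int, rng: random.Random) -> list[str]:
    """Generate one emulated user's navigation path of the given length."""
    path = [start]
    while len(path) < length:
        path.append(next_page(path[-1], rng))
    return path


if __name__ == "__main__":
    print(session("home.html", 6, random.Random(7)))
```

Unlike the flat sequence mix, new navigation paths need no new recordings: the matrix alone determines how often each page-to-page transition occurs, and paths of any length follow from it.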
Data Generation
Many web operations require a variety of input data. Record-and-playback tools usually deal with this by having test developers edit the generated script to parameterize the input fields. The values for these fields are then read from files that the developer must somehow populate. For instance, if a user login name is required, the developer must create a file with all the login names that the workload must use (usually by dumping the data out of the application's database). Imagine what this process will be like if a site has millions of registered users. The workload must then choose one name for each emulated user. For other parameters, we may want the workload to choose a different value for each operation executed (not just one per emulated user). These kinds of choices usually require some kind of coding, be it an XML (or other proprietary) script or code in a programming language. (Interestingly, although LoadRunner claims to use scripts, the code is actually C or Java and must in fact be compiled.) In many cases this coding turns out to be quite extensive, blowing away the so-called “no coding required” claims that record-and-playback tool vendors make.
If a tool claims that no coding is required at all, be suspicious. It very likely does not provide enough flexibility for data generation. Tools that rely on scripting may also lack the flexibility to manipulate data.
Also note that requiring all parameterized field values to be in files means the data cannot be programmatically generated.
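When values can be generated programmatically, no parameter file is needed at all. The Python sketch below assumes a hypothetical naming convention (registered users named `user0`, `user1`, and so on) and hypothetical search terms; `EmulatedUser` and `login_name` are illustrative names, not any tool's API. It shows the two scopes mentioned above: a login chosen once per emulated user, and a search term drawn afresh for every operation.

```python
import random


def login_name(user_index: int) -> str:
    """Hypothetical convention: registered users are user0 .. user{n-1}."""
    return f"user{user_index}"


class EmulatedUser:
    """One emulated user holding per-user and per-operation parameters."""

    def __init__(self, agent_id: int, registered_users: int, seed: int = 0):
        # Seed per agent so runs are repeatable but agents differ.
        self.rng = random.Random(seed * 1_000_003 + agent_id)
        # Per-user parameter: chosen once, reused for all of this user's operations.
        # Different agents may pick the same login; use agent_id if that matters.
        self.login = login_name(self.rng.randrange(registered_users))

    def search_term(self) -> str:
        # Per-operation parameter: a new value is drawn on every call.
        return self.rng.choice(["laptop", "camera", "phone", "tablet"])


if __name__ == "__main__":
    u = EmulatedUser(agent_id=3, registered_users=1_000_000)
    print(u.login, [u.search_term() for _ in range(3)])
```

Because the login space is computed rather than read from a file, a site with millions of registered users costs nothing extra to parameterize.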
The fact is that a well-designed workload requires a robust mechanism in order to both generate request data and process response data.
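Processing response data typically means extracting values from one response and feeding them into the next request. The following is a minimal Python sketch, assuming hypothetical page markup in which product links look like `<a href="/product?id=123">`; the pattern and function names are illustrative only.

```python
import random
import re
from typing import List, Optional

# Hypothetical markup: product links look like <a href="/product?id=123">.
PRODUCT_LINK = re.compile(r'href="/product\?id=(\d+)"')


def extract_product_ids(html: str) -> List[int]:
    """Pull the product IDs out of a response body."""
    return [int(m) for m in PRODUCT_LINK.findall(html)]


def next_request(html: str, rng: random.Random) -> Optional[str]:
    """Build the next request URL from data found in the previous response."""
    ids = extract_product_ids(html)
    if not ids:
        return None  # nothing to follow; a real driver would flag this as an error
    return f"/product?id={rng.choice(ids)}"
```

Choosing among the IDs the application actually returned keeps each request consistent with the current state of the data store, which a replayed, hard-coded ID cannot do.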
New Data Generation
So far we have only talked about input data for operations that retrieve known, existing data from the application's data store. Most Web 2.0 sites allow users to upload a considerable amount of new data, whether new blog or wiki entries, comments, ratings, profile information, or photos. How does a record-and-playback methodology work for this? One cannot pull data from a database to pre-load a parameter file, so either these ‘Add’ operations will repeatedly use the same data (which can, of course, cause the application to fail if, for example, the same username is entered twice), or the tool must provide some way for the workload developer to specify how these parameters are to be generated. Note that different parameters may have different syntactic and semantic requirements. If there is a load generator tool that can effectively generate new data without requiring programming, I'd like to know about it.
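As an illustration of what "specifying how parameters are generated" can involve, here is a minimal Python sketch of a per-agent generator. The field rules (username format, comment length) are hypothetical. Usernames combine the agent ID with a per-agent counter, so no two ‘Add’ operations submit the same name, while comments are random text of bounded length.

```python
import itertools
import random
import string


class NewDataGenerator:
    """Generates fresh values for 'Add' operations (hypothetical field rules)."""

    def __init__(self, agent_id: int, seed: int = 0):
        self.agent_id = agent_id
        self.counter = itertools.count()
        self.rng = random.Random(seed * 1_000_003 + agent_id)

    def username(self) -> str:
        # Agent ID + per-agent counter: unique across all emulated users
        # within one run, so registrations never collide.
        return f"u{self.agent_id}_{next(self.counter)}"

    def comment(self, min_words: int = 3, max_words: int = 12) -> str:
        # Random words of bounded count: meaningless text, but it varies
        # per request, which is what the data store needs to see.
        words = [
            "".join(self.rng.choices(string.ascii_lowercase, k=self.rng.randint(2, 8)))
            for _ in range(self.rng.randint(min_words, max_words))
        ]
        return " ".join(words).capitalize() + "."
```

If data persists between test runs, a run identifier would also need to be folded into the username to keep it unique.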
Workload Scaling
For a workload to be used for load testing or capacity planning, it needs to be run at different load levels. This is achieved by using one or more scale factors by which both the initial data store and the load scale. Simply adding emulated users without due consideration for the data store will not create a proper workload. More on this topic, with several examples of how real applications scale, can be found in the paper “Performance Workload Design”. Record-and-playback tools have no mechanism to handle realistic scaling; one has to achieve this programmatically.
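Achieving this programmatically can be as simple as deriving every size from a single scale factor. The Python sketch below is illustrative only: the ratios (10 emulated users, 1,000 registered users, and so on per unit of scale) are hypothetical placeholders, not figures from the paper or from any real application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledConfig:
    emulated_users: int
    registered_users: int
    products: int
    initial_comments: int


def scale(sf: int) -> ScaledConfig:
    """Derive load and data-store sizes from one scale factor.

    All ratios are hypothetical; real ones should reflect how the
    production application actually grows.
    """
    if sf < 1:
        raise ValueError("scale factor must be >= 1")
    return ScaledConfig(
        emulated_users=10 * sf,
        registered_users=1_000 * sf,  # data store grows with the load
        products=500 + 50 * sf,       # grows, but more slowly
        initial_comments=20_000 * sf,
    )
```

The design point is that the data loader and the load driver both read from the same `scale(sf)`, so doubling the load also doubles the population of users that logins are drawn from.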
Non-web Workloads
This issue is obvious: record-and-playback tools only work for web workloads, where a proxy can be used to capture user interactions. In principle, the mechanism can work with any type of interactive application, provided a “proxy” for the application's protocol is in place. LoadRunner does provide proxies for various protocols, but it's easy to see that this approach quickly becomes unwieldy and results in product bloat.
It is better to find a tool that provides a good framework and code your own load generator for the specific protocol that you want to test. The process can be eased considerably if the framework understands various commonly used protocols and provides the ability to plug in other protocols as well.
Summary
To summarize, here are key points to remember when using a recording tool to generate a load test:
- Use a realistic mix of operations. Real users do not all step through the same sequence of pages in exactly the same way.
- Ensure that the back-end data sources are exercised in the same way as in production. This means not using a limited data set that all emulated users share.
- Test the creation and upload of new data to the application. This requires new, random data to be created during load generation.
Tags: benchmark, loadtesting, Workloads