Blog Posts from November, 2021

Lessons Learned in Finding Bugs

Thursday, November 18th, 2021

This story is put together from several parallel experiences over the last while that I’ve merged into one narrative. The pattern of experiences and epiphanies is the same, but as they used to say on TV, names and details have been changed to protect the innocent.

I was recently involved in testing an online shopping application. In the app, there’s a feature that sends notifications via email.

On the administration screen for setting up that feature, there are “Save” and “Cancel” buttons near the upper right corner. Those buttons are not mapped to any keys. The user must either click on them with the mouse, or tab to them and press Enter.

Below and to the left, there are some fields for configuring settings. Then, at the bottom left, there is a field in which the user can enter a default email notification message.

Add a nice short string to that text field, and everything looks normal. Fill up that field, and the field starts to expand rightwards to accommodate the text. The element in which the text field is embedded expands rightwards too.

Add enough text (representing a perfectly plausible length for an email notification message) to the text field, and the field and its container expand rightwards far enough that they start to spill off the edge of the screen.

And here’s the kicker: all this starts to obscure the Save and Cancel buttons in the top right, such that they can’t be clicked on any more. You can delete the text, but the field and container stubbornly remain the same enlarged size. That is, they don’t shrink, and the Save and Cancel buttons remain covered up.

If you stumble around with the Tab key, you can at least make the screen go away—but if you were unlucky enough to click “Save” and return to the application, the front-end remains in the messed-up state.

There is a configuration file, but it’s obfuscated so that you can’t simply edit it and adjust the length of the field to restore it to something that doesn’t cover the Save and Cancel buttons. You can delete the file, but if you do that, you’ll lose a ton of other configuration settings that you’ll have to re-enter.

The organization had, the testers told me, a set of automated checks for this screen. We looked into it. Those checks didn’t include any variation. For the email notification field, they changed the default to a short string of different happy-path data, and pressed the Save button. But they didn’t press the on-screen Save button. They pressed a virtual Save button.

Thus, even if the check included some challenging data, the automated checks would still have been able to find and click on the correct invisible, inaccessible, virtual Save and Cancel buttons just fine. That is, there is no way that the checks would alert a tester or anyone else to this problem.
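To make that difference concrete, here’s a rough sketch in Python with Selenium WebDriver. The URL and element IDs are hypothetical, and this is not the organization’s actual check code; it simply illustrates how a click driven through the DOM can “pass” while a check of what actually sits on top of the button would notice the occlusion.

```python
# A sketch only: hypothetical URL and element IDs, not the real application.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.test/admin/notifications")   # hypothetical
save = driver.find_element(By.ID, "save-button")          # hypothetical

# A "virtual" click driven through JavaScript lands on the element no matter
# what is drawn on top of it, so a check built this way keeps on passing.
driver.execute_script("arguments[0].click();", save)

# A human can only click what is actually on top. Ask the browser which
# element occupies the button's centre point; if it isn't the button (or
# something inside it), the button is effectively covered or off-screen.
reachable = driver.execute_script(
    """
    const el = arguments[0];
    const r = el.getBoundingClientRect();
    const top = document.elementFromPoint(r.x + r.width / 2, r.y + r.height / 2);
    return top === el || el.contains(top);
    """,
    save,
)
if not reachable:
    print("Possible problem: the Save button appears to be covered or off-screen.")
```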

Searching for a product led to a screen displaying tiles of the products returned by the search. Some searches returned a single product, displaying a single tile. It didn’t take very long for us to find that leaving that screen and coming back to it produced a second instance of the same tile. Leaving and coming back again left three tiles on the screen. It didn’t take long to produce enough tiles for a Gaudi building in Barcelona.

Logging in and putting products into the shopping cart was fine. Putting items into the shopping cart and then logging in put the session into a weird state. The number of items on the shopping cart icon was correct, based on what we had selected, but trying to get into the shopping cart and change the order produced a message that the shopping cart could not be accessed at this time, and all this rendered a purchase impossible. (I tried it later on the production site; same problem. Dang; I wanted those books.)

We found these problems within the first few minutes of free, playful interaction with this product and trying to find problems. We did it by testing experientially. That is, we interacted with the product such that our encounter was mostly indistinguishable from that of a user that we had in mind from one moment to the next. Most observers wouldn’t have noticed how our encounter was different from a user’s, unless that observer were keen to notice us doing testing.

That observer might have noticed us designing and performing experiments in real time, and taking notes. Those experiments were based on imagining data and work flows that were not explicitly stated in the requirements or use case documents. The experiments were targeted towards vulnerabilities and risks that we anticipated, imagined, and discovered. We weren’t there to demonstrate that everything was working just fine. We were there to test.

And our ideas didn’t stay static. As we experimented and interacted with the product, we learned more. We developed richer mental models of the data and how it would interact with functions in the product. We developed our models of how people might use the product, too; how they might perform some now-more-foreseeable actions—including some errors that they might commit that the product might not handle appropriately. That is, we were changing ourselves as we were testing. We were testing in a transformative way.

Upon recognizing subtle little clues—like the text field growing when it might have wrapped, or rendered existing data invisible by scrolling the text—we recognized the possibility of vulnerabilities and risks that we hadn’t anticipated. That is, we were testing in an exploratory way.

We didn’t let tools do a bunch of unattended work and then evaluate the outcomes afterwards, even though there can be benefits from doing that. Instead, our testing benefitted from our direct observation and engagement. That is, we were testing in an attended way.

We weren’t constrained by a set procedure, or by a script, or by tools that mediated and modified our naturalistic encounter with the product. That is, we weren’t testing in an instrumented way, but in an experiential way.

We were testing in a motivated way, looking for problems that people might encounter while trying to use the damned thing. Automated checks don’t have motivations. That’s fine; they’re simply extensions of people who do have motivations, and who write code to act on them. Even then, automated checks had not alerted anyone to this bug, and would never do so because of the differences between the way that humans and machines encounter the product.

Oh, and we found a bunch of other bugs too. Bunches of bugs.

In the process of doing all this, my testing partners realized something else. You see, this organization is similar to most: the testers typically design a bunch of scripted tests, and then run them over and over—essentially, automated checking without a machine. Eventually, some of the scripts get handed to coders who turn them into actual automated checks.

Through this experience, the testers noticed that neither their scripted procedures nor the automated checks had found the problems. They came to realize that even if someone wanted them to create formalized procedures, it might be a really, really good idea to hold off on designing and writing the scripts until after they had obtained some experience with the product.

Having got some experience with the product, the testers also realized that there were patterns in the problems they were finding. The testers realized that they could take these patterns back to design meetings as suggestions for the developers’ reviews, and for unit- and integration-level checks. That in turn would mean that there would be fewer easy-to-find bugs on the surface. That would mean that testers would spend less time and effort on reporting those bugs—and that would mean that testers could focus their attention on deeper, richer experiential testing for subtler, harder-to-find bugs.

They also realized that they would likely find and report some problems during early experiential testing, and that the developers would fix those problems and learn from the experience. For a good number of these problems, after the fix, there would be incredibly low risk of them ever coming back again—because after the fix, it would be seriously unlikely that those bits of code would be touched in a way that would make those particular problems come back.

This would reduce the need for lengthy procedural scripting associated with those problems; a handful of checklist items, at most, would do the trick. The fastest script you can write is the script you don’t have to write.

And adding automated checks for those problems probably wouldn’t be necessary or desirable. Remember?—automated checks had failed to detect the problems in the first place. The testers who wrote code could refocus their work on lower-level, machine-friendly interfaces to test the business rules and requirements before the errors got passed up to the GUI. At the same time, those testers could use code to generate rich input data sets, and use code to pump that data through the product.
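As a rough illustration of that kind of lower-level, data-rich checking, here’s a sketch in Python. The endpoint, the field name, and the “1 to 500 characters” business rule are all hypothetical; the point is to use code to generate varied data and push it through a machine-friendly interface, below the GUI.

```python
# A sketch only: hypothetical endpoint, field name, and business rule.
import random
import string
import requests

def candidate_messages():
    yield ""                                  # empty
    yield "Your order has shipped."           # plain, happy-path message
    yield "x" * 500                           # exactly at the (assumed) limit
    yield "x" * 501                           # just past the limit
    yield "héllo 👋 ürgent"                    # accented and emoji characters
    for _ in range(20):                       # a batch of random lengths
        n = random.randint(1, 2000)
        yield "".join(random.choices(string.printable, k=n))

for message in candidate_messages():
    response = requests.post(
        "https://example.test/api/notifications",    # hypothetical endpoint
        json={"defaultMessage": message},
        timeout=10,
    )
    within_limit = 1 <= len(message) <= 500
    expected = 200 if within_limit else 400
    if response.status_code != expected:
        print(f"len={len(message)}: expected {expected}, got {response.status_code}")
```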

Or those testers could create tools and visualizations and log parsers that would help the team see interesting and informative patterns in the output. Or those testers could create genuinely interesting and powerful and rich forms of automated checking, as in this example. (Using the packing function against the unpacking function is a really nice application of the forward-and-backward oracle heuristic.)
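The linked example isn’t reproduced here, but the shape of the forward-and-backward heuristic is easy to sketch. Here’s a minimal stand-in in Python, using zlib’s compress and decompress as the packing and unpacking pair: run data forward through one function, backward through its counterpart, and check that the original data comes back.

```python
# A minimal stand-in for the forward-and-backward oracle heuristic.
import os
import zlib

for trial in range(1000):
    original = os.urandom(trial)          # random payloads of varying sizes
    packed = zlib.compress(original)      # forward: pack
    unpacked = zlib.decompress(packed)    # backward: unpack
    assert unpacked == original, f"round trip failed at trial {trial}"

print("1000 round trips survived the forward-and-backward check")
```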

One of the best ways to “free up time for exploratory testing” is to automate some checks—especially at the developer level. But another great way to free up time for exploratory testing is to automate fewer expensive, elaborate checks that require a lot of development and maintenance effort and that don’t actually find bugs. Some checks are valuable, and fast, and inexpensive. The fastest, least expensive check you can write is the valueless check you don’t have to write.

And attended, exploratory, motivated, transformative, experiential testing is a great way to figure out which is which.


There’s an RST Explored class coming up for American days and European evenings. It runs January 17-20, 2022. For that, register here.

You (or your colleagues or other members of your network) might also be interested in the Rapid Software Testing Managed class, also for Europe, UK, and Indian Time Zones, which happens December 1-3, 2021. Information on the class here and more info here; registration here.

What Tests Should I Automate?

Thursday, November 11th, 2021

Instead of asking “What tests should I automate?” consider asking some more pointed questions.

If you really mean “how should I think about using tools in testing?”, consider reading A Context-Driven Approach to Automation in Testing, and Testing and Checking Refined.

If you’re asking about the checking of output or other facts about the state of the product, keep reading.

Really good fact checking benefits from taking account of your status so that you don’t waste time:

  • Do I know enough about the product, and where there might be problems, to be sure that I’m not rushing into developing checks?

If the answer is No, it might be a better idea to do a deeper survey of the product, and scribble down some notes about risk as you go.

If the answer is Yes, then you might want to loop through a bunch of questions, starting here:

  • What specific fact about the product’s output or state do I want to check?
  • Do I know enough about the product, and where there might be problems, to be reasonably sure that this really is an important fact to check?
  • Is someone else (like a developer) checking this fact already?

Then maybe consider product risk:

  • What could go wrong in the product, such that I could notice it by checking this fact?
  • Is it a plausible problem? A significant risk?
  • Why do we think it’s a significant risk?
  • Is that foreboding feeling trying to tell us something?

Maybe if there’s serious risk here, a conversation to address the risk is a better idea than more testing or checks.

Assuming that it’s a worthwhile fact to check, move on to how you might go about checking the fact, and the cost of doing it:

  • What’s a really good way to check this fact?
  • Is that the fastest, easiest, least expensive way to check it?
  • Will the check be targeted at a machine-friendly interface?

Consider opportunity cost, especially if you’re targeting automated checks at the GUI:

  • What problems will I encounter in trying to check this fact this way, and doing that reliably?
  • What problems, near here or far away, might I be missing as a consequence of focusing on this fact, and checking it this way?
  • Is there some activity other than checking this fact that might be more valuable right now? In the long run?

Try thinking in terms of the longer term. On the one hand, the product and certain facts about it might remain very stable:

  • What would cause this fact to change? Is that likely? If not, is it worthwhile to create a check for this fact?

On the other hand, the product or the platform or the data or the test environment might become unstable. They might be unstable already:

Beware of quixotic reliability. That’s a wonderful term I first read about in Kirk and Miller’s Reliability and Validity in Qualitative Research. It refers to a situation where we’re observing a consistent result that’s misleading, like a broken thermometer that reliably reads 37° Celsius. (Kirk and Miller have some really important things to say about confirmatory research, too.)

  • Is there a chance that this check might lull us to sleep by running green in a way that fools us?
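As a concrete (and entirely hypothetical) illustration, here’s a sketch in Python of a check with quixotic reliability built in: it runs green on every build, but only because its “expected” value is regenerated from the product’s own output.

```python
# Hypothetical names throughout; a sketch of a check that cannot fail.
import json
import os

def get_price_from_product(sku):
    # Stand-in for a call to the product under test.
    return {"WIDGET-1": 19.99}.get(sku)

def test_widget_price():
    actual = get_price_from_product("WIDGET-1")
    snapshot = "widget_price.json"
    if not os.path.exists(snapshot):
        # The "expected" value is captured from whatever the product says...
        with open(snapshot, "w") as f:
            json.dump(actual, f)
    with open(snapshot) as f:
        expected = json.load(f)
    # ...so if the snapshot is rebuilt in a fresh environment on every run,
    # the assertion compares the product with itself: reliably green, like
    # the broken thermometer that reliably reads 37 degrees.
    assert actual == expected
```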

To address the risk of quixotic reliability and to take advantage of what we might have learned, it’s good to look at every check, every once in a while:

  • What’s our schedule for reviewing this check? For considering sharpening it, or broadening it?

A sufficiently large suite of checks is incomprehensible, so there’s no point in running checks that are no longer worthwhile:

  • What’s our plan for checking this fact less often when we don’t need it so much? For retiring this check?

The next questions are especially important if you’re a tester. Checks are least expensive and most valuable when they provide fast feedback to the developers; so much so that it might be a good idea for the developers to check the code before it ever gets to you.

  • Am I the right person to take responsibility for checking this fact? Am I the only person? Should I be?
  • Is checking this fact, this way, the earliest way that we could become aware of a real problem related to it?

Given all that, think about utility—the intersection of cost, and risk, and value:

  • Do I still believe I really need to check this fact? Is it worthwhile to develop a check for it?

After you’ve asked these questions, move on to the next fact to be checked.

“But asking and answering all those questions will take a long time!”, you might reply.

Not that long. Now that you’ve been primed, this is a set of ideas that you can soon carry about in your mind, and run through the list at the speed of thought. Compare asking these questions with how long it takes to develop and run and interpret and revise and review and maintain even one worthless check.

Now it’s true, you can save some time and effort by skipping the interpretation and review and maintenance stuff. That is, essentially, by ignoring the check after you’ve written it. But if you’re not going to pay attention to what a check is telling you, why bother with it at all? It’s faster still not to develop it in the first place.

With practice, the questions that I offered above can be asked routinely. Don’t assume that my list is comprehensive; ask your own questions too. If you pay attention to the answers, you can be more sure that your checks are powerful, valuable, and inexpensive.


There’s a Rapid Software Testing Explored class coming up for American days and European evenings. It runs January 17-20, 2022. Register here.

You (or your colleagues or other members of your network) might also be interested in the Rapid Software Testing Managed class which happens December 1-3, 2021, also for Europe/UK/Indian time zones. The class is for test managers and test leads. It’s also for people who aspire to those positions; for solo testers who must manage their own work responsibly; and for development or product managers who work with testers—anyone who is in a leadership position.

Testing Doesn’t Improve the Product

Tuesday, November 9th, 2021

(This post is adapted from my recent article on LinkedIn.)

Out there in the world, there is a persistent notion that “preventing problems early in the software development process will lead to higher-quality products than testing later will”. That isn’t true.

It’s untrue, but not for the reason that might first occur to most people. The issue is not that addressing problems early on is a bad idea. That’s usually a really good idea.

The issue is that the statement is incoherent. Testing on its own, whether done early or late, will not lead to higher quality products at all.

Problem prevention, product improvements, and testing are different pursuits within development work. These activities are related, but testing can neither prevent problems nor improve the product. Something else, beyond testing, must happen.

Coming from me—a teacher and an advocate for skilled testing—that might seem crazy, but it’s true: testing doesn’t improve the product.

Investigative journalism is an information-gathering activity. Investigative journalists reveal problems in companies and governments and groups, problems that affect society.

Awareness of those problems may lead to public concern or outcry. The news reports on their own, though, don’t change anything. Change happens when boards, regulators, politicians, legal systems, leaders, workers, or social groups take action.

Testing, too, is an information-gathering activity. That information can be used to recognize problems in the product (“bugs”), or to identify aspects of the product that do not represent errors but that nonetheless could be improved (“enhancement requests”). Gathering information plays a role in making things better, but it doesn’t make things better intrinsically and automatically.

Consider: weighing yourself doesn’t cause you to lose weight. Blood tests don’t make you healthier. Standardized tests in schools don’t make kids smarter, and certainly don’t improve the quality of education.

What testing can do is to improve our understanding and awareness of whatever might be in front of us. Testing—the process of evaluating a product by learning about it through experiencing, exploring, and experimenting—helps our teams and our clients to become aware of problems. On becoming aware of them, our teams and clients might decide to address them.

To put it another way: testing is questioning a product in order to evaluate it. Neither the questions nor the answers make the product better. People acting on the answers can make the product better.

Similarly, in daily life, a particular reading on a bathroom scale might prompt us to eat more carefully, or to get more exercise, whereupon we might become more fit. A blood test might prompt a doctor to prescribe anti-malarial drugs, and if we take them as prescribed, we’re likely to control the malaria. Those standardized school tests might suggest changes to the curriculum, or to funding for education, or to teacher training. But until someone takes action, the test only improves awareness of the situation, not the situation itself.

In software development, improvement doesn’t happen unless someone addresses the problems that testing helps us to discover. Of course, if the problems aren’t discovered, improvement is much less likely to happen—and that’s why testing is so important. Testing helps us to understand the product we’ve got, so we can decide whether it’s the product we want. Where improvement is necessary, testing can reveal the need for improvement.

Some people believe that testing requires us to operate a product, thinking in terms of the product as a built piece of software. That’s a very important kind of testing, but it’s only one kind of testing activity, referring to one kind of product.

It can be helpful to consider a more expansive notion of a product as something that someone has produced. This means that testing can be applied to units or components or mockups or prototypes of an application.

And although we might typically call it review, we can apply a kind of testing to things people have written, or sketched, or said about a software product that does not yet exist. In these cases, the product is the artifact or the ideas that it represents. In this kind of test, experimentation consists of thought experiments; exploration applies to the product and to the space, or context, in which it is situated; experiencing the product applies to the process of analysis, and to experiences that we could imagine.

The outcome of the test-as-thought-experiment is the evaluation and learning that happens through these activities. That learning can be applied to correcting errors in the design and the development of the product—but once again, it’s the work that happens in response to testing, not the testing itself, that improves the product.

Just as testing doesn’t improve products, testing doesn’t prevent problems either. As testers, we have an abiding faith that there are already problems in anything that we’ve been asked to test. That is, the problems in the product are there before we encounter them. We must believe that problems have not been prevented. Indeed, our belief that problems have not been successfully prevented is a key motivating idea for testing work.

So what good is testing if it can’t prevent problems? Testing can help us to become aware of real problems that are really there. That’s good. That might even be great, because with that awareness, people can make changes to prevent those unprevented problems from going any further, and that’s good too.

It’s a great idea for people who design and build products to try to prevent problems early in development. To the degree that the attempt can be successful, the builders are more likely to develop a high-quality product. Nonetheless, problems can elude even a highly disciplined development process. There are at least two ways to find out if that has happened.

One way is to test the product all the way through its development, from intention to realization. Test the understanding of what the customer really wants, by engaging with a range of customers and learning about what they do. Test the initially fuzzy and ever-sharper vision of the designer, through review and discussion and what-if questions.

Test the code at its smallest units and at every stage of integration, through more review, pairing, static analysis tools, and automated output checking. Check the outputs of the build process for bad or missing components and incorrect settings. These forms of testing are usually not terribly deep. That’s a good thing, because deep testing may take time, effort, and preparation that can be disruptive to developers. Without deep testing, though, bugs can elude the developers.

So, in parallel to the developers’ testing, assign some people to focus on and to perform deep testing. Deep testing is targeted towards rare, hidden, subtle, intermittent, emergent bugs that can get past the speedy, shallow, non-disruptive testing that developers—quite reasonably—prefer most of the time.

If your problem-prevention and problem-mitigation strategies have been successful, if you’ve been testing all along, and if you’ve built testability into the product, you’ll have a better understanding of it. You’ll also be less likely to encounter shallow problems late in the game. If you don’t have to investigate and report those problems, deep testing can be relatively quicker and easier.

If your problem-prevention and problem-mitigation strategies have been unsuccessful, deep testing is one way to find out. The problems that you discover can be addressed; the builders can make improvements to the product; and problems for the business and for the customer can be prevented before the product ships.

The other way to find out if a problem has eluded your problem prevention processes is to release the product to your otherwise unsuspecting customers, and take the chance that the problems will be both infrequent and insignificant enough that your customers won’t suffer much.

Here are some potential objections:

If software testing does not reveal the need for improvement then the improvement will not happen.

That’s not true. Although testing shines light on things that can be improved, improvement can happen without testing.

Testing can happen without improvement, too. For instance…

  • I perform a test. I find a bug in a feature. The program manager says, “I disagree that that’s a bug. We’re not doing anything in response to that report.”
  • I perform a test. I find a bug in a feature. The program manager says “I agree that that’s a bug. However, we don’t have time to fix it before we ship. We’ll fix it in the next cycle.”
  • I test. I find a bug. The program manager agrees that it’s a bug. The developer tries to fix it, but makes a mistake and the fix is ineffective.
  • I test. I find a bug. The program manager agrees, the developer fixes that bug, but along the way introduces new bugs, each of which is worse than the first.

In each case above, 1) Has the product been tested? (Yes.) 2) Has the product been improved? (No.)

Saying that testing doesn’t improve the product diminishes the perceived value of testing.

Saying that testing does improve the product isn’t true, and miscalibrates the role of the tester relative to the people who design, build, and manage the product.

Let’s be straight about this: we play a role in product improvement, and that’s fine and valuable and honourable. Being truthful and appropriately humble about the extents and limits of what testing can actually do diminishes none of its value. We don’t design or develop or improve the product, but we give insight to the people who do.

The value argument in favour of testing is easy to make. As I pointed out above, investigative journalists don’t run governments and don’t set public policy. Would you want them to? Probably not; making policy is appropriately the role of policy-makers. On the other hand, would you want to live in a society without investigative journalism? Now: would you want to live in a world of products that had been released without sufficiently deep testing?

When there’s the risk of loss, harm, bad feelings, or diminished value for people, it’s a good idea to be aware of problems before it’s too late, and that’s where testing helps. Testing on its own neither prevents problems nor improves the product. But testing does make it possible to anticipate problems that need to be prevented, and testing shines light on the places where the product might need to be improved.

Experience Report: Katalon Studio

Friday, November 5th, 2021

Warning: this is another long post. But hey, it’s worth it, right?

Introduction

This is an experience report of attempting to perform a session of sympathetic survey and sanity testing on a “test automation” tool. The work was performed in September 2021, with follow-up work November 3-4, 2021. Last time, the application under test was mabl. This time, the application is Katalon Studio.

My self-assigned charter was to explore and survey Katalon Studio, focusing on claims and identifying features in the product through sympathetic use.

As before, I will include some meta-notes about the testing in indented text like this.

The general mission of survey testing is learning about the design, purposes, testability, and possibilities of the product. Survey testing tends to be spontaneous, open, playful, and relatively shallow. It provides a foundation for effective, efficient, deliberative, deep testing later on.

Sanity testing might also be called “smoke testing”, “quick testing”, or “build verification testing”. It’s brief, shallow testing to determine whether the product is fit for deeper testing, or whether it has immediately obvious or dramatic problems.

The idea behind sympathetic testing is not to find bugs, but to exercise a product’s features in a relatively non-challenging way.

Summary

My first impression is that the tool is unstable, brittle and prone to systematic errors and omissions.

A very short encounter with the product reveals startlingly obvious problems, including hangs and data loss. I pointed Katalon Recorder at three different Web applications. In each case, Katalon’s recording functions failed to record my behaviours reliably.

I stumbled over several problems that are not included in this report, and I perceive many systemic risks to be investigated. As with mabl, I encountered enough problems in my first session with the product that they swamped my ability to stay focused and keep track of them all. I did record a brief video that appears below.

Both the product’s design and documentation steer the user—a tester, presumably—towards very confirmatory and shallow testing. The motivating idea seems to be recording and playing back actions, checking for the presence of on-screen elements, and completing simple processes. This kind of shallow testing could be okay, as far as it goes, if it were inexpensive and non-disruptive, and if the tool were stable, easy to use, and reliable—which it seems not to be.

The actual testing here took no more than an hour, and most of that time was consumed by sensemaking, and reproducing and recording bugs. Writing all this up takes considerably more time. That’s an important thing for testers to note: investigating and reporting bugs and preparing test reports are important, but they present opportunity cost against interacting with the product to obtain deeper test coverage.

Were I motivated, I could invest a few more hours, develop a coverage outline, and perform deeper testing on the product. However, I’m not being compensated for this, and I’ve encountered a blizzard of bugs in a very short time.

In my opinion, Katalon Studio has not been competently, thoroughly, and broadly tested; or if it has, its product management has either ignored or decided not to address the problems that I am reporting here. This is particularly odd, since one would expect a testing tools company, of all things, to produce a well-tested, stable, and polished product. Are the people developing Katalon Studio using the product to help with the testing of itself? Neither a Yes nor a No answer bodes well.

It’s possible that everything smooths out after a while, but I have no reason to believe that. Based on my out-of-the-box experience, I would anticipate that any tester working with this tool would spend enormous amounts of time and effort working around its problems and limitations. That would displace time for achieving the tester’s presumable mission: finding deep problems in the product she’s testing. Beware of the myth that “automation saves time for exploratory testing”.

Setup and Platform

I performed most of this testing on September 19, 2021 using Chrome 94 on a Windows 10 system. The version of Katalon Studio was 8.1.0, build 208, downloaded from the Katalon web site (see below).

During the testing, I pointed Katalon Studio’s recorder at Mattermost, a popular open-source Slack-like chat system with a Web client; at a very simple Web-based app that we use for an exercise in our Rapid Software Testing classes; and at CryptPad Kanban, an open-source, secure kanban board product.

Testing Notes

On its home page, Katalon claims to offer “An all-in-one test automation solution”. It suggests that you can “Get started in no time, scale up with no limit, for any team, at any level.”

Katalon Claims: "An all-in-one test automation solution. Get started in no time, scale up with no limit, for any team, at any level."

I started with the Web site’s “Get Started” button. I was prompted to create a user account and to sign in. Upon signing in, the product provides two options: Katalon Studio, and Katalon TestOps. There’s a button to “Create your first test”. I chose that.

A download of 538MB begins. The download provides a monolithic .ZIP file. There is no installer, and no guide on where to put the product. (See Bug 1.)

I like to keep things tidy, so I create a Katalon folder beneath the Program Files folder, and extract the ZIP file there. Upon starting the program, it immediately crashes. (See Bug 2.)

Katalon crashes on startup, saying "An error has occurred.  See the log file."

The error message displayed is pretty uninformative, simply saying “An error has occurred.” It does, however, point to a location for the log file. Unfortunately, the dialog doesn’t offer a way to open the file directly, and the text in the dialog isn’t available via cut and paste. (See Bug 3.)

Search Everything to the rescue! I quickly navigate to the log file, open it, and see this:

java.lang.IllegalStateException: The platform metadata area could not be written: C:\Program Files\Katalon\Katalon_Studio_Windows_64-8.1.0\config\.metadata. By default the platform writes its content under the current working directory when the platform is launched. Use the -data parameter to specify a different content area for the platform. (My emphasis, there.)

That -data command-line parameter is undocumented. Creating a destination folder for the product’s data files, and starting the product with the -data parameter does seem to create a number of files in the destination folder, so it does seem to be a legitimate parameter. (Bug 4.) (Later: the product does not return a set of supported parameters when requested; Bug 5.)

I moved the product files to a folder under my home directory, and it started apparently normally. Help/About suggests that I’m working with Katalon Studio v. 8.1.0, build 208.

I followed the tutorial instructions for “Creating Your First Test”. As with my mabl experience report, I pointed Katalon Recorder at Mattermost (a freemium chat server that we use in Rapid Software Testing classes). I performed some basic actions with the product: I entered text (with a few errors and backspaces). I selected some emoticons from Mattermost’s emoticon picker, and entered a few more via the Windows on-screen keyboard. I uploaded an image, too.

Data entered into Mattermost for Katalon

I look at what Katalon is recording. It seems as though the recording process is not setting things up for Katalon to type data into input fields character by character, as a human would. It looks like the product creates a new “Set Text” step each time a character is added or deleted. That’s conjecture, but close examination of the image here suggests that that’s possible.

Katalon records a "Set Text" step for every keystroke.

Two things: First, despite what vendors claim, “test automation” tools don’t do things just the way humans do. They simulate user behaviours, and the simulation can make the product under test behave in ways dramatically different from real circumstances.

Second, my impression is that Katalon’s approach to recording and displaying the input would make editing a long sequence of actions something of a nightmare. Further investigation is warranted.
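To illustrate the simulation point, here’s a small sketch in Python with Selenium showing two different ways a tool might simulate “typing”. The URL and field ID are hypothetical, and this is not a claim about Katalon’s internals, which remain conjecture; it’s just a reminder that an application which reacts to individual keystrokes can behave quite differently depending on how the input is simulated.

```python
# A sketch only: hypothetical URL and field ID.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.test/chat")           # hypothetical
field = driver.find_element(By.ID, "message")     # hypothetical

# send_keys dispatches key events for each character, which is closer to
# (but still not the same as) a human typing.
field.send_keys("Hello there")

# Setting the value directly skips keydown/keypress/input handling entirely;
# an application that reacts to keystrokes may never notice this text.
driver.execute_script("arguments[0].value = 'Hello there';", field)
```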

Upon ending the recording activity, I was presented with the instruction “First, save the captured elements that using in the script.” (Bug 6.)

Katalon says, "First, save the captured elements that using in the script."

This is a cosmetic and inconsequential problem in terms of operating the product, of course. It’s troubling, though, because it is inconsistent with an image that the company would probably prefer to project. What feeling do you get when a product from a testing tools company shows signs of missing obvious, shallow bugs right out of the box? For me, the feeling is suspicion; I worry that the company might be missing deeper problems too.

This also reminds me of a key problem with automated checking: while it accelerates the pressing of keys, it also intensifies our capacity to miss things that happen right in front of our eyes… because machinery doesn’t have eyes at all.

There’s a common trope about human testers being slow and error prone. Machinery is fast at mashing virtual keys on virtual keyboards. It’s infinitely slow at recognizing problems it hasn’t been programmed to recognize. It’s infinitely slow at describing problems unless it has been programmed to do so.

Machinery doesn’t eliminate human error; it reduces our tendency towards some kinds of error and increases our tendency towards other kinds.

Upon saving the script, the product presents an error dialog with no content. Then the product hangs with everything disabled, including the OK button on the error dialog. (Bug 7.) The only onscreen control still available is the Close button.

On attempting to save the script, Katalon crashes with an empty error dialog.

After clicking the Close button and restarting the product, I find that all of my data has been lost. (Bug 8.)

Pause: my feelings and intuition are already suggesting that the recorder part of the product, at least, is unstable. I’ve not been pushing it very hard, nor for very long, but I’ve seen several bugs and one crash. I’ve lost the script that was supposedly being recorded.

In good testing, we think critically about our feelings, but we must take them seriously. In order to do that, we follow up on them.

Perhaps the product simply can’t handle something about the way Mattermost processes input. I have no reason to believe that Mattermost is exceptional. To test that idea, I try a very simple program, UI-wise: the Pattern exercise from our Rapid Software Testing class.

The Pattern program is a little puzzle implemented as a very simple Web page. The challenge for the tester is to determine and describe patterns of text strings that match a pattern encoded in the program. The user types input into a textbox element, and then presses Enter or clicks on the Submit button. The back end determines whether the input matches the pattern, and returns a result; then the front end logs the outcome.

I type three strings into the field, each followed by the Enter key. As the video here shows, the application receives the input and displays it. Then I type one more string into the field, and click on the submit button. Katalon Recorder fails to record any of the three strings that were submitted via the Enter key, losing all of that data! (Bug 9.)

Here’s a video recording of that experience:

The whole premise of a record-and-playback tool is to record user behaviour and play it back. Submitting Web form input via the Enter key is perfectly common and naturalistic user behaviour, and it doesn’t get recorded.

The workaround for this is for the tester to use the mouse to submit input. At that, though, Katalon Recorder will condition testers to interact with the product being tested in a way that does not reflect real-world use.
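For what it’s worth, the Enter-key path is easy to express directly in WebDriver-style code. Here’s a hedged sketch in Python with Selenium, with hypothetical locators, showing both the naturalistic Enter-key submission and the mouse-click workaround.

```python
# A sketch only: hypothetical URL and locators, standing in for the Pattern app.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://example.test/pattern")             # hypothetical
box = driver.find_element(By.ID, "pattern-input")      # hypothetical

# Naturalistic path: type a string and press Enter to submit it.
box.send_keys("abc123", Keys.ENTER)

# Workaround path: type a string and click the Submit button instead.
box.send_keys("xyz789")
driver.find_element(By.ID, "submit-button").click()    # hypothetical
```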

I saved the “test case”, and then closed Katalon Studio to do some other work. When I returned and tried to reopen the file, Katalon Studio posted a dialog “Cannot open the test case.” (Bug 10.)

On attempting to open a saved "test case", Katalon reports "Cannot open the test case."

To zoom that up…

On attempting to open a saved "test case", Katalon reports "Cannot open the test case."

No information is provided other than the statement “Cannot open the test case”. Oh well; at least it’s an improvement over Bug 7, in which the error dialog contained nothing at all.

I was interested in troubleshooting the Enter key problem. There is no product-specific Help option under the Help menu. (Bug 11.)

No product-specific help under the Help menu.

Clicking on “Start Page” produces a page in the main client window that offers “Help Center” as a link.

Katalon Start Page.

Clicking on that link doesn’t take me to documentation for this product, though. It takes me to the Katalon Help Center page. In the Help Center, I encounter a page where the search field looks like… white text on an almost-white background. (Bug 12.)

Katalon Support Search: White text on a white background.

In the image, I’ve highlighted the search text (“keystrokes”). If I hadn’t done that, you’d hardly be able to see the text at all. Try reading the text next to the graphic of the stylized “K”.

I bring this sort of problem up in testing classes as the kind of problem that can be missed by checks of functional correctness. People often dismiss it as implausible, but… here it is. (Update, 2021/11/04: I do not observe this behaviour today.)

It takes some exploration to find actual help for the product (https://docs.katalon.com/katalon-studio/docs/overview.html). (Again, Bug 11.)

From there it takes more exploration to find the reference to the WebUI SendKeys function. When I get there, there’s an example, but not of appropriate syntax for sending the Enter key, and no listing of the supported keys and how to specify them. In general, the documentation seems pretty poor. (I have not recorded a specific bug for this.)

This is part of The Secret Life of Automation. Products like Katalon are typically documented to the bare minimum, apparently on the assumption that the user of the product has the same tacit knowledge as the builders of the product. That tacit knowledge may get developed with effort and time, or the tester may simply decide to take workarounds (like preferring button clicks to keyboard actions, or vice versa) so that the checks can be run at all.

These products are touted as “easy to use”—and they often are if you use them in ways that follow the assumptions of the people who create them. If you deviate from the builders’ preconceptions, though, or if your product isn’t just like the tool vendors’ sample apps, things start to get complicated in a hurry. The demo giveth, and the real world taketh away.

Next, I pointed the tool at CryptPad Kanban (http://cryptpad.fr/kanban) to record a session of simple actions. I tried to enter a few kanban cards, and closed the recorder. Playback stumbled on adding a new card, apparently because the ids for new card elements are dynamically generated.

At this point, Katalon’s “self-healing” functions began to kick in. Those functions were unsuccessful, and the process failed to complete. When I looked at the log output for the process, “self-healing” appeared to consist of retrying an Xpath search for the added card over and over again.

To put it simply, “self-healing” doesn’t self-heal. (See Bug 13.)
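For context, here’s a sketch in Python with Selenium of the locator problem underneath Bug 13; the selectors, class name, and card text are hypothetical. An Xpath anchored to a dynamically generated id will never match again no matter how many times it’s retried; anchoring to something stable, such as the content the user just created, is the kind of adjustment that would actually “heal” the reference.

```python
# A sketch only: hypothetical selectors, class name, and card text.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://cryptpad.fr/kanban/")

# Brittle: anchored to an id that the app regenerates on every run.
# card = driver.find_element(By.XPATH, "//div[@id='kanban-item-4821']")

# Less brittle: anchored to the content the user actually created.
card = driver.find_element(
    By.XPATH,
    "//div[contains(@class, 'kanban-item')][normalize-space()='Write report']",
)
```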

The log output for the test case appears in a non-editable, non-copiable window, making it difficult to process and analyze. This is inconsistent with the facility available in the Help / About / System Configuration dialog, which allows copying and saving to a file. (See Bug 14.)

At this point, having spent an hour or so on testing, I stop.

Follow-up Work, November 3

I went to the location of katalon.exe and experimented with the command-line parameters.

As a matter of fact, no parameters to katalon.exe are documented; nor does the product respond to /?, -h, or --help. (See Bug 5.)

On startup the program creates a .metadata\.log file (no filename; just an extension) beneath the data folder. In that .log file I notice a number of messages that don’t look good; three instances of “Could not find element”; warnings for missing NLS messages; a warning to initialize the log4j system properly, and several messages related to key binding conflicts for several key combinations (“Keybinding conflicts occurred. They may interfere with normal accelerator operation.”). This bears further investigation some day.

Bug Summaries

Bug 1: There is no installation program and no instructions on where to put the product upon downloading it. Moreover, the installation guide at https://docs.katalon.com/katalon-studio/docs/getting-started.html#start-katalon-studio does not identify any particular location for the product. Inconsistent with usability, inconsistent with comparable products, inconsistent with installability, inconsistent with acceptable quality.

Bug 2: Product crashes when run from the \Program Files\Katalon folder. This is due to Bug 1.

Bug 3: After the crash in Bug 2, the error dialog neither offers a way to open the file directly nor provides a convenient way to copy the location. Inconsistent with usability.

Bug 4: Undocumented parameter -data to katalon.exe.

Bug 5: Command-line help for katalon.exe does not list available command-line parameters.

Bug 6: Sloppy language in the “Creating Your First Script” introductory process: “First, save the captured elements that using in the script.” Inconsistent with image.

Bug 7: Hang upon saving my first script in the tutorial process, including an error dialog with no data whatsoever; only an OK button. Inconsistent with capability, inconsistent with reliability.

Bug 8: Loss of all recorded data for the session after closing the product after Bug 7. Inconsistent with reliability.

Bug 9: Katalon Studio’s recorder fails to record text input if it ends with an Enter key. The application under test accepts the Enter key fine. Inconsistent with purpose, inconsistent with usability.

Bug 10: Having saved a “test case” associated with Bug 9, closing the product, and then returning, Katalon Studio claims that it “cannot open the test case”. Inconsistent with reliability.

Bug 11: There is no product-specific “Help” entry under the main menu’s Help selection. Inconsistent with usability.

Bug 12: Search UI at katalon.com/s=keystrokes displays in white text on an almost-white background. Inconsistent with usability. (Possibly fixed at some point; on 2021/11/04, I did not observe this behaviour.)

Bug 13: “Self-healing” is ineffective, consisting of repeatedly trying the same Xpath-based approach to selecting an element that is not the same as the recorded one. Inconsistent with purpose.

Bug 14: Output data for a test case appears in a non-editable, non-copiable window, making it difficult to process and analyze. Inconsistent with usability, inconsistent with purpose. This is also inconsistent with the Help / About / System Configuration dialog, which allows both copying to the clipboard and saving to a file.