Like any dev shop worth its salt, we strongly promote testing of our products. To produce a quality product, we need an internal way to ensure a given level of quality is being met. Business requirements lead to tests, the tests lead to the logic, and the logic satisfies the client’s business requirements. Over time the test suite grows, test time increases, and before you know it, a single run of the suite can take 5, 10, or 15 minutes. In a rapid iteration environment this can become untenable. How, then, does a developer reduce the time from start to finish on testing yet retain quality assurance? What follows is a cautionary tale for all involved.
A consultant developer (we will call him “Bob”) was brought in and assigned a task. Bob began programming away, building tests as requested. With any modern app, some processes take longer than others, so Bob added calls to Codeception’s wait() method where needed, pausing execution for a fixed number of seconds to give each process time to finish. He started with a 15-second timeout, but as time went by and the test count grew, so did the need to cut run time. So Bob reduced the wait() times until everything seemed fine.
One day, many months later, another developer, “John,” was assigned to the project, and his tests started failing. John spent a couple of days trying to figure out why. The suite passed on Bob’s machine and in the Continuous Integration service, but he just couldn’t get it to pass on his own machine.
Brain trust time. Five developers and engineers were placed in a room to pick apart just about everything in this project. We traced logic execution, pored over stack traces, and tried this idea and that. Sometimes a given test would pass on the first loop and fail on the second. Run the suite again and tests 1 and 2 passed, but 3 failed. Run it a third time, expecting 1, 2, and 3 to pass and 4 to fail, and surprisingly, 1 would fail. File cache? Nope, not enabled on the development environments. Fixture/SQL dump import file size? We cut the filler data to 5 items; still failing. After 3 hours of banging away at this most intriguing issue, nothing seemed to work.
Around hour 4, we started to notice that the failing tests all had one thing in common. None of us thought much of it at first, but sometimes it takes fresh eyes to see the forest for the trees. One of the newer developers here at Sourcetoad had an idea: “Let’s increase the wait() times from 5 seconds up to 15 and see what happens.” The wait was unbearable. Some of the logic would loop 3 or 4 times, calling another test method that had 5+ wait() calls in it…tick…tick…tick…
Sure enough, the suite passed. It was nice to see green, but after 15 minutes? Ugh. The screenshots and markup captured by the test system always showed the elements the test was looking for. So we set about turning down the wait on one of the more time-intensive tasks, which took around 6 seconds to complete. With the wait set to 5 seconds, it would fail randomly; only at around 10 seconds did it pass with any consistency.
We then checked some of the tests written by another team member: very few waits. waitForText() and waitForElement() calls all over the place, but no hard wait() calls. After ordering lunch, we gathered again and called in the project’s original developer (call him “Jim”). Without much prompting, Jim confirmed that the use of wait() in this battery of tests was indeed atrocious. Given the project’s deadline, though, no time had been allocated to review Bob’s tests beyond confirming that they passed during integration and on the testing service.
Ladies and gentlemen, don’t use hard timeouts. Minute differences in hardware and its capabilities can skew how long a task takes to complete. On faster hardware, a hard wait wastes time because the process has already finished; on slower hardware, the task may take longer than the wait allows because less computing power is available.
I can hear some of you DevOps and senior engineers thinking, “Use a container service; it normalizes the platform.” We are right there with you: normalizing as much as possible from development to production is almost always a great idea. But guess how we came across this problem? Containerizing the development environment. Yep, I kid you not. You see, we use OS X here at Sourcetoad, and Docker’s native client currently has a terrible file-read performance issue. (We have a workaround, but even it only reaches about 90% of native performance.) Because of this small drop in performance, the tests ran just slowly enough to exceed the barely sufficient pauses given to the wait() calls.
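To make the difference concrete, here is a minimal Python sketch (not Codeception code) contrasting a hard wait with a polling wait. The names `hard_wait_then_check`, `wait_until`, `WaitTimeout`, and `make_task` are hypothetical, and the task durations are invented stand-ins for a fast and a slow machine. The polling helper is roughly the idea behind conditional waits such as Codeception’s waitForText() and waitForElement(): check repeatedly, return as soon as the condition holds, and fail only after a generous upper bound.

```python
import time
from typing import Callable


class WaitTimeout(Exception):
    """Hypothetical error raised when a condition is not met before the timeout."""


def hard_wait_then_check(condition: Callable[[], bool], seconds: float) -> bool:
    """The fragile pattern: sleep a fixed time, then check exactly once."""
    time.sleep(seconds)
    return condition()


def wait_until(condition: Callable[[], bool], timeout: float,
               interval: float = 0.05) -> float:
    """Poll `condition` until it is True; return the seconds actually waited.

    Raises WaitTimeout if `timeout` seconds pass first. The timeout is only
    an upper bound: a fast machine returns early, a slow one gets more time.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(interval)


def make_task(duration: float) -> Callable[[], bool]:
    """Hypothetical stand-in for a background process that finishes after
    `duration` seconds; returns an "is it done yet?" check."""
    finished_at = time.monotonic() + duration
    return lambda: time.monotonic() >= finished_at


if __name__ == "__main__":
    # Invented durations: the same task on a "fast" and a "slow" machine.
    for label, duration in [("fast machine", 0.1), ("slow machine", 1.0)]:
        passed = hard_wait_then_check(make_task(duration), seconds=0.5)
        waited = wait_until(make_task(duration), timeout=3.0)
        print(f"{label}: hard 0.5 s wait passed={passed}; "
              f"polling wait returned after {waited:.2f} s")
```

In this sketch the hard 0.5-second wait passes for the fast task and fails for the slow one, while the polling wait passes for both and returns roughly as soon as each task is done. The timeout can therefore be set generously without slowing down the common case.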
The moral of the story is fourfold:
- Do not use hard timeouts
- Tests only provide quality assurance to the level of quality the tests are written to
- Code review, including review of test code, is invaluable in preventing technical debt
- Quality assurance must be part of the project timeline
As we speak, the hard wait() calls are being removed and replaced with more dynamic wait calls. On the bright side, test quality is improving, and with it application quality. The sad part is that it took four engineers and developers nearly a day to sort it out.