So here’s a question no one really likes asking out loud: if everything looks fine — builds pass, CI is green, QA shrugs — do we even need to run performance tests?
Everything is stable on paper. You roll out a release, everything is green, traffic grows. And then, not immediately, but a couple of hours or days later, something starts to break. No alarms. No panic. Just quietly. One customer. A spike. Some background cron. And suddenly you're catching timeouts in places no one thought were possible.
The code? Probably not the culprit. Monitoring? We thought it was there. Gatling? There was a script. Somewhere. But was it in the pipeline? Was it even running? No one knows.
And the worst part? We knew there were Gatling alternatives. They’d been tossed around in chats. Mentioned during retros. Someone even bookmarked a link or two — maybe PFLB, maybe K6, maybe something more custom. But no one took ownership. So the idea just… floated. Until it was no longer hypothetical.
CI/CD and Load Testing: Why the Marriage Never Happened
Time to talk. Most of us have CI pipelines that do all the usual things: linters, unit tests, integration tests, maybe even visual regression checks. All good. But where is your load testing? Usually somewhere in the basement. A Jenkins job, #27. Last modified three months ago by "someone".
Why does that happen? Because Gatling, and tools like it, weren't built to live inside modern DevOps. They linger. They hover near the edge of the process. You have to remember to kick them off before they do anything useful. And let's face it: no one ever does. Why would they, if everything looks calm?
And here's where reality checks in:
- There’s a test script in scripts/load/, untouched since the move to Kubernetes.
- Jenkins has a job set up, but it’s manual, and buried under other tabs.
- No one really knows what endpoints the script targets anymore — half the team assumes they’ve been deprecated, the other half just hasn’t checked.
Then a feature comes along. Maybe new logic for email notifications. It hits the database harder than expected. You ship it. CI passes. You merge. Everyone feels good about it. And yet no one stops to ask, "Can we handle the traffic during peak hours?" No one says, "Let's run a synthetic load before Friday's promo."
It's not because we're lazy. It's because the tooling isn't part of the flow. It's an afterthought, like a formal wedding invitation from the head of QA: well-intentioned, but a little out of place. It happens.
Contextual testing: not just "a lot", but "exactly right"
This is where the fun part begins. Most load tests are abstractions. You model 1,000 users. Why 1,000? Who knows; it's supposed to sound impressive. But the actual API is used differently. There are specific business scenarios: exporting invoices, mass authorization, updating the cart when a discount is applied.
Contextual testing means you don't just pounce on the server, you do it intelligently, as close to reality as possible. What happens if 300 users submit the final step of a purchase at the same time? What if the backend is already busy running a cron task? There are no answers until you test that exact scenario. And that's impossible without proper test data, one of the most overlooked parts of quality assurance, even in mature teams.
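As a rough illustration, here's what such a scenario could look like as a Gatling simulation. It's a minimal sketch: the base URL, endpoints, payload, and numbers are hypothetical placeholders, not anyone's real setup.

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// A sketch of a "contextual" scenario: 300 users hitting the last checkout
// step while a background export keeps the backend busy at the same time.
class CheckoutPeakSimulation extends Simulation {

  val httpProtocol = http.baseUrl("https://staging.example.com") // placeholder

  // The business-critical flow: final purchase confirmation.
  val checkout = scenario("Final checkout step")
    .exec(
      http("confirm order")
        .post("/api/checkout/confirm")
        .body(StringBody("""{"cartId": "demo-cart-001"}""")).asJson
        .check(status.is(200))
    )

  // The "backend is already busy" part: a heavy export running in parallel.
  val backgroundExport = scenario("Quarterly report export")
    .exec(
      http("export invoices")
        .get("/api/reports/export?quarter=Q3")
        .check(status.is(200))
    )

  setUp(
    checkout.inject(rampUsers(300).during(60.seconds)),
    backgroundExport.inject(constantUsersPerSec(2).during(60.seconds))
  ).protocols(httpProtocol)
}
```

The point isn't the syntax. It's that the load profile mirrors a real business moment, checkout under pressure with something heavy churning in the background, instead of an abstract "1,000 users".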
Why you need context is simple enough to put in a table:
| Scenario | Risk without tests | What contextual testing gives you |
|---|---|---|
| Mass authorization | Spike in 401 errors under load | Verifies session persistence |
| Invoice export | Service crash due to slow response times | Keeps report generation time in check |
| Discount campaigns | Inconsistent carts | Checks caching and database behavior at peak |
| Backup + customer requests | Deadlocks and timeouts | Balances I/O for real-world cases |
Gatling struggles in this space — it has no awareness of context. It doesn’t know your flow, can’t adapt to new headers or changing business logic. The script stays the same, even when everything around it evolves.
That's where Gatling alternatives start to matter. Not because they're inherently "better," but because they're built to live inside your delivery process, not next to it. Some platforms tie load tests to pull requests. You change a subscription flow, and the test runs. A Slack bot nudges you: "Looks like that endpoint takes traffic. Want to check it under load?" It's just built in. And that changes everything.
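You don't even need a heavyweight platform to start with that kind of nudge; a few lines of CI glue will do. Here's a hypothetical sketch in Scala: the covered-endpoint list, the environment variable name, and the whole check are made-up placeholders, and dedicated platforms simply do this for you.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical CI glue: compare endpoints touched in a pull request against
// the endpoints that already have load scenarios, and ping Slack when
// something slips through without coverage.
object LoadCoverageNudge {
  def main(args: Array[String]): Unit = {
    val changedEndpoints = args.toSet // passed in by the pipeline, e.g. from a diff
    val coveredEndpoints = Set("/api/checkout/confirm", "/api/login") // placeholder list
    val missing          = changedEndpoints.diff(coveredEndpoints)

    if (missing.nonEmpty) {
      val text = s"Endpoints changed without load coverage: ${missing.mkString(", ")}. Want to check them under load?"
      val request = HttpRequest.newBuilder(URI.create(sys.env("SLACK_WEBHOOK_URL")))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(s"""{"text": "$text"}"""))
        .build()
      // Slack incoming webhooks accept a simple {"text": ...} payload.
      HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
  }
}
```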
And yes, "it works for us" is a false argument. It works until someone decides to download 500 quarterly reports at once. And then it's the same story: the logs quietly fill up, and DevOps starts drinking nervously.
“Who needs autotriggers anyway?”
Oh, that's a good one. I used to say that myself. Until, on one project, it turned out that no one had run the Jenkins load job in eight months. Because nobody thought it was necessary. Because "well, it was there!"
And then, all of a sudden, an Azure client. Peak activity. Unexpected drops. How did that happen? Simple: the autotrigger wasn't configured. Actually, it was. But it had been disabled so it wouldn't interfere with a release. Who disabled it? Who knows. Legend has it that it was someone from the previous team.
Tests as a service, not an artifact
Here's the bottom line. A load test should live. Not as a file sitting in a repository, but as a service. Like Grafana. Like Prometheus. You don't think about it; it's just there. And it works.
What it takes to make testing a service, not a note on a call:
- Automatically run tests with every change related to business logic.
- Alerts that react to RPS sag before something breaks (see the gate sketch after this list).
- Infrastructure where scripts are updated in lockstep with the code, via the same pull requests.
- A repository of results and a regression history for each feature.
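For the alerting item above, the simplest starting point is to make the load run itself fail the pipeline. Here's a minimal sketch of such a gate using Gatling assertions, where the endpoint and thresholds are hypothetical and would need tuning to your own baseline:

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// A small, steady load plus assertions: if throughput sags or latency climbs,
// the run fails, and with it the CI step that launched it.
class RegressionGateSimulation extends Simulation {

  val httpProtocol = http.baseUrl("https://staging.example.com") // placeholder

  val smoke = scenario("Smoke load on a critical endpoint")
    .exec(http("list invoices").get("/api/invoices").check(status.is(200)))

  setUp(
    smoke.inject(constantUsersPerSec(20).during(2.minutes))
  ).protocols(httpProtocol)
   .assertions(
     global.requestsPerSec.gte(15),             // catch an RPS sag
     global.responseTime.percentile3.lt(1500),  // 95th percentile under 1.5 s
     global.failedRequests.percent.lt(1)        // and almost no errors
   )
}
```

When any assertion fails, Gatling exits with a non-zero code, so the pipeline goes red long before a customer notices.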
Does this sound complicated? Maybe. But honestly, it saves you. Not from bugs. From embarrassment. Because when a service goes down under load, the client doesn't care that you had tests. They care that the tests didn't say anything.
Platforms like PFLB are exactly the Gatling alternatives people talk about at conferences but rarely dare to adopt. Even custom setups with GitHub Actions + Slack count: that's living testing. Everything else is a museum piece.
As for PFLB, a test there isn't a JSON file sitting in an archive; it's API integration, metrics in Grafana, and alerts that fire before the client notices anything.
Conclusion
I won't say Gatling is bad. It has served honorably, and it still can, if you really build it in. But if you run it once per release, it's like checking the fire alarm only on holidays. Is it working? Well, sort of. Until something catches fire.
Here’s a checklist for you to think about:
- Last load test run — when was it?
- Test scenarios — do they match the current code?
- Is it integrated with CI, or is it run "by hand"?
If even one of those points makes you sigh, look at the alternatives. Maybe they know your pipeline better than you do. Or, better yet, look at how others automate load tests inside their pipelines instead of hoping the Jenkins job still works.
That's it, you can exhale. Now think about this: are you sure there are no holes in your load coverage?