Are cyber problems ever really “solved”? 

Where monetization is easily achieved, an attacker’s motivation remains persistently high. Often referred to as a game of cat and mouse, these cyber problems can remain unsolved for decades, with attacker and defender wresting the ascendancy from each other as their competing innovation cycles play out.

One such problem is bot mitigation. Humans have been using bots to increase the scalability of cyber attacks for decades. The attack vectors have morphed over time from overwhelming resources to deliberately exploiting business logic for financial gain.

Why the anti-bot industry exists

Why do companies pay for bot mitigation services? Ultimately you can boil the motivation to solve the bot problem down to four simple use cases that all align to brand protection:

  1. Trust and safety
  2. Fairness
  3. Security and risk
  4. Revenue, platform cost, and availability

These challenges straddle the boundaries of legality. Cracking into someone’s account or washing stolen credit cards is definitely illegal. Building a bot to purchase limited-release fashion items in bulk is not illegal, but using that bot to purchase tickets to the next Taylor Swift concert is. This is a weird world of contradictions and uncertainty.

History and comparisons

There are strong parallels with the ongoing challenge of securing email traffic. Threats that use email as their delivery vector have grown in sophistication and variety over more than 20 years. From the mass-mailing worms of the late 1990s to modern, sophisticated business email compromise (BEC) attacks, defenders have been forced to innovate many times over to fulfil their mission of delivering a clean email feed. Over the years, the vendor mix that dominates the email security market has ebbed and flowed. The early entrants into the space have all but disappeared, leaving room for innovative new entrants to fill the void with technically superior offerings.

Like the email security world, the adversarial landscape for bot mitigation has a strong history of innovation and a persistent motivation for a share of the pot of gold at the end of the Python-coded rainbow. Bot detection is inherently a data problem. And it’s a hard one to solve. It relies on the remote execution of code inside a completely untrusted environment. The client-side data collectors, mandatory for any anti-bot detection system of merit, generate a valuable dimension of data that would not otherwise exist. Without this data, there is no way of truly knowing what tooling was used to interact with an application’s backend.

The collection method used to detect a bot is prone to tampering, which cascades into a data pollution problem. This undermines the value of the sophisticated ML models that drive the detection engines.

Therefore, the real battle in the bot mitigation game is to achieve clean, accurate data. Most anti-bot solutions are not equipped to deliver that.

Failures in the bot mitigation market

The modern bot mitigation industry kicked off in the mid-2010s with companies such as Distil Networks, Shape Security, and Akamai taking early market share. Whilst they built significant and successful businesses, the early entrants failed to truly solve the problem with an eye toward the future. As their organizational focus has shifted elsewhere, an opportunity has arisen for both the bot builders and new market entrants.

The consistent mistake made by bot mitigation vendors is to build products that are overly reliant on human services to deliver their value proposition. Despite the potential of AI, most services on the market fail to achieve results using an inline, unsupervised model. Polluted datasets play a huge part in this outcome. The challenge of finding the balance between false positives and negatives is real.

Poorly protected JavaScript (JS) collection lets bots generate fake data in order to pass as human. Once a bot operator understands the data that is required to pass as human, they have a free pass until such time as the bot mitigation vendor changes their model.

“Is this mouse movement data indicative of what happened on the device, or was it generated in advance and submitted statically to avoid detection?”
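One narrow version of that question can be checked on the server: has the same movement trace been submitted by more than one session? The following is a minimal sketch, not a description of any vendor's method. It assumes a hypothetical payload in which each session reports its mouse trace as `(x, y, timestamp_ms)` points, and it only catches naive replays. A bot that adds noise to each replay would pass this check.

```python
import hashlib
from collections import defaultdict

# Hypothetical payload shape: (x, y, timestamp_ms) per sampled mouse position.
Point = tuple[int, int, int]


def trajectory_fingerprint(points: list[Point]) -> str:
    """Hash the step-to-step deltas so a replay collides even if it is
    shifted to a different screen position or start time."""
    deltas = [
        (x2 - x1, y2 - y1, t2 - t1)
        for (x1, y1, t1), (x2, y2, t2) in zip(points, points[1:])
    ]
    return hashlib.sha256(repr(deltas).encode()).hexdigest()


def find_replayed_sessions(
    sessions: dict[str, list[Point]], min_points: int = 3
) -> set[str]:
    """Return the session IDs whose trace is identical to another session's."""
    by_fingerprint: dict[str, list[str]] = defaultdict(list)
    for session_id, points in sessions.items():
        if len(points) < min_points:
            continue  # too short to compare meaningfully
        by_fingerprint[trajectory_fingerprint(points)].append(session_id)
    return {
        session_id
        for session_ids in by_fingerprint.values()
        if len(session_ids) > 1
        for session_id in session_ids
    }


sessions = {
    "a": [(0, 0, 0), (5, 3, 16), (9, 8, 33)],
    "b": [(100, 50, 1000), (105, 53, 1016), (109, 58, 1033)],  # "a", shifted
    "c": [(0, 0, 0), (4, 4, 17), (9, 9, 31)],
}
replayed = find_replayed_sessions(sessions)
```

Here sessions "a" and "b" share the same deltas and are flagged together, while "c" is not. Real human traces essentially never repeat exactly, which is why even this crude check has some value against pre-generated data.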

Looking forward – what’s really needed

If you accept that the bot mitigation problem is unsolved, you must also accept that all bot mitigation solutions in the market today have their limitations. So, how would you know if your vendor was failing? And how much better or worse would a different vendor be?

The early entrants left a cookie trail of the client-side and server-side concepts that are all required within a tightly integrated system to effectively detect and mitigate bots:

  1. Defensible, real-time client-side data collection
    • Forces execution of the detection code
    • Strong obfuscation and polymorphism
    • Detects user behavior signals (mouse movements, etc.)
  2. Server-side anomaly detection
    • Statistical analysis
    • ML / AI
    • Rapid closed-loop feedback
  3. Innovative mitigation strategies
    • Invisible
    • Doesn’t block real users
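To make the "statistical analysis" item in the list above concrete, here is a minimal sketch of one common technique, a robust outlier test based on the median and the median absolute deviation (MAD). The per-client request rates and the 3.5 cut-off are illustrative assumptions, not values taken from any product.

```python
from statistics import median


def mad_outliers(rates: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return the client IDs whose request rate is a robust outlier.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    The median and MAD are far less distorted by the outliers
    themselves than the mean and standard deviation would be.
    """
    values = list(rates.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return sorted(
        client
        for client, value in rates.items()
        if 0.6745 * abs(value - med) / mad > threshold
    )


# Hypothetical requests-per-minute figures for six clients.
rates = {"c1": 10, "c2": 12, "c3": 11, "c4": 9, "c5": 13, "c6": 400}
suspects = mad_outliers(rates)  # ["c6"]
```

A check like this is only as good as its inputs. If the client-side signals feeding it have been tampered with, the statistics are computed over polluted data.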

JavaScript Virtual Machines become mainstream

JavaScript virtual machines (VMs) are becoming the new norm. The industry has been slow to wake up to this, but JS VMs are now increasingly used to improve the defensibility of real-time client-side JavaScript execution.

Timeline: Google (2010), Shape (2017), Kasada (2020), Jscrambler (2022), Cloudflare (2023)

These vendors have all understood that whilst obfuscation is not a silver bullet, a JS VM is a necessary component to force bots to execute the detection code. They are complex, compute-intensive components that require specialist engineering skills. The development of a robust VM-based obfuscation platform is a journey along a tightrope with the competing requirements of security and performance jostling for priority.

Defensible JavaScript collection ensures that bot operators need to execute within a sandboxed environment, increasing both the cost and risk for the bot operators.

Polymorphic detection 

Bot mitigation systems run within a completely untrusted environment, so to be resilient they need to be unpredictable and hard to automate against. Both the detection logic and the mechanisms of execution need to change frequently. The penalty for detection needs to be severe and hard to predict.
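As a toy illustration of detection logic that changes frequently, the sketch below derives a different challenge variant for each session and rotation epoch from a server-side secret. The probe names, the `xor_key` field, and the overall scheme are hypothetical and greatly simplified. A production system would vary far more than the order of its checks.

```python
import hashlib
import hmac
import random

# Hypothetical probe names, for illustration only.
CHECKS = ["canvas_probe", "timing_probe", "webdriver_probe", "font_probe"]


def challenge_variant(secret: bytes, session_id: str, epoch: int) -> dict:
    """Derive a per-session, per-epoch challenge layout.

    The server can recompute the same variant later to validate a
    response, but a bot cannot predict it without the secret.
    """
    digest = hmac.new(
        secret, f"{session_id}:{epoch}".encode(), hashlib.sha256
    ).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return {
        "order": rng.sample(CHECKS, k=len(CHECKS)),  # shuffled probe order
        "xor_key": rng.randrange(1, 256),  # per-variant payload masking key
    }


secret = b"server-side-secret"  # assumption: never sent to the client
variant = challenge_variant(secret, "session-123", epoch=42)
```

Bumping `epoch` on a schedule changes the layout for every session at once, which is a breaking change that forces a bot dev to retool.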

Metrics used to measure a bot mitigation vendor’s resolve to improve include:

  • The frequency of breaking changes
  • The time and effort required for a bot dev to recover
  • How long a bot dev persists before giving up

The overall cost to operate a bot needs to be much greater as a result of the bot mitigation vendor’s efforts – so monetization can’t be easily achieved.

Mitigation strategies

Bot developers are experts at reverse engineering the defensive strategies used to detect them. Their efforts span code de-obfuscation, code analysis, live testing, request reconstruction, and operational management. The infrastructure required to operate a bot at scale is significant, particularly when competing for limited-release items.

The early entrants’ focus on providing a range of mitigation strategies didn’t increase their efficacy. A bot dev will identify a slow response, a tarpit, or a block in the same way. Ultimately these features all suffer from the same limitation – once a bot dev has reversed them, the game is over. Mitigation strategies are also a challenging concept for many customers to tackle. A badly implemented mitigation strategy runs the risk of blocking legitimate end-users.

The most sophisticated customers of bot mitigation vendors are capable of consuming data and owning the mitigation step. For example, locking or banning the accounts associated with a bot can have a longer and more disruptive impact than blocking the request. Attacking the underlying trust between the layers of the bot ecosystem, in a targeted and persistent campaign, can achieve maximum return.

Using bidirectional data to bridge the divide between bot mitigation vendor and the protected customer reaps tremendous rewards. The combined strength of the signals at each layer can result in a far greater ability to gain ascendancy.

AI to the rescue?

Bot detection is a multi-dimensional, data-based game of anomaly detection in which artificial intelligence (AI) is viewed as the holy grail. However, detecting bots is a non-stationary problem. Bots evolve and bot devs continuously respond and retool. This means the training data used by bot mitigation AI systems needs to be updated constantly. Data from an hour ago is most likely obsolete, let alone the data from yesterday. 

Bot mitigation AI systems need to be constantly retrained. Equally, the model needs to be general enough to detect new attacks. The main problem is how the bot mitigation vendor handles mistakes. The relationship between false positives and false negatives is not linear: pushing one toward zero tends to cost disproportionately more of the other, which can raise the overall error rate.
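The trade-off can be seen in a small worked example. The sketch below sweeps a decision threshold over a handful of made-up bot scores, where a higher score means more bot-like, and reports the false positive and false negative rates at each setting. The scores and labels are invented for illustration and do not come from real traffic.

```python
def error_rates(
    scores: list[float], is_bot: list[bool], threshold: float
) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    A request is classified as a bot when its score >= threshold.
    """
    false_pos = sum(
        1 for s, bot in zip(scores, is_bot) if s >= threshold and not bot
    )
    false_neg = sum(
        1 for s, bot in zip(scores, is_bot) if s < threshold and bot
    )
    humans = is_bot.count(False)
    bots = is_bot.count(True)
    return false_pos / humans, false_neg / bots


# Made-up scores: five humans followed by five bots.
scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90, 0.95]
is_bot = [False] * 5 + [True] * 5

strict = error_rates(scores, is_bot, 0.30)    # (0.4, 0.0)
balanced = error_rates(scores, is_bot, 0.50)  # (0.2, 0.2)
lenient = error_rates(scores, is_bot, 0.75)   # (0.0, 0.4)
```

In this toy data, no threshold removes both kinds of error, because one human scores higher than one bot. Moving the threshold only shifts the mistakes from blocked humans to missed bots, or back again.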

As previously discussed, poorly protected data collection results in fake data being generated by bots in order to pass as humans. These adversarial inputs are specifically created with the aim of being reliably misclassified in order to evade detection.

Therein lies the challenge for the bot mitigation and bot management industry and the primary reason that the industry is pivoting to JavaScript virtual machines: the shortcuts taken on the client-side collection components end up compromising the expensive data-driven systems used by AI.

Summary

Non-stationary problems are never solved. The key is to innovate and maintain an adversarial advantage, as the best anti-bot solutions evolve over time. Maintaining the commitment to solving a single problem often goes against business growth strategies. The challenge for organizations seeking bot detection services is to cut through the noise and identify the current innovators who are committed to solving their greatest challenges.
