Welcome to the first installment of our blog series dedicated to shedding light on the intricacies of bot mitigation. In this series, we will explore the fundamental systems and methodologies crucial for discerning between bots and humans. Whether you’re already utilizing a bot mitigation solution or in the process of evaluating one, understanding these components is paramount.
The 2024 Bot Mitigation Landscape
The bot mitigation industry is at a major inflection point. There is a transition occurring from the early entrants to a new group of emerging technologies. Bots have gained the upper hand in the battle with the established solutions. As a result, the companies that invest in these solutions have started voting with their dollars to replace their legacy systems.
Throughout this series, we will cover the wide range of components required to build and maintain a bot detection classification system. These systems are complex, as they must:
- Coexist with a wide range of surrounding platform components;
- Integrate efficiently, effectively, and seamlessly with customer applications; and
- Be built to proactively resist persistent adversarial reverse engineering efforts.
This blog details the core components of a bot mitigation solution. Understanding what is under the hood is important, even though many vendors present opaque systems to their customers.
Broadly speaking, all bot mitigation solutions do three things:
collect data → classify data → take an action
In this sense, the bot mitigation industry is primarily a data game, although much surrounding infrastructure is required to enable this game.
This translates to the three primary components of a solution:
- Data collection: A range of components that integrate into the customer’s applications in order to collect device, operating system, and app data.
- Classification: The component that consumes this data and applies a range of different data-based analysis techniques to detect bot traffic.
- Mitigation: The component that controls how humans and bots are treated.
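To make the collect → classify → act flow concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Telemetry` fields, the weights inside `classify`, and the 0.5 threshold are hypothetical and far simpler than anything a real solution would ship.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"


@dataclass(frozen=True)
class Telemetry:
    # Hypothetical signals; real solutions collect far more than this.
    user_agent: str
    has_webdriver_flag: bool
    request_rate_per_min: float


def classify(t: Telemetry) -> float:
    """Return a toy bot-likelihood score in [0, 1] (weights are made up)."""
    score = 0.0
    if t.has_webdriver_flag:
        score += 0.6
    if t.request_rate_per_min > 120:
        score += 0.3
    if not t.user_agent:
        score += 0.1
    return min(score, 1.0)


def mitigate(score: float, threshold: float = 0.5) -> Action:
    """Map a score to an action; the threshold is an assumed default."""
    return Action.BLOCK if score >= threshold else Action.ALLOW


def handle(t: Telemetry) -> Action:
    # collect (already done by the caller) -> classify -> take an action
    return mitigate(classify(t))
```

Calling `handle(Telemetry("Mozilla/5.0", False, 10.0))` returns `Action.ALLOW`, while a submission with an empty user agent, the webdriver flag set, and 300 requests per minute is blocked. The value of keeping the three stages separate is that each can be hardened or swapped without touching the others.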
Across a wide variety of approaches, these three things remain largely the same. Whilst it seems that everyone offers a bot solution these days, you can broadly generalise them in the following manner:
- Solutions that require human interaction: e.g. CAPTCHAs, puzzles, sliders, etc.
  - Vendors: PerimeterX/Human, Arkose Labs
- Solutions that do not require human interaction: e.g. invisible challenges
  - Vendors: Shape, Kasada
- Solutions that do not require client-side interrogation
  - Vendor: Netacea
Data Collection: The Foundation of Bot Detection
Data collection forms the lifeblood of any classification system. It involves gathering information from various sources, such as browsers or mobile apps, to discern between legitimate human users and bots. However, striking a balance between robustness against reverse engineering attacks and maintaining optimal user experience poses significant challenges.
Vendors need to invest significant engineering resources in building systems that satisfy two equally important – yet often competing – priorities. The client-side data collection process must be both:
- Resistant to reverse engineering and the subsequent adversarial input attacks and/or data pollution attacks;
- Highly performant so as not to slow down the human user experience.
The process and purpose of data collection are similar between the different types of anti-bot provider models. Both CAPTCHA and CAPTCHA-less solutions collect data, which is then submitted to an API endpoint for classification.
As will be discussed in more detail in a subsequent blog, the need to provide validation that data was collected in real-time is an element of data collection that divides the industry. This is the fundamental purpose of the CAPTCHA—the act of solving the puzzle serves as a signal that a human is interacting with the device. Kasada introduced the first invisible real-time validation solution in 2022. The need to provide real-time validation of data collection is a critical part of shaping the toolkits used by bot developers.
Most vendors provide SDKs to facilitate the data collection process. The SDK is designed to bring order to the unruly world of the web browser and communicate with the APIs that deliver detection logic and receive telemetry data. The integration components of a solution are critical to its ongoing success. The world is complex, with a wide variety of devices, browsers, and operating environments. Vendors need to accommodate this by designing lightweight, performant solutions.
The engineering challenges of building a bot mitigation solution involve balancing several conflicting requirements. The greatest of these challenges is the objective of building a system that delivers the optimum human experience while also being resistant to adversarial reverse engineering. The ability to force a bot developer into a browser and spot attempts to serve fake data is what truly differentiates the market in 2024.
Finding the right balance in this system is critical. In some cases, the answer depends on the sophistication of your adversaries. There is no free lunch in bot detection.
Not all vendors are the same when it comes to data collection. Whilst everyone claims to have the industry’s best detection, data collection is where vendors can genuinely differentiate themselves.
Classification: The Art of Identifying the Threat
Classification systems employ various techniques, including AI/ML and static detection logic, to differentiate between human and bot traffic. While static detection logic offers simplicity and performance, adversaries continually evolve their tactics, necessitating robust mechanisms to combat evasion techniques such as adversarial inputs and data poisoning.
Whilst artificial intelligence (AI) and machine learning (ML) dominate the marketing in vendor land, the reality is that most classification systems use a large amount of static detection logic.
Static detection logic:
- Is performant and safe;
- Can be easily reasoned about and accurately measured;
- Is also architecturally simple and cost-effective.
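As a rough illustration of what static detection logic can look like, the sketch below expresses each check as a named rule over a telemetry dictionary. The field names (`navigator_webdriver`, `user_agent`, `languages`) and the rule set itself are assumptions chosen for the example, not any vendor's actual logic.

```python
from typing import Callable, Mapping

Rule = Callable[[Mapping[str, object]], bool]

# Hypothetical static rules; each one is a cheap, deterministic check.
RULES: dict[str, Rule] = {
    # Automation frameworks typically set navigator.webdriver to true.
    "webdriver_flag": lambda t: t.get("navigator_webdriver") is True,
    # Headless Chrome has historically advertised itself in the user agent.
    "headless_ua": lambda t: "HeadlessChrome" in str(t.get("user_agent", "")),
    # A browser reporting no languages at all is unusual.
    "no_languages": lambda t: not t.get("languages"),
}


def matched_rules(telemetry: Mapping[str, object]) -> list[str]:
    """Return the names of every rule the submission trips, in rule order."""
    return [name for name, rule in RULES.items() if rule(telemetry)]
```

Returning rule names rather than a bare boolean is what makes this style easy to reason about and measure: every block can be traced to the specific rule that fired, and each rule's hit rate can be tracked on its own.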
There is a lot to like about static detection logic – until you consider the crafty minds of your adversaries. Bot detection is a non-stationary problem. The world is full of threat actors who view your latest static detection logic only as a temporary barrier.
The key strategies used by bot developers to evade detection include:
- Adversarial inputs: Specifically crafted telemetry submissions that have been developed with the aim of being reliably misclassified in order to obtain a human token.
- Data poisoning: Intentionally feeding adversarial data to the classification system in order to skew the model and shift the boundary of what is considered good vs. bad.
- Model stealing: “Blackbox probing” allows a bot dev to duplicate detection models.
There is an art to crafting telemetry payloads to ensure the bot evades detection. Therefore, it is essential to develop hardened data collection components that make it hard for attackers to perform telemetry manipulation. Key components of this module include:
- Limiting exposure to detection logic leakage;
- Limiting telemetry submission manipulation;
- Combining various detection mechanisms to make it harder to bypass the overall system.
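One simple way to think about combining detection mechanisms is a noisy-OR: treat each detector's output as an independent probability that the request is a bot, and compute the chance that at least one of them is right. This is a sketch under that independence assumption, which rarely holds exactly in practice; the function name is hypothetical.

```python
import math


def combine_scores(scores: list[float]) -> float:
    """Noisy-OR combination of per-detector bot probabilities.

    Assumes each score is an independent probability in [0, 1].
    """
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    # The request is treated as human only if every detector says so.
    return 1.0 - math.prod(1.0 - s for s in scores)
```

With two detectors each reporting 0.5, the combined score is 0.75. The practical consequence is that an attacker has to drive every individual score down at once, because a single confident detector is enough to keep the combined score high.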
The classification of bots is ultimately a game of anomaly detection that lends itself to something beyond static detection.
Using a classifier to block attacks – whilst maintaining security and usability – is challenging because you need a mechanism to handle mistakes and deal with uncertainty.
The primary challenges include:
- Striking the right balance between false positives and false negatives;
- The explainability of mistakes.
Most AI classifiers assign a score, based on the information provided and other signals, that represents the likelihood that a request was sent from a bot.
Balancing your classifier’s error rates deeply impacts the security and usability of the overall system. This is most often achieved by adjusting the classifier’s sensitivity and specificity. This process results in a preference for caution (favouring reducing false positives) or optimism (favouring reducing false negatives).
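The effect of moving the decision threshold can be seen on a small labelled sample. The scores and labels below are invented for illustration, and a "positive" is taken to mean "classified as a bot".

```python
def error_rates(
    scores: list[float], is_bot: list[bool], threshold: float
) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    A request is classified as a bot when its score >= threshold.
    """
    humans = sum(1 for bot in is_bot if not bot)
    bots = sum(1 for bot in is_bot if bot)
    if humans == 0 or bots == 0:
        raise ValueError("need at least one human and one bot sample")
    # Humans wrongly blocked.
    fp = sum(1 for s, bot in zip(scores, is_bot) if not bot and s >= threshold)
    # Bots wrongly allowed.
    fn = sum(1 for s, bot in zip(scores, is_bot) if bot and s < threshold)
    return fp / humans, fn / bots


# Made-up sample: five humans followed by five bots.
SCORES = [0.05, 0.10, 0.20, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90, 0.95]
IS_BOT = [False] * 5 + [True] * 5
```

On this sample a threshold of 0.5 gives 20% false positives and 20% false negatives. Lowering it to 0.3 removes the false negatives but doubles the false positives to 40%, and raising it to 0.75 does the reverse.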
It is important to note that the relationship between false positives and false negatives is not linear. Beyond a certain point, each further reduction in one error type costs a disproportionately larger increase in the other, so your overall error rate rises.
In bot mitigation, the margin for error is nonexistent and both error types are costly. As is often the case, some errors are more costly than others and this requires customer impact assessment.
At Kasada, we use the following mechanisms when assessing our machine learning (ML) classifier:
- Manual reviews: We run frequent manual assessments that examine the margins of each classifier model. This allows us to effectively reverse engineer the detection logic.
- Adjust false positive and false negative rates: Each model is tuneable, and its application is customisable across customers.
- Catch-up mechanisms: We leverage bidirectional data-sharing flows with customers, which enables us to retrospectively enforce misclassified outcomes.
Being able to explain why something was classified is an equally important part of the process.
Some of the key mechanisms used to explain classification results include:
- Similarity to known bot signals – this is useful for detecting outliers, variants, and emerging versions of bot toolkits;
- Using a range of specialised models that target specific bot techniques;
- Interpreting the inner workings of the model.
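The first of these mechanisms, similarity to known bot signals, can be sketched with a plain Jaccard similarity over sets of observed traits. The toolkit names and trait labels are hypothetical, and Jaccard is just one simple similarity measure chosen for the example; real systems may well use something richer.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_known_bot(
    features: set[str], known: dict[str, set[str]]
) -> tuple[str, float]:
    """Return the most similar known signature and its similarity.

    Raises ValueError if `known` is empty.
    """
    return max(
        ((name, jaccard(features, sig)) for name, sig in known.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical signatures of previously identified toolkits.
KNOWN_BOTS = {
    "toolkit_a": {"webdriver", "no_plugins", "fixed_viewport"},
    "toolkit_b": {"tls_mismatch", "no_plugins"},
}
```

A submission with the traits `{"webdriver", "no_plugins", "odd_timezone"}` comes back as closest to `toolkit_a` with a similarity of 0.5. An explanation of the form "this looks like a variant of toolkit A" is much easier to review than a bare score.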
The primary goal is to build an intentionally balanced model that handles errors in a safe and explainable way.
We will cover false positives / false negatives in greater detail in a subsequent blog.
Mitigation: Combatting Automated Fraud and Bot Attacks
The bot mitigation industry is truly divided on how to deal with mitigation, and falls into two camps:
- Human challenges – Examples include: PerimeterX/Human, Arkose Labs
- Invisible challenges – Examples include: Shape, Kasada
Ultimately, the primary objective of both groups is to validate the integrity of the data that feeds the classifier.
A CAPTCHA puzzle requires human interaction to be completed whilst the data is collected. The alternative model uses other mechanisms, beyond the scope of this blog, to achieve the same real-time validation.
Real-time validation of data collection is necessary to prevent telemetry manipulation attacks.
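The mechanisms vendors actually use are out of scope here, but one generic building block for any freshness check is a server-issued, signed, single-use challenge with a short lifetime. The sketch below shows only that building block. The key, the 30-second window, and the token layout are assumptions, and it is not a description of how Kasada or any other vendor validates data collection.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side signing key
MAX_AGE_SECONDS = 30.0                # assumed freshness window


def _sign(nonce: str, issued_ms: str) -> str:
    message = f"{nonce}:{issued_ms}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge(now: float) -> str:
    """Create a token of the form nonce:issued_ms:signature."""
    nonce = secrets.token_hex(16)
    issued_ms = str(int(now * 1000))
    return f"{nonce}:{issued_ms}:{_sign(nonce, issued_ms)}"


def is_fresh(token: str, now: float, seen_nonces: set[str]) -> bool:
    """Accept a token only if it is authentic, recent, and not yet used."""
    parts = token.split(":")
    if len(parts) != 3:
        return False
    nonce, issued_ms, signature = parts
    expected = _sign(nonce, issued_ms)
    # Constant-time comparison; a valid signature also proves issued_ms
    # is the integer string we generated.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    age = now - int(issued_ms) / 1000
    if age < 0 or age > MAX_AGE_SECONDS:
        return False
    if nonce in seen_nonces:
        return False  # replayed token
    seen_nonces.add(nonce)
    return True
```

A token issued at `now=1000.0` is accepted once at `now=1005.0`, rejected when replayed, and rejected outright at `now=1100.0`. On its own this only shows that a submission is tied to a recent challenge; it says nothing about whether the accompanying telemetry came from a real browser, which is the harder part of the problem.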
Mitigation ultimately requires that a bot is blocked from achieving its objective. The challenge with mitigation is that bot devs are experts at knowing when they have been detected. Most bots are built with kill switches that take the operation offline as soon as they receive any indicator of detection. This is as true for a tarpit or a delayed response as it is for an outright block.
As a general rule, most sophisticated bots can be mapped to some form of monetisation. Ticket bots, eCommerce bots, credential stuffing bots and scraping bots can all be mapped to some form of downstream monetisation. As a result, a successfully mitigated bot will rapidly be replaced with a new and improved version. It’s often said to be a game of cat and mouse, but in reality, it’s more like a game of three-dimensional chess.
It’s not uncommon for sophisticated bot operations to manage multiple variants of their bot simultaneously to prevent disruption. In the context of a hype drop in the eCommerce space, it’s also very common to see bots update their code multiple times in the hours leading up to the event. So, rather than focus on a bot mitigation provider’s mitigation options, you should focus on their ability to pivot and deal with rapidly evolving adversaries.
Key Considerations and Next Steps
As you navigate the landscape of bot detection and mitigation solutions, consider the following questions:
- Data Collection: How does the vendor address the dual challenges of resilience against reverse engineering and replay attacks while optimizing end-user experience?
- Classification: What mechanisms does the solution employ to combat evasion tactics like adversarial inputs and data poisoning?
- Mitigation: How does the solution adapt to rapidly evolving bot threats and maintain effectiveness in real-time mitigation?
Stay tuned for the next blog in our series, where we delve into the importance of classification accuracy.
If you have questions in the meantime, feel free to reach out to me personally on LinkedIn, get a personalised snapshot for your organization, or request a demo with our team of bot experts.