Bot Management
Bots generate nearly half of all internet traffic. While many bots serve legitimate purposes like search engine crawling and content aggregation, others originate from malicious sources. Bot management encompasses both observing and controlling all bot traffic. A key component of this is bot protection, which focuses specifically on mitigating risks from automated threats that scrape content, attempt unauthorized logins, or overload servers.
Bot management systems analyze incoming traffic to identify and classify requests based on their source and intent. This includes:
- Verifying and allowing legitimate bots that correctly identify themselves
- Monitoring bot traffic patterns and resource consumption
- Detecting and challenging suspicious traffic that behaves abnormally
- Enforcing browser-like behavior by verifying navigation patterns and cache usage
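Verifying a bot that "correctly identifies itself" takes more than trusting its User-Agent header, because any client can set that header to any value. One widely documented technique, which Google describes for Googlebot, is forward-confirmed reverse DNS. You resolve the client IP to a hostname, check that the hostname belongs to the operator's domain, then resolve that hostname forward and confirm it maps back to the same IP. The sketch below assumes that technique. It does not describe how Vercel verifies bots. The `EXPECTED_SUFFIXES` table and the injectable `reverse`/`forward` resolver parameters are hypothetical, and they exist so the example can run offline.

```python
import socket
from typing import Callable, Iterable

# Hypothetical mapping from a claimed crawler name to the DNS suffixes it should resolve to.
# Google documents googlebot.com and google.com for Googlebot; treat this table as an example.
EXPECTED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
}

def _reverse(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist); keep the hostname.
    return socket.gethostbyaddr(ip)[0]

def _forward(hostname: str) -> list[str]:
    # socket.gethostbyname_ex returns (hostname, aliaslist, ipaddrlist); keep the IPs.
    return socket.gethostbyname_ex(hostname)[2]

def is_verified_bot(
    claimed_bot: str,
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Iterable[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS check for a client claiming to be `claimed_bot`."""
    suffixes = EXPECTED_SUFFIXES.get(claimed_bot)
    if suffixes is None:
        return False  # unknown bot name: nothing to verify against
    try:
        hostname = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False  # no PTR record for this IP
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False  # PTR record points outside the operator's domains
    try:
        return ip in forward(hostname)  # the forward lookup must confirm the same IP
    except OSError:
        return False

# Offline demo: stub resolvers stand in for real DNS lookups.
ptr = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com", "203.0.113.9": "scanner.example.net"}
a_records = {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"], "scanner.example.net": ["203.0.113.9"]}
print(is_verified_bot("Googlebot", "66.249.66.1", ptr.__getitem__, a_records.__getitem__))  # True
print(is_verified_bot("Googlebot", "203.0.113.9", ptr.__getitem__, a_records.__getitem__))  # False
```

The forward lookup is the key step. An attacker can control the PTR record for their own IP range and make it claim a `googlebot.com` hostname, but they cannot make Google's DNS resolve that hostname back to their IP.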
To effectively manage bot traffic and protect against harmful bots, various techniques are used, including:
- Signature-based detection: Inspecting HTTP requests for known bot signatures
- Rate limiting: Restricting how often certain actions can be performed to prevent abuse
- Challenges: Using JavaScript checks to verify human presence
- Behavioral analysis: Detecting unusual patterns in user activity that suggest automation
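Rate limiting is the most mechanical of these techniques. The sketch below shows a sliding-window log limiter that is keyed by a client identifier, such as an IP address. The class name, the limit and window values, and the explicit `now` parameter (added so the example is testable) are assumptions for illustration. This is not how Vercel's WAF rate limiting is implemented.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window_seconds` for each key."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        # Per-key queue of timestamps for requests that were allowed.
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the caller would deny or challenge
        hits.append(now)
        return True

limiter = SlidingWindowRateLimiter(limit=3, window_seconds=10)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 11)]
print(results)  # [True, True, True, False, True]
```

In this sketch, rejected requests do not consume quota, so a client that backs off recovers as soon as old hits age out. Some limiters count rejected requests as well, which penalizes clients that keep retrying.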
With Vercel, you can use:
- Managed rulesets to challenge specific bot traffic
- Rate limiting and challenge actions with WAF custom rules to prevent bot activity from reaching your application
- Observability and Firewall to monitor bot patterns, traffic sources, and the effectiveness of your bot management strategies
With Vercel, you can use the bot filter managed ruleset to challenge non-browser traffic before it reaches your applications. It filters out automated threats while allowing legitimate traffic.
- It identifies clients that violate browser-like behavior and serves them a JavaScript challenge.
- It prevents requests that falsely claim to be from a browser, such as a curl request identifying as Chrome.
- It automatically excludes verified bots, such as Google's crawler, from evaluation.
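One way to catch a client like curl that sends a Chrome User-Agent is to check whether the rest of the request looks like what that browser actually sends. The sketch below is a simplified heuristic and not Vercel's detection logic. It assumes that a request claiming to be Chrome should also carry headers that desktop Chrome sends on an HTTPS navigation, such as `sec-ch-ua`, `sec-fetch-mode`, and `accept-language`. The header list and function names are hypothetical. Real browsers vary by version, platform, and privacy settings, so production systems combine many signals instead of relying on a fixed list.

```python
from typing import Mapping

# Illustrative assumption: headers that current desktop Chrome sends on a top-level
# HTTPS navigation. This is not an exhaustive or authoritative fingerprint.
EXPECTED_CHROME_HEADERS = ("sec-ch-ua", "sec-fetch-mode", "accept-language")

def claims_chrome(user_agent: str) -> bool:
    return "Chrome/" in user_agent

def looks_like_spoofed_browser(headers: Mapping[str, str]) -> bool:
    """Return True when the User-Agent claims Chrome but typical browser headers are missing."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if not claims_chrome(normalized.get("user-agent", "")):
        return False  # this heuristic only evaluates clients that claim to be Chrome
    missing = [h for h in EXPECTED_CHROME_HEADERS if h not in normalized]
    return bool(missing)

chrome_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
# Roughly what curl sends when run with -A set to the Chrome string above.
curl_request = {"Host": "example.com", "User-Agent": chrome_ua, "Accept": "*/*"}
browser_request = {
    "Host": "example.com",
    "User-Agent": chrome_ua,
    "sec-ch-ua": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    "Sec-Fetch-Mode": "navigate",
    "Accept-Language": "en-US,en;q=0.9",
}
print(looks_like_spoofed_browser(curl_request))     # True
print(looks_like_spoofed_browser(browser_request))  # False
```

An honest curl request with its default `curl/...` User-Agent is not flagged by this particular check. It is simply non-browser traffic, which the broader browser-behavior checks already cover.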
To learn more about how the ruleset works, review the Challenge section of Firewall actions. To understand the details of what gets logged and how to monitor your traffic, review Firewall Observability.
For trusted automated traffic, you can create custom WAF rules with bypass actions that will allow this traffic to skip the bot filter ruleset.
You can apply the ruleset to your project in log or challenge mode. Learn how to configure the bot filter managed ruleset.
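The interaction between bypass rules and the ruleset mode can be modeled as a small decision function. In the sketch below, the `BypassRule` type, the `Mode` enum, the `x-monitor-token` header, and the browser check are hypothetical. They illustrate the behavior described above, where bypass rules let trusted traffic skip the ruleset and log mode records matches without acting on them. This is not Vercel's configuration schema or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping

class Mode(Enum):
    LOG = "log"              # record the match and let the request through
    CHALLENGE = "challenge"  # serve a JavaScript challenge

@dataclass
class BypassRule:
    # Hypothetical custom rule: a name plus a predicate over request headers.
    name: str
    matches: Callable[[Mapping[str, str]], bool]

def evaluate(
    headers: Mapping[str, str],
    bypass_rules: list[BypassRule],
    is_browser_like: Callable[[Mapping[str, str]], bool],
    mode: Mode,
) -> str:
    """Return 'bypass', 'allow', 'log', or 'challenge' for one request.

    Verified bots would be excluded before this point; that step is not modeled here.
    """
    for rule in bypass_rules:
        if rule.matches(headers):
            return "bypass"  # trusted automation skips the bot filter entirely
    if is_browser_like(headers):
        return "allow"
    # Non-browser traffic: log mode only records it, challenge mode acts on it.
    return "log" if mode is Mode.LOG else "challenge"

# Hypothetical rule that trusts an internal monitoring job carrying a shared-secret header.
monitor = BypassRule("internal-monitor", lambda h: h.get("x-monitor-token") == "s3cret")
browser_check = lambda h: "sec-fetch-mode" in h

probe = {"user-agent": "curl/8.5.0"}
print(evaluate(probe, [monitor], browser_check, Mode.LOG))        # log
print(evaluate(probe, [monitor], browser_check, Mode.CHALLENGE))  # challenge
print(evaluate({**probe, "x-monitor-token": "s3cret"}, [monitor], browser_check, Mode.CHALLENGE))  # bypass
```

Starting in log mode lets you check in your logs which traffic would have been challenged, and add bypass rules for it if needed, before you switch to challenge mode.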
AI bots managed ruleset is available on all plans
Vercel's AI bots managed ruleset allows you to control traffic from AI bots that crawl your site for training data, search purposes, or user-generated fetches.
- It identifies and filters requests from known AI crawlers and bots.
- It provides options to log or deny these requests based on your preferences.
- The list of known AI bots is automatically maintained and updated by Vercel.
When new AI bots emerge, they are automatically added to Vercel's managed list and will be handled according to your existing configured action without requiring any changes on your part.
You can apply the ruleset to your project in log or deny mode. Learn how to configure the AI bots managed ruleset.
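Conceptually, the AI bots ruleset pairs a maintained list of crawler identifiers with a single configured action. The sketch below models that with case-insensitive User-Agent substring matching. The `AI_BOT_TOKENS` tuple is a small, hand-picked sample of AI crawlers from the directory below and is not Vercel's managed list. The function names are hypothetical. A real implementation would also verify that a request comes from the named operator instead of trusting the User-Agent header alone.

```python
from typing import Optional

# Illustrative sample only; Vercel maintains the real list and updates it automatically.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "meta-externalagent")

def match_ai_bot(user_agent: str, tokens: tuple[str, ...] = AI_BOT_TOKENS) -> Optional[str]:
    """Return the first AI bot token found in the User-Agent (case-insensitive), else None."""
    ua = user_agent.lower()
    for token in tokens:
        if token.lower() in ua:
            return token
    return None

def apply_ai_bot_action(user_agent: str, action: str) -> str:
    """Given the configured action ('log' or 'deny'), return 'pass', 'log', or 'deny'."""
    if action not in ("log", "deny"):
        raise ValueError(f"unsupported action: {action!r}")
    if match_ai_bot(user_agent) is None:
        return "pass"  # not a listed AI bot: this ruleset does not act
    return action  # one configured action applies to every listed bot

gptbot_ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
             "GPTBot/1.1; +https://openai.com/gptbot)")
print(match_ai_bot(gptbot_ua))                 # GPTBot
print(apply_ai_bot_action(gptbot_ua, "deny"))  # deny
```

Because the action is configured once and the bot list is plain data, adding a new token changes which requests match without touching the configured action. That mirrors how newly listed AI bots inherit your existing setting.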
Vercel maintains a directory of known legitimate bots from across the internet and continuously updates it as new legitimate services emerge. Attack Challenge Mode and bot filter automatically recognize these bots and let them through without a challenge. You can block access to some or all of these bots by writing WAF custom rules with the User Agent match condition. To learn how to do this, review WAF Examples.
Submit a bot request if you are a SaaS provider and would like to be added to this list.
Bot name | Description | Documentation |
---|---|---|
AdIdxBot | AdIdxBot is the crawler used by Bing Ads for quality control of ads and their destination websites. It has multiple user agent variants including desktop, iPhone, and Windows Phone versions. | View |
AdsBot-Google | AdsBot-Google is Google's web crawler used for quality control of Google Ads. | View |
Adsense | The AdSense crawler visits participating sites in order to provide them with relevant ads. | View |
Adyen Webhook | Adyen’s webhooks (Notification API) send encrypted, real-time HTTP callbacks for key payment and account events—automating order fulfillment, settlement reconciliation, and risk-management workflows. | View |
AhrefsBot | Powers the database for both Ahrefs, a marketing intelligence platform, and Yep, an independent, privacy-focused search engine. | View |
AhrefsSiteAudit | Powers Ahrefs’ Site Audit tool. Ahrefs users can use Site Audit to analyze websites and find both technical SEO and on-page SEO issues. | View |
AI2Bot | AI2Bot is operated by the Allen Institute for Artificial Intelligence (Ai2) to crawl the web for content to train open-source AI models. It is used to index academic publications and web content for research purposes. | View |
aiHitBot | aiHitBot collects and maintains historical information about companies. It gathers data from company websites to build comprehensive company profiles, including changes in company executives and other historical information. | View |
Algolia | The Algolia Crawler extracts content from your site and makes it searchable. | View |
Amazon Kendra | Amazon Kendra is a managed information retrieval and intelligent search service that uses natural language processing and advanced deep learning models. | View |
Amazon Q | Amazon Q Business is a generative artificial intelligence (generative AI)-powered assistant that you can tailor to your business needs. | View |
Amazonbot | Amazonbot is Amazon's web crawler used to improve Amazon's services, such as enabling Alexa to more accurately answer questions for customers. | View |
Amazon Product Discovery | Amazon's web crawler used to collect publicly available product details from Amazon Selling Partner websites to help improve the accuracy and completeness of product information on Amazon. This helps ensure that Amazon customers see correct and complete information to help them in their shopping journey. | View |
Amazon Seller Initiated Listing | Amazon's web crawler that helps sellers succeed by giving them the option to provide a URL to a website and create high-quality product pages in Amazon's store. This bot crawls seller-provided URLs to collect product information for listing creation. | View |
APIs-Google | Crawling preferences addressed to the APIs-Google user agent affect the delivery of push notification messages by Google APIs. | View |
Apple Podcasts | Apple Podcasts crawler that only accesses URLs associated with registered content on Apple Podcasts. Does not follow robots.txt. | View |
Applebot | Applebot powers search features in Apple's ecosystem (Spotlight, Siri, Safari) and may be used to train Apple's foundation models for generative AI features. | View |
Artemis Web Crawler | Artemis is a calm web reader with which you can follow websites and blogs. | View |
Awario Bot | Awario's web crawler used to discover and collect new and updated web data for their social media monitoring and brand mention tracking platform. The crawler helps Internet marketers find who is mentioning their brand online. | View |
Awario RSS Bot | One of Awario's primary web crawlers specialized in collecting RSS feed data. | View |
Awario Smart Bot | One of Awario's primary web crawlers that discovers and collects new and updated web data. | View |
BaiduSpider | Baiduspider is Baidu’s web crawler that indexes websites for inclusion in its Chinese-market search results. | View |
Barkrowler | Barkrowler is Babbar's web crawler that fuels and updates their graph representation of the web, providing SEO tools for the marketing community. | View |
Better Stack | Better Stack is a platform for monitoring and alerting on your applications. | View |
Bingbot | Bingbot is Microsoft's web crawler used for indexing websites for Bing Search. | View |
BLEXBot | BLEXBot is SE Ranking's web crawler that helps analyze websites for SEO purposes, including backlink analysis, rank tracking, and website auditing. The bot is part of SE Ranking's all-in-one SEO platform used by marketing professionals and agencies. | View |
Brightbot | Brightbot is Bright Data's crawler layer that monitors the health of websites and enforces ethical web data collection. It prevents access to non-public information and blocks interactive endpoints that could be abused, acting as a guardian for ethical data collection. | View |
Bytespider | Bytespider is ByteDance's web crawler used to gather training data for their AI large language models. It's primarily used to scrape web content to train TikTok's AI features and other ByteDance AI products. | View |
CCBot | CCBot is operated by the Common Crawl Foundation to crawl web content for AI training and research. Common Crawl is a non-profit organization that maintains an open repository of web crawl data that is universally accessible for research and analysis. | View |
CensysInspectBot | Censys Inspect is a web crawler operated by Censys that performs internet-wide scanning to discover, monitor, and analyze publicly accessible devices and services. The crawler follows best practices, only accesses public-facing services, and respects robots.txt directives. | View |
ChatGPT-User | Handles user-initiated requests in ChatGPT, accessing external content to provide real-time information; not used for automated crawling or AI training. | View |
Checkly | Checkly is a platform for monitoring and alerting on your applications. | View |
Chrome Lighthouse | PageSpeed Insights (PSI) reports on the user experience of a page on both mobile and desktop devices, and provides suggestions on how that page may be improved. | View |
Chrome Privacy Preserving Prefetch Proxy | Chrome's Privacy Preserving Prefetch Proxy service that fetches /.well-known/traffic-advice to enable privacy-preserving prefetch hints. | View |
ClarityBot | ClarityBot is seoClarity's web crawler that performs technical SEO audits, analyzes content, and monitors website performance. The bot respects robots.txt directives and crawl delays, and can be configured by seoClarity clients to control crawl speed and frequency. | View |
Claude-SearchBot | Claude-SearchBot navigates the web to improve search result quality for users. It analyzes online content specifically to enhance the relevance and accuracy of search responses. | View |
Claude-User | Claude-User supports Claude AI users. When individuals ask questions to Claude, it may access websites using a Claude-User agent. | View |
ClaudeBot | ClaudeBot helps enhance the utility and safety of Anthropic's generative AI models by collecting web content that could potentially contribute to their training. | View |
ContentKingBot | ContentKing (now Conductor Website Monitoring) is a website monitoring tool that continuously audits websites to help improve their performance and visibility. It makes HTTP GET requests to monitor websites' SEO, content changes, and technical health. | View |
Cookiebot | Cookiebot automates compliance with cookie laws and helps you manage your cookie consent preferences. | View |
CookieScript | A cookie scanning bot that examines websites for cookie usage to help maintain GDPR and other privacy regulation compliance. | View |
Cotoyogi | Cotoyogi is a web crawler operated by the Center for Research and Development on Data Lake, ROIS-DS (Research Organization of Information and Systems - Data Science) for collecting Japanese language data resources. | View |
Coveobot | Coveobot is a crawler operated by Coveo that indexes content for enterprise search, recommendations, and generative experience platforms. The bot crawls and analyzes both structured and unstructured content to enable unified search experiences across multiple data sources. | View |
CriteoBot | CriteoBot is a crawler operated by Criteo that analyzes web content to serve relevant contextual ads. The bot respects robots.txt directives and crawl delays, and only accesses publicly available content. | View |
Datadog Synthetic Monitoring Robot | Datadog's automated monitoring service that performs synthetic tests to verify website availability and performance. | View |
DataForSeoBot | DataForSeoBot is a backlink checker bot operated by DataForSEO that crawls websites to build and maintain their backlink database. The bot respects robots.txt directives and crawl delays, and is used to provide SEO data and analytics services. | View |
Detectify | Detectify is a web security scanner that performs automated security tests on web applications and attack surface monitoring. | View |
DigitalOceanUptimeBot | DigitalOcean Uptime is a monitoring service that checks the health of any URL or IP address. The probe performs checks from multiple global regions to monitor latency, uptime, and SSL certificates of websites and hosts. | View |
Discord Bot | Discord's link preview bot that crawls URLs shared in Discord chats to generate rich previews. | View |
DotBot | DotBot is a web crawler operated by Moz (formerly SEOmoz) that collects data for their Link Explorer tool and Links API. It helps build Moz's link intelligence database which powers their Domain Authority and Page Authority metrics. | View |
DuckAssistBot | DuckAssistBot is a web crawler for DuckDuckGo Search that crawls pages in real-time for AI-assisted answers, which prominently cite their sources. This data is not used in any way to train AI models. | View |
DuckDuckBot | DuckDuckBot is a web crawler for DuckDuckGo. DuckDuckBot’s job is to constantly improve search results and offer users the best and most secure search experience possible. | View |
Facebook Webhooks | Facebook's webhook service that delivers real-time event notifications for Meta platform events and changes. | View |
FacebookExternalHit | Fetches content for shared links on Meta platforms to generate rich previews. | View |
Feedfetcher | Feedfetcher is used for crawling RSS or Atom feeds for Google News and PubSubHubbub. | View |
GeedoProductSearchBot | GeedoProductSearch is a web crawler operated by Geedo SIA that indexes product information from e-commerce websites. The crawler respects robots.txt directives and can be configured for crawl speed and behavior through standard crawl-delay settings. | View |
GitHub Camo | GitHub's image proxy service, which fetches images embedded in GitHub content and serves them over HTTPS. | View |
GitHub Hookshot | GitHub's webhook delivery service for repository events such as pushes and pull requests. | View |
Google-CloudVertexBot | Crawling preferences addressed to the Google-CloudVertexBot user agent affect crawls requested by site owners for building Vertex AI Agents. It has no effect on Google Search or other products. | View |
Google-Extended | Google-Extended is a standalone product token that web publishers can use to manage whether their sites help improve Gemini Apps and Vertex AI generative APIs, including future generations of models that power those products. Grounding with Google Search on Vertex AI does not use web pages for grounding that have disallowed Google-Extended. Google-Extended does not impact a site's inclusion or ranking in Google Search. | View |
Google-InspectionTool | Crawling preferences addressed to the Google-InspectionTool user agent affect Search testing tools such as the Rich Result Test and URL inspection in Search Console. It has no effect on Google Search or other products. | View |
Google PageRenderer | Upon user request, Google Page Renderer fetches and renders web pages. | View |
Google Publisher Center | Google Publisher Center fetches and processes feeds that publishers explicitly supplied for use in Google News landing pages. | View |
Google Read Aloud | Upon user request, Google Read Aloud fetches and reads out web pages using text-to-speech (TTS). | View |
Google-Safety | The Google-Safety user agent handles abuse-specific crawling, such as malware discovery for publicly posted links on Google properties. As such it's unaffected by crawling preferences. | View |
Google Site Verifier | Google Site Verifier fetches Search Console verification tokens. | View |
Google StoreBot | Crawling preferences addressed to the Storebot-Google user agent affect all surfaces of Google Shopping (for example, the Shopping tab in Google Search and Google Shopping). | View |
Googlebot | Crawling preferences addressed to the Googlebot user agent affect Google Search (including Discover and all Google Search features), as well as other products such as Google Images, Google Video, Google News, and Discover. | View |
GoogleOther | Crawling preferences addressed to the GoogleOther user agent don't affect any specific product. GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development. It has no effect on Google Search or other products. | View |
GoogleStackdriverMonitoringBot | GoogleStackdriverMonitoringBot is operated by Google Cloud to perform uptime checks and monitor availability of services. The bot sends HTTP/HTTPS requests from multiple global locations to verify service health and responsiveness. | View |
GPT-Actions | Enables ChatGPT to interact with external APIs and retrieve real-time information from the web in response to user-initiated requests; allows access to up-to-date content without being used for automated crawling or AI training. | View |
GPTBot | Crawls web content to improve OpenAI's generative AI models; respects 'robots.txt' directives to exclude sites from training data. | View |
HetrixTools Uptime Monitoring Bot | HetrixTools Uptime Monitoring Bot is used by HetrixTools's monitoring services to perform various checks on websites, including uptime and performance monitoring. | View |
Hookdeck | Hookdeck is an event gateway for event-driven applications. | View |
Hydrozen | Hydrozen is a tool for monitoring the availability of your websites, cron jobs, APIs, domains, SSL certificates, and more. | View |
IASBot | IAS (Integral Ad Science) crawler, formerly known as AdmantX, is used for analyzing web content to ensure brand safety and suitability for advertisers. The crawler helps assess content quality, context, and safety for digital advertising campaigns. | View |
ImagesiftBot | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support Hive's suite of web intelligence products. | View |
Inngest | Inngest is a platform for building event-driven applications. | View |
InternetMeasurementBot | InternetMeasurementBot is operated by driftnet.io to discover and measure services that network owners and operators have publicly exposed. The bot performs network measurements and service discovery without attempting to log in to systems or send spam. | View |
LinkedInBot | LinkedInBot is a bot that renders links shared on LinkedIn. | View |
LogRocketBot | LogRocket Asset Cacher is a bot that captures and caches web assets (CSS, JavaScript, images) to ensure proper playback of user sessions in LogRocket's session replay feature. The bot only accesses publicly available content when LogRocket needs to record sessions. | View |
Lumar | The Lumar website intelligence platform is used by SEO, engineering, marketing and digital operations teams to monitor the performance of their site’s technical health, and ensure a high-performing, revenue-driving website. | View |
meta-externalagent | The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly. | View |
meta-externalfetcher | The Meta-ExternalFetcher crawler performs user-initiated fetches of individual links to support specific product functions. Because the fetch was initiated by a user, this crawler may bypass robots.txt rules. | View |
MicrosoftPreview | MicrosoftPreview generates page snapshots for Microsoft products. It has desktop and mobile variants, with Chrome version dynamically updated to match the latest Microsoft Edge version. | View |
MJ12bot | MJ12bot is a web crawler operated by Majestic-12 Ltd, a UK-based company that builds a search engine focused on backlink analysis and web structure mapping. The crawler is part of a distributed community-based system that helps build Majestic's link intelligence database. | View |
adsnaver | Naver's ad crawler that periodically visits registered ad landing pages to collect on-page content for effective ad matching and ranking. It ignores robots.txt for URLs registered in the ad system. | View |
naver-blueno | Naver's preview-snippet crawler that fetches summary information (titles, descriptions, images) when users insert links in Naver services such as blogs or cafés. It operates on demand and respects robots.txt. | View |
naverbot | Naver's web crawler (also known as Yeti) is used by Naver, South Korea's largest search engine, to crawl and index web content. | View |
OAI-SearchBot | Indexes websites for inclusion in ChatGPT's search results; does not crawl content for AI model training. | View |
OhDearBot | OhDearBot is a monitoring bot operated by Oh Dear that performs uptime checks, broken link detection, and mixed content scanning. The bot follows standard crawling practices and throttles requests to minimize server impact. | View |
PayPal | PayPal delivers real-time event notifications for payments, subscriptions, and account updates. | View |
Perplexity-User | Handles user-initiated requests in Perplexity, accessing external content to provide real-time information; not used for automated crawling or AI training. | View |
PerplexityBot | Indexes websites for inclusion in Perplexity's search results; does not crawl content for AI model training. | View |
PetalBot | PetalBot is a web crawler operated by Huawei's Petal Search engine. It crawls both PC and mobile websites to build an index database for Petal search engine and to provide content recommendations for Huawei Assistant and AI Search services. | View |
Pingdom Bot | Pingdom Bot is used by Pingdom's monitoring services to perform various checks on websites, including uptime and performance monitoring. | View |
Pinterest Bot | Pinterest's web crawler that indexes content for their platform. It crawls websites to collect metadata for Pins, including images, titles, descriptions, and prices. The crawler also helps maintain Pin data accuracy and detect broken links. | View |
ProximicBot | Proximic is Comscore's web crawler that performs contextual content analysis to help advertisers determine the best matching campaigns for a page's content. The bot respects robots.txt, only downloads static textual content, and crawls at a controlled rate. | View |
PulsePoint Crawler | A web crawler used by PulsePoint, a digital advertising technology company, for content indexing and ads.txt verification. | View |
QStash | QStash is Upstash's HTTP-based messaging and scheduling service for serverless applications. | View |
Razorpay-Webhook | Razorpay’s webhooks enable merchants to receive secure, real-time HTTP callbacks for key payment events—automating reconciliation, notifications, and downstream workflows. | View |
Amazon Route 53 Health Check Service | Amazon Route 53 health checkers send requests to your endpoints to monitor their availability and health. | View |
SBIntuitionsBot | SBIntuitionsBot is a crawler operated by SB Intuitions Corp. that collects web data for AI development and information analysis. The bot follows RFC 9309 Robots Exclusion Protocol standards and can be controlled via robots.txt directives. | View |
ScreamingFrogBot | Screaming Frog SEO Spider is a website crawler used by SEO professionals for site audits and technical SEO analysis. It's a desktop-based tool that crawls websites' links, images, CSS, scripts and apps to evaluate onsite SEO. The crawler respects robots.txt and can be configured for crawl speed and behavior. | View |
SeekportBot | SeekportBot is the web crawler for Seekport, a German search engine operated by SISTRIX. The bot crawls and indexes web content while respecting robots.txt directives and crawl delays. | View |
SemanticScholarBot | The Semantic Scholar bot crawls domains to find academic PDFs. These PDFs are served on semanticscholar.org so researchers can discover and understand other academic accomplishments. | View |
Semrush Site Audit | Semrush Site Audit is a powerful website crawler that analyzes the health of a website by checking for on-page and technical SEO issues, including duplicate content, broken links, HTTPS implementation, hreflang attributes, and more. | View |
Semrush | Semrush is a platform for SEO, content marketing, competitor research, PPC and social media marketing. | View |
Sentry Uptime Monitoring Bot | Sentry's Uptime Monitoring Bot performs health checks on configured URLs to monitor the availability and reliability of web services. | View |
SeznamBot | SeznamBot is the web crawler operated by Seznam.cz, the leading Czech search engine. The bot crawls and indexes web content for Seznam's search results, respecting robots.txt directives and crawl delays. | View |
SISTRIX Optimizer Uptime | SISTRIX Optimizer Uptime bot performs continuous monitoring of website availability by checking the startpage once per minute. It is part of SISTRIX's SEO and website monitoring platform. | View |
Site24x7 | Site24x7 Bot is used by Site24x7's monitoring services to perform various checks on websites, including uptime and performance monitoring. | View |
Sitebulb | Sitebulb is a desktop and cloud-based website crawler used by SEO professionals for technical SEO audits. It analyzes websites to find technical issues, opportunities for improvement, and provides detailed reports with visualizations and prioritized recommendations. | View |
SlackLinkExpandingBot | Slackbot Link Expanding is a bot operated by Slack that fetches metadata from shared links to create rich previews. The bot uses HTTP Range headers to efficiently fetch only necessary metadata like oEmbed and Open Graph tags, and caches responses globally for about 30 minutes. | View |
Slackbot | Slackbot is Slack's default, general-purpose bot that handles various API requests and integrations. It is used for tasks not covered by specialized bots like ImgProxy or LinkExpanding, such as making API requests for service integrations or handling outgoing webhooks. | View |
Slack-ImgProxy | Slack-ImgProxy is a bot operated by Slack that fetches and caches images posted in Slack channels. The bot helps improve performance, ensures HTTPS delivery, and protects user privacy by hiding detailed referrer information. | View |
SnapchatAdsBot | SnapchatAdsBot is a crawler operated by Snapchat that verifies and analyzes websites for their advertising platform. The bot helps ensure content quality and safety for Snapchat's advertising ecosystem. | View |
SnapURLPreviewBot | SnapURLPreviewBot is a crawler operated by Snap Inc. that analyzes and generates previews of URLs shared on Snapchat and other Snap platforms. The bot helps ensure content quality and safety by validating URLs and generating preview metadata. | View |
StatusCake | StatusCake is a website monitoring service that checks the uptime and performance of your website. | View |
Stripe Webhooks | Stripe's webhook service that delivers real-time event notifications for payment processing and account updates. | View |
svix | Svix is a webhook service that sends event notifications to webhook endpoints. | View |
TangibleeBot | TangibleeBot is a crawler operated by Tangiblee that collects product data from e-commerce websites to power their product visualization and virtual try-on services. The bot simulates single-visitor activity and crawls at an agreed-upon frequency to prevent disruption to website performance. | View |
TikTokSpider | TikTokSpider is a web crawler used by TikTok/ByteDance to index and analyze web content for their platform. It helps in content discovery, link previews, and data collection for TikTok's services. | View |
TTD-Content | TTD-Content is a crawler operated by The Trade Desk that verifies content and quality of ad placements for their demand-side platform. The bot helps ensure brand safety and ad verification by analyzing webpage content where ads may be displayed. | View |
Twitterbot | Fetches content for shared links on X/Twitter to generate rich previews. | View |
Uptime Robot | Uptime Robot is a platform for monitoring and alerting on your applications. | View |
UsercentricsBot | UsercentricsBot is operated by Usercentrics GmbH to scan websites for data processing services and third-party technologies. The bot helps ensure GDPR compliance by identifying services that need to be included in the website's Consent Management Platform (CMP). | View |
v0bot | Bot for v0 services. | View |
Vercel Favicon Bot | Vercel Favicon Bot | View |
vercelflags | vercel flags | View |
Vercel Screenshot Bot | Vercel Screenshot Bot | View |
verceltracing | vercel tracing | View |
Yahoo! Slurp | Yahoo! Slurp is the web crawler (robot) used by Yahoo! Search to discover and index web pages for its search engine. | View |
Yandexbot | YandexBot is a web crawler operated by Yandex, a major Russian search engine. | View |
YisouSpider | YisouSpider is a search engine crawler operated by Yisou that indexes web content for their search engine results. The crawler follows standard crawling practices and respects robots.txt directives. | View |