Bot Management

Bots generate nearly half of all internet traffic. While many bots serve legitimate purposes like search engine crawling and content aggregation, others originate from malicious sources. Bot management encompasses both observing and controlling all bot traffic. A key component of this is bot protection, which focuses specifically on mitigating risks from automated threats that scrape content, attempt unauthorized logins, or overload servers.

Bot management systems analyze incoming traffic to identify and classify requests based on their source and intent. This includes:

  • Verifying and allowing legitimate bots that correctly identify themselves
  • Monitoring bot traffic patterns and resource consumption
  • Detecting and challenging suspicious traffic that behaves abnormally
  • Enforcing browser-like behavior by verifying navigation patterns and cache usage

To effectively manage bot traffic and protect against harmful bots, various techniques are used, including:

  • Signature-based detection: Inspecting HTTP requests for known bot signatures
  • Rate limiting: Restricting how often certain actions can be performed to prevent abuse
  • Challenges: Using JavaScript checks to verify human presence
  • Behavioral analysis: Detecting unusual patterns in user activity that suggest automation
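Two of the techniques above, signature-based detection and rate limiting, can be sketched in a few lines. This is a minimal illustration and not how any particular firewall implements them: the signature list, the `matchesBotSignature` helper, and the `FixedWindowRateLimiter` class are hypothetical names introduced here, and a fixed-window counter is only one of several rate-limiting strategies.

```typescript
// Hypothetical signature list; production systems maintain far larger, curated sets.
const BOT_SIGNATURES: RegExp[] = [/curl\//i, /python-requests/i, /scrapy/i];

// Signature-based detection: does the User-Agent match a known bot pattern?
function matchesBotSignature(userAgent: string): boolean {
  return BOT_SIGNATURES.some((pattern) => pattern.test(userAgent));
}

// Rate limiting: a fixed-window counter keyed by a client identifier (for example, an IP).
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false once the client exceeds the limit.
  allow(clientId: string, now: number = Date.now()): boolean {
    const entry = this.windows.get(clientId);
    if (entry === undefined || now - entry.start >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      this.windows.set(clientId, { start: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}
```

A request handler would typically run the signature check first and consult the limiter only for traffic that passes it, so that known bad clients never consume rate-limit budget.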

With Vercel, you can use:

  • Managed rulesets to challenge specific bot traffic
  • Rate limiting and challenge actions with WAF custom rules to prevent bot activity from reaching your application
  • Observability and Firewall to monitor bot patterns, traffic sources, and the effectiveness of your bot management strategies

Bot filter managed ruleset is available in Beta on all plans

With Vercel, you can use the bot filter managed ruleset to challenge non-browser traffic before it reaches your applications. It filters out automated threats while allowing legitimate traffic.

  • It identifies clients that violate browser-like behavior and serves them a JavaScript challenge.
  • It stops requests that falsely claim to come from a browser, such as a curl request identifying itself as Chrome.
  • It automatically excludes verified bots, such as Google's crawler, from evaluation.
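The second and third points can be illustrated with a simple consistency check. The sketch below is an assumption-laden illustration, not Vercel's detection logic: the `EXPECTED_CHROME_HEADERS` list, the `evaluateRequest` function, and the `isVerifiedBot` flag are hypothetical, and real detection relies on many more signals than header presence.

```typescript
type RequestHeaders = Record<string, string>;
type Decision = "allow" | "challenge";

// Assumption: headers a Chrome navigation is expected to carry. Illustrative only.
const EXPECTED_CHROME_HEADERS = ["accept-language", "sec-fetch-mode", "sec-ch-ua"];

// True when the User-Agent string claims to be Chrome.
function claimsChrome(userAgent: string): boolean {
  return /Chrome\/\d+/.test(userAgent);
}

// Header names are assumed to be lowercased before this function is called.
function evaluateRequest(headers: RequestHeaders, isVerifiedBot: boolean): Decision {
  // Verified bots are excluded from evaluation entirely.
  if (isVerifiedBot) return "allow";

  const userAgent = headers["user-agent"] ?? "";
  // This sketch only covers the "falsely claims to be Chrome" case.
  if (!claimsChrome(userAgent)) return "allow";

  // A client that claims to be Chrome but omits headers Chrome sends is suspicious.
  const missing = EXPECTED_CHROME_HEADERS.filter((name) => !(name in headers));
  return missing.length > 0 ? "challenge" : "allow";
}
```

A curl request sent with a copied Chrome User-Agent usually carries none of the expected headers, so this check would return "challenge" for it.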

To learn more about how the ruleset works, review the Challenge section of Firewall actions. To understand the details of what gets logged and how to monitor your traffic, review Firewall Observability.

For trusted automated traffic, you can create custom WAF rules with bypass actions that will allow this traffic to skip the bot filter ruleset.

You can apply the ruleset to your project in log or challenge mode. Learn how to configure the bot filter managed ruleset.

AI bots managed ruleset is available on all plans

Vercel's AI bots managed ruleset allows you to control traffic from AI bots that crawl your site for training data, search purposes, or user-generated fetches.

  • It identifies and filters requests from known AI crawlers and bots.
  • It provides options to log or deny these requests based on your preferences.
  • The list of known AI bots is automatically maintained and updated by Vercel.

When new AI bots emerge, they are automatically added to Vercel's managed list and will be handled according to your existing configured action without requiring any changes on your part.
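The behavior described above, a maintained list combined with a single configured action, can be sketched as follows. This is an illustration under stated assumptions: `KNOWN_AI_BOTS` is a small hypothetical subset built from crawler names that appear in the bot directory later on this page, `applyAiBotRule` is a made-up helper, and matching on the User-Agent string alone is a simplification.

```typescript
type Mode = "log" | "deny";
type Outcome = { matchedBot: string | null; action: "pass" | "log" | "deny" };

// Hypothetical subset of a managed list; the real list is maintained by the provider.
const KNOWN_AI_BOTS: string[] = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot"];

// Applies the configured mode to any request whose User-Agent names a listed AI bot.
function applyAiBotRule(userAgent: string, mode: Mode): Outcome {
  const normalized = userAgent.toLowerCase();
  const matchedBot =
    KNOWN_AI_BOTS.find((name) => normalized.includes(name.toLowerCase())) ?? null;

  // Requests that match no listed bot are not affected by the rule.
  if (matchedBot === null) return { matchedBot: null, action: "pass" };

  // Listed bots receive whatever action is currently configured.
  return { matchedBot, action: mode };
}
```

Because the action is stored separately from the list, adding a new name to `KNOWN_AI_BOTS` changes which requests match without touching the configured mode, which mirrors the "no changes on your part" behavior described above.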

You can apply the ruleset to your project in log or deny mode. Learn how to configure the AI bots managed ruleset.

Vercel maintains a comprehensive directory of known legitimate bots from across the internet and updates it continuously as new legitimate services emerge. Attack Challenge Mode and the bot filter automatically recognize these bots and let them through without a challenge.

You can block access to some or all of these bots by writing WAF custom rules with the User Agent match condition. To learn how to do this, review WAF Examples.

Submit a bot request if you are a SaaS provider and would like to be added to this list.

Bot name | Description | Documentation
AdIdxBot | AdIdxBot is the crawler used by Bing Ads for quality control of ads and their destination websites. It has multiple user agent variants including desktop, iPhone, and Windows Phone versions. | View
AdsBot-Google | AdsBot-Google is Google's web crawler used for quality control of Google Ads. | View
Adsense | The AdSense crawler visits participating sites in order to provide them with relevant ads. | View
Adyen Webhook | Adyen’s webhooks (Notification API) send encrypted, real-time HTTP callbacks for key payment and account events, automating order fulfillment, settlement reconciliation, and risk-management workflows. | View
AhrefsBot | Powers the database for both Ahrefs, a marketing intelligence platform, and Yep, an independent, privacy-focused search engine. | View
AhrefsSiteAudit | Powers Ahrefs’ Site Audit tool. Ahrefs users can use Site Audit to analyze websites and find both technical SEO and on-page SEO issues. | View
AI2Bot | AI2Bot is operated by the Allen Institute for Artificial Intelligence (Ai2) to crawl the web for content to train open-source AI models. It is used to index academic publications and web content for research purposes. | View
aiHitBot | aiHitBot collects and maintains historical information about companies. It gathers data from company websites to build comprehensive company profiles, including changes in company executives and other historical information. | View
Algolia | The Algolia Crawler extracts content from your site and makes it searchable. | View
Amazon Kendra | Amazon Kendra is a managed information retrieval and intelligent search service that uses natural language processing and advanced deep learning model. | View
Amazon Q | Amazon Q Business is a generative artificial intelligence (generative AI)-powered assistant that you can tailor to your business needs. | View
Amazonbot | Amazonbot is Amazon's web crawler used to improve our services, such as enabling Alexa to more accurately answer questions for customers. | View
Amazon Product Discovery | Amazon's web crawler used to collect publicly available product details from Amazon Selling Partner websites to help improve the accuracy and completeness of product information on Amazon. This helps ensure that Amazon customers see correct and complete information to help them in their shopping journey. | View
Amazon Seller Initiated Listing | Amazon's web crawler that helps sellers succeed by giving them the option to provide a URL to a website and create high-quality product pages in Amazon's store. This bot crawls seller-provided URLs to collect product information for listing creation. | View
APIs-Google | Crawling preferences addressed to the APIs-Google user agent affect the delivery of push notification messages by Google APIs. | View
Apple Podcasts | Apple Podcasts crawler that only accesses URLs associated with registered content on Apple Podcasts. Does not follow robots.txt. | View
Applebot | Applebot powers search features in Apple's ecosystem (Spotlight, Siri, Safari) and may be used to train Apple's foundation models for generative AI features. | View
Artemis Web Crawler | Artemis is a calm web reader with which you can follow websites and blogs. | View
Awario Bot | Awario's web crawler used to discover and collect new and updated web data for their social media monitoring and brand mention tracking platform. The crawler helps Internet marketers find who is mentioning their brand online. | View
Awario RSS Bot | One of Awario's primary web crawlers specialized in collecting RSS feed data. | View
Awario Smart Bot | One of Awario's primary web crawlers that discovers and collects new and updated web data. | View
BaiduSpider | Baiduspider is Baidu’s web crawler that indexes websites for inclusion in its Chinese-market search results. | View
Barkrowler | Barkrowler is Babbar's web crawler that fuels and updates their graph representation of the web, providing SEO tools for the marketing community. | View
Better Stack | Better Stack is a platform for monitoring and alerting on your applications. | View
Bingbot | Bingbot is Microsoft's web crawler used for indexing websites for Bing Search. | View
BLEXBot | BLEXBot is SE Ranking's web crawler that helps analyze websites for SEO purposes, including backlink analysis, rank tracking, and website auditing. The bot is part of SE Ranking's all-in-one SEO platform used by marketing professionals and agencies. | View
Brightbot | Brightbot is Bright Data's crawler layer that monitors the health of websites and enforces ethical web data collection. It prevents access to non-public information and blocks interactive endpoints that could be abused, acting as a guardian for ethical data collection. | View
Bytespider | Bytespider is ByteDance's web crawler used to gather training data for their AI large language models. It's primarily used to scrape web content to train TikTok's AI features and other ByteDance AI products. | View
CCBot | CCBot is operated by the Common Crawl Foundation to crawl web content for AI training and research. Common Crawl is a non-profit organization that maintains an open repository of web crawl data that is universally accessible for research and analysis. | View
CensysInspectBot | Censys Inspect is a web crawler operated by Censys that performs internet-wide scanning to discover, monitor, and analyze publicly accessible devices and services. The crawler follows best practices, only accesses public-facing services, and respects robots.txt directives. | View
ChatGPT-User | Handles user-initiated requests in ChatGPT, accessing external content to provide real-time information; not used for automated crawling or AI training. | View
Checkly | Checkly is a platform for monitoring and alerting on your applications. | View
Chrome Lighthouse | PageSpeed Insights (PSI) reports on the user experience of a page on both mobile and desktop devices, and provides suggestions on how that page may be improved. | View
Chrome Privacy Preserving Prefetch Proxy | Chrome's Privacy Preserving Prefetch Proxy service that fetches /.well-known/traffic-advice to enable privacy-preserving prefetch hints. | View
ClarityBot | ClarityBot is seoClarity's web crawler that performs technical SEO audits, analyzes content, and monitors website performance. The bot respects robots.txt directives and crawl delays, and can be configured by seoClarity clients to control crawl speed and frequency. | View
Claude-SearchBot | Claude-SearchBot navigates the web to improve search result quality for users. It analyzes online content specifically to enhance the relevance and accuracy of search responses. | View
Claude-User | Claude-User supports Claude AI users. When individuals ask questions to Claude, it may access websites using a Claude-User agent. | View
ClaudeBot | ClaudeBot helps enhance the utility and safety of our generative AI models by collecting web content that could potentially contribute to their training. | View
ContentKingBot | ContentKing (now Conductor Website Monitoring) is a website monitoring tool that continuously audits websites to help improve their performance and visibility. It makes HTTP GET requests to monitor websites' SEO, content changes, and technical health. | View
Cookiebot | Cookiebot automates compliance with cookie laws and helps you manage your cookie consent preferences. | View
CookieScript | A cookie scanning bot that examines websites for cookie usage to help maintain GDPR and other privacy regulation compliance. | View
Cotoyogi | Cotoyogi is a web crawler operated by the Center for Research and Development on Data Lake, ROIS-DS (Research Organization of Information and Systems - Data Science) for collecting Japanese language data resources. | View
Coveobot | Coveobot is a crawler operated by Coveo that indexes content for enterprise search, recommendations, and generative experience platforms. The bot crawls and analyzes both structured and unstructured content to enable unified search experiences across multiple data sources. | View
CriteoBot | CriteoBot is a crawler operated by Criteo that analyzes web content to serve relevant contextual ads. The bot respects robots.txt directives and crawl delays, and only accesses publicly available content. | View
Datadog Synthetic Monitoring Robot | Datadog's automated monitoring service that performs synthetic tests to verify website availability and performance. | View
DataForSeoBot | DataForSeoBot is a backlink checker bot operated by DataForSEO that crawls websites to build and maintain their backlink database. The bot respects robots.txt directives and crawl delays, and is used to provide SEO data and analytics services. | View
Detectify | Detectify is a web security scanner that performs automated security tests on web applications and attack surface monitoring. | View
DigitalOceanUptimeBot | DigitalOcean Uptime is a monitoring service that checks the health of any URL or IP address. The probe performs checks from multiple global regions to monitor latency, uptime, and SSL certificates of websites and hosts. | View
Discord Bot | Discord's link preview bot that crawls URLs shared in Discord chats to generate rich previews. | View
DotBot | DotBot is a web crawler operated by Moz (formerly SEOmoz) that collects data for their Link Explorer tool and Links API. It helps build Moz's link intelligence database which powers their Domain Authority and Page Authority metrics. | View
DuckAssistBot | DuckAssistBot is a web crawler for DuckDuckGo Search that crawls pages in real-time for AI-assisted answers, which prominently cite their sources. This data is not used in any way to train AI models. | View
DuckDuckBot | DuckDuckBot is a web crawler for DuckDuckGo. DuckDuckBot’s job is to constantly improve search results and offer users the best and most secure search experience possible. | View
Facebook Webhooks | Facebook's webhook service that delivers real-time event notifications for Meta platform events and changes. | View
FacebookExternalHit | Fetches content for shared links on Meta platforms to generate rich previews. | View
Feedfetcher | Feedfetcher is used for crawling RSS or Atom feeds for Google News and PubSubHubbub. | View
GeedoProductSearchBot | GeedoProductSearch is a web crawler operated by Geedo SIA that indexes product information from e-commerce websites. The crawler respects robots.txt directives and can be configured for crawl speed and behavior through standard crawl-delay settings. | View
GitHub Camo | GitHub's image proxy service | View
GitHub Hookshot | GitHub's webhooks for events like push, pull request, etc. | View
Google-CloudVertexBot | Crawling preferences addressed to the Google-CloudVertexBot user agent affect crawls requested by the site owners' for building Vertex AI Agents. It has no effect on Google Search or other products. | View
Google-Extended | Google-Extended is a standalone product token that web publishers can use to manage whether their sites help improve Gemini Apps and Vertex AI generative APIs, including future generations of models that power those products. Grounding with Google Search on Vertex AI does not use web pages for grounding that have disallowed Google-Extended. Google-Extended does not impact a site's inclusion or ranking in Google Search. | View
Google-InspectionTool | Crawling preferences addressed to the Google-InspectionTool user agent affect Search testing tools such as the Rich Result Test and URL inspection in Search Console. It has no effect on Google Search or other products. | View
Google PageRenderer | Upon user request, Google Page Renderer fetches and renders web pages. | View
Google Publisher Center | Google Publisher Center fetches and processes feeds that publishers explicitly supplied for use in Google News landing pages. | View
Google Read Aloud | Upon user request, Google Read Aloud fetches and reads out web pages using text-to-speech (TTS). | View
Google-Safety | The Google-Safety user agent handles abuse-specific crawling, such as malware discovery for publicly posted links on Google properties. As such it's unaffected by crawling preferences. | View
Google Site Verifier | Google Site Verifier fetches Search Console verification tokens. | View
Google StoreBot | Crawling preferences addressed to the Storebot-Google user agent affect all surfaces of Google Shopping (for example, the Shopping tab in Google Search and Google Shopping). | View
Googlebot | Crawling preferences addressed to the Googlebot user agent affect Google Search (including Discover and all Google Search features), as well as other products such as Google Images, Google Video, Google News, and Discover. | View
GoogleOther | Crawling preferences addressed to the GoogleOther user agent don't affect any specific product. GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development. It has no effect on Google Search or other products. | View
GoogleStackdriverMonitoringBot | GoogleStackdriverMonitoringBot is operated by Google Cloud to perform uptime checks and monitor availability of services. The bot sends HTTP/HTTPS requests from multiple global locations to verify service health and responsiveness. | View
GPT-Actions | Enables ChatGPT to interact with external APIs and retrieve real-time information from the web in response to user-initiated requests; allows access to up-to-date content without being used for automated crawling or AI training. | View
GPTBot | Crawls web content to improve OpenAI's generative AI models; respects 'robots.txt' directives to exclude sites from training data. | View
HetrixTools Uptime Monitoring Bot | HetrixTools Uptime Monitoring Bot is used by HetrixTools's monitoring services to perform various checks on websites, including uptime and performance monitoring. | View
Hookdeck | A reliable Event Gateway for event-driven applications | View
Hydrozen | Hydrozen is a tool for monitoring availability of your websites, Cronjobs, APIs, Domains, SSL etc. | View
IASBot | IAS (Integral Ad Science) crawler, formerly known as AdmantX, is used for analyzing web content to ensure brand safety and suitability for advertisers. The crawler helps assess content quality, context, and safety for digital advertising campaigns. | View
ImagesiftBot | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support Hive's suite of web intelligence products. | View
Inngest | Inngest is a platform for building event-driven applications. | View
InternetMeasurementBot | InternetMeasurementBot is operated by driftnet.io to discover and measure services that network owners and operators have publicly exposed. The bot performs network measurements and service discovery without attempting to log in to systems or send spam. | View
LinkedInBot | LinkedInBot is a bot that renders links shared on LinkedIn. | View
LogRocketBot | LogRocket Asset Cacher is a bot that captures and caches web assets (CSS, JavaScript, images) to ensure proper playback of user sessions in LogRocket's session replay feature. The bot only accesses publicly available content when LogRocket needs to record sessions. | View
Lumar | The Lumar website intelligence platform is used by SEO, engineering, marketing and digital operations teams to monitor the performance of their site’s technical health, and ensure a high-performing, revenue-driving website. | View
meta-externalagent | The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly. | View
meta-externalfetcher | The Meta-ExternalFetcher crawler performs user-initiated fetches of individual links to support specific product functions. Because the fetch was initiated by a user, this crawler may bypass robots.txt rules. | View
MicrosoftPreview | MicrosoftPreview generates page snapshots for Microsoft products. It has desktop and mobile variants, with Chrome version dynamically updated to match the latest Microsoft Edge version. | View
MJ12bot | MJ12bot is a web crawler operated by Majestic-12 Ltd, a UK-based company that builds a search engine focused on backlink analysis and web structure mapping. The crawler is part of a distributed community-based system that helps build Majestic's link intelligence database. | View
adsnaver | Naver's ad crawler that periodically visits registered ad landing pages to collect on-page content for effective ad matching and ranking. It ignores robots.txt for URLs registered in the ad system. | View
naver-blueno | Naver's preview-snippet crawler that fetches summary information (titles, descriptions, images) when users insert links in Naver services such as blogs or cafés. It operates on demand and respects robots.txt. | View
naverbot | Naver's web crawler (also known as Yeti) is used by Naver, South Korea's largest search engine, to crawl and index web content. | View
OAI-SearchBot | Indexes websites for inclusion in ChatGPT's search results; does not crawl content for AI model training. | View
OhDearBot | OhDearBot is a monitoring bot operated by Oh Dear that performs uptime checks, broken link detection, and mixed content scanning. The bot follows standard crawling practices and throttles requests to minimize server impact. | View
PayPal | PayPal delivers real-time event notifications for payments, subscriptions, and account updates. | View
Perplexity-User | Handles user-initiated requests in Perplexity, accessing external content to provide real-time information; not used for automated crawling or AI training. | View
PerplexityBot | Indexes websites for inclusion in Perplexity's search results; does not crawl content for AI model training. | View
PetalBot | PetalBot is a web crawler operated by Huawei's Petal Search engine. It crawls both PC and mobile websites to build an index database for Petal search engine and to provide content recommendations for Huawei Assistant and AI Search services. | View
Pingdom Bot | Pingdom Bot is used by Pingdom's monitoring services to perform various checks on websites, including uptime and performance monitoring. | View
Pinterest Bot | Pinterest's web crawler that indexes content for their platform. It crawls websites to collect metadata for Pins, including images, titles, descriptions, and prices. The crawler also helps maintain Pin data accuracy and detect broken links. | View
ProximicBot | Proximic is Comscore's web crawler that performs contextual content analysis to help advertisers determine the best matching campaigns for a page's content. The bot respects robots.txt, only downloads static textual content, and crawls at a controlled rate. | View
PulsePoint Crawler | A web crawler used by PulsePoint, a digital advertising technology company, for content indexing and ads.txt verification. | View
QStash | QStash is a platform for building event-driven applications. | View
Razorpay-Webhook | Razorpay’s webhooks enable merchants to receive secure, real-time HTTP callbacks for key payment events, automating reconciliation, notifications, and downstream workflows. | View
Amazon Route 53 Health Check Service | Amazon Route 53 Health Check Service | View
SBIntuitionsBot | SBIntuitionsBot is a crawler operated by SB Intuitions Corp. that collects web data for AI development and information analysis. The bot follows RFC 9309 Robots Exclusion Protocol standards and can be controlled via robots.txt directives. | View
ScreamingFrogBot | Screaming Frog SEO Spider is a website crawler used by SEO professionals for site audits and technical SEO analysis. It's a desktop-based tool that crawls websites' links, images, CSS, scripts and apps to evaluate onsite SEO. The crawler respects robots.txt and can be configured for crawl speed and behavior. | View
SeekportBot | SeekportBot is the web crawler for Seekport, a German search engine operated by SISTRIX. The bot crawls and indexes web content while respecting robots.txt directives and crawl delays. | View
SemanticScholarBot | The Semantic Scholar bot crawls domains to find academic PDFs. These PDFs are served on semanticscholar.org so researchers can discover and understand other academic accomplishments. | View
Semrush Site Audit | Semrush Site Audit is a powerful website crawler that analyzes the health of a website by checking for on-page and technical SEO issues, including duplicate content, broken links, HTTPS implementation, hreflang attributes, and more. | View
Semrush | Semrush is a platform for SEO, content marketing, competitor research, PPC and social media marketing. | View
Sentry Uptime Monitoring Bot | Sentry's Uptime Monitoring Bot performs health checks on configured URLs to monitor the availability and reliability of web services. | View
SeznamBot | SeznamBot is the web crawler operated by Seznam.cz, the leading Czech search engine. The bot crawls and indexes web content for Seznam's search results, respecting robots.txt directives and crawl delays. | View
SISTRIX Optimizer Uptime | SISTRIX Optimizer Uptime bot performs continuous monitoring of website availability by checking the startpage once per minute. It is part of SISTRIX's SEO and website monitoring platform. | View
Site24x7 | Site24x7 Bot is used by Site24x7's monitoring services to perform various checks on websites, including uptime and performance monitoring. | View
Sitebulb | Sitebulb is a desktop and cloud-based website crawler used by SEO professionals for technical SEO audits. It analyzes websites to find technical issues, opportunities for improvement, and provides detailed reports with visualizations and prioritized recommendations. | View
SlackLinkExpandingBot | Slackbot Link Expanding is a bot operated by Slack that fetches metadata from shared links to create rich previews. The bot uses HTTP Range headers to efficiently fetch only necessary metadata like oEmbed and Open Graph tags, and caches responses globally for about 30 minutes. | View
Slackbot | Slackbot is Slack's default, general-purpose bot that handles various API requests and integrations. It is used for tasks not covered by specialized bots like ImgProxy or LinkExpanding, such as making API requests for service integrations or handling outgoing webhooks. | View
Slack-ImgProxy | Slack-ImgProxy is a bot operated by Slack that fetches and caches images posted in Slack channels. The bot helps improve performance, ensures HTTPS delivery, and protects user privacy by hiding detailed referrer information. | View
SnapchatAdsBot | SnapchatAdsBot is a crawler operated by Snapchat that verifies and analyzes websites for their advertising platform. The bot helps ensure content quality and safety for Snapchat's advertising ecosystem. | View
SnapURLPreviewBot | SnapURLPreviewBot is a crawler operated by Snap Inc. that analyzes and generates previews of URLs shared on Snapchat and other Snap platforms. The bot helps ensure content quality and safety by validating URLs and generating preview metadata. | View
StatusCake | StatusCake is a website monitoring service that checks the uptime and performance of your website. | View
Stripe Webhooks | Stripe's webhook service that delivers real-time event notifications for payment processing and account updates. | View
svix | svix is a webhook service for sending events to webhooks. | View
TangibleeBot | TangibleeBot is a crawler operated by Tangiblee that collects product data from e-commerce websites to power their product visualization and virtual try-on services. The bot simulates single-visitor activity and crawls at an agreed-upon frequency to prevent disruption to website performance. | View
TikTokSpider | TikTokSpider is a web crawler used by TikTok/ByteDance to index and analyze web content for their platform. It helps in content discovery, link previews, and data collection for TikTok's services. | View
TTD-Content | TTD-Content is a crawler operated by The Trade Desk that verifies content and quality of ad placements for their demand-side platform. The bot helps ensure brand safety and ad verification by analyzing webpage content where ads may be displayed. | View
Twitterbot | Fetches content for shared links on X/Twitter to generate rich previews. | View
Uptime Robot | Uptime Robot is a platform for monitoring and alerting on your applications. | View
UsercentricsBot | UsercentricsBot is operated by Usercentrics GmbH to scan websites for data processing services and third-party technologies. The bot helps ensure GDPR compliance by identifying services that need to be included in the website's Consent Management Platform (CMP). | View
v0bot | Bot for v0 services. | View
Vercel Favicon Bot | Vercel Favicon Bot | View
vercelflags | vercel flags | View
Vercel Screenshot Bot | Vercel Screenshot Bot | View
verceltracing | vercel tracing | View
Yahoo! Slurp | Yahoo! Slurp is the web crawler (robot) used by Yahoo! Search to discover and index web pages for its search engine. | View
Yandexbot | YandexBot is a web crawler operated by Yandex, a major Russian search engine. | View
YisouSpider | YisouSpider is a search engine crawler operated by Yisou that indexes web content for their search engine results. The crawler follows standard crawling practices and respects robots.txt directives. | View
Last updated on May 30, 2025