Cloudflare Says Perplexity’s AI Bots Are ‘Stealth Crawling’ Blocked Sites

The AI search startup Perplexity is allegedly skirting restrictions meant to stop its AI web crawlers from accessing certain websites, according to a report from Cloudflare. In the report, Cloudflare claims that when Perplexity encounters a block, the startup will conceal its crawling identity “in an attempt to circumvent the website’s preferences.”

The report only adds to concerns about Perplexity vacuuming up content without permission, as the company got caught barging past paywalls and ignoring sites’ robots.txt files last year. At the time, Perplexity CEO Aravind Srinivas blamed the activity on third-party crawlers used by the site.

Now, Cloudflare, one of the world’s biggest internet architecture providers, says it received complaints from customers who claimed that Perplexity’s bots still had access to their websites even after they stated their preference in their websites’ robots.txt files and created Web Application Firewall (WAF) rules to restrict access by the startup’s AI bots.

To test this, Cloudflare says it created new domains with similar restrictions against Perplexity’s AI scrapers. It found that the startup will first attempt to access the sites by identifying itself with the names of its crawlers: “PerplexityBot” or “Perplexity-User.”

But if the website has restrictions against AI scraping, Cloudflare claims Perplexity will change its user agent (the bit of information that tells a website what kind of browser and device you’re using, or if the visitor is a bot) to “impersonate Google Chrome on macOS.” Cloudflare says this “undeclared crawler” uses “rotating” IP addresses that the company doesn’t include on the list of IP addresses used by its bots.
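To make the mechanism concrete, here is a minimal sketch of why a robots.txt rule aimed at a named crawler does not apply once the visitor presents a generic browser string. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, the `is_allowed` helper and the Chrome-on-macOS string are all illustrative assumptions, not material from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site that opts out of the two
# crawler names mentioned in the article.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""

# Illustrative generic Chrome-on-macOS user agent (an assumption for this
# sketch, not a string quoted from the report).
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent, url="https://example.com/article"):
    """Return True if the robots.txt above permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for ua in ("PerplexityBot", "Perplexity-User", GENERIC_CHROME_UA):
        print(f"{ua[:40]!r}: allowed={is_allowed(ua)}")
```

The rules match only on the name a client declares, and robots.txt is advisory rather than enforced, which is consistent with the article's point that customers also relied on WAF rules to restrict access.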

Additionally, Cloudflare claims that Perplexity switches between autonomous system numbers (ASNs), the numbers used to identify groups of IP networks controlled by a single operator, to get around blocks as well. “This activity was observed across tens of thousands of domains and millions of requests per day,” Cloudflare writes.

In a statement to The Verge, Perplexity spokesperson Jesse Dwyer called Cloudflare’s report a “publicity stunt,” adding that “there are a lot of misunderstandings in the blog post.” Cloudflare has since de-listed Perplexity as a verified bot and has rolled out methods to block Perplexity’s “stealth crawling.”

Cloudflare CEO Matthew Prince has been outspoken about AI’s “existential threat” to publishers. Last month, the company started letting websites ask AI companies to pay to crawl their content, and began blocking AI crawlers by default.
