Perplexity accused of scraping websites that explicitly blocked AI scraping

bicycledays
Please note: Most, if not all, of the articles published at this website were completed by ChatGPT (chat.openai.com) and/or copied and possibly remixed from other websites or Feedzy or WPeMatico or RSS Aggregator or WP RSS Aggregator. No copyright infringement is intended. If there are any copyright issues, please contact: bicycledays@yahoo.com.

AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don't want to be scraped, according to internet infrastructure provider Cloudflare.

On Monday, Cloudflare published research saying it observed the AI startup ignoring blocks and concealing its crawling and scraping activity. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages "in an attempt to circumvent the website's preferences," Cloudflare's researchers wrote.

AI products like those offered by Perplexity rely on ingesting large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the web, often without permission, to make their products work. Recently, websites have tried to fight back using the web-standard robots.txt file, which tells search engines and AI companies which pages they may crawl and which they should not. Those efforts have had mixed results so far.
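
A minimal sketch of how a well-behaved crawler honors robots.txt, using Python's standard-library `urllib.robotparser`. The rules below are hypothetical, and the crawler token `PerplexityBot` is assumed here purely for illustration. The key point is that the check runs on the crawler's side, so compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out a crawler identifying itself as
# "PerplexityBot" (assumed token) from the whole site, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A compliant crawler asks this before each request and skips disallowed URLs."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    page = "https://example.com/articles/1"
    print(may_fetch(rules, "PerplexityBot", page))  # False: named and disallowed
    print(may_fetch(rules, "SomeOtherBot", page))   # True: falls to the * group
```

Nothing in the protocol enforces the answer: a client that skips the check, or presents a user-agent the rules don't name, is not stopped by robots.txt itself, which is why sites also block on the server side.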

Perplexity appears to be deliberately circumventing these blocks by changing its bots' "user agent," the string a client sends to identify itself to a website (typically its software, version, and operating system), as well as changing the autonomous system numbers, or ASNs, it crawls from (essentially identifiers for large networks on the internet), according to Cloudflare.
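
To illustrate why switching user agents and networks matters, here is a minimal sketch of a naive server-side filter that blocks requests by declared user-agent token or by ASN. Everything in it is hypothetical: the IP-to-ASN table uses reserved documentation address ranges and private-use ASNs in place of real routing data, and the user-agent strings are examples, not Perplexity's actual ones. Cloudflare says it identified the crawler with machine learning and network signals, not with simple rules like these.

```python
import ipaddress
from typing import Optional

# Hypothetical IP-prefix -> ASN table. RFC 5737 documentation prefixes and
# private-use ASNs stand in for real routing data.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64512,     # network the declared crawler uses
    ipaddress.ip_network("198.51.100.0/24"): 64513,  # an unrelated provider
}

BLOCKED_UA_TOKENS = {"perplexitybot"}  # hypothetical block list (lowercase tokens)
BLOCKED_ASNS = {64512}


def lookup_asn(ip: str) -> Optional[int]:
    """Return the ASN whose prefix contains `ip`, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, asn in PREFIX_TO_ASN.items():
        if addr in network:
            return asn
    return None


def is_blocked(user_agent: str, ip: str) -> bool:
    """Block if the user-agent names a blocked bot or the IP is in a blocked ASN."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return True
    return lookup_asn(ip) in BLOCKED_ASNS


DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"  # hypothetical string
CHROME_MAC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(is_blocked(DECLARED_UA, "192.0.2.10"))      # True: UA token matches
    print(is_blocked(CHROME_MAC_UA, "192.0.2.10"))    # True: ASN still matches
    print(is_blocked(CHROME_MAC_UA, "198.51.100.7"))  # False: both signals changed
```

Changing only one signal still trips the other rule; changing both slips past a filter this simple, roughly the evasion pattern Cloudflare describes.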

"This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals," read Cloudflare's post.

Perplexity spokesperson Jesse Dwyer dismissed Cloudflare's blog post as a "sales pitch," adding in an email to Trendster that the screenshots in the post "show that no content was accessed." In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog "isn't even ours."

Cloudflare said it first noticed the behavior after customers complained that Perplexity was crawling and scraping their sites even after they had added disallow rules to their robots.txt files and created rules specifically blocking Perplexity's known bots. Cloudflare said it then ran tests and confirmed that Perplexity was circumventing these blocks.

"We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked," according to Cloudflare.

The company also said it has removed Perplexity's bots from its verified bots list and added new techniques to block them.

Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare launched a marketplace that lets website owners and publishers charge AI scrapers that visit their sites. Cloudflare CEO Matthew Prince sounded the alarm at the time, saying AI is breaking the business model of the internet, particularly for publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.

This isn't the first time Perplexity has been accused of scraping without authorization.

Last year, news outlets such as Wired alleged that Perplexity was plagiarizing their content. Weeks later, Perplexity CEO Aravind Srinivas was unable to immediately answer when asked to give the company's definition of plagiarism during an interview with Trendster's Devin Coldewey at the Disrupt 2024 conference.