
Perplexity is a Shameless AI Company That Won't Take No for an Answer

Perplexity keeps crawling websites, even when it's told no.


Yesterday, Cloudflare published a report accusing Perplexity AI of stealthily bypassing website restrictions. The cloud services firm said Perplexity continued to crawl content from sites that had explicitly banned it. The evidence came from a controlled test designed to trap unauthorized bots.

Perplexity had previously been blocked using both robots.txt files and firewall rules. Despite these clear signals, the AI chatbot still returned content from the restricted websites.

This is not the first time Perplexity has been under scrutiny. The company has faced multiple accusations of ignoring consent and reusing content without permission.

Cloudflare's investigation revealed sophisticated evasion tactics. When Perplexity's declared crawlers got blocked, the company simply switched to an undeclared crawler whose user agent impersonated a regular Chrome browser on macOS.
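To see why a disguised user agent matters, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It assumes a hypothetical site whose robots.txt blocks only the `PerplexityBot` token; the URL and the Chrome user-agent string are illustrative, not taken from Cloudflare's report. A crawler that names itself honestly is told no, while one claiming to be Chrome on a Mac doesn't match the rule at all. Rules keyed to a crawler's declared name only apply when the crawler identifies itself truthfully, and obeying robots.txt is voluntary either way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only Perplexity's declared crawler token.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /
"""

# Illustrative user-agent string for Chrome on macOS (not from Cloudflare's report).
CHROME_ON_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from memory, no network fetch
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    for agent in ("PerplexityBot", CHROME_ON_MAC):
        verdict = "allowed" if is_allowed(agent, url) else "disallowed"
        print(f"{agent[:40]!r}: {verdict}")
```

The honest `PerplexityBot` request comes back disallowed, while the Chrome-on-Mac string comes back allowed, because no rule in this file names it.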

These undeclared crawlers rotated through IP addresses and operated from different autonomous system numbers (ASNs) to further dodge website blocks. This let them slip right past firewall rules that were specifically designed to keep Perplexity's known crawler addresses out.
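Firewall rules have a similar weak spot. Below is a deliberately naive rule, sketched in Python with the standard `ipaddress` module, that blocks requests coming from a list of known address ranges or carrying a declared crawler name in the user agent. The ranges are RFC 5737 documentation addresses standing in for a crawler's published IPs, and the rule itself is hypothetical, not how Cloudflare actually detects bots. A request with a spoofed Chrome user agent from an unlisted address passes straight through.

```python
import ipaddress

# Hypothetical blocklist of ranges attributed to a declared crawler.
# These are RFC 5737 documentation ranges, not real Perplexity addresses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Declared crawler name the rule also matches on (assumption for illustration).
BLOCKED_AGENT_TOKENS = ("perplexitybot",)

CHROME_ON_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_blocked(client_ip: str, user_agent: str) -> bool:
    """Naive rule: block known IP ranges or user agents naming the crawler."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    requests = [
        ("192.0.2.10", "PerplexityBot/1.0"),   # declared crawler from a known range
        ("203.0.113.7", "PerplexityBot/1.0"),  # declared name from a new address
        ("203.0.113.7", CHROME_ON_MAC),        # disguised UA from an unlisted range
    ]
    for ip, ua in requests:
        print(ip, ua[:20], "blocked" if is_blocked(ip, ua) else "allowed")
```

Only the last request gets through: it neither names the crawler nor comes from a listed range. Real bot-management systems typically combine many more signals than IP ranges and user agents.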

Cloudflare tested this sneaky behavior using brand-new domains that had never been indexed by any search engine. Despite having strict blocking rules in place, Perplexity still somehow accessed and returned detailed information from these supposedly protected sites (as shown above).

This flow chart shows how these crawlers work. (Source: Cloudflare)

And the scale of this? Absolutely massive! We're talking millions of daily requests hitting tens of thousands of domains. This clearly wasn't some accidental oversight but systematic circumvention.

When these stealth crawlers were successfully blocked, Perplexity's answers became noticeably less detailed, a sign that the blocks were working.

Cloudflare expects these subversive bots to come crawling back with even sneakier tactics. The company warns that evasion techniques will keep evolving as AI firms continue trying to slip past detection.

Nonetheless, Cloudflare isn't backing down; they are ready for whatever comes next, promising to adapt their detection methods and stay one step ahead of these increasingly underhanded crawling operations.

💬 What do you think? Is Perplexity knowingly engaging in this behavior?

