Facebook Bot IP Range

Facebook Crawler – Sharing – Documentation

The Facebook Crawler crawls the HTML of an app or website that was shared on Facebook via copying and pasting the link or by a Facebook social plugin. The crawler gathers, caches, and displays information about the app or website such as its title, description, and thumbnail image.
Crawler Requirements
Your server must use gzip and deflate encodings.
Any Open Graph properties must be listed within the first 1 MB of your website or app, or they will be cut off.
Ensure that the content can be crawled by the crawler within a few seconds or Facebook will be unable to display the content.
Your app or website should either generate and return a response with all required properties according to the bytes specified in the Range header of the crawler request or it should ignore the Range header altogether.
Add to your allow list either the user agent strings or the IP addresses (more secure) used by the crawler.
Ensure that your app or website allows the Facebook Crawler to crawl the privacy policy associated with your app or website.
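The 1 MB requirement above can be checked offline before a page is deployed. The following is a minimal sketch, not Facebook's own logic: the function name og_tags_cut_off, the deliberately simple regular expression, and the reading of "1 MB" as 1,048,576 bytes are all assumptions made for illustration.

```python
import re

# Assumption: "1 MB" is interpreted here as 1024 * 1024 bytes.
ONE_MB = 1024 * 1024

# Simple pattern for <meta ... property="og:NAME" ...>; real HTML may need a proper parser.
OG_META = re.compile(rb'<meta\s+[^>]*property=["\']og:([a-zA-Z:_]+)["\']', re.IGNORECASE)


def og_tags_cut_off(html: bytes, limit: int = ONE_MB) -> list[str]:
    """Return the og: property names whose matched tag text extends past `limit` bytes."""
    return [
        match.group(1).decode("ascii")
        for match in OG_META.finditer(html)
        if match.end() > limit
    ]
```

An empty result suggests every Open Graph tag the pattern recognises sits inside the first megabyte. Tags written in an unusual attribute syntax would be missed by this pattern.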
Crawler IPs and User Agents
The Facebook crawler user agent strings:
facebookexternalhit/1.1 (+)
facebookexternalhit/1.1
facebookcatalog/1.0
To get a current list of IP addresses the crawler uses, run the following command.
whois -h -- '-i origin AS32934' | grep ^route
These IP addresses change often.
Example Response…
route: 69.63.176.0/21
route: 69. 184. 0/21
route: 66.220.144.0/20
route: 69. 0/20
route6: 2620:0:1c00::/40
route6: 2a03:2880::/32
route6: 2a03:2880:fffe::/48
route6: 2a03:2880:ffff::/48
route6: 2620:0:1cff::/48
…
Troubleshooting
If your app or website content is not available at the time of crawling, you can force a crawl once it becomes available either by passing the URL through the Sharing Debugger tool or by using the Sharing API.
You can simulate a crawler request with the following code:
curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" -A "facebookexternalhit/1.1 (+)" "$URL"
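The same kind of request can be prepared from Python's standard library. This is a rough sketch rather than an exact equivalent of the curl command: the function name build_crawler_like_request is hypothetical, the short user agent form without the parenthesised part is used, and the Accept-Encoding value is an assumption standing in for curl's --compressed flag.

```python
import urllib.request


def build_crawler_like_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the headers from the curl example."""
    headers = {
        # Short form of the crawler user agent listed in the documentation above.
        "User-Agent": "facebookexternalhit/1.1",
        # Same byte range as the curl example.
        "Range": "bytes=0-524288",
        "Connection": "close",
        # Assumption: approximates curl's --compressed option.
        "Accept-Encoding": "gzip, deflate",
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the result to urllib.request.urlopen(request, timeout=10) would send it. Note that urllib does not decompress a gzip or deflate response body automatically, so the caller would have to do that.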
What's the IP address range of Facebook's Open Graph crawler?

In order to test the Open Graph API on our preview environment, we need to poke a hole in our firewall to allow Facebook to scrape our object pages. What IP ranges should we allow?
asked Jan 14 ’12 at 0:22
EDIT
Facebook has been showing some love and is now making the IP block public for anyone to have
Facebook Scraper
A number of Platform services such as Social Plugins and the Open
Graph require our systems to be able to reach your Web Pages. We
recognize that there are situations where you might not want these
pages on the public Internet, during testing or for other security
reasons.
To facilitate this, you should make exceptions in your security
systems to allow Facebook to scrape these pages by adding the
following IP ranges, accurate as of April 2012.
31.13.24.0/21
31. 64. 0/18
66.220.144.0/20
69. 63. 176. 171. 224. 0/19
74.119.76.0/22
103.4.96.0/22
173. 252. 0/18
204.15.20.0/22
Instead of IP, you can also use the user agent for your firewall.
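Both approaches can be sketched with Python's ipaddress module. This is an illustration only: the network list below contains just the entries of the April 2012 list above that survive intact, so it is incomplete and long out of date, and the function names are hypothetical. A real allow list would be refreshed from the whois command.

```python
import ipaddress

# Assumption: only the intact entries of the April 2012 list; incomplete and stale.
ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "31.13.24.0/21",
        "66.220.144.0/20",
        "74.119.76.0/22",
        "103.4.96.0/22",
        "204.15.20.0/22",
    )
]

# Prefix of the scraper's user agent string quoted in this thread.
USER_AGENT_PREFIX = "facebookexternalhit/"


def ip_is_allowed(remote_ip: str) -> bool:
    """True if the client address falls inside one of the allowed networks."""
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def user_agent_is_allowed(user_agent: str) -> bool:
    """True if the User-Agent header starts with the crawler's prefix."""
    return user_agent.startswith(USER_AGENT_PREFIX)
```

The user agent check is easy to spoof because any client can send that header, which is why matching on source IP is generally the safer of the two.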
When does Facebook scrape my page?
Facebook needs to scrape your page to know how to display it around
the site.
Facebook scrapes your page every 24 hours to ensure the properties are
up to date. The page is also scraped when an admin for the Open Graph
page clicks the Like button and when the URL is entered into the
Facebook URL Linter. Facebook observes cache headers on your URLs – it
will look at “Expires” and “Cache-Control” in order of preference.
However, even if you specify a longer time, Facebook will scrape your
page every 24 hours.
The user agent of the scraper is: “facebookexternalhit/1.1 (+)”
Stalinko
answered Jan 14 ’12 at 0:45
DMCS
whois -h -- '-i origin AS32934' | grep ^route to see all ranges.
answered Oct 23 ’13 at 12:48
Stillmatic1985
66. 0/21
66. 152. 159. 0/24
69. 0/21
69. 184. 239. 240. 255. 0/24
173. 0/19
173. 70. 0/19
31. 0/24
31. 65. 66. 67. 68. 69. 71. 72. 73. 74. 75. 77. 0/19
Tadeck
answered Apr 4 ’12 at 18:28
Facebook now publishes their IP range.
As of April 2012, it is:
answered May 2 ’12 at 21:19
bkaid
New information is listed on the following URL & yes, they do have this info public.
Run this command to get a current list of IP addresses the crawler
uses.
whois -h -- '-i origin AS32934' | grep ^route
Such as
# For example only – over 100 in total
2401:db00::/32
2620:0:1c00::/40
2a03:2880::/32
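Once the command's output is saved, it can be turned into network objects for a firewall script. The sketch below assumes the output uses the "route:" and "route6:" line format shown in Facebook's example response. The function names parse_routes and collapse_routes are hypothetical.

```python
import ipaddress


def parse_routes(whois_output: str) -> list:
    """Extract networks from lines of the form 'route: <cidr>' or 'route6: <cidr>'."""
    networks = []
    for line in whois_output.splitlines():
        # Split on the first colon only, so IPv6 prefixes stay intact in the value.
        key, _, value = line.partition(":")
        if key.strip() in ("route", "route6"):
            networks.append(ipaddress.ip_network(value.strip()))
    return networks


def collapse_routes(networks: list) -> list:
    """Merge overlapping prefixes; IPv4 and IPv6 must be collapsed separately."""
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))
```

Collapsing is optional, but it shortens the list when a broad prefix such as 2a03:2880::/32 already covers narrower ones.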
So yeah, the ones mentioned by DMCS, stand correct. Just wanted to verify & found this info.
Thanks
answered Apr 17 ’15 at 18:18
tushonline
Facebook does not publish their crawler source address range officially, but you can look at the list of all their IP ranges in the publicly available BGP routing table:
We’re currently using this list:
answered Jan 14 ’12 at 11:30
5 Things You Need to Know Before Scraping Data From Facebook

1. Actually, Facebook disallows any scraper, according to its robots.txt file.
When planning to scrape a website, you should always check its robots.txt first. robots.txt is a file used by websites to let “bots” know if or how the site should be scraped or crawled and indexed. You can access the file by adding “/robots.txt” to the end of the link to your target website.
Open Facebook’s robots.txt in your browser, and let’s check it. These two lines can be found at the bottom of the file:
The lines state that Facebook prohibits all automated scrapers. That is, no part of the website should be visited by an automated crawler.
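The same check can be automated with Python's standard urllib.robotparser module. The two lines themselves did not survive in the text above, so the rule used here is an assumption: the standard catch-all form that matches the description (every user agent, every path disallowed). The helper name may_fetch is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a catch-all rule consistent with the description above,
# not a verbatim copy of Facebook's file.
ASSUMED_ROBOTS_LINES = ["User-agent: *", "Disallow: /"]


def may_fetch(robots_lines: list, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lines allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)
```

With the assumed rule, may_fetch(ASSUMED_ROBOTS_LINES, "ExampleBot", "https://www.facebook.com/") evaluates to False. For a live site, RobotFileParser can instead download the file itself through its set_url and read methods.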
Why do we need to respect robots.txt?
Websites use the robots file to specify a set of rules on how you or a bot should interact with them. When a website blocks all access to crawlers, the best thing to do is to leave that site alone. To follow the robots file is to avoid unethical data gathering as well as any legal ramifications.
2. Technically, the only legal way to collect data from Facebook with a crawler is to obtain prior written permission
Facebook warns at the very beginning of their robots file: “Crawling Facebook is prohibited unless you have express written permission.”
Follow the link on the second line of that file to find Facebook’s Automated Data Collection Terms, last revised on April 15th, 2010.
Like any other terms and conditions in the world, Facebook Automated Data Collection Terms are long (in abnormally small font size) and full of legal terms that few people could fully understand.
These terms look so familiar, as we would see them each time we install a new app on our mobile phone or sign up for a website.
“By obtaining permission to…you agree to abide by…”
“You agree that you will not…”
“You agree that any violation of these terms may result in…”
However, they may not be equally harmless.
As the social media giant, Facebook has money, time and a dedicated legal team. If you proceed with scraping Facebook by ignoring their Automated Data Collection Terms, that’s OK, but just be warned that they have reminded you to at least obtain “written permission”. Sometimes they can be quite aggressive towards illegitimate scraping.
3. But surely you are still able to scrape data from Facebook as you need
If you have crawled a site without respecting its robots.txt, it doesn’t necessarily mean you will get into legal complications for violating the rules.
Data scraped from social media is undoubtedly the largest and most dynamic dataset about human behavior and real-world events. For more than a decade, researchers and business experts around the world have harvested information from Facebook using scrapers, producing representative samples to understand individuals, groups and society, as well as exploring brand new opportunities hidden in the data.
Users themselves may agree that the use of social data is not always a bad thing. For example, using social data to personalize marketing is what keeps the internet free and makes the ads and content we see more relevant.
Tools you could use for obtaining Facebook data
In response to the public outcry following the Cambridge Analytica scandal, Facebook implemented dramatic access restrictions on its APIs in April 2018.
Application Programming Interfaces (APIs) are software interfaces designed for consumption by computer programs, which allow people to retrieve large-scale data with automated processes. Nowadays many companies provide a public API as a means for users, researchers and third-party app developers to access their infrastructure.
Facebook’s API lockdown and radical data access restrictions, presented as an attempt to protect its user information, are debatable. Still, as a result, people are now left with only one choice.
Without APIs, we can now obtain Facebook data only through the interfaces built for users, that is, the web pages. This is exactly where web scrapers come into play. For an overview of tools, see our article Top 5 Social Media Scraping Tools for 2020.
4. With GDPR in force, however, there’s a greater chance of being sued if you scrape personal data
Before scraping data from Facebook, it helps to learn about GDPR compliance in web scraping.
The EU General Data Protection Regulation, or GDPR as it is more commonly known, came into force on 25th May 2018. It is said to be the most important change in data privacy regulation in 20 years, set to force sweeping changes in everything from technology to advertising, and medicine to banking.
Companies or organizations that hold and process large amounts of consumer data, such as technology firms like Facebook, are affected the most under GDPR. Previously, it was up to these companies to enforce the rules that protect user data. Now, under GDPR, they need to make sure they are in full compliance with the law.
The good news is…
GDPR only applies to personal data.
Here “personal data” refers to data that could be used to directly or indirectly identify a specific individual. This kind of information is known as Personally Identifiable Information (PII), which includes a person’s name, physical address, email address, phone number, IP address, date of birth, employment info and even video/audio recordings.
If you aren’t scraping personal data, then GDPR does not apply.
In short, unless you have the person’s explicit consent, it is now illegal under GDPR to scrape an EU resident’s personal data.
5. And you could try alternative sources to Facebook for your scraping project
As mentioned above, though Facebook prohibits all automated crawlers, it is still technically feasible to scrape data from the site. The problem is that it is risky.
Apart from the legal ramifications, you may find it harder to retrieve the desired data on a regular basis, as Facebook blocks suspicious IPs and could implement stricter blocking mechanisms in the future, which may make scraping data from the site totally impossible.
Hence, it is recommended to look for more reliable sources for social media data to gain business intelligence and insights on your target market.
Four alternative data sources to Facebook
Twitter
With about 500 million tweets generated per day, Twitter is a sea of information that can be used as a great source for brand monitoring and customer sentiment measurement. Unlike Facebook, Twitter allows people to retrieve data on a large scale via Twitter’s APIs.
Reddit
Having as many users as Twitter, Reddit is one of the greatest sources of UGC (User Generated Content) in the world. Reddit also provides public APIs that can be used for a variety of purposes such as data collection, automatic commenting bots, or even to assist in subreddit moderation.
VKontakte (VK)
VK is a Russian social media platform geared toward Russians and other Eastern European users. It boasts over 90 million unique visitors per month and 9 billion page views every day. As a Russian company, VK adheres to Russian laws, and if you check its robots file you’ll find it is quite friendly to crawlers.
Instagram
Owned by Facebook, Instagram focuses more on visual content sharing, especially videos and pictures. The platform is used by many brands to humanize their content, connect better with customers and grow brand awareness. Alongside Facebook’s data lockdown in 2018, however, Instagram also implemented radical restrictions on data access, which made the site much less reliable as a data source than before.
Written by: Ellen Y (The Octoparse Team)
Edit: Ashley Weldon

Frequently Asked Questions about Facebook bot IP range

Does Facebook allow crawlers?

Facebook warns at the very beginning of their robots file: “Crawling Facebook is prohibited unless you have express written permission.” (Aug 12, 2021)

How was Facebook used as a proxy by web scraping bots?

When a link is shared on Facebook or in a Messenger conversation, Facebook crawls the shared webpage to extract information for the preview. By simulating link sharing, web scraping bots could make unlimited requests to their targeted websites via Facebook’s infrastructure.
