How to Scrape Amazon Reviews using Python in 3 steps
In this web scraping tutorial, we will build an Amazon Review Scraper using Python in 3 steps. It can extract review data from Amazon products, such as Review Title, Review Content, Product Name, Rating, Date, Author, and more, into an Excel (CSV) spreadsheet. You can also check out our tutorial on how to build a Python scraper to scrape Amazon product details and pricing. We will build this simple Amazon review scraper using Python and Selectorlib and run it in a console.
Here are the steps for scraping Amazon reviews using Python:
Mark up the data fields to be scraped using Selectorlib
Copy and run the code provided
Download the data in Excel (CSV) format.
Below, we have also covered how to scrape product details from the Amazon search results page, how to avoid getting blocked by Amazon, and how to scrape Amazon on a large scale.
If you do not want to code, we have made it simple to do all this for FREE and in a few clicks. ScrapeHero Cloud can scrape reviews of Amazon products within seconds!
Here are some of the data fields that the Amazon product review scraper will extract into a spreadsheet from Amazon:
Product Name
Review Title
Review Content/Review Text
Rating
Date the review was published
Verified Purchase
Author Name
URL
We will save the data as an Excel Spreadsheet (CSV).
Installing the required packages to run the Amazon Reviews Web Scraper
In this tutorial, we will scrape Amazon product reviews using Python 3 and a few of its libraries. We will not be using Scrapy. This code can run easily and quickly on any computer, including a Raspberry Pi.
If you do not have Python 3 installed, you can follow this guide to install Python on Windows: How To Install Python Packages.
We will use these libraries:
Python Requests, to make requests and download the HTML content of the pages
LXML, to parse the HTML tree structure using XPaths
Python Dateutil, to parse review dates
Selectorlib, to extract data from the downloaded pages using the YAML template we create
Install them using pip3:
pip3 install python-dateutil lxml requests selectorlib
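If you want to confirm that the packages installed correctly, a quick sanity check (our own optional addition) is to import them in a Python shell; if none of these imports raise an ImportError, you are ready to go.

# Optional sanity check: no output means all four packages imported fine.
import requests
import lxml.html
import dateutil.parser
import selectorlib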
The Code
You can get all the code used in this tutorial from GitHub. Let's create a file called reviews.py and paste the following Python code into it.
Here is what the Amazon product review scraper does:
Reads a list of product review page URLs from a file called urls.txt (this file will contain the URLs for the Amazon product review pages you care about)
Uses a Selectorlib YAML file, saved as selectors.yml, that identifies the data fields on an Amazon page (more on how to generate this file later in this tutorial)
Scrapes the data
Saves the data as a CSV spreadsheet called data.csv
from selectorlib import Extractor
import requests
import csv
from time import sleep
from dateutil import parser as dateparser

# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('selectors.yml')

def scrape(url):
    headers = {
        'authority': 'www.amazon.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'none',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-dest': 'document',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }
    # Download the page using requests
    print("Downloading %s" % url)
    r = requests.get(url, headers=headers)
    # Simple check to see if the page was blocked (usually a 503)
    if r.status_code > 500:
        if "To discuss automated access to Amazon data please contact" in r.text:
            print("Page %s was blocked by Amazon. Please try using better proxies\n" % url)
        else:
            print("Page %s must have been blocked by Amazon as the status code was %d" % (url, r.status_code))
        return None
    # Pass the HTML of the page to the extractor and return a dict of fields
    return e.extract(r.text)

with open("urls.txt", 'r') as urllist, open('data.csv', 'w') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=["title", "content", "date", "variant", "images", "verified", "author", "rating", "product", "url"], quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for url in urllist.readlines():
        data = scrape(url)
        if data:
            for r in data['reviews']:
                r["product"] = data["product_title"]
                r['url'] = url
                if r.get('verified') and 'Verified Purchase' in r['verified']:
                    r['verified'] = 'Yes'
                # Keep just the numeric part of "4.0 out of 5 stars"
                r['rating'] = r['rating'].split(' out of')[0]
                # Dates look like "Reviewed in the United States on March 1, 2020"
                date_posted = r['date'].split('on ')[-1]
                if r['images']:
                    r['images'] = "\n".join(r['images'])
                r['date'] = dateparser.parse(date_posted).strftime('%d %b %Y')
                writer.writerow(r)
            # Uncomment to wait between pages
            # sleep(5)
If you don’t like or want to code, ScrapeHero Cloud is just right for you!
Skip the hassle of installing software, programming and maintaining the code. Download this data using ScrapeHero cloud within seconds.
Get Started for Free
Creating the YAML file – selectors.yml
You will notice in the code above that we used a file called selectors.yml. This file is what makes this tutorial so easy to create and follow. The magic behind it is a web scraping tool called Selectorlib.
Selectorlib is a tool that makes selecting, marking up, and extracting data from web pages visual and easy. The Selectorlib Web Scraper Chrome Extension lets you mark the data that you need to extract and creates the CSS selectors or XPaths needed to extract that data, then previews how the extracted data will look. You can learn more about Selectorlib and how to use it here.
If you just need the data fields shown above, you do not need to use Selectorlib, because we have already done that for you and generated a simple "template" that you can use as-is. However, if you want to add a new field, you can use Selectorlib to add that field to the template.
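To get a feel for how a Selectorlib template drives extraction before touching Amazon, here is a minimal, self-contained sketch; the HTML snippet and the title selector are made up for illustration and are not real Amazon markup.

from selectorlib import Extractor

# A tiny made-up template: each field names a CSS selector and a type.
yaml_string = """
title:
    css: h1
    type: Text
"""
html = "<html><body><h1>An Example Product</h1></body></html>"

e = Extractor.from_yaml_string(yaml_string)
print(e.extract(html))  # {'title': 'An Example Product'}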
Here is how we marked up the fields for the data we need to scrape from the Amazon Product Reviews page using the Selectorlib Chrome Extension.
Once you have created the template, click on 'Highlight' to highlight and preview all of your selectors. Finally, click on 'Export' and download the YAML file; that file is your selectors.yml.
Here is what our template (selectors.yml) looks like:
product_title:
    css: 'h1 a[data-hook="product-link"]'
    type: Text
reviews:
    css: 'div.review div.a-section.celwidget'
    multiple: true
    children:
        title:
            css: a.review-title
        content:
            css: 'div.a-row.review-data span.review-text'
        date:
            css: span.a-size-base.a-color-secondary
        variant:
            css: 'a.a-size-mini'
        images:
            css: img.review-image-tile
            multiple: true
            type: Attribute
            attribute: src
        verified:
            css: 'span[data-hook="avp-badge"]'
        author:
            css: span.a-profile-name
        rating:
            css: 'div.a-row:nth-of-type(2) > a.a-link-normal:nth-of-type(1)'
            type: Attribute
            attribute: title
next_page:
    css: 'li.a-last a'
    type: Link
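You may notice that the template also extracts a next_page link, which the script above does not follow. If you want to walk through every page of reviews, a sketch along these lines should work; scrape_all_pages is our own hypothetical helper (not part of the tutorial code) that reuses the scrape() function defined earlier, and we resolve the link against the current URL since the extracted href may be relative.

from urllib.parse import urljoin

def scrape_all_pages(start_url, max_pages=10):
    # Follow the template's next_page link until it runs out
    # or we hit the page limit.
    url = start_url
    all_reviews = []
    for _ in range(max_pages):
        data = scrape(url)
        if not data:
            break
        all_reviews.extend(data.get('reviews') or [])
        next_page = data.get('next_page')
        if not next_page:
            break
        url = urljoin(url, next_page)
    return all_reviews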
Previous Versions of the Scraper
If you need a script that runs on older versions of Python, you can view the previous versions of this code to scrape Amazon reviews.
Python 3 (built in 2018) and Python 2.7 (built in 2016)
Running the Amazon Review Scraper
You can get all the code used in this tutorial from GitHub. All you need to do is add the URLs you need to scrape into a text file called urls.txt in the same folder and run the scraper using the command:
python3 reviews.py
You can get each review page URL by clicking on "See all reviews" near the bottom of the product page.
Here is what the scraped Amazon reviews look like:
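If you would rather inspect the output programmatically than open the CSV in Excel, a quick look with pandas works too. Note that pandas is not one of this tutorial's requirements, so install it separately if you want to try this.

import pandas as pd

# Peek at the first few scraped reviews in data.csv.
reviews = pd.read_csv('data.csv')
print(reviews[['title', 'rating', 'date', 'verified']].head())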
This code can be used to scrape Amazon reviews for a relatively small number of ASINs for your personal projects. But if you want to scrape websites at the scale of thousands of pages, learn about the challenges here: How to build and run scrapers on a large scale.
What can you do with Scraped Amazon Reviews?
The data that you gather from this tutorial can help you with:
Get review details that are unavailable through the official Amazon Product Advertising API
Monitor customer opinions on products that you sell or manufacture using data analysis
Create Amazon review datasets for education and research
Monitor the quality of products sold by third-party sellers
A few years back, Amazon provided developers and sellers access to product reviews through its Product Advertising API. Amazon discontinued that access on November 8th, 2010, preventing API users from displaying Amazon reviews of their products embedded in their own websites. As of now, the API only returns a link to the review.
Building a Free Amazon Reviews API using Python, Flask & Selectorlib
If you are looking to get reviews as an API, similar to an Amazon Product Advertising API, you may find the tutorial below interesting.
Thanks for reading. If you need help with your complex scraping projects, let us know and we will be glad to help.
Do you need some professional help to scrape Amazon Data? Let us know
Turn the Internet into meaningful, structured and usable data
Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.
How To Scrape Amazon Product Data using Python
Web scraping helps in automating data extraction from websites. In this tutorial, we will build an Amazon scraper for extracting product details and pricing. We will build this simple web scraper using Python and Selectorlib and run it in a console.
This tutorial covers:
Here is how you can scrape Amazon product details from the Amazon product page
Setting up your computer for Amazon Scraping
Packages to install for Amazon scraping
Scrape product details from the Amazon Product Page
Mark up the data fields using Selectorlib
The Code
Running the Amazon Product Page Scraper
Scrape Amazon products from the Search Results Page
Mark up the data fields using Selectorlib
The Code
Running the Amazon Scraper to Scrape Search Results
What to do if you get blocked while scraping Amazon
Use proxies and rotate them
Specify the User Agents of the latest browsers and rotate them
Reduce the number of ASINs scraped per minute
Retry, Retry, Retry
How to Solve Amazon Scraping Challenges
Use a Web Scraping Framework like PySpider or Scrapy
If you need speed, Distribute and Scale Up using a Cloud Provider
Use a scheduler if you need to run the scraper periodically
Use a database to store the Scraped Data from Amazon
Use Request Headers, Proxies, and IP Rotation to prevent getting Captchas from Amazon
Write some simple data quality tests
How to use Amazon Product Data
Here is how you can scrape Amazon product details from the Amazon product page:
Mark up the data fields to be scraped using Selectorlib
Copy and run the code provided
Check out our web scraping tutorials to learn how to scrape Amazon reviews easily using Google Chrome and how to build an Amazon Review Scraper using Python.
Below, we have also covered how to scrape product details from the Amazon search results page, how to avoid getting blocked by Amazon, and how to scrape Amazon on a large scale.
Setting up your computer for Amazon Scraping
We will use Python 3 for this Amazon scraper; the code will not run if you are using Python 2.7. To start, you need a computer with Python 3 and pip installed.
Follow this guide to set up your computer and install the packages if you are on Windows:
How To Install Python Packages for Web Scraping in Windows 10
Packages to install for Amazon scraping
Python Requests, to make requests and download the HTML content of the Amazon product pages
Selectorlib, the Python package to extract data from the downloaded pages using the YAML template we create
Install them using pip3:
pip3 install requests selectorlib
Scrape product details from the Amazon Product Page
The Amazon product page scraper will scrape the following details from the product page:
Product Name
Price
Short Description
Full Product Description
Image URLs
Rating
Number of Reviews
Variant ASINs
Sales Rank
Link to all Reviews Page
Mark up the data fields using Selectorlib
We have already marked up the data, so you can skip this step if you want to get right to the data.
Here is what our template looks like. Let's save it as a file called selectors.yml in the same directory as our code.
name:
    css: '#productTitle'
    type: Text
price:
    css: '#price_inside_buybox'
short_description:
    css: '#featurebullets_feature_div'
images:
    css: '.imgTagWrapper img'
    type: Attribute
    attribute: data-a-dynamic-image
rating:
    css: span.arp-rating-out-of-text
number_of_reviews:
    css: 'a.a-link-normal h2'
variants:
    css: 'form.a-section li'
    multiple: true
    children:
        name:
            css: ""
            type: Attribute
            attribute: title
        asin:
            css: ""
            type: Attribute
            attribute: data-defaultasin
product_description:
    css: '#productDescription'
sales_rank:
    css: 'li#SalesRank'
link_to_all_reviews:
    css: 'div.card-padding a.a-link-emphasis'
    type: Link
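One field deserves a note: images extracts Amazon's data-a-dynamic-image attribute, which is not a plain URL but a JSON string mapping image URLs to their pixel dimensions. A small helper like this (our own addition, not part of the tutorial's code) can turn it into a simple list of URLs.

import json

def image_urls(dynamic_image_attr):
    # The attribute value looks like:
    # {"https://.../img1.jpg": [425, 425], "https://.../img2.jpg": [466, 466]}
    if not dynamic_image_attr:
        return []
    return list(json.loads(dynamic_image_attr).keys())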
Here is a preview of the markup.
Selectorlib is a combination of tools for developers that makes marking up and extracting data from web pages easy. The Selectorlib Chrome Extension lets you mark the data that you need to extract and creates the CSS selectors or XPaths needed to extract that data, then previews how the extracted data will look.
You can learn more about Selectorlib and how to use it to mark up data here.
If you don’t like or want to code, ScrapeHero Cloud is just right for you!
Skip the hassle of installing software, programming and maintaining the code. Download this data using ScrapeHero cloud within seconds.
Get Started for Free
The Code
Create a folder called amazon-scraper and paste your Selectorlib YAML template file into it as selectors.yml.
Let's create a file called amazon.py and paste the code below into it. All it does is:
Read a list of Amazon product URLs from a file called urls.txt
Scrape the data
Save the data as a JSON Lines file called output.jsonl
from selectorlib import Extractor
import requests
import json
from time import sleep

# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('selectors.yml')

def scrape(url):
    headers = {
        'authority': 'www.amazon.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'none',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-dest': 'document',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }
    # Download the page using requests
    print("Downloading %s" % url)
    r = requests.get(url, headers=headers)
    # Simple check to see if the page was blocked (usually a 503)
    if r.status_code > 500:
        if "To discuss automated access to Amazon data please contact" in r.text:
            print("Page %s was blocked by Amazon. Please try using better proxies\n" % url)
        else:
            print("Page %s must have been blocked by Amazon as the status code was %d" % (url, r.status_code))
        return None
    # Pass the HTML of the page to the extractor and return a dict of fields
    return e.extract(r.text)

with open("urls.txt", 'r') as urllist, open('output.jsonl', 'w') as outfile:
    for url in urllist.readlines():
        data = scrape(url)
        if data:
            # Write one JSON object per line
            json.dump(data, outfile)
            outfile.write("\n")
            # sleep(5)
Running the Amazon Product Page Scraper
You can get the full code from GitHub. You can start your scraper by typing the command:
python3 amazon.py
Once the scrape is complete, you should see a file called output.jsonl with your data. Here is an example of the output for one product URL:
{
    "name": "2020 HP 15.6\" Laptop Computer,
    16GB DDR4 RAM, 512GB PCIe SSD, 802.11ac WiFi, Bluetooth 4.2, Silver, Windows 10
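The example above is truncated, but each line the scraper writes to output.jsonl is a complete JSON object, so you can load the results back for analysis with a few lines of Python (assuming the output.jsonl name used above):

import json

# Read the scraped products back, one JSON object per line.
with open('output.jsonl') as f:
    products = [json.loads(line) for line in f if line.strip()]
print(len(products), "products scraped")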