Beautiful Soup Json

Parsing out specific values from JSON object in BeautifulSoup

from urllib import request
from bs4 import BeautifulSoup
url = ""  # URL elided in the original post
html = request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
Output:

{
  "max_score": 88.84169,
  "took": 6,
  "total": 244,
  "hits": [
    {
      "_id": "1017",
      "_score": 88.84169,
      "entrezgene": "1017",
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2"},
    {
      "_id": "12566",
      "_score": 73.8155,
      "entrezgene": "12566",
      "name": "cyclin-dependent kinase 2",
      "symbol": "Cdk2"},
    {
      "_id": "362817",
      "_score": 62.09322,
      "entrezgene": "362817",
      "symbol": "Cdk2"}]}


Goal:
From this output, I would like to parse out the entrezgene, name, and symbol values of each hit.
Question:
How do I go about accomplishing this?
Background:
I have tried the approaches in a couple of existing answers, such as "Python BeautifulSoup extract text between element", but I have not been able to find what I am looking for.
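One possible approach, sketched under the assumption that the URL returns a JSON document shaped like the output above: Beautiful Soup is not strictly needed, because the standard-library json module can turn the response text into a dictionary. The raw sample string and the extract_gene_fields helper are illustrative names, not part of the original post. dict.get() is used because the third hit in the sample has no "name" field.

```python
import json

# Sample response text shaped like the output in the question. With a live
# request this would be the decoded body of the HTTP response (an assumption).
raw = """
{
  "max_score": 88.84169,
  "took": 6,
  "total": 244,
  "hits": [
    {"_id": "1017", "_score": 88.84169, "entrezgene": "1017",
     "name": "cyclin dependent kinase 2", "symbol": "CDK2"},
    {"_id": "12566", "_score": 73.8155, "entrezgene": "12566",
     "name": "cyclin-dependent kinase 2", "symbol": "Cdk2"},
    {"_id": "362817", "_score": 62.09322, "entrezgene": "362817",
     "symbol": "Cdk2"}
  ]
}
"""


def extract_gene_fields(text):
    """Return an (entrezgene, name, symbol) tuple for every hit in the JSON text."""
    data = json.loads(text)
    rows = []
    for hit in data["hits"]:
        # .get() yields None when a key is absent, as with the third hit's name
        rows.append((hit.get("entrezgene"), hit.get("name"), hit.get("symbol")))
    return rows


for entrezgene, name, symbol in extract_gene_fields(raw):
    print(entrezgene, name, symbol)
```

If the JSON were embedded inside an HTML page rather than returned directly, the same json.loads() call would apply once the JSON text had been pulled out of the page.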
Scrape Beautifully With Beautiful Soup In Python - Analytics India Magazine

Web scraping is the process of collecting data from the internet using various tools and frameworks. It is commonly used for online price-change monitoring, price comparison, and tracking how competitors are doing by extracting data from their websites.
Web scraping is nearly as old as the web itself. The World Wide Web was launched in 1989, and four years later Matthew Gray at MIT created the World Wide Web Wanderer, the first web robot, whose purpose was to measure the size of the World Wide Web.
Beautiful Soup is a Python library that is used for web scraping purposes to pull the data out of HTML and XML files. It creates a parse tree from page source code that can be used to extract data in a hierarchical and more readable manner.
It was first introduced by Leonard Richardson, who still contributes to the project, which is additionally supported by Tidelift (a paid subscription tool for open-source maintenance).
Beautiful Soup 3 was officially released in May 2006. The latest version at the time of writing is 4.9.2, which supports both Python 3 and Python 2.7.
Advantages
Very fast
Extremely lenient
Parses pages the same way a browser does
Prettifies the source code
Installation
Installing Beautiful Soup requires a working Python installation. Beautiful Soup itself, along with the supporting packages listed further down, can be installed with pip:
pip install beautifulsoup4
Other packages we will need later, for browser automation, fetching pages, and alternative parsers:
pip install selenium
pip install requests
pip install lxml
pip install html5lib
Quickstart
A short example showing how quickly Beautiful Soup gets you started: we fetch the source code of the demoblaze site and print it.
from bs4 import BeautifulSoup
import requests

URL = ""  # URL elided in the source (the demoblaze site)
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
print(soup.prettify())
prettify() is a built-in method provided by Beautiful Soup. It returns a formatted view of the parsed page source, i.e. it arranges all the tags as an indented parse tree for better readability.
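To make the effect of prettify() concrete, here is a minimal sketch that runs it on a small inline HTML string instead of a downloaded page. The html string is made up for illustration, and the standard-library "html.parser" is used so that no extra parser package is assumed.

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for a downloaded page (an assumption for this sketch).
html = "<div><p>Hello</p></div>"

# "html.parser" ships with Python, so no additional parser needs installing.
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a string with one tag or text node per line,
# indented according to its depth in the parse tree.
pretty = soup.prettify()
print(pretty)
```

Each nesting level is indented one step further than its parent, which makes the hierarchy of a real page much easier to scan than the raw source.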
How to locate the data from the source code?
To exclude unwanted data and scrape only reliable information, we have to inspect the web page.
We can open the Inspect tab by doing any of the following in the web browser:
Right-click on the web page and select Inspect
Or in Chrome, go to the upper right of the browser window and click the menu -> More tools -> Developer tools
Or press Ctrl + Shift + I
After opening the Inspect tab, you can search for the element you wish to extract from the web page.
By just hovering over the web page, we can select elements, and the corresponding code is highlighted in the Inspect panel.
The title of every article is inside a div with class="post-title", and inside that, the title text sits between "span" tags.
With this method, we can look into a web page's markup and explore all of its data using just the hover-and-inspect functionality of Chrome's Inspect tools.
In this example, we are going to use Selenium for browser automation and source-code extraction.
A full tutorial about Selenium is available here.
Our purpose is to scrape the titles of all articles from the Analytics India Magazine homepage.
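# importing modules
from selenium import webdriver
from bs4 import BeautifulSoup

# start Chrome in incognito, headless mode
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

# load the page (URL elided in the source) and grab its rendered HTML
driver.get('')
source_code = driver.page_source
driver.quit()

# parse the HTML and print the text of the span inside each title block
soup = BeautifulSoup(source_code, 'lxml')
article_block = soup.find_all('div', class_='post-title')
for titles in article_block:
    title = titles.find('span').get_text()
    print(title)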
Let's break down the above code step by step to understand how it finds the article titles:
The first two lines import Selenium and BeautifulSoup.
Then we start the Chrome browser in incognito and headless mode. Headless means no Chrome window pops up; the URL is loaded in the background instead.
Then, with the help of the Selenium driver, we load the page source of the given URL into the "source_code" variable.
Note: There are many ways to get the source code of a URL, but since we already know Selenium, it is easy to move forward with the same tool. It also offers other functionality, such as following hyperlinks and clicking elements.
We pass the "source_code" variable to BeautifulSoup and specify "lxml" as the parser to use. We then call the Beautiful Soup method find_all() to collect every 'div' tag with the class 'post-title', as discussed above, because the article titles are inside these div containers.
Finally, a simple for loop iterates over each article element, and find() returns the "span" tag that holds the title text. get_text() strips the surrounding span tags and returns only the text on each iteration.
After this, you can feed the data into data science work: for example, you can use it to create a word cloud or to do text analysis.
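The extraction step described above can be tried without a browser. The following sketch assumes markup shaped like the description (a div with class "post-title" wrapping a span); the HTML string and the article titles in it are hypothetical stand-ins for the real page source, and "html.parser" is swapped in for "lxml" so that the example needs no extra parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the text.
source_code = """
<div class="post-title"><a href="#"><span>First article</span></a></div>
<div class="post-title"><a href="#"><span>Second article</span></a></div>
<div class="sidebar"><span>Not a title</span></div>
"""

soup = BeautifulSoup(source_code, "html.parser")

titles = []
# find_all() returns every matching div; find() returns the first span inside it
for block in soup.find_all("div", class_="post-title"):
    span = block.find("span")
    if span is not None:  # skip blocks without a span instead of raising
        titles.append(span.get_text(strip=True))

print(titles)
```

The None check is a small defensive choice: find() returns None when nothing matches, so calling get_text() on the result directly would raise an AttributeError for any block that lacks a span.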
Conclusion
Beautiful Soup is a great tool for extracting very specific information from large amounts of unstructured raw data, and it is also fast and handy to use.
Its documentation is also very helpful if you want to continue your research.
You learned how to:
Install and set up the scraping environment
Inspect the website to get element names
Parse the source code in Beautiful Soup to get trimmed results
Work through a live example of getting all the published article names from a website
Mohit Maithani
Mohit is a Data & Technology Enthusiast with good exposure to solving real-world problems in various avenues of IT and Deep learning domain. He believes in solving human’s daily problems with the help of technology.

Frequently Asked Questions about beautiful soup json

Can you use BeautifulSoup for json?

Yes, indirectly. Beautiful Soup can extract the text that is in JSON format; you then pass that text to json.loads() to convert it to a dictionary.

How do I get json from BeautifulSoup?

Find the script element that holds the JSON, for example a script tag with type="application/json" (such as one carrying data-initial-state="review-filter"), read its text with Beautiful Soup, and pass that text to json.loads().
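A minimal sketch of that idea follows. The script tag's attributes come from the snippet quoted in this answer, but the JSON payload inside it is invented for illustration, since the original content is not shown.

```python
import json
from bs4 import BeautifulSoup

# The JSON payload below is hypothetical; only the tag attributes are
# taken from the snippet referenced in the text.
html = """
<html><body>
<script type="application/json" data-initial-state="review-filter">
{"languages": ["en", "de"], "selected": "en"}
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the script element by its type attribute
script = soup.find("script", attrs={"type": "application/json"})

# .string is the raw text inside the tag; json.loads() turns it into a dict
state = json.loads(script.string)
print(state["selected"])
```

On a real page it is worth checking that find() did not return None before reading .string, since many pages have no such script element.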

What is BeautifulSoup library used for?

Beautiful Soup is a Python library that is used for web scraping purposes to pull the data out of HTML and XML files. It creates a parse tree from page source code that can be used to extract data in a hierarchical and more readable manner.
