Python lxml

lxml – Processing XML and HTML with Python

lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.
The lxml XML toolkit is a Pythonic binding for the C libraries
libxml2 and libxslt. It is unique in that it combines the speed and
XML feature completeness of these libraries with the simplicity of a
native Python API, mostly compatible with, but superior to, the well-known
ElementTree API. The latest release works with all CPython versions
from 2.7 to 3.9. See the introduction for more information about the
background and goals of the lxml project. Some common questions are
answered in the FAQ.
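To give a feel for the ElementTree-style API, here is a minimal sketch, assuming lxml has been installed from PyPI; the element names and text are made up for illustration. It builds a small tree, serializes it and parses it back:

```python
from lxml import etree

# Build a small tree with the ElementTree-style factory functions.
root = etree.Element("catalog")
book = etree.SubElement(root, "book", id="b1")  # keyword arguments become attributes
title = etree.SubElement(book, "title")
title.text = "Processing XML with Python"

# Serialize to bytes; pretty_print is an lxml addition to tostring().
xml_bytes = etree.tostring(root, pretty_print=True)
print(xml_bytes.decode())

# Parse the bytes back and query them with the familiar find()/get() calls.
parsed = etree.fromstring(xml_bytes)
first = parsed.find("book")
print(first.get("id"), first.findtext("title"))
```

Apart from pretty_print, every call used here also exists in the standard library's xml.etree.ElementTree, which is what makes moving between the two fairly painless.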
lxml has been downloaded from the Python Package Index
millions of times and is also available directly in many package
distributions, e.g. for Linux or macOS.
Most people who use lxml do so because they like using it.
You can show us that you like it by blogging about your experience
with it and linking to the project website.
If you are using lxml for your work and feel like giving a bit of
your own benefit back to support the project, consider sending us
money through GitHub Sponsors, Tidelift or PayPal. We can use it
to buy ourselves free time for the maintenance of this great library, to
fix bugs in the software, to review and integrate code contributions,
to improve its features and documentation, or to just take a deep
breath and have a cup of tea every once in a while.
Please read the Legal Notice below, at the bottom of this page.
Thank you for your support.
Support lxml through GitHub Sponsors, via a Tidelift subscription or via PayPal.
Please contact Stefan Behnel
for other ways to support the lxml project,
as well as commercial consulting, customisations and trainings on lxml and
fast Python XML processing.
Travis-CI and AppVeyor
support the lxml project with their build and CI servers.
JetBrains supports the lxml project by donating free licenses of their
PyCharm IDE.
Another supporter of the lxml project is
COLOGNE Webdesign.
The complete lxml documentation is available for download as PDF
documentation. The HTML documentation from this web site is part of
the normal source download.
Tutorials:
the tutorial for XML processing
John Shipman’s tutorial on Python XML processing with lxml
Fredrik Lundh’s tutorial for ElementTree
ElementTree:
ElementTree API
compatibility and differences of
ElementTree performance characteristics and comparison
specific API documentation
the generated API documentation as a reference
parsing and validating XML
XPath and XSLT support
Python XPath extension functions for XPath and XSLT
custom XML element classes for custom XML APIs (see EuroPython 2008 talk)
a SAX compliant API for interfacing with other XML tools
a C-level API for interfacing with external C/Cython modules
lxml.objectify:
lxml.objectify API documentation
a brief comparison of objectify and etree
lxml follows the ElementTree API as much as possible, building
it on top of the native libxml2 tree. If you are new to ElementTree,
start with the tutorial for XML processing. See also the
ElementTree compatibility overview and the ElementTree performance
page comparing lxml to the original ElementTree and cElementTree
implementations.
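Because lxml follows the ElementTree API, code that sticks to the shared subset can often run on either implementation. A common pattern, sketched here with made-up element names, is to try lxml first and fall back to the standard library:

```python
# Prefer lxml when it is installed, otherwise fall back to the standard library.
try:
    from lxml import etree
    USING_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree
    USING_LXML = False

doc = etree.fromstring("<root><item>a</item><item>b</item></root>")

# Only calls from the shared ElementTree API are used, so this runs on both.
texts = [item.text for item in doc.iter("item")]
count = len(doc.findall("item"))
print("lxml" if USING_LXML else "xml.etree", texts, count)
```

The fallback only works as long as the code avoids lxml-only features such as full XPath or getparent().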
Right after the tutorial for XML processing and the
ElementTree documentation, the next place to look is the
specific API documentation. It describes how lxml extends the
ElementTree API to expose libxml2- and libxslt-specific XML
functionality, such as XPath, Relax NG, XML Schema, XSLT, and
C14N (including C14N 2.0).
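As a rough illustration of that extended functionality, the sketch below runs XPath queries, validates against a small XML Schema and applies an XSLT stylesheet. It assumes lxml is installed; the order document, schema and stylesheet are invented for the example:

```python
from lxml import etree

# Invented sample document used throughout this sketch.
doc = etree.fromstring("<order><qty>5</qty><item>widget</item></order>")

# XPath 1.0: node-set results come back as lists, numeric results as floats.
items = doc.xpath("/order/item/text()")
total = doc.xpath("number(/order/qty) * 2")

# XML Schema validation against a tiny inline schema.
schema = etree.XMLSchema(etree.fromstring("""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="qty" type="xs:positiveInteger"/>
        <xs:element name="item" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""))
is_valid = schema.validate(doc)

# XSLT 1.0: compile the stylesheet once, then call it like a function.
transform = etree.XSLT(etree.fromstring("""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/order">
    <summary><xsl:value-of select="item"/> x <xsl:value-of select="qty"/></summary>
  </xsl:template>
</xsl:stylesheet>"""))
result = transform(doc)

print(items, total, is_valid)
print(str(result))
```

Relax NG validation follows the same shape with etree.RelaxNG in place of etree.XMLSchema.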
Python code can be called from XPath expressions and XSLT
stylesheets through the use of XPath extension functions. lxml
also offers a SAX-compliant API that works with the SAX support in
the standard library.
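A minimal sketch of both features, assuming lxml is installed: an XPath extension function passed for a single call through the extensions argument of xpath(), and lxml.sax.saxify() replaying a tree into a standard-library ContentHandler. The function name shout, the namespace URI urn:example:functions and the TagCollector class are hypothetical names chosen for the example:

```python
from xml.sax.handler import ContentHandler

from lxml import etree
import lxml.sax

# XPath extension function: lxml passes the evaluation context first,
# followed by the XPath arguments (here already converted to a Python str).
def shout(context, text):
    return text.upper()

# Arbitrary example namespace URI; lxml does not define it.
EXT_NS = "urn:example:functions"

doc = etree.fromstring("<greeting><who>world</who></greeting>")
loud = doc.xpath(
    "ext:shout(string(who))",
    namespaces={"ext": EXT_NS},
    extensions={(EXT_NS, "shout"): shout},
)

# SAX: replay the tree as namespace-aware events into a stdlib ContentHandler.
class TagCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElementNS(self, name, qname, attributes):
        # name is a (namespace_uri, local_name) tuple
        self.tags.append(name[1])

collector = TagCollector()
lxml.sax.saxify(doc, collector)
print(loud, collector.tags)
```

Functions can also be registered globally through etree.FunctionNamespace, which makes them available to XSLT stylesheets as well.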
There is a separate module lxml.objectify that implements a data-binding
API on top of lxml's ElementTree API. See the objectify and etree FAQ entry for a
comparison.
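A short sketch of the data-binding style, assuming lxml is installed; the config document is made up. Child elements are reached as Python attributes, and leaf values get Python types inferred from their text:

```python
from lxml import objectify

root = objectify.fromstring(
    "<config><retries>3</retries><ratio>0.5</ratio><host>example.org</host></config>"
)

# Children are attributes of their parent; leaf values carry inferred types.
retries = root.retries + 1   # integer element supports arithmetic
ratio = root.ratio.pyval     # plain Python float
host = root.host.text        # plain Python string
print(retries, ratio, host)
```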
In addition to the ElementTree API, lxml also features a sophisticated
API for custom XML element classes. This is a simple way to write
arbitrary XML-driven APIs on top of lxml. lxml also has a
C-level API that can be used to efficiently extend it in
external C modules, including fast custom element class support.
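A minimal sketch of a custom element class, assuming lxml is installed. BookElement and its is_available property are hypothetical; ElementDefaultClassLookup, one of the lookup schemes lxml provides, makes the parser use that class for every element it creates:

```python
from lxml import etree

class BookElement(etree.ElementBase):
    """Element class with an extra, purely illustrative helper."""

    @property
    def is_available(self):
        # Interpret the hypothetical 'available' attribute as a boolean.
        return self.get("available") == "yes"

# Tell the parser to instantiate BookElement for all elements.
lookup = etree.ElementDefaultClassLookup(element=BookElement)
parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

root = etree.fromstring(
    '<library><book available="yes"/><book available="no"/></library>', parser
)
flags = [book.is_available for book in root.iter("book")]
print(flags)  # [True, False]
```

Element proxies can be discarded and recreated by lxml, so such classes should derive everything from the XML tree (as this property does) rather than store Python-side state.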
The best way to download lxml is to visit lxml at the Python Package
Index (PyPI). It has the source
that compiles on various platforms. The source distribution is signed
with this key.
The latest version is lxml 4.6.3, released 2021-03-21
(changes for 4.6.3). Older versions
are listed below.
Please take a look at the
installation instructions!
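Once lxml is installed (for example with pip install lxml), a quick way to check which lxml, libxml2 and libxslt versions you are actually running is to print the version tuples exposed by lxml.etree. A small sketch:

```python
from lxml import etree

# Version tuples of lxml itself and of the C libraries it uses.
print("lxml:            ", etree.LXML_VERSION)
print("libxml2 used:    ", etree.LIBXML_VERSION)
print("libxml2 compiled:", etree.LIBXML_COMPILED_VERSION)
print("libxslt used:    ", etree.LIBXSLT_VERSION)
print("libxslt compiled:", etree.LIBXSLT_COMPILED_VERSION)
```

A mismatch between the "used" and "compiled" libxml2 versions is worth checking when you run into unexpected parser behaviour.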
This complete web site (including the generated API documentation) is
part of the source distribution, so if you want to download the
documentation for offline use, take the source archive and copy the
doc/html directory out of the source tree, or use the
PDF documentation.
The latest installable developer sources
are available from GitHub. It’s also possible to check out
the latest development version of lxml from GitHub directly, using a command
like this (assuming you use hg and have hg-git installed):
hg clone git+ssh lxml
Alternatively, if you use git, this should work as well:
git clone lxml
You can browse the source repository and its history through
the web. Please read how to build lxml from source
first. The latest CHANGES of the developer version are also
accessible. You can check there if a bug you found has been fixed
or a feature you want has been implemented in the latest trunk version.
Questions? Suggestions? Code to contribute? We have a mailing list.
You can search the archive with Gmane or Google.
lxml uses the Launchpad bug tracker. If you are sure you found a
bug in lxml, please file a bug report there. If you are not sure
whether some unexpected behaviour of lxml is a bug or not, please
check the documentation and ask on the mailing list first. Do not
forget to search the archive (e.g. with Gmane)!
The lxml library is shipped under a BSD license. libxml2 and libxslt
themselves are shipped under the MIT license. There should therefore be no
obstacle to using lxml in your codebase.
See the websites of lxml 4.5, 4.4, 4.3, 4.2, 4.1, 4.0, 3.8, 3.7, 3.6, 3.5, 3.4, 3.3, 3.2, 3.1, 3.0, 2.3, 2.2, 2.1, 2.0, 1.3
lxml 4.6.3, released 2021-03-21 (changes for 4.6.3)
lxml 4.6.2, released 2020-11-26 (changes for 4.6.2)
lxml 4.6.1, released 2020-10-18 (changes for 4.6.1)
lxml 4.6.0, released 2020-10-17 (changes for 4.6.0)
lxml 4.5.2, released 2020-07-09 (changes for 4.5.2)
lxml 4.5.1, released 2020-05-19 (changes for 4.5.1)
lxml 4.5.0, released 2020-01-29 (changes for 4.5.0)
lxml 4.4.3, released 2020-01-28 (changes for 4.4.3)
lxml 4.4.2, released 2019-11-25 (changes for 4.4.2)
lxml 4.4.1, released 2019-08-11 (changes for 4.4.1)
lxml 4.4.0, released 2019-07-27 (changes for 4.4.0)
older releases
Total project income in 2019: EUR 717.52 (59.79 € / month)
Tidelift: EUR 360.30
PayPal: EUR 157.22
other: EUR 200.00
Any donation that you make to the lxml project is voluntary and
is not a fee for any services, goods, or advantages. By making
a donation to the lxml project, you acknowledge that we have the
right to use the money you donate in any lawful way and for any
lawful purpose we see fit and we are not obligated to disclose
the way and purpose to any party unless required by applicable
law. Although lxml is free software, to the best of our knowledge
the lxml project does not have any tax-exempt status. The lxml
project is neither a registered non-profit corporation nor a
registered charity in any country. Your donation may or may not
be tax-deductible; please consult your tax advisor in this matter.
We will not publish or disclose your name and/or e-mail address
without your consent, unless required by applicable law. Your
donation is non-refundable.

lxml – PyPI

Project description
lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It
provides safe and convenient access to these libraries using the ElementTree
API.
It extends the ElementTree API significantly to offer support for XPath,
RelaxNG, XML Schema, XSLT, C14N and much more.
To contact the project, go to the project home page or see our bug tracker at
In case you want to use the current in-development version of lxml,
you can get it from the GitHub repository at. Note that this requires Cython to
build the sources; see the build instructions on the project home
page. To the same end, running easy_install lxml==dev will
install lxml from
if you have
an appropriate version of Cython installed.
After an official release of a new stable series, bug fixes may become
available at.
Running easy_install lxml==4.6bugfix will install
the unreleased branch state from
as soon as a maintenance branch has been established. Note that this
requires Cython to be installed at an appropriate version for the build.
4.6.3 (2021-03-21)
Bugs fixed
Kevin Chung discovered a vulnerability (CVE-2021-28957) in the HTML Cleaner
that allowed JavaScript to pass through. The cleaner now removes the HTML5
formaction attribute.
Download files
Files for lxml, version 4.6.3: binary wheels for CPython 2.7 and 3.5 through 3.10, plus a source distribution.

Frequently Asked Questions about Python lxml

What is lxml in Python?

lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping.

What is the difference between Python's built-in xml module (ElementTree) and lxml?

For most normal XML operations, including building document trees, simple searching and parsing of element attributes and node values, even with namespaces, the standard library's ElementTree is a reliable handler. lxml is a third-party module that requires installation.
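A small sketch of where the two differ, assuming lxml is installed alongside the standard library; the sample document is made up. Simple findall() queries behave the same in both, while full XPath 1.0 and parent navigation with getparent() are lxml extras:

```python
import xml.etree.ElementTree as ET

from lxml import etree

data = "<root><item id='1'>a</item><item id='2'>b</item></root>"

# Standard library: ElementTree supports a limited XPath subset via findall().
std_root = ET.fromstring(data)
std_match = [e.text for e in std_root.findall("item[@id='2']")]

# lxml: the same call works, and full XPath 1.0 is available as well.
lx_root = etree.fromstring(data)
lx_match = [e.text for e in lx_root.findall("item[@id='2']")]
item_count = lx_root.xpath("count(item)")  # XPath numbers come back as floats

# lxml elements know their parent; standard library elements do not.
parent_tag = lx_root[0].getparent().tag
print(std_match, lx_match, item_count, parent_tag, hasattr(std_root[0], "getparent"))
```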

How do you use lxml in Python?

Implementing web scraping using lxml in Python:
1. Send a request to the link and get the response.
2. Convert the response object to a byte string.
3. Pass the byte string to the fromstring() function of lxml's html module.
4. Get to a particular element with XPath.
5. Use the content according to your needs.
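The steps above might look like the following sketch. To keep it self-contained, the download step is replaced by a hard-coded byte string; with the third-party requests package (an assumption, not something lxml requires) the bytes would typically come from requests.get(url).content. The page content is invented:

```python
from lxml import html

# Steps 1-2 (download the page, take its bytes) are replaced by sample data.
page_bytes = b"""<html><body>
  <h1>Example</h1>
  <ul id="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>"""

# Step 3: parse the bytes with lxml.html.fromstring().
tree = html.fromstring(page_bytes)

# Step 4: select elements with XPath.
titles = tree.xpath("//h1/text()")
links = [(a.text_content(), a.get("href")) for a in tree.xpath("//ul[@id='links']//a")]

# Step 5: use the extracted data.
print(titles, links)
```

Passing bytes rather than decoded text lets lxml use the page's own encoding declaration when there is one.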
