cURL: What It Is, And How You Can Use It For Web Scraping
cURL is a versatile command used by programmers for data collection and data transfers. But how can you leverage cURL for web scraping? This article will help you get started.
In this blog post you will learn:
What is cURL?
How to use cURL?
Why is cURL so popular?
Web scraping with cURL
What Is cURL?
cURL is a command-line tool that you can use to transfer data via network protocols. The name cURL stands for ‘Client URL’, and is also written as ‘curl’. This popular command uses URL syntax to transfer data to and from servers. Curl is powered by ‘libcurl’, a free and easy-to-use client-side URL transfer library.
Why is using curl advantageous?
The versatility of this command means you can use curl for a variety of use cases, including:
The simplest use case for curl is downloading web pages or uploading files using one of the supported protocols.
While curl supports a long list of protocols, it uses HTTP by default if you don’t specify one. The supported protocols include HTTP, HTTPS, FTP, SFTP, and SCP.
The curl command is installed by default on most Linux distributions.
How do you check if you already have curl installed?
1. Open your Linux console
2. Type ‘curl’, and press ‘enter’.
3. If you already have curl installed, you will see the following message: curl: try 'curl --help' or 'curl --manual' for more information
4. If you don’t have curl installed, you will see a message such as ‘command not found’. You can then install it with your distribution’s package manager (more details below).
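The check above can also be scripted. The following is a minimal sketch, assuming a POSIX-style shell; the variable name curl_status is only an illustration and is not part of curl.

```shell
# Report whether curl is available on this system.
if command -v curl >/dev/null 2>&1; then
    curl_status="installed"
    # Print the first line of the version banner.
    curl --version | head -n 1
else
    curl_status="missing"
    echo "curl: command not found"
fi
echo "curl is $curl_status"
```

If the script reports that curl is missing, install it with your distribution's package manager before trying the other commands.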
How to use cURL
Curl’s syntax is pretty simple:
curl [options] [URL]
For example, to download a webpage, run curl followed by the page’s URL.
The command will then print the source code of the page in your terminal window. To use a specific protocol, put it in front of the URL, for example ftp:// instead of http://.
If you leave the protocol out, curl guesses which one you want and falls back to HTTP.
We talked briefly about the basic use of the command, but you can find the full list of options on the curl documentation site. An option tells curl what action to take, and the URL tells curl where to perform it. You can list one URL or several.
To download multiple URLs, prefix each URL with -O (an uppercase letter O, not a zero) followed by a space. You can do this on a single line or write each URL on its own line. You can also download several parts of a URL in one command by listing the pages in curly braces, such as {page1,page2,page3}.
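Both forms can be tried without a network connection. The sketch below assumes a temporary directory and local file:// URLs standing in for a real server; the file names page1.txt to page3.txt and the glob_ prefix are made up for the illustration.

```shell
# Build a throwaway "remote" directory so the example needs no network access.
workdir=$(mktemp -d)
mkdir "$workdir/remote" "$workdir/downloads"
for name in page1 page2 page3; do
    printf 'content of %s\n' "$name" > "$workdir/remote/$name.txt"
done
cd "$workdir/downloads"

# Several URLs in one command: one -O in front of each URL.
curl -s -O "file://$workdir/remote/page1.txt" -O "file://$workdir/remote/page2.txt"

# Parts of a URL: curl expands the braces itself (the quotes stop the shell
# from doing it). In the -o name, #1 is replaced by the matched brace item.
curl -s "file://$workdir/remote/{page1,page2,page3}.txt" -o "glob_#1.txt"

ls
```

With an HTTP server the commands have the same shape; only the URLs change.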
Saving the download
You can save the content of the URL to a file with curl in two different ways:
1. -o method: Allows you to add a filename where the URL will be saved. This option has the following structure: curl -o [filename] [URL]
2. -O method: Here you don’t need to add a filename, since this option saves the file under the name taken from the URL. To use this option, you just need to prefix the URL with -O.
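The difference between the two methods is easiest to see side by side. This is a minimal sketch that assumes a local file:// URL in place of a real website; page.html and saved.html are hypothetical names.

```shell
# Create a throwaway "remote" file so the example needs no network access.
workdir=$(mktemp -d)
mkdir "$workdir/remote" "$workdir/downloads"
printf 'hello from curl\n' > "$workdir/remote/page.html"
cd "$workdir/downloads"

# -o: you choose the local filename.
curl -s -o saved.html "file://$workdir/remote/page.html"

# -O: curl keeps the name from the last part of the URL (page.html).
curl -s -O "file://$workdir/remote/page.html"

ls
```

Both files end up with the same content; only the local name differs.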
Resuming the download
Your download may stop partway through. In that case, rerun the command with the -C - option added at the beginning. The dash after -C tells curl to work out for itself where to resume.
curl -C - -O [URL]
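Resuming can be simulated locally. The sketch below assumes a file:// URL as the download source and fakes the interruption by writing only the first line of the file to disk; big.txt is a hypothetical name.

```shell
# Set up a "remote" file and an empty download directory.
workdir=$(mktemp -d)
mkdir "$workdir/remote" "$workdir/downloads"
printf 'part one\npart two\npart three\n' > "$workdir/remote/big.txt"
cd "$workdir/downloads"

# Simulate an interrupted download: only the first line reached the disk.
printf 'part one\n' > big.txt

# -C - makes curl check the size of the local big.txt and fetch only the rest.
curl -s -C - -O "file://$workdir/remote/big.txt"

cat big.txt
```

Against a web server, resuming only works if the server supports range requests.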
Why is curl so popular?
Curl is really the Swiss Army knife of commands, built to handle complex operations. However, there are alternatives, such as ‘wget’ or ‘Kurly’, that are good for simpler tasks.
Curl is a favorite among developers because it is available for almost every platform. Sometimes it is even installed by default. This means, whatever programs/jobs you are running, curl commands should work.
Also, chances are that if your OS is less than a decade old, you will have curl installed. You can read the curl documentation in a browser. If you are running a recent version of Windows, you probably already have curl installed. If you don’t, check out this post on Stack Overflow to learn how to install it.
Web Scraping with cURL
Pro tip: Be sure to abide by a website’s rules, and do not try to access password-protected content, which in most cases is illegal or at the very least frowned upon.
You can use curl to automate the repetitive parts of web scraping, helping you avoid tedious tasks. One way to do this is through PHP’s cURL functions.
When you use curl from PHP to scrape a webpage, there are three functions you should use:
curl_init($url) -> Initializes the session
curl_exec() -> Executes the request
curl_close() -> Closes the session
Other options you should use include:
CURLOPT_URL -> Sets the URL you want to scrape
CURLOPT_RETURNTRANSFER -> Tells curl to return the scraped page as a string that you can store in a variable, instead of printing it directly. (This enables you to get exactly what you wanted to extract from the page.)
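The same fetch-then-extract flow can be sketched on the command line instead of PHP. This is only an illustration, not the GitHub example the article refers to: it assumes a small local HTML file served through a file:// URL, and the page content and variable names are invented.

```shell
# A tiny stand-in for a scraped page, so the example needs no network access.
workdir=$(mktemp -d)
cat > "$workdir/page.html" <<'EOF'
<html><head><title>Example Product Page</title></head>
<body><h1>Widget</h1><p class="price">19.99</p></body></html>
EOF

# Capture the page in a variable instead of printing it
# (similar in spirit to CURLOPT_RETURNTRANSFER in PHP).
page=$(curl -s "file://$workdir/page.html")

# Extract only the pieces we want with sed.
title=$(printf '%s\n' "$page" | sed -n 's:.*<title>\(.*\)</title>.*:\1:p')
price=$(printf '%s\n' "$page" | sed -n 's:.*<p class="price">\([^<]*\)</p>.*:\1:p')

echo "title: $title"
echo "price: $price"
```

Regular expressions are fragile against real-world HTML, so for anything beyond a quick check a proper HTML parser is the safer choice.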
In this post, we explained what curl is and what you can do with some basic commands. We also showed you an example of how you can use curl to scrape web pages. Take advantage of this versatile tool to start collecting your target data.
Gal El Al | Head of Support Head of Support at Bright Data with a demonstrated history of working in the computer and network security industry. Specializing in billing processes, technical support, quality assurance, account management, as well as helping customers streamline their data collection efforts while simultaneously improving cost efficiency.
cURL: What is It and How do I Use It? | pair Networks Blog
What is curl?
cURL, often just “curl,” is a free command line tool. It uses URL syntax to transfer data to and from servers. curl is widely used because it is flexible and can complete complex tasks. For example, you can use curl for things like user authentication, HTTP POST, SSL connections, proxy support, FTP uploads, and more! You can also do simple things with curl, such as download web pages and web images. Read on to find out if you should use curl, and if so, common use cases that will get you started.
Should You Use curl?
Whether or not you should use curl depends on your goals. For simpler goals, you may want to check out wget. curl is great for complex operations since it is scriptable and versatile. However, wget is easier to understand and more user-friendly, so we recommend using it for simpler tasks.
curl has many different supported protocols. However, curl will use the HTTP protocol by default if no protocol is provided. For example, running curl with nothing but a domain name downloads that site’s homepage over HTTP.
You can call a specific protocol by prefacing the URL with the protocol name.
curl http://[url]
The example above uses the HTTP protocol. If you want to use a different protocol, switch HTTP out for another. For example, if you wanted to use the FTP protocol, it would look like this:
curl ftp://[url]
If you give it hints, curl can also guess what protocol you want to use. For example, if the host name in your command begins with ftp., curl would be able to intelligently guess that you wanted to use the FTP protocol.
curl’s supported protocols include HTTP, HTTPS, FTP, SFTP, and SCP.
Basics: How to Use curl
We touched on how to use curl protocols briefly, which may have given you some idea on how to use curl. At its most basic, curl tends to follow this format:
curl [option] [url]
You can find a list of possible options on the curl documentation site. Options will direct curl to perform certain actions on the URL listed. The URL gives curl the path to the server it should perform the action on. You can list one URL, several URLs, or parts of a URL, depending on the nature of your option.
Listing more than one URL:
curl -O [url1] -O [url2]
Listing different parts of a URL:
curl [url]/{page1,page2,page3}
Saving URL to File
You can also use curl to save the content of the URL to a file. There are two different methods for doing this: the -o and the -O method. When you use the -o option, you can add a filename that the URL will be saved as. A command using the -o option would look something like this:
curl -o [filename] [url]
Notice that the filename the URL will be saved in is placed between the -o option and the URL.
The -O method allows you to save the file under the same name it has in the URL. When using the -O option, no filename is required between the option and the URL. Instead, the command will look something like this:
curl -O [url]
Continuing a Download
If your download is stopped, you can restart it with a simple curl command. All you need to do is rerun the command with the addition of the -C - option.
If you were saving a URL, but the process was halted, you can restart the process by typing in the following:
curl -C - -O [url]
This would pick the process back up where it had halted before.
Specify Time Frame for Download
Download a file only if it was modified before or after a certain time by using curl. To do this, use the -z option, then list the date.
curl -z 25-Jan-18 [url]
By default, the -z option downloads the file only if it was modified after the given date. To download it only if it was modified before the date, add a dash in front of the date. It will look like this:
curl -z -25-Jan-18 [url]
Showing curl Output
curl will often not show what it is doing while it executes a command, which can be frustrating if you are trying to learn the ropes. The good news? curl has an option that allows you to view curl as it works.
You just need to add -v to the command to view curl’s internal workings as it executes. This can be especially helpful when you receive a response from curl that you didn’t anticipate, because you can see what curl is actually doing behind the scenes.
Here’s an example of what a command with the -v option would look like:
curl -v [url]
If you get tired of seeing the internal workings of curl, you can turn this feature off by using the --no-verbose option. Just switch the -v option out for --no-verbose, and curl will stop showing the internal process.
curl --no-verbose [url]
curl in Review
curl is a powerful, flexible tool. The commands touched on here are only the tip of the iceberg: curl can work with a multitude of protocols, while we only touched on HTTP-specific options. Stay tuned for more blog posts about curl in the future.
If you like the idea of curl, but think it might be too complex for you, check out our article on using wget.
Curl Command in Linux with Examples
curl is a command-line utility for transferring data from or to a server, designed to work without user interaction. With curl, you can download or upload data using one of the supported protocols including HTTP, HTTPS, SCP, SFTP, and FTP. curl provides a number of options allowing you to resume transfers, limit the bandwidth, use proxies, authenticate users, and much more.
In this tutorial, we will show you how to use the curl tool through practical examples and detailed explanations of the most common curl options.
Installing Curl
The curl package is pre-installed on most Linux distributions. To check whether the curl package is installed on your system, open up your console, type curl, and press enter. If you have curl installed, the system will print curl: try 'curl --help' or 'curl --manual' for more information. Otherwise, you will see something like curl command not found.
If curl is not installed, you can easily install it using the package manager of your distribution.
Install Curl on Ubuntu and Debian
sudo apt update
sudo apt install curl
Install Curl on CentOS and Fedora
sudo yum install curl
How to Use Curl
The syntax for the curl command is as follows:
curl [options] [URL...]
In its simplest form, when invoked without any option, curl displays the specified resource to the standard output.
For example, to retrieve a site's homepage you would run:
curl [URL]
The command will print the source code of the homepage in your terminal window.
If no protocol is specified, curl tries to guess the protocol you want to use, and it will default to HTTP.
Save the Output to a File
To save the result of the curl command, use either the -o or -O option.
Lowercase -o saves the file with a filename that you specify:
curl -o [filename] [URL]
Uppercase -O saves the file with its original filename:
curl -O [URL]
Download Multiple files
To download multiple files at once, use multiple -O options, each followed by the URL of the file you want to download.
In the following example we are downloading the Arch Linux and Debian iso files:
curl -O [Arch Linux iso URL] \
     -O [Debian iso URL]
Resume a Download
You can resume a download by using the -C - option.
This is useful if your connection drops during the download of a large file: instead of starting the download from scratch, you can continue the previous one.
For example, if you are downloading the Ubuntu 18.04 iso file using the following command:
curl -O [URL]
and suddenly your connection drops, you can resume the download with:
curl -C - -O [URL]
Get the HTTP Headers of a URL
HTTP headers are colon-separated key-value pairs containing information such as user agent, content type, and encoding. Headers are passed between the client and the server with the request or the response.
Use the -I option to fetch only the HTTP headers of the specified resource:
curl -I --http2 [URL]
Test if a Website Supports HTTP/2
To check whether a particular URL supports the HTTP/2 protocol, fetch the HTTP headers with -I along with the --http2 option:
curl -I --http2 -s [URL] | grep HTTP
The -s option tells curl to run in silent (quiet) mode and hide the progress meter and error messages.
If the remote server supports HTTP/2, curl prints HTTP/2 200:
HTTP/2 200
Otherwise, the response is HTTP/1.1 200 OK:
HTTP/1.1 200 OK
If you have curl version 7.47.0 or newer, you do not need to use the --http2 option because HTTP/2 is enabled by default for all HTTPS connections.
Follow Redirects
By default, curl doesn’t follow the HTTP Location headers.
If you try to retrieve the non-www version of a site, you will notice that instead of getting the source of the page you’ll be redirected to the www version:
curl [URL]
The -L option instructs curl to follow any redirect until it reaches the final destination:
curl -L [URL]
Change the User-Agent
Sometimes when downloading a file, the remote server may be set to block the curl User-Agent or to return different contents depending on the visitor’s device and browser.
In situations like this, to emulate a different browser, use the -A option.
For example, to emulate Firefox 60 you would use:
curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" [URL]
Specify a Maximum Transfer Rate
The --limit-rate option allows you to limit the data transfer rate. The value can be expressed in bytes, kilobytes with the k suffix, megabytes with the m suffix, and gigabytes with the g suffix.
In the following example curl will download the Go binary and limit the download speed to 1 MB per second:
curl --limit-rate 1m -O [URL]
This option is useful to prevent curl from consuming all the available bandwidth.
Transfer Files via FTP
To access a protected FTP server with curl, use the -u option and specify the username and password as shown below:
curl -u FTP_USERNAME:FTP_PASSWORD ftp://[host]/
Once logged in, the command lists all files and directories in the user’s home directory.
You can download a single file from the FTP server using the following syntax:
curl -u FTP_USERNAME:FTP_PASSWORD ftp://[host]/[file]
To upload a file to the FTP server, use -T followed by the name of the file you want to upload:
curl -T [file] -u FTP_USERNAME:FTP_PASSWORD ftp://[host]/
Send Cookies
Sometimes you may need to make an HTTP request with specific cookies to access a remote resource or to debug an issue.
By default, when requesting a resource with curl, no cookies are sent or stored.
To send cookies to the server, use the -b switch followed by a filename containing the cookies or a string.
For example, to download the Oracle Java JDK rpm file you’ll need to pass a cookie named oraclelicense with value a:
curl -L -b "oraclelicense=a" -O [URL]
Using Proxies
curl supports different types of proxies, including HTTP, HTTPS and SOCKS. To transfer data through a proxy server, use the -x (--proxy) option, followed by the proxy URL.
The following command downloads the specified resource using a proxy on 192.168.44.1 port 8888:
curl -x 192.168.44.1:8888 [URL]
If the proxy server requires authentication, use the -U (--proxy-user) option followed by the user name and password separated by a colon (user:password):
curl -U username:password -x 192.168.44.1:8888 [URL]
Conclusion
curl is a command-line tool that allows you to transfer data from or to a remote host. It is useful for troubleshooting issues, downloading files, and more.
The examples shown in this tutorial are simple, but demonstrate the most used curl options and are meant to help you understand how the curl command works.
For more information about curl, visit the Curl Documentation page.
If you have any questions or feedback, feel free to leave a comment.
Frequently Asked Questions about what curl does
Why do we use curl?
curl is widely used because it is flexible and can complete complex tasks. For example, you can use curl for things like user authentication, HTTP POST, SSL connections, proxy support, FTP uploads, and more! You can also do simple things with curl, such as download web pages and web images. (Jan 26, 2018)
What does curl do in Linux?
curl is a command-line utility for transferring data from or to a server, designed to work without user interaction. With curl, you can download or upload data using one of the supported protocols including HTTP, HTTPS, SCP, SFTP, and FTP. (Nov 27, 2019)
What is wget and curl?
Wget only lets you download files from an HTTP, HTTPS, or FTP server. You give it a link, and it automatically builds the request and downloads the file the link points to. Curl, in contrast to wget, lets you build the request as you wish.