Java - URL Class Example, Download Web Page

URL. A remote HTML file contains important information. With Java's URI and URL classes we can download it and use its contents in a String.

With openStream, we obtain a stream of the file contents. With a buffer array, we can create a string from the data we download. A StringBuilder here is helpful.
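
Before any download happens, the address has to parse as a URI and convert to a URL. The sketch below performs only that step, with no network access. The class name UriCheck and the helper toUrl are illustrative names, not part of the programs in this article.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UriCheck {

    // Hypothetical helper: parse an address and convert it to a URL.
    public static URL toUrl(String address)
            throws URISyntaxException, MalformedURLException {
        // The URI constructor rejects illegal characters, such as spaces.
        URI uri = new URI(address);
        // toURL requires an absolute URI (one with a scheme such as http).
        return uri.toURL();
    }

    public static void main(String[] args) throws Exception {
        URL url = toUrl("http://www.example.com/index.html");
        System.out.println(url.getProtocol()); // http
        System.out.println(url.getHost());     // www.example.com
        System.out.println(url.getPath());     // /index.html

        try {
            // The space in the path is not allowed in a URI.
            toUrl("http://www.example.com/a b");
        } catch (URISyntaxException ex) {
            System.out.println("Invalid address");
        }
    }
}
```

Checking the address this way separates syntax errors from network errors, which only appear later when openStream is called.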

First program. This example implements a getPage method. It takes a file from a remote address and places it into a new String. The steps inside getPage are described next.

Start: We first create a URI object from the address argument (a String). We then call toURL on it to get a URL object.

Next: We invoke openStream on our URL instance to get a readable stream of the file contents.

Then: We use a while-loop to read the InputStream into a byte array, one chunk at a time. Each chunk is appended to a StringBuilder to build up the complete file.

Result: For the example.com domain, the program fetched the expected HTML document. The document is more than 1024 bytes, so the loop runs more than once.

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class Program {

    public static String getPage(String address) throws IOException, URISyntaxException {
        // Get URI and URL objects.
        URI uri = new URI(address);
        URL url = uri.toURL();

        // Store results in StringBuilder.
        StringBuilder builder = new StringBuilder();
        byte[] data = new byte[1024];

        // Get stream of the response.
        // ... The try-with-resources statement closes it when done.
        try (InputStream in = url.openStream()) {
            // Read the response into the buffer.
            // ... Read many bytes each iteration.
            int c;
            while ((c = in.read(data, 0, data.length)) != -1) {
                // Decode the bytes read (assumes a UTF-8 page).
                builder.append(new String(data, 0, c, StandardCharsets.UTF_8));
            }
        }

        // Return String.
        return builder.toString();
    }

    public static void main(String[] args) {

        try {
            String page = getPage("http://www.example.com/");
            System.out.println(page);
        } catch (Exception ex) {
            System.out.println("ERROR");
        }
    }
}

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
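
One caveat with the loop in getPage: it decodes each 1024-byte chunk separately, so a multi-byte character that straddles a chunk boundary can be garbled. A minimal sketch of an alternative, assuming the page is UTF-8 (the sample output above declares that charset): collect the raw bytes in a ByteArrayOutputStream and decode once at the end. The names PageBytes and readAll are illustrative, and an in-memory stream stands in for url.openStream() so the example runs offline.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageBytes {

    // Hypothetical helper: read a stream to its end, then decode once.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] data = new byte[1024];
        int count;
        while ((count = in.read(data, 0, data.length)) != -1) {
            // Copy raw bytes; no decoding happens inside the loop.
            bytes.write(data, 0, count);
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(): 1023 ASCII bytes, then a
        // two-byte character that straddles the 1024-byte boundary.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 1023; i++) {
            sample.append('a');
        }
        sample.append('\u00e9');
        String text = sample.toString();

        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(raw)) {
            String page = readAll(in, StandardCharsets.UTF_8);
            System.out.println(page.equals(text)); // true
        }
    }
}
```

With a real page, the same helper could be called as readAll(url.openStream(), StandardCharsets.UTF_8) inside a try-with-resources statement.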

Short example. I developed this program when learning to use URI and URL objects. It creates a BufferedInputStream from the InputStream.

However, it is unclear whether this approach has any advantage over using the InputStream directly.

Also, when we have a byte array, we can convert it into a String with the String constructor.

So with this method, we can quickly download the first bytes of a document. This is helpful if we only need a small piece of a document.

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class Program {
    public static void main(String[] args) throws Exception {

        // Create URI and URL objects.
        URI uri = new URI("http://en.wikipedia.org/wiki/Main_Page");
        URL url = uri.toURL();

        // Use a BufferedInputStream around the InputStream.
        // ... The try-with-resources statement closes both streams.
        try (InputStream in = url.openStream();
             BufferedInputStream reader = new BufferedInputStream(in)) {

            // Read in up to the first 200 bytes from the website.
            byte[] data = new byte[200];
            int count = reader.read(data, 0, data.length);

            // Convert only the bytes that were read to a String (assumes UTF-8).
            String result = new String(data, 0, Math.max(count, 0), StandardCharsets.UTF_8);
            System.out.println(result);
        }
    }
}

<!DOCTYPE html>
<html lang="en" dir="ltr" class="client-nojs">
<head>
<meta charset="UTF-8" />
<title>Wikipedia, the free encyclopedia</title>
...
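
A single read call may return fewer bytes than requested, so the array can end up only partly filled. A minimal sketch, assuming Java 9 or later for InputStream.readNBytes and assuming UTF-8 content: the hypothetical firstBytes helper keeps reading until the requested count arrives or the stream ends, then decodes only the bytes received. An in-memory stream stands in for the network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FirstBytes {

    // Hypothetical helper: decode at most `limit` leading bytes as UTF-8.
    public static String firstBytes(InputStream in, int limit) throws IOException {
        byte[] data = new byte[limit];
        // readNBytes blocks until `limit` bytes arrive or the stream ends.
        int count = in.readNBytes(data, 0, limit);
        // Decode only the bytes that were read, not the whole array.
        return new String(data, 0, count, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "<!doctype html><html>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(raw)) {
            System.out.println(firstBytes(in, 15)); // <!doctype html>
        }
        try (InputStream in = new ByteArrayInputStream(raw)) {
            // Asking for more than the stream holds returns what exists.
            System.out.println(firstBytes(in, 200).length()); // 21
        }
    }
}
```

Because the cut is made at a byte position, a multi-byte character at the very end of the prefix may be split. For ASCII-only markup such as a doctype line this does not matter.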

To download web pages, we combine many classes. We use URI and URL objects to start, and an InputStream to get the data. A byte array is a suitable buffer.

A StringBuilder may also be used to accumulate the text. In the getPage method above, we fetch an entire web page as a String.

Some notes. If only the first bytes of a web page are needed, it is probably best to avoid looping to get the entire file. This may also prevent errors with unusually long web pages.
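
The point about long pages can be made concrete with a size cap. The following is a sketch under assumptions: readLimited is a hypothetical helper, the limit is an arbitrary value chosen by the caller, and the content is taken to be UTF-8. The loop stops once maxBytes have been collected, so an unexpectedly large response cannot grow the buffer without bound.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LimitedRead {

    // Hypothetical helper: read at most maxBytes, then decode as UTF-8.
    public static String readLimited(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] data = new byte[1024];
        int remaining = maxBytes;
        while (remaining > 0) {
            // Never request more than the remaining allowance.
            int count = in.read(data, 0, Math.min(data.length, remaining));
            if (count == -1) {
                break; // End of stream reached before the cap.
            }
            bytes.write(data, 0, count);
            remaining -= count;
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(): 5000 bytes of in-memory data.
        byte[] raw = new byte[5000];
        Arrays.fill(raw, (byte) 'x');
        try (InputStream in = new ByteArrayInputStream(raw)) {
            System.out.println(readLimited(in, 2048).length()); // 2048
        }
    }
}
```

When the stream is shorter than the cap, the helper simply returns everything it received.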