
Java Download Web Pages: URL and openStream

Use the URL, URI and InputStream classes to download a web page. Read in an entire remote HTML file.
Download URL, openStream. A remote HTML file often contains information we need. With Java's URI and URL classes we can download it and use its contents as a String.
With openStream, we obtain a stream of the file contents. We read that stream into a buffer array, one chunk at a time, and a StringBuilder collects the chunks into one string.
First program. This example implements a getPage method. It downloads the file at a remote address and returns its contents as a new String. getPage works in three steps.

URI: We first create a URI object from the address argument (a String), then call toURL on it to get a URL object.

InputStream: We invoke openStream on our URL instance to get a readable stream of the file contents.

Read: We use a while-loop to read the stream into a buffer array, appending each chunk to a StringBuilder until read returns -1 at the end of the stream.

Result: On the "Example" domain (example.com), getPage fetches the expected HTML document. The document is more than 1024 bytes, so the loop needs more than one iteration.

Java program that downloads web page, uses StringBuilder

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class Program {
    public static String getPage(String address)
            throws IOException, URISyntaxException {
        // Get URI and URL objects.
        URI uri = new URI(address);
        URL url = uri.toURL();

        // Get stream of the response; try-with-resources closes it when done.
        // The InputStreamReader decodes bytes to chars (UTF-8 is assumed), so a
        // multibyte character split across two reads is still decoded correctly.
        try (InputStream in = url.openStream();
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            // Store results in StringBuilder.
            StringBuilder builder = new StringBuilder();
            char[] data = new char[1024];
            // Read many chars each iteration until read returns -1 (end of stream).
            int c;
            while ((c = reader.read(data, 0, data.length)) != -1) {
                builder.append(data, 0, c);
            }
            // Return String.
            return builder.toString();
        }
    }

    public static void main(String[] args) {
        try {
            String page = getPage("http://www.example.com/");
            System.out.println(page);
        } catch (Exception ex) {
            System.out.println("ERROR: " + ex);
        }
    }
}
```

Output

```
<!doctype html>
<html>
<head>
<title>Example Domain</title>
<meta charset="utf-8" />
```
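The URL documentation describes openStream as shorthand for openConnection().getInputStream(). Calling openConnection directly gives us a chance to configure the request before reading. Here is a minimal sketch, assuming we want timeouts and a custom User-Agent header; the class name PageWithTimeouts, the timeoutMillis parameter, and the header value are illustrative assumptions, not part of the original program.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageWithTimeouts {
    // Hypothetical variant of getPage: openConnection is called instead of
    // openStream, so the connection can be configured before any data is read.
    public static String getPage(String address, int timeoutMillis)
            throws IOException, URISyntaxException {
        URL url = new URI(address).toURL();
        URLConnection connection = url.openConnection();
        // Give up instead of waiting indefinitely for a connection or for data.
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        // Assumed header value; some servers reject clients they do not recognize.
        connection.setRequestProperty("User-Agent", "PageDownloader/1.0");

        StringBuilder builder = new StringBuilder();
        char[] buffer = new char[1024];
        // getInputStream connects (if needed) and returns the response body.
        // UTF-8 is assumed; a fuller client would read the charset from Content-Type.
        try (Reader reader = new InputStreamReader(
                connection.getInputStream(), StandardCharsets.UTF_8)) {
            int count;
            while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
                builder.append(buffer, 0, count);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws Exception {
        // Five-second limit for connecting and for each blocking read.
        System.out.println(getPage("http://www.example.com/", 5000));
    }
}
```

With timeouts set, a server that never answers causes a java.net.SocketTimeoutException instead of blocking the program forever.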
Short example. I developed this program when learning to use URI and URL objects. It creates a BufferedInputStream from the InputStream.

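However: For a single read like this one, the BufferedInputStream adds little over reading the InputStream directly. Buffering mainly helps when a program makes many small reads.

Also: Once we have a byte array, we can convert it into a String with the String constructor. Passing a charset such as StandardCharsets.UTF_8 avoids relying on the platform's default encoding.

So: With this method, we can quickly download just the first bytes of a document. This is helpful if we only need a small piece near the start, such as the title.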

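Java program that uses URI, URL and InputStream

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class Program {
    public static void main(String[] args) throws Exception {
        // Create URI and URL objects.
        // Use https: the http address redirects to https, and URL connections
        // do not follow redirects that switch from http to https.
        URI uri = new URI("https://en.wikipedia.org/wiki/Main_Page");
        URL url = uri.toURL();
        // try-with-resources closes both streams when we are done.
        try (InputStream in = url.openStream();
             BufferedInputStream reader = new BufferedInputStream(in)) {
            // Read in the first 200 bytes from the website.
            // A single read() call may return fewer bytes than requested, so use
            // readNBytes (Java 9+), which keeps reading until it has 200 bytes
            // or the stream ends.
            byte[] data = new byte[200];
            int count = reader.readNBytes(data, 0, data.length);
            // Convert only the bytes actually read to a String (the page is UTF-8).
            String result = new String(data, 0, count, StandardCharsets.UTF_8);
            System.out.println(result);
        }
    }
}
```

Output

```
<!DOCTYPE html>
<html lang="en" dir="ltr" class="client-nojs">
<head>
<meta charset="UTF-8" />
<title>Wikipedia, the free encyclopedia</title>
...
```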
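The program above cuts the response at a fixed byte count. In UTF-8 some characters take more than one byte, so a cut can land inside a character. This small sketch shows what the String constructor does in that case; it uses an in-memory byte array so no network access is needed, and the DecodeBytes class and decode helper are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class DecodeBytes {
    // Decodes the first `length` bytes of data as UTF-8.
    public static String decode(byte[] data, int length) {
        return new String(data, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an accented e: the accented e is two bytes in UTF-8 (0xC3 0xA9).
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length);                          // 5
        // All 5 bytes decode back to the original text.
        System.out.println(decode(data, 5).equals("caf\u00e9"));  // true
        // 4 bytes cut the accented e in half; the incomplete sequence
        // is replaced with U+FFFD, the Unicode replacement character.
        System.out.println(decode(data, 4).equals("caf\uFFFD"));  // true
    }
}
```

For a 200-byte prefix this usually costs at most one garbled character at the end, which is often acceptable when we only want something like the page title.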
To download web pages, we combine many classes. We use URI and URL objects to start, and an InputStream to get the data. A byte array is a suitable buffer.
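Creating the URI and URL objects can fail before anything is downloaded: the URI constructor rejects malformed addresses, and toURL rejects relative ones. A sketch of checking an address up front; toUrlOrNull is a hypothetical helper, and returning null is just one way to report the failure.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UriToUrl {
    // Returns a URL for the address, or null if it cannot be converted.
    // No network access happens here; only the address text is checked.
    public static URL toUrlOrNull(String address) {
        try {
            // The URI constructor rejects syntax errors such as unescaped spaces.
            URI uri = new URI(address);
            // toURL requires an absolute URI (one with a scheme such as "http")
            // and a protocol that Java has a handler for.
            return uri.toURL();
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toUrlOrNull("http://www.example.com/"));    // http://www.example.com/
        System.out.println(toUrlOrNull("http://www.example.com/a b")); // null: space not allowed
        System.out.println(toUrlOrNull("wiki/Main_Page"));             // null: not absolute
    }
}
```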

And: A StringBuilder collects the pieces. With it, the getPage method above fetches an entire web page as a String.
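On Java 9 and later, InputStream.readAllBytes can take over the loop in getPage. This sketch assumes the page is UTF-8 encoded; ReadAll is a hypothetical class name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

public class ReadAll {
    // Alternative to the read loop in getPage: readAllBytes (Java 9+)
    // loops internally until the end of the stream.
    public static String getPage(String address)
            throws IOException, URISyntaxException {
        try (InputStream in = new URI(address).toURL().openStream()) {
            // Decode the complete byte array once; UTF-8 is assumed.
            byte[] bytes = in.readAllBytes();
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getPage("http://www.example.com/"));
    }
}
```

Like the StringBuilder version, this keeps the whole page in memory. Decoding once at the end also means no character can be split between chunks.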

Some notes. If only the first bytes of a web page are needed, it is best to skip the loop that reads the entire file. Stopping early also avoids downloading and storing an unusually long page in full.
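When we do need the whole page but an unexpectedly huge response would be a problem, a size cap is one option. This sketch assumes Java 11 for InputStream.readNBytes(int); LimitedPage and the maxBytes parameter are hypothetical, and treating an oversized page as an IOException is only one possible choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

public class LimitedPage {
    // Hypothetical helper: downloads a page but refuses to keep more than maxBytes.
    public static String getPage(String address, int maxBytes)
            throws IOException, URISyntaxException {
        if (maxBytes < 0 || maxBytes == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxBytes out of range: " + maxBytes);
        }
        try (InputStream in = new URI(address).toURL().openStream()) {
            // readNBytes(int) stops after maxBytes + 1 bytes or at end of stream.
            // The one extra byte tells us whether the page is longer than the limit.
            byte[] data = in.readNBytes(maxBytes + 1);
            if (data.length > maxBytes) {
                throw new IOException("Page is larger than " + maxBytes + " bytes");
            }
            // UTF-8 is assumed.
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // Allow pages up to 64 KB.
        System.out.println(getPage("http://www.example.com/", 64 * 1024));
    }
}
```

A page of exactly maxBytes bytes is accepted; one byte more is rejected, and the stream is closed without reading the rest.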
© 2007-2020 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.