Java Download Web Pages: URL and openStream

This Java tutorial uses the URL, URI and InputStream classes to download a web page. It reads in an entire remote HTML file.
Download URL, openStream. A remote HTML file contains important information. With Java's URI and URL classes we can download it and use its contents in a String.
With openStream, we obtain a stream of the file contents. With a buffer array, we can create a string from the data we download. A StringBuilder here is helpful.
First program. This example implements a getPage method. It takes a file from a remote address and places it into a new String. The method has three steps, described here.

URI: We first create a URI object from the address argument (a String). This is used to create a new URL object.

InputStream: We invoke openStream on our URL instance to get a readable stream of the file contents.

Read: We use a while-loop to read the InputStream into a byte array. We then append to a StringBuilder to get the total file.
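
The first step can be tried on its own, without any network access. This is a minimal sketch: the AddressParser class and toUrl method are hypothetical names used only for illustration. It shows which exceptions the URI-to-URL conversion can raise.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class, not part of the original program.
class AddressParser {
    // Converts an address String into a URL, going through URI first.
    static URL toUrl(String address)
            throws URISyntaxException, MalformedURLException {
        // Throws URISyntaxException if the address has invalid syntax.
        URI uri = new URI(address);
        // Throws IllegalArgumentException if the URI is not absolute.
        return uri.toURL();
    }

    public static void main(String[] args) throws Exception {
        URL url = toUrl("http://www.example.com/index.html");
        System.out.println(url.getProtocol()); // http
        System.out.println(url.getHost());     // www.example.com
        System.out.println(url.getPath());     // /index.html
    }
}
```

No connection is made here. The network is only contacted later, when openStream is called on the URL.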

Java program that downloads web page, uses StringBuilder

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URISyntaxException;
    import java.net.URL;
    import java.net.URI;

    public class Program {

        public static String getPage(String address)
                throws IOException, URISyntaxException {
            // Get URI and URL objects.
            URI uri = new URI(address);
            URL url = uri.toURL();

            // Store results in StringBuilder.
            StringBuilder builder = new StringBuilder();
            byte[] data = new byte[1024];

            // Get stream of the response.
            // ... The try-with-resources statement closes it when done.
            try (InputStream in = url.openStream()) {
                // Read the response into the buffer.
                // ... Read many bytes each iteration.
                int c;
                while ((c = in.read(data, 0, data.length)) != -1) {
                    // Each chunk is decoded with the platform default charset.
                    builder.append(new String(data, 0, c));
                }
            }

            // Return String.
            return builder.toString();
        }

        public static void main(String[] args) {
            try {
                String page = getPage("http://www.example.com/");
                System.out.println(page);
            } catch (Exception ex) {
                System.out.println("ERROR: " + ex);
            }
        }
    }

Output

    <!doctype html>
    <html>
    <head>
    <title>Example Domain</title>
    <meta charset="utf-8" />
Results. The getPage method fetched the correct HTML document from the "Example" domain. The document is larger than 1024 bytes, so more than one read was needed, which shows the loop works.
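
One caveat: getPage turns each chunk of bytes into a String separately. If a multi-byte character happens to be split across two chunks, it may be decoded incorrectly. The following sketch shows one possible alternative, which wraps the stream in an InputStreamReader so that decoding happens across chunk boundaries. The StreamText class and readAll method are hypothetical names, and UTF-8 is an assumption about the page's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original program.
class StreamText {
    // Reads the whole stream as text in the given charset, then closes it.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder builder = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int count;
            // The reader decodes bytes to chars, so we append chars directly.
            while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
                builder.append(buffer, 0, count);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for url.openStream() here.
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(bytes);
        System.out.println(readAll(in, StandardCharsets.UTF_8));
    }
}
```

With a real page, the stream from url.openStream() could be passed to readAll in place of the ByteArrayInputStream.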
Short example. I developed this program when learning to use URI and URL objects. It creates a BufferedInputStream from the InputStream.

However: It is unclear whether this approach has any advantage over using the InputStream directly.

Also: When we have a byte array, we can convert it into a String with the String constructor.

So: With this method, we can quickly download the first bytes of a document. This is helpful if we only need a small piece of a document.
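
The byte-to-String conversion can be sketched separately. When a buffer is only partly filled, converting the whole array also converts the unused zero bytes, so it may help to pass the number of bytes that were actually read. The BytesToText class and decode method are hypothetical names, and UTF-8 is an assumed encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original program.
class BytesToText {
    // Decodes only the first `count` bytes of the buffer as UTF-8.
    static String decode(byte[] data, int count) {
        return new String(data, 0, count, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a 200-byte buffer that received only 15 bytes.
        byte[] data = new byte[200];
        byte[] source = "<!DOCTYPE html>".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(source, 0, data, 0, source.length);

        String whole = new String(data, StandardCharsets.UTF_8);
        String filled = decode(data, source.length);
        System.out.println(whole.length());  // 200
        System.out.println(filled.length()); // 15
    }
}
```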

Java program that uses URI, URL and InputStream

    import java.io.BufferedInputStream;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URI;

    public class Program {
        public static void main(String[] args) throws Exception {
            // Create URI and URL objects.
            URI uri = new URI("http://en.wikipedia.org/wiki/Main_Page");
            URL url = uri.toURL();

            // Open the stream and wrap it in a BufferedInputStream.
            // ... Both are closed when the try block ends.
            try (InputStream in = url.openStream();
                 BufferedInputStream reader = new BufferedInputStream(in)) {
                // Read up to the first 200 bytes from the website.
                byte[] data = new byte[200];
                int count = reader.read(data, 0, data.length);

                // Convert only the bytes that were read to a String.
                String result = (count > 0) ? new String(data, 0, count) : "";
                System.out.println(result);
            }
        }
    }

Output

    <!DOCTYPE html>
    <html lang="en" dir="ltr" class="client-nojs">
    <head>
    <meta charset="UTF-8" />
    <title>Wikipedia, the free encyclopedia</title>
    ...
To download web pages, we combine many classes. We use URI and URL objects to start, and an InputStream to get the data. A byte array is a suitable buffer.

And: A StringBuilder may also be used. In the getPage method above, we fetch an entire web page as a String.

Some notes. If only the first bytes of a web page are needed, it is probably best to avoid looping to get the entire file. This may also prevent errors with unusually long web pages.
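
A single call to read may return fewer bytes than requested, even when more data is on the way. The following sketch shows one way to collect up to a fixed number of bytes and then stop. The LimitedReader class and readUpTo method are hypothetical names, and an in-memory stream stands in for the network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original programs.
class LimitedReader {
    // Reads at most `limit` bytes, stopping early at the end of the stream.
    static byte[] readUpTo(InputStream in, int limit) throws IOException {
        byte[] data = new byte[limit];
        int total = 0;
        while (total < limit) {
            // Ask only for the space that remains in the buffer.
            int count = in.read(data, total, limit - total);
            if (count == -1) {
                break; // End of stream reached before the limit.
            }
            total += count;
        }
        // Trim the array to the number of bytes actually read.
        return Arrays.copyOf(data, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<!doctype html><html><head></head></html>"
                .getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(page)) {
            byte[] first = readUpTo(in, 15);
            System.out.println(new String(first, StandardCharsets.UTF_8));
            // Prints: <!doctype html>
        }
    }
}
```

On Java 9 and later, the built-in InputStream.readNBytes method offers similar behavior, so a hand-written loop like this is mainly useful on older versions.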
© 2007-2019 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.