In this short tutorial, we'll see how to do Base64 encoding and decoding of a PDF file using Java 8 and Apache Commons Codec.

But first, let's take a quick peek at the basics of Base64.

2. Basics of Base64

When sending data over the wire, we ultimately send it as binary. But many protocols and formats were designed to carry text. If we send raw bytes, they may interpret some byte values differently, and our data might get corrupted in flight.

So, Base64 came into the picture to give us portability and a common standard for transferring binary data. It represents every 3 bytes of binary data as 4 characters drawn from a 64-character ASCII alphabet, which text-oriented channels can carry safely. The sender and receiver both understand the standard and have agreed to use it, so the probability of our data getting lost or misinterpreted is greatly reduced.

Now let's see a couple of ways to apply this to a PDF.

3. Conversion Using Java 8

Starting with Java 8, the utility class java.util.Base64 provides encoders and decoders for the Base64 encoding scheme. It supports the Basic, URL-safe, and MIME variants specified in RFC 4648 and RFC 2045.

3.1. Encoding

To convert a PDF into Base64, we first need to read it into a byte array. Then we pass it through java.util.Base64.Encoder's encode method:
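A minimal sketch of this step follows. It assumes IN_FILE is a hypothetical constant holding the path to the input PDF. Files.readAllBytes loads the whole file into memory, and Base64.getEncoder().encode returns the encoded bytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

class Java8PdfEncoder {

    // Hypothetical input path; point this at a real PDF before running.
    static final String IN_FILE = "input.pdf";

    // Loads the entire file into memory, then Base64-encodes the bytes.
    static byte[] encodePdf(Path pdf) throws IOException {
        byte[] inFileBytes = Files.readAllBytes(pdf);
        return Base64.getEncoder().encode(inFileBytes);
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encodePdf(Paths.get(IN_FILE));
        System.out.println("Base64 length: " + encoded.length);
    }
}
```

The encoded output is pure ASCII. When a String is needed, for example to embed the PDF in a JSON payload, Base64.getEncoder().encodeToString(inFileBytes) is a convenient alternative.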
Here, IN_FILE is the path to our input PDF.

3.2. Streaming Encoding

For larger files or systems with limited memory, it's more memory-efficient to perform the encoding with a stream instead of reading all the data into memory. Let's look at how to accomplish this:
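One way to do this is sketched below. It assumes IN_FILE and OUT_FILE are hypothetical path constants. The sketch wraps the target stream with Base64.Encoder's wrap method and copies the PDF through it in 1 KB chunks. The encoding stream keeps leftover bytes between writes, so chunk boundaries don't need to line up with Base64's 3-byte groups.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

class Java8StreamingPdfEncoder {

    // Hypothetical paths: the source PDF and the file that receives the Base64 text.
    static final String IN_FILE = "input.pdf";
    static final String OUT_FILE = "encoded.txt";

    // Copies `in` to `out` through a Base64-encoding stream, 1 KB at a time.
    static void encodeStream(InputStream in, OutputStream out) throws IOException {
        // Closing the wrapper writes the final (padded) block and also closes `out`.
        try (OutputStream encoder = Base64.getEncoder().wrap(out)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                encoder.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(IN_FILE);
             OutputStream out = new FileOutputStream(OUT_FILE)) {
            encodeStream(in, out);
        }
    }
}
```

Only the 1 KB buffer and a couple of leftover bytes are held in memory at any time, regardless of the PDF's size.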
Here, IN_FILE is the path to our input PDF, and OUT_FILE is the path to a file containing the Base64-encoded document. We don't read the entire PDF into memory and then encode the full document there. Instead, we read up to 1 KB of data at a time and pass that data through the encoder into the OutputStream.

3.3. Decoding

At the receiving end, we get the encoded file. We need to decode it to recover the original bytes. Then we write those bytes to a FileOutputStream to get the decoded PDF:
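A minimal sketch follows. ENCODED_FILE (a file of Base64 text) and OUT_FILE (the PDF to recreate) are hypothetical path constants. Base64.getDecoder().decode reverses the basic encoding, and the result is written out with a FileOutputStream.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

class Java8PdfDecoder {

    // Hypothetical paths: the Base64-encoded input and the PDF to recreate.
    static final String ENCODED_FILE = "encoded.txt";
    static final String OUT_FILE = "output.pdf";

    // Decodes Base64 bytes back to the original bytes and writes them to outFile.
    static byte[] decodeToFile(byte[] encoded, String outFile) throws IOException {
        byte[] decoded = Base64.getDecoder().decode(encoded);
        try (FileOutputStream fos = new FileOutputStream(outFile)) {
            fos.write(decoded);
        }
        return decoded;
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(ENCODED_FILE));
        decodeToFile(encoded, OUT_FILE);
    }
}
```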
Here, OUT_FILE is the path to the PDF to be created.

4. Conversion Using Apache Commons

Next, we'll use the Apache Commons Codec package to do the same conversion. It's based on RFC 2045 and predates the Java 8 implementation we discussed earlier. So, when we need to support multiple JDK versions (including legacy ones) or vendors, this third-party API comes in handy.

4.1. Maven

To use the Apache library, we need to add a dependency to our pom.xml:
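The dependency entry looks roughly like this. The group and artifact IDs are both commons-codec, and 1.15 is used here only as an example version.

```xml
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <!-- Example version only; use the current release -->
    <version>1.15</version>
</dependency>
```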
The latest version of commons-codec can be found on Maven Central.

4.2. Encoding

The steps are the same as for Java 8, except that this time, we pass our original bytes to the encodeBase64 method of the org.apache.commons.codec.binary.Base64 class:
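The sketch below follows the same steps as the Java 8 encoder. It assumes commons-codec is on the classpath and uses a hypothetical IN_FILE path. Note that encodeBase64 produces a single unbroken line. The library's encodeBase64Chunked variant would instead split the output into 76-character lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.commons.codec.binary.Base64;

class CommonsPdfEncoder {

    // Hypothetical input path; point this at a real PDF before running.
    static final String IN_FILE = "input.pdf";

    // Reads the whole PDF into memory and encodes it without line breaks.
    static byte[] encodePdf(Path pdf) throws IOException {
        byte[] inFileBytes = Files.readAllBytes(pdf);
        return Base64.encodeBase64(inFileBytes);
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encodePdf(Paths.get(IN_FILE));
        System.out.println("Base64 length: " + encoded.length);
    }
}
```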
4.3. Streaming Encoding

Commons Codec also supports streaming through the Base64OutputStream and Base64InputStream classes in the org.apache.commons.codec.binary package.

4.4. Decoding

Again, we simply call the decodeBase64 method and write the result to a file:
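A minimal sketch follows. It assumes commons-codec is on the classpath, and ENCODED_FILE and OUT_FILE are hypothetical paths. The static decodeBase64 method returns the original bytes, which are then written to the output PDF.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.codec.binary.Base64;

class CommonsPdfDecoder {

    // Hypothetical paths: the Base64-encoded input and the PDF to recreate.
    static final String ENCODED_FILE = "encoded.txt";
    static final String OUT_FILE = "output.pdf";

    // Decodes Base64 bytes with Commons Codec and writes the result to outFile.
    static byte[] decodeToFile(byte[] encoded, String outFile) throws IOException {
        byte[] decoded = Base64.decodeBase64(encoded);
        try (FileOutputStream fos = new FileOutputStream(outFile)) {
            fos.write(decoded);
        }
        return decoded;
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(ENCODED_FILE));
        decodeToFile(encoded, OUT_FILE);
    }
}
```

Unlike java.util.Base64's basic decoder, decodeBase64 skips characters outside the Base64 alphabet, such as line breaks, rather than rejecting them.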
5. Testing

Now we'll test our encoding and decoding using a simple JUnit test:
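The JUnit 4 sketch below shows one way to structure such a test. It assumes JUnit 4 and commons-codec are on the test classpath and that a sample PDF exists at the hypothetical path src/test/resources/input.pdf. Each test round-trips the bytes through one library and checks that the decoded bytes equal the input.

```java
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertNotNull;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.junit.BeforeClass;
import org.junit.Test;

public class EncodeDecodeUnitTest {

    // Hypothetical fixture path for the sample PDF.
    private static final String IN_FILE = "src/test/resources/input.pdf";
    private static byte[] inFileBytes;

    // Reads the input PDF once, before any test runs.
    @BeforeClass
    public static void fileToByteArray() throws IOException {
        inFileBytes = Files.readAllBytes(Paths.get(IN_FILE));
    }

    @Test
    public void givenJavaBase64_whenEncodedThenDecoded_thenBytesMatch() {
        byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);
        byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
        assertNotNull(decoded);
        assertArrayEquals(inFileBytes, decoded);
    }

    @Test
    public void givenApacheCommons_whenEncodedThenDecoded_thenBytesMatch() {
        byte[] encoded = org.apache.commons.codec.binary.Base64.encodeBase64(inFileBytes);
        byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(encoded);
        assertNotNull(decoded);
        assertArrayEquals(inFileBytes, decoded);
    }
}
```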
As we can see, we first read the input bytes in a @BeforeClass method, and in both our @Test methods, verified that:
Of course, we can also open the decoded PDF file we created and confirm that its contents match those of the input file.

6. Conclusion

In this quick tutorial, we learned more about Java's Base64 utility. We also saw code samples for converting a PDF to and from Base64 using Java 8 and Apache Commons Codec. Interestingly, the JDK implementation is generally reported to be faster than the Apache one.