repost: Microsoft Word Processing in Java with Apache POI

# 1. Overview

Apache POI is a Java library for working with the various file formats based on the Office Open XML standards (OOXML) and Microsoft’s OLE 2 Compound Document format (OLE2).

This tutorial focuses on the support of Apache POI for Microsoft Word, the most commonly used Office file format. It walks through the steps needed to format and generate an MS Word file, and then shows how to parse that file.

# 2. Maven Dependencies

The only dependency that is required for Apache POI to handle MS Word files is:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.5</version>
</dependency>

The latest version of this artifact can be found on Maven Central.

# 3. Preparation

Let’s now look at some of the elements used to facilitate the generation of an MS Word file.

# 3.1. Resource Files

We’ll collect the contents of three text files and write them into an MS Word file – named rest-with-spring.docx.

In addition, the logo-leaf.png file is used to insert an image into that new file. All of these files are on the classpath, and their names are held in several static variables:

public static String logo = "logo-leaf.png";
public static String paragraph1 = "poi-word-para1.txt";
public static String paragraph2 = "poi-word-para2.txt";
public static String paragraph3 = "poi-word-para3.txt";
public static String output = "rest-with-spring.docx";

For those who are curious, contents of these resource files in the repository, whose link is given in the last section of this tutorial, are extracted from this course page here on the site.

# 3.2. Helper Method

The main method, which contains the logic for generating an MS Word file and is described in the following section, makes use of a helper method:

public String convertTextFileToString(String fileName) {
    try (Stream<String> stream
      = Files.lines(Paths.get(ClassLoader.getSystemResource(fileName).toURI()))) {

        return stream.collect(Collectors.joining(" "));
    } catch (IOException | URISyntaxException e) {
        return null;
    }
}

This method reads a text file located on the classpath, whose name is the passed-in String argument. It then joins the lines of this file with a single space and returns the resulting String. If the file cannot be read, it returns null.
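As a sketch of an alternative to returning null, the variant below reads the file with an explicit UTF-8 charset and rethrows failures as an unchecked exception. The class name `TextFiles` and the methods `joinLines` and `demo` are hypothetical and not part of the tutorial's code; unlike the helper above, it takes a `Path` rather than a classpath resource name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical variant of the tutorial's helper; all names here are illustrative.
final class TextFiles {

    // Joins the lines of a UTF-8 text file with single spaces.
    static String joinLines(Path file) {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.collect(Collectors.joining(" "));
        } catch (IOException e) {
            // Surface the failure to the caller instead of returning null.
            throw new UncheckedIOException(e);
        }
    }

    // Writes a two-line temporary file and joins it, as a quick demonstration.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("poi-word-para", ".txt");
            try {
                Files.write(tmp, Arrays.asList("first line", "second line"),
                        StandardCharsets.UTF_8);
                return joinLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "first line second line"
    }
}
```

Throwing instead of returning null means a missing resource file fails at the point of reading rather than later, when a null is passed to `setText`.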

# 4. MS Word File Generation

This section gives instructions on how to format and generate a Microsoft Word file. Prior to working on any part of the file, we need to have an XWPFDocument instance:

XWPFDocument document = new XWPFDocument();

# 4.1. Formatting Title and Subtitle

To create the title, we first ask the document for a new XWPFParagraph and set the alignment on it:

XWPFParagraph title = document.createParagraph();
title.setAlignment(ParagraphAlignment.CENTER);

The content of a paragraph needs to be wrapped in an XWPFRun object. We may configure this object to set a text value and its associated styles:

XWPFRun titleRun = title.createRun();
titleRun.setText("Build Your REST API with Spring");
titleRun.setColor("009933");
titleRun.setBold(true);
titleRun.setFontFamily("Courier");
titleRun.setFontSize(20);

One should be able to infer the purposes of the set-methods from their names.

In a similar way we create an XWPFParagraph instance enclosing the subtitle:

XWPFParagraph subTitle = document.createParagraph();
subTitle.setAlignment(ParagraphAlignment.CENTER);

Let’s format the subtitle as well:

XWPFRun subTitleRun = subTitle.createRun();
subTitleRun.setText("from HTTP fundamentals to API Mastery");
subTitleRun.setColor("00CC44");
subTitleRun.setFontFamily("Courier");
subTitleRun.setFontSize(16);
subTitleRun.setTextPosition(20);
subTitleRun.setUnderline(UnderlinePatterns.DOT_DOT_DASH);

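The setTextPosition method raises or lowers the run's text relative to the baseline, with the value given in half-points. Here it is used to put some distance between the subtitle and the subsequent image, while setUnderline determines the underlining pattern.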

Notice that we hard-code the contents of both the title and subtitle as these statements are too short to justify the use of a helper method.
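Since the title and subtitle are built with nearly the same calls, the shared steps could be factored into a small helper. The following is only a sketch: the class `StyledParagraphs` and the method `addCenteredRun` are hypothetical names, and it assumes the poi-ooxml dependency from section 2 is on the classpath.

```java
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical helper; not part of the tutorial's code.
final class StyledParagraphs {

    // Appends a centered paragraph holding a single Courier run with the given
    // text, hex color (for example "009933") and font size in points.
    static XWPFRun addCenteredRun(XWPFDocument document, String text,
                                  String hexColor, int fontSize) {
        XWPFParagraph paragraph = document.createParagraph();
        paragraph.setAlignment(ParagraphAlignment.CENTER);

        XWPFRun run = paragraph.createRun();
        run.setText(text);
        run.setColor(hexColor);
        run.setFontFamily("Courier");
        run.setFontSize(fontSize);
        return run; // returned so callers can add bold, underline, and so on
    }
}
```

With such a helper, the title might be created as `StyledParagraphs.addCenteredRun(document, "Build Your REST API with Spring", "009933", 20).setBold(true);`.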

# 4.2. Inserting an Image

An image also needs to be wrapped in an XWPFParagraph instance. We want the image to be horizontally centered and placed under the subtitle, thus the following snippet must be put below the code given above:

XWPFParagraph image = document.createParagraph();
image.setAlignment(ParagraphAlignment.CENTER);

Here is how to set the distance between this image and the text below it:

XWPFRun imageRun = image.createRun();
imageRun.setTextPosition(20);

An image is taken from a file on the classpath and then inserted into the MS Word file with the specified dimensions:

Path imagePath = Paths.get(ClassLoader.getSystemResource(logo).toURI());
imageRun.addPicture(Files.newInputStream(imagePath),
  XWPFDocument.PICTURE_TYPE_PNG, imagePath.getFileName().toString(),
  Units.toEMU(50), Units.toEMU(50));
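The width and height passed to addPicture are expressed in English Metric Units (EMU), and Units.toEMU converts a length in points into that unit. The sketch below reproduces the arithmetic with plain Java so the numbers are visible; the class `EmuMath` and its method are hypothetical and only mirror what the library call is understood to compute.

```java
// Hypothetical helper that mirrors the points-to-EMU arithmetic; names are illustrative.
final class EmuMath {

    // One inch is 914,400 EMUs and 72 points, so one point is 12,700 EMUs.
    static final int EMU_PER_INCH = 914_400;
    static final int POINTS_PER_INCH = 72;
    static final int EMU_PER_POINT = EMU_PER_INCH / POINTS_PER_INCH;

    // Converts a length in points to EMUs, rounding to the nearest integer.
    static int pointsToEmu(double points) {
        return (int) Math.rint(EMU_PER_POINT * points);
    }

    public static void main(String[] args) {
        System.out.println(pointsToEmu(50)); // prints 635000
    }
}
```

In other words, the image in the snippet above is inserted at 50 by 50 points, roughly 0.69 inch on each side.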

# 4.3. Formatting Paragraphs

Here is how we create the first paragraph with contents taken from the poi-word-para1.txt file:

XWPFParagraph para1 = document.createParagraph();
para1.setAlignment(ParagraphAlignment.BOTH);
String string1 = convertTextFileToString(paragraph1);
XWPFRun para1Run = para1.createRun();
para1Run.setText(string1);

It is apparent that the creation of a paragraph is similar to the creation of the title or subtitle. The only difference here is the use of the helper method instead of hard-coded strings.

In a similar way, we can create two other paragraphs using contents from files poi-word-para2.txt and poi-word-para3.txt:

XWPFParagraph para2 = document.createParagraph();
para2.setAlignment(ParagraphAlignment.RIGHT);
String string2 = convertTextFileToString(paragraph2);
XWPFRun para2Run = para2.createRun();
para2Run.setText(string2);
para2Run.setItalic(true);

XWPFParagraph para3 = document.createParagraph();
para3.setAlignment(ParagraphAlignment.LEFT);
String string3 = convertTextFileToString(paragraph3);
XWPFRun para3Run = para3.createRun();
para3Run.setText(string3);

The creation of these three paragraphs is almost the same, except for some styling such as alignment or italics.

# 4.4. Generating MS Word File

Now we are ready to write the in-memory document variable out to a Microsoft Word file on disk:

FileOutputStream out = new FileOutputStream(output);
document.write(out);
out.close();
document.close();

All the code snippets in this section are wrapped in a method named handleSimpleDoc.
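The explicit close calls in the writing step are skipped if write throws. A try-with-resources block is one way to make sure both the stream and the document are closed in that case. This is a minimal sketch, assuming the poi-ooxml dependency from section 2; the class `DocxWriter` and the method `writeSimpleDoc` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical helper; not part of the tutorial's code.
final class DocxWriter {

    // Creates a one-paragraph document and writes it to the given path.
    static void writeSimpleDoc(Path target, String text) {
        // Both resources are closed automatically, even if write() fails.
        try (XWPFDocument document = new XWPFDocument();
             OutputStream out = Files.newOutputStream(target)) {
            XWPFParagraph paragraph = document.createParagraph();
            paragraph.createRun().setText(text);
            document.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```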

# 5. Parsing and Testing

This section outlines the parsing of MS Word files and verification of the result.

# 5.1. Preparation

We declare a static field in the test class:

static WordDocument wordDocument;

This field references an instance of the class that encloses all the code fragments shown in sections 3 and 4.

Before parsing and testing, we need to initialize the static variable declared right above and generate the rest-with-spring.docx file in the current working directory by invoking the handleSimpleDoc method:

@BeforeClass
public static void generateMSWordFile() throws Exception {
    WordTest.wordDocument = new WordDocument();
    wordDocument.handleSimpleDoc();
}

Let’s move on to the final step: parsing the MS Word file and verifying the outcome.

# 5.2. Parsing MS Word File and Verification

First, we extract the contents of the generated MS Word file in the project directory and store them in a List of XWPFParagraph:

Path msWordPath = Paths.get(WordDocument.output);
XWPFDocument document = new XWPFDocument(Files.newInputStream(msWordPath));
List<XWPFParagraph> paragraphs = document.getParagraphs();
document.close();
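The same parsing calls can be tried without touching the file system by writing a document to a byte array and reading it back. The sketch below assumes the poi-ooxml dependency from section 2; the class `WordRoundTrip` and its two methods are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical helper; not part of the tutorial's code.
final class WordRoundTrip {

    // Builds a document with one paragraph per argument and returns it as .docx bytes.
    static byte[] toDocxBytes(String... paragraphs) {
        try (XWPFDocument document = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (String text : paragraphs) {
                document.createParagraph().createRun().setText(text);
            }
            document.write(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses .docx bytes and returns the text of each body paragraph, in order.
    static List<String> paragraphTexts(byte[] docx) {
        try (XWPFDocument document = new XWPFDocument(new ByteArrayInputStream(docx))) {
            List<String> texts = new ArrayList<>();
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                texts.add(paragraph.getText());
            }
            return texts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Copying the paragraph text into plain strings before the document is closed keeps the result independent of the document's lifetime.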

Next, let’s make sure that the content and style of the title are the same as what we set before:

XWPFParagraph title = paragraphs.get(0);
XWPFRun titleRun = title.getRuns().get(0);

assertEquals("Build Your REST API with Spring", title.getText());
assertEquals("009933", titleRun.getColor());
assertTrue(titleRun.isBold());
assertEquals("Courier", titleRun.getFontFamily());
assertEquals(20, titleRun.getFontSize());

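For the sake of simplicity, we just validate the contents of other parts of the file, leaving out the styles. The verification of their styles is similar to what we have done with the title. Note that these assertions expect a section heading, "What makes a good API?", as the fourth paragraph, which the snippets in section 4 do not show being created: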

assertEquals("from HTTP fundamentals to API Mastery",
  paragraphs.get(1).getText());
assertEquals("What makes a good API?", paragraphs.get(3).getText());
assertEquals(wordDocument.convertTextFileToString(WordDocument.paragraph1),
  paragraphs.get(4).getText());
assertEquals(wordDocument.convertTextFileToString(WordDocument.paragraph2),
  paragraphs.get(5).getText());
assertEquals(wordDocument.convertTextFileToString(WordDocument.paragraph3),
  paragraphs.get(6).getText());

Now we can be confident that the creation of the rest-with-spring.docx file has been successful.

# 6. Conclusion

This tutorial introduced Apache POI support for the Microsoft Word format. It went through steps needed to generate an MS Word file and to verify its contents.

# Comment

package com.jbn.study;

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.UnderlinePatterns;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class MicrosoftWordProcessing {
    public static String logo = "logo.png";
    public static String paragraph1 = "poi-word-para1.txt";
    public static String paragraph2 = "poi-word-para2.txt";
    public static String paragraph3 = "poi-word-para3.txt";
    public static String output = "rest-with-spring.docx";

    public static String convertTextFileToString(String fileName) {
        try (Stream<String> stream = Files.lines(Paths.get(ClassLoader.getSystemResource(fileName).toURI()))) {

            return stream.collect(Collectors.joining(" "));
        } catch (IOException | URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException, InvalidFormatException, URISyntaxException {
        XWPFDocument document = new XWPFDocument();

        XWPFParagraph title = document.createParagraph();
        title.setAlignment(ParagraphAlignment.CENTER);

        XWPFRun titleRun = title.createRun();
        titleRun.setText("Life and Death");
        titleRun.setColor("009933");
        titleRun.setBold(true);
        titleRun.setFontFamily("Courier");
        titleRun.setFontSize(20);

        XWPFParagraph subTitle = document.createParagraph();
        subTitle.setAlignment(ParagraphAlignment.CENTER);
        XWPFRun subTitleRun = subTitle.createRun();
        subTitleRun.setText("Life and Death");
        subTitleRun.setColor("00CC44");
        subTitleRun.setFontFamily("Courier");
        subTitleRun.setFontSize(16);
        subTitleRun.setTextPosition(20);
        subTitleRun.setUnderline(UnderlinePatterns.DOT_DOT_DASH);

        XWPFParagraph image = document.createParagraph();
        image.setAlignment(ParagraphAlignment.CENTER);

        XWPFRun imageRun = image.createRun();
        imageRun.setTextPosition(20);
        Path imagePath = Paths.get(ClassLoader.getSystemResource(logo).toURI());
        imageRun.addPicture(Files.newInputStream(imagePath),
                XWPFDocument.PICTURE_TYPE_PNG, imagePath.getFileName().toString(),
                Units.toEMU(50), Units.toEMU(50));

        XWPFParagraph para1 = document.createParagraph();
        para1.setAlignment(ParagraphAlignment.LEFT);
        String string1 = convertTextFileToString(paragraph1);
        XWPFRun para1Run = para1.createRun();
        para1Run.setText(string1);

        XWPFParagraph para2 = document.createParagraph();
        para2.setAlignment(ParagraphAlignment.RIGHT);
        String string2 = convertTextFileToString(paragraph2);
        XWPFRun para2Run = para2.createRun();
        para2Run.setText(string2);
        para2Run.setItalic(true);

        XWPFParagraph para3 = document.createParagraph();
        para3.setAlignment(ParagraphAlignment.RIGHT);
        String string3 = convertTextFileToString(paragraph3);
        XWPFRun para3Run = para3.createRun();
        para3Run.setText(string3);
        para3Run.setFontFamily("kaiti");

        FileOutputStream out = new FileOutputStream(output);
        document.write(out);
        out.close();
        document.close();
    }
}