public class HTMLWriter extends XMLWriter
HTMLWriter takes a DOM4J tree and formats it to a stream as HTML. This
formatter is similar to XMLWriter, with these differences: it outputs the text
of CDATA and Entity sections rather than their serialised XML form, it has an
XHTML mode, it retains whitespace in certain elements such as <PRE>, and it
supports certain elements that have no corresponding close tag, such as <BR>
and <P>.
The OutputFormat passed in to the constructor is checked for isXHTML() and
isExpandEmptyElements(). See OutputFormat for details.
Here are the rules for this class, based on an OutputFormat, "format",
passed in to the constructor:
- If an element is in getOmitElementCloseSet, then it is treated specially:
  - It never gets a close tag.
  - If format.isXHTML(), then it has a space before the closing single-tag
    slash. Netscape 4.x and earlier treats <HR /> as an element named "HR"
    with an attribute named "/", which is still better than <hr/>, which it
    refuses to recognize because it reads it as an element named "HR/".
- If format.isXHTML(), all elements must either have a close element or be a
  closed single tag.
- If format.isExpandEmptyElements() is true, all elements are expanded except
  as above.
If isXHTML is true, CDATA sections look like this:
<myelement><![CDATA[My data]]></myelement>
Otherwise, they look like this:
<myelement>My data</myelement>
Basically, OutputFormat.isXHTML() == true will produce valid XML, while
OutputFormat.isExpandEmptyElements() determines whether empty elements are
expanded if isXHTML is true, excepting the special HTML single tags.
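As a rough illustration of the rules above, the following standalone sketch models how an empty element and a CDATA section might be closed under each setting. It is not the dom4j source: the class name, the method names, and the three-entry omit-close set are hypothetical stand-ins, and applying the expand-empty-elements setting outside XHTML mode is an assumption of this sketch, since the text only describes it for XHTML.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Simplified model of the documented output rules; NOT the dom4j implementation.
final class HtmlOutputRulesSketch {

    // Hypothetical stand-in for the omit-close set. The real default contents
    // come from loadOmitElementCloseSet and may differ.
    static final Set<String> OMIT_CLOSE =
            new HashSet<>(Arrays.asList("BR", "HR", "P"));

    /** Text written after the name and attributes of an empty element. */
    static String closeEmpty(String qualifiedName, boolean xhtml, boolean expandEmpty) {
        boolean omit = OMIT_CLOSE.contains(qualifiedName.toUpperCase(Locale.ROOT));
        if (omit) {
            // Never a separate close tag; XHTML gets a space before the slash.
            return xhtml ? " />" : ">";
        }
        // Assumption: other elements follow the expand setting in both modes.
        return expandEmpty ? "></" + qualifiedName + ">" : "/>";
    }

    /** CDATA is serialised as a CDATA section in XHTML mode, as plain text otherwise. */
    static String cdata(String text, boolean xhtml) {
        return xhtml ? "<![CDATA[" + text + "]]>" : text;
    }

    public static void main(String[] args) {
        System.out.println("<br" + closeEmpty("br", true, true));     // <br />
        System.out.println("<br" + closeEmpty("br", false, true));    // <br>
        System.out.println("<div" + closeEmpty("div", true, true));   // <div></div>
        System.out.println("<div" + closeEmpty("div", true, false));  // <div/>
        System.out.println("<myelement>" + cdata("My data", true) + "</myelement>");
        System.out.println("<myelement>" + cdata("My data", false) + "</myelement>");
    }
}
```

The omit-close check comes first because, per the rules, those elements ignore the expand setting entirely.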
Also, HTMLWriter handles tags whose contents should be preformatted, that is,
whitespace-preserved. By default, this set includes the tags <PRE>,
<SCRIPT>, <STYLE>, and <TEXTAREA>, case insensitively. It
does not include <IFRAME>. Other tags, such as <CODE>, <KBD>, <TT>, and
<VAR>, are rendered in a different font in most browsers but don't preserve
whitespace, so they also don't appear in the default list. HTML comments are
always whitespace-preserved.
However, the parser you use may store comments with linefeed-only text nodes
(\n) even if your platform uses another line.separator character, and
HTMLWriter outputs Comment nodes exactly as the DOM is set up by the parser.
See examples and discussion here: setPreformattedTags(java.util.Set)
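The case-insensitive behaviour described above can be pictured with a small standalone model. This is only a sketch under assumptions: the class `PreformattedTagsSketch` is hypothetical, it normalises names to upper case as one possible way to get case-insensitive matching, and it treats null as the empty set. It is not the dom4j implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative model only; not the dom4j implementation.
final class PreformattedTagsSketch {

    // Default tags named in the text above, stored in upper case.
    private Set<String> tags =
            new HashSet<>(Arrays.asList("PRE", "SCRIPT", "STYLE", "TEXTAREA"));

    /** Returns a copy, so callers can modify it and pass it back in. */
    Set<String> getPreformattedTags() {
        return new HashSet<>(tags);
    }

    /** Replaces the whole set; null is treated as the empty set. */
    void setPreformattedTags(Set<String> newSet) {
        Set<String> normalized = new HashSet<>();
        if (newSet != null) {
            for (String tag : newSet) {
                if (tag != null) {
                    normalized.add(tag.toUpperCase(Locale.ROOT));
                }
            }
        }
        tags = normalized;
    }

    /** Case-insensitive membership test. */
    boolean isPreformattedTag(String qualifiedName) {
        return qualifiedName != null
                && tags.contains(qualifiedName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        PreformattedTagsSketch sketch = new PreformattedTagsSketch();
        System.out.println(sketch.isPreformattedTag("pre"));     // true
        System.out.println(sketch.isPreformattedTag("iframe"));  // false

        // Extend the current set with IFRAME.
        Set<String> current = sketch.getPreformattedTags();
        current.add("iframe");
        sketch.setPreformattedTags(current);
        System.out.println(sketch.isPreformattedTag("IFrame"));  // true

        // Passing null clears the set.
        sketch.setPreformattedTags(null);
        System.out.println(sketch.isPreformattedTag("PRE"));     // false
    }
}
```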
Examples
Pretty Printing
This example shows how to pretty print a string containing a valid HTML
document to a string. You can also just call the static methods of this
class: prettyPrintHTML(String), or
prettyPrintHTML(String,boolean,boolean,boolean,boolean), or
prettyPrintXHTML(String) for XHTML (note the X).

    import java.io.IOException;
    import java.io.StringWriter;
    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.DocumentHelper;
    import org.dom4j.io.HTMLWriter;
    import org.dom4j.io.OutputFormat;

    String testPrettyPrint(String html) throws IOException, DocumentException {
        StringWriter sw = new StringWriter();
        OutputFormat format = OutputFormat.createPrettyPrint();
        // These are the default values for createPrettyPrint,
        // so you needn't set them:
        // format.setNewlines(true);
        // format.setTrimText(true);
        format.setXHTML(true);
        HTMLWriter writer = new HTMLWriter(sw, format);
        Document document = DocumentHelper.parseText(html);
        writer.write(document);
        writer.flush();
        return sw.toString();
    }

This example shows how to create a "squeezed" document, but one that will work
in browsers even if the browser line length is limited. No newlines are
included, no extra whitespace at all, except where it is required by
setPreformattedTags.

    // Uses the same imports as the previous example.
    String testCrunch(String html) throws IOException, DocumentException {
        StringWriter sw = new StringWriter();
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setNewlines(false);
        format.setTrimText(true);
        format.setIndent("");
        format.setXHTML(true);
        format.setExpandEmptyElements(false);
        format.setNewLineAfterNTags(20);
        HTMLWriter writer = new HTMLWriter(sw, format);
        Document document = DocumentHelper.parseText(html);
        writer.write(document);
        writer.flush();
        return sw.toString();
    }
Modifier and Type | Field and Description |
---|---|
protected static OutputFormat | DEFAULT_HTML_FORMAT |
protected static java.util.HashSet<java.lang.String> | DEFAULT_PREFORMATTED_TAGS |
Fields inherited from class XMLWriter: DEFAULT_FORMAT, lastOutputNodeType, LEXICAL_HANDLER_NAMES, preserve, writer
Constructor and Description |
---|
HTMLWriter() |
HTMLWriter(OutputFormat format) |
HTMLWriter(java.io.OutputStream out) |
HTMLWriter(java.io.OutputStream out, OutputFormat format) |
HTMLWriter(java.io.Writer writer) |
HTMLWriter(java.io.Writer writer, OutputFormat format) |
Modifier and Type | Method and Description |
---|---|
void | endCDATA() |
java.util.Set<java.lang.String> | getOmitElementCloseSet() - A clone of the Set of elements that can have their close-tags omitted. |
java.util.Set<java.lang.String> | getPreformattedTags() |
boolean | isPreformattedTag(java.lang.String qualifiedName) |
protected void | loadOmitElementCloseSet(java.util.Set<java.lang.String> set) |
protected boolean | omitElementClose(java.lang.String qualifiedName) |
static java.lang.String | prettyPrintHTML(java.lang.String html) - Convenience method to just get a String result. |
static java.lang.String | prettyPrintHTML(java.lang.String html, boolean newlines, boolean trim, boolean isXHTML, boolean expandEmpty) |
static java.lang.String | prettyPrintXHTML(java.lang.String html) - Convenience method to just get a String result, but as XHTML. |
void | setOmitElementCloseSet(java.util.Set<java.lang.String> newSet) - To use the empty set, pass an empty Set, or null. |
void | setPreformattedTags(java.util.Set<java.lang.String> newSet) - Override the default set, which includes PRE, SCRIPT, STYLE, and TEXTAREA, case insensitively. |
void | startCDATA() |
protected void | writeCDATA(java.lang.String text) |
protected void | writeClose(java.lang.String qualifiedName) - Overridden method to not close certain element names, to avoid weird behaviour from browsers for versions up to 5.x. |
protected void | writeDeclaration() - This will write the declaration to the given Writer. |
protected void | writeElement(Element element) - This override handles any elements that should not remove whitespace, such as <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>. |
protected void | writeEmptyElementClose(java.lang.String qualifiedName) |
protected void | writeEntity(Entity entity) |
protected void | writeString(java.lang.String text) |
Methods inherited from class XMLWriter: characters, close, comment, createWriter, defaultMaximumAllowedCharacter, endDocument, endDTD, endElement, endEntity, endPrefixMapping, escapeAttributeEntities, escapeElementEntities, flush, getLexicalHandler, getMaximumAllowedCharacter, getOutputFormat, getProperty, handleException, ignorableWhitespace, indent, installLexicalHandler, isElementSpacePreserved, isEscapeText, isExpandEmptyElements, isNamespaceDeclaration, notationDecl, parse, println, processingInstruction, resolveEntityRefs, setDocumentLocator, setEscapeText, setIndentLevel, setLexicalHandler, setMaximumAllowedCharacter, setOutputStream, setProperty, setResolveEntityRefs, setWriter, shouldEncodeChar, startDocument, startDTD, startElement, startEntity, startPrefixMapping, unparsedEntityDecl, write, write, write, write, write, write, write, write, write, write, write, write, write, writeAttribute, writeAttribute, writeAttributes, writeAttributes, writeClose, writeComment, writeDocType, writeDocType, writeElementContent, writeEntityRef, writeEscapeAttributeEntities, writeNamespace, writeNamespace, writeNamespaces, writeNode, writeNodeText, writeOpen, writePrintln, writeProcessingInstruction
Methods inherited from class org.xml.sax.helpers.XMLFilterImpl: error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, parse, resolveEntity, setContentHandler, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, skippedEntity, warning
protected static final java.util.HashSet<java.lang.String> DEFAULT_PREFORMATTED_TAGS
protected static final OutputFormat DEFAULT_HTML_FORMAT
public HTMLWriter(java.io.Writer writer)
public HTMLWriter(java.io.Writer writer, OutputFormat format)
public HTMLWriter() throws java.io.UnsupportedEncodingException
Throws: java.io.UnsupportedEncodingException
public HTMLWriter(OutputFormat format) throws java.io.UnsupportedEncodingException
Throws: java.io.UnsupportedEncodingException
public HTMLWriter(java.io.OutputStream out) throws java.io.UnsupportedEncodingException
Throws: java.io.UnsupportedEncodingException
public HTMLWriter(java.io.OutputStream out, OutputFormat format) throws java.io.UnsupportedEncodingException
Throws: java.io.UnsupportedEncodingException
public void startCDATA() throws org.xml.sax.SAXException
Specified by: startCDATA in interface org.xml.sax.ext.LexicalHandler
Overrides: startCDATA in class XMLWriter
Throws: org.xml.sax.SAXException
public void endCDATA() throws org.xml.sax.SAXException
protected void writeCDATA(java.lang.String text) throws java.io.IOException
Overrides: writeCDATA in class XMLWriter
Throws: java.io.IOException
protected void writeEntity(Entity entity) throws java.io.IOException
Overrides: writeEntity in class XMLWriter
Throws: java.io.IOException
protected void writeDeclaration() throws java.io.IOException
Description copied from class: XMLWriter
This will write the declaration to the given Writer. Assumes XML version 1.0 since we don't directly know.
Overrides: writeDeclaration in class XMLWriter
Throws: java.io.IOException
protected void writeString(java.lang.String text) throws java.io.IOException
Overrides: writeString in class XMLWriter
Throws: java.io.IOException
protected void writeClose(java.lang.String qualifiedName) throws java.io.IOException
Overrides: writeClose in class XMLWriter
Parameters: qualifiedName
Throws: java.io.IOException
protected void writeEmptyElementClose(java.lang.String qualifiedName) throws java.io.IOException
Overrides: writeEmptyElementClose in class XMLWriter
Throws: java.io.IOException
protected boolean omitElementClose(java.lang.String qualifiedName)
protected void loadOmitElementCloseSet(java.util.Set<java.lang.String> set)
public java.util.Set<java.lang.String> getOmitElementCloseSet()
public void setOmitElementCloseSet(java.util.Set<java.lang.String> newSet)
To use the empty set, pass an empty Set, or null:

    setOmitElementCloseSet(new HashSet<String>());

or

    setOmitElementCloseSet(null);

Parameters: newSet
public java.util.Set<java.lang.String> getPreformattedTags()
See Also: setPreformattedTags
public void setPreformattedTags(java.util.Set<java.lang.String> newSet)
Override the default set, which includes PRE, SCRIPT, STYLE, and TEXTAREA, case insensitively. For example, to add IFRAME to the current set:

    Set<String> current = myHTMLWriter.getPreformattedTags();
    current.add("IFRAME");
    myHTMLWriter.setPreformattedTags(current);
    // The set is now {PRE, SCRIPT, STYLE, TEXTAREA, IFRAME}

Similarly, you can simply replace it with your own:

    HashSet<String> newset = new HashSet<String>();
    newset.add("PRE");
    newset.add("TEXTAREA");
    myHTMLWriter.setPreformattedTags(newset);
    // The set is now {PRE, TEXTAREA}

You can remove all tags from the preformatted tags list with an empty set, like this:

    myHTMLWriter.setPreformattedTags(new HashSet<String>());
    // The set is now {}

or with null, like this:

    myHTMLWriter.setPreformattedTags(null);
    // The set is now {}

Parameters: newSet
public boolean isPreformattedTag(java.lang.String qualifiedName)
Parameters: qualifiedName
See Also: setPreformattedTags
protected void writeElement(Element element) throws java.io.IOException
Overrides: writeElement in class XMLWriter
Parameters: element
Throws: java.io.IOException - When the stream could not be written to.
See Also: setPreformattedTags
public static java.lang.String prettyPrintHTML(java.lang.String html) throws java.io.IOException, java.io.UnsupportedEncodingException, DocumentException
Parameters: html
Throws: java.io.IOException, java.io.UnsupportedEncodingException, DocumentException
public static java.lang.String prettyPrintXHTML(java.lang.String html) throws java.io.IOException, java.io.UnsupportedEncodingException, DocumentException
Parameters: html
Throws: java.io.IOException, java.io.UnsupportedEncodingException, DocumentException
public static java.lang.String prettyPrintHTML(java.lang.String html, boolean newlines, boolean trim, boolean isXHTML, boolean expandEmpty) throws java.io.IOException, java.io.UnsupportedEncodingException, DocumentException
Parameters: html, newlines, trim, isXHTML, expandEmpty
Throws: java.io.IOException, java.io.UnsupportedEncodingException, DocumentException