Package org.cyberneko.html
Class HTMLTagBalancer
java.lang.Object
org.cyberneko.html.HTMLTagBalancer
- All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent, org.apache.xerces.xni.parser.XMLDocumentFilter, org.apache.xerces.xni.parser.XMLDocumentSource, org.apache.xerces.xni.XMLDocumentHandler, HTMLComponent
public class HTMLTagBalancer
extends Object
implements org.apache.xerces.xni.parser.XMLDocumentFilter, HTMLComponent
Balances tags in an HTML document. This component receives document events
and tries to correct many common mistakes that human (and computer) HTML
document authors make. This tag balancer can:
- add missing parent elements;
- automatically close elements with optional end tags; and
- handle mis-matched inline element tags.
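The balancing idea described above can be illustrated with a small, self-contained sketch. It is not the NekoHTML implementation: the class name `TagBalancerSketch`, the string-token input, and the hard-coded set of elements with optional end tags are all assumptions made for illustration. It only shows how an open-element stack lets a filter close elements whose end tags were omitted and drop end tags that match nothing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified illustration of stack-based tag balancing (hypothetical, not NekoHTML code). */
final class TagBalancerSketch {
    // Assumption: a tiny hard-coded set of elements whose end tag may be omitted.
    private static final Set<String> OPTIONAL_END = new HashSet<>(Arrays.asList("p", "li"));

    /** Takes tokens such as "<li>", "</ul>" or plain text and returns a balanced token list. */
    static List<String> balance(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>(); // innermost open element is on top
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.startsWith("</")) {
                String name = token.substring(2, token.length() - 1);
                if (!open.contains(name)) {
                    continue; // stray end tag with no open element: drop it
                }
                // Close every element opened after the one being ended.
                while (!open.peek().equals(name)) {
                    out.add("</" + open.pop() + ">");
                }
                open.pop();
                out.add(token);
            } else if (token.startsWith("<")) {
                String name = token.substring(1, token.length() - 1);
                // A new <p> or <li> closes an open element of the same name on top of the stack.
                if (OPTIONAL_END.contains(name) && name.equals(open.peek())) {
                    out.add("</" + open.pop() + ">");
                }
                open.push(name);
                out.add(token);
            } else {
                out.add(token); // character data passes through unchanged
            }
        }
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }
}
```

In this sketch a mismatched inline end tag simply closes the inner elements. The documented class keeps a separate inline stack, so its handling of that case is richer than what is shown here.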
This component recognizes the following features:
- http://cyberneko.org/html/features/augmentations
- http://cyberneko.org/html/features/report-errors
- http://cyberneko.org/html/features/balance-tags/document-fragment
- http://cyberneko.org/html/features/balance-tags/ignore-outside-content
This component recognizes the following properties:
- http://cyberneko.org/html/properties/names/elems
- http://cyberneko.org/html/properties/names/attrs
- http://cyberneko.org/html/properties/error-reporter
- http://cyberneko.org/html/properties/balance-tags/current-stack
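As a rough illustration of how the identifiers above are used, the following sketch collects feature and property settings and pushes them through methods shaped like the `setFeature(String, boolean)` and `setProperty(String, Object)` methods documented for this class. The names `BalancerSettingsSketch`, `Configurable`, and `Recorder` are hypothetical; the sketch deliberately avoids the real library so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Collects feature/property settings and applies them to a target (hypothetical helper). */
final class BalancerSettingsSketch {
    /** Hypothetical stand-in for any object exposing setFeature/setProperty. */
    interface Configurable {
        void setFeature(String featureId, boolean state);
        void setProperty(String propertyId, Object value);
    }

    // Identifier prefixes taken from the feature and property lists above.
    static final String FEATURES = "http://cyberneko.org/html/features/";
    static final String PROPERTIES = "http://cyberneko.org/html/properties/";

    private final Map<String, Boolean> features = new LinkedHashMap<>();
    private final Map<String, Object> properties = new LinkedHashMap<>();

    BalancerSettingsSketch feature(String suffix, boolean state) {
        features.put(FEATURES + suffix, state);
        return this;
    }

    BalancerSettingsSketch property(String suffix, Object value) {
        properties.put(PROPERTIES + suffix, value);
        return this;
    }

    /** Pushes every collected setting to the target, features first. */
    void applyTo(Configurable target) {
        features.forEach(target::setFeature);
        properties.forEach(target::setProperty);
    }

    /** Demo target that only records the calls it receives. */
    static final class Recorder implements Configurable {
        final List<String> calls = new ArrayList<>();

        @Override
        public void setFeature(String featureId, boolean state) {
            calls.add(featureId + "=" + state);
        }

        @Override
        public void setProperty(String propertyId, Object value) {
            calls.add(propertyId + "=" + value);
        }
    }
}
```

For example, `new BalancerSettingsSketch().feature("balance-tags/document-fragment", true).property("names/elems", "lower").applyTo(recorder)` would record one feature call and one property call. With the real class, the target would be the component or parser configuration that owns the tag balancer.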
- Version:
- $Id: HTMLTagBalancer.java,v 1.20 2005/02/14 04:06:22 andyc Exp $
- Author:
- Andy Clark, Marc Guillemot
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static class HTMLTagBalancer.Info: Element info for each start element.
- static class HTMLTagBalancer.InfoStack: Unsynchronized stack of element information.
-
Field Summary
Fields (Modifier and Type, Field, Description):
- protected static final String AUGMENTATIONS: Include infoset augmentations.
- protected static final String DOCUMENT_FRAGMENT: Document fragment balancing only.
- protected static final String DOCUMENT_FRAGMENT_DEPRECATED: Document fragment balancing only (deprecated).
- protected static final String ERROR_REPORTER: Error reporter.
- protected boolean fAllowSelfclosingIframe: Allows self-closing iframe tags.
- protected boolean fAllowSelfclosingTags: Allows self-closing tags.
- protected boolean fAugmentations: Include infoset augmentations.
- protected boolean fDocumentFragment: Document fragment balancing only.
- protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler: The document handler.
- protected org.apache.xerces.xni.parser.XMLDocumentSource fDocumentSource: The document source.
- protected final HTMLTagBalancer.InfoStack fElementStack: The element stack.
- protected HTMLErrorReporter fErrorReporter: Error reporter.
- protected boolean fIgnoreOutsideContent: Ignore outside content.
- protected final HTMLTagBalancer.InfoStack fInlineStack: The inline stack.
- protected short fNamesAttrs: Modify HTML attribute names.
- protected short fNamesElems: Modify HTML element names.
- protected boolean fNamespaces: Namespaces.
- protected boolean fOpenedForm: True if a form is in the stack (allows the start tags of nested forms to be discarded).
- static final String FRAGMENT_CONTEXT_STACK: EXPERIMENTAL: may change in next release. Name of the property holding the stack of elements in whose context a document fragment should be parsed.
- protected boolean fReportErrors: Report errors.
- protected boolean fSeenAnything: True if seen anything.
- protected boolean fSeenBodyElement: True if seen <body> element.
- protected boolean fSeenDoctype: True if the doctype declaration has been seen.
- protected boolean fSeenHeadElement: True if seen <head> element.
- protected boolean fSeenRootElement: True if root element has been seen.
- protected boolean fSeenRootElementEnd: True if seen the end of the document element.
- protected static final String IGNORE_OUTSIDE_CONTENT: Ignore outside content.
- protected static final String NAMES_ATTRS: Modify HTML attribute names: { "upper", "lower", "default" }.
- protected static final String NAMES_ELEMS: Modify HTML element names: { "upper", "lower", "default" }.
- protected static final short NAMES_LOWERCASE: Lowercase HTML names.
- protected static final short NAMES_MATCH: Match HTML element names.
- protected static final short NAMES_NO_CHANGE: Don't modify HTML names.
- protected static final short NAMES_UPPERCASE: Uppercase HTML names.
- protected static final String NAMESPACES: Namespaces.
- protected static final String REPORT_ERRORS: Report errors.
- protected static final HTMLEventInfo SYNTHESIZED_ITEM: Synthesized event info item.
- protected HTMLTagBalancingListener tagBalancingListener
-
Constructor Summary
Constructors:
- HTMLTagBalancer()
-
Method Summary
Modifier and Type, Method, Description:
- protected final void callEndElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs): Call document handler end element.
- protected final void callStartElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs): Call document handler start element.
- void characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs): Characters.
- void comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs): Comment.
- void doctypeDecl(String rootElementName, String publicId, String systemId, org.apache.xerces.xni.Augmentations augs): Doctype declaration.
- protected final org.apache.xerces.xni.XMLAttributes emptyAttributes(): Returns a set of empty attributes.
- void emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs): Empty element.
- void endCDATA(org.apache.xerces.xni.Augmentations augs): End CDATA section.
- void endDocument(org.apache.xerces.xni.Augmentations augs): End document.
- void endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs): End element.
- void endGeneralEntity(String name, org.apache.xerces.xni.Augmentations augs): End entity.
- void endPrefixMapping(String prefix, org.apache.xerces.xni.Augmentations augs): End prefix mapping.
- org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler(): Returns the document handler.
- org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource(): Returns the document source.
- protected HTMLElements.Element getElement(org.apache.xerces.xni.QName elementName): Returns an HTML element.
- protected final int getElementDepth(HTMLElements.Element element): Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
- Boolean getFeatureDefault(String featureId): Returns the default state for a feature.
- protected static final short getNamesValue(String value): Converts HTML names string value to constant value.
- protected int getParentDepth(HTMLElements.Element[] parents, short bounds): Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
- Object getPropertyDefault(String propertyId): Returns the default state for a property.
- String[] getRecognizedFeatures(): Returns recognized features.
- String[] getRecognizedProperties(): Returns recognized properties.
- void ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs): Ignorable whitespace.
- protected static final String modifyName(String name, short mode): Modifies the given name based on the specified mode.
- void processingInstruction(String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs): Processing instruction.
- void reset(org.apache.xerces.xni.parser.XMLComponentManager manager): Resets the component.
- void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler): Sets the document handler.
- void setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source): Sets the document source.
- void setFeature(String featureId, boolean state): Sets a feature.
- void setProperty(String propertyId, Object value): Sets a property.
- void startCDATA(org.apache.xerces.xni.Augmentations augs): Start CDATA section.
- void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.Augmentations augs): Start document.
- void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs): Start document.
- void startElement(org.apache.xerces.xni.QName elem, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs): Start element.
- void startGeneralEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier id, String encoding, org.apache.xerces.xni.Augmentations augs): Start entity.
- void startPrefixMapping(String prefix, String uri, org.apache.xerces.xni.Augmentations augs): Start prefix mapping.
- protected final org.apache.xerces.xni.Augmentations synthesizedAugs(): Returns an augmentations object with a synthesized item added.
- void textDecl(String version, String encoding, org.apache.xerces.xni.Augmentations augs): Text declaration.
- void xmlDecl(String version, String encoding, String standalone, org.apache.xerces.xni.Augmentations augs): XML declaration.
-
Field Details
-
NAMESPACES
protected static final String NAMESPACES
Namespaces.
-
AUGMENTATIONS
protected static final String AUGMENTATIONS
Include infoset augmentations.
-
REPORT_ERRORS
protected static final String REPORT_ERRORS
Report errors.
-
DOCUMENT_FRAGMENT_DEPRECATED
protected static final String DOCUMENT_FRAGMENT_DEPRECATED
Document fragment balancing only (deprecated).
-
DOCUMENT_FRAGMENT
protected static final String DOCUMENT_FRAGMENT
Document fragment balancing only.
-
IGNORE_OUTSIDE_CONTENT
protected static final String IGNORE_OUTSIDE_CONTENT
Ignore outside content.
-
NAMES_ELEMS
protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
-
NAMES_ATTRS
protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
-
ERROR_REPORTER
protected static final String ERROR_REPORTER
Error reporter.
-
FRAGMENT_CONTEXT_STACK
public static final String FRAGMENT_CONTEXT_STACK
EXPERIMENTAL: may change in next release
Name of the property holding the stack of elements in whose context a document fragment should be parsed.
-
NAMES_NO_CHANGE
protected static final short NAMES_NO_CHANGE
Don't modify HTML names.
-
NAMES_MATCH
protected static final short NAMES_MATCH
Match HTML element names.
-
NAMES_UPPERCASE
protected static final short NAMES_UPPERCASE
Uppercase HTML names.
-
NAMES_LOWERCASE
protected static final short NAMES_LOWERCASE
Lowercase HTML names.
-
SYNTHESIZED_ITEM
protected static final HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.
-
fNamespaces
protected boolean fNamespaces
Namespaces.
-
fAugmentations
protected boolean fAugmentations
Include infoset augmentations.
-
fReportErrors
protected boolean fReportErrors
Report errors.
-
fDocumentFragment
protected boolean fDocumentFragment
Document fragment balancing only.
-
fIgnoreOutsideContent
protected boolean fIgnoreOutsideContent
Ignore outside content.
-
fAllowSelfclosingIframe
protected boolean fAllowSelfclosingIframe
Allows self-closing iframe tags.
-
fAllowSelfclosingTags
protected boolean fAllowSelfclosingTags
Allows self-closing tags.
-
fNamesElems
protected short fNamesElems
Modify HTML element names.
-
fNamesAttrs
protected short fNamesAttrs
Modify HTML attribute names.
-
fErrorReporter
protected HTMLErrorReporter fErrorReporter
Error reporter.
-
fDocumentSource
protected org.apache.xerces.xni.parser.XMLDocumentSource fDocumentSource
The document source.
-
fDocumentHandler
protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
The document handler.
-
fElementStack
protected final HTMLTagBalancer.InfoStack fElementStack
The element stack.
-
fInlineStack
protected final HTMLTagBalancer.InfoStack fInlineStack
The inline stack.
-
fSeenAnything
protected boolean fSeenAnything
True if seen anything. Important for the XML declaration.
-
fSeenDoctype
protected boolean fSeenDoctype
True if the doctype declaration has been seen.
-
fSeenRootElement
protected boolean fSeenRootElement
True if root element has been seen.
-
fSeenRootElementEnd
protected boolean fSeenRootElementEnd
True if the end of the document element has been seen. In other words, this variable stays false until the </HTML> end tag is seen (or synthesized). It is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
-
fSeenHeadElement
protected boolean fSeenHeadElement
True if seen <head> element.
-
fSeenBodyElement
protected boolean fSeenBodyElement
True if seen <body> element.
-
fOpenedForm
protected boolean fOpenedForm
True if a form is in the stack (allows the start tags of nested forms to be discarded).
-
tagBalancingListener
protected HTMLTagBalancingListener tagBalancingListener
-
-
Constructor Details
-
HTMLTagBalancer
public HTMLTagBalancer()
-
-
Method Details
-
getFeatureDefault
public Boolean getFeatureDefault(String featureId)
Returns the default state for a feature.
- Specified by:
getFeatureDefault in interface HTMLComponent
- Specified by:
getFeatureDefault in interface org.apache.xerces.xni.parser.XMLComponent
-
getPropertyDefault
public Object getPropertyDefault(String propertyId)
Returns the default state for a property.
- Specified by:
getPropertyDefault in interface HTMLComponent
- Specified by:
getPropertyDefault in interface org.apache.xerces.xni.parser.XMLComponent
-
getRecognizedFeatures
public String[] getRecognizedFeatures()
Returns recognized features.
- Specified by:
getRecognizedFeatures in interface org.apache.xerces.xni.parser.XMLComponent
-
getRecognizedProperties
public String[] getRecognizedProperties()
Returns recognized properties.
- Specified by:
getRecognizedProperties in interface org.apache.xerces.xni.parser.XMLComponent
-
reset
public void reset(org.apache.xerces.xni.parser.XMLComponentManager manager) throws org.apache.xerces.xni.parser.XMLConfigurationException
Resets the component.
- Specified by:
reset in interface org.apache.xerces.xni.parser.XMLComponent
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
setFeature
public void setFeature(String featureId, boolean state) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a feature.
- Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLComponent
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
setProperty
public void setProperty(String propertyId, Object value) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a property.
- Specified by:
setProperty in interface org.apache.xerces.xni.parser.XMLComponent
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
setDocumentHandler
public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
Sets the document handler.
- Specified by:
setDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
-
getDocumentHandler
public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
Returns the document handler.
- Specified by:
getDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
-
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start document.
- Specified by:
startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
xmlDecl
public void xmlDecl(String version, String encoding, String standalone, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
XML declaration.
- Specified by:
xmlDecl in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
doctypeDecl
public void doctypeDecl(String rootElementName, String publicId, String systemId, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Doctype declaration.
- Specified by:
doctypeDecl in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
endDocument
public void endDocument(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End document.
- Specified by:
endDocument in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
comment
public void comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Comment.
- Specified by:
comment in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
processingInstruction
public void processingInstruction(String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Processing instruction.
- Specified by:
processingInstruction in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
startElement
public void startElement(org.apache.xerces.xni.QName elem, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start element.
- Specified by:
startElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
emptyElement
public void emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Empty element.
- Specified by:
emptyElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
startGeneralEntity
public void startGeneralEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier id, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start entity.
- Specified by:
startGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
textDecl
public void textDecl(String version, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Text declaration.
- Specified by:
textDecl in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
endGeneralEntity
public void endGeneralEntity(String name, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End entity.
- Specified by:
endGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
startCDATA
public void startCDATA(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start CDATA section.
- Specified by:
startCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
endCDATA
public void endCDATA(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End CDATA section.
- Specified by:
endCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
characters
public void characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Characters.
- Specified by:
characters in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
ignorableWhitespace
public void ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Ignorable whitespace.
- Specified by:
ignorableWhitespace in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
endElement
public void endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End element.
- Specified by:
endElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Throws:
org.apache.xerces.xni.XNIException
-
setDocumentSource
public void setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source)
Sets the document source.
- Specified by:
setDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
-
getDocumentSource
public org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
Returns the document source.
- Specified by:
getDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
-
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start document.
- Throws:
org.apache.xerces.xni.XNIException
-
startPrefixMapping
public void startPrefixMapping(String prefix, String uri, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start prefix mapping.
- Throws:
org.apache.xerces.xni.XNIException
-
endPrefixMapping
public void endPrefixMapping(String prefix, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End prefix mapping.
- Throws:
org.apache.xerces.xni.XNIException
-
getElement
protected HTMLElements.Element getElement(org.apache.xerces.xni.QName elementName)
Returns an HTML element.
-
callStartElement
protected final void callStartElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Call document handler start element.
- Throws:
org.apache.xerces.xni.XNIException
-
callEndElement
protected final void callEndElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Call document handler end element.
- Throws:
org.apache.xerces.xni.XNIException
-
getElementDepth
protected final int getElementDepth(HTMLElements.Element element)
Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
- Parameters:
element
- The element.
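A minimal sketch of a depth lookup over a stack of open elements follows. The class `ElementDepthSketch`, its use of plain strings for element names, and the counting convention (1 for the innermost open element) are assumptions made for illustration; the documentation above only states that -1 is returned when no matching element is open.

```java
import java.util.ArrayList;
import java.util.List;

/** Depth lookup over a list of open element names (hypothetical, not NekoHTML code). */
final class ElementDepthSketch {
    /** Open element names, oldest first; the last entry is the innermost open element. */
    private final List<String> openElements = new ArrayList<>();

    void push(String name) {
        openElements.add(name);
    }

    /**
     * Returns how many open elements would have to be closed in order to close
     * the nearest open element with the given name (1 = innermost),
     * or -1 if no such element is open.
     */
    int getElementDepth(String name) {
        // Search from the innermost open element outwards.
        for (int i = openElements.size() - 1; i >= 0; i--) {
            if (openElements.get(i).equalsIgnoreCase(name)) {
                return openElements.size() - i;
            }
        }
        return -1;
    }
}
```

With `html`, `body`, and `p` pushed in that order, this sketch returns 1 for `p`, 3 for `html`, and -1 for `table`. A caller handling an end tag could use the result to decide how many elements to close, or to ignore the end tag when the result is -1.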
-
getParentDepth
protected int getParentDepth(HTMLElements.Element[] parents, short bounds)
Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
- Parameters:
parents
- The parent elements.
-
emptyAttributes
protected final org.apache.xerces.xni.XMLAttributes emptyAttributes()
Returns a set of empty attributes.
-
synthesizedAugs
protected final org.apache.xerces.xni.Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.
-
modifyName
protected static final String modifyName(String name, short mode)
Modifies the given name based on the specified mode.
-
getNamesValue
protected static final short getNamesValue(String value)
Converts HTML names string value to constant value.
-
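The relationship between the names property values ("upper", "lower", "default") and the NAMES_* constants can be sketched as below. This is an illustration under assumptions, not the library's code: the numeric constant values are made up, NAMES_MATCH is left out, and treating unrecognized strings like "default" is a guess.

```java
import java.util.Locale;

/** Name-case conversion sketch (hypothetical, not NekoHTML code). */
final class NameModeSketch {
    // Hypothetical numeric values; the real constant values are not given in this document.
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    /** Maps a property string to a mode constant. */
    static short getNamesValue(String value) {
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        // Assumption: "default" and anything unrecognized leave names unchanged.
        return NAMES_NO_CHANGE;
    }

    /** Returns the name converted according to the given mode. */
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name;
        }
    }
}
```

`Locale.ROOT` is used so that the conversion does not depend on the platform locale, which matters for names containing the letter "i" under a Turkish default locale.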