Class XhtmlBaseParser
- All Implemented Interfaces:
LogEnabled, HtmlMarkup, Markup, XmlMarkup, Parser
- Direct Known Subclasses:
XdocParser, XhtmlParser
- Since:
- 1.1
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.maven.doxia.parser.AbstractXmlParser
AbstractXmlParser.CachedFileEntityResolver
-
Field Summary
Fields
Modifier and Type | Field | Description
(package private) boolean | hasDefinitionListItem | Used to wrap the definedTerm with its definition, even when one is omitted.
private boolean | inFigure | Used to recognize the case of img inside figure.
private boolean | inVerbatim | Verbatim flag, true whenever we are inside a <pre> tag.
private boolean | isAnchor | Used to distinguish <a href=""> from <a name="">.
private boolean | isLink | Used to distinguish <a href=""> from <a name="">.
private int | orderedListDepth | Used for nested lists.
private boolean | scriptBlock | True if a <script></script> or <style></style> block is read.
private int | sectionLevel | Counts section level.
 | warnMessages | Map of warn messages with a String as key to describe the error type and a Set as value.
Fields inherited from interface org.apache.maven.doxia.markup.HtmlMarkup
A, ABBR, ACRONYM, ADDRESS, APPLET, AREA, ARTICLE, ASIDE, AUDIO, B, BASE, BASEFONT, BDI, BDO, BIG, BLOCKQUOTE, BODY, BR, BUTTON, CANVAS, CAPTION, CDATA_TYPE, CENTER, CITE, CODE, COL, COLGROUP, COMMAND, DATA, DATALIST, DD, DEL, DETAILS, DFN, DIALOG, DIR, DIV, DL, DT, EM, EMBED, ENTITY_TYPE, FIELDSET, FIGCAPTION, FIGURE, FONT, FOOTER, FORM, FRAME, FRAMESET, H1, H2, H3, H4, H5, H6, HEAD, HEADER, HGROUP, HR, HTML, I, IFRAME, IMG, INPUT, INS, ISINDEX, KBD, KEYGEN, LABEL, LEGEND, LI, LINK, MAIN, MAP, MARK, MENU, META, METER, NAV, NOFRAMES, NOSCRIPT, OBJECT, OL, OPTGROUP, OPTION, OUTPUT, P, PARAM, PICTURE, PRE, PROGRESS, Q, RB, RP, RT, RTC, RUBY, S, SAMP, SCRIPT, SECTION, SELECT, SMALL, SOURCE, SPAN, STRIKE, STRONG, STYLE, SUB, SUMMARY, SUP, TABLE, TAG_TYPE_END, TAG_TYPE_SIMPLE, TAG_TYPE_START, TBODY, TD, TEMPLATE, TEXTAREA, TFOOT, TH, THEAD, TIME, TITLE, TR, TRACK, TT, U, UL, VAR, VIDEO, WBR
Fields inherited from interface org.apache.maven.doxia.markup.Markup
COLON, EOL, EQUAL, GREATER_THAN, LEFT_CURLY_BRACKET, LEFT_SQUARE_BRACKET, LESS_THAN, MINUS, PLUS, QUOTE, RIGHT_CURLY_BRACKET, RIGHT_SQUARE_BRACKET, SEMICOLON, SLASH, SPACE, STAR
Fields inherited from interface org.apache.maven.doxia.parser.Parser
ROLE, TXT_TYPE, UNKNOWN_TYPE, XML_TYPE
Fields inherited from interface org.apache.maven.doxia.markup.XmlMarkup
BANG, CDATA, DOCTYPE_START, ENTITY_START, XML_NAMESPACE
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
protected boolean | baseEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) | Goes through a common list of possible html end tags.
protected boolean | baseStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) | Goes through a common list of possible html start tags.
private void | closeOpenSections(int newLevel, Sink sink) | Close open sections.
protected void | consecutiveSections(int newLevel, Sink sink) | Make sure sections are nested consecutively.
protected int | getSectionLevel() | Return the current section level.
private void | handleAEnd(Sink sink) |
private void | handleAStart(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink, SinkEventAttributeSet attribs) |
protected void | handleCdsect(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) | Handles CDATA sections.
protected void | handleComment(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) | Handles comments.
private boolean | handleDivStart(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, SinkEventAttributeSet attribs, Sink sink) |
protected void | handleEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) | Goes through the possible end tags.
private void | handleFigureCaptionEnd(Sink sink) |
private void | handleFigureCaptionStart(Sink sink, SinkEventAttributeSet attribs) |
private void | handleImgStart(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink, SinkEventAttributeSet attribs) |
private void | handleLIStart(Sink sink, SinkEventAttributeSet attribs) |
private void | handleListItemEnd(Sink sink) |
private void | handleOLStart(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink, SinkEventAttributeSet attribs) |
private void | handlePreStart(SinkEventAttributeSet attribs, Sink sink) |
private void | handlePStart(Sink sink, SinkEventAttributeSet attribs) |
private void | handleSectionStart(Sink sink, int level, SinkEventAttributeSet attribs) |
protected void | handleStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) | Goes through the possible start tags.
private void | handleTableStart(Sink sink, SinkEventAttributeSet attribs, org.codehaus.plexus.util.xml.pull.XmlPullParser parser) |
protected void | handleText(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) | Handles text events.
protected void | init() | Initialize the parser.
protected void | initXmlParser(org.codehaus.plexus.util.xml.pull.XmlPullParser parser) | Initializes the parser with custom entities or other options.
protected boolean | isScriptBlock() | Checks if we are currently inside a <script> tag.
protected boolean | isVerbatim() | Checks if we are currently inside a <pre> tag.
private void | logMessage(String key, String msg) | If debug mode is enabled, log the msg as is, otherwise add unique msg in warnMessages.
private void | logWarnings() |
private void | openMissingSections(int newLevel, Sink sink) | Open missing sections.
void | parse | Parses the given source model and emits Doxia events into the given sink.
protected void | setSectionLevel(int newLevel) | Set the current section level.
protected String | validAnchor(String id) | Checks if the given id is a valid Doxia id and if not, returns a transformed one.
protected void | verbatim() | Start verbatim mode.
protected void | verbatim_() | Stop verbatim mode.
Methods inherited from class org.apache.maven.doxia.parser.AbstractXmlParser
getAttributesFromParser, getLocalEntities, getText, getType, handleEntity, handleUnknown, isCollapsibleWhitespace, isIgnorableWhitespace, isTrimmableWhitespace, isValidate, setCollapsibleWhitespace, setIgnorableWhitespace, setTrimmableWhitespace, setValidate
Methods inherited from class org.apache.maven.doxia.parser.AbstractParser
doxiaVersion, enableLogging, executeMacro, getBasedir, getLog, getMacroManager, isEmitComments, isSecondParsing, parse, parse, parse, setEmitComments, setSecondParsing
-
Field Details
-
scriptBlock
private boolean scriptBlock
True if a <script></script> or <style></style> block is read. CDATA sections within are handled as rawText.
-
isLink
private boolean isLink
Used to distinguish <a href=""> from <a name="">.
-
isAnchor
private boolean isAnchor
Used to distinguish <a href=""> from <a name="">.
-
orderedListDepth
private int orderedListDepth
Used for nested lists.
-
sectionLevel
private int sectionLevel
Counts section level.
-
inVerbatim
private boolean inVerbatim
Verbatim flag, true whenever we are inside a <pre> tag.
-
inFigure
private boolean inFigure
Used to recognize the case of img inside figure.
-
hasDefinitionListItem
boolean hasDefinitionListItem
Used to wrap the definedTerm with its definition, even when one is omitted.
-
warnMessages
Map of warn messages with a String as key to describe the error type and a Set as value. Used to reduce the number of warn messages.
-
-
Constructor Details
-
XhtmlBaseParser
public XhtmlBaseParser()
-
-
Method Details
-
parse
Parses the given source model and emits Doxia events into the given sink.
- Specified by:
parse in interface Parser
- Overrides:
parse in class AbstractXmlParser
- Parameters:
source - not null reader that provides the source document. You could use the newReader methods from ReaderFactory.
sink - A sink that consumes the Doxia events.
reference - the reference
- Throws:
ParseException - if the model could not be parsed.
-
initXmlParser
protected void initXmlParser(org.codehaus.plexus.util.xml.pull.XmlPullParser parser) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
Initializes the parser with custom entities or other options. Adds all XHTML (HTML 4.0) entities to the parser so that they can be recognized and resolved without an additional DTD.
- Overrides:
initXmlParser in class AbstractXmlParser
- Parameters:
parser - A parser, not null.
- Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem initializing the parser
-
baseStartTag
Goes through a common list of possible HTML start tags. These include only tags that can go into the body of an XHTML document and so should be reusable by different XHTML-based parsers.
The currently handled tags are:
<h2>, <h3>, <h4>, <h5>, <h6>, <p>, <pre>, <ul>, <ol>, <li>, <dl>, <dt>, <dd>, <b>, <strong>, <i>, <em>, <code>, <samp>, <tt>, <a>, <table>, <tr>, <th>, <td>, <caption>, <br/>, <hr/>, <img/>.
- Parameters:
parser - A parser.
sink - the sink to receive the events.
- Returns:
- True if the event has been handled by this method, i.e. the tag was recognized, false otherwise.
-
baseEndTag
Goes through a common list of possible HTML end tags. These should be reusable by different XHTML-based parsers. The tags handled here are the same as for baseStartTag(XmlPullParser,Sink), except for the empty elements (<br/>, <hr/>, <img/>).
- Parameters:
parser - A parser.
sink - the sink to receive the events.
- Returns:
- True if the event has been handled by this method, false otherwise.
-
handleStartTag
protected void handleStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException, MacroExecutionException
Goes through the possible start tags. This implementation just calls baseStartTag(XmlPullParser,Sink); implementing parsers should override it to include additional tags.
- Specified by:
handleStartTag in class AbstractXmlParser
- Parameters:
parser - A parser, not null.
sink - the sink to receive the events.
- Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
MacroExecutionException - if there's a problem executing a macro
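The override pattern described for handleStartTag (and, symmetrically, handleEndTag) can be sketched roughly as follows. This is a simplified model, not the Doxia API: BaseParser, ExtendedParser, the String-based tag argument, the recorded event strings and the extra "note" tag are all hypothetical stand-ins, and how unrecognized tags are really treated is an assumption. Only the idea that the base method returns a boolean and that a subclass handles its own tags before falling back to it comes from the descriptions above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in types only: this models the dispatch pattern, not the Doxia classes.
class StartTagDispatchSketch {

    static class BaseParser {
        // A small subset of the body-level tags listed for baseStartTag.
        private static final Set<String> COMMON_TAGS =
                new HashSet<>(Arrays.asList("p", "pre", "ul", "ol", "li"));

        final List<String> events = new ArrayList<>();

        // Returns true if the tag was recognized and handled here.
        boolean baseStartTag(String tag) {
            if (COMMON_TAGS.contains(tag)) {
                events.add("base:" + tag);
                return true;
            }
            return false;
        }

        // Default behaviour: delegate to the shared handler and do nothing else.
        void handleStartTag(String tag) {
            baseStartTag(tag);
        }
    }

    static class ExtendedParser extends BaseParser {
        @Override
        void handleStartTag(String tag) {
            if ("note".equals(tag)) {
                // Hypothetical format-specific tag handled by the subclass.
                events.add("custom:" + tag);
            } else if (!baseStartTag(tag)) {
                // Assumption: anything the base handler rejects is reported as unknown.
                events.add("unknown:" + tag);
            }
        }
    }

    public static void main(String[] args) {
        ExtendedParser parser = new ExtendedParser();
        for (String tag : new String[] {"p", "note", "blink"}) {
            parser.handleStartTag(tag);
        }
        System.out.println(parser.events);
    }
}
```

The boolean return value is what makes the fallback possible: the subclass only needs its own branch for tags the shared handler does not cover.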
-
handleEndTag
protected void handleEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException, MacroExecutionException
Goes through the possible end tags. This implementation just calls baseEndTag(XmlPullParser,Sink); implementing parsers should override it to include additional tags.
- Specified by:
handleEndTag in class AbstractXmlParser
- Parameters:
parser - A parser, not null.
sink - the sink to receive the events.
- Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
MacroExecutionException - if there's a problem executing a macro
-
handleText
protected void handleText(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
Handles text events.
This is a default implementation: if the parser points to a non-empty text element, it is emitted as a text event into the specified sink.
- Overrides:
handleText in class AbstractXmlParser
- Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
- Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
-
handleComment
protected void handleComment(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
Handles comments.
This is a default implementation: all data are emitted as comment events into the specified sink.
- Overrides:
handleComment in class AbstractXmlParser
- Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
- Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
-
handleCdsect
protected void handleCdsect(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
Handles CDATA sections.
This is a default implementation: all data are emitted as text events into the specified sink.
- Overrides:
handleCdsect in class AbstractXmlParser
- Parameters:
parser - A parser, not null.
sink - the sink to receive the events. Not null.
- Throws:
org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
-
consecutiveSections
Make sure sections are nested consecutively.
HTML doesn't have any sections, only sectionTitles (<h2> etc.), which means we have to open or close any sections that are missing in between.
For instance, if the following sequence is parsed:
<h3></h3> <h6></h6>
we have to insert two section starts before we open the <h6>. In the following sequence
<h6></h6> <h3></h3>
we have to close two sections before we open the <h3>.
The current level is set to newLevel afterwards.
- Parameters:
newLevel - the new section level, all upper levels have to be closed.
sink - the sink to receive the events.
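The first sequence above (<h3> followed by <h6>) can be traced with a small stand-alone model. It is a sketch under assumptions, not the class's actual code: it assumes that <h2> corresponds to section level 1 (so <h3> is level 2 and <h6> is level 5), that every open level at or above the new one is closed first, and that the caller opens the section at the new level itself. The SectionNestingSketch class, its heading helper and the event strings are hypothetical; a list of strings stands in for the sink.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of consecutive section nesting; events are recorded as strings.
class SectionNestingSketch {
    private int sectionLevel = 0;
    final List<String> events = new ArrayList<>();

    // Close what must be closed, open what is missing, then record the new level.
    void consecutiveSections(int newLevel) {
        closeOpenSections(newLevel);
        openMissingSections(newLevel);
        sectionLevel = newLevel;
    }

    // Assumption: every open section at or above newLevel is closed.
    private void closeOpenSections(int newLevel) {
        while (sectionLevel > 0 && sectionLevel >= newLevel) {
            events.add("section" + sectionLevel + "_");
            sectionLevel--;
        }
    }

    // Open the intermediate levels below newLevel that have no heading of their own.
    private void openMissingSections(int newLevel) {
        while (sectionLevel < newLevel - 1) {
            sectionLevel++;
            events.add("section" + sectionLevel);
        }
    }

    // Hypothetical helper standing in for the handling of a heading start tag.
    void heading(int level) {
        consecutiveSections(level);
        events.add("section" + level);
    }

    public static void main(String[] args) {
        SectionNestingSketch sketch = new SectionNestingSketch();
        sketch.heading(2); // <h3>, assuming <h2> is level 1
        sketch.heading(5); // <h6>
        System.out.println(sketch.events);
    }
}
```

In this model the step from level 2 to level 5 inserts the starts for levels 3 and 4 before the level 5 section is opened, which is the "two section starts" case described above.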
-
closeOpenSections
Close open sections.
- Parameters:
newLevel - the new section level, all upper levels have to be closed.
sink - the sink to receive the events.
-
openMissingSections
Open missing sections.
- Parameters:
newLevel - the new section level, all lower levels have to be opened.
sink - the sink to receive the events.
-
getSectionLevel
protected int getSectionLevel()
Return the current section level.
- Returns:
- the current section level.
-
setSectionLevel
protected void setSectionLevel(int newLevel)
Set the current section level.
- Parameters:
newLevel - the new section level.
-
verbatim_
protected void verbatim_()
Stop verbatim mode.
-
verbatim
protected void verbatim()
Start verbatim mode.
-
isVerbatim
protected boolean isVerbatim()
Checks if we are currently inside a <pre> tag.
- Returns:
- true if we are currently in verbatim mode.
-
isScriptBlock
protected boolean isScriptBlock()
Checks if we are currently inside a <script> tag.
- Returns:
- true if we are currently inside <script> tags.
- Since:
- 1.1.1
-
validAnchor
Checks if the given id is a valid Doxia id and if not, returns a transformed one.
- Parameters:
id - The id to validate.
- Returns:
- A transformed id or the original id if it was already valid.
- See Also:
-
init
protected void init()
Initialize the parser. This is called first by AbstractParser.parse(java.io.Reader, org.apache.maven.doxia.sink.Sink) and can be used to reset the parser to a clean state so it can be reused.
- Overrides:
init in class AbstractParser
-
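handleAEnd
private void handleAEnd(Sink sink)
-
handleAStart
private void handleAStart(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink, SinkEventAttributeSet attribs)
-
handleDivStart
private boolean handleDivStart(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, SinkEventAttributeSet attribs, Sink sink)
-
handleFigureCaptionEnd
private void handleFigureCaptionEnd(Sink sink)
-
handleFigureCaptionStart
private void handleFigureCaptionStart(Sink sink, SinkEventAttributeSet attribs)
-
handleImgStart
private void handleImgStart(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink, SinkEventAttributeSet attribs)
-
handleLIStart
private void handleLIStart(Sink sink, SinkEventAttributeSet attribs)
-
handleListItemEnd
private void handleListItemEnd(Sink sink)
-
handleOLStart
private void handleOLStart(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink, SinkEventAttributeSet attribs)
-
handlePStart
private void handlePStart(Sink sink, SinkEventAttributeSet attribs)
-
handlePreStart
private void handlePreStart(SinkEventAttributeSet attribs, Sink sink)
-
handleSectionStart
private void handleSectionStart(Sink sink, int level, SinkEventAttributeSet attribs)
-
handleTableStart
private void handleTableStart(Sink sink, SinkEventAttributeSet attribs, org.codehaus.plexus.util.xml.pull.XmlPullParser parser)
-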
logMessage
If debug mode is enabled, log the msg as is, otherwise add unique msg in warnMessages.
- Parameters:
key - not null
msg - not null
- Since:
- 1.1.1
- See Also:
-
logWarnings
private void logWarnings()
- Since:
- 1.1.1
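The interplay between logMessage, logWarnings and the warnMessages map might look roughly like the following sketch. It is only an illustration: the class name WarnMessageBufferSketch, the debugEnabled flag, the output list standing in for a logger, the String element type of the Set and the behaviour of logWarnings (flushing each buffered message once, then clearing the map) are all assumptions, since the page documents only the logMessage contract and the shape of the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of warn-message deduplication; not the Doxia implementation.
class WarnMessageBufferSketch {
    private final boolean debugEnabled;
    // Key describes the error type; the Set keeps each distinct message once.
    private final Map<String, Set<String>> warnMessages = new LinkedHashMap<>();
    // Stand-in for a real logger so the behaviour can be inspected.
    final List<String> output = new ArrayList<>();

    WarnMessageBufferSketch(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // In debug mode log the message as is, otherwise buffer it without duplicates.
    void logMessage(String key, String msg) {
        if (debugEnabled) {
            output.add("DEBUG " + msg);
            return;
        }
        warnMessages.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(msg);
    }

    // Assumption: emit every buffered message once, then reset the buffer.
    void logWarnings() {
        for (Set<String> messages : warnMessages.values()) {
            for (String msg : messages) {
                output.add("WARN " + msg);
            }
        }
        warnMessages.clear();
    }

    public static void main(String[] args) {
        WarnMessageBufferSketch buffer = new WarnMessageBufferSketch(false);
        buffer.logMessage("missingAlt", "img without alt");
        buffer.logMessage("missingAlt", "img without alt"); // duplicate, stored once
        buffer.logWarnings();
        System.out.println(buffer.output);
    }
}
```

Keying the map by error type and storing messages in a Set is what keeps a warning that recurs throughout a document from being reported more than once.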
-