Package | Description |
---|---|
org.htmlparser |
The basic API classes which will be used by most developers when working with
the HTML Parser.
|
org.htmlparser.filters |
The filters package contains example filters to select only desired nodes.
|
org.htmlparser.lexer |
The lexer package is the base level I/O subsystem.
|
org.htmlparser.nodes |
The nodes package has the concrete node implementations.
|
org.htmlparser.parserapplications.filterbuilder | |
org.htmlparser.parserapplications.filterbuilder.wrappers | |
org.htmlparser.sax |
The sax package implements a SAX (Simple API for XML) parser for HTML.
|
org.htmlparser.scanners |
The scanners package contains classes responsible for the tertiary
identification of tags.
|
org.htmlparser.tags |
The tags package contains specific tags.
|
org.htmlparser.util |
Code that can be reused by many classes is located in this package.
|
org.htmlparser.visitors |
The visitors package contains classes that use the Visitor pattern.
|
Modifier and Type | Interface and Description |
---|---|
interface |
Remark
This interface represents a comment in the HTML document.
|
interface |
Tag
This interface represents a tag (<xxx yyy="zzz">) in the HTML document.
|
interface |
Text
This interface represents a piece of the content of the HTML document.
|
Modifier and Type | Method and Description |
---|---|
Node |
Node.getFirstChild()
Get the first child of this node.
|
Node |
Node.getLastChild()
Get the last child of this node.
|
Node |
Node.getNextSibling()
Get the next sibling to this node.
|
Node |
Node.getParent()
Get the parent of this node.
|
Node |
Node.getPreviousSibling()
Get the previous sibling to this node.
|
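The navigation methods above suggest that a node's children can be enumerated by starting at the first child and following next-sibling links. The sketch below illustrates that idea with a hypothetical stand-in class `N`; it is not the org.htmlparser `Node` interface. It assumes that a missing child or sibling is reported as `null`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration only; not part of org.htmlparser.
final class SiblingSketch {
    static final class N {
        final String name;
        N parent;       // analogous to getParent()
        N firstChild;   // analogous to getFirstChild()
        N nextSibling;  // analogous to getNextSibling()

        N(String name) { this.name = name; }

        // Append a child at the end of the sibling chain and set its parent.
        N add(N child) {
            child.parent = this;
            if (firstChild == null) {
                firstChild = child;
            } else {
                N last = firstChild;
                while (last.nextSibling != null) last = last.nextSibling;
                last.nextSibling = child;
            }
            return this;
        }
    }

    // Collect child names by walking first child, then next sibling, until null.
    static List<String> childNames(N node) {
        List<String> names = new ArrayList<>();
        for (N c = node.firstChild; c != null; c = c.nextSibling) {
            names.add(c.name);
        }
        return names;
    }
}
```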
Modifier and Type | Method and Description |
---|---|
boolean |
NodeFilter.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
void |
Node.setParent(Node node)
Sets the parent of this node.
|
Modifier and Type | Field and Description |
---|---|
protected Node |
IsEqualFilter.mNode
The node to match.
|
Modifier and Type | Method and Description |
---|---|
boolean |
LinkRegexFilter.accept(Node node)
Accept nodes that are a LinkTag and have a URL
that matches the regex pattern supplied in the constructor.
|
boolean |
HasSiblingFilter.accept(Node node)
Accept tags with a sibling acceptable to the filter.
|
boolean |
OrFilter.accept(Node node)
Accept nodes that are acceptable to any of its predicate filters.
|
boolean |
XorFilter.accept(Node node)
Accept nodes that are acceptable to an odd number of its predicate filters.
|
boolean |
TagNameFilter.accept(Node node)
Accept nodes that are tags and have a matching tag name.
|
boolean |
HasAttributeFilter.accept(Node node)
Accept tags with a certain attribute.
|
boolean |
RegexFilter.accept(Node node)
Accept string nodes that match the regular expression.
|
boolean |
StringFilter.accept(Node node)
Accept string nodes that contain the string.
|
boolean |
HasChildFilter.accept(Node node)
Accept tags with children acceptable to the filter.
|
boolean |
IsEqualFilter.accept(Node node)
Accept the node.
|
boolean |
NodeClassFilter.accept(Node node)
Accept nodes that are assignable from the class provided in
the constructor.
|
boolean |
LinkStringFilter.accept(Node node)
Accept nodes that are a LinkTag and
have a URL that matches the pattern supplied in the constructor.
|
boolean |
AndFilter.accept(Node node)
Accept nodes that are acceptable to all of its predicate filters.
|
boolean |
HasParentFilter.accept(Node node)
Accept tags with parent acceptable to the filter.
|
boolean |
NotFilter.accept(Node node)
Accept nodes that are not acceptable to the predicate filter.
|
boolean |
CssSelectorNodeFilter.accept(Node node)
Accept nodes that match the selector expression.
|
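The descriptions of AndFilter, OrFilter, XorFilter and NotFilter above amount to boolean combinators over an `accept` predicate. The following is a minimal sketch of those semantics, not the library's implementation: `FilterSketch` and its nested `Filter` interface are hypothetical stand-ins, and a `String` stands in for a `Node`.

```java
// Hypothetical stand-ins for illustration only; these are NOT the org.htmlparser classes.
final class FilterSketch {
    // Mirrors the shape of NodeFilter.accept(Node), with String standing in for Node.
    interface Filter {
        boolean accept(String node);
    }

    // Accepts when every predicate accepts (as described for AndFilter).
    static Filter and(Filter... filters) {
        return n -> {
            for (Filter f : filters) if (!f.accept(n)) return false;
            return true;
        };
    }

    // Accepts when at least one predicate accepts (as described for OrFilter).
    static Filter or(Filter... filters) {
        return n -> {
            for (Filter f : filters) if (f.accept(n)) return true;
            return false;
        };
    }

    // Accepts when an odd number of predicates accept (as described for XorFilter).
    static Filter xor(Filter... filters) {
        return n -> {
            int accepted = 0;
            for (Filter f : filters) if (f.accept(n)) accepted++;
            return accepted % 2 == 1;
        };
    }

    // Accepts when the predicate rejects (as described for NotFilter).
    static Filter not(Filter filter) {
        return n -> !filter.accept(n);
    }
}
```

With two predicates, `xor` behaves like ordinary exclusive-or; the "odd number" wording only matters when three or more predicates are combined.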
Constructor and Description |
---|
IsEqualFilter(Node node)
Creates a new IsEqualFilter that accepts only the node provided.
|
Modifier and Type | Method and Description |
---|---|
protected Node |
Lexer.makeRemark(int start,
int end)
Create a remark node based on the current cursor and the one provided.
|
protected Node |
Lexer.makeString(int start,
int end)
Create a string node based on the current cursor and the one provided.
|
protected Node |
Lexer.makeTag(int start,
int end,
java.util.Vector attributes)
Create a tag node based on the current cursor and the one provided.
|
Node |
Lexer.nextNode()
Get the next node from the source.
|
Node |
Lexer.nextNode(boolean quotesmart)
Get the next node from the source.
|
Node |
Lexer.parseCDATA()
Return CDATA as a text node.
|
Node |
Lexer.parseCDATA(boolean quotesmart)
Return CDATA as a text node.
|
protected Node |
Lexer.parseJsp(int start)
Parse a java server page node.
|
protected Node |
Lexer.parsePI(int start)
Parse an XML processing instruction.
|
protected Node |
Lexer.parseRemark(int start,
boolean quotesmart)
Parse a comment.
|
protected Node |
Lexer.parseString(int start,
boolean quotesmart)
Parse a string node.
|
protected Node |
Lexer.parseTag(int start)
Parse a tag.
|
Modifier and Type | Class and Description |
---|---|
class |
AbstractNode
The concrete base class for all types of nodes (tags, text, remarks).
|
class |
RemarkNode
The remark tag is identified and represented by this class.
|
class |
TagNode
TagNode represents a generic tag.
|
class |
TextNode
Normal text in the HTML document is represented by this class.
|
Modifier and Type | Field and Description |
---|---|
protected Node |
AbstractNode.parent
The parent of this node.
|
Modifier and Type | Method and Description |
---|---|
Node |
AbstractNode.getFirstChild()
Get the first child of this node.
|
Node |
AbstractNode.getLastChild()
Get the last child of this node.
|
Node |
AbstractNode.getNextSibling()
Get the next sibling to this node.
|
Node |
AbstractNode.getParent()
Get the parent of this node.
|
Node |
AbstractNode.getPreviousSibling()
Get the previous sibling to this node.
|
Modifier and Type | Method and Description |
---|---|
void |
AbstractNode.setParent(Node node)
Sets the parent of this node.
|
Modifier and Type | Field and Description |
---|---|
protected Node |
HtmlTreeModel.mRoot
The root Node.
|
Modifier and Type | Method and Description |
---|---|
boolean |
AndFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
OrFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
StringFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
NodeClassFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
HasChildFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
HasSiblingFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
HasParentFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
TagNameFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
RegexFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
HasAttributeFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
NotFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
protected void |
HasAttributeFilterWrapper.addAttributes(java.util.Set set,
Node node)
Add the attribute names from the node to the set of attribute names.
|
protected void |
HasAttributeFilterWrapper.addAttributeValues(java.util.Set set,
Node node)
Add the attribute values from the node to the set of attribute values.
|
protected void |
TagNameFilterWrapper.addName(java.util.Set set,
Node node)
Add the tag name and its children's tag names to the set of tag names.
|
Modifier and Type | Method and Description |
---|---|
protected void |
XMLReader.doSAX(Node node)
Process nodes recursively on the DocumentHandler.
|
Modifier and Type | Method and Description |
---|---|
protected void |
CompositeTagScanner.addChild(Tag parent,
Node child)
Add a child to the given tag.
|
Modifier and Type | Class and Description |
---|---|
class |
AppletTag
AppletTag represents an <Applet> tag.
|
class |
BaseHrefTag
BaseHrefTag represents a <Base> tag.
|
class |
BodyTag
A Body Tag.
|
class |
Bullet
A bullet tag.
|
class |
BulletList
A bullet list tag.
|
class |
CompositeTag
The base class for tags that have an end tag.
|
class |
DefinitionList
A definition list tag (dl).
|
class |
DefinitionListBullet
A definition list bullet tag (either DD or DT).
|
class |
Div
A div tag.
|
class |
DoctypeTag
The HTML Document Declaration Tag can identify <!DOCTYPE> tags.
|
class |
FormTag
Represents a FORM tag.
|
class |
FrameSetTag
Identifies a frame set tag.
|
class |
FrameTag
Identifies a frame tag.
|
class |
HeadingTag
A heading (h1 - h6) tag.
|
class |
HeadTag
A head tag.
|
class |
Html
An html tag.
|
class |
ImageTag
Identifies an image tag.
|
class |
InputTag
An input tag in a form.
|
class |
JspTag
The JSP/ASP tags like <%...%> can be identified by this class.
|
class |
LabelTag
A label tag.
|
class |
LinkTag
Identifies a link tag.
|
class |
MetaTag
A meta tag.
|
class |
ObjectTag
ObjectTag represents an <Object> tag.
|
class |
OptionTag
An option tag within a form.
|
class |
ParagraphTag
A paragraph (p) tag.
|
class |
ProcessingInstructionTag
The XML processing instructions like <?xml ...
|
class |
ScriptTag
A script tag.
|
class |
SelectTag
A select tag within a form.
|
class |
Span
A span tag.
|
class |
StyleTag
A StyleTag represents a <style> tag.
|
class |
TableColumn
A table column tag.
|
class |
TableHeader
A table header tag.
|
class |
TableRow
A table row tag.
|
class |
TableTag
A table tag.
|
class |
TextareaTag
A text area tag within a form.
|
class |
TitleTag
A title tag.
|
Modifier and Type | Method and Description |
---|---|
Node |
CompositeTag.childAt(int index)
Get the child at the given index.
|
Node |
CompositeTag.getChild(int index)
Get the child of this node at the given position.
|
Node[] |
CompositeTag.getChildrenAsNodeArray()
Get the children as an array of Node objects.
|
Modifier and Type | Method and Description |
---|---|
int |
CompositeTag.findPositionOf(Node searchNode)
Returns the node number of a child node given the node object.
|
Modifier and Type | Field and Description |
---|---|
protected Node |
NodeTreeWalker.mCurrentNode
The current Node element, which will be a child of the root Node, or null.
|
protected Node |
NodeTreeWalker.mNextNode
The next Node element after the current Node element.
|
protected Node |
NodeTreeWalker.mRootNode
The root Node element which defines the scope of the current tree to walk.
|
Modifier and Type | Method and Description |
---|---|
Node |
NodeList.elementAt(int i) |
static Node[] |
ParserUtils.findTypeInNode(Node node,
java.lang.Class type)
Search given node and pick up any objects of given type.
|
Node |
NodeTreeWalker.getCurrentNode()
Get the Node in the tree that the NodeTreeWalker is currently at.
|
protected Node |
NodeTreeWalker.getNextNodeBreadthFirst()
Traverses to the next Node from the current Node using breadth-first tree traversal.
|
protected Node |
NodeTreeWalker.getNextNodeDepthFirst()
Traverses to the next Node from the current Node using depth-first tree traversal.
|
Node |
NodeTreeWalker.getRootNode()
Get the root Node that defines the scope of the tree to traverse.
|
Node |
NodeTreeWalker.nextNode()
Traverses to the next Node from the current Node, using either depth-first or breadth-first tree traversal as appropriate.
|
Node |
NodeIterator.nextNode()
Get the next node.
|
Node |
IteratorImpl.nextNode()
Get the next node.
|
Node |
SimpleNodeIterator.nextNode()
Get the next node.
|
Node |
NodeList.remove(int index)
Remove the node at index.
|
Node[] |
NodeList.toNodeArray() |
Modifier and Type | Method and Description |
---|---|
void |
NodeList.add(Node node) |
boolean |
NodeList.contains(Node node)
Check to see if the NodeList contains the supplied Node.
|
void |
NodeList.copyToNodeArray(Node[] array) |
static Node[] |
ParserUtils.findTypeInNode(Node node,
java.lang.Class type)
Search given node and pick up any objects of given type.
|
int |
NodeList.indexOf(Node node)
Finds the index of the supplied Node.
|
protected void |
NodeTreeWalker.initRootNode(Node rootNode)
Sets the root Node to be the given Node.
|
void |
NodeList.prepend(Node node)
Insert the given node at the head of the list.
|
boolean |
NodeList.remove(Node node)
Remove the supplied Node from the list.
|
void |
NodeTreeWalker.setRootNode(Node rootNode)
Sets the specified Node as the root Node.
|
Constructor and Description |
---|
NodeList(Node node)
Create a one element node list.
|
NodeTreeWalker(Node rootNode)
Creates a new instance of NodeTreeWalker using depth-first tree traversal, without limits on how deep it may traverse.
|
NodeTreeWalker(Node rootNode,
boolean depthFirst)
Creates a new instance of NodeTreeWalker using the specified type of tree traversal, without limits on how deep it may traverse.
|
NodeTreeWalker(Node rootNode,
boolean depthFirst,
int maxDepth)
Creates a new instance of NodeTreeWalker using the specified type of tree traversal and maximum depth from the root Node to traverse.
|
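The NodeTreeWalker constructors above choose between depth-first and breadth-first traversal and optionally cap the depth. The sketch below shows how the two orders differ on a small tree. It is only an illustration under stated assumptions: `WalkSketch` and `N` are hypothetical stand-ins rather than library classes, the root itself is not visited (the field description says the current Node is a child of the root), and "no limit" is expressed here with `Integer.MAX_VALUE`, which is a choice made for this sketch and not the library's convention.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the org.htmlparser classes.
final class WalkSketch {
    static final class N {
        final String name;
        final List<N> children = new ArrayList<>();

        N(String name, N... kids) {
            this.name = name;
            for (N k : kids) children.add(k);
        }
    }

    // Returns the names of the root's descendants in visiting order.
    // depthFirst selects the traversal type; maxDepth is measured from the root
    // (direct children of the root are at depth 1).
    static List<String> walk(N root, boolean depthFirst, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Deque<N> pending = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        pending.addLast(root);
        depths.addLast(0);
        while (!pending.isEmpty()) {
            // Depth-first takes from the tail (stack); breadth-first from the head (queue).
            N node = depthFirst ? pending.pollLast() : pending.pollFirst();
            int depth = depthFirst ? depths.pollLast() : depths.pollFirst();
            if (depth > 0) visited.add(node.name); // skip the root itself
            if (depth == maxDepth) continue;       // do not descend past the limit
            List<N> kids = node.children;
            if (depthFirst) {
                // Push in reverse so the leftmost child is popped first.
                for (int i = kids.size() - 1; i >= 0; i--) {
                    pending.addLast(kids.get(i));
                    depths.addLast(depth + 1);
                }
            } else {
                for (N k : kids) {
                    pending.addLast(k);
                    depths.addLast(depth + 1);
                }
            }
        }
        return visited;
    }
}
```

For a root with children `a` (holding `c` and `d`) and `b` (holding `e`), the depth-first order is a, c, d, b, e and the breadth-first order is a, b, c, d, e; with a maximum depth of 1 both orders reduce to a, b.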
Modifier and Type | Method and Description |
---|---|
Node[] |
ObjectFindingVisitor.getTags() |
Node[] |
TagFindingVisitor.getTags(int index) |
HTML Parser is an open source library released under LGPL.