According to Michael Kay's "XSLT Programmer's Reference" (page 736),
a predicate is "An expression used to filter which nodes are
selected by a particular step in a path expression, or to select a subset of
the nodes in a node-set. A Boolean expression selects the nodes for which the
predicate is true; a numeric expression selects the node at the position
given by the value of the expression, for example '[1]' selects the first
node." Note that a predicate containing a Boolean expression can
select zero, one or more nodes, while a predicate containing a numeric
expression can select at most one node from the node-set it is applied to.
I'll list a few examples that I can refer back to later on in this
document. All examples will use this XML document:
<?xml version="1.0"?>
<doc>
  <foo location="Drumcondra">
    <bar name="Cat and Cage"/>
    <bar name="Fagan's"/>
    <bar name="Gravedigger's"/>
    <bar name="Ivy House"/>
  </foo>
  <foo location="Town">
    <bar name="Peter's Pub"/>
    <bar name="Grogan's"/>
    <bar name="Hogans's"/>
    <bar name="Brogan's"/>
  </foo>
</doc>
Here are some examples of predicates with boolean expressions:
<xsl:for-each select="//bar[contains(@name,'ogan')]">
<xsl:for-each select="//bar[parent::*/@location = 'Drumcondra']">
<xsl:for-each select="//bar[@name = 'Cat and Cage']">
The first two select more than one node, while the last selects only one.
The last expression could select more nodes if the input document were
different. Now, here are a few examples of predicates with numeric
expressions:
<xsl:value-of select="//bar[1]">
<xsl:value-of select="/doc/foo[2]/bar[1]">
<xsl:value-of select="/doc/foo[2]/bar">
The last expression will return more than one node, but the step that
contains the predicate returns only one (the second <foo>
element).
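The node counts claimed above can be checked with a small program. The following is a minimal sketch that uses the JDK's javax.xml.xpath API rather than XSLTC itself; the class name PredicateCounts and the helper count() are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PredicateCounts {
    // The sample document from this section (whitespace removed).
    static final String XML =
        "<doc>"
      + "<foo location=\"Drumcondra\">"
      + "<bar name=\"Cat and Cage\"/><bar name=\"Fagan's\"/>"
      + "<bar name=\"Gravedigger's\"/><bar name=\"Ivy House\"/>"
      + "</foo>"
      + "<foo location=\"Town\">"
      + "<bar name=\"Peter's Pub\"/><bar name=\"Grogan's\"/>"
      + "<bar name=\"Hogans's\"/><bar name=\"Brogan's\"/>"
      + "</foo>"
      + "</doc>";

    // Hypothetical helper: number of nodes selected by a path expression.
    static int count(String path) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath().evaluate(
                "count(" + path + ")",
                new InputSource(new StringReader(XML)),
                XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Boolean predicates: zero, one or more nodes.
        System.out.println(count("//bar[contains(@name,'ogan')]"));               // 3
        System.out.println(count("//bar[parent::*/@location = 'Drumcondra']"));   // 4
        System.out.println(count("//bar[@name = 'Cat and Cage']"));               // 1
        // Numeric predicates: at most one node per step they are applied to.
        System.out.println(count("/doc/foo[2]/bar[1]"));                          // 1
        System.out.println(count("/doc/foo[2]/bar"));                             // 4
    }
}
```

The count() wrapper is only a convenience for this check; a stylesheet would of course use the select expressions directly.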
The above are the basic types of predicates. These can be chained to create
a predicate pipeline, where the first predicate reduces the node-set that the
second predicate filters, and so on. Here are some examples:
A: <xsl:for-each select="/descendant::bar[contains(@name,'ogan')][2]">
B: <xsl:for-each select="/descendant::bar[2][contains(@name,'ogan')]">
C: <xsl:for-each select="/descendant::bar[position() > 3][2]">
It is easier to figure out which nodes these expressions should return if
one goes through the steps and predicates one by one. In expression
A we first get all <bar> elements from the
whole document. Then the first predicate selects from that node-set only
those elements that have a @name attribute that contains
"ogan", and we're left with these elements:
<bar name="Grogan's">
<bar name="Hogans's">
<bar name="Brogan's">
And finally, the last predicate selects the second of those
elements:
<bar name="Hogans's">
Expression B contains the same predicates as A, but in the opposite order,
and the resulting node-set is completely different. We start off with the same
set of <bar> elements, but we apply the
"[2]" predicate first, and end up with this
element:
<bar name="Fagan's">
Fagan's is the bar where the Irish Taoiseach (prime minister) drinks his
pints, but its name does not contain the string "ogan",
so the resulting node-set is empty.
The third expression also starts off with all <bar>
elements, applies the predicate "[position() > 3]",
and reduces the node-set to these:
<bar name="Ivy House">
<bar name="Peter's Pub">
<bar name="Grogan's">
<bar name="Hogans's">
<bar name="Brogan's">
The last predicate "[2]" is applied to this node-set
and the set is further reduced to:
<bar name="Peter's Pub">
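The three walk-throughs can be reproduced with a standard XPath engine. The sketch below again relies on the JDK's javax.xml.xpath API, not on XSLTC, and the class PredicateChains and its names() helper are invented for the illustration. It spells the path as /descendant::bar, which applies each predicate to all <bar> elements in document order, exactly as the walk-through does. The abbreviated //bar expands to /descendant-or-self::node()/child::bar, so its positional predicates are evaluated separately for the children of each parent; the last line of main() shows the difference.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateChains {
    // The sample document from this section (whitespace removed).
    static final String XML =
        "<doc>"
      + "<foo location=\"Drumcondra\">"
      + "<bar name=\"Cat and Cage\"/><bar name=\"Fagan's\"/>"
      + "<bar name=\"Gravedigger's\"/><bar name=\"Ivy House\"/>"
      + "</foo>"
      + "<foo location=\"Town\">"
      + "<bar name=\"Peter's Pub\"/><bar name=\"Grogan's\"/>"
      + "<bar name=\"Hogans's\"/><bar name=\"Brogan's\"/>"
      + "</foo>"
      + "</doc>";

    // Hypothetical helper: the @name of every element the path selects.
    static List<String> names(String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                path, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Filter first, then take the second survivor.
        System.out.println(names("/descendant::bar[contains(@name,'ogan')][2]")); // [Hogans's]
        // Take the second <bar> first, then filter it.
        System.out.println(names("/descendant::bar[2][contains(@name,'ogan')]")); // []
        // Drop the first three, then take the second of the rest.
        System.out.println(names("/descendant::bar[position() > 3][2]"));         // [Peter's Pub]
        // With //bar the position is counted per parent element instead.
        System.out.println(names("//bar[2][contains(@name,'ogan')]"));            // [Grogan's]
    }
}
```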
From the examples in the previous section we can try to categorize predicate
chains/pipelines to simplify our implementation. We can speed up processing
significantly if we can avoid using a data structure (iterator) to represent
the intermediate step between predicates. The goal of setting up these
categories is to pinpoint the cases where an intermediate iterator has
to be used and those where it can be avoided.
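To make the idea concrete, here is a rough sketch of one case where the intermediate node-set is not needed: a boolean predicate followed by a constant position, as in "[contains(@name,'ogan')][2]". It illustrates the general technique only and is not taken from the XSLTC sources; nodes are modelled as plain strings, and the class NthMatch and both method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class NthMatch {
    // Straightforward version: materialise the filtered node-set,
    // then pick the n-th entry (1-based, as in XPath).
    static String viaIntermediateList(List<String> nodes, Predicate<String> filter, int n) {
        List<String> matches = new ArrayList<>();
        for (String node : nodes) {
            if (filter.test(node)) {
                matches.add(node);
            }
        }
        return (n >= 1 && n <= matches.size()) ? matches.get(n - 1) : null;
    }

    // Fused version: count matches on the fly and stop at the n-th one,
    // so no intermediate collection is ever built.
    static String viaCounter(List<String> nodes, Predicate<String> filter, int n) {
        int seen = 0;
        for (String node : nodes) {
            if (filter.test(node) && ++seen == n) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> bars = List.of("Cat and Cage", "Fagan's", "Gravedigger's", "Ivy House",
                                    "Peter's Pub", "Grogan's", "Hogans's", "Brogan's");
        Predicate<String> ogan = name -> name.contains("ogan");
        System.out.println(viaIntermediateList(bars, ogan, 2)); // Hogans's
        System.out.println(viaCounter(bars, ogan, 2));          // Hogans's
    }
}
```

The fused version works here because the second predicate depends only on the position within the filtered set. A predicate that needs last() could not be handled this way, since the size of the filtered set is unknown until the source is exhausted.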
Predicates are handled quite differently in step expressions and step
patterns. Step expressions are not implemented with the various contexts in
mind and use a specialised iterator to wrap the code for each predicate.
Step patterns are more complicated and more CPU (or should I say JVM?)
intensive. Step patterns containing predicates are analysed to determine
the context type and compiled accordingly.
Predicates and Step expressions
The basic behaviour for a predicate is to compile a filter. This
filter is an auxiliary class that implements the
org.apache.xalan.xsltc.dom.CurrentNodeListFilter interface. The
Step or StepPattern that uses the predicate will
create an org.apache.xalan.xsltc.dom.CurrentNodeListIterator. This
iterator contains the nodes that pass through the predicate. The compiled
filter is used by the iterator to determine which nodes should be
included. The org.apache.xalan.xsltc.dom.CurrentNodeListFilter
interface contains only a single method:
public interface CurrentNodeListFilter {
    public abstract boolean test(int node, int position, int last, int current,
                                 AbstractTranslet translet, NodeIterator iter);
}
The code that is compiled into the test() method is the
code for the predicate's expression. The Predicate class
compiles the filter class and a test() method skeleton, while
some sub-class of the Expression class compiles the actual
code that goes into this method.
The iterator is initialised with a filter that implements this interface:
public CurrentNodeListIterator(NodeIterator source,
                               CurrentNodeListFilter filter,
                               int currentNode,
                               AbstractTranslet translet) {
    this(source, !source.isReverse(), filter, currentNode, translet);
}
The iterator will use its source iterator to provide it with the initial
node-set. Each node that is returned from this set is passed through the
filter before it is returned by the next() method. Note that the
source iterator can also be a current node-list iterator (if two or more
predicates are chained together).
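The interplay between filter and iterator, including the chaining of one filtering iterator on top of another, can be sketched in a few lines. This is a simplified model, not the XSLTC classes: NodeFilter stands in for CurrentNodeListFilter (reduced to three arguments), filtered() stands in for CurrentNodeListIterator, and nodes are plain strings. The sketch buffers its source so that the value of "last" is known before any node is tested; that detail is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class FilterChainSketch {
    // Hypothetical stand-in for CurrentNodeListFilter: one method that
    // decides whether a node passes the predicate.
    interface NodeFilter {
        boolean test(String node, int position, int last);
    }

    // Hypothetical stand-in for CurrentNodeListIterator: reads every node
    // from the source, then returns only those that the filter accepts.
    static Iterator<String> filtered(Iterator<String> source, NodeFilter filter) {
        List<String> nodes = new ArrayList<>();
        source.forEachRemaining(nodes::add);
        int last = nodes.size();
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < last; i++) {
            // Positions are 1-based, as in XPath.
            if (filter.test(nodes.get(i), i + 1, last)) {
                kept.add(nodes.get(i));
            }
        }
        return kept.iterator();
    }

    // Chains one filtering iterator per predicate, each using the
    // previous iterator as its source.
    static List<String> run(List<String> nodes, NodeFilter... filters) {
        Iterator<String> it = nodes.iterator();
        for (NodeFilter f : filters) {
            it = filtered(it, f);
        }
        List<String> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> bars = List.of("Cat and Cage", "Fagan's", "Gravedigger's", "Ivy House",
                                    "Peter's Pub", "Grogan's", "Hogans's", "Brogan's");
        NodeFilter ogan = (node, pos, last) -> node.contains("ogan");
        NodeFilter second = (node, pos, last) -> pos == 2;
        NodeFilter afterThird = (node, pos, last) -> pos > 3;
        System.out.println(run(bars, ogan, second));       // [Hogans's]
        System.out.println(run(bars, second, ogan));       // []
        System.out.println(run(bars, afterThird, second)); // [Peter's Pub]
    }
}
```

Each lambda plays the role of a compiled test() method; in the second filter of a chain, the position refers to the node's place in the already reduced set, which is why the order of the predicates matters.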
Optimisations in Step expressions