The first thing that I'm going to do in building out the markup module's
class-structure is to figure out where all of the various members of those
classes originate, and at what point they are concrete. One of my priorities,
as mentioned before, is to keep as much similarity as possible between the
classes and their members in the markup module and the equivalent DOM objects
in typical JavaScript implementations on the client side.
Conforming to DOM Conventions
I can't really meet that goal, conforming to the interfaces of DOM elements (tags,
text-nodes, comments and CDATA sections), until I know what members they expose in
a browser context. To determine that, I wrote a chunk of JavaScript living in a
bare-bones HTML page (download below) that iterates over the list of properties
and methods listed on the w3schools.com site, checks an instance of each node-type
(except CDATA sections, more on that in a bit) for each property- and
method-member that might be available, and reports what comes back from that
check-process. If a given element did not report that it had the member, then the
equivalent class-member in the markup module could be skipped. If the check
returned an expected type, like a function for a method, that member should be
kept. Anything else that came back will require some additional discovery.
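The keep/skip/investigate rule described above can be sketched in Python. This is only an illustration, not the analysis page itself (which is JavaScript): the `REPORTED` sample, the `METHODS` set and the `classify` function are hypothetical names, and treating `string`, `number` and `boolean` as the "expected" results for a property is an assumption on my part.

```python
# Hypothetical sample of what the analysis page reports: member name ->
# the reported result for each node-type ('n/a' means "member not present").
REPORTED = {
    'accessKey': {'Comment': 'n/a', 'Tag': 'string', 'Text': 'n/a'},
    'cloneNode': {'Comment': 'function', 'Tag': 'function', 'Text': 'function'},
    'nodeValue': {'Comment': 'string', 'Tag': 'null', 'Text': 'string'},
}

# Members that the reference lists as methods (assumed, partial list).
METHODS = {'cloneNode'}

def classify(member, node_type, reported=REPORTED, methods=METHODS):
    """Returns 'skip', 'keep' or 'investigate' for one member of one node-type."""
    result = reported[member][node_type]
    if result == 'n/a':
        # The node did not report the member at all
        return 'skip'
    if member in methods:
        # A method is expected to come back as a function
        return 'keep' if result == 'function' else 'investigate'
    # Assumption: a property is "expected" when it reports a primitive type
    return 'keep' if result in ('string', 'number', 'boolean') else 'investigate'

for name in sorted(REPORTED):
    print(name, [classify(name, node) for node in ('Comment', 'Tag', 'Text')])
```

Anything that lands in the `investigate` bucket corresponds to the "additional discovery" mentioned above.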
I'd originally included CDATA sections in my collection of objects to examine,
but the browser that I ran the page against (Chromium) wouldn't actually
allow the creation of a CDATA section, even though it has a
document.createCDATASection method. Creation of CDATA sections is not supported
for HTML documents, according to the error-message I got back.
The closest to an actual CDATA that I could get was a comment that contained all
of the CDATA's original content, plus the [CDATA[ start and ]] end text. As a
result, I don't really know what a CDATA's members look like without doing more
digging around. For the time being, I'm willing to leave that be, though; the
Comment, Tag and Text classes will likely suffice for my needs.
The breakdown I got back from that analysis-script, with one column per markup
module equivalent class, was:
Member Name | Comment | Tag | Text |
---|---|---|---|
Property Members | | | |
accessKey | n/a | string | n/a |
attributes | n/a | object | n/a |
childElementCount | n/a | number | n/a |
childNodes | object | object | object |
children | n/a | object | n/a |
classList | n/a | object | n/a |
className | n/a | string | n/a |
clientHeight | n/a | number | n/a |
clientLeft | n/a | number | n/a |
clientTop | n/a | number | n/a |
clientWidth | n/a | number | n/a |
contentEditable | n/a | string | n/a |
dir | n/a | string | n/a |
firstChild | null | object | null |
firstElementChild | n/a | null | n/a |
id | n/a | string | n/a |
innerHTML | n/a | string | n/a |
isContentEditable | n/a | boolean | n/a |
lang | n/a | string | n/a |
lastChild | null | object | null |
lastElementChild | n/a | null | n/a |
namespaceURI | n/a | string | n/a |
nextElementSibling | null | null | null |
nextSibling | null | null | null |
nodeName | string | string | string |
nodeType | number | number | number |
nodeValue | string | null | string |
offsetHeight | n/a | number | n/a |
offsetLeft | n/a | number | n/a |
offsetParent | n/a | null | n/a |
offsetTop | n/a | number | n/a |
offsetWidth | n/a | number | n/a |
ownerDocument | object | object | object |
parentElement | null | null | object |
parentNode | null | null | object |
previousElementSibling | null | null | null |
previousSibling | null | null | null |
scrollHeight | n/a | number | n/a |
scrollLeft | n/a | number | n/a |
scrollTop | n/a | number | n/a |
scrollWidth | n/a | number | n/a |
style | n/a | object | n/a |
tabIndex | n/a | number | n/a |
tagName | n/a | string | n/a |
textContent | string | string | string |
title | n/a | string | n/a |
Method Members | | | |
addEventListener | function | function | function |
appendChild | function | function | function |
blur | n/a | function | n/a |
click | n/a | function | n/a |
cloneNode | function | function | function |
compareDocumentPosition | function | function | function |
contains | function | function | function |
focus | n/a | function | n/a |
getAttribute | n/a | function | n/a |
getAttributeNode | n/a | function | n/a |
getElementsByClassName | n/a | function | n/a |
getElementsByTagName | n/a | function | n/a |
getFeature | n/a | n/a | n/a |
hasAttribute | n/a | function | n/a |
hasAttributes | n/a | function | n/a |
hasChildNodes | function | function | function |
insertBefore | function | function | function |
isDefaultNamespace | function | function | function |
isEqualNode | function | function | function |
isSameNode | function | function | function |
isSupported | n/a | n/a | n/a |
nodelist.item | n/a | n/a | n/a |
normalize | function | function | function |
querySelector | n/a | function | n/a |
querySelectorAll | n/a | function | n/a |
removeAttribute | n/a | function | n/a |
removeAttributeNode | n/a | function | n/a |
removeChild | function | function | function |
removeEventListener | function | function | function |
replaceChild | function | function | function |
scrollIntoView | n/a | function | n/a |
setAttribute | n/a | function | n/a |
setAttributeNode | n/a | function | n/a |
toString | function | function | function |
Any keeper item from the table above that exists in all the class-types should
be required by the IsNode interface, at least as a default consideration. The
same consideration should also be given to any items that return the same values
across all the class-types, even if they haven't been flagged as a keeper. The
logic behind that statement boils down to the fact that while I checked each
node-type in the original JavaScript, I did not populate a large-enough node-
and element-sample in that script to feel confident that I captured every valid
low-level member. If possible, those same items should also have a concrete
implementation in the BaseNode abstract class. There will probably be a few
items that, even though they fall into that category, just don't make sense in
those locations, but I'll note those as I go along.
Defining the IsNode interface
Starting, then, with the items in the table that are keepers, or that returned identical values across all the different node-types, the following are either directly valid or need to be looked at in more detail for requirement in IsNode (again with one column per markup module equivalent class):
Member Name | Comment | Tag | Text |
---|---|---|---|
Property Members | | | |
childNodes | object | object | object |
nextElementSibling | null | null | null |
nextSibling | null | null | null |
nodeName | string | string | string |
nodeType | number | number | number |
nodeValue | string | null | string |
parentElement | null | null | object |
parentNode | null | null | object |
previousElementSibling | null | null | null |
previousSibling | null | null | null |
textContent | string | string | string |
Method Members | | | |
addEventListener | function | function | function |
appendChild | function | function | function |
cloneNode | function | function | function |
compareDocumentPosition | function | function | function |
contains | function | function | function |
hasChildNodes | function | function | function |
insertBefore | function | function | function |
isDefaultNamespace | function | function | function |
isEqualNode | function | function | function |
isSameNode | function | function | function |
normalize | function | function | function |
removeChild | function | function | function |
removeEventListener | function | function | function |
replaceChild | function | function | function |
toString | function | function | function |
While I was stripping down the list, I noticed that parentElement and parentNode
didn't get flagged in such a way as to be considered for inclusion in IsNode,
but it's a basic fact of markup-languages that all nodes should have those
properties. If they aren't populated, that simply means that the node doesn't
have a parent currently, but they might
well later after some manipulation. The nodeValue
property
Looking over that list of remaining members, there are already a few that don't
make any sense to include in IsNode:
- Any members that involve child nodes: Those are aspects of a Tag, certainly, but since Comment and Text will also derive from IsNode and they don't have child nodes (and can't?), those should go away. That removes:
  - The childNodes property;
  - The appendChild method;
  - The hasChildNodes method;
  - The insertBefore method;
  - The removeChild method; and
  - The replaceChild method;
- Any members relating to manipulation of event-listeners: On the server side, where all of the markup module's functionality is actually running, there is no browser context available, so no event-handling processes, and so none of these members are useful. That removes:
  - The addEventListener method; and
  - The removeEventListener method;
Implementing and Testing the Abstract Properties
Since IsNode is only an interface, there are no concrete implementations of
properties to define, only abstract property requirements that will be picked up
by derived classes. That makes the definition of those property requirements
very simple, and the testing of them pretty straightforward. The real trick is
determining where the concrete implementations of them are going to occur. Going
through the list of properties:
nextElementSibling
- Returns the next element at the same node tree level (w3schools)
- Abstract property in IsNode
- Implement in BaseNode
nextSibling
- Returns the next node at the same node tree level (w3schools)
- Abstract property in IsNode
- Implement in BaseNode
nodeName
- Returns the name of a node (w3schools)
- Returns the tag-name for Tags, and magic-string constants for other node-types (#comment for a Comment, #document for a document, #text for a Text object, and #cdata for a CDATA if the pattern is maintained; the name the DOM specification actually uses for CDATA sections is #cdata-section).
- Abstract property in IsNode
- Implement in CDATA, Comment, Tag and Text classes
nodeType
- Returns the node type of a node (w3schools)
- Returns 8 for Comments, 4 for CDATAs, 1 for Tags and 3 for Texts
- Abstract property in IsNode
- Implement in CDATA, Comment, Tag and Text classes
nodeValue
- Sets or returns the value of a node (w3schools)
- It appears that this property returns the first text-node child of an element, rather than the entire set of text-node values, at least in Chromium. At any rate, it's dependent on the presence of child nodes, so...
- Skip
- Implement in Tag
parentElement
- Returns the parent element node of an element (w3schools)
- Abstract property in IsNode
- Implement in BaseNode
parentNode
- Returns the parent node of an element (w3schools)
- Abstract property in IsNode
- Implement in BaseNode
previousElementSibling
- Returns the previous element at the same node tree level (w3schools)
- Abstract property in IsNode
- Implement in BaseNode
previousSibling
- Returns the previous node at the same node tree level (w3schools)
- Abstract property in IsNode
- Implement in BaseNode
textContent
- Sets or returns the textual content of a node and its descendants (w3schools)
- The return value is, essentially, a concatenation of all child Text nodes in a Tag, or the data value (the content) of a Comment or Text instance. If CDATA is assumed to behave like a Comment, then it would also return the inner content of the instance.
- Abstract property in IsNode
- Implement in HasTextData and Tag
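To make the nodeName, nodeType and textContent decisions above a little more concrete, here is a rough sketch of how the concrete classes might supply those values. It assumes Python 3, and the class bodies are hypothetical stand-ins rather than the markup module's actual code: HasTextData is reduced to a bare `data` holder, and the Tag constructor signature is invented for the example.

```python
class HasTextData:
    """Hypothetical stand-in: nodes whose text content is their own data."""
    def __init__(self, data=''):
        self.data = data

    @property
    def textContent(self):
        # Comment and Text nodes simply return their content
        return self.data


class Text(HasTextData):
    nodeName = '#text'
    nodeType = 3


class Comment(HasTextData):
    nodeName = '#comment'
    nodeType = 8


class Tag:
    nodeType = 1

    def __init__(self, tagName, *children):
        self.tagName = tagName
        self.childNodes = list(children)

    @property
    def nodeName(self):
        # A Tag reports its tag-name rather than a magic-string constant
        return self.tagName

    @property
    def textContent(self):
        # Concatenate the text of Text children and of nested Tags;
        # Comment children are deliberately left out
        return ''.join(
            child.textContent for child in self.childNodes
            if isinstance(child, (Text, Tag))
        )


paragraph = Tag('p', Text('Hello, '), Comment('hidden'), Tag('b', Text('world')))
print(paragraph.nodeName, paragraph.nodeType)  # p 1
print(paragraph.textContent)                   # Hello, world
```

The sketch follows descendants recursively, in line with the w3schools description ("a node and its descendants"); whether the real Tag class does the same is still an open design decision.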
The property-requirement code in IsNode is just a few lines:
#-----------------------------------#
# Abstract Properties #
#-----------------------------------#
nextElementSibling = abc.abstractproperty()
nextSibling = abc.abstractproperty()
nodeName = abc.abstractproperty()
nodeType = abc.abstractproperty()
parentElement = abc.abstractproperty()
parentNode = abc.abstractproperty()
previousElementSibling = abc.abstractproperty()
previousSibling = abc.abstractproperty()
textContent = abc.abstractproperty()
The test-methods for each property will follow this pattern:
def testPROPERTYNAME(self):
    """Unit-tests the PROPERTYNAME property of an IsNode instance."""
    try:
        testInstance = markup.IsNode()
    except TypeError as error:
        # The TypeError names every abstract member that is not implemented
        actual = 'PROPERTYNAME' in str( error )
        self.assertTrue( actual, 'The TypeError raised by trying to '
            'instantiate IsNode should include the "PROPERTYNAME" '
            'abstract property-name' )
    except Exception as error:
        self.fail( 'testPROPERTYNAME expected a TypeError, '
            'but %s was raised instead:\n  - %s' % (
                error.__class__.__name__, error
                )
            )
    else:
        # No error at all means that IsNode is not abstract any more
        self.fail( 'testPROPERTYNAME expected a TypeError, '
            'but IsNode was instantiated without an error' )
In a nutshell, this checks that the name of each abstract property appears in
the TypeError that is raised by trying to instantiate IsNode, which confirms
that the property being tested is abstract.
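The same check can be tried out in isolation. The sketch below assumes Python 3 (where `abc.ABC` and stacking `@property` on `@abc.abstractmethod` are the usual spelling), and `IsNodeSketch` is a hypothetical two-property stand-in for IsNode, not the real interface.

```python
import abc
import unittest


class IsNodeSketch(abc.ABC):
    """Hypothetical stand-in for IsNode with two abstract properties."""

    @property
    @abc.abstractmethod
    def nodeName(self):
        """The name of the node."""

    @property
    @abc.abstractmethod
    def nodeType(self):
        """The numeric type of the node."""


class TestIsNodeSketch(unittest.TestCase):

    def check_abstract(self, name):
        # Instantiating the interface must fail, and the error must name
        # the abstract member being tested
        with self.assertRaises(TypeError) as context:
            IsNodeSketch()
        self.assertIn(name, str(context.exception))

    def test_nodeName(self):
        self.check_abstract('nodeName')

    def test_nodeType(self):
        self.check_abstract('nodeType')
```

Using `assertRaises` as a context manager also covers the case where no error is raised at all, which the test then reports as a failure.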
Implementing and Testing the Abstract Methods
The same basic rule, that member-definitions need only exist in the IsNode
interface, applies to the method members as well. The main decision that needs
to be made is also similar: where does a given method-requirement and
-definition belong? Answering that yields a list much like the one for the
properties noted above:
cloneNode
- Clones an element (w3schools)
- Since this is capable of making shallow or deep copies, and the mechanism for making those copies will vary, it'll have to be implemented in the concrete classes.
- Abstract method in IsNode
- Implement in CDATA, Comment, Tag and Text
compareDocumentPosition
- Compares the document position of two elements (w3schools)
- The description of the method on the w3schools site, frankly, has me wondering if there's even any point to implementing this on the server side. I've never seen this method used in the wild, though that doesn't mean that it isn't used. I can't think of a use-case for it that isn't better served (at least on the server side) by local Python code, particularly since all the real method returns is a bit-mask number-value that indicates relative position between the owner element and the element provided.
- Skip
contains
- Returns true if a node is a descendant of a node, otherwise false (w3schools)
- The contains method applies only to objects that have children, really. That hasn't stopped it from being callable on DOM nodes where it doesn't really make sense, though. For example, executing this JavaScript:
    ook = document.createTextNode( 'ook' );
    eek = document.createTextNode( 'eek' );
    ook.contains( eek );
  in several browsers yields false. That result kind of makes sense: neither of the created text-nodes is a parent of the other, nor can either be appended to the other (calling ook.appendChild( eek ) throws an error).
- I'm going to skip this method for now, but there's some discussion around that decision that I'll dig into shortly.
isDefaultNamespace
- Returns true if a specified namespaceURI is the default, otherwise false (w3schools)
- Text-nodes don't have a namespace; it's not a defined member of that node-type at all. Nor do comments, and I presume that the same would hold true for CDATA sections.
- Skip
- Implement in Tag
isEqualNode
- Checks if two elements are equal (w3schools)
- The complete criteria for testing equality on the client side are listed at the w3schools link above, but since those criteria are dependent on properties that won't exist across all IsNode instances, the usefulness of that list is, perhaps, questionable. Still, being able to perform a comparison is useful. Then the real question is: how is that going to be done? I'll work out more details on that later, but for now:
- Abstract method in IsNode
- Implement in BaseNode
isSameNode
- Checks if two elements are the same node (w3schools)
- Abstract method in IsNode
- Implement in BaseNode
normalize
- Joins adjacent text nodes and removes empty text nodes in an element (w3schools)
- This feels like it's something that shouldn't exist outside of a Tag, and that seems to be borne out by the fact that it's not possible to usefully call normalize on a text- or comment-node in the browser.
- Skip
- Implement in Tag
toString
- Converts an element to a string (w3schools)
- Abstract method in IsNode
- Implement in CDATA, Comment, Tag and Text
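Since isSameNode and isEqualNode are both slated for BaseNode, the difference between them is worth pinning down. The sketch below is one possible reading, not a settled design: `BaseNodeSketch` is a hypothetical class, and comparing nodeType, nodeName and textContent is an assumed set of equality criteria, chosen only because those three properties are required of every IsNode.

```python
class BaseNodeSketch:
    """Hypothetical stand-in for BaseNode; not the markup module's real class."""

    def __init__(self, nodeType, nodeName, textContent=''):
        self.nodeType = nodeType
        self.nodeName = nodeName
        self.textContent = textContent

    def isSameNode(self, other):
        """True only when other is this very object (identity)."""
        return other is self

    def isEqualNode(self, other):
        """True when other matches on type, name and text (assumed criteria)."""
        return (
            isinstance(other, BaseNodeSketch)
            and self.nodeType == other.nodeType
            and self.nodeName == other.nodeName
            and self.textContent == other.textContent
        )


ook = BaseNodeSketch(3, '#text', 'ook')
ook_copy = BaseNodeSketch(3, '#text', 'ook')
eek = BaseNodeSketch(3, '#text', 'eek')

print(ook.isSameNode(ook), ook.isSameNode(ook_copy))  # True False
print(ook.isEqualNode(ook_copy), ook.isEqualNode(eek))  # True False
```

A Tag would presumably need to extend the equality check to cover its attributes and children, which is part of why the details are deferred.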
The contains discussion
Also like the abstract-property definitions, abstract methods don't
need much in IsNode:
@abc.abstractmethod
def METHODNAME( self, arg1, arg2=None, *args, **kwargs ):
    # self is required: the error-message below refers to it
    raise NotImplementedError( '%s.METHODNAME is not implemented as '
        'required by IsNode' % self.__class__.__name__ )
And the unit-tests, since they're really just checking the same sort of
relationship between methods and the IsNode interface-class as the
property-tests did, are almost identical:
def testMETHODNAME(self):
    """Unit-tests the METHODNAME method of an IsNode instance."""
    try:
        testInstance = markup.IsNode()
    except TypeError as error:
        # The TypeError names every abstract member that is not implemented
        actual = 'METHODNAME' in str( error )
        self.assertTrue( actual, 'The TypeError raised by trying to '
            'instantiate IsNode should include the "METHODNAME" '
            'abstract method-name' )
    except Exception as error:
        self.fail( 'testMETHODNAME expected a TypeError, '
            'but %s was raised instead:\n  - %s' % (
                error.__class__.__name__, error
                )
            )
    else:
        # No error at all means that IsNode is not abstract any more
        self.fail( 'testMETHODNAME expected a TypeError, '
            'but IsNode was instantiated without an error' )
With those tests in place for IsNode in the test_markup.py unit-test module,
the test-results come back clean:
    ########################################
    Unit-test Results: idic.markup
    #--------------------------------------#
    Tests were SUCCESSFUL
    Number of tests run ... 18
    Tests ran in .......... 0.001 seconds
    ########################################
IsNode, then, is done: written and tested.
Dealing with Enumerations in Python
Python doesn't really have a formal enumeration-type like several other
languages do, at least not in the Python 2 releases that this code is written
for (the standard-library enum module only arrived in Python 3.4), but there
are a number of ways to work around that. My personal favorite uses namedtuple
from the collections module, based on some observations I've made about how an
enumeration behaves:
- An enumeration is a constant;
- An enumeration is immutable — its values cannot be changed at run-time;
- An enumeration's members are individually accessible by name; and
- An enumeration is a container, with members that can be used for comparison purposes. That is, given an enumeration of nodeTypes, with presumably-distinct CDATA, Comment, Tag and Text values:
    nodeTypes.Tag in nodeTypes # == True
    nodeTypes.Text in nodeTypes # == True
    nodeTypes.CDATASection in nodeTypes # == True
    nodeTypes.Comment in nodeTypes # == True
Using a namedtuple, it's actually pretty easy to generate a constant value that
exhibits all of those behaviors. The basic code required looks something like
this, using the nodeType values from the w3schools site and generating an
enumeration-equivalent named nodeTypes that could be added to the markup module:
from collections import namedtuple
nodeTypes = namedtuple(
'enumNodeTypes',
[ 'Tag', 'Text', 'CDATASection', 'Comment' ],
)(
Tag=1,
Text=3,
CDATASection=4,
Comment=8,
)
__all__.append( 'nodeTypes' )
- nodeTypes is a constant because namedtuple returns a class, and the code then creates an instance of that class;
- It's immutable because it's not possible to add values to, remove values from, or alter existing values of the named items except by altering the definition of those members in the code;
- Its members are individually accessible by name because that's a basic capability of a namedtuple-generated class; and
- It's a container that allows the use of someValue in nodeTypes.
Printing nodeTypes and running all of the nodeTypes.NAME in nodeTypes examples
from the container criterion above yields:
    All entries in nodeTypes
     + enumNodeTypes( Tag=1, Text=3, CDATASection=4, Comment=8 )
    nodeTypes.Tag in nodeTypes ............ True
    nodeTypes.Text in nodeTypes ........... True
    nodeTypes.CDATASection in nodeTypes ... True
    nodeTypes.Comment in nodeTypes ........ True
    nodeTypes.CDATASection in nodeTypes ... True
    nodeTypes.Comment in nodeTypes ........ True
    12 in nodeTypes ....................... False
    "ook" in nodeTypes .................... False
So, while this approach may not be a real enumeration, it provides all of the
functionality of one that I think I'll need.
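The immutability claim is easy to check directly. The sketch below rebuilds the same nodeTypes value and tries to change it; `try_to_change` is a hypothetical helper written only for this demonstration, and the example assumes Python 3 for the `print` calls.

```python
from collections import namedtuple

# Same enumeration-equivalent as above, rebuilt so the example is self-contained
nodeTypes = namedtuple(
    'enumNodeTypes',
    [ 'Tag', 'Text', 'CDATASection', 'Comment' ],
    )(
    Tag=1,
    Text=3,
    CDATASection=4,
    Comment=8,
    )

def try_to_change(target, name, value):
    """Returns the name of the exception raised by setting target.name, or None."""
    try:
        setattr(target, name, value)
    except (AttributeError, TypeError) as error:
        return error.__class__.__name__
    return None

# Altering an existing member and adding a new one are both refused
print(try_to_change(nodeTypes, 'Tag', 2))        # AttributeError
print(try_to_change(nodeTypes, 'Document', 9))   # AttributeError

# Because it is still a tuple, a value can be mapped back to its name
print(nodeTypes._fields[nodeTypes.index(8)])     # Comment
```

One caveat with this approach: membership is decided by value, so `1 in nodeTypes` is true as well, and nothing stops two members from being given the same number.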
It occurs to me that I don't really have a unit-testing strategy or policy for
module-level constants, but frankly I'm not sure that one is really needed,
at least not yet. I say "not yet" because at some level, there simply has to be
some trust in the underlying language. Even with nodeTypes being a non-simple
value, it's still a value that is tightly tied to core language structures and
functionality, and it shouldn't be possible to break that without altering the
code itself.
There's been a fair chunk of analysis in this post, but some code too, and the
next logical piece to work out would probably push the length of this post past
where I'd like, so I'm going to stop here for now. The next few items that I'm
going to tackle include the BaseNode and HasTextContent abstract classes, I
think, then I'll have enough of the foundational abstraction written to be able
to take a swing at the CDATA, Comment and Text concrete classes. I promised to
include the analysis JavaScript-page, though, so here's that