Tuesday, April 25, 2017

Generating and Parsing Markup in Python [2]

With one interface defined, and most of the module design and DOM-compliance properties and methods figured out, today's post will continue with some concrete implementation and more interface definition. I'm going to get as far through all of the non-concrete implementations as I can, since the foundations they provide, while critically important in the long run, aren't as demonstrable as the concrete markup implementations.

Defining the BaseNode Abstract Class

BaseNode is the first abstract class I'm going to tackle in the markup module, and the first definition of any concrete functionality there. It's intended to provide some concrete implementation while passing the rest of the abstraction from IsNode on to the concrete classes that will derive from it, so there isn't a lot of concrete implementation in it.

Implementing and Testing the Concrete Properties

Since I've noted in some detail in my coding standards exactly how I prefer to implement instance properties, I'll be focusing more on how the properties get their jobs done than what the code underneath the public interface really looks like. BaseNode will provide concrete implementations for six properties that are required by the IsNode interface:

  • nextElementSibling;
  • nextSibling;
  • parentElement;
  • parentNode;
  • previousElementSibling; and
  • previousSibling
Four of those properties, the next* and previous* items, rely on their corresponding parent* property — if the instance in question doesn't have a parent of the appropriate type in the parent* property, then there can't be a next* or previous* value. Those parent* properties, then, need to be worked out first.

What Constitutes a parent Anyway? 

The short and obvious answer is probably best expressed as an IsNode instance that has the various *child* properties and *Child methods. That's something that I haven't really addressed in any detail yet. A fairly complete list of the properties and methods that involve child nodes, taken from the big list presented in the last post, would include:

                          markup Module Equivalent Class
Member Name               Comment     Tag         Text
------------------------  ----------  ----------  ----------
Property Members
childElementCount         n/a         number      n/a
childNodes                object      object      object
children                  n/a         object      n/a
firstChild                null        object      null
firstElementChild         n/a         null        n/a
lastChild                 null        object      null
lastElementChild          n/a         null        n/a
Method Members
appendChild               function    function    function
contains                  function    function    function
getElementsByClassName    n/a         function    n/a
getElementsByTagName      n/a         function    n/a
hasChildNodes             function    function    function
insertBefore              function    function    function
replaceChild              function    function    function

At present, in the current class-relationships diagram, there really isn't any single interface, abstract class or class that feels to me like the right place to set those up. That said, there's only one concrete class so far that actually needs any of those members: Tag (although down the line, any concrete document-classes that derive from BaseDocument, which in turn derives from Tag, will need those as well). From an architectural standpoint, that feels to me like a need for an interface (call it IsElement for now) at a minimum that Tag will implement, and that ties in to the various *child*-related members listed above.

I'll plan on working out IsElement right after I finish with BaseNode, then.

Another item for consideration: In the JavaScript DOM functionality that I'm trying to keep consistent with, there are two distinct parent-types: Elements (tags) and nodes (everything else). I've already established that non-tags really can't have children in at least one major browser-engine branch (the WebKit lineage: WebKit in Safari, and its Blink fork in Chrome and Chromium). In Firefox (Gecko), it's not much different: the specifics of the error-messages are different, but the fundamental DOM-object relationship is the same, and text-nodes can't have children. That, then, raises the question: Why is there both a parentElement and a parentNode property? Particularly since running code like this:

tag = document.createElement( 'tag' );
text = document.createTextNode( 'this is a text-node' );
tag.appendChild( text );
console.log( 'text.parentElement ..................... ' + 
    text.parentElement );
console.log( 'text.parentElement == tag .............. ' + 
    ( text.parentElement == tag ) );
console.log( 'text.parentElement.isSameNode( tag ) ... ' + 
    text.parentElement.isSameNode( tag ) );
console.log( 'text.parentNode ........................ ' + 
    text.parentNode );
console.log( 'text.parentNode == tag ................. ' + 
    ( text.parentNode == tag ) );
console.log( 'text.parentNode.isSameNode( tag ) ...... ' + 
    text.parentNode.isSameNode( tag ) );
yields results showing that parentNode and parentElement return the same tag-element:
text.parentElement ..................... [object HTMLUnknownElement]
text.parentElement == tag .............. true
text.parentElement.isSameNode( tag ) ... true
text.parentNode ........................ [object HTMLUnknownElement]
text.parentNode == tag ................. true
text.parentNode.isSameNode( tag ) ...... true
It just seems... weird, I guess. I hope that it's some sort of concession made for backwards compatibility, but I can't be sure that's the case. On top of that, I can't think of a use-case at all where parentNode wouldn't return the same thing as parentElement. Non-element nodes can't have children, and thus can't be parents, but the naming convention in the JavaScript DOM-methods seems to be pretty consistent in *Element* methods returning elements (tags) only, while *Node* methods can return any node-type, including elements.

However, in the interests of preserving the DOM-object consistency that I'm after, I guess I'll have to keep both of those properties. That doesn't mean that I need to have separate property-getters for each, though!

The parent, parentElement and parentNode Properties

It may be simpler to just show the code in this case, then note the differences from my usual patterns:

#-----------------------------------#
# Instance property-getter methods  #
#-----------------------------------#

@describe.AttachDocumentation()
@describe.returns( 'IsElement instance or None' )
def _GetParent( self ):
    """
Returns the IsElement object that the instance is a child of, or None if there is 
no parent-child relationship available for the instance."""
    return self._parent

#-----------------------------------#
# Instance property-setter methods  #
#-----------------------------------#

@describe.AttachDocumentation()
@describe.argument( 'value', 
    'the object to set as the parent of the instance',
    IsElement
)
def _SetParent( self, value ):
    """
Sets the instance's parent to the supplied IsElement object."""
    if not isinstance( value, IsElement ):
        raise TypeError( '%s.parent expects an instance of a class '
            'derived from IsElement, but was passed "%s" (%s), '
            'which is not one' % ( 
                self.__class__.__name__, value.__repr__(), 
                type( value ).__name__
            )
        )
    self._parent = value

#-----------------------------------#
# Instance property-deleter methods #
#-----------------------------------#

def _DelParent( self ):
    """
Deletes the instance's parent relationship by setting it to None"""
    self._parent = None

#-----------------------------------#
# Instance Properties (abstract OR  #
# concrete!)                        #
#-----------------------------------#

parent = describe.makeProperty(
    _GetParent, None, None, 
    'the IsElement object that the instance is a child of', 
    IsElement, None
)
parentElement = describe.makeProperty(
    _GetParent, None, None, 
    'the IsElement object that the instance is a child of', 
    IsElement, None
)
parentNode = describe.makeProperty(
    _GetParent, None, None, 
    'the IsElement object that the instance is a child of', 
    IsElement, None
)
What all of this provides is a set of three different properties (parent, parentElement and parentNode) that are all pointed at the same property-getter method (_GetParent). If, in the future, there's a demonstrable need to separate those out for some reason, it should be a relatively simple matter of creating a new property-getter, -setter and -deleter method-set, then re-assigning the methods in whichever property-declaration needs to point to the new method(s). The one concern that I think would come up in that sort of scenario is what would have to happen to the parent property. Right now, the three are completely interchangeable, but if an actual difference between parentElement and parentNode ever surfaces, the parent property may well need to be deprecated or even removed, rather than linger there being confusing.
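As a minimal sketch of that arrangement, here is the same idea with Python's built-in property() standing in for describe.makeProperty (whose signature isn't reproduced here). NodeSketch is a hypothetical stand-in class, not part of the markup module:

```python
class NodeSketch(object):
    """Hypothetical stand-in for BaseNode; not the markup module's class."""

    def __init__(self):
        self._parent = None

    def _GetParent(self):
        # Single getter shared by all three public property names
        return self._parent

    # The built-in property() stands in for describe.makeProperty here.
    # Only a getter is supplied, so all three properties are read-only.
    parent = property(_GetParent)
    parentElement = property(_GetParent)
    parentNode = property(_GetParent)


child = NodeSketch()
container = NodeSketch()
# Set the underlying storage directly, since the properties are read-only
child._parent = container

# All three names resolve through the same getter to the same object
assert child.parent is container
assert child.parentElement is child.parentNode
```

Splitting one of the names off later would only mean pointing that one property declaration at a different getter; the other two would be untouched.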

The nextElementSibling, nextSibling, previousElementSibling and previousSibling Properties

There is a common theme that runs across these four properties, all based around the idea that if the instance has a parent, then that parent has children and childNodes properties that are a sequence of IsElement- and IsNode-derived objects, respectively. Given that, all of these methods need to look at all of the instance's parent's childNodes (which will always include all IsNode-derived types), find the position of the instance whose sibling is being sought in that sequence, then return the previous or next node or element before or after that position, respectively. The first step, finding the position of the instance in its parent's childNodes is common across all four methods.

The _GetnextElementSibling and _GetnextSibling getter-methods show the determination of the index of the instance in its parent's collection of children (parentIndex in parent.childNodes) and the slicing of those childNodes to retrieve everything after the instance in that collection. _GetnextElementSibling also shows filtering of that slice, so that only IsElement items will be considered as candidates for the return value.

@describe.AttachDocumentation()
@describe.returns( 'IsElement object or None' )
def _GetnextElementSibling( self ):
    """
Gets the next IsElement element in the instance's parent's children after the 
instance's position in that sequence of objects"""
    if self.parent:
        # Get the index of the instance in its parent's collection 
        # of children. If this fails, there's an issue with adding 
        # children somewhere else...
        parentIndex = self.parent.childNodes.index( self )
        # Get a slice of the parent's children that captures all the 
        # children *after* the index
        nodesAfter = self.parent.childNodes[ parentIndex + 1: ]
        # Since this is an "element" property, filter those down to just 
        # IsElement members
        nodesAfter = [ 
            node for node in nodesAfter 
            if isinstance( node, IsElement )
        ]
        if len( nodesAfter ) > 0:
            # If there's at least one item, return the first one in 
            # the list
            return nodesAfter[ 0 ]
        else:
            # Otherwise, there aren't any *elements* after the instance, 
            # so return None
            return None
    else:
        # The instance has no parent, and thus there are no siblings.
        return None

@describe.AttachDocumentation()
@describe.returns( 'IsNode object or None' )
def _GetnextSibling( self ):
    """
Gets the next IsNode element in the instance's parent's children after the 
instance's position in that sequence of objects"""
    if self.parent:
        # Get the index of the instance in its parent's collection 
        # of children. If this fails, there's an issue with adding 
        # children somewhere else...
        parentIndex = self.parent.childNodes.index( self )
        # Get a slice of the parent's children that captures all the 
        # children *after* the index
        nodesAfter = self.parent.childNodes[ parentIndex + 1: ]
        if len( nodesAfter ) > 0:
            # If there's at least one item, return the first one in 
            # the list
            return nodesAfter[ 0 ]
        else:
            # Otherwise, there aren't any *nodes* after the instance, 
            # so return None
            return None
    else:
        # The instance has no parent, and thus there are no siblings.
        return None
Really, the only major difference between _GetnextElementSibling and _GetnextSibling is whether the intermediate list (nodesAfter) is filtered.

The same basic pattern is used in _GetpreviousElementSibling and _GetpreviousSibling, including the filtering or non-filtering of the intermediate results (nodesBefore). The major difference between either _Getprevious* method and its _Getnext* counterpart is the initial slice of the instance's parent.childNodes:

            # Get a slice of the parent's children that captures all the 
            # children *before* the index
            nodesBefore = self.parent.childNodes[ 0:parentIndex ]
The filtering aspect in _GetpreviousElementSibling is identical to the code above for _GetnextElementSibling, and doesn't exist at all in _GetpreviousSibling.
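The index-slice-filter pattern can be checked against a plain list. The following is only a sketch under assumptions: ElementSketch, TextSketch and the sibling() helper are hypothetical names invented for illustration, and a bare list stands in for parent.childNodes. The point to note is that a slice's end index is exclusive, so everything before position i is nodes[:i], and the nearest previous sibling is the last item of that slice:

```python
class ElementSketch(object):
    """Hypothetical stand-in for an IsElement-derived class."""
    def __init__(self, name):
        self.name = name


class TextSketch(object):
    """Hypothetical stand-in for a non-element node."""
    def __init__(self, name):
        self.name = name


def sibling(childNodes, node, step, elementsOnly=False):
    """Returns the nearest sibling of node in childNodes, before it
    (step=-1) or after it (step=1), or None if there isn't one."""
    index = childNodes.index(node)
    if step > 0:
        # Everything *after* the node, nearest first
        candidates = childNodes[index + 1:]
    else:
        # Everything *before* the node (the slice end is exclusive),
        # reversed so that the nearest sibling comes first
        candidates = childNodes[:index][::-1]
    if elementsOnly:
        # "Element" lookups only consider element-type candidates
        candidates = [
            c for c in candidates if isinstance(c, ElementSketch)
        ]
    return candidates[0] if candidates else None


a, t1, b, t2 = (
    ElementSketch('a'), TextSketch('t1'),
    ElementSketch('b'), TextSketch('t2'),
)
nodes = [a, t1, b, t2]

assert sibling(nodes, b, 1) is t2                       # next node
assert sibling(nodes, b, -1) is t1                      # previous node
assert sibling(nodes, b, -1, elementsOnly=True) is a    # previous element
assert sibling(nodes, a, -1) is None                    # nothing before
```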

Implementing and Testing the Concrete Methods

I'd originally expected to implement only two of the abstract methods of IsNode in BaseNode: isEqualNode and isSameNode. With the addition of the IsElement interface to the markup class-zoo, though, any concrete implementation of isEqualNode will, I think, have to be moved out to the concrete classes, since those are the shallowest points in the inheritance structure where all of the various properties that the method needs will actually exist.

That leaves isSameNode as the only concrete method-implementation of BaseNode:

@describe.AttachDocumentation()
@describe.argument( 'node', 
    'the node-object to compare to the instance to '
    'see if they are the same',
    IsNode
)
@describe.raises( TypeError,
    'if passed a node value that is not an IsNode instance'
)
@describe.returns( 
    'True if the supplied node is the same node-object as '
    'the instance, False otherwise'
)
def isSameNode( self, node ):
    """
Determines if a supplied node is the same node-object as 
the instance"""
    if not isinstance( node, IsNode ):
        raise TypeError( '%s.isSameNode expects an instance '
            'of IsNode for comparison, but was passed '
            '"%s" (%s)' % ( 
                self.__class__.__name__, 
                node, type( node ).__name__
            )
        )
    # If the node is the same object, it will 
    # have the same id, so:
    return id( self ) == id( node )
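
To illustrate why identity and equality are separate questions here, this is a small sketch using a hypothetical NodeSketch class (not the markup module's code) that defines value-based equality. Comparing with the is operator gives the same answer as comparing id() values for two live objects:

```python
class NodeSketch(object):
    """Hypothetical node with value-based equality."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Two sketch nodes are "equal" if they carry the same text
        return isinstance(other, NodeSketch) and self.text == other.text

    def __hash__(self):
        # Keep hashing consistent with __eq__
        return hash(self.text)

    def isSameNode(self, node):
        # Identity check: equivalent to id( self ) == id( node )
        return self is node


first = NodeSketch('hello')
second = NodeSketch('hello')
alias = first

assert first == second                  # equal content...
assert not first.isSameNode(second)     # ...but distinct objects
assert first.isSameNode(alias)          # the same object under another name
```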

Normally, I'd also be looking to implement unit-tests of BaseNode, now that all of its concrete implementation is complete. In this case, because all of the *Sibling properties require participation in a node-tree structure that won't be available until I have both IsElement and a concrete class that derives from it implemented (Tag in this case), I only went as far as getting the test-method requirements stubbed out, along the lines of:

def testpreviousSibling(self):
    """Unit-tests the previousSibling property of a BaseNode instance."""
    self.fail( 'testpreviousSibling is not implemented' )
and
def testisSameNode(self):
    """Unit-tests the isSameNode method of a BaseNode instance."""
    self.fail( 'testisSameNode is not implemented' )
That means that I'll have several test-failures for a while:
########################################
Unit-test results
########################################
Tests were successful ... False
Number of tests run ..... 37
 + Tests ran in ......... 0.01 seconds
Number of errors ........ 0
Number of failures ...... 15
########################################
I could implement a dummy class in the test-module that derives from BaseNode, and test against that class, and if there weren't a concrete class expected that would serve that purpose, that's exactly what I'd do. Since I will have one, eventually, that just feels... wasteful, I guess, so I'd rather get Tag operational and then come back to these tests. Until then, I'll just have to live with these test-failures.

Defining the IsElement Interface

Between the previous post and the breakdown of members needed in IsElement above, there's really not much discussion needed, I think, nor a whole lot of code to show and explain.

The Abstract Properties of IsElement

On basic principle, I did do another run through the members of an element listed at the w3schools site, just to ensure that I didn't miss any. What I netted out with for properties in IsElement was:

childElementCount = abc.abstractproperty()
childNodes = abc.abstractproperty()
children = abc.abstractproperty()
firstChild = abc.abstractproperty()
firstElementChild = abc.abstractproperty()
lastChild = abc.abstractproperty()
lastElementChild = abc.abstractproperty()

There were other properties that I had to think about too, though: accessKey (which may well be globally available at the implementation-level of a Tag), and attributes (which absolutely is a Tag property). When push came to shove, though, I opted to implement those as concrete members of Tag rather than drop them into the IsElement interface. The rationale for that decision was mostly based on the realization that the only other elements that I'm expecting to be concerned with are documents, and my current plan is to derive a BaseDocument abstract class from Tag anyway. In that scenario, all of the other properties and methods would be implemented by Tag, and available to documents through their derivation from BaseDocument. Those members include all tag-level properties that are also attributes in markup, as well as any properties that aren't specifically related in some way to having, working with, or altering a parent-child relationship between a Tag and any other IsNode instance.

The Abstract Methods of IsElement

The same criteria noted above for properties were also used to cull down the list of methods that would be required by IsElement, for pretty much the same reasons. The resulting abstract methods are:

@abc.abstractmethod
def appendChild( self, child ):
    raise NotImplementedError( '%s.appendChild is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def hasChildNodes( self ):
    raise NotImplementedError( '%s.hasChildNodes is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def insertBefore( self, newChild, existingChild ):
    raise NotImplementedError( '%s.insertBefore is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def insertChildAt( self, newChild, index ):
    raise NotImplementedError( '%s.insertChildAt is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def removeChild( self, child ):
    raise NotImplementedError( '%s.removeChild is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def removeChildAt( self, index ):
    raise NotImplementedError( '%s.removeChildAt is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def removeSelf( self ):
    raise NotImplementedError( '%s.removeSelf is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def replaceChild( self, newChild, existingChild ):
    raise NotImplementedError( '%s.replaceChild is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

Testing the Abstract Members of IsElement

The unit-testing of the abstract members of IsElement follows the pattern established by the testing of IsNode members shown in my previous post, with the hopefully-obvious change of class-name being tested:

def testPROPERTYNAME(self):
    """Unit-tests the PROPERTYNAME property of an IsElement instance."""
    try:
        testInstance = markup.IsElement()
    except TypeError as error:
        actual = 'PROPERTYNAME' in str( error )
        self.assertTrue( actual, 'The TypeError raised by trying to '
            'instantiate IsElement should include the "PROPERTYNAME" '
            'abstract property-name' )
    except Exception as error:
        self.fail( 'testPROPERTYNAME expected a TypeError, '
            'but %s was raised instead:\n  - %s' % ( 
                error.__class__.__name__, error
            )
        )
def testMETHODNAME(self):
    """Unit-tests the METHODNAME method of an IsElement instance."""
    try:
        testInstance = markup.IsElement()
    except TypeError as error:
        actual = 'METHODNAME' in str( error )
        self.assertTrue( actual, 'The TypeError raised by trying to '
            'instantiate IsElement should include the "METHODNAME" '
            'abstract method-name' )
    except Exception as error:
        self.fail( 'testMETHODNAME expected a TypeError, '
            'but %s was raised instead:\n  - %s' % ( 
                error.__class__.__name__, error
            )
        )
Since that pattern is simple and established, I won't go into the details of their implementation here, but I'll get them in place and make sure that they run as expected. Bearing in mind that there are still fifteen failures from the still-pending tests of BaseNode, those same failures should still appear, but the number of tests run and passed should increase:
########################################
Unit-test results
########################################
Tests were successful ... False
Number of tests run ..... 56
 + Tests ran in ......... 0.01 seconds
Number of errors ........ 0
Number of failures ...... 15
########################################
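
The test pattern above leans on the way Python's abc machinery refuses to instantiate a class with unimplemented abstract members, naming those members in the TypeError it raises. A rough sketch of that behavior follows. It is written for Python 3 (abc.ABC, and property stacked on abc.abstractmethod) rather than the Python 2 idioms used in the post, and IsElementSketch is a hypothetical, cut-down interface rather than the real IsElement:

```python
import abc


class IsElementSketch(abc.ABC):
    """Hypothetical, cut-down interface; written for Python 3."""

    @property
    @abc.abstractmethod
    def childNodes(self):
        raise NotImplementedError()

    @abc.abstractmethod
    def appendChild(self, child):
        raise NotImplementedError()


try:
    IsElementSketch()
    # Reaching this line would mean the class was not abstract after all
    message = None
except TypeError as error:
    message = str(error)

# Instantiation must have failed, and the error text names each
# abstract member that still lacks an implementation
assert message is not None
assert 'childNodes' in message
assert 'appendChild' in message
```

Capturing the message and asserting that it is not None also guards against the case where instantiation unexpectedly succeeds, which a bare try/except would let pass silently.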

With the change made to the markup module's class-zoo (the addition of IsElement in the upper right of the diagram), I didn't get quite as far along in this post as I'd hoped before hitting my post-length cut-off, but I feel like I made solid progress.

There's one more abstract class that I'm going to define in my next post before I can start some actual concrete implementations: HasTextData. With that done, I'll be able to knock out three concrete classes pretty quickly, I think: CDATA, Comment and Text.

The completion of those will also require me to give some thought to exactly how I plan for rendered markup to be issued back out to a browser, so there will be at least some discussion around that as well.
