Thursday, April 27, 2017

Generating and Parsing Markup in Python [3]

With two interfaces and one abstract class done, there's one more abstract class that needs attention before I can get to some concrete implementation (finally): the HasTextData abstract class.

HasTextData is a common super-class for the CDATA, Comment and Text concrete classes, and completion of those three classes is my goal for today's post.

What's Common Between These Classes?

Any time there's an abstract class that multiple concrete classes derive from, there's some common basis of functionality that the concrete classes share. That is, after all, one of the reasons that an abstract class gets defined. In the case of HasTextData, that commonality is that all of the derived classes have a text-data property (data in JavaScript, though the textContent property will also return that inner text in JavaScript).

Another common factor, though it may be less obvious, is that all three of these node-types have some rules about what their data can contain. Those rules aren't necessarily active in a client-side JavaScript implementation (likely because some sort of action is taken to prevent destructive or counterproductive content-manipulation — like setting the inner text of a comment to -->). Since the markup being created has to be issued out to a client browser in some fashion after it's been created and/or altered on the server side, though, there is a need to enforce at least some basic content-protection or escaping for all three of those concrete classes, though the specifics of what they are or do will probably vary pretty significantly.

All three of those concrete classes also have to be able to be rendered in some fashion and returned to the client browser as text-data. It could be str- or unicode-typed text, and should probably support both on basic principle, but somewhere along the line, whatever individual instances exist as part of a response should contribute something to the source-code of that response. I'll dig in to the rendering aspects of all nodes later on in this post. First, some actual implementation!

The HasTextData Abstract Class

Because, once again, the JavaScript entities that I'm trying to stay consistent with allow more than one property or method entry-point into the underlying data — in this case, both data and textContent properties being capable of getting, setting, or deleting the text-data of a comment- or text-node — I don't have any better option than to have multiple properties defined that allow the same capability.

#-----------------------------------#
# Instance Properties (abstract OR  #
# concrete!)                        #
#-----------------------------------#

data = describe.makeProperty(
    _GetTextData, _SetTextData, _DelTextData, 
    'the raw text-data of the instance', 
    str, unicode
)
textContent = describe.makeProperty(
    _GetTextData, _SetTextData, _DelTextData, 
    'the raw text-data of the instance', 
    str, unicode
)
The property-methods are pretty straightforward, though there is something new in the _SetTextData setter-method:
@describe.AttachDocumentation()
@describe.argument( 'value',
    'the raw text data to set in the instance',
    str, unicode
)
@describe.raises( TypeError, 
    'if a value is passed that is not a str or unicode type'
)
# self._SanitizeInput can raise ValueError
@describe.raises( ValueError, 
    'if a value is passed that cannot be sanitized by the '
    'instance to make rendering safely viable'
)
def _SetTextData( self, value ):
    """
Sets the raw text-data of the instance"""
    if type( value ) not in ( str, unicode ):
        raise TypeError( '%s.TextData expects a str or '
            'unicode value, but was passed "%s" (%s)' % ( 
                self.__class__.__name__, 
                value, type( value ).__name__
            )
        )
    # Make sure that the supplied raw text-data is safe to 
    # store before storing it.
    value = self._SanitizeInput( value )
    self._textData = value
This method makes an attempt to sanitize the supplied input, with the intention being that the sanitization will alter (or reject) anything that could result in rendering issues on the client side. The specifics of the sanitization will vary at least a little bit in the concrete class implementations, so _SanitizeInput is abstracted in HasTextData:
@abc.abstractmethod
@describe.raises( ValueError, 
    'if a value is passed that cannot be sanitized by the '
    'instance to make rendering safely viable'
)
def _SanitizeInput( self, value ):
    raise NotImplementedError( '%s._SanitizeInput is not implemented as '
        'required by HasTextData' % self.__class__.__name__ )
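
As a rough illustration of the pattern described in this section (two property names sharing one set of accessor methods, with the setter delegating to an abstract sanitizer), here is a minimal sketch. It assumes Python 3 and uses the built-in property and abc.ABC instead of the describe helpers, so the names HasTextDataSketch, UpperCased and the lower-case method names are hypothetical stand-ins, not the actual module code.

```python
import abc


class HasTextDataSketch(abc.ABC):
    """Hypothetical stand-in for HasTextData; not the author's code."""

    def __init__(self):
        self._textData = ''

    def _get_text_data(self):
        return self._textData

    def _set_text_data(self, value):
        # Type-check first, then hand off to the subclass-specific sanitizer
        if not isinstance(value, str):
            raise TypeError(
                '%s expects a str value, but was passed %r (%s)' % (
                    self.__class__.__name__, value, type(value).__name__
                )
            )
        self._textData = self._sanitize_input(value)

    def _del_text_data(self):
        self._textData = ''

    @abc.abstractmethod
    def _sanitize_input(self, value):
        """Subclasses decide what 'safe to render' means."""

    # Two property names, one shared set of accessor methods
    data = property(_get_text_data, _set_text_data, _del_text_data)
    textContent = property(_get_text_data, _set_text_data, _del_text_data)


class UpperCased(HasTextDataSketch):
    """Toy concrete class whose 'sanitizer' visibly alters the input."""

    def _sanitize_input(self, value):
        return value.upper()


node = UpperCased()
node.data = 'hello'
print(node.textContent)   # same storage, reached through the other name
```

Because both properties wrap the same function objects, a value set through data is immediately visible through textContent, and any sanitization rule only has to live in one place.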

Rendering Considerations

All of the concrete classes, not just CDATA, Comment and Text, will eventually need to be able to be rendered and returned as part of a web request-response cycle. As things stand right now, having done some cursory examination of both mod_python and wsgi response-functionality, I'm inclined to handle that by using the __str__ and/or __unicode__ magic methods that are available to all Python objects.

The main rationale for this is that both mod_python and basic wsgi response-functionality, ultimately, just need to return the text of a response. The specific mechanisms by which that response is returned may vary, maybe even vary a lot, but returning some text is the key.

That more or less requires that __str__ and __unicode__ be defined as abstract methods somewhere in the class-hierarchy. Since they should be available to all nodes (and because it'd keep that functional requirement in a single place), I'm going to add them all the way back up in IsNode:

@abc.abstractmethod
def __str__( self ):
    raise NotImplementedError( '%s.__str__ is not implemented as '
        'required by IsNode' % self.__class__.__name__ )

@abc.abstractmethod
def __unicode__( self ):
    raise NotImplementedError( '%s.__unicode__ is not implemented as '
        'required by IsNode' % self.__class__.__name__ )
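
To make the "just return some text" idea a little more concrete, here is a hedged sketch of a bare WSGI callable that renders a node with str(). It assumes Python 3 (where __unicode__ no longer exists and __str__ returns text), and CommentSketch, application and fake_start_response are hypothetical names used only for illustration.

```python
class CommentSketch(object):
    """Hypothetical node; only the rendering hook matters here."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return '<!-- %s -->' % self.data


def application(environ, start_response):
    """A bare-bones WSGI callable (PEP 3333) that renders one node."""
    body = str(CommentSketch('rendered on the server')).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/html; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


# Call it directly with a stub start_response, with no server involved
captured = {}


def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers


chunks = application({}, fake_start_response)
print(b''.join(chunks).decode('utf-8'))
```

The response mechanism never needs to know what kind of node it was handed. It only relies on the node being able to turn itself into text.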

Unit-testing HasTextData

Of the required test-methods for HasTextData, only one can really be implemented at this point: test_SanitizeInput, following the usual structure for testing that an abstract member is abstract in the class. The rest, all of the property-methods, would all rely on a concrete implementation of _SanitizeInput. There are a couple of different approaches that could be taken at this point to resolve this conundrum:

  • Skip the tests in testHasTextData and make sure that the derived-class tests test them adequately; or
  • Create a HasTextData-derived test-class that has a concrete implementation of _SanitizeInput, then test against that test-class.
Of the two, I prefer the second approach. It requires less testing later on in the test-cases for the concrete classes, and doesn't rely on someone remembering that the properties need to be individually tested there later on.

Setting up a test-class is simple in this case:

class HasTextDataDerived( HasTextData ):
    def __init__( self ):
        HasTextData.__init__( self )
    def _SanitizeInput( self, value ):
        return '[Sanitized] %s' % value
Since part of the test-process is to assure that all of the setter-methods are calling the _SanitizeInput of the test-class, it actually needs to alter the value submitted, hence the [Sanitized] addition to the submitted value.
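
A minimal sketch of both ideas together, assuming Python 3 and the standard abc and unittest modules: one test confirms that the abstract class refuses to instantiate, and another confirms that the setter really routes through the derived class's _SanitizeInput. HasTextDataSketch is a pared-down, hypothetical stand-in for the real class, and the test structure is my assumption, not necessarily the one used in test_markup.

```python
import abc
import unittest


class HasTextDataSketch(abc.ABC):
    """Hypothetical, pared-down stand-in for HasTextData."""

    def __init__(self):
        self._textData = ''

    @abc.abstractmethod
    def _SanitizeInput(self, value):
        raise NotImplementedError()

    def _GetTextData(self):
        return self._textData

    def _SetTextData(self, value):
        self._textData = self._SanitizeInput(value)

    data = property(_GetTextData, _SetTextData)


class HasTextDataDerived(HasTextDataSketch):
    def _SanitizeInput(self, value):
        # Visibly alter the value so the test can tell the hook was called
        return '[Sanitized] %s' % value


class testHasTextData(unittest.TestCase):
    def test_SanitizeInput(self):
        # The abstract class itself must refuse to instantiate
        with self.assertRaises(TypeError):
            HasTextDataSketch()

    def testdata(self):
        testObject = HasTextDataDerived()
        testObject.data = 'value'
        self.assertEqual(testObject.data, '[Sanitized] value')


suite = unittest.TestLoader().loadTestsFromTestCase(testHasTextData)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```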

Since the data and textContent properties should both use the same getter-, setter- and deleter-methods, only one of the test-methods between the two required for those properties actually needs to check the functionality. The other one can be tested by asserting that the underlying methods are identical:

def testdata(self):
    """Unit-tests the data property of a HasTextData instance."""
    testObject = HasTextDataDerived()
    # test default state
    self.assertEquals( testObject.data, '', 
        'The default data value for a newly-created instance should '
        'be an empty string' )
    # Test setting then getting all "good" values
    for testValue in UnitTestValuePolicy.Text:
        testObject.data = testValue
        expected = '[Sanitized] %s' % testValue
        actual = testObject.data
        self.assertEquals( actual, expected, 
            'instance.data should equal %s if it was set to %s, '
            'but %s was returned instead' % ( 
                expected, testValue, actual
            )
        )
    # Test setting all "bad" values and keeping the previous state
    testObject.data = 'original value'
    expected = testObject.data
    for testValue in ( 
        UnitTestValuePolicy.Numeric + 
        UnitTestValuePolicy.Boolean.Strict + [ object() ] ):
        try:
            testObject.data = testValue
            self.fail( 'Setting instance.data to a non-string value '
                '("%s" [%s]) should raise a TypeError' % ( 
                    testValue, type( testValue ).__name__
                )
            )
        except TypeError:
            self.assertEquals( testObject.data, expected,
                'Failure to set instance.data should have left it '
                    'set to "%s", but it was re-set to "%s"' %
                    ( expected, testObject.data )
            )

def testtextContent(self):
    """Unit-tests the textContent property of a HasTextData instance."""
    self.assertEquals( 
        HasTextData.textContent.fget, HasTextData.data.fget, 
        'HasTextData.textContent and HasTextData.data '
        'are expected to use the same property-getter method'
    )
    self.assertEquals( 
        HasTextData.textContent.fset, HasTextData.data.fset, 
        'HasTextData.textContent and HasTextData.data '
        'are expected to use the same property-setter method'
    )
    self.assertEquals( 
        HasTextData.textContent.fdel, HasTextData.data.fdel, 
        'HasTextData.textContent and HasTextData.data '
        'are expected to use the same property-deleter method'
    )
Note that we're finally putting the UnitTestValuePolicy constant, defined about a month ago, to use.

With those property-tests in place and passing, the question arises of whether there's any useful testing that can be done of the underlying methods of the properties. This was one of the items that came up when I posted the Unit-Testing Walk-through a couple of weeks back, that I didn't want to get too far into the weeds about at the time, but it's probably a good time to address it in some detail now that there's a more detailed example to look at for context.

In general, and in keeping with the thoroughly tested goal in my coding standards, the unit-testing policy requires that test-methods be defined for all public and protected members. The implication, I hope, is that all the required test-methods should also be implemented. Otherwise, why have the requirement for their definition? That may well break down in the case of properties and their underlying methods, though. Consider the testdata test-method above. It:

  • Calls the _DelTextData property-deleter method (at least indirectly, during initialization of the HasTextDataDerived test-class, which calls HasTextData.__init__, which calls self._DelTextData);
  • Calls _SetTextData during every property-value assignment; and
  • Calls _GetTextData during every property-value retrieval.
Since the value-assignment calls also use both good values (that should not raise errors) and bad values (that should), every path through every underlying property-method has been executed and shown to behave as expected. Since that is the goal of unit-testing, it follows that the test-methods for the property-methods aren't really needed if the properties that use them test completely and successfully.

I'd rather not try to work out a way to automatically skip, or otherwise remove property-methods from required test-methods, though. Even if it were possible to determine the relationship (my initial tests against that idea lead me to think it's not), doing so feels... fragile, maybe? Although I can't think of a case where I'd expect to need separate tests for the property-methods, I can't rule out that such cases could exist (at least not yet).

Taking all of that together, I think this is sufficient justification for skipping the test-methods of the underlying property-methods, so long as the reason for skipping them is noted:

@unittest.skip( 'Adequately tested by the testdata method' )
def test_DelTextData(self):
    """Unit-tests the _DelTextData method of a HasTextData instance."""
    self.fail( 'test_DelTextData is not implemented' )

@unittest.skip( 'Adequately tested by the testdata method' )
def test_GetTextData(self):
    """Unit-tests the _GetTextData method of a HasTextData instance."""
    self.fail( 'test_GetTextData is not implemented' )

@unittest.skip( 'Adequately tested by the testdata method' )
def test_SetTextData(self):
    """Unit-tests the _SetTextData method of a HasTextData instance."""
    self.fail( 'test_SetTextData is not implemented' )
That leaves the unit-test results:
########################################
Unit-test results
########################################
Tests were successful ... False
Number of tests run ..... 67
 + Tests ran in ......... 0.01 seconds
Number of errors ........ 0
Number of failures ...... 15
########################################
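
For reference, here is a small sketch of how unittest.skip behaves, assuming Python 3's standard unittest module. The test-case name testSkipSketch is hypothetical. The point is that the body of a skipped test never runs, so the self.fail call inside it cannot count as a failure, while the reason string is kept in the results.

```python
import unittest


class testSkipSketch(unittest.TestCase):
    def testdata(self):
        # Stands in for the real property test
        self.assertEqual('a' + 'b', 'ab')

    @unittest.skip('Adequately tested by the testdata method')
    def test_GetTextData(self):
        # Never executed, so this fail() cannot affect the results
        self.fail('test_GetTextData is not implemented')


suite = unittest.TestLoader().loadTestsFromTestCase(testSkipSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)

# Each entry in result.skipped is a (test, reason) pair
for test, reason in result.skipped:
    print('%s skipped: %s' % (test.id().split('.')[-1], reason))
```

Keeping the skipped methods in place, with their reasons, leaves a visible record that the omission was a decision and not an oversight.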

Implementing CDATA, Comment and Text Classes

Since I don't have a complete class-diagram (with all of the members of the items I've defined so far), I started by creating stub-classes for CDATA, Comment and Text, then created a test-case for one of them (I picked testCDATA) to get a list of all of the members that will need to be defined for all three concrete classes. The test-case returned:

TypeError: Can't instantiate abstract class CDATA with 
abstract methods 
    _SanitizeInput, __str__, __unicode__, cloneNode, 
    isEqualNode, nodeName, nodeType, textContent, 
    toString
Since all three of these concrete classes derive from BaseNode and HasTextData, this list should hold true for all of them, at least as a starting-point.

Or it would, except that I noticed that textContent was appearing in the list. And I just got finished implementing textContent in HasTextData! As it turns out, the reason this happened was pretty simple: I'd just forgotten some of the rules about Python's MRO (method resolution order). To explain, let me start by showing the original definition of CDATA I had:

@describe.InitClass()
class CDATA( BaseNode, HasTextData, object ):
    """
Represents a CDATA section in a markup tree"""
# ...
BaseNode and HasTextData both define the textContent property of an instance: one (BaseNode) as an abstract property requirement that it inherits from IsNode, the other (HasTextData) as a concrete property that, in theory, should be fulfilling that interface contract. The problem is that Python resolves member names by walking the MRO, which searches the listed super-classes first-to-last (left to right), so the abstract textContent that BaseNode inherits is found before HasTextData.textContent and takes precedence over it. This can be shown by switching the order of those super-classes...
@describe.InitClass()
class CDATA( HasTextData, BaseNode, object ):
    """
Represents a CDATA section in a markup tree"""
# ...
...and re-running the unit-test results, yielding:
TypeError: Can't instantiate abstract class CDATA with 
abstract methods 
    _SanitizeInput, __str__, __unicode__, cloneNode, 
    isEqualNode, nodeName, nodeType, toString
With that change, textContent no longer appears in the list of abstract members that need to be implemented in the concrete class.
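
The order-dependency can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 and the standard abc module. The class names mirror the ones in this post, but the bodies are stripped-down stand-ins, and AbstractFirst and ConcreteFirst are hypothetical names for the two orderings of CDATA's super-classes.

```python
import abc


class IsNode(abc.ABC):
    @property
    @abc.abstractmethod
    def textContent(self):
        """Abstract requirement."""


class BaseNode(IsNode):
    """Passes the abstract textContent requirement along unchanged."""


class HasTextData(object):
    _textData = ''

    @property
    def textContent(self):
        return self._textData


class AbstractFirst(BaseNode, HasTextData):
    # BaseNode (and so IsNode) comes before HasTextData in the MRO
    pass


class ConcreteFirst(HasTextData, BaseNode):
    # HasTextData comes first, so its concrete property is found first
    pass


print([c.__name__ for c in AbstractFirst.__mro__])
print(sorted(AbstractFirst.__abstractmethods__))
print(sorted(ConcreteFirst.__abstractmethods__))
```

Printing a class's __mro__ is a quick way to check which definition will win before committing to a particular super-class order.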

If I haven't mentioned it before, I'll say it now: One of the reasons that I like Python is that it allows multiple inheritance. That makes a lot of class-structure design cleaner, I think, since it's possible to keep all functionality relating to a single aspect of multiple classes' functionality in a single place in the code. That usually eliminates, or at a minimum reduces, the likelihood of needing to implement duplicate code across multiple classes. There are some trade-offs that arise, though, and this sort of inheritance-order dependency is an example of one of them: the code becomes more sensitive to the specific order of inheritance. There are at least two different ways this could be dealt with.

The first is the simple reversal that I already showed. The caveat with that approach is that the class-definitions are a bit more fragile, particularly if yet another class gets added into the mix for any of the concrete classes. That's a minimal risk at this point, though, I think. While there are two places that textContent is defined, and there may be other properties that will have similar duplicated definitions, I don't expect that there are any more that have the kind of combination that textContent has: an abstract requirement and a concrete implementation originating from different places in the inheritance tree.

The other would be to change the inheritance-tree. Right now the problem is that BaseNode (with its IsNode parent) lives in a completely separate branch of the tree from HasTextData. If HasTextData were moved so that it's derived from BaseNode, then the textContent of BaseNode would be overridden by HasTextData.

If there is a caveat with this approach, it'd be that the resulting inheritance-structure is, perhaps, starting to get too deep. That, ultimately, is a matter of opinion, but I feel that three parent inheritance levels is about as deep as I'm comfortable with, at least in this particular case. I like this approach, apart from my reservations about the depth of the class-hierarchy. It keeps the inheritance path cleaner, and just... feels better, really. The only other change that it will require will be adding a bunch of dummy methods (all the ones that didn't exist as requirements before) in the HasTextDataDerived derived class in test_markup, but those don't need to be anything more complex than:
class HasTextDataDerived( HasTextData ):
    def __init__( self ):
        HasTextData.__init__( self )
    def _SanitizeInput( self, value ):
        return '[Sanitized] %s' % value
    def __str__( self ):
        pass
    def __unicode__( self ):
        pass
    def cloneNode( self ):
        pass
    def isEqualNode( self ):
        pass
    def nodeName( self ):
        pass
    def nodeType( self ):
        pass
    def toString( self ):
        pass
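
A stripped-down sketch of that second option, assuming Python 3 and the standard abc module, shows both the benefit and the cost. The class bodies are hypothetical stand-ins, and toString is used here as the single example of "everything else" that IsNode requires.

```python
import abc


class IsNode(abc.ABC):
    @property
    @abc.abstractmethod
    def textContent(self):
        """Abstract requirement."""

    @abc.abstractmethod
    def toString(self):
        """Another abstract requirement, left for concrete classes."""


class BaseNode(IsNode):
    pass


class HasTextData(BaseNode):
    # Deriving from BaseNode means this definition overrides the abstract one
    _textData = ''

    @property
    def textContent(self):
        return self._textData


class Text(HasTextData):
    def toString(self):
        return repr(self)


print(sorted(HasTextData.__abstractmethods__))
print(sorted(Text.__abstractmethods__))
```

The override no longer depends on super-class order, but HasTextData now inherits every other abstract requirement from IsNode, which is why a test-class derived from it needs the extra dummy methods.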

Some Commonalities in these Classes

Looking at the list of dummy methods above, it occurred to me that most of the methods listed there, all of them from cloneNode on, could be moved to HasTextData as concrete implementations.

The cloneNode method really doesn't need to do anything more than create and return a new instance of the class, populated with the data of the instance being cloned. That can be done with something pretty simple:

@describe.AttachDocumentation()
@describe.argument( 'deep', 
    'indicates whether to make a "deep" copy (True) or '
    'not (False); irrelevant for HasTextData nodes',
    bool
)
@describe.returns( 'a new instance of the class, populated '
    'with the data of the current instance' )
def cloneNode( self, deep=False ):
    """
Clones the instance."""
    return self.__class__( self._textData )
Testing it is pretty simple:
def testcloneNode(self):
    """Unit-tests the cloneNode method of a HasTextData instance."""
    # Test instances using all "good" values
    for testValue in UnitTestValuePolicy.Text:
        instance1 = HasTextDataDerived( testValue )
        instance2 = instance1.cloneNode()
        self.assertEquals( instance1.__class__, instance2.__class__, 
            'cloneNode should return the same type of object, '
            'but instance2 was a %s, not a %s' % ( 
                instance2.__class__.__name__, 
                instance1.__class__.__name__
            )
        )
        self.assertEquals( instance1.data, instance2.data, 
            'an instance returned by cloneNode should have the same data '
            'as the original instance, but the cloned instance had "%s" '
            'instead of "%s"' % ( instance2.data, instance1.data )
        )

isEqualNode is similarly simple:

@describe.AttachDocumentation()
@describe.argument( 'other', 
    'the object to compare the instance against',
    object
)
@describe.returns( 'True if the other node is the same type and '
    'has the same data as the instance, False otherwise.' )
def isEqualNode( self, other ):
    """
Compares the instance against another item."""
    return ( 
        self.__class__ == other.__class__
        and self.data == other.data
    )
This approach also makes the original criteria-list for isEqualNode from the w3schools site moot — If the instance and the other object are of the same type, they'll have all of the same values common to any instance of the class, so there's no need to do anything more than compare the classes of self and other and the data values of them. The test-method requires the creation of another class derived from HasTextData, but it's pretty much a carbon copy of the original derived test-class (HasTextDataDerived), and is also very simple:
def testisEqualNode( self ):
    """Unit-tests the isEqualNode method of a HasTextData instance."""
    # Test instances using all "good" values
    for testValue in UnitTestValuePolicy.Text:
        instance1 = HasTextDataDerived( testValue )
        # same class, same content
        instance2 = HasTextDataDerived( testValue )
        self.assertTrue( instance1.isEqualNode( instance2 ), 
            'Same class and same content should return isEqualNode '
            'of True' )
        # different class, same content
        instance2 = HasTextDataDerived2( testValue )
        self.assertFalse( instance1.isEqualNode( instance2 ), 
            'Different class and same content should return isEqualNode '
            'of False' )
        # same class, different content
        instance2 = HasTextDataDerived( 'other content' )
        self.assertFalse( instance1.isEqualNode( instance2 ), 
            'Same class and different content should return isEqualNode '
            'of False' )
        # different class and different content
        instance2 = HasTextDataDerived2( 'other content' )
        self.assertFalse( instance1.isEqualNode( instance2 ), 
            'Different class and different content should return '
            'isEqualNode of False' )
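
Stripped of the documentation decorators, the clone-and-compare behavior boils down to very little. This is a minimal sketch, assuming Python 3. TextNodeSketch and CommentNodeSketch are hypothetical classes that store their data directly, with no sanitization step.

```python
class TextNodeSketch(object):
    """Hypothetical node with the clone/compare behaviour described above."""

    def __init__(self, data=''):
        self.data = data

    def cloneNode(self, deep=False):
        # 'deep' is accepted for DOM compatibility but has nothing to act on
        return self.__class__(self.data)

    def isEqualNode(self, other):
        return self.__class__ == other.__class__ and self.data == other.data


class CommentNodeSketch(TextNodeSketch):
    pass


original = TextNodeSketch('some text')
clone = original.cloneNode()

print(clone is original)
print(original.isEqualNode(clone))
print(original.isEqualNode(CommentNodeSketch('some text')))
```

One thing to keep in mind with the real classes: a clone is built from data that has already been sanitized once, so the constructor will sanitize it a second time. A sanitizer therefore needs to be safe to apply to its own output.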

The nodeName and nodeType properties can be defined to return a class-level attribute that is defined as None in HasTextData, and that will be set to a different value in the concrete classes:

#-----------------------------------#
# Class attributes (and instance-   #
# attribute default values)         #
#-----------------------------------#

_nodeName = None
_nodeType = None

#-----------------------------------#
# Instance property-getter methods  #
#-----------------------------------#

@describe.AttachDocumentation()
def _GetnodeName( self ):
    """
Gets the (class-constant) name of the node"""
    try:
        result = self.__class__._nodeName
        if result is None:
            raise AttributeError()
    except AttributeError:
        raise AttributeError( '%s does not have a class-level '
            'definition of _nodeName, or it is inheriting the None '
            'value defined by HasTextData' % ( 
                self.__class__.__name__
                )
            )
    return result

@describe.AttachDocumentation()
def _GetnodeType( self ):
    """
Gets the (class-constant) type of the node"""
    try:
        result = self.__class__._nodeType
        if result is None:
            raise AttributeError()
    except AttributeError:
        raise AttributeError( '%s does not have a class-level '
            'definition of _nodeType, or it is inheriting the None '
            'value defined by HasTextData' % ( 
                self.__class__.__name__
                )
            )
    return result

# ...

#-----------------------------------#
# Instance Properties (abstract OR  #
# concrete!)                        #
#-----------------------------------#

nodeName = describe.makeProperty(
    _GetnodeName, None, None, 
    'the (class-constant) name of the node', 
    str, unicode
)
nodeType = describe.makeProperty(
    _GetnodeType, None, None, 
    'the (class-constant) type of the node', 
    int
)
The test-methods for those:
def testnodeName(self):
    """Unit-tests the nodeName property of a HasTextData instance."""
    instance = HasTextDataDerived()
    actual = instance.nodeName
    expected = HasTextDataDerived._nodeName
    self.assertEquals( actual, expected,
        'An instance of HasTextData with a defined _nodeName '
        'should return that value in its nodeName property, but '
        '"%s" (%s) was returned instead' % ( 
            actual, type( actual ).__name__
        )
    )
    instance = HasTextDataDerived2()
    try:
        actual = instance.nodeName
        expected = HasTextDataDerived._nodeName
        self.fail( 'An instance of HasTextData that does not have '
            'a _nodeName class-property defined should raise an '
            'AttributeError if nodeName is retrieved' )
    except AttributeError:
        pass

def testnodeType(self):
    """Unit-tests the nodeType property of a HasTextData instance."""
    instance = HasTextDataDerived()
    actual = instance.nodeType
    expected = HasTextDataDerived._nodeType
    self.assertEquals( actual, expected,
        'An instance of HasTextData with a defined _nodeType '
        'should return that value in its nodeType proeprty, but '
        '"%s" (%s) was returned instead' % ( 
            actual, type( actual ).__name__
        )
    )
    instance = HasTextDataDerived2()
    try:
        actual = instance.nodeType
        expected = HasTextDataDerived._nodeType
        self.fail( 'An instance of HasTextData that does not have '
            'a _nodeType class-property defined should raise an '
            'AttributeError if nodeType is retrieved' )
    except AttributeError:
        pass
with the modified derived test-classes as:
class HasTextDataDerived( HasTextData ):
    _nodeName = '#hasTextDataDerived'
    _nodeType = 1024
    def __init__( self, textData=None ):
        HasTextData.__init__( self, textData )
    def _SanitizeInput( self, value ):
        if value[ 0:12 ] != '[Sanitized] ':
            return '[Sanitized] %s' % value
        else:
            return value
    def __str__( self ):
        pass
    def __unicode__( self ):
        pass
    def toString( self ):
        pass

class HasTextDataDerived2( HasTextData ):
    def __init__( self, textData=None ):
        HasTextData.__init__( self, textData )
    def _SanitizeInput( self, value ):
        if value[ 0:12 ] != '[Sanitized] ':
            return '[Sanitized] %s' % value
        else:
            return value
    def __str__( self ):
        pass
    def __unicode__( self ):
        pass
    def toString( self ):
        pass
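
The class-constant pattern can be boiled down to the following sketch, assuming Python 3 and the built-in property. HasNodeNameSketch, WithName and WithoutName are hypothetical names, and the error message is shortened for the example.

```python
class HasNodeNameSketch(object):
    """Hypothetical base class; concrete classes must override _nodeName."""
    _nodeName = None

    @property
    def nodeName(self):
        result = self.__class__._nodeName
        if result is None:
            raise AttributeError(
                '%s does not define a class-level _nodeName' %
                self.__class__.__name__
            )
        return result


class WithName(HasNodeNameSketch):
    _nodeName = '#text'


class WithoutName(HasNodeNameSketch):
    pass


print(WithName().nodeName)

try:
    WithoutName().nodeName
except AttributeError as error:
    print(error)
```

A side effect of raising AttributeError from a getter is that hasattr reports the property as missing on classes that never set the constant. That may or may not be desirable, but it is worth knowing about.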

Finally, the toString methods. In JavaScript, toString returns a description of the node rather than its content:

comment = document.createComment( 'comment-node' );
text = document.createTextNode( 'text-node' );
comment.toString();
text.toString();
yields
"[object Comment]"
"[object Text]"
That strikes me as being directly equivalent to the built-in __repr__() method, which returns something looking like this:
<[module].[class-name] object at [hex-number]>
I'll use that, then. Since all that will do is return the __repr__() results for the instance, I see no reason not to just skip the unit-test for it. The actual implementation of HasTextData.toString is dead simple:
@describe.AttachDocumentation()
@describe.returns( 'A string description of the instance' )
def toString( self ):
    """
Returns a description of the instance"""
    return self.__repr__()
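
As a quick illustration, assuming Python 3 and a hypothetical NodeSketch class, the default repr can be compared side by side with a JavaScript-style description built from the class name. The second line is only there for comparison and is not something the markup module does.

```python
class NodeSketch(object):
    """Hypothetical node class."""

    def toString(self):
        # Calling repr() is the idiomatic spelling of self.__repr__()
        return repr(self)


node = NodeSketch()
print(node.toString())
print('[object %s]' % node.__class__.__name__)
```

The hex number in the default repr differs from run to run, which is one more reason a unit-test for toString would have little to assert beyond "it returns what repr returns".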

I hadn't expected to do all the shuffling of functionality into HasTextData that I've done and shown, so this is getting long, but I really want to get the concrete classes that derive from it finished before I call it a day. Fortunately, all that movement of functionality doesn't leave much to implement in them: All that they really need is implementation of _SanitizeInput, __str__ and __unicode__.

Final Implementation of CDATA

The main purposes that the remaining required methods of CDATA serve are to ensure that the data, when sent to a client browser, won't be broken (_SanitizeInput) and to provide rendered output of the instance, allowing for normal string-values and unicode values both (__str__ and __unicode__). None of these are particularly difficult operations:

@describe.AttachDocumentation()
@describe.argument( 'value', 
    'the text-value to sanitize',
    str, unicode
)
@describe.raises( ValueError, 
    'if a value is passed that cannot be sanitized by the '
    'instance to make rendering safely viable'
)
@describe.returns( 'A sanitized str or unicode value' )
def _SanitizeInput( self, value ):
    """
Sanitizes the provided input-value to make sure it's safe to store and issue 
to a client browser"""
    if ']]>' in value:
        raise ValueError( '%s cannot contain "]]>" as a literal value '
            'in its text-content.' % ( self.__class__.__name__ ) )
    return value

@describe.AttachDocumentation()
@describe.returns( 'The instance rendered as a str' )
def __str__( self ):
    """
Renders the instance as a string value"""
    return '<![CDATA[ %s ]]>' % ( self.data )

@describe.AttachDocumentation()
@describe.returns( 'The instance rendered as a unicode' )
def __unicode__( self ):
    """
Renders the instance as a unicode value"""
    return u'<![CDATA[ %s ]]>' % ( self.data )

Of the three, _SanitizeInput probably requires the most explanation. If it were to allow data values that contained ]]> then it would be possible for a CDATA to render as something like <![CDATA[ This is my CDATA content.]]> ]]>, and that would cause rendering issues in the client browser that the rendered CDATA was handed off to.
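
The reject-on-terminator rule is easy to try out as a pair of plain functions. This is a minimal sketch, assuming Python 3. sanitize_cdata and render_cdata are hypothetical helpers that mirror the method bodies above without the class around them.

```python
def sanitize_cdata(value):
    """Reject text that would terminate a CDATA section early."""
    if ']]>' in value:
        raise ValueError('CDATA text cannot contain "]]>"')
    return value


def render_cdata(value):
    return '<![CDATA[ %s ]]>' % sanitize_cdata(value)


# Markup-significant characters are fine as long as "]]>" never appears
print(render_cdata('if (a < b && c[d[0]] > 1) { run(); }'))

try:
    render_cdata('This is my CDATA content.]]>')
except ValueError as error:
    print('rejected: %s' % error)
```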

While there is provision through the __unicode__ method for unicode content-output, I may still need to work out some sort of mechanism or process that will allow a __str__ call to call an instance's __unicode__ instead, if there is reason for doing so. I suspect that will involve checking for various unicode errors (UnicodeDecodeError and UnicodeEncodeError, perhaps?), but I'm not sure yet how that's going to work, or even if it'll be necessary.

Final Implementation of Comment

The same three methods in Comment look very much like their counterparts in CDATA, and for much the same reasons:

@describe.AttachDocumentation()
@describe.argument( 'value', 
    'the text-value to sanitize',
    str, unicode
)
@describe.raises( ValueError, 
    'if a value is passed that cannot be sanitized by the '
    'instance to make rendering safely viable'
)
@describe.returns( 'A sanitized str or unicode value' )
def _SanitizeInput( self, value ):
    """
Sanitizes the provided input-value to make sure it's safe to store and issue 
to a client browser"""
    if '-->' in value:
        raise ValueError( '%s cannot contain "-->" as a literal value '
            'in its text-content.' % ( self.__class__.__name__ ) )
    return value

@describe.AttachDocumentation()
@describe.returns( 'The instance rendered as a str' )
def __str__( self ):
    """
Renders the instance as a string value"""
    return '<!-- %s -->' % ( self.data )

@describe.AttachDocumentation()
@describe.returns( 'The instance rendered as a unicode' )
def __unicode__( self ):
    """
Renders the instance as a unicode value"""
    return u'<!-- %s -->' % ( self.data )

Final Implementation of Text

The only consideration for sanitizing the data of a Text is to make sure that it isn't going to accidentally include any tag-items in its rendered output. Ensuring that is a simple matter of escaping any < characters during the process of setting its data. Technically, that should be all that's required, since browsers are usually pretty good about just rendering > characters if they aren't part of a detectable tag-structure, but in the interests of making sure that tag-delimiter characters are all escaped, I'm going to escape both < and >.

@describe.AttachDocumentation()
@describe.argument( 'value', 
    'the text-value to sanitize',
    str, unicode
)
@describe.raises( ValueError, 
    'if a value is passed that cannot be sanitized by the '
    'instance to make rendering safely viable'
)
@describe.returns( 'A sanitized str or unicode value' )
def _SanitizeInput( self, value ):
    """
Sanitizes the provided input-value to make sure it's safe to store and issue 
to a client browser"""
    sanitized = value.replace( '<', '&lt;' )
    sanitized = sanitized.replace( '>', '&gt;' )
    return sanitized

@describe.AttachDocumentation()
@describe.returns( 'The instance rendered as a str' )
def __str__( self ):
    """
Renders the instance as a string value"""
    return str( self.data )

@describe.AttachDocumentation()
@describe.returns( 'The instance rendered as a unicode' )
def __unicode__( self ):
    """
Renders the instance as a unicode value"""
    return unicode( self.data )
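
One thing the two replace calls above leave alone is the & character, so a value that already contains something like &lt; passes through unchanged and a browser will display it as <. If that ever matters, the standard library's xml.sax.saxutils.escape handles &, < and > in one call. This is a sketch of an alternative, not what the Text class does; sanitize_text is a hypothetical name.

```python
from xml.sax.saxutils import escape

def sanitize_text( value ):
    """
Escapes the three characters that are significant in markup text-content: 
&, < and >"""
    # escape() replaces & first, so the entities it generates for 
    # < and > are not escaped a second time
    return escape( value )

sanitized = sanitize_text( 'a < b && c > d' )
```

The trade-off is that callers can no longer pass pre-escaped entities in and have them survive as entities.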

That gets me just under 50% of the way done with the markup module's classes.

It's a bit early for a full snapshot of the current idic package, but since I didn't show all of the code for the work done today, it seems fair to set up downloads of the current markup.py and test_markup.py files before I sign off for the day:


Tuesday, April 25, 2017

Generating and Parsing Markup in Python [2]

With one interface defined, and most of the module design and DOM-compliance properties and methods figured out, today's post will continue with some concrete implementation and more interface definition. I'm going to get as far through all of the non-concrete implementations as I can, since the foundations they provide, while critically important in the long run, aren't as demonstrable as the concrete markup implementations.

Defining the BaseNode Abstract Class

BaseNode is the first abstract class I'm going to tackle in the markup module, and the first definition of any concrete functionality there. Since it's intended to provide some concrete implementation while just passing some of the abstraction from IsNode on to the concrete classes that will derive from it, there isn't a lot of concrete implementation, though.

Implementing and Testing the Concrete Properties

Since I've noted in some detail in my coding standards exactly how I prefer to implement instance properties, I'll be focusing more on how the properties get their jobs done than what the code underneath the public interface really looks like. BaseNode will provide concrete implementations for six properties that are required by the IsNode interface:

  • nextElementSibling;
  • nextSibling;
  • parentElement;
  • parentNode;
  • previousElementSibling; and
  • previousSibling
Four of those properties, the next* and previous* items, rely on their corresponding parent* property — if the instance in question doesn't have a parent of the appropriate type in the parent* property, then there can't be a next* or previous* value. Those parent* properties, then, need to be worked out first.

What Constitutes a parent Anyway? 

The short and obvious answer is probably best expressed as an IsNode instance that has the various *child* properties and *Child methods. That's something that I haven't really addressed in any detail yet. A fairly complete list of the properties and methods that involve child nodes, taken from the big list presented in the last post, would include:

                          markup Module Equivalent Class
Member Name               Comment    Tag        Text
Property Members
childElementCount         n/a        number     n/a
childNodes                object     object     object
children                  n/a        object     n/a
firstChild                null       object     null
firstElementChild         n/a        null       n/a
lastChild                 null       object     null
lastElementChild          n/a        null       n/a
Method Members
appendChild               function   function   function
contains                  function   function   function
getElementsByClassName    n/a        function   n/a
getElementsByTagName      n/a        function   n/a
hasChildNodes             function   function   function
insertBefore              function   function   function
replaceChild              function   function   function

At present, in the current class-relationships diagram, there really isn't any single interface, abstract class or class that feels to me like the right place to set those up. That said, there's only one concrete class so far that actually needs any of those members: Tag. Down the line, any concrete document-classes that derive from BaseDocument (which in turn derives from Tag) will need them as well. From an architectural standpoint, that feels to me like a need for an interface at a minimum (call it IsElement for now) that Tag will implement, and that ties in to the various *child*-related members listed above.

I'll plan on working out IsElement right after I finish with BaseNode, then.

Another item for consideration: In the JavaScript DOM functionality that I'm trying to keep consistent with, there are two distinct parent-types: elements (tags) and nodes (everything else). I've already established that non-tags really can't have children in at least one major browser-engine branch (webkit, used by Safari, Chrome and Chromium). In Firefox (mozilla), it's not much different: the specifics of the error-messages differ, but the fundamental DOM-object relationship is the same, and text-nodes can't have children. That, then, raises the question: Why is there both a parentElement and a parentNode property? Particularly since running code like this:

tag = document.createElement( 'tag' );
text = document.createTextNode( 'this is a text-node' );
tag.appendChild( text );
console.log( 'text.parentElement ..................... ' + 
    text.parentElement );
console.log( 'text.parentElement == tag .............. ' + 
    ( text.parentElement == tag ) );
console.log( 'text.parentElement.isSameNode( tag ) ... ' + 
    text.parentElement.isSameNode( tag ) );
console.log( 'text.parentNode ........................ ' + 
    text.parentNode );
console.log( 'text.parentNode == tag ................. ' + 
    ( text.parentNode == tag ) );
console.log( 'text.parentNode.isSameNode( tag ) ...... ' + 
    text.parentNode.isSameNode( tag ) );
yields results showing that parentNode and parentElement return the same tag-element:
text.parentElement ..................... [object HTMLUnknownElement]
text.parentElement == tag .............. true
text.parentElement.isSameNode( tag ) ... true
text.parentNode ........................ [object HTMLUnknownElement]
text.parentNode == tag ................. true
text.parentNode.isSameNode( tag ) ...... true
It just seems... weird, I guess. I hope that it's some sort of concession made for backwards compatibility, but I can't be sure that's the case. On top of that, I can't think of a use-case at all where parentNode wouldn't return the same thing as parentElement. Non-element nodes can't have children, and thus can't be parents, but the naming convention in the JavaScript DOM-methods seems to be pretty consistent in *Element* methods returning elements (tags) only, while *Node* methods can return any node-type, including elements.

However, in the interests of preserving the DOM-object consistency that I'm after, I guess I'll have to keep both of those properties. That doesn't mean that I need to have separate property-getters for each, though!

The parent, parentElement and parentNode Properties

It may be simpler to just show the code in this case, then note the differences from my usual patterns:

#-----------------------------------#
# Instance property-getter methods  #
#-----------------------------------#

@describe.AttachDocumentation()
@describe.returns( 'IsElement instance or None' )
def _GetParent( self ):
    """
Returns the IsElement object that the instance is a child of, or None if there is 
no parent-child relationship available for the instance."""
    return self._parent

#-----------------------------------#
# Instance property-setter methods  #
#-----------------------------------#

@describe.AttachDocumentation()
@describe.argument( 'value', 
    'the object to set as the parent of the instance',
    IsElement
)
def _SetParent( self, value ):
    """
Sets the instance's parent to the supplied IsElement object."""
    if not isinstance( value, IsElement ):
        raise TypeError( '%s.parent expects an instance of a class '
            'derived from IsElement, but was passed "%s" (%s), '
            'which is not one' % ( 
                self.__class__.__name__, value.__repr__(), 
                type( value ).__name__
            )
        )
    self._parent = value

#-----------------------------------#
# Instance property-deleter methods #
#-----------------------------------#

def _DelParent( self ):
    """
Deletes the instance's parent relationship by setting it to None"""
    self._parent = None

#-----------------------------------#
# Instance Properties (abstract OR  #
# concrete!)                        #
#-----------------------------------#

parent = describe.makeProperty(
    _GetParent, None, None, 
    'the IsElement object that the instance is a child of', 
    IsElement, None
)
parentElement = describe.makeProperty(
    _GetParent, None, None, 
    'the IsElement object that the instance is a child of', 
    IsElement, None
)
parentNode = describe.makeProperty(
    _GetParent, None, None, 
    'the IsElement object that the instance is a child of', 
    IsElement, None
)
What all of this provides is a set of three different properties (parent, parentElement and parentNode) that are all pointed at the same property-getter method (_GetParent). If, in the future, there's a demonstrable need to separate those out for some reason, it should be a relatively simple matter of creating a new property-getter, -setter and -deleter method-set, then re-assigning the methods in whichever property-declaration need to point to the new method(s). The one concern that I think would come up in that sort of scenario is what would have to happen to the parent property. Right now, the three are completely interchangeable, but if an actual difference between parentElement and parentNode ever surfaces, the parent property may well need to be deprecated or even removed, rather than linger there being confusing.
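
The same aliasing idea can be shown with nothing but Python's built-in property, as a minimal sketch. It assumes that the three properties are read-only (as the None setter and deleter arguments above suggest) and that some other tree-managing code calls the setter directly; NodeSketch is a hypothetical stand-in for BaseNode.

```python
class NodeSketch( object ):
    """Hypothetical stand-in for BaseNode, without describe.makeProperty"""

    def __init__( self ):
        self._parent = None

    def _GetParent( self ):
        return self._parent

    def _SetParent( self, value ):
        # Not attached to any property in this sketch: it would be 
        # called by whatever code adds the node to a parent
        self._parent = value

    # Three property-names, all pointed at one getter-method
    parent = property( _GetParent, None, None, 
        'the object that the instance is a child of' )
    parentElement = property( _GetParent, None, None, 
        'the object that the instance is a child of' )
    parentNode = property( _GetParent, None, None, 
        'the object that the instance is a child of' )

node = NodeSketch()
container = object()
node._SetParent( container )
```

Splitting one of the names off later would only mean pointing that one property call at a different getter.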

The nextElementSibling, nextSibling, previousElementSibling and previousSibling Properties

There is a common theme that runs across these four properties, all based around the idea that if the instance has a parent, then that parent has children and childNodes properties that are a sequence of IsElement- and IsNode-derived objects, respectively. Given that, all of these methods need to look at all of the instance's parent's childNodes (which will always include all IsNode-derived types), find the position of the instance whose sibling is being sought in that sequence, then return the previous or next node or element before or after that position, respectively. The first step, finding the position of the instance in its parent's childNodes is common across all four methods.

The _GetnextElementSibling and _GetnextSibling getter-methods show the determination of the index of the instance in its parent's collection of children (parentIndex in parent.childNodes) and the slicing of those childNodes to retrieve everything after the instance in that collection. _GetnextElementSibling also shows filtering of that slice, so that only IsElement items will be considered as candidates for the return value.

@describe.AttachDocumentation()
@describe.returns( 'IsElement object or None' )
def _GetnextElementSibling( self ):
    """
Gets the next IsElement element in the instance's parent's children after the 
instance's position in that sequence of objects"""
    if self.parent:
        # Get the index of the instance in its parent's collection 
        # of children. If this fails, there's an issue with adding 
        # children somewhere else...
        parentIndex = self.parent.childNodes.index( self )
        # Get a slice of the parent's children that captures all the 
        # children *after* the index
        nodesAfter = self.parent.childNodes[ parentIndex + 1: ]
        # Since this is an "element" property, filter those down to just 
        # IsElement members
        nodesAfter = [ 
            node for node in nodesAfter 
            if isinstance( node, IsElement )
        ]
        if len( nodesAfter ) > 0:
            # If there's at least one item, return the first one in 
            # the list
            return nodesAfter[ 0 ]
        else:
            # Otherwise, there aren't any *elements* after the instance, 
            # so return None
            return None
    else:
        # The instance has no parent, and thus there are no siblings.
        return None

@describe.AttachDocumentation()
@describe.returns( 'IsNode object or None' )
def _GetnextSibling( self ):
    """
Gets the next IsNode element in the instance's parent's children after the 
instance's position in that sequence of objects"""
    if self.parent:
        # Get the index of the instance in its parent's collection 
        # of children. If this fails, there's an issue with adding 
        # children somewhere else...
        parentIndex = self.parent.childNodes.index( self )
        # Get a slice of the parent's children that captures all the 
        # children *after* the index
        nodesAfter = self.parent.childNodes[ parentIndex + 1: ]
        if len( nodesAfter ) > 0:
            # If there's at least one item, return the first one in 
            # the list
            return nodesAfter[ 0 ]
        else:
            # Otherwise, there aren't any nodes after the instance, 
            # so return None
            return None
    else:
        # The instance has no parent, and thus there are no siblings.
        return None
Really, the only major difference between _GetnextElementSibling and _GetnextSibling is whether the intermediate list (nodesAfter) is filtered.

The same basic pattern is used in _GetpreviousElementSibling and _GetpreviousSibling, including the filtering or non-filtering of the intermediate results (nodesBefore). The major difference between either _Getprevious* method and its _Getnext* counterpart is the initial slice of the instance's parent.childNodes:

            # Get a slice of the parent's children that captures all the 
            # children *before* the index
            nodesBefore = self.parent.childNodes[ 0:parentIndex ]
The filtering aspect in _GetpreviousElementSibling is identical to the code above for _GetnextElementSibling, and doesn't exist at all in _GetpreviousSibling. Note that the nearest preceding sibling is the last item in nodesBefore, so the value to return is nodesBefore[ -1 ] rather than the first item.
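
All four getters boil down to the same find-the-index, slice, filter, pick-the-nearest sequence, which can be sketched as one stand-alone function. This is an illustration only, not the BaseNode code: find_sibling and the two tiny classes are hypothetical, and it relies on list.index finding the node itself, which holds as long as the node classes don't define their own equality.

```python
def find_sibling( siblings, node, forward=True, only=None ):
    """
Returns the nearest sibling of node in the siblings list, looking forward 
or backward, optionally restricted to instances of the only type"""
    index = siblings.index( node )
    if forward:
        candidates = siblings[ index + 1: ]
    else:
        # Everything before the node, reversed so that the nearest 
        # sibling comes first
        candidates = siblings[ :index ][ ::-1 ]
    if only is not None:
        candidates = [ 
            item for item in candidates if isinstance( item, only ) 
        ]
    if candidates:
        return candidates[ 0 ]
    return None

class Element( object ):
    pass

class TextNode( object ):
    pass

first, text, second = Element(), TextNode(), Element()
children = [ first, text, second ]
```

With those three children, looking backward from second gives text when no filter is applied and first when only Element instances are allowed.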

Implementing and Testing the Concrete Methods

I'd originally expected to implement only two of the abstract methods of IsNode in BaseNode: isEqualNode and isSameNode. With the addition of the IsElement interface to the markup class-zoo, though, any concrete implementation of isEqualNode will, I think, have to be moved out to the concrete classes, since those are the most-shallow points in the inheritance structure where all of the various properties that the method needs will actually exist.

That leaves isSameNode as the only concrete method-implementation of BaseNode:

@describe.AttachDocumentation()
@describe.argument( 'node', 
    'the node-object to compare to the instance to '
    'see if they are the same',
    IsNode
)
@describe.raises( TypeError,
    'if passed a node value that is not an IsNode instance'
)
@describe.returns( 
    'True if the supplied node is the same node-object as '
    'the instance, False otherwise'
)
def isSameNode( self, node ):
    """
Determines if a supplied node is the same node-object as 
the instance"""
    if not isinstance( node, IsNode ):
        raise TypeError( '%s.isSameNode expects an instance '
            'of IsNode for comparison, but was passed '
            '"%s" (%s)' % ( 
                self.__class__.__name__, 
                node, type( node ).__name__
            )
        )
    # If the node is the same object, it will 
    # have the same id, so:
    return id( self ) == id( node )

Normally, I'd also be looking to implement unit-tests of BaseNode, now that all of its concrete implementation is complete. In this case, because all of the *Sibling properties require participation in a node-tree structure that won't be available until I have both IsElement and a concrete class that derives from it implemented (Tag in this case), I only went as far as getting the test-method requirements stubbed out, along the lines of:

def testpreviousSibling(self):
    """Unit-tests the previousSibling property of a BaseNode instance."""
    self.fail( 'testpreviousSibling is not implemented' )
and
def testisSameNode(self):
    """Unit-tests the isSameNode method of a BaseNode instance."""
    self.fail( 'testisSameNode is not implemented' )
That means that I'll have several test-failures for a while:
########################################
Unit-test results
########################################
Tests were successful ... False
Number of tests run ..... 37
 + Tests ran in ......... 0.01 seconds
Number of errors ........ 0
Number of failures ...... 15
########################################
I could implement a dummy class in the test-module that derives from BaseNode, and test against that class, and if there weren't a concrete class expected that would serve that purpose, that's exactly what I'd do. Since I will have one, eventually, that just feels... wasteful, I guess, so I'd rather get Tag operational and then come back to these tests. Until then, I'll just have to live with these test-failures.
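
For reference, the dummy-class approach would look something like the sketch below. It uses Python 3's abc.ABC and a cut-down, hypothetical BaseNodeSketch with a single made-up abstract member (render), so it shows the shape of the technique rather than a test of the real BaseNode.

```python
import abc
import unittest

class BaseNodeSketch( abc.ABC ):
    """Hypothetical abstract class with one concrete method"""

    @abc.abstractmethod
    def render( self ):
        raise NotImplementedError()

    def isSameNode( self, node ):
        return self is node

class DummyNode( BaseNodeSketch ):
    """Test-only class: implements the abstract member trivially so 
    that the inherited concrete members can be exercised"""

    def render( self ):
        return ''

class testBaseNodeSketch( unittest.TestCase ):

    def testisSameNode( self ):
        node, other = DummyNode(), DummyNode()
        self.assertTrue( node.isSameNode( node ) )
        self.assertFalse( node.isSameNode( other ) )
```

The dummy class lives only in the test-module, so it never becomes part of the package's public surface.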

Defining the IsElement Interface

Between the previous post and the breakdown of members needed in IsElement above, there's really not much discussion needed, I think, nor a whole lot of code to show and explain.

The Abstract Properties of IsElement

On basic principle, I did do another run through the members of an element listed at the w3schools site, just to ensure that I didn't miss any. What I netted out with for properties in IsElement was:

childElementCount = abc.abstractproperty()
childNodes = abc.abstractproperty()
children = abc.abstractproperty()
firstChild = abc.abstractproperty()
firstElementChild = abc.abstractproperty()
lastChild = abc.abstractproperty()
lastElementChild = abc.abstractproperty()
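
A side-note for anyone following along in Python 3: abc.abstractproperty has been deprecated there since 3.3, and the equivalent requirement is written by stacking property on top of abc.abstractmethod. The sketch below shows one of the seven properties that way; IsElementSketch and TagSketch are hypothetical names, not the markup module's classes.

```python
import abc

class IsElementSketch( abc.ABC ):
    """Hypothetical Python 3 rendition of one IsElement requirement"""

    @property
    @abc.abstractmethod
    def childNodes( self ):
        """The sequence of nodes that are children of the element"""

class TagSketch( IsElementSketch ):
    """Minimal concrete class satisfying the requirement"""

    def __init__( self ):
        self._childNodes = []

    @property
    def childNodes( self ):
        return self._childNodes

tag = TagSketch()
```

Instantiating IsElementSketch directly raises a TypeError, while TagSketch can be created because it supplies a concrete childNodes.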

There were other properties that I had to think about too, though — accessKey (which may well be globally available at the implementation-level of a Tag), and attributes (which absolutely is a Tag property). When push came to shove, though I opted to implement those as concrete members of Tag rather than drop them into the IsElement interface. The rationale for that decision was mostly based on the realization that the only other elements that I'm expecting to be concerned with are documents, and my current plan is to derive a BaseDocument abstract class from Tag anyway. In that scenario, all of the other properties and methods would be implemented by Tag, and available to documents through their derivation from BaseDocument anyway. Those members include all tag-level properties that are also attributes in markup, as well as any properties that aren't specifically related in some way to having, working with, or altering a parent-child relationship between a Tag and any other IsNode instance.

The Abstract Methods of IsElement

The same criteria noted above for properties were also used to cull down the list of methods that would be required by IsElement, for pretty much the same reasons. The resulting abstract methods are:

@abc.abstractmethod
def appendChild( self, child ):
    raise NotImplementedError( '%s.appendChild is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def hasChildNodes( self ):
    raise NotImplementedError( '%s.hasChildNodes is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def insertBefore( self, newChild, existingChild ):
    raise NotImplementedError( '%s.insertBefore is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def insertChildAt( self, newChild, index ):
    raise NotImplementedError( '%s.insertChildAt is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def removeChild( self, child ):
    raise NotImplementedError( '%s.removeChild is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def removeChildAt( self, index ):
    raise NotImplementedError( '%s.removeChildAt is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def removeSelf( self ):
    raise NotImplementedError( '%s.removeSelf is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

@abc.abstractmethod
def replaceChild( self, newChild, existingChild ):
    raise NotImplementedError( '%s.replaceChild is not implemented as '
        'required by IsElement' % self.__class__.__name__ )

Testing the Abstract Members of IsElement

The unit-testing of the abstract members of IsElement follows the pattern established by the testing of IsNode members shown in my previous post, with the hopefully-obvious change of class-name being tested:

def testPROPERTYNAME(self):
    """Unit-tests the PROPERTYNAME property of an IsElement instance."""
    try:
        testInstance = markup.IsElement()
    except TypeError, error:
        actual = 'PROPERTYNAME' in str( error )
        self.assertTrue( actual, 'The TypeError raised by trying to '
            'instantiate IsElement should include the "PROPERTYNAME" '
            'abstract property-name' )
    except Exception, error:
        self.fail( 'testPROPERTYNAME expected a TypeError, '
            'but %s was raised instead:\n  - %s' % ( 
                error.__class__.__name__, error
            )
        )
def testMETHODNAME(self):
    """Unit-tests the METHODNAME method of an IsElement instance."""
    try:
        testInstance = markup.IsElement()
    except TypeError, error:
        actual = 'METHODNAME' in str( error )
        self.assertTrue( actual, 'The TypeError raised by trying to '
            'instantiate IsElement should include the "METHODNAME" '
            'abstract method-name' )
    except Exception, error:
        self.fail( 'testMETHODNAME expected a TypeError, '
            'but %s was raised instead:\n  - %s' % ( 
                error.__class__.__name__, error
            )
        )
Since that pattern is simple and established, I won't go into the details of their implementation here, but I'll get them in place and make sure that they run as expected. Bearing in mind that there are still fifteen failures from the still-pending tests of BaseNode, those same failures should still appear, but the number of tests run and passed should increase:
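
The same check can be written more compactly with assertRaises, which also fails the test if no exception is raised at all. The version below is a Python 3 sketch against a hypothetical one-method interface, not the actual test_markup.py code; it relies only on the fact that the TypeError raised for an abstract class names the missing members.

```python
import abc
import unittest

class IsElementSketch( abc.ABC ):
    """Hypothetical interface with a single abstract method"""

    @abc.abstractmethod
    def appendChild( self, child ):
        raise NotImplementedError()

class testIsElementSketch( unittest.TestCase ):

    def testappendChild( self ):
        # Instantiating an abstract class must raise a TypeError...
        with self.assertRaises( TypeError ) as caught:
            IsElementSketch()
        # ... and the message should name the abstract member
        self.assertIn( 'appendChild', str( caught.exception ) )
```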
########################################
Unit-test results
########################################
Tests were successful ... False
Number of tests run ..... 56
 + Tests ran in ......... 0.01 seconds
Number of errors ........ 0
Number of failures ...... 15
########################################

With the change made to the markup module's class-zoo (the addition of IsElement in the upper right of the diagram), I didn't get quite as far along in this post as I'd hoped before hitting my post-length cut-off, but I feel like I made solid progress.

There's one more abstract class that I'm going to define in my next post before I can start some actual concrete implementations: HasTextData. With that done, I'll be able to knock out three concrete classes pretty quickly, I think: CDATA, Comment and Text.

The completion of those will also require me to give some thought to exactly how I plan for rendered markup to be issued back out to a browser, so there will be at least some discussion around that as well.

Thursday, April 20, 2017

Generating and Parsing Markup in Python [1]

The first thing that I'm going to do in building out the markup module's class-structure is to figure out where all of the various members of those classes originate, and at what point they are concrete. One of my priorities, as mentioned before, is to try and keep as much similarity between the classes and their members in the markup module and the equivalent DOM objects in typical JavaScript implementations on the client side.

Conforming to DOM Conventions

I can't really meet that goal, conforming to the interfaces of DOM elements (tags, text-nodes, comments and CDATA sections), until I know what members they expose in a browser context. What I did, then, to determine that was write a chunk of JavaScript living in a bare-bones HTML page (download below) that iterates over the list of properties and methods listed on the w3schools.com site, checks an instance of each node-type (except CDATA sections, more on that in a bit) for each property- and method-member that might be available, and reports what came back in that check-process. If a given element did not report that it had the member, then the equivalent class-member in the markup module could be skipped. If the check returned an expected type, like a function for a method, that member should be kept. Anything else that came back will require some additional discovery.

I'd originally included CDATA sections in my collection of objects to examine, but the browser that I ran the page against (Chromium) wouldn't actually allow the creation of a CDATA section, even though it has a document.createCDATASection method. Creation of CDATA sections is not supported for HTML documents according to the error-message I got back. The closest to an actual CDATA that I could get was a comment that contained all of the CDATA's original content, plus the [CDATA[ start and ]] end text. As a result, I don't really know what a CDATA's members look like without doing more digging around. For the time being, I'm willing to leave that be, though — the Comment, Tag and Text classes will likely suffice for my needs for the time being.

The breakdown I got back from that analysis-script was:

                          markup Module Equivalent Class
Member Name               Comment    Tag        Text
Property Members
accessKey                 n/a        string     n/a
attributes                n/a        object     n/a
childElementCount         n/a        number     n/a
childNodes                object     object     object
children                  n/a        object     n/a
classList                 n/a        object     n/a
className                 n/a        string     n/a
clientHeight              n/a        number     n/a
clientLeft                n/a        number     n/a
clientTop                 n/a        number     n/a
clientWidth               n/a        number     n/a
contentEditable           n/a        string     n/a
dir                       n/a        string     n/a
firstChild                null       object     null
firstElementChild         n/a        null       n/a
id                        n/a        string     n/a
innerHTML                 n/a        string     n/a
isContentEditable         n/a        boolean    n/a
lang                      n/a        string     n/a
lastChild                 null       object     null
lastElementChild          n/a        null       n/a
namespaceURI              n/a        string     n/a
nextElementSibling        null       null       null
nextSibling               null       null       null
nodeName                  string     string     string
nodeType                  number     number     number
nodeValue                 string     null       string
offsetHeight              n/a        number     n/a
offsetLeft                n/a        number     n/a
offsetParent              n/a        null       n/a
offsetTop                 n/a        number     n/a
offsetWidth               n/a        number     n/a
ownerDocument             object     object     object
parentElement             null       null       object
parentNode                null       null       object
previousElementSibling    null       null       null
previousSibling           null       null       null
scrollHeight              n/a        number     n/a
scrollLeft                n/a        number     n/a
scrollTop                 n/a        number     n/a
scrollWidth               n/a        number     n/a
style                     n/a        object     n/a
tabIndex                  n/a        number     n/a
tagName                   n/a        string     n/a
textContent               string     string     string
title                     n/a        string     n/a
Method Members
addEventListener          function   function   function
appendChild               function   function   function
blur                      n/a        function   n/a
click                     n/a        function   n/a
cloneNode                 function   function   function
compareDocumentPosition   function   function   function
contains                  function   function   function
focus                     n/a        function   n/a
getAttribute              n/a        function   n/a
getAttributeNode          n/a        function   n/a
getElementsByClassName    n/a        function   n/a
getElementsByTagName      n/a        function   n/a
getFeature                n/a        n/a        n/a
hasAttribute              n/a        function   n/a
hasAttributes             n/a        function   n/a
hasChildNodes             function   function   function
insertBefore              function   function   function
isDefaultNamespace        function   function   function
isEqualNode               function   function   function
isSameNode                function   function   function
isSupported               n/a        n/a        n/a
nodelist.item             n/a        n/a        n/a
normalize                 function   function   function
querySelector             n/a        function   n/a
querySelectorAll          n/a        function   n/a
removeAttribute           n/a        function   n/a
removeAttributeNode       n/a        function   n/a
removeChild               function   function   function
removeEventListener       function   function   function
replaceChild              function   function   function
scrollIntoView            n/a        function   n/a
setAttribute              n/a        function   n/a
setAttributeNode          n/a        function   n/a
toString                  function   function   function
This gives me enough information to at least start making decisions about where various member-properties and -methods need to be defined, and how. Given the class-relationships already defined, any keeper item from the table above that exists in all the class-types should be required by the IsNode interface, at least as a default consideration. The same consideration should also be given to any items that return the same values across all the class-types, even if they haven't been flagged as a keeper. The logic behind that statement boils down to the fact that while I checked each node-type in the original JavaScript script, I did not populate a large-enough node- and element-sample in that script to feel confident that I captured every valid low-level member. If possible, those same items should also have a concrete implementation in the BaseNode abstract class. There will probably be a few items that, even though they fall into that category, just don't make sense in those locations, but I'll note those as I go along.

Defining the IsNode interface

Starting, then, with the items in the table that are keepers, or that returned identical values across all the different node-types, the following are either directly valid or need to be looked at in more detail for requirement in IsNode:

                          markup Module Equivalent Class
Member Name               Comment    Tag        Text
Property Members
childNodes                object     object     object
nextElementSibling        null       null       null
nextSibling               null       null       null
nodeName                  string     string     string
nodeType                  number     number     number
nodeValue                 string     null       string
parentElement             null       null       object
parentNode                null       null       object
previousElementSibling    null       null       null
previousSibling           null       null       null
textContent               string     string     string
Method Members
addEventListener          function   function   function
appendChild               function   function   function
cloneNode                 function   function   function
compareDocumentPosition   function   function   function
contains                  function   function   function
hasChildNodes             function   function   function
insertBefore              function   function   function
isDefaultNamespace        function   function   function
isEqualNode               function   function   function
isSameNode                function   function   function
normalize                 function   function   function
removeChild               function   function   function
removeEventListener       function   function   function
replaceChild              function   function   function
toString                  function   function   function
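
A first-pass version of that cut can be expressed mechanically: keep any member that every node-type reported, and drop anything that came back n/a for at least one of them. The sketch below runs that rule over a handful of rows copied from the full table. It is only an approximation, since the list above also reflects some manual judgement calls, and the survey and candidates names are hypothetical.

```python
# A few rows from the survey: member-name -> the type reported for 
# ( Comment, Tag, Text )
survey = {
    'accessKey': ( 'n/a', 'string', 'n/a' ),
    'childNodes': ( 'object', 'object', 'object' ),
    'nodeName': ( 'string', 'string', 'string' ),
    'nodeValue': ( 'string', 'null', 'string' ),
    'tagName': ( 'n/a', 'string', 'n/a' ),
    'cloneNode': ( 'function', 'function', 'function' ),
    'getAttribute': ( 'n/a', 'function', 'n/a' ),
}

def candidates( rows ):
    """
Returns the sorted names of members that every node-type reported, that 
is, those with no "n/a" entry in their row"""
    return sorted( 
        name for name, types in rows.items() 
        if 'n/a' not in types
    )

shortlist = candidates( survey )
```

For these sample rows the shortlist comes out as childNodes, cloneNode, nodeName and nodeValue; the Tag-only members drop away.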

While I was stripping down the list, I noticed that parentElement and parentNode didn't get flagged in such a way to be considered for inclusion in IsNode, but it's a basic fact of markup-languages that all nodes should have those properties — if they aren't populated, that simply means that the node doesn't have a parent currently, but they might well later after some manipulation. The nodeValue property

Looking over that list of remaining members, there are a few that don't make any sense to include in IsNode already:

  • Any members that involve child nodes — Those are aspects of a Tag, certainly, but since Comment and Text will also derive from IsNode and they don't have child nodes (and can't?), those should go away. That removes:
    • The childNodes property;
    • The appendChild method;
    • The hasChildNodes method;
    • The insertBefore method;
    • The removeChild method; and
    • The replaceChild method;
  • Any members relating to manipulation of event-listeners — On the server side, where all of the markup module's functionality is actually running, there is no browser context available, so no event-handling processes, and so none of these members are useful. That removes:
    • The addEventListener method; and
    • The removeEventListener method;
The rest will need to be examined in more detail, one by one, so let me just jump into that now...

Implementing and Testing the Abstract Properties

Since IsNode is only an interface, there are no concrete implementations of properties to define, only abstract property requirements that will be picked up by derived classes. That makes the definition of those property requirements very simple, and the testing of them pretty straightforward. The real trick is determining where the concrete implementations of them are going to occur. Going through the list of properties:

nextElementSibling
Returns the next element at the same node tree level — w3schools
Abstract property in IsNode
Implement in BaseNode
nextSibling
Returns the next node at the same node tree level — w3schools
Abstract property in IsNode
Implement in BaseNode
nodeName
Returns the name of a node — w3schools
Returns the tag-name for Tags, and magic-string constants for other node-types (#comment for a Comment, #document for a document, #text for a Text object, and #cdata for a CDATA if the pattern is maintained).
Abstract property in IsNode
Implement in CDATA, Comment, Tag and Text classes
nodeType
Returns the node type of a node — w3schools
Returns 8 for Comments, 4 for CDATAs, 1 for Tags and 3 for Texts
Abstract property in IsNode
Implement in CDATA, Comment, Tag and Text classes
nodeValue
Sets or returns the value of a node w3schools
It appears that this property returns the first text-node child of an element, rather than the entire set of text-node values, at least in Chromium. At any rate, it's dependent on the presence of child nodes, so...
Skip
Implement in Tag
parentElement
Returns the parent element node of an element — w3schools
Abstract property in IsNode
Implement in BaseNode
parentNode
Returns the parent node of an element — w3schools
Abstract property in IsNode
Implement in BaseNode
previousElementSibling
Returns the previous element at the same node tree level — w3schools
Abstract property in IsNode
Implement in BaseNode
previousSibling
Returns the previous node at the same node tree level — w3schools
Abstract property in IsNode
Implement in BaseNode
textContent
Sets or returns the textual content of a node and its descendants — w3schools
The return value is, essentially, a concatenation of all child Text nodes in a Tag, or the data value (the content) of a Comment or Text instance. If CDATA is assumed to behave like a Comment, then it would also return the inner content of the instance.
Abstract property in IsNode
Implement in HasTextData and Tag
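As a rough illustration of where those implementations might land, here is a minimal sketch of nodeName, nodeType and textContent on some stand-in classes. Only the nodeName strings and nodeType numbers come from the list above; the class bodies, the _HasData helper, the _children list and the constructor signatures are assumptions made for the example, not the markup module's actual code.

```python
class _HasData(object):
    """Hypothetical shared base: just holds a data string."""

    def __init__(self, data):
        self.data = data

    @property
    def textContent(self):
        # For text-like nodes, the content is simply the node's own data
        return self.data


class Text(_HasData):
    nodeName = '#text'
    nodeType = 3


class Comment(_HasData):
    nodeName = '#comment'
    nodeType = 8


class Tag(object):
    nodeType = 1

    def __init__(self, tagName, *children):
        # For a Tag, nodeName is the tag-name itself
        self.nodeName = tagName
        self._children = list(children)  # assumed storage for child nodes

    @property
    def textContent(self):
        # Concatenate the text of child Text nodes and nested Tags,
        # leaving Comment content out of the result
        parts = []
        for child in self._children:
            if isinstance(child, (Text, Tag)):
                parts.append(child.textContent)
        return ''.join(parts)


para = Tag('p', Text('Hello, '), Comment('ignored'), Tag('b', Text('world')))
print(para.nodeName, para.nodeType, repr(para.textContent))
```

Keeping nodeName and nodeType as class attributes on the text-like classes is one possible design choice; a Tag needs a per-instance nodeName because it depends on the tag-name supplied at creation.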
The abstraction of these properties in IsNode is just a few lines of code:
#-----------------------------------#
# Abstract Properties               #
#-----------------------------------#

nextElementSibling = abc.abstractproperty()
nextSibling = abc.abstractproperty()
nodeName = abc.abstractproperty()
nodeType = abc.abstractproperty()
parentElement = abc.abstractproperty()
parentNode = abc.abstractproperty()
previousElementSibling = abc.abstractproperty()
previousSibling = abc.abstractproperty()
textContent = abc.abstractproperty()
The test-methods for each property will follow this pattern:
def testPROPERTYNAME(self):
    """Unit-tests the PROPERTYNAME property of an IsNode instance."""
    try:
        testInstance = markup.IsNode()
    except TypeError as error:
        # The abstract member's name should appear in the error message
        actual = 'PROPERTYNAME' in str( error )
        self.assertTrue( actual, 'The TypeError raised by trying to '
            'instantiate IsNode should include the "PROPERTYNAME" '
            'abstract property-name' )
    except Exception as error:
        self.fail( 'testPROPERTYNAME expected a TypeError, '
            'but %s was raised instead:\n  - %s' % ( 
                error.__class__.__name__, error
            )
        )
    else:
        # No exception at all means IsNode is no longer abstract
        self.fail( 'testPROPERTYNAME expected a TypeError, '
            'but IsNode was instantiated without error' )
In a nutshell, this checks that each abstract property is named in the TypeError raised by trying to instantiate IsNode, which confirms that the property being tested is abstract.
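To see the mechanism that test relies on in isolation, the following sketch shows that instantiating a class with abstract members raises a TypeError whose message names them. It uses Python 3 syntax and a cut-down, two-property stand-in for IsNode rather than the real interface. The exact wording of the message varies between Python versions, which is a good reason for the test to check only that the name appears somewhere in it.

```python
import abc


class IsNode(abc.ABC):
    """Cut-down, hypothetical stand-in for the real IsNode interface."""

    @property
    @abc.abstractmethod
    def nodeName(self):
        """Abstract property: derived classes must supply it."""

    @property
    @abc.abstractmethod
    def nodeType(self):
        """Abstract property: derived classes must supply it."""


try:
    IsNode()
except TypeError as error:
    # Instantiation is refused because abstract members remain
    message = str(error)
else:
    message = ''

print(message)
```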

Implementing and Testing the Abstract Methods

The same basic rule, that member-definitions need only exist in the IsNode interface, applies to the method members as well. The main decision that needs to be made is also similar: where does a given method-requirement and -definition belong? Working through that yields a list similar to the one for the properties above:

cloneNode
Clones an element — w3schools
Since this is capable of making shallow or deep copies, and the mechanism for making those copies will vary, it'll have to be implemented in the concrete classes.
Abstract method in IsNode
Implement in CDATA, Comment, Tag and Text
compareDocumentPosition
Compares the document position of two elements — w3schools
The description of the method on the w3schools site, frankly, has me wondering if there's even any point to implementing this on the server side. I've never seen this method used in the wild, though that doesn't mean that it isn't used. I can't think of a use-case for it that isn't better served (at least on the server side) by local Python code, particularly since all the real method returns is a bit-mask number-value that indicates relative position between the owner element and the element provided.
Skip
contains
Returns true if a node is a descendant of a node, otherwise false — w3schools
The contains method applies only to objects that have children, really. That hasn't stopped it from being callable on DOM nodes where it doesn't really make sense, though. For example, executing this JavaScript:
ook = document.createTextNode( 'ook' );
eek = document.createTextNode( 'eek' );
ook.contains( eek );
in several browsers yields
false
That result kind of makes sense — neither of the created text-nodes is a parent of the other, nor can either be appended to the other (calling ook.appendChild( eek ) throws an error).
I'm going to skip this method for now, but there's some discussion around that decision that I'll dig into shortly.
isDefaultNamespace
Returns true if a specified namespaceURI is the default, otherwise false — w3schools
Text-nodes don't have a namespace — it's not a defined member of that node-type at all. Nor do comments, and I presume that the same would hold true for CDATA sections.
Skip
Implement in Tag
isEqualNode
Checks if two elements are equal — w3schools
The complete criteria for testing equality on the client side are listed at the w3schools link above, but since those criteria are dependent on properties that won't exist across all IsNode instances, the usefulness of that list is, perhaps, questionable. Still, being able to perform a comparison is useful. The real question, then, is how that is going to be done. I'll work out more details on that later, but for now:
Abstract method in IsNode
Implement in BaseNode
isSameNode
Checks if two elements are the same node — w3schools
Abstract method in IsNode
Implement in BaseNode
normalize
Joins adjacent text nodes and removes empty text nodes in an element — w3schools
This feels like it's something that shouldn't exist outside of a Tag, and that seems to be borne out by the fact that it's not possible to usefully call normalize on a text- or comment-node in the browser.
Skip
Implement in Tag
toString
Converts an element to a string — w3schools
Abstract method in IsNode
Implement in CDATA, Comment, Tag and Text
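The difference between isSameNode and isEqualNode maps fairly naturally onto Python's distinction between identity and equality. A minimal sketch follows, assuming (and this is only an assumption, since the comparison criteria haven't been worked out yet) that equality means matching nodeType, nodeName and textContent values. The Node class is a hypothetical stand-in, not BaseNode.

```python
class Node(object):
    """Hypothetical stand-in carrying only the compared members."""

    def __init__(self, nodeType, nodeName, textContent):
        self.nodeType = nodeType
        self.nodeName = nodeName
        self.textContent = textContent

    def isSameNode(self, other):
        # Identity: both names refer to the very same object
        return self is other

    def isEqualNode(self, other):
        # Equality (assumed criteria): same kind of node, same content
        if not isinstance(other, Node):
            return False
        return (
            self.nodeType == other.nodeType
            and self.nodeName == other.nodeName
            and self.textContent == other.textContent
        )


first = Node(3, '#text', 'ook')
second = Node(3, '#text', 'ook')
third = Node(3, '#text', 'eek')

print(first.isEqualNode(second), first.isSameNode(second))
```

Two separately-created nodes with the same content are equal but not the same node, which mirrors how the two methods behave in a browser.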

The contains discussion

Also like the abstract-property definitions, abstract methods don't need much in IsNode:

    @abc.abstractmethod
    def METHODNAME( self, arg1, arg2=None, *args, **kwargs ):
        # self is required, since the error message reports the
        # derived class' name
        raise NotImplementedError( '%s.METHODNAME is not implemented as '
            'required by IsNode' % self.__class__.__name__ )
And the unit-tests, since they're really just checking the same sort of relationship between methods and the IsNode interface-class as the property-tests did, are almost identical:
def testMETHODNAME(self):
    """Unit-tests the METHODNAME method of an IsNode instance."""
    try:
        testInstance = markup.IsNode()
    except TypeError as error:
        # The abstract member's name should appear in the error message
        actual = 'METHODNAME' in str( error )
        self.assertTrue( actual, 'The TypeError raised by trying to '
            'instantiate IsNode should include the "METHODNAME" '
            'abstract method-name' )
    except Exception as error:
        self.fail( 'testMETHODNAME expected a TypeError, '
            'but %s was raised instead:\n  - %s' % ( 
                error.__class__.__name__, error
            )
        )
    else:
        # No exception at all means IsNode is no longer abstract
        self.fail( 'testMETHODNAME expected a TypeError, '
            'but IsNode was instantiated without error' )
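For comparison, the same kind of check can be expressed with unittest's assertRaises context manager (available since Python 2.7), which fails the test on its own if no exception is raised. This is only a sketch in Python 3 syntax against a hypothetical one-method stand-in for IsNode, not the actual test_markup.py module.

```python
import abc
import unittest


class IsNode(abc.ABC):
    """Hypothetical one-method stand-in for the real interface."""

    @abc.abstractmethod
    def cloneNode(self, deep=False):
        raise NotImplementedError(
            '%s.cloneNode is not implemented as required by IsNode'
            % self.__class__.__name__
        )


class testIsNode(unittest.TestCase):

    def testcloneNode(self):
        """Unit-tests the abstraction of the cloneNode method."""
        # Fails automatically if IsNode() does not raise TypeError
        with self.assertRaises(TypeError) as context:
            IsNode()
        # The abstract member's name should be in the error message
        self.assertIn('cloneNode', str(context.exception))


# Run the test-case directly and keep the result for inspection
suite = unittest.TestLoader().loadTestsFromTestCase(testIsNode)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```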

With those tests in place for IsNode in the test_markup.py unit-test module, the test-results come back clean:

########################################
Unit-test Results: idic.markup
#--------------------------------------#
Tests were SUCCESSFUL
Number of tests run ... 18
Tests ran in .......... 0.001 seconds
########################################
IsNode, then, is done — written and tested.

Dealing with Enumerations in Python 

Python 2 doesn't really have a formal enumeration-type in its standard library like several other languages do (the enum module only arrived with Python 3.4), but there are a number of ways to work around that. My personal favorite uses namedtuple from the collections module, based on some observations I've made about how an enumeration behaves:

  • An enumeration is a constant;
  • An enumeration is immutable — its values cannot be changed at run-time;
  • An enumeration's members are individually accessible by name; and
  • An enumeration is a container, with members that can be used for comparison purposes. That is, given an enumeration of nodeTypes, with presumably-distinct CDATA, Comment, Tag and Text values:
    nodeTypes.Tag in nodeTypes          # == True
    nodeTypes.Text in nodeTypes         # == True
    nodeTypes.CDATASection in nodeTypes # == True
    nodeTypes.Comment in nodeTypes      # == True
    
There are probably a few more aspects to the behavior of an enumeration, but those four are the main ones, at least that I can think of at this point.

Using a namedtuple, it's actually pretty easy to generate a constant value that exhibits all of those behaviors. The basic code required looks something like this, using the nodeType values from the w3schools site and generating an enumeration-equivalent named nodeTypes that could be added to the markup module:

from collections import namedtuple

nodeTypes = namedtuple(
    'enumNodeTypes', 
    [ 'Tag', 'Text', 'CDATASection', 'Comment' ],
    )(
        Tag=1,
        Text=3,
        CDATASection=4,
        Comment=8,
    )

__all__.append( 'nodeTypes' )
  • nodeTypes is a constant because namedtuple returns a class, and the code then creates an instance of that class;
  • It's immutable because it's not possible to add values to, remove values from, or alter existing values of the named items except by altering the definition of those members in the code;
  • Its members are individually accessible by name because that's a basic capability of a namedtuple-generated class; and
  • It's a container that allows the use of someValue in nodeTypes.
That last item is best explained with a quick demonstration: Printing nodeTypes and all of the nodeTypes.NAME in nodeTypes examples in the container criteria above yields:
All entries in nodeTypes
  + enumNodeTypes( Tag=1, Text=3, CDATASection=4, Comment=8 )
nodeTypes.Tag in nodeTypes ............ True
nodeTypes.Text in nodeTypes ........... True
nodeTypes.CDATASection in nodeTypes ... True
nodeTypes.Comment in nodeTypes ........ True
12 in nodeTypes ....................... False
"ook" in nodeTypes .................... False
So, while this approach may not be a real enumeration, it provides all of the functionality of one that I think I'll need.
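The immutability and container claims can be checked with a short script. The sketch below rebuilds the same nodeTypes value and then pokes at it; the helper names (reassignmentBlocked, rawValueIsMember, namesByValue) are made up for the example. One caveat it makes visible is that membership is tested by value, so any bare number that happens to match a member's value also counts as being in the enumeration.

```python
from collections import namedtuple

nodeTypes = namedtuple(
    'enumNodeTypes',
    ['Tag', 'Text', 'CDATASection', 'Comment'],
)(Tag=1, Text=3, CDATASection=4, Comment=8)

# Immutability: assigning to a member raises AttributeError
try:
    nodeTypes.Tag = 2
except AttributeError:
    reassignmentBlocked = True
else:
    reassignmentBlocked = False

# Membership is by value, so a raw 8 is "in" nodeTypes just as
# nodeTypes.Comment is
rawValueIsMember = 8 in nodeTypes

# Reverse lookup from a value back to its member-name
namesByValue = dict(
    (value, name) for name, value in nodeTypes._asdict().items()
)

print(reassignmentBlocked, rawValueIsMember, namesByValue[8])
```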

It occurs to me that I don't really have a unit-testing strategy or policy for module-level constants, but frankly I'm not sure that one is really needed, at least not yet. I say "not yet" because at some level, there simply has to be some trust in the underlying language. Even with nodeTypes being a non-simple value, it's still a value that is tightly tied to core language structures and functionality, and it shouldn't be possible to break that without altering the code itself.

There's been a fair chunk of analysis in this post, but some code too, and the next logical piece to work out would probably push the length of this post past where I'd like, so I'm going to stop here for now. The next few items that I'm going to tackle include the BaseNode and HasTextContent abstract classes, I think, then I'll have enough of the foundational abstraction written to be able to take a swing at the CDATA, Comment and Text concrete classes. I promised to include the analysis JavaScript-page, though, so here's that