Thursday, May 4, 2017

Generating and Parsing Markup in Python [5]

The Tag class turned out to be something of a beast, partly because of the sheer scope of it, and partly for reasons that had nothing to do with the code involved. The non-code reasons I'll discuss in my next post after Tag is complete, because I think some interesting points surfaced that bear some discussion, but today, I'm going to stick to telling the story of how Tag's implementation unfolded.

Long Post and More to Come

I had really hoped to cover all of Tag's implementation in a single post, but by the time I got to the end of the properties (this post), it was already the longest post I've written to date, so I'll pick up next time with the method implementations.

Tag is the Workhorse of the Module

It should hopefully come as no great surprise that Tag is a pretty large class — the markup-construct that it represents is the foundation for the structure of web-pages and other document-types in other languages. Given the relationships it has with other classes in the markup module:

there were 33 properties and 28 methods that I originally expected to have to implement, some of which were required by IsElement, BaseNode or IsNode. I took some time to gather all of these together into one coherent list in an effort to make sure that I could just progress down that list, implementing as I went, without missing anything. It's a pretty substantial list, despite the occasional items I decided to remove (usually because they served no real purpose in a server-side context). There were also a few members that I decided I wanted to add to the class, and a few relatively minor concerns about name-conflicts that required some thought about altering the member-names. Here's where the final member-list landed, with the additions and alterations noted:
  Tag Members  
Member Name Impl. Req. By Notes
Property Members
accessKey Tag   Is attribute (accesskey)
attributes Tag    
childElementCount Tag IsElement  
childNamespaces Tag    
childNodes Tag IsElement  
children Tag IsElement  
classList Tag   Relates to attribute (class)
className Tag   Relates to attribute (class)
dir Tag   Is attribute (dir), Name-conflict
firstChild Tag IsElement  
firstElementChild Tag IsElement  
id Tag   Is attribute (id), Name-conflict
innerHTML Tag    
lang Tag   Is attribute (lang)
lastChild Tag IsElement  
lastElementChild Tag IsElement  
namespace Tag   Relates to Namespace class
namespaceURI Tag   Relates to Namespace class
nextElementSibling BaseNode    
nextSibling BaseNode    
nodeName Tag IsNode  
nodeType Tag IsNode  
ownerDocument BaseNode IsNode  
parent BaseNode    
parentElement BaseNode    
parentNode BaseNode    
previousElementSibling BaseNode    
previousSibling BaseNode    
style Tag   Is attribute (style)
styleList Tag   Relates to attribute (style)
tabIndex Tag   Is attribute (tabindex)
tagName Tag    
title Tag   Is attribute (title)
Method Members
appendChild Tag IsElement  
cloneNode Tag IsElement  
contains Tag IsElement  
getAttribute Tag    
getElementById Tag    
getElementsByAttributeValue Tag    
getElementsByClassName Tag    
getElementsByNamespace Tag    
getElementsByPath Tag    
getElementsByTagName Tag    
hasAttribute Tag    
hasAttributes Tag    
hasChildNodes Tag IsElement  
insertBefore Tag IsElement  
insertChildAt Tag IsElement  
isDefaultNamespace Tag    
isEqualNode Tag IsNode  
isSameNode BaseNode IsNode  
normalize Tag    
prependChild Tag    
removeAttribute Tag    
removeChild Tag IsElement  
removeChildAt Tag IsElement  
removeSelf Tag IsElement  
replaceChild Tag IsElement  
replaceChildAt Tag IsElement  
setAttribute Tag    
toString Tag IsNode  
As before, these members are derived from the w3schools' HTML DOM Element Object page, with some additions from their list of HTML Global Attributes.

Property Implementations

There were a total of five basic patterns that cropped up while I was working through the implementation of Tag's properties, each with its own particular aspects that I found interesting. I've grouped them accordingly in the discussion below.

Storing Attribute Values

When I realized that several of the properties of Tag also had to be expressed as attributes in the rendered markup, I had to give some serious thought to how I wanted to implement the storage of attributes in general, as well as how I was going to link those properties to the attributes they were related to. There are seven properties that are, in a typical HTML/JavaScript environment, both DOM-object properties and attributes that can be set in the text of the markup:

  • accesskey
  • dir
  • id
  • lang
  • style
  • tabindex
  • title
Those do not include the other eight that were added in HTML 5 (see the list noted earlier for details on those).

To further complicate matters, two of them, dir and id, are also the names of built-in functions in Python. Setting the naming-conflict aside for the moment, these properties were a potential concern because, as attributes, changes to their values as properties should also be reflected in the markup generated and rendered for Tag-instances that use them. That is, given a Tag instance myTag:

# myTag is a Tag instance
myTag.accessKey = 'X'
myTag.style += 'padding:6px;'
myTag.setAttribute( 'name', 'tagname' )
# or myTag.attributes[ 'name' ] = 'tagname'
should eventually render markup that looks something like this:
<myTag accesskey="X" name="tagname" style="padding:6px;">
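As a rough illustration of that goal, here's a minimal sketch of rendering a start-tag from a plain dict of attributes. It's written for Python 3 (the code in this post is Python 2), and render_start_tag is a hypothetical helper for illustration, not part of the markup module. Sorting the attribute names is also an assumption, made only so that the output is predictable:

```python
from html import escape

def render_start_tag(tag_name, attributes):
    """Hypothetical helper: builds a start-tag from a name and a dict."""
    parts = [tag_name]
    # Sort the names so the rendered attribute order is deterministic
    for name in sorted(attributes):
        # Escape &, <, > and quotes so the value can't break the markup
        parts.append('%s="%s"' % (name, escape(attributes[name], quote=True)))
    return '<%s>' % ' '.join(parts)

attributes = {'accesskey': 'X', 'style': 'padding:6px;'}
attributes['name'] = 'tagname'
print(render_start_tag('myTag', attributes))
# prints: <myTag accesskey="X" name="tagname" style="padding:6px;">
```

Because the rendering only reads keys and values, whatever ends up storing the attributes just has to behave like a dict of text to text.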
Given that I expected to implement at least two different ways to set attribute-values, using the setAttribute method and setting the values directly in an attributes dict, my first thought was to simply use an internal dict as the underlying data-storage mechanism for a Tag's attributes. The next potential concern was that, as a dict, the attributes property would be both mutable and unconstrained. Specifically, because a dict is mutable, it'd be possible to alter an attribute's value to something that isn't legitimate (a non-text value). It'd also be possible to set an attribute with a non-text name (key): while the keys of a dict can't be just any type of value (they have to be hashable), they can be any of a lot of types that don't make sense as an attribute-name. For example:
tagInstance.attributes[ True ] = 'value'
shouldn't be valid as an attribute-name, but wouldn't raise an error when it was attempted, which I was concerned about on a longer-term basis. The mutability concern felt like it would become moot if the underlying dict that stores the attributes were able to perform type- and/or value-checking when setting keys and values.

I'd not seen any way to accomplish that sort of key- or value-constraint on a standard dict, which more or less required that I create a custom dict-equivalent or -subclass to handle that. I called it AttributesDict.

AttributesDict is a pretty sparse class — it's got an __init__, mostly to assure that the parent dict.__init__ is called, a couple of instance methods for checking the validity of attribute names and values, and an override of the __setitem__ method of the base dict that performs the type- and value-checking:

#-----------------------------------#
# Class attributes (and instance-   #
# attribute default values)         #
#-----------------------------------#

validNameRE = re.compile( '[_A-Za-z][-_A-Za-z0-9]*' )

# ...

#-----------------------------------#
# Instance Methods                  #
#-----------------------------------#

@describe.AttachDocumentation()
@describe.argument( 'name', 
    'the name to check as being valid as an attribute name'
)
@describe.returns( 'True if valid, False otherwise' )
@describe.raises( TypeError, 
    'if the supplied value is not a str or unicode value'
)
def IsValidName( self, name ):
    """
Determines whether the supplied name is valid as an attribute name"""
    if type( name ) not in ( str, unicode ):
        # Report the name that was actually passed (the original 
        # referenced an undefined "value" here)
        raise TypeError( '%s.IsValidName expects a str or '
            'unicode value, but was passed "%s" (%s)' % ( 
                self.__class__.__name__, name, type( name ).__name__ ) )
    if '\n' in name or '\r' in name:
        return False
    if self.validNameRE.sub( '', name ) != '':
        return False
    # TODO: Other checks for validity of the name?
    return True

@describe.AttachDocumentation()
@describe.argument( 'value', 
    'the value to check as being valid in an attribute', 
    str, unicode
)
@describe.returns( 'True if valid, False otherwise' )
@describe.raises( TypeError, 
    'if the supplied value is not a str or unicode value'
)
def IsValidValue( self, value ):
    """
Determines whether the supplied value is valid as an attribute value"""
    if type( value ) not in ( str, unicode ):
        raise TypeError( '%s.IsValidValue expects a str or '
            'unicode value, but was passed "%s" (%s)' % ( 
                self.__class__.__name__, value, type( value ).__name__ ) )
    if '\n' in value or '\r' in value:
        return False
    # TODO: Other checks for validity of the value?
    return True

@describe.AttachDocumentation()
@describe.argument( 'key', 
    'the key-name to set the value to',
    str, unicode
)
@describe.argument( 'value', 
    'the value to set in the key-name',
    str, unicode
)
@describe.raises( TypeError, 
    'if passed a key-name value that is not a str or unicode type'
)
@describe.raises( TypeError, 
    'if passed a member-value that is not a str or unicode type'
)
@describe.raises( MarkupError, 
    'if passed an invalid key-name'
)
@describe.raises( MarkupError, 
    'if passed an invalid member-value'
)
def __setitem__( self, key, value ):
    """
Override of standard dict.__setitem__ that checks the types and values of both 
the key and value arguments before allowing the item to be set"""
    if not isinstance( key, ( str, unicode ) ):
        raise TypeError( '%s cannot accept key-names that are not str or '
            'unicode values, or a type derived from one of them. "%s" (%s) '
            'is not allowed' % ( 
                self.__class__.__name__, key, type( key ).__name__
            )
        )
    if not isinstance( value, ( str, unicode ) ):
        raise TypeError( '%s cannot accept member values that are not str '
            'or unicode values, or a type derived from one of them. '
            '"%s" (%s) is not allowed' % ( 
                self.__class__.__name__, value, type( value ).__name__
            )
        )
    # Raise MarkupError, matching the documented exceptions above
    if not self.IsValidName( key ):
        raise MarkupError( '%s is not a valid attribute-name in a %s' 
            % ( key, self.__class__.__name__ )
        )
    if not self.IsValidValue( value ):
        raise MarkupError( '%s is not a valid attribute-value in a %s' 
            % ( value, self.__class__.__name__ )
        )
    dict.__setitem__( self, key, value )
The _Delattributes method of Tag then uses an instance of AttributesDict instead of a normal dict:
@describe.AttachDocumentation()
def _Delattributes( self ):
    """
"Deletes" the attributes of the instance by setting it to a new, empty 
AttributesDict instance"""
    self._attributes = AttributesDict()
and the constraint-concern is taken care of. Whether that's enough to resolve the mutability concern remains to be seen.
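One thing worth keeping in mind on the mutability front: in CPython, dict.update, dict.setdefault and the dict constructor don't call an overridden __setitem__, so a subclass that only overrides __setitem__ can still be handed unchecked keys and values. Here's a minimal Python 3 sketch of that gap and one way to close it. SetItemOnly and CheckedDict are hypothetical stand-ins, not the actual AttributesDict:

```python
import re

class SetItemOnly(dict):
    """Overrides only __setitem__, like the pattern described above."""
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError('attribute names must be text')
        dict.__setitem__(self, key, value)

class CheckedDict(SetItemOnly):
    """Hypothetical: also routes update() and setdefault() through the check."""
    valid_name = re.compile(r'[_A-Za-z][-_A-Za-z0-9]*')

    def __setitem__(self, key, value):
        if isinstance(key, str) and not self.valid_name.fullmatch(key):
            raise ValueError('%r is not a valid attribute name' % key)
        SetItemOnly.__setitem__(self, key, value)

    def update(self, *args, **kwargs):
        # Build a throwaway dict, then set each item through __setitem__
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def setdefault(self, key, default=''):
        if key not in self:
            self[key] = default
        return self[key]

loose = SetItemOnly()
loose.update({True: 'value'})        # silently accepted: no check was run
print(list(loose.keys()))            # prints: [True]

strict = CheckedDict()
try:
    strict.update({True: 'value'})
except TypeError as error:
    print('rejected: %s' % error)    # prints: rejected: attribute names must be text
```

Whether the extra overrides are worth having depends on how much of the dict interface ends up being exposed to calling code.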

Implementing Attribute-Properties

Five of the seven Tag-properties that were also attributes all followed a very similar implementation-pattern. Those five properties were:

  • accesskey
  • lang
  • style
  • tabindex
  • title
Here's what accesskey's methods look like, in detail:
#-----------------------------------#
# Instance property-getter methods  #
#-----------------------------------#

@describe.AttachDocumentation()
@describe.returns( 'str or unicode character, or None' )
def _GetaccessKey( self ):
    """
Returns the value of the instance's "accesskey" attribute"""
    return self._attributes.get( 'accesskey' )

# ...

#-----------------------------------#
# Instance property-setter methods  #
#-----------------------------------#

@describe.AttachDocumentation()
@describe.argument( 'value', 
    'the value to set the instance\'s "accesskey" attribute to',
    str, unicode
)
@describe.raises( TypeError, 
    'if passed a value that is not a str or unicode value'
)
@describe.raises( ValueError, 
    'if passed a value that is more than one character in length'
)
@describe.raises( MarkupError, 
    'if passed a value that is not valid as an attribute value'
)
def _SetaccessKey( self, value ):
    """
Sets the value of the instance's "accesskey" attribute"""
    if not value:
        self._DelaccessKey()
    else:
        # Check the type first, so that a non-text value raises the 
        # TypeError below rather than one from the validity check
        if type( value ) not in ( str, unicode ):
            raise TypeError( '%s.accessKey expects a single-character '
                'str or unicode value, but was passed "%s" (%s)' % ( 
                    self.__class__.__name__, value, 
                    type( value ).__name__
                )
            )
        if not self._attributes.IsValidValue( value ):
            raise MarkupError( '%s.accessKey could not be set to "%s" '
                '- That value is not a valid attribute-value' % ( 
                    self.__class__.__name__, value
                )
            )
        if len( value ) > 1:
            raise ValueError( '%s.accessKey expects a single-character '
                'str or unicode value, but was passed "%s" (%s)' % ( 
                    self.__class__.__name__, value, 
                    type( value ).__name__
                )
            )
        self._attributes[ 'accesskey' ] = value

# ...

#-----------------------------------#
# Instance property-deleter methods #
#-----------------------------------#

@describe.AttachDocumentation()
def _DelaccessKey( self ):
    """
"Deletes" the value of the instance's "accesskey" attribute by removing 
it from the instance's attributes"""
    try:
        del self._attributes[ 'accesskey' ]
    except KeyError:
        # No such attribute available to delete; ignore
        pass
There was a bit more underneath the getter-, setter- and deleter-methods for lang and tabIndex and less for title's methods. The variations across those three properties were:
  • lang: Could potentially be validated against any of the various standard ISO language-codes, but I decided to leave that for later, if I even go that far. At present, its setter only checks the type of the value, raising a TypeError if it isn't text.
  • tabIndex: The tab-index of a tag is a string representation of a non-negative integer value, so the setter checks to see if the value can be converted into an int, and raises a ValueError if it can't.
  • title: Has no value-checking, since it should be able to accept anything that is a valid attribute-value.
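Since all five of these properties share the same shape, the pattern could also be captured once and reused. The following is a Python 3 sketch of that idea, not the way the Tag class is actually written: attribute_property, SketchTag and the two check functions are hypothetical names, and a plain dict stands in for AttributesDict:

```python
def attribute_property(attr_name, check=None):
    """Hypothetical factory for a property backed by an attributes dict."""
    def getter(self):
        return self._attributes.get(attr_name)

    def deleter(self):
        # Deleting a missing attribute is silently ignored
        self._attributes.pop(attr_name, None)

    def setter(self, value):
        if not value:
            deleter(self)
            return
        if not isinstance(value, str):
            raise TypeError('%s expects a text value' % attr_name)
        if check is not None:
            check(value)
        self._attributes[attr_name] = value

    return property(getter, setter, deleter)

def check_single_character(value):
    if len(value) > 1:
        raise ValueError('expected a single character')

def check_integer_text(value):
    int(value)  # raises ValueError if the text isn't an integer

class SketchTag(object):
    def __init__(self):
        self._attributes = {}  # stand-in for an AttributesDict

    accessKey = attribute_property('accesskey', check_single_character)
    lang = attribute_property('lang')
    tabIndex = attribute_property('tabindex', check_integer_text)
    title = attribute_property('title')

tag = SketchTag()
tag.accessKey = 'X'
tag.tabIndex = '3'
print(sorted(tag._attributes.items()))
# prints: [('accesskey', 'X'), ('tabindex', '3')]
tag.accessKey = None   # falsy values remove the attribute
print(tag._attributes)
# prints: {'tabindex': '3'}
```

The trade-off is that a factory like this hides the per-property documentation that the describe decorators attach to explicit getter-, setter- and deleter-methods.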

Implementing Name-Conflicted Attribute-Properties

The last two properties that are also attributes are dir and id. The concern that I had with using those as names as-is was that my plan for creating a Tag instance was to build out an __init__ that looked like this:

def __init__( self, tagName, namespace, **attributes):
    # ...
that allowed attributes to be specified in the code using the attributes keyword-argument. That felt like it made the prospect of instantiating new Tags pretty simple and straightforward. However, since dir and id are defined as Python built-in functions, allowing those to be used as keywords felt... sketchy. By way of example, consider:
class Example( object ):
    def __init__( self, **kwargs ):
        print kwargs

testExample = Example( id='id-value', dir='dir-value' )
That actually executes successfully (for now), printing:
{'id': 'id-value', 'dir': 'dir-value'}
There's no guarantee that this would always be the case, though. It's also worth being precise about where the risk lies: dir and id aren't reserved words, and names collected through **attributes only ever exist as keys in that dict, so they don't hide the built-ins. Naming them as explicit parameters, or assigning to them inside the function, would make the built-in dir and id functions unavailable in the body of the function. While I couldn't think of any use-case where that would've been a concern, I also couldn't say with any certainty that it wouldn't be a problem down the line either.
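It's worth checking exactly when a keyword hides a built-in, and comparing the two forms directly is a quick way to do that. This is a small Python 3 sketch (builtins is the Python 3 name of the module, and collect and named are hypothetical function names):

```python
import builtins

def collect(**kwargs):
    # 'id' and 'dir' arrive only as keys of kwargs, so the names id and
    # dir inside this function still refer to the built-in functions
    return kwargs, id is builtins.id, dir is builtins.dir

def named(id=None):
    # An explicitly named parameter does hide the built-in in this body
    return id is builtins.id

print(collect(id='id-value', dir='dir-value'))
# prints: ({'id': 'id-value', 'dir': 'dir-value'}, True, True)
print(named('id-value'))
# prints: False
```

So keywords gathered by a ** argument are the safer of the two forms, though readers of the calling code may still find id= and dir= confusing.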

I gave some serious consideration to the idea of establishing a pattern where any attributes specified that began with html would have the html stripped, and the rest reduced to lower-case before being stored as attributes. That would've allowed, for example, htmlId to be used as a keyword for the id attribute, which felt pretty good. Then I thought through what that would mean for an htmlClass attribute-specification. htmlClass would set and read the className property, and would tie to a class attribute. That felt awkward to me. Very awkward. I spent a lot of time going back and forth on various ways of implementing that before deciding that I couldn't really decide how I wanted things to work. Ultimately, in order to keep development moving, I ended up settling on a more brute-force approach, but one that I felt would be easier to refactor later if I could ever escape the analysis paralysis I was encountering about the different approaches. I ended up with a Tag.__init__ looking like this:

#-----------------------------------#
# Instance Initializer              #
#-----------------------------------#
@describe.AttachDocumentation()
def __init__( self, tagName, namespace=None, **attributes ):
    """
Instance initializer"""
    # Call parent initializers, if applicable.
    BaseNode.__init__( self )
    IsElement.__init__( self )
    # Set default instance property-values with _Del... methods as needed.
    # - Attributes first, since many of the rest use that
    self._Delattributes()
    # - Then the rest
    self._DelaccessKey()
    self._DelchildNodes()
    self._Delclass()
    self._DelhtmlDir()
    self._DelhtmlId()
    self._DelinnerHTML()
    self._Dellang()
    self._Delnamespace()
    self._Delstyle()
    # Set instance property values from arguments if applicable.
    self._SettagName( tagName )
    # Various attribute-setters that collide with "reserved" words 
    # in Python
    if 'className' in attributes:
        self._SetclassName( attributes[ 'className' ] )
        del attributes[ 'className' ]
    if 'htmlDir' in attributes:
        self._SethtmlDir( attributes[ 'htmlDir' ] )
        del attributes[ 'htmlDir' ]
    if 'htmlId' in attributes:
        self._SethtmlId( attributes[ 'htmlId' ] )
        del attributes[ 'htmlId' ]
    if 'htmlFor' in attributes:
        self.setAttribute( 'for', attributes[ 'htmlFor' ] )
        del attributes[ 'htmlFor' ]
    # The remaining (normal) attributes:
    if attributes:
        self._Setattributes( attributes )
    # The namespace
    if namespace:
        self._Setnamespace( namespace )
    # Other set-up
and htmlDir and htmlId properties (className has special considerations that I'll go into in a bit, and htmlFor is really only a convenience item for creating <label> tags, so I didn't feel the need to set up an htmlFor property).
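If the keyword handling ever gets refactored, one possibility would be a lookup table that maps the Python-safe keyword names to their attribute names. This is only a Python 3 sketch of that alternative, not what Tag.__init__ does: KEYWORD_ALIASES and attribute_names_from_keywords are hypothetical, and unlike the initializer above, the sketch skips the property setters (and their validation) entirely:

```python
KEYWORD_ALIASES = {
    'className': 'class',
    'htmlDir': 'dir',
    'htmlFor': 'for',
    'htmlId': 'id',
}

def attribute_names_from_keywords(**attributes):
    """Hypothetical: returns a dict keyed by real attribute names."""
    return dict(
        (KEYWORD_ALIASES.get(keyword, keyword), value)
        for keyword, value in attributes.items()
    )

print(attribute_names_from_keywords(
    className='warning', htmlFor='email', title='Example'
))
# prints: {'class': 'warning', 'for': 'email', 'title': 'Example'}
```

Adding another aliased keyword would then be a one-line change to the table rather than another if-block.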

The implementations of the getter-, setter- and deleter-methods for htmlId and htmlDir are very similar, though it seemed prudent to put some value-checks in htmlDir, since the attribute was not supposed to allow completely free-form values. htmlDir's related methods ended up looking like this:

    #-----------------------------------#
    # Instance property-getter methods  #
    #-----------------------------------#

    # ...

    @describe.AttachDocumentation()
    def _GethtmlDir( self ):
        """
Returns the value of the instance's "dir" attribute"""
        return self._attributes.get( 'dir' )

    # ...

    #-----------------------------------#
    # Instance property-setter methods  #
    #-----------------------------------#

    # ...

    @describe.AttachDocumentation()
    @describe.argument( 'value', 
        'the value to set the instance\'s "dir" attribute to',
        str, unicode
    )
    @describe.raises( TypeError, 
        'if passed a value that is not a str or unicode value'
    )
    @describe.raises( MarkupError, 
        'if passed a value that is not valid as an attribute value'
    )
    def _SethtmlDir( self, value ):
        """
Sets the value of the instance's "dir" attribute"""
        if value is None or value == '':
            self._DelhtmlDir()
        else:
            validValues = ( 'auto', 'ltr', 'rtl' )
            # Type-check first, so that a non-text value raises the 
            # TypeError documented above
            if type( value ) not in ( str, unicode ):
                raise TypeError( '%s.htmlDir expects a str or unicode '
                    'value, one of %s, but was passed "%s" (%s)' % ( 
                        self.__class__.__name__, str( validValues ), 
                        value, type( value ).__name__
                    )
                )
            if not self.attributes.IsValidValue( value ):
                raise MarkupError( '%s.htmlDir could not be set to "%s" '
                    '- That value is not a valid attribute-value' % ( 
                        self.__class__.__name__, value
                    )
                )
            if value.lower() not in validValues:
                raise ValueError( '%s.htmlDir expects a str or unicode '
                    'value, one of %s, but was passed "%s" (%s)' % ( 
                        self.__class__.__name__, str( validValues ), 
                        value, type( value ).__name__
                    )
                )
            self._attributes[ 'dir' ] = value

    # ...

    #-----------------------------------#
    # Instance property-deleter methods #
    #-----------------------------------#

    # ...

    @describe.AttachDocumentation()
    def _DelhtmlDir( self ):
        """
Deletes the value of the instance's "dir" attribute by removing 
it from the instance's attributes"""
        try:
            del self._attributes[ 'dir' ]
        except KeyError:
            # No such attribute available to delete; ignore
            pass

    # ...

    #-----------------------------------#
    # Instance Properties               #
    #-----------------------------------#

    # ...

    htmlDir = describe.makeProperty(
        _GethtmlDir, _SethtmlDir, _DelhtmlDir, 
        'the value of the instance\'s dir attribute',
        str, unicode
    )

    # ...

Properties that Relate to Attributes

The className and classList properties have an interesting relationship on the browser side: Altering one affects the other, which means that it's possible to use array-based operations on classList, and those changes will carry through to className. Consider:

<div id="example" class="class1 class2 class3">Example div</div>
<script>
    example = document.getElementById( 'example' );
    console.log( 'example.className ... ' + example.className );
    console.log( 'example.classList ... ' + example.classList );
    console.dir( example.classList );
    console.log( 'Removing class2' );
    example.classList.remove( 'class2' );
    console.log( 'example.classList ... ' + example.classList );
    console.dir( example.classList );
    console.log( 'Adding class4 to className' );
    example.className += ' class4'
    console.log( 'example.className ... ' + example.className );
    console.log( 'example.classList ... ' + example.classList );
    console.dir( example.classList );
</script>
If this is executed in a browser (Chromium in my case), the console shows:
example.className ... class1 class2 class3
example.classList ... class1 class2 class3
   [DOMTokenList] ... [ 'class1', 'class2', 'class3' ]
Removing class2
example.classList ... class1 class3
   [DOMTokenList] ... [ 'class1', 'class3' ]
Adding class4 to className
example.className ... class1 class3 class4
   [DOMTokenList] ... [ 'class1', 'class3', 'class4' ]
That's actually kind of neat, I think.

I first discovered that when I was building out the first big list of properties and methods at the beginning of the markup module posts, and it got me thinking about doing the same sort of thing with the style attribute and a styleList property. Having the ability to use list-operations against individual class-names and inline style specifications seemed like a potentially powerful tool to add. The real challenge felt like it would be in how to actually implement that sort of functionality, because of all the varied interactions needed:

  • The underlying storage still needs to live in an attribute-value, as a flat text-value if possible, so that special considerations don't have to be made for rendering the class and style attributes;
  • The getter-methods for the *List properties need to return a list-structure from the flat-text attribute-value;
  • The setter-methods for the *List properties need to accept a list, and generate the flat text-value in the applicable attribute;
  • The deleter-methods for the *List properties need to not destroy the interaction between the *List and non-*List properties.
I determined that all of this could be managed by creating a class that either derived from the built-in list and overrode the methods that allow the mutation of members (the full list of list methods is published on the Python site), or was a completely custom class that does all the list-emulation needed. In either case, any change to the members of the object would have to be able to call the appropriate setter-method of the instance, and the rest would take care of itself. The only other consideration is that CSS classes and inline styles have different member-separators: classes use a space, and style-declarations use a semicolon.
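The separator handling boils down to a split and a join. A minimal Python 3 sketch follows. The names split_flat_value and join_members are hypothetical helpers, and ignoring empty members (such as the one after a trailing semicolon) is an assumption made here:

```python
def split_flat_value(flat_value, separator):
    """Hypothetical: turns 'a b c' or 'x:1;y:2;' into a list of members."""
    if not flat_value:
        return []
    members = (member.strip() for member in flat_value.split(separator))
    return [member for member in members if member]

def join_members(members, separator):
    """Hypothetical: the reverse operation."""
    return separator.join(members)

print(split_flat_value('class1 class2  class3', ' '))
# prints: ['class1', 'class2', 'class3']
print(split_flat_value('padding:6px; color:red;', ';'))
# prints: ['padding:6px', 'color:red']
print(join_members(['padding:6px', 'color:red'], ';'))
# prints: padding:6px;color:red
```

Note that the round trip is not exact for styles: the trailing semicolon is dropped, which is harmless in CSS but worth knowing about when comparing values.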

At first, I wasn't sure if that would be too complex for what I needed. Since I'd only encountered the classList property in the last couple of weeks, I'd obviously never used it, so it wasn't a big concern for me to not include it. At the same time, it definitely felt like it could be of a lot of use, so I preferred to implement it if I could. As it turned out, though, it wasn't as complex as I'd feared. The implementation proof-of-concept (POC) code is too long for me to cover in great detail if I want to keep this post to anything close to a reasonable length, but I'll make it downloadable at the end of the post. Here's a quick summary of what I ended up with:

  • I defined a class (AttributeValueList) that derives from list;
  • I added pointers to the getter-, setter- and deleter-methods for the flat-text attributes to the __init__ of the new class, as well as a separator value that would be used elsewhere to fetch a new list from the flat-text value, or to join the instance's list as a new flat-text value:
    def __init__( self, getter, setter, deleter, separator, iterable=() ):
        self._getter = getter
        self._setter = setter
        self._deleter = deleter
        self._separator = separator
        list.__init__( self, iterable )
  • I defined two helper-methods (_pullFromGetter() and _pushToSetter()) that would refresh the instance's list-values from the flat-text attribute and re-set the flat-text value from the current list-values, respectively:
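    def _pullFromGetter( self ):
        print '### Calling %s._pullFromGetter' % self.__class__.__name__
        # Remove all current members. list.pop is called directly 
        # (remove( 0 ) would look for a member *equal to* 0), which 
        # also bypasses any overridden methods that call _pushToSetter
        while len( self ):
            list.pop( self )
        # Get the current flat-text value (None if it isn't set)
        flatValue = self._getter()
        if flatValue:
            # Split it on the separator, and append each non-empty 
            # member with the base list.append, for the same reason
            for value in flatValue.split( self._separator ):
                if value.strip():
                    list.append( self, value.strip() )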
    
    def _pushToSetter( self ):
        print '### Calling %s._pushToSetter' % self.__class__.__name__
        self._setter( self._separator.join( self ) )
  • I overrode all of the methods of list that could affect the members of the base list, following a pattern like:
    def __some_list_method__( self, [args] ):
        # Call the original list-method against the instance, 
        # with the arguments passed to the method
        list.__some_list_method__( self, [args] )
        # Call a protected helper-method to update the 
        # "flat-text" value in the object that the 
        # instance relates to
        self._pushToSetter()
  • Finally, in a very stripped-down copy of Tag, I created basic property getter-, setter- and deleter-methods and the corresponding properties, wiring things up so that:
    • The deleter-methods for the *List properties created a new, empty instance of AttributeValueList;
    • The setter-methods for the *List properties removed all members from the current AttributeValueList storage-object, then added in the new values;
    • The setter-methods for the flat-text attributes would call the _pullFromGetter method of their AttributeValueList equivalent;
    The bare-bones implementation for the className/classList property-set in the POC shows all of that:
    def _GetclassName( self ):
        return self._attributes.get( 'class' )
    
    def _GetclassList( self ):
        return self._classList
    
    def _SetclassName( self, value ):
        self._attributes[ 'class' ] = value
        self._classList._pullFromGetter()
    
    def _SetclassList( self, value ):
        self._classList = AttributeValueList( 
            self._GetclassName, 
            self._SetclassName, 
            self._DelclassName, 
            ' ', value
        )
    
    def _DelclassName( self ):
        try:
            del self._attributes[ 'class' ]
        except KeyError:
            pass
    
    def _DelclassList( self ):
        self._classList = AttributeValueList( 
            self._GetclassName, 
            self._SetclassName, 
            self._DelclassName, 
            ' '
        )
    
    className = property( _GetclassName, _SetclassName, _DelclassName )
    classList = property( _GetclassList, _SetclassList, _DelclassList )
That proved out enough of the concept that I could run with it behind the scenes. The quick-and-nasty testing from the POC script performed a few typical/basic manipulations:
def printItem( item ):
    print '+- className .............. %s (%s)' % ( 
        item.className, type( item.className ).__name__ )
    print '+- classList .............. %s (%s)' % ( 
        item.classList, type( item.classList ).__name__ )

example = Tag()
print 'example Tag: %s' % example
print '+- classList._getter ...... %s' % ( example.classList._getter.__name__ )
print '+- classList._setter ...... %s' % ( example.classList._setter.__name__ )
print '+- classList._deleter ..... %s' % ( example.classList._deleter.__name__ )
print '+- classList._separator ... "%s"' % ( example.classList._separator )
print '#' + '-'*38 + '#'

print 'example'
printItem( example )

print '| == example.classList += [ \'addedClass\' ]'
example.classList += [ 'addedClass' ]
printItem( example )

print '| == example.className = \'class1 class2\''
example.className = 'class1 class2'
printItem( example )

print '| == example.classList.remove( \'class1\' )'
example.classList.remove( 'class1' )
printItem( example )

print '| == example.classList.append( \'class4\' )'
example.classList.append( 'class4' )
printItem( example )

print '| == example.classList.insert( 1, \'class1\' )'
example.classList.insert( 1, 'class1' )
printItem( example )

print '| == example.classList += [ \'addedClass\' ]'
example.classList += [ 'addedClass' ]
printItem( example )
and yielded expected results for those actions/operations:
example Tag: <__main__.Tag object at 0x7f996ac85e10>
+- classList._getter ...... _GetclassName
+- classList._setter ...... _SetclassName
+- classList._deleter ..... _DelclassName
+- classList._separator ... " "
#--------------------------------------#
example
+- className .............. None (NoneType)
+- classList .............. []
| == example.classList += [ 'addedClass' ]
+- className .............. addedClass (str)
+- classList .............. ['addedClass']
| == example.className = 'class1 class2'
+- className .............. class1 class2 (str)
+- classList .............. ['class1', 'class2']
| == example.classList.remove( 'class1' )
+- className .............. class2 (str)
+- classList .............. ['class2']
| == example.classList.append( 'class4' )
+- className .............. class2 class4 (str)
+- classList .............. ['class2', 'class4']
| == example.classList.insert( 1, 'class1' )
+- className .............. class2 class1 class4 (str)
+- classList .............. ['class2', 'class1', 'class4']
| == example.classList += [ 'addedClass' ]
+- className .............. class2 class1 class4 addedClass (str)
+- classList .............. ['class2', 'class1', 'class4', 'addedClass']

As a side-note: There were other ways to accommodate the list- and non-list versions of both attributes. One that I contemplated for a while was to simply store the actual list-of-string values, and just collapse those down during the rendering process. The problem that I ended up having with that approach was a combination of the discrepancy between the attribute- and property-names for class/className and a strong desire to be able to just dump the attribute keys and values during rendering. For style that wasn't a concern, since the property- and attribute-names are identical. Trying to come up with a process that would handle rendering the class from a className that was, in turn, calculated from classList started giving me a headache pretty quickly, though I believe I found a way to make it workable. The trade-off, though, was the addition of what might be called special handling for just that one attribute. I'm not a big fan of hard-coding exceptions into code if there's any viable way around it. Ultimately, that preference on my part was why I took the path I did.
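To make that trade-off concrete, here is a minimal sketch of what the rejected approach might have looked like. It is not code from the markup module: render_attributes and PROPERTY_TO_ATTRIBUTE are hypothetical names, and the sketch assumes values are stored under their property-names as either strings or lists of strings.

```python
# Hypothetical mapping: the one property whose name differs from its attribute
PROPERTY_TO_ATTRIBUTE = { 'className': 'class' }

def render_attributes( values ):
    """
Renders a dict of property-names and values as markup attributes, collapsing 
list values into space-separated strings"""
    parts = []
    for name in sorted( values ):
        value = values[ name ]
        # This lookup is the "special handling" that a plain dump of keys 
        # and values would not need
        attribute = PROPERTY_TO_ATTRIBUTE.get( name, name )
        if isinstance( value, ( list, tuple ) ):
            value = ' '.join( value )
        parts.append( '%s="%s"' % ( attribute, value ) )
    return ' '.join( parts )

rendered = render_attributes( 
    { 'className':[ 'class1', 'class2' ], 'id':'main' }
)
```

The translation step is small, but it is exactly the kind of hard-coded exception the paragraph above describes wanting to avoid.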

Parents and Children, Nodes and Elements, and Their Related Properties

In the JavaScript world that I'm modeling the properties and methods of Tag after, there is a distinction between nodes and elements. An element is a type of node, as are text-nodes, comments, and (presumably) CDATA sections. What sets an element apart from other nodes, as far as my analysis seemed to indicate, was that elements can have children, which are also nodes, and may be elements.

It seems likely that's why JavaScript elements have both children and childNodes properties, and why there are members like lastChild and lastElementChild — to allow retrieval of either all children, or only children that are elements.

While I didn't see a whole lot of use for that distinction while working on the baseline notes and ideas for the markup module, providing as close a parallel functionality-set as I could more or less required that I implement all of those members as well. The foundation of all of them was the childNodes property, and the storage of child BaseNode objects therein.

When push comes to shove, BaseNode children of a Tag are a sequence of objects, so I started with a basic Python list to store them as a proof of concept, but it shared a lot of the concerns that I had that led to the creation of AttributesDict: The underlying list was mutable and unconstrained, so it would be possible to directly insert a member that wasn't valid as a child. Fundamentally, the ElementList class that I built to handle the constraint wasn't all that different from the AttributeValueList class I mentioned earlier. The main difference was that there was no reason to care when items were removed, so the method-overrides that related to that could be stripped out, leaving:

#-----------------------------------#
# Instance Methods                  #
#-----------------------------------#

@describe.AttachDocumentation()
def __add__( self, y ):
    """
Override of the base method from list that performs type-checking on the items 
to be added"""
    # y is the sequence being concatenated, so check each of its members
    badItems = [ i for i in y if not isinstance( i, BaseNode ) ]
    if badItems:
        raise TypeError( '%s is only allowed to have BaseNode-'
            'derived members, but included %s which are not allowed.' 
            % ( self.__class__.__name__, badItems )
        )
    # Note: list.__add__ returns a plain list, not an ElementList
    return list.__add__( self, y )

def __iadd__( self, y ):
    """
Override of the base method from list that performs type-checking on the items 
to be added"""
    # Materialize y first so that a generator isn't exhausted by the check
    items = list( y )
    badItems = [ i for i in items if not isinstance( i, BaseNode ) ]
    if badItems:
        raise TypeError( '%s is only allowed to have BaseNode-'
            'derived members, but included %s which are not allowed.' 
            % ( self.__class__.__name__, badItems )
        )
    list.extend( self, items )
    return self

def __imul__( self, y ):
    """
Override of the base method from list that performs type-checking on the 
multiplier"""
    # y is the repeat-count, not a member, so it has to be an integer; the 
    # members being repeated have already been checked
    if not isinstance( y, ( int, long ) ):
        raise TypeError( '%s can only be multiplied by an int or long '
            'value: "%s" (%s) is not allowed.' % ( 
                self.__class__.__name__, y, type( y ).__name__ )
            )
    list.__imul__( self, y )
    return self

def __mul__( self, y ):
    """
Override of the base method from list that performs type-checking on the 
multiplier"""
    if not isinstance( y, ( int, long ) ):
        raise TypeError( '%s can only be multiplied by an int or long '
            'value: "%s" (%s) is not allowed.' % ( 
                self.__class__.__name__, y, type( y ).__name__ )
            )
    # Note: list.__mul__ returns a plain list, not an ElementList
    return list.__mul__( self, y )

def __rmul__( self, y ):
    """
Override of the base method from list that performs type-checking on the 
multiplier"""
    if not isinstance( y, ( int, long ) ):
        raise TypeError( '%s can only be multiplied by an int or long '
            'value: "%s" (%s) is not allowed.' % ( 
                self.__class__.__name__, y, type( y ).__name__ )
            )
    return list.__rmul__( self, y )

def __setitem__( self, i, y ):
    """
Override of the base method from list that performs type-checking on the item 
to be set"""
    if not isinstance( y, BaseNode ):
        raise TypeError( '%s is only allowed to have BaseNode-'
            'derived members: "%s" (%s) is not allowed.' % ( 
                self.__class__.__name__, y, type( y ).__name__ )
            )
    list.__setitem__( self, i, y )

def __setslice__( self, i, j, y ):
    """
Override of the base method from list that performs type-checking on the items 
to be set"""
    # y is the sequence of replacement members, so check each of them
    items = list( y )
    badItems = [ item for item in items if not isinstance( item, BaseNode ) ]
    if badItems:
        raise TypeError( '%s is only allowed to have BaseNode-'
            'derived members, but included %s which are not allowed.' 
            % ( self.__class__.__name__, badItems )
        )
    list.__setslice__( self, i, j, items )

def append( self, y ):
    """
Override of the base method from list that performs type-checking on the item 
to be appended"""
    if not isinstance( y, BaseNode ):
        raise TypeError( '%s is only allowed to have BaseNode-'
            'derived members: "%s" (%s) is not allowed.' % ( 
                self.__class__.__name__, y, type( y ).__name__ )
            )
    list.append( self, y )

def extend( self, iterable ):
    """
Override of the base method from list that performs type-checking on the items 
to be extended"""
    # Materialize the iterable so that a generator isn't exhausted by the check
    items = list( iterable )
    badItems = [ i for i in items if not isinstance( i, BaseNode ) ]
    if badItems:
        raise TypeError( '%s is only allowed to have BaseNode-'
            'derived members, but included %s which are not allowed.' 
            % ( self.__class__.__name__, badItems )
        )
    list.extend( self, items )

def insert( self, index, obj ):
    """
Override of the base method from list that performs type-checking on the item 
to be inserted"""
    if not isinstance( obj, BaseNode ):
        raise TypeError( '%s is only allowed to have BaseNode-'
            'derived members: "%s" (%s) is not allowed.' % ( 
                self.__class__.__name__, obj, type( obj ).__name__ )
            )
    list.insert( self, index, obj )
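As a quick illustration of the constraint in action, here is a small, self-contained sketch. The BaseNode and ElementList classes in it are cut-down stand-ins written just for this example (only append and extend are overridden), not the real classes from the markup module.

```python
class BaseNode( object ):
    """Stand-in for the markup module's BaseNode (assumed for this sketch)"""

class ElementList( list ):
    """Cut-down stand-in: only append and extend are type-checked here"""
    def append( self, y ):
        if not isinstance( y, BaseNode ):
            raise TypeError( '%s is only allowed to have BaseNode-'
                'derived members: "%s" (%s) is not allowed.' % ( 
                    self.__class__.__name__, y, type( y ).__name__ )
                )
        list.append( self, y )

    def extend( self, iterable ):
        # Check a materialized copy so that generators are not consumed
        items = list( iterable )
        badItems = [ i for i in items if not isinstance( i, BaseNode ) ]
        if badItems:
            raise TypeError( '%s is only allowed to have BaseNode-'
                'derived members, but included %s which are not allowed.' 
                % ( self.__class__.__name__, badItems )
            )
        list.extend( self, items )

nodes = ElementList()
nodes.append( BaseNode() )
nodes.extend( BaseNode() for _ in range( 2 ) )

# A non-node value is rejected, and the list is left unchanged
try:
    nodes.append( 'not a node' )
    rejected = False
except TypeError:
    rejected = True
```

Checking before calling the underlying list method means that a failed extend adds nothing at all, rather than adding members up to the first bad one.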

The childNodes property was set up to use an instance of ElementList for its storage, and is read-only. The children property, also read-only, was built using a list comprehension that filtered childNodes down to only those members that were instances of IsElement:

@describe.AttachDocumentation()
def _Getchildren( self ):
    """
Gets the sequence of all children of the instance that are elements"""
    return [ 
        c for c in self._childNodes 
        if isinstance( c, IsElement )
    ]
With those two properties in place, a lot of the remaining properties were easily implemented:
childElementCount: The number of members of children
firstChild: The first member of childNodes
firstElementChild: The first member of children
lastChild: The last member of childNodes
lastElementChild: The last member of children
nextElementSibling: The first member of the parent's childNodes after the index of the instance itself that is an IsElement instance
nextSibling: The first member of the parent's childNodes after the index of the instance itself
previousElementSibling: The nearest member of the parent's childNodes before the index of the instance itself that is an IsElement instance
previousSibling: The member of the parent's childNodes immediately before the index of the instance itself
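The derived properties listed above can be sketched along these lines. This is an illustration only, using hypothetical stand-in classes (Node, Text, Element) and a hypothetical appendChild helper rather than the actual Tag implementation. The sibling properties assume that each node keeps a reference to its parent.

```python
class IsElement( object ):
    """Stand-in for the markup module's IsElement (assumed for this sketch)"""

class Node( object ):
    """Hypothetical stand-in for a BaseNode-derived class"""
    def __init__( self ):
        self.parent = None

    def _siblingsAfter( self ):
        # Siblings live in the parent's childNodes, after this node
        if self.parent is None:
            return []
        siblings = self.parent.childNodes
        return siblings[ siblings.index( self ) + 1: ]

    def _siblingsBefore( self ):
        # Reversed, so that the nearest sibling comes first
        if self.parent is None:
            return []
        siblings = self.parent.childNodes
        return siblings[ :siblings.index( self ) ][ ::-1 ]

    @property
    def nextSibling( self ):
        return ( self._siblingsAfter() or [ None ] )[ 0 ]

    @property
    def nextElementSibling( self ):
        found = [ n for n in self._siblingsAfter() 
            if isinstance( n, IsElement ) ]
        return ( found or [ None ] )[ 0 ]

    @property
    def previousSibling( self ):
        return ( self._siblingsBefore() or [ None ] )[ 0 ]

    @property
    def previousElementSibling( self ):
        found = [ n for n in self._siblingsBefore() 
            if isinstance( n, IsElement ) ]
        return ( found or [ None ] )[ 0 ]

class Text( Node ):
    """Hypothetical non-element node"""

class Element( Node, IsElement ):
    """Hypothetical element node that can hold children"""
    def __init__( self ):
        Node.__init__( self )
        self.childNodes = []

    def appendChild( self, child ):
        child.parent = self
        self.childNodes.append( child )
        return child

    @property
    def children( self ):
        return [ c for c in self.childNodes if isinstance( c, IsElement ) ]

    @property
    def childElementCount( self ):
        return len( self.children )

    @property
    def firstChild( self ):
        return self.childNodes[ 0 ] if self.childNodes else None

    @property
    def lastElementChild( self ):
        elements = self.children
        return elements[ -1 ] if elements else None

root = Element()
first = root.appendChild( Element() )
text = root.appendChild( Text() )
last = root.appendChild( Element() )
```

With that structure, first.nextSibling is the text node while first.nextElementSibling skips over it to the last element, which is the node-versus-element distinction in miniature.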
The parentElement and parentNode properties seemed to me to be needlessly confusing: as far as I've been able to tell, there is no way for a non-element node to be a parent to another node. I looked around for a while to see if I was missing anything, but couldn't find anything that led me to think otherwise. Ultimately I decided to collapse the two of them down into the parent property.

The Remaining Properties

The majority of the remaining properties' implementations either follow some simple variation of my normal property-getter, -setter and -deleter structure, storing the value in an underlying protected local attribute, or are calculated in some fashion from another property or one of the underlying local attributes. The namespace and namespaceURI properties are a perhaps-typical example of that structure as it applies to a non-simple underlying-attribute type/value.

#-----------------------------------#
# Instance property-getter methods  #
#-----------------------------------#

# ...

@describe.AttachDocumentation()
def _Getnamespace( self ):
    """
Gets the Namespace associated with the instance"""
    if self._namespace:
        return self._namespace
    else:
        # Fall back to the namespace of the parent, if there is one
        if self.parent:
            return self.parent.namespace
        else:
            return None

@describe.AttachDocumentation()
def _GetnamespaceURI( self ):
    """
Gets the URI of the Namespace associated with the instance"""
    if self.namespace:
        return self.namespace.namespaceURI
    return None

# ...

#-----------------------------------#
# Instance property-setter methods  #
#-----------------------------------#

# ...

@describe.AttachDocumentation()
@describe.argument( 'value', 
    'The Namespace, or the URI/unique identifier of the Namespace to '
    'associate with the instance', 
    Namespace, str, unicode
)
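@describe.raises( TypeError, 
    'if passed a value that is not a Namespace instance, or a str or '
    'unicode URI value of a registered Namespace'
)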
def _Setnamespace( self, value ):
    """
Sets the Namespace association for the instance"""
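    if not value:
        # Nothing to associate, so clear the association and stop here 
        # rather than falling through to the type-check below
        self._Delnamespace()
        return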
    if type( value ) in ( str, unicode ):
        try:
            value = Namespace.GetNamespace( value )
        except MarkupError:
            value = None
    if not isinstance( value, Namespace ):
        raise TypeError( '%s.namespace expects a Namespace instance, '
            'or a str or unicode URI value of a registered Namespace, '
            'but was passed "%s" (%s)' % ( 
                self.__class__.__name__, value, 
                type( value ).__name__
            )
        )
    self._namespace = value

# ...

#-----------------------------------#
# Instance property-deleter methods #
#-----------------------------------#

@describe.AttachDocumentation()
def _Delnamespace( self ):
    """
"Deletes" the namespace-association of the instance by setting it to None"""
    self._namespace = None
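The inheritance behavior of the getter is easier to see in isolation. The following is a minimal sketch with stand-in Namespace and Node classes, not the real ones from the markup module. The constructor arguments are assumptions made for the example.

```python
class Namespace( object ):
    """Stand-in for the markup module's Namespace (assumed for this sketch)"""
    def __init__( self, namespaceURI ):
        self.namespaceURI = namespaceURI

class Node( object ):
    """Hypothetical node with an optional parent and local namespace"""
    def __init__( self, parent=None, namespace=None ):
        self.parent = parent
        self._namespace = namespace

    @property
    def namespace( self ):
        # A locally-set namespace wins; otherwise defer to the parent
        if self._namespace is not None:
            return self._namespace
        if self.parent is not None:
            return self.parent.namespace
        return None

    @property
    def namespaceURI( self ):
        namespace = self.namespace
        if namespace is not None:
            return namespace.namespaceURI
        return None

svg = Namespace( 'http://www.w3.org/2000/svg' )
root = Node( namespace=svg )
child = Node( parent=root )
grandchild = Node( parent=child )
orphan = Node()
```

Here grandchild.namespace resolves to svg by walking up through child to root, while orphan.namespaceURI is None because there is nothing to inherit from.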

There is one remaining property that I can't implement just yet: innerHTML. The getter-method side of it is pretty straightforward, since it really just boils down to rendering the children of the instance. While that rendering lives in method-members that I haven't touched on yet (__str__ and/or __unicode__), I don't expect the getter functionality of innerHTML to be much more than a call to one or the other. On the setter side of the property, though, I need the ability to parse markup to be functional before I can work that out. That means that innerHTML is waiting on the implementation of the MarkupParser class.
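For what it's worth, a rough sketch of what that getter could look like follows. It is a guess at the shape of the eventual code, not the implementation itself: the Text and Element classes are stand-ins, rendering is done with __str__ only, and escaping and attributes are ignored entirely.

```python
class Text( object ):
    """Hypothetical text node that renders as its own text"""
    def __init__( self, text ):
        self.text = text

    def __str__( self ):
        return self.text

class Element( object ):
    """Hypothetical element that renders its tag around its children"""
    def __init__( self, tagName, *childNodes ):
        self.tagName = tagName
        self.childNodes = list( childNodes )

    @property
    def innerHTML( self ):
        # Just the rendered children, concatenated in order
        return ''.join( str( child ) for child in self.childNodes )

    def __str__( self ):
        return '<%s>%s</%s>' % ( self.tagName, self.innerHTML, self.tagName )

paragraph = Element( 'p', 
    Text( 'Hello, ' ), 
    Element( 'em', Text( 'world' ) )
)
```

In this sketch the full rendering of an element is simply its tag wrapped around its own innerHTML, so the two naturally share one code path.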

That wraps up the properties of Tag. I promised earlier to make the proof-of-concept code for AttributeValueList available for download, so before I stop, here it is:
