With the Tag
class finally implemented (minus the couple
of deferred items waiting on MarkupParser
), it's time, I think, to
work out the conventions for various types of markup documents. The approach that
I plan on taking is to define a BaseDocument
abstract class that
derives from Tag
in order to carry the capabilities of Tag
through to all documents.
- An
HTML5Document
; - An
XHTMLDocument
; and (probably) - An
XMLDocument
;
<?php
abstract class BaseNode
{
}
class Tag extends BaseNode
{
}
abstract class BaseDocument extends Tag
{
}
class HTML5Document extends Tag
{
}
$doc = new HTML5Document();
?>
Before I can actually define those concrete document-classes, though, I need to
determine what their members are, and how theier behavior differs from Tag
...
What Are the Differences Between a Document and a Tag
?
There are two main areas where documents differ from tag-elements: their
object-members, and how they render. Since the three document-types that I'm
concerned with have significantly different members (XML won't have head
or body
properties like an HTML document does, for example, and there
are several other properties that fall into a similar classification), the list of
members that need to be implemented is actually very short:
- The
all
property - Basically just a call to
Tag.getElementsByTagName( '*' )
, returning all of theTag
children of the document; - The
contentType
property - Returns the MIME-type of the document
- The
doctype
property - I haven't been able to pin down exactly what this does on the
browser side, but it definitely relates to the
<!DOCTYPE html>
declaration in an HTML 5 document. My expectation as I'm writing this is that it will return theDOCTYPE
for the instance as it would render.
BaseDocument
just because it derives from
Tag
. A lot of the remainder were properties that have no meaning or
use outside a broswer context (e.g., the readyState
property and
createEvent
method, as well as all the on...*
event-methods).
Several of the properties are essentially just wrappers around some variation of
Tag.getElementsByTagName
for specific tags (images
and
scripts
) or some similar mechanism with different criteria (links
,
possibly), and I'd probably not implement them (at last not in BaseDocument
)
even if they weren't document-type specific. I may well add those to specific
concrete document-classes down the line, just so they're available, but they don't
belong in BaseDocument
.
On the rendering
side, each of the document-types I'm planning to build
out has slightly different output. Optional items are in brackets []:
- HTML 5
<DOCTYPE html> <html> <!-- the document head, body --> </html>
- XHTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html[ xmlns="http://www.w3.org/1999/xhtml"]> <!-- the document head, body --> </html>
- (This is XHTML
transitional
, but there are also officialstrict
andframeset
variants, per the w3.org site) - Since it's technically XML, XHTML (of any flavor) can have processing-instructions and any other pre-document-root elements that XML allows.
- XML
<?xml[ version="#.#"][ encoding="XXXX"]?> [<?xml-processing-instruction(s) attributes="allowed"?>] [<!DOCTYPE root-element[ PUBLIC "PUBLIC identifier"][ "SYSTEM identifier"]>] <root-element[ attributes]> <!-- child elements and content --> </root-element>
BaseDocument
— the
rules for them are of varying complexity, but they all feel like they're achievable
at this level.
Looking at DOCTYPE
All three document-types have (or are allowed) some variation of a <!DOCTYPE>
.
The rules for a DOCTYPE
declaration and how it gets rendered are pretty
simple:
- It starts with
<!DOCTYPE
- That's followed by the
tagName
of the root tag of the document - It may have a public identifier (a single-line text-value),
in which case that should be rendered and prefixed with
PUBLIC
- It may also have a
system
identifier (also a single-line text-value):- If there is no public identifier, the system identifier should
be rendered with
as a prefixSYSTEM
- Otherwise, it can just be rendered, with no prefix
- If there is no public identifier, the system identifier should
be rendered with
- It ends with
>
- It's the last thing rendered in output before the start of the
Tag
-derived structure and output
DOCTYPE
representation
could be implemented that I can think of.
The simplest way is, I think, to just store the applicable public and system
identifiers as class-level constants, and render them accordingly in the __str__
and __unicode__
methods of the document-instance. That would work fine
for HTML 5 and XHTML document-types, since the public and system identifiers for
those document-types shouldn't ever change, really. That also has the advantage (I
think) of keeping everything in one class-definition, so there'd be less code to
manage. Unfortunately, that starts to fall apart as soon as XML documents enter
the picture, unless a distinct document-class is built out for each and every XML
document-type. That prospect feels ugly.
A more complicated approach is defining a class to represent a DOCTYPE
.
So long as an instance of that class has a reference to the document it's associated
with, that'd allow it to grab the tagName
that it needs at render-time.
That would leave only the public- and system-identifier values to add to the __init__
,
so that they could be passed to the DOCTYPE
-representative object during
the construction of a document-instance. That doesn't feel horrible, but
since it'd require additional arguments, with cryptic values, it feels clumsy. Even
if those values were set up as module-level constants (which would help, I think),
that's still more stuff that has to be remembered every time a document has to be
created.
Still another possibility: A DOCTYPE
for any given
document-type is almost certainly as distinct as its namespace. If the public- and
system-identifier values were attached to each Namespace
instance,
even if it were done outside the object-construction process, and a document's
namespace
were required during its construction, then the
storage of those identifier-values is in a single place (a Namespace
instance), and could be accessed in the __str__
and __unicode__
methods much like they could if they were class constants in the first alternative.
That feels pretty reasonable to me, but I think I'd also want to set up some sort
of mechanism that would allow namespaces to be defined outside the actual
Python code — possibly by setting one or many configuration-files that would
define names and other relevant properties for any number of namespaces. The trade-off
there is that any namespaces defined by that sort of configurable set-up probably
couldn't be referred to as module-level constants — As things stand right now,
I'd defined Namespace
-instance constants in the markup
module for HTML 5 and XHTML documents both:
# HTML 5 namespace
HTML5Namespace = Namespace(
'html5',
'http://www.w3.org/2015/html',
renderingModels.RequireEndTag,
br=renderingModels.NoChildren,
img=renderingModels.NoChildren,
link=renderingModels.NoChildren,
)
__all__.append( 'HTML5Namespace' )
# XHTML namespace
XHTMLNamespace = Namespace(
'xhtml',
'http://www.w3.org/1999/xhtml',
renderingModels.RequireEndTag,
br=renderingModels.NoChildren,
img=renderingModels.NoChildren,
link=renderingModels.NoChildren,
)
__all__.append( 'XHTMLNamespace' )
Those constants would, I think, have to go away in order to keep access to all
available namespaces reasonably consistent.
Or, perhaps not, now that I think on it more. If each application that needs
to have one or more namespaces defined actually defines them as constants
within that application, then they are accessible as constants within that application's
codebase. That might, down the line, require some movement of more general-purpose
Namespace
definitions into a common location — maybe in markup
,
maybe elsewhere — but they'd still be accessible the same way that HTML5Namespace
and XHTMLNamespace
are now.
With all of those options considered, I'm going to take this last approach, I
think. It feels reasonable, keeps things relatively well-contained, and doesn't
require a huge amount of refactoring of Namespace
or a lot of additional
code in BaseDocument
. As it turns out, a similar approach/solution
can be applied to the contentType
of BaseDocument
—
referring to the document's namespace
, which in turn has a ContentType
property whose value is set during the construction of the instance. The complete
changes to Namespace
are:
#-----------------------------------#
# Instance property-getter methods #
#-----------------------------------#
@describe.AttachDocumentation()
def _GetContentType( self ):
"""
Gets the MIME-type associated with documents of the namespace the instance
represents"""
return self._contentType
# ...
@describe.AttachDocumentation()
def _GetPublicIdentifier( self ):
"""
Gets the public identifier of the namespace."""
return self._publicIdentifier
@describe.AttachDocumentation()
def _GetSystemIdentifier( self ):
"""
Gets the system identifier of the namespace."""
return self._systemIdentifier
#-----------------------------------#
# Instance property-setter methods #
#-----------------------------------#
@describe.AttachDocumentation()
@describe.argument( 'value',
'the MIME-Type to set for documents of the namespace the instance '
'represents',
str, unicode, None
)
@describe.raises( TypeError,
'if passed a value that is not a str, a unicode, or None'
)
@describe.raises( ValueError,
'if passed a value that is not a member of %s' % (
sorted( KnownMIMETypes )
)
)
def _SetContentType( self, value ):
"""
Sets the MIME-Type of the instance"""
if value != None and type( value ) not in ( str, unicode ):
raise TypeError( '%s.ContentType expects a str or unicode value '
'that is one of the known MIME-types on the system, or None, '
'but was passed "%s" (%s)' % (
self.__class__.__name__, value, type( value ).__name__ )
)
if value not in KnownMIMETypes:
raise ValueError( '%s.ContentType expects a str or unicode value '
'that is one of the known MIME-types on the system, or None, '
'but was passed "%s" which could not be found' % (
self.__class__.__name__, value )
)
self._contentType = value
# ...
@describe.AttachDocumentation()
@describe.argument( 'value',
'the public identifier of the namespace to set for the instance',
str, unicode
)
@describe.raises( TypeError,
'if passed a value that is not a str or unicode type or None'
)
@describe.raises( ValueError,
'if passed a value that has multiple lines in it'
)
def _SetPublicIdentifier( self, value ):
"""
Sets the public-identifier value for the instance"""
if type( value ) not in ( str, unicode ) and value != None:
raise TypeError( '%s.PublicIdentifier expects a single-line str '
'or unicode value, or None, but was passed "%s" (%s)' % (
self.__class__.__name__, value, type( value ).__name__ )
)
if value:
if '\n' in value or '\r' in value:
raise ValueError( '%s.PublicIdentifier expects a single-line '
'str or unicode value, or None, but was passed "%s" (%s) '
'which has multiple lines' % (
self.__class__.__name__, value, type( value ).__name__
)
)
self._publicIdentifier = value
@describe.AttachDocumentation()
@describe.argument( 'value',
'the system identifier of the namespace to set for the instance',
str, unicode
)
@describe.raises( TypeError,
'if passed a value that is not a str or unicode type or None'
)
@describe.raises( ValueError,
'if passed a value that has multiple lines in it'
)
def _SetSystemIdentifier( self, value ):
"""
Sets the System-identifier value for the instance"""
if type( value ) not in ( str, unicode ) and value != None:
raise TypeError( '%s.SystemIdentifier expects a single-line str '
'or unicode value, or None, but was passed "%s" (%s)' % (
self.__class__.__name__, value, type( value ).__name__ )
)
if value:
if '\n' in value or '\r' in value:
raise ValueError( '%s.SystemIdentifier expects a single-line '
'str or unicode value, or None, but was passed "%s" (%s) '
'which has multiple lines' % (
self.__class__.__name__, value, type( value ).__name__ )
)
self._systemIdentifier = value
#-----------------------------------#
# Instance property-deleter methods #
#-----------------------------------#
@describe.AttachDocumentation()
def _DelContentType( self ):
"""
"Deletes" the MIME-type associated with documents of the namespace the instance
represents by setting it to None"""
self._contentType = None
# ...
@describe.AttachDocumentation()
def _DelPublicIdentifier( self ):
"""
"Deletes" the public identifier of the namespace by setting it to None."""
self._publicIdentifier = None
@describe.AttachDocumentation()
def _DelSystemIdentifier( self ):
"""
"Deletes" the system identifier of the namespace by setting it to None."""
self._systemIdentifier = None
#-----------------------------------#
# Instance Properties #
#-----------------------------------#
ContentType = describe.makeProperty(
_GetContentType, None, None,
'the MIME-type of the content expected for a document of the '
'namespace the instance represents',
str, unicode, None
)
# ...
PublicIdentifier = describe.makeProperty(
_GetPublicIdentifier, None, None,
'the public identifier of the namespace',
str, unicode, None
)
SystemIdentifier = describe.makeProperty(
_GetSystemIdentifier, None, None,
'the system identifier of the namespace',
str, unicode, None
)
#-----------------------------------#
# Instance Initializer #
#-----------------------------------#
@describe.AttachDocumentation()
# ...
@describe.argument( 'contentType',
'the MIME-type of the content associate with the instance',
str, unicode, None
)
@describe.argument( 'publicId',
'the public-identifier of the namespace',
str, unicode, None
)
@describe.argument( 'systemId',
'the system-identifier of the namespace',
str, unicode, None
)
# ...
def __init__( self, name, namespaceURI, contentType, systemId=None,
publicId=None, defaultRenderingModel=renderingModels.Mixed,
**tagRenderingModels ):
"""
Instance initializer"""
# ...
# Set default instance property-values with _Del... methods as needed.
self._DelContentType()
# ...
self._DelPublicIdentifier()
self._DelSystemIdentifier()
# Set instance property values from arguments if applicable.
self._SetContentType( contentType )
# ...
self._SetPublicIdentifier( publicId )
self._SetSystemIdentifier( systemId )
The HTML5Namespace
- and XHTMLNamespace
-constants change slightly, to:
#-----------------------------------#
# Default Namespace constants #
# provided by the module. #
#-----------------------------------#
# HTML 5 namespace
HTML5Namespace = Namespace(
'html5',
'http://www.w3.org/2015/html',
'text/html',
None,
None,
renderingModels.RequireEndTag,
br=renderingModels.NoChildren,
img=renderingModels.NoChildren,
link=renderingModels.NoChildren,
)
__all__.append( 'HTML5Namespace' )
# XHTML namespace
XHTMLNamespace = Namespace(
'xhtml',
'http://www.w3.org/1999/xhtml',
'text/html',
'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd',
'-//W3C//DTD XHTML 1.0 Transitional//EN',
renderingModels.RequireEndTag,
br=renderingModels.NoChildren,
img=renderingModels.NoChildren,
link=renderingModels.NoChildren,
)
__all__.append( 'XHTMLNamespace' )
Finally, unit-tests get updated and run:
######################################## Unit-test results ######################################## Tests were successful ..... False Number of tests run ....... 265 + Tests ran in ........... 0.17 seconds Number of errors .......... 0 Number of failures ........ 1 Number of tests skipped ... 107 ######################################## FAILURES #--------------------------------------# testCodeCoverage (__main__.testmarkupCodeCoverage) AssertionError: Unit-testing policies require test-cases for all classes and functions in the idic.markup module, but the following have not been defined: (testBaseDocument)And that, I believe, provides everything needed to implement the
doctype
property in BaseDocument
, which could then also be used in its __str__
and __unicode__
methods to render it if/as needed.
Looking at the XML Headers
Apart from their final output, both the initial XML declaration and any XML
processing-instruction that I've run across look, structurally, like they could
be represented by a Tag
-variant: They have a name, and can have attributes.
That does not include any inline DTD
specifications (see the An Internal DTD Declaration
section
here for an example
of this), but I don't honestly expect that providing an inline DTD is something
that will be needed, so I'm not going to worry too much about that, at least for
the time being.
As a result, my first thought with regards to implementing those is to generate
a Tag
subclass, possibly as an inline/nested class in BaseDocument
itself, that overrides the __str__
and __unicode__
methods
to generate the right output. That would then allow a document-level property (call
it XMLDeclaration
) to provide the initial XML declaration, and a
collection of those tag-types (XMLProcessingInstructions
, as an ElementList
)
to represent any of the XML processing-instructions for a document-instance. That
implementation looks like this:
@describe.InitClass()
class BaseDocument( Tag, object ):
# ...
#-----------------------------------#
# Inline class definitions #
#-----------------------------------#
@describe.InitClass()
class XMLTag( Tag, object ):
# ...
#-----------------------------------#
# Instance Initializer #
#-----------------------------------#
@describe.AttachDocumentation()
@describe.argument( 'tagName',
'the tag-name to set in this created instance',
str, unicode
)
@describe.keywordargs(
'the attribute names/values to set in the created instance'
)
def __init__( self, tagName, **attributes ):
"""
Instance initializer"""
Tag.__init__( self, tagName, **attributes )
# ...
#-----------------------------------#
# Instance Methods #
#-----------------------------------#
@describe.AttachDocumentation()
def __str__( self ):
"""
Returns a string representation of the instance"""
try:
result = '<?%s' % ( self.tagName )
for name in self.attributes:
result += ' %s="%s"' % ( name, self.attributes[ name ] )
result += '?>'
return result
except ( UnicodeDecodeError, UnicodeEncodeError, UnicodeError ):
return __unicode__( self )
@describe.AttachDocumentation()
def __unicode__( self ):
"""
Returns a unicode representation of the instance"""
result = u'<?%s' % ( self.tagName )
for name in self.attributes:
result += u' %s="%s"' % ( name, self.attributes[ name ] )
result += u'?>'
return result
# ...
#-----------------------------------#
# Class attributes (and instance- #
# attribute default values) #
#-----------------------------------#
# ...
With that class available, the two BaseDocument
properties noted
above can be implemented, making them available for use in the rendering processes
of BaseDocument.__str__
, and BaseDocument.__unicode__
.
Some helper-methods defined in BaseDocument
, to set or add items to
those properties, will also need to be created, but they feel pretty simple:
SetXMLVersion( version )
:- Sets the "version" attribute of the instance's
XMLDeclaration
, creating it in the process if necessary SetXMLEncoding( encoding )
:- Sets the "encoding" attribute of the instance's
XMLDeclaration
, creating it in the process if necessary CreateXMLInstruction( name, **attributes )
:- Creates and adds an XML processing-instruction to the instance's
XMLProcessingInstructions
collection
The Final Implementation of BaseDocument
There's not a whole lot present in BaseDocument
, but there are
some significant chunks over and above the properties noted earlier. The implementation
of the __init__
methodof BaseDocument
and the three
XML-structure-related methods are pretty simple:
#-----------------------------------#
# Instance Initializer #
#-----------------------------------#
@describe.AttachDocumentation()
@describe.argument( 'tagName',
'the tag-name to set in this created instance',
str, unicode
)
@describe.argument( 'namespace',
'the namespace that the instance belongs to',
Namespace
)
@describe.keywordargs(
'the attribute names/values to set in the created instance'
)
def __init__( self, tagName, namespace, **attributes ):
"""
Instance initializer"""
# BaseDocument is intended to be an abstract class,
# and is NOT intended to be instantiated. Alter at your own risk!
if self.__class__ == BaseDocument:
raise NotImplementedError( 'BaseDocument is '
'intended to be an abstract class, NOT to be instantiated.' )
# Call parent initializers, if applicable.
Tag.__init__( self, tagName, namespace, **attributes )
# Set default instance property-values with _Del... methods as needed.
self._DelXMLDeclaration()
self._DelXMLProcessingInstructions()
# Set instance property values from arguments if applicable.
# Other set-up
#-----------------------------------#
# Instance Garbage Collection #
#-----------------------------------#
#-----------------------------------#
# Instance Methods #
#-----------------------------------#
@describe.AttachDocumentation()
@describe.argument( 'tagName',
'the tag-name to set in this created instance',
str, unicode
)
@describe.keywordargs(
'the attribute names/values to set in the created instance'
)
def CreateXMLInstruction( self, tagName, **attributes ):
"""
Creates an XMLTag instance with the supplied tag-name and attributes and appends
it to the instance's XML processing-instructions"""
if not self.XMLDeclaration:
self._SetXMLDeclaration( BaseDocument.XMLTag( 'xml' ) )
self.XMLProcessingInstructions.append(
BaseDocument.XMLTag( tagName, **attributes )
)
@describe.AttachDocumentation()
@describe.argument( 'value',
'the encoding value to set in the instance\'s xml declaration',
str, unicode
)
@describe.raises( TypeError,
'if passed an encoding value that is not a str or unicode'
)
@describe.raises( ValueError,
'if passed an encoding value that is not a single word'
)
def SetXMLEncoding( self, value ):
"""
Sets the "encoding" attribute-value in the instance's XMLDeclaration"""
if type( value ) not in ( str, unicode ):
raise TypeError( '%s.SetXMLEncoding expects a single-word str or '
'unicode value for its encoding, but was passed "%s" (%s)' % (
self.__class__.__name__, value, type( value ).__name__ )
)
if ' ' in value or '\n' in value or '\t' in value or '\r' in value:
raise ValueError( '%s.SetXMLEncoding expects a single-word str or '
'unicode value for its encoding, but was passed "%s" which is '
'invalid' % ( self.__class__.__name__, value )
)
if not self.XMLDeclaration:
self._SetXMLDeclaration( BaseDocument.XMLTag( 'xml' ) )
self.XMLDeclaration.setAttribute( 'encoding', value )
@describe.AttachDocumentation()
@describe.argument( 'value',
'the version value to set in the instance\'s xml declaration',
str, unicode, float, int, long
)
@describe.raises( ValueError,
'if passed a version value that is not a float and cannot be converted '
'to one'
)
def SetXMLVersion( self, value ):
"""
Sets the "version" attribute-value in the instance's XMLDeclaration"""
if type( value ) != float:
try:
checkValue = float( value )
if checkValue < 1.0:
raise ValueError
value = str( checkValue )
except:
raise ValueError( '%s.SetXMLVersion expects a float value '
'greater than or equal to one, or a text or numeric value '
'that can be converted to one, but was passed '
'"%s" (%s)' % (
self.__class__.__name__, value, type( value ).__name__ )
)
else:
if value < 1.0:
raise ValueError( '%s.SetXMLVersion expects a float value '
'greater than or equal to one, or a text or numeric value '
'that can be converted to one, but was passed '
'"%s" (%s)' % (
self.__class__.__name__, value, type( value ).__name__ )
)
if not self.XMLDeclaration:
self._SetXMLDeclaration( BaseDocument.XMLTag( 'xml' ) )
self.XMLDeclaration.setAttribute( 'version', str( value ) )
The __str__
and __unicode__
methods arent complex either,
though they might seem so atr first glance, but I'm pretty confident that the
comments in the code tell the entore story of how they work:
@describe.AttachDocumentation()
def __str__( self ):
"""
Returns a string representation of the instance"""
# Try rendering the instance as a string:
try:
result = ''
# TODO: Add XML declaration, if applicable
if self.XMLDeclaration:
result += '%s' % self.XMLDeclaration
# TODO: Add XML processing-instructions, if applicable
for instruction in self.XMLProcessingInstructions:
result += '%s' % instruction
# TODO: Add DOCTYPE, if applicable
result += '%s' % self.doctype
result += '<%s' % ( self.tagName )
# If the instance has a namespace, render that too
if self.namespace:
result += ' xmlns="%s"' % ( self.namespace.namespaceURI )
# If there are child namespaces that aren't the same as the local
# namespace, they need to be included:
for ns in self.childNamespaces:
if ns != self.namespace:
result += ' xmlns:%s="%s"' % (
self.namespace.Name, self.namespace.namespaceURI
)
# Since a document is also a tag, it can have attributes, so render
# any present:
for attr in self.attributes:
result += '%s="%s"' % (
attr, self.attributes[ attr ]
)
# Close the starting tag
result += '>'
# Add Tag.childNodes.__str__ to results
for child in self.childNodes:
result += '%s' % child
# Strip the current results just to keep things clean
result = result.strip()
# Add the closing tag
result += '</%s>' % ( self.tagName )
# And return it
return result
# If string-rendering fails because it needs unicode, return the
# unicode representation instead.
except ( UnicodeDecodeError, UnicodeEncodeError, UnicodeError ):
return __unicode__( self )
@describe.AttachDocumentation()
def __unicode__( self ):
"""
Returns a unicode representation of the instance"""
result = u''
# TODO: Add XML declaration, if applicable
if self.XMLDeclaration:
result += u'%s' % self.XMLDeclaration
# TODO: Add XML processing-instructions, if applicable
for instruction in self.XMLProcessingInstructions:
result += u'%s' % instruction
# TODO: Add DOCTYPE, if applicable
result += u'%s' % self.doctype
result += u'<%s' % ( self.tagName )
# If the instance has a namespace, render that too
if self.namespace:
result += u' xmlns="%s"' % ( self.namespace.namespaceURI )
# If there are child namespaces that aren't the same as the local
# namespace, they need to be included:
for ns in self.childNamespaces:
if ns != self.namespace:
result += u' xmlns:%s="%s"' % (
self.namespace.Name, self.namespace.namespaceURI
)
# Since a document is also a tag, it can have attributes, so render
# any present:
for attr in self.attributes:
result += u'%s="%s"' % (
attr, self.attributes[ attr ]
)
# Close the starting tag
result += u'>'
# Add Tag.childNodes.__unicode__ to results
for child in self.childNodes:
result += u'%s' % child
# Strip the current results just to keep things clean
result = result.strip()
# Add the closing tag
result += u'</%s>' % ( self.tagName )
# And return it
return result
How BaseDocument
Will Be Used
The next logical step, I think, is to define document-type classes for HTML 5
and XHTML document-types — one document-type for each Namespace constant available
in the markup
module. The implementation of those is where differentiation
between the two HTML dialects starts to take shape, as do the differences between
the two of them and any generic
XML-derived markup. The HTML-variant implementations
will be very simple: Neither will override much (if any) of the functionality
of BaseDocument
, both may well have some common structures added (like
head
and body
properties that provide direct access to
the Tag
-instance representing them, for example). Down the line, they'll
both likey have support for script- and stylesheet-management attached in some fashion,
but that's a topic for a later post.
There's one other difference that I can think of, offhand, between the two HTML
variants, maybe: An XHTML document's __init__
might set XML-declaration
values (version and encoding) in order to conform to the XML requirements that underlie
it. There's also the possibility that XML processing-instructions might be added,
though that's not part of a baseline XHTML document. Given that this is the
only difference I can identify between the two dialects that isn't already accounted
for through the relvant Namespace
associated, I'm going to give some
thought to how best to proceed on defining concrete HTML-document classes while I
work my way through the MarkupParser
in my next post.
And that's all for today, I think.
No comments:
Post a Comment