Tuesday, May 2, 2017

Generating and Parsing Markup in Python [4]

Before tackling the Tag class, the last major concrete class in the markup module that I'll need to be able to generate, well... markup, there are a few items that contribute to it that need attention. The reasons for needing the attention are, perhaps, not obvious, so in today's post I'll take a step back, and explain/examine aspects of my end-goal and show how those items fit into meeting that goal.

Keeping Markup and Logic Separate

Back in the post where I decided to work on markup-generation first, I mentioned:

I firmly believe the idea of separation of markup/structure from functionality/logic has merit — to the point that one of my goals for this framework is to make it as easy as possible to keep that separation, while still allowing as much designer-level control as possible over the structure and appearance of pages.
I didn't really go into any details about how I wanted that to work, just that it was a priority. I'm not going to get deeply into the details today, but I can at least shed some light on what I'm going to do, and what the implications of that are as they relate, here and now, to the markup module.

Separation of Markup/Design from Logic/Function

Consider the following tentative page-template:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="XML_NS_FOR_SOME_HTML_TYPE" 
  xmlns:idic="idic.page.component.path"
  xmlns:app="app.page.component.path">
  <head>
    <title>
      <idic:IfLoggedIn>
        <idic:Placeholder value="Page.Title">Page.Title</idic:Placeholder>
      </idic:IfLoggedIn>
      <idic:IfNotLoggedIn>
        <idic:Placeholder value="Page.Title">Page.Title</idic:Placeholder>
        Log-in Required:
      </idic:IfNotLoggedIn>
    </title>
    <idic:ScriptManager role="HeadScripts">
      <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
    </idic:ScriptManager>
    <idic:StyleManager>
      <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" />
    </idic:StyleManager>
  </head>
  <body>
    <idic:IfLoggedIn>
      <div id="main" class="container">
        <h1>
          <idic:Placeholder value="Page.Title">Page.Title</idic:Placeholder>
        </h1>
        <app:SomeComponent>
          <!-- etc., etc. -->
        </app:SomeComponent>
      </div>
    </idic:IfLoggedIn>
    <idic:IfNotLoggedIn>
      <div id="main" class="container">
        <h1>
          <idic:Placeholder value="Page.Title">Page.Title</idic:Placeholder>
          Log-in Required:
        </h1>
        <form action="" method="post" 
          app:component="LogInForm"
          app:user="username" app:passwd="userpass">
          <idic:Template>
            <div class="form-group">
              <label for="username">Name:</label>
              <input type="text" id="username" name="username"
              class="form-control" />
            </div>
            <div class="form-group">
              <label for="userpass">Password:</label>
              <input type="password" id="userpass" name="userpass"
              class="form-control" />
            </div>
            <div>
              <button type="submit" class="btn btn-default">Log In</button>
            </div>
          </idic:Template>
        </form>
      </div>
    </idic:IfNotLoggedIn>
    <idic:ScriptManager role="DefinitionScripts">
      <script src="AnotherExternalScript.js"></script>
    </idic:ScriptManager>
    <idic:ScriptManager role="RuntimeScripts" />
  </body>
</html>
This is a very bare-bones example of the sort of templating that I'm trying to accomplish with the framework. In order to illustrate the ability to add third-party frameworks, I've added links to the Bootstrap core CSS and JavaScript. If it were passed off to a client browser (removing the initial XML declaration if necessary) it would render reasonably well, though it would show two major page-sections, one in the <idic:IfLoggedIn> element in the markup and one in the <idic:IfNotLoggedIn>. Still, it would render.

On top of that, though this example might be a little odd to work with because it's got displays for logged-in and not-logged-in states, there's not much here that even a still-in-school intern wouldn't be able to understand just by looking at it, and the new stuff can pretty much be ignored:
  • Any page-components — that is, any tag with an XML namespace (idic or app) — can be ignored or left alone;
  • All the markup inside the page-components uses standard HTML tag-names and reasonably-normal tag structure, barring the XML structure;
    That even holds true for:
    • The external/third-party style-sheet references; and
    • The external/third-party script-references
  • Any attributes with a page-component namespace can also be ignored or left alone.
Granted, it will take at least some getting used to, particularly for anyone who isn't familiar with XML's rules, but I'd guess that most of the tools that are out there in the wild will recognize XML and provide assistance with it if/as needed while authoring in a page-template document.

Without worrying too much about the details or implementation behind the page-component tags in the example template, here's at least a rough approximation of what they'd probably do:

<idic:IfLoggedIn>:
Would render any child markup to the final page-output if the user is logged in.
<idic:Placeholder>:
Would replace its child markup with a value identified by its value attribute.
<idic:IfNotLoggedIn>:
Would render any child markup to the final page-output if the user is not logged in.
<idic:ScriptManager>:
Would gather any number of external script-references or inline script-code, and keep track of them so that components can require specific scripts without having to worry about multiple instances of those scripts being present in the final rendered page markup.
<idic:StyleManager>:
Performs much the same task as a ScriptManager, but for external style-sheet references and inline stylesheets.
<app:SomeComponent>:
Some application component tag — Something that the underlying application renders, with or without user interaction.
<idic:Template>:
Defines a block of markup that will be used by a parent page-component to define some part of what its rendered markup will look like.
Each of these component-tags would need to be able to map back to a Python class (something that is derived from a common base page-component class that I'll define after I've completed the markup module and the next two major topics after that). That mapping, I think, can eventually be handled by a combination of one or more properties in a Namespace instance, and probably some sort of component-registration process that I'll figure out in detail later.
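As a rough illustration of what such a registration process might look like, here is a minimal sketch. Every name in it (BaseComponent, ComponentRegistry, register, resolve) is hypothetical, invented for the example, and not part of the framework as it stands; the eventual mechanism may well hang off of Namespace instead.

```python
# Hypothetical sketch: none of these names exist in the framework yet.
class BaseComponent(object):
    """Stand-in for the common base page-component class."""
    def __init__(self, **attributes):
        self.attributes = attributes

class ComponentRegistry(object):
    """Maps (namespace prefix, tag-name) pairs to component classes."""
    def __init__(self):
        self._components = {}

    def register(self, prefix, tagName):
        """Class decorator that records a component under prefix:tagName."""
        def _decorator(cls):
            if not issubclass(cls, BaseComponent):
                raise TypeError('%s is not a BaseComponent' % cls.__name__)
            self._components[(prefix, tagName)] = cls
            return cls
        return _decorator

    def resolve(self, qualifiedName):
        """Returns the class registered for a name like 'idic:Placeholder'."""
        prefix, _, tagName = qualifiedName.partition(':')
        try:
            return self._components[(prefix, tagName)]
        except KeyError:
            raise LookupError(
                'No component registered for "%s"' % qualifiedName
            )

registry = ComponentRegistry()

@registry.register('idic', 'Placeholder')
class Placeholder(BaseComponent):
    """Would replace its child markup with a named value."""
    pass

# A template parser that encounters <idic:Placeholder> could then do:
componentClass = registry.resolve('idic:Placeholder')
instance = componentClass(value='Page.Title')
```

The point of the sketch is only that a tag-name in a template can be turned into a class lookup; how prefixes relate to real namespace URIs is left open here.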

I'm also planning, at least tentatively, to make page-component equivalents of all of the standard HTML form-tags — <form>, all of the <input> variations, <select> and <textarea>, with an eye towards allowing in-template specification of server-side (and maybe client-side) validation. That, too, will rely at least to some degree on the same sort of Namespace-based mapping and/or component-registration process.

If all of this seems like a lead-in to defining the Namespace class, well... Yes, really. But before that, there's one other item to consider...

Rendering Models

In order for an XML-based page-template to render out to a non-XML-based markup language like HTML 5, and to do so without strict XML rendering rules, there needs to be some way to determine how any given tag in a document should be rendered. Take a

<link rel="stylesheet" ... >
tag as an example. In XML, that would be constructed as a self-closing tag:
<link rel="stylesheet" ... />
but in HTML 5 it's not closed at all. A <script> tag that only references an external source, having no internal content, is perfectly legitimate in XML as
<script src="..." />
but if that markup gets issued to a browser, there's a good chance that it'll make the page puke in odd and unexpected ways. I've seen similar things happen with self-closed <div> tags, and I'd expect similar issues to arise from any HTML tag that's supposed to have content inside it.
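The same distinction can be seen in Python's standard library, for what it's worth. The sketch below is not part of the markup module; it just uses xml.etree.ElementTree, whose tostring function accepts a method argument of 'xml' or 'html', to serialize the same two elements both ways. It assumes Python 3 (the encoding='unicode' argument is what returns text rather than bytes there).

```python
import xml.etree.ElementTree as ET

# The same two elements, serialized under XML rules and under HTML rules
link = ET.Element('link', rel='stylesheet')
script = ET.Element('script', src='app.js')

def serialize(element, method):
    # encoding='unicode' asks ElementTree for a text string (Python 3)
    return ET.tostring(element, encoding='unicode', method=method)

xmlLink = serialize(link, 'xml')        # <link rel="stylesheet" />
htmlLink = serialize(link, 'html')      # <link rel="stylesheet">
xmlScript = serialize(script, 'xml')    # <script src="app.js" />
htmlScript = serialize(script, 'html')  # <script src="app.js"></script>

for markup in (xmlLink, htmlLink, xmlScript, htmlScript):
    print(markup)
```

ElementTree makes that decision from a fixed internal list of empty HTML elements, which is essentially a hard-coded version of the per-tag rendering rules discussed here.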

Then there are the component-tags listed above. Any one of them might generate child markup (or not), have a wrapping tag (or not) or be represented themselves by a tag in the rendered markup (or not), in any combination of those three possibilities.

All of these represent what I think of as a rendering model — some indication of how a given tag must or should be rendered to a client browser, and whose rendering rules might vary from one markup-dialect to another, even if the tags themselves are identical.

By the time I get to a point where I can define actual document types, I should have a pretty good idea of what the rendering rules are for all the tags within the markup language that the document is for. Those individual tag-level rendering-models, then, can be defined as a set of properties for each tag for a given document-type. Each document-type can be represented by a Namespace instance, which can in turn be identified by a real namespace URI. The official namespace for HTML 5 is not distinct from XHTML's, though, so it'll require some workaround:

XML
http://www.w3.org/XML/1998/namespace
XHTML
http://www.w3.org/1999/xhtml
HTML5
http://www.w3.org/2015/html
or, maybe:
idic.markup.HTML5Document
(Because the official namespace for HTML 5, even as late as the HTML 5.2 specification, is http://www.w3.org/1999/xhtml, which is the same as for XHTML...)
But I digress...

I could think of the following rendering-model variations:

NoChildren
A tag that should never have children, and so shouldn't render any, even if it has some
Example: [HTML 5] <link ... >
Example: [XHTML] <link ... />
Mixed
A tag that might or might not have children, and should render with a closing-tag if children are present, or as if it were a NoChildren tag if they aren't.
Example: [XML] Any tag that doesn't have required children in its schema or DTD definition.
RequireEndTag
A tag that should always render with a closing tag, even if it has no child content.
Example: [HTML 5, XHTML] <div></div>, <script></script> and most other tags
ChildrenOnly
A tag that renders only any child markup.
Example: There will likely be at least a few page-components that will use this model, though at present I don't have any defined that I can point to. The idic:ScriptManager and idic:StyleManager tags might fit into this model, depending on how their managed scripts and styles get stored, though.
These feel to me like they could fit well into another pseudo-enumeration, the same sort of structure/construct that was created for managing node-types:
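# namedtuple comes from the standard library's collections module
from collections import namedtuple

renderingModels = namedtuple(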
    'enumRenderingModels', 
    [ 'NoChildren', 'Mixed', 'RequireEndTag', 'ChildrenOnly' ]
    )(
        NoChildren=0,
        Mixed=1,
        RequireEndTag=2,
        ChildrenOnly=3,
    )

__all__.append( 'renderingModels' )
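A quick, self-contained look at how that pseudo-enumeration behaves in use may be helpful. The renderingModels definition is repeated from above so the example runs on its own; the modelName helper is hypothetical, added only for illustration.

```python
from collections import namedtuple

renderingModels = namedtuple(
    'enumRenderingModels',
    ['NoChildren', 'Mixed', 'RequireEndTag', 'ChildrenOnly']
    )(NoChildren=0, Mixed=1, RequireEndTag=2, ChildrenOnly=3)

def modelName(value):
    """Returns the member-name for a rendering-model value (hypothetical)."""
    for name in renderingModels._fields:
        if getattr(renderingModels, name) == value:
            return name
    raise ValueError('%r is not a member of renderingModels' % (value,))

# Members are accessed by name, and are plain integers underneath
mixed = renderingModels.Mixed

# Because the instance is a tuple of its values, "in" works as a
# membership test against those values
isMember = renderingModels.RequireEndTag in renderingModels
isNotMember = 7 in renderingModels

# Going back from a value to its name
name = modelName(renderingModels.ChildrenOnly)
```

One caveat with plain-integer members: Python treats True as equal to 1 and False as equal to 0, so a boolean would also pass that membership test.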

My earlier comparison of XML vs. HTML 5 rendering of a link tag was, now that I look at it again, somewhat misleading. There is a (subtle?) distinction between these rendering models and the XML-style vs. HTML-style self-closing/unclosed tags (<link ...> vs. <link ... /> as an example again): Whether a given language's handling of a NoChildren object uses the XML-style unary-tag syntax (<link />) or just leaves it hanging like HTML 5 does (<link>) is really more a function of the markup language than the tags within that language. That, too, feels like something that could be stored as a Namespace property and used when the final rendered output is generated, but I'm going to ponder on that until I get to the point where I'm actually defining how documents work.
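To make that distinction concrete, here is a minimal sketch of how a rendering-model and a language-level closing style might combine. It is an assumption-laden illustration, not the Tag implementation: renderTag and its selfClose flag are hypothetical names, attributes are ignored for brevity, and child markup is passed in as an already-rendered string.

```python
from collections import namedtuple

renderingModels = namedtuple(
    'enumRenderingModels',
    ['NoChildren', 'Mixed', 'RequireEndTag', 'ChildrenOnly']
    )(NoChildren=0, Mixed=1, RequireEndTag=2, ChildrenOnly=3)

def renderTag(tagName, model, childMarkup='', selfClose=True):
    """
    Renders a tag according to its rendering-model (hypothetical sketch).
    selfClose stands in for a namespace-level setting: True for XML-style
    <tag />, False for HTML-5-style <tag>.
    """
    if model == renderingModels.ChildrenOnly:
        # The tag itself never appears in the output
        return childMarkup
    if model == renderingModels.Mixed and not childMarkup:
        # A Mixed tag without children renders as if it were NoChildren
        model = renderingModels.NoChildren
    if model == renderingModels.NoChildren:
        # Any child markup is deliberately not rendered
        return ('<%s />' % tagName) if selfClose else ('<%s>' % tagName)
    # RequireEndTag, or Mixed with children
    return '<%s>%s</%s>' % (tagName, childMarkup, tagName)

html5Link = renderTag('link', renderingModels.NoChildren, selfClose=False)
xhtmlLink = renderTag('link', renderingModels.NoChildren, selfClose=True)
emptyScript = renderTag('script', renderingModels.RequireEndTag)
```

Note that the rendering-model decides whether there is an end-tag or any children at all, while selfClose only affects how a childless tag is written, which is why it reads more like a property of the language than of the tag.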

Another perhaps-odd consideration: This structure would allow the generation of tags with child markup, while also allowing the rendering of the final markup of such tags to prohibit rendering of those children. On the surface that probably sounds odd. I thought so too. I'm leaving that implied capability in place in the framework, though, because while I can't think of any real-world case where a tag in one markup-language allows children, but the same tag in another doesn't, I can't guarantee that it can't happen. I'll probably think more on that in the future, but for the time being, it doesn't feel like a major consideration, so I'll leave it in place, as weird as it feels to me.

That, I think, is all that need be done to define rendering-models. I'll dig in to the application of them for tag-rendering purposes when I get to the Tag class, but I've got enough now to define Namespace, I think.

The Namespace class

Namespace is built with my standard final class template as a starting-point. The rationale for making it nominally-final is about as weak as any I'd still consider valid: I cannot think of an actual need for it to ever be extended. That said, if a reason surfaces, I'll drop the nominally-final check-code out of its definition. Apart from that, it's a very straightforward class, I think: a few properties, a class-level registration-process of specific namespaces to facilitate creation of some commonly-used variants as constants in the markup module... Not much else to it.

The properties of Namespace are:

DefaultRenderingModel
A value from the renderingModels enumeration, defining the rendering-model to use for tags that don't have a specific one identified;
TagRenderingModels
A dictionary of tag-names to rendering-model values that indicate the rendering-model to be used for specific tags — e.g.:
{
    # ...
    'br':renderingModels.NoChildren,
    'img':renderingModels.NoChildren,
    'link':renderingModels.NoChildren,
    # ...
}
for an HTML dialect.
namespaceURI
The unique identifier of the namespace instance, used as a name to register the instance with the Namespace class, and to retrieve a namespace by that URI if needed.
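Stripped of the type-checking, documentation decorators and nominally-final check, the overall shape of those three properties plus the registration might look something like the sketch below. It is a simplified stand-in rather than the real class: the GetNamespaceByURI and GetRenderingModel names, the _registry attribute, and the choice of RequireEndTag as the fallback default are all assumptions made for the example.

```python
from collections import namedtuple

renderingModels = namedtuple(
    'enumRenderingModels',
    ['NoChildren', 'Mixed', 'RequireEndTag', 'ChildrenOnly']
    )(NoChildren=0, Mixed=1, RequireEndTag=2, ChildrenOnly=3)

class Namespace(object):
    """Simplified sketch of a namespace with tag rendering-models."""
    # Class-level registry of instances, keyed by namespaceURI
    _registry = {}

    def __init__(self, namespaceURI,
        defaultRenderingModel=renderingModels.RequireEndTag,
        **tagRenderingModels):
        self.namespaceURI = namespaceURI
        self.DefaultRenderingModel = defaultRenderingModel
        self.TagRenderingModels = dict(tagRenderingModels)
        # Register the instance so it can be retrieved by its URI
        Namespace._registry[namespaceURI] = self

    @classmethod
    def GetNamespaceByURI(cls, namespaceURI):
        """Returns the registered instance for the URI (raises KeyError)."""
        return cls._registry[namespaceURI]

    def GetRenderingModel(self, tagName):
        """Returns the tag-specific model, or the namespace default."""
        return self.TagRenderingModels.get(
            tagName, self.DefaultRenderingModel
        )

# Usage, with a made-up URI
example = Namespace(
    'urn:example:html-dialect',
    br=renderingModels.NoChildren,
    img=renderingModels.NoChildren,
    )
brModel = example.GetRenderingModel('br')
divModel = example.GetRenderingModel('div')
```

Keeping the tag-specific dictionary small and falling back to a default means that only the exceptions for a dialect need to be listed.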

As is typical for me, I'm type- and value-checking the values going into these properties in their various _Set* methods. For the most part, those checks are pretty simple, but the check- and set-process for setting TagRenderingModels is a bit more complex than anything I think I've shown so far, so I'll show and discuss it briefly:

@describe.AttachDocumentation()
@describe.raises( TypeError, 
    'if passed a value that is not a dict, or not derived from one'
)
@describe.raises( ValueError, 
    'if passed a dict with one or more invalid keys (that are not '
    'valid tag-names)'
)
@describe.raises( ValueError, 
    'if passed a dict with one or more invalid values (not members '
    'of renderingModels)'
)
def _SetTagRenderingModels( self, value ):
    """
Sets the dictionary of tag-names:rendering-models to use for tags that don't 
use the default rendering model of the namespace."""
    if not isinstance( value, dict ):
        raise TypeError( '%s.TagRenderingModels expects a dict of '
            'str or unicode values that are valid tag-names as keys, '
            'and members of renderingModels as values, but was passed '
            '"%s" (%s)' % ( 
                self.__class__.__name__, value, type( value ).__name__
            )
        )
    # TODO: Figure out a better way to validate tag-names
    badKeys = [
        k for k in sorted( value ) 
        if type( k ) not in ( str, unicode )
        or ' ' in k
        or '\n' in k
        or '\t' in k
        or '\r' in k
    ]
    if badKeys:
        raise ValueError( '%s.TagRenderingModels expects a dict of '
            'str or unicode values that are valid tag-names as keys, '
            'and members of renderingModels as values, but was passed '
            'a dict with invalid key-values %s' % ( 
                self.__class__.__name__, badKeys
            )
        )
    badValues = dict(
            [
                ( k, value[ k ] ) for k in sorted( value ) 
                if value[ k ] not in renderingModels
            ]
        )
    if badValues:
        raise ValueError( '%s.TagRenderingModels expects a dict of '
            'str or unicode values that are valid tag-names as keys, '
            'and members of renderingModels as values, but was passed '
            'a dict with invalid values %s' % ( 
                self.__class__.__name__, badValues
            )
        )
    self._tagRenderingModels = value
All of this code is intended to accomplish a few key checks:
  • The initial incoming value is expected to be a dict, or a subclass of one — the process needs the keys and values in order to support multiple different values for multiple tags, after all.
  • Any tag-name specified must be a valid tag-name. I'm going to look for a better way to make that determination while I'm working out the Tag class, I'm sure, but for now the simple checks I have in place will suffice — though I won't really be able to fully unit-test the property until I get that resolved.
  • Any tag-name specified must have a valid rendering-model value — it must be a member of the renderingModels enumeration defined earlier.
If this seems like an awful lot of code to write when it should be possible to just set self._tagRenderingModels = value, I'd point to
Raise errors as close to their ultimate source as possible
and
If specific types are expected, test for those where they're expected
from my coding standards. This, I think, is a really good example of why I think those are important — Without those checks, it would be possible to dump pretty much any value as a rendering-model for any tag in a namespace. It would almost certainly be possible to have the rendering-process for Tag instances check for valid values, but I'm pretty sure that those processes are going to be complex enough without adding that sort of checking to them. Even if those checks were made during rendering, and were to raise an error of some sort, that wouldn't necessarily help identify where the source of the error was.
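To show where the errors surface, here is a condensed, standalone stand-in for those checks. It is not the method above: checkTagRenderingModels and tryAssign are hypothetical helpers written for this example, the messages are abbreviated, and the string-type handling is adjusted so that it runs on both Python 2 and Python 3.

```python
from collections import namedtuple

renderingModels = namedtuple(
    'enumRenderingModels',
    ['NoChildren', 'Mixed', 'RequireEndTag', 'ChildrenOnly']
    )(NoChildren=0, Mixed=1, RequireEndTag=2, ChildrenOnly=3)

try:
    stringTypes = (str, unicode)    # Python 2
except NameError:
    stringTypes = (str,)            # Python 3 has no separate unicode type

def checkTagRenderingModels(value):
    """Condensed stand-in for the checks in _SetTagRenderingModels."""
    if not isinstance(value, dict):
        raise TypeError('expected a dict, got %s' % type(value).__name__)
    badKeys = [
        k for k in value
        if not isinstance(k, stringTypes)
        or any(c in k for c in ' \n\t\r')
    ]
    if badKeys:
        raise ValueError('invalid tag-names: %r' % badKeys)
    badValues = dict(
        (k, v) for k, v in value.items() if v not in renderingModels
    )
    if badValues:
        raise ValueError('invalid rendering-models: %r' % badValues)
    return value

def tryAssign(candidate):
    """Returns None on success, or the name of the exception raised."""
    try:
        checkTagRenderingModels(candidate)
    except (TypeError, ValueError) as error:
        return type(error).__name__
    return None

results = [
    tryAssign({'br': renderingModels.NoChildren}),  # valid
    tryAssign(['br']),                              # not a dict
    tryAssign({'b r': renderingModels.NoChildren}), # whitespace in the name
    tryAssign({'br': 'NoChildren'}),                # a name, not a member
]
```

Each bad value fails at the moment it is assigned, with a traceback that points at the assignment, rather than at some later point in the middle of rendering.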

I'm not going to take the time to fully populate the Namespace constants in markup just yet — doing so would require digging through all the tag-level documentation for HTML 5 and XHTML, at a minimum, and though it needs to be done, it doesn't need to be done right now, I think. I am, however, going to stub those constants out so that they're available (using a bogus URI for HTML 5, as noted earlier):

#-----------------------------------#
# Default Namespace constants       #
# provided by the module.           #
#-----------------------------------#

# HTML 5 namespace
HTML5Namespace = Namespace(
    'http://www.w3.org/2015/html', 
    br=renderingModels.NoChildren,
    img=renderingModels.NoChildren,
    link=renderingModels.NoChildren,
    )
__all__.append( 'HTML5Namespace' )

# XHTML namespace
XHTMLNamespace = Namespace(
    'http://www.w3.org/1999/xhtml', 
    br=renderingModels.NoChildren,
    img=renderingModels.NoChildren,
    link=renderingModels.NoChildren,
    )
__all__.append( 'XHTMLNamespace' )

I'm also going to defer finishing out the unit-tests for Namespace until I have Tag implemented, if only so that I can use the tag-name validation that it'll require to also check tag-names in the _SetTagRenderingModels method. I'm accumulating testing tech-debt by doing that, but I'm at the point where I'd rather get all of the unit-testing resolved at once after Tag is done, and the markup module is more complete. Unit-test stubs for Namespace and other recent items have added nearly 40 new test-methods to the mix, and six new failures:

########################################
Unit-test results
########################################
Tests were successful ... False
Number of tests run ..... 104
 + Tests ran in ......... 0.01 seconds
Number of errors ........ 0
Number of failures ...... 21
########################################

At this point, the next logical item to tackle is the Tag class. That has the potential to be a really long post (Tag has to implement all of the DOM-element functionality listed in the first markup-module post, and there's a lot there). That's way more than I feel comfortable with tackling in today's post.

On the off chance that there's any interest in the XML page-template that I started with today, I'm making that available for download:
