Thursday, January 12, 2017

So. Documentation…

Creating good, consistent documentation is challenging for a lot of developers, judging by a few I've worked with over the years and by the documentation I've seen in the wild. It's not surprising, I expect: ideally, well-written and well-commented code is self-documenting, and in a lot of places, documentation and testing are not development functions that get a high priority from a time-budgeting standpoint. Sometimes that's even the case when the people making those decisions are or were developers themselves, and would stand to benefit from the effort. Even where enough time is allocated, writing documentation is rarely as much fun for developers as writing code, so there's not a whole lot of impetus to do it right, even assuming that some agreement can be reached about what "right" means when it comes to documentation.

What's Available to Work With

Python's native documentation-structure is very simple, and very free-form. That is, at best, a double-edged sword. The simplicity is nice, but it makes it very easy to simply not bother with documentation. Its free-form nature, limited only by the constraint that a docstring be a string or unicode value, means that just about anything that can be expressed in text can be put in play. The flip side is that nothing enforces consistency or completeness: different projects, or even different developers on the same project, may well generate documentation that looks nothing alike. Upkeep of documentation is also tedious, at best, if there is any real meat to it.

class SomeClass( object ):
    """
This is a "docstring." It can contain pretty much anything. What it should 
contain is *probably* a matter of opinion, but should almost certainly 
include:
 * A description of the class (what it represents);
 * Any applicable warnings (known issues, imminent deprecation, etc.)
It *might* include a simple usage-example.
"""
    def __init__( self, arg1, arg2, arg3=None, *args, **kwargs ):
        """
This is a method's docstring. Like the one associated with the class, it can 
contain pretty much anything. Like the class-level docstring, what it SHOULD 
contain is *probably* a matter of opinion. Odds are good that many would agree 
that it should contain:
 * A description of the method ("Creates an instance of SomeClass" in this 
   case);
 * A description of each argument (arg1, arg2), with default values where 
   applicable (arg3), including the expected/allowed types and value-
   constraints if there are any.
 * A description of the argument-list (*args) and any type- or value 
   expectations/constraints it might have; 
 * A description of the keyword-arguments (**kwargs) allowed/expected and 
   *their* type- or value-expectations/constraints, if any;
 * What the method returns, if anything; and
 * What exceptions might be raised, maybe with descriptions of *why* they'd be 
   raised.
"""
        pass

What I Want To Accomplish

For my money, any process for generating code-documentation should have the following characteristics:

  • Documentation information should reside in the codebase itself;
  • It should be as close to the code it's about as possible;
  • The process for documenting anything should be as consistent as possible;
  • It must be reasonably readable in the code;
  • It cannot be dependent on external processes (though extensibility to leverage them is OK) and should be output-format-agnostic (but extensible to other formats);
  • Most importantly: It's gotta be fairly simple, and not too time-consuming — Developers would rather write code than documentation...

Something else that I think is important is the ability to access some or all of that documentation through the built-in __doc__ property of Python constructs. It may not be super-critical to everyone, but I find that I will at least occasionally launch a Python session, import something, and print the doc-string of it:

>>> import re
>>> print re.compile.__doc__
Compile a regular expression pattern, returning a pattern object.
>>> print list.__doc__
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
>>> print list.append.__doc__
L.append(object) -- append object to end
>>> 

That's usually faster than Googling something when I have a brain-fart, and it doesn't require network access on those rare occasions when I have no connection. So I'd like at least some portion of the generated documentation to carry over to __doc__.

With respect to several of these points, I have to give credit to Microsoft for their XML Documentation Comments implementation. A very simple example of it is:

/// <summary>
///  This class performs an important function.
/// </summary>
public class MyClass{}

This keeps the documentation in the codebase and right next to the code it's documenting. The only down-side to taking this sort of approach in my code is that it would require parsing of the actual code, and some method to associate the documentation-text with the item being documented. That seems like an awful lot of work, even assuming that it's possible. At the same time, the fairly large collection of supported tags is nice, and is worth keeping in mind as I scope out my documentation-process. On top of that, while an XML-based documentation-structure sounds sexy, that would introduce an external-process requirement for almost any useful format, which is a show-stopper in my mind.

As I was writing this post, a couple other nice-to-have items came to mind:

  • Documentation should be concerned more with facts about the code (accepts arguments of this type, returns this or that value, etc.) than prose, though there should be a reasonable allowance for more prose-like information.
  • It should be possible to test for whether some minimal level of documentation is provided as part of a build-process.
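
To make that last point a bit more concrete, here is a rough sketch of what a minimal-documentation check might look like. It only looks for missing docstrings on public functions and classes defined in a module. The name find_undocumented and the throwaway demo module are hypothetical, made up purely for illustration:

```python
import inspect
import types

def find_undocumented(module):
    """Returns the sorted names of public functions and classes defined in
    module that have no docstring (or only a blank one)."""
    missing = []
    for name, member in inspect.getmembers(module):
        # Skip private/protected names
        if name.startswith('_'):
            continue
        # Only functions and classes are checked in this sketch
        if not (inspect.isfunction(member) or inspect.isclass(member)):
            continue
        # Skip anything that was merely imported into the module
        if getattr(member, '__module__', None) != module.__name__:
            continue
        if not (member.__doc__ or '').strip():
            missing.append(name)
    return missing

# A throwaway module to run the check against (illustration only)
def documented():
    """Does something."""

def undocumented():
    pass

demo = types.ModuleType('demo')
for member in (documented, undocumented):
    member.__module__ = 'demo'
    setattr(demo, member.__name__, member)

missing = find_undocumented(demo)
print(missing)
```

A build step could call something like this for each module and fail the build if the returned list is not empty.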

My Approach: Metadata via Decoration

When push comes to shove, documentation is a form of metadata. That is, it's data about other data, in this case about the functionality that's being documented. There's quite a range of programmatic elements that need documentation, including (but most definitely not limited to):

  • Module- and package-files, containing:
    • Constants
    • Exceptions
    • Functions
      • Any number of arguments (with or without default values);
      • A single argument-list; and
      • A single keyword-arguments structure, which may or may not have specific key-name expectations
      and which may or may not also need to note
      • Return-value(s), which might vary based on the internal process(es) undertaken; and
      • Exceptions that can be raised, and the circumstances that will raise them.
    • Class variations:
      • Concrete classes (whether normal or nominally-final);
      • Abstract classes;
      • Interfaces
      which, in turn, may contain concrete or abstract
      • Properties; and
      • Instance, class- and static methods, which have the same potential documentation-needs as functions, above.

Barring the generation of a consistently-sequenced and -formatted __doc__-value, all of the documentation-metadata can simply be stored in association with the item being documented, ready to retrieve if/as needed for whatever purpose. One of those purposes, I think, should be the generation of a doc-string (the __doc__ attribute of an element) that provides the detailed documentation for reading/printing as noted earlier. However, with all of the documentation-metadata available as a collected property of the code itself, it should be a relatively simple matter to read/extract it, and generate whatever other format(s) are needed.

Under that assumption, the real challenge, then, is to come up with processes that allow documentation metadata to be quickly and easily generated, with a structure that is still meaningful in the actual code where the documentation is being specified.

Enter Python's decorator capabilities.

A Python decorator is a callable (a function or method) that accepts a callable as an argument, and returns a replacement callable. It should not be confused with the design pattern of the same name, though it may well be usable to implement that pattern. In typical usage, a decorator is applied to a callable, wraps the original callable with other functionality, and returns the wrapper in its place. However, there is no reason that a decorator cannot return the same (original) callable after performing some operations, such as adding metadata to a common data-structure attached to the original callable.
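
As a minimal sketch of that idea: the decorator below attaches a dict to the decorated function and hands the same function back, with no wrapper involved. The names describe and _documentation are hypothetical, chosen only for this illustration:

```python
def describe(text):
    """Hypothetical decorator factory: stores text in a metadata dict on
    the decorated callable and returns that callable unchanged."""
    def decorator(target):
        # Create the metadata dict the first time the target is decorated
        metadata = getattr(target, '_documentation', None)
        if metadata is None:
            metadata = {}
            target._documentation = metadata
        metadata['description'] = text
        # Return the original callable, not a wrapper around it
        return target
    return decorator

@describe('Doubles its argument')
def double(x):
    return x * 2

print(double(3))
print(double._documentation)
```

Because nothing wraps the function, its name, signature and behavior are left exactly as they were; the only change is the extra attribute.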

That is the basis for the documentation-generation that I'm going to write. Each element that needs documentation capabilities will have a corresponding callable that can be used to provide the data to add to the decorated item's metadata, will perform that addition, and return the otherwise unmodified original item. In re-reading this, I'm not sure that it's easily understood, so let me provide an example of the sort of structure I'm thinking of, at some level of detail:

# An example of documentation decorators on a function
@argument( 'arg1', 'Description of arg1' )
@argument( 'arg2', 'Description of arg2' )
@argument( 'arg3', 'Description of arg3' )
@arglist( 'Description of arglist' )
@arglistitem( 0, "What's expected in the first item of the arglist" )
@arglistitem( 1, "What's expected in the second item of the arglist" )
@kwargs( 'Description of kwargs' )
@kwargsitem( 'name1', 'Description of "name1" keyword argument' )
@kwargsitem( 'name2', 'Description of "name2" keyword argument' )
@returns( 'Description of returned value(s) for one case' )
@returns( 'Description of returned value(s) for another case' )
@raises( NotImplementedError, 'If called' )
def MyFunction( arg1, arg2, arg3=None, *arglist, **kwargs ):
    """
Description of function (original docstring)"""
    # TODO: Generate actual implementation here...
    raise NotImplementedError( 'MyFunction is not yet implemented' )

In this structure, each @argument, @arglist, @arglistitem, @kwargs and @kwargsitem is a call to a decorator callable that attaches the names, descriptions, and other data about one of those arguments to a metadata-structure that will be generated and attached to the MyFunction function. The other @... calls perform similar operations for other documentation-information, for what the function returns, and what exceptions could be raised. By the time all of these decorators have executed, that metadata-structure would look something like this:

# The resulting metadata structure
{
    "arguments":{
        "arg1":{
                "name":"arg1",
                "description":"Description of arg1",
                "hasDefault":False,
                "defaultValue":None,
            },
        "arg2":{
                "name":"arg2",
                "description":"Description of arg2",
                "hasDefault":False,
                "defaultValue":None,
            },
        "arg3":{
                "name":"arg3",
                "description":"Description of arg3",
                "hasDefault":True,
                "defaultValue":None,
            },
    },
    "arglist":{
        "description":"Description of arglist",
        "items":[
            "What's expected in the first item of the arglist",
            "What's expected in the second item of the arglist",
            ],
    },
    "kwargs":{
        "description":"Description of kwargs",
        "items":{
            "name1":'Description of "name1" keyword argument',
            "name2":'Description of "name2" keyword argument',
            },
    },
    "returns":[
        "Description of returned value(s) for one case",
        "Description of returned value(s) for another case",
        ],
    "raises":{
        NotImplementedError: [
                "If called",
            ],
    },
    "originalDocstring":"Description of function (original docstring)",
    "sourceElement":MyFunction,
}
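
To show that a structure like this can be built up one decorator at a time, here is a rough sketch covering only @argument, @returns and @raises. It is an assumption-laden illustration, not the eventual implementation: the _metadata helper and the _documentation attribute name are hypothetical, and it relies on inspect.signature, which is Python 3 only (Python 2 would need inspect.getargspec instead).

```python
import inspect

def _metadata(target):
    """Hypothetical helper: returns the metadata dict attached to target,
    creating an empty one on first use."""
    if not hasattr(target, '_documentation'):
        target._documentation = {
            'arguments': {},
            'returns': [],
            'raises': {},
            'originalDocstring': (target.__doc__ or '').strip(),
            'sourceElement': target,
        }
    return target._documentation

def argument(name, description):
    def decorator(target):
        # Read the default (if any) from the function's own signature
        parameter = inspect.signature(target).parameters[name]
        has_default = parameter.default is not inspect.Parameter.empty
        _metadata(target)['arguments'][name] = {
            'name': name,
            'description': description,
            'hasDefault': has_default,
            'defaultValue': parameter.default if has_default else None,
        }
        return target
    return decorator

def returns(description):
    def decorator(target):
        # Decorators execute bottom-up, so insert at the front to keep
        # the descriptions in the order they appear in the source
        _metadata(target)['returns'].insert(0, description)
        return target
    return decorator

def raises(exception, description):
    def decorator(target):
        reasons = _metadata(target)['raises'].setdefault(exception, [])
        reasons.insert(0, description)
        return target
    return decorator

@argument('arg1', 'Description of arg1')
@argument('arg3', 'Description of arg3')
@returns('Description of returned value(s) for one case')
@returns('Description of returned value(s) for another case')
@raises(NotImplementedError, 'If called')
def MyFunction(arg1, arg2, arg3=None, *arglist, **kwargs):
    """
Description of function (original docstring)"""
    raise NotImplementedError('MyFunction is not yet implemented')

print(MyFunction._documentation['arguments']['arg3'])
```

Each decorator returns MyFunction itself, so the stack of decorators leaves the function untouched apart from the attached metadata.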

That structure could then be used quite easily to generate a __doc__-replacement value that maybe looks like this:


MyFunction( arg1, arg2[, arg3], *arglist, **kwargs )
Description of function (original docstring)

RETURNS:
 - Description of returned value(s) for one case
 - Description of returned value(s) for another case

ARGUMENTS:
 - arg1 ........... Description of arg1
 - arg2 ........... Description of arg2
 - arg3 ........... (Optional, defaults to None) Description of arg3
 - arglist ........ Description of arglist
   + What's expected in the first item of the arglist
   + What's expected in the second item of the arglist
 - kwargs ......... Description of kwargs. Specific keywords include:
   + name1 ........ Description of "name1" keyword argument
   + name2 ........ Description of "name2" keyword argument

RAISES:
 - NotImplementedError:
   + If called
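
A minimal sketch of how text along those lines might be produced from the metadata. It handles only the arguments, returns and raises portions, takes the signature line as a ready-made string, and assumes the dicts keep their insertion order (true of Python 3.7 and later). The function name render_docstring is hypothetical:

```python
def render_docstring(signature, metadata):
    """Builds a plain-text doc-string from a (partial) metadata dict."""
    lines = [signature, metadata['originalDocstring'], '', 'RETURNS:']
    lines.extend(' - ' + item for item in metadata['returns'])
    lines.extend(['', 'ARGUMENTS:'])
    for name, details in metadata['arguments'].items():
        prefix = ''
        if details['hasDefault']:
            prefix = '(Optional, defaults to %r) ' % (details['defaultValue'],)
        # Pad the name with dots so that the descriptions line up
        label = (name + ' ').ljust(16, '.')
        lines.append(' - %s %s%s' % (label, prefix, details['description']))
    lines.extend(['', 'RAISES:'])
    for exception, reasons in metadata['raises'].items():
        lines.append(' - %s:' % exception.__name__)
        lines.extend('   + ' + reason for reason in reasons)
    return '\n'.join(lines)

# A cut-down version of the metadata structure, for illustration
sample = {
    'arguments': {
        'arg1': {'name': 'arg1', 'description': 'Description of arg1',
                 'hasDefault': False, 'defaultValue': None},
        'arg3': {'name': 'arg3', 'description': 'Description of arg3',
                 'hasDefault': True, 'defaultValue': None},
    },
    'returns': ['Description of returned value(s) for one case'],
    'raises': {NotImplementedError: ['If called']},
    'originalDocstring': 'Description of function (original docstring)',
}

text = render_docstring(
    'MyFunction( arg1, arg2[, arg3], *arglist, **kwargs )', sample
)
print(text)
```

Since a function's __doc__ is writable, the generated text could then be assigned back to it, which would cover the interactive-session use case mentioned earlier.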

It would also be feasible to generate nice-looking HTML documentation, maybe along the lines of:

[ABCMeta] Ook
  Test-class.

  Class Attributes
    PublicAttribute
      Public attribute description
    _ProtectedAttribute
      Protected attribute description

  Properties
    PropertyName
      (int|float) Gets, sets or deletes the PropertyName property of the instance.

  Methods
    [function] Bleep(cls, arg1, arg2, *args, **kwargs)
      Ook.Bleep (classmethod) original doc-string
      Deprecated: Will be removed by version X.YY.ZZ
      Arguments
        cls
          (class, required): The class that the method will bind to for execution.
        arg1
          (int|long|float, required): Ook.Bleep (classmethod) arg1 description
        arg2
          (any, optional, defaults to None): Ook.Bleep (classmethod) arg2 description
        *args
          (any): Ook.Bleep (classmethod) arglist description

    [abstract function] Fnord(self, arg1, arg2, *args, **kwargs)
      Ook.Fnord (method) original doc-string
      Deprecated: Use new_Fnord instead.
      Returns: None (at least until the method is implemented)
      Fix Me:
        • Rewrite list-loops to perform the same operations in fewer passes
        • Magic _parameters value needs to be removed
      Arguments
        self
          (instance, required): The object-instance that the method will bind to for execution.
        arg1
          (bool|None, required): Ook.Fnord (method) arg1 description
        arg2
          (any, required): Ook.Fnord (method) arg2 description
        *args
          (int|long|float): Ook.Fnord (method) arglist description
          The following values are specified by position:
            argitem1
              (float): Ook.Fnord.args[0] description
            argitem2
              (int|long): Ook.Fnord.args[1] description
            argitem3
              (bool): Ook.Fnord.args[2] description
            values
              (str|unicode): Ook.Fnord.args[3] (values) description
        **kwargs
          Ook.Fnord keyword-arguments list description
            keyword1
              (int|long|float, required): Ook.Fnord (method) "keyword1" description
            keyword2
              (None|str|unicode, defaults to None): Ook.Fnord (method) "keyword2" description
            keyword3
              (None|str|unicode): Ook.Fnord (method) "keyword3" description
      Exceptions
        NotImplementedError
          if called
      To-Do:
        • Change output to class with the same interface
        • Clean up output to remove empty members

Creation of documentation-output in other formats (LaTeX, XML, etc.) should also be feasible with relatively little effort. The structure is also close to JSON-ready: apart from the exception classes used as keys under "raises" and the function reference in "sourceElement", which would need to be converted to names first, it's a dict of simple types that Python's json module should be able to dump easily. If the plain-text formatting or sequencing is not to someone's liking, changing it should also be quick and painless.
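
One wrinkle with the JSON idea: the structure shown earlier uses an exception class as a dict key and holds a reference to the function itself, and json.dumps will not serialize either of those directly. A minimal sketch of a conversion step, assuming that swapping both for their names is acceptable (the function name to_jsonable is hypothetical):

```python
import json

def to_jsonable(metadata):
    """Returns a shallow copy of metadata in which exception classes and the
    source element are replaced by their names."""
    result = dict(metadata)
    result['raises'] = {
        exception.__name__: reasons
        for exception, reasons in metadata.get('raises', {}).items()
    }
    if 'sourceElement' in result:
        result['sourceElement'] = result['sourceElement'].__name__
    return result

def MyFunction():
    pass

# A cut-down version of the metadata structure, for illustration
metadata = {
    'returns': ['Description of returned value(s) for one case'],
    'raises': {NotImplementedError: ['If called']},
    'originalDocstring': 'Description of function (original docstring)',
    'sourceElement': MyFunction,
}

as_json = json.dumps(to_jsonable(metadata), indent=4, sort_keys=True)
print(as_json)
```

The original metadata dict is left untouched, so the in-memory structure can keep its references to the real classes and functions.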

Which only leaves figuring out how to implement it.
