Creating good, consistent documentation is often challenging for a lot of developers, judging by the few I've worked with over the years and by the documentation I've seen in the wild. It's not surprising, I expect. Ideally, well-written and well-commented code is self-documenting, and in a lot of places documentation and testing are not development functions that get a high priority from a time-budgeting standpoint. Sometimes that's the case even when the people making those decisions are or were developers themselves, and would stand to benefit from the effort. Even where enough time is allocated, writing documentation is rarely as much fun for developers as writing code, so there's not a whole lot of impetus to do it right, even assuming that some agreement can be reached about what "right" even means when it comes to documentation.
What's Available to Work With
Python's native documentation structure is very simple and very free-form, which is, at best, a double-edged sword. The simplicity is nice, but it makes it very easy to simply not bother with documentation. The free-form nature, limited only by the constraint that a docstring is a string or unicode value, means that just about anything that can be expressed in text can be put in play. The flip side is that there is no way to ensure consistency or completeness of documentation: different projects, or even different developers on the same project, may well produce documentation that isn't consistent with anyone else's. And upkeep of documentation that has any real meat to it is tedious, at best.
class SomeClass( object ):
    """
    This is a "docstring." It can contain pretty much anything. What it should
    contain is *probably* a matter of opinion, but should almost certainly
    include:
      * A description of the class (what it represents);
      * Any applicable warnings (known issues, imminent deprecation, etc.)
    It *might* include a simple usage-example.
    """
    def __init__( self, arg1, arg2, arg3=None, *args, **kwargs ):
        """
        This is a method's docstring. Like the one associated with the class, it can
        contain pretty much anything. Like the class-level docstring, what it SHOULD
        contain is *probably* a matter of opinion. Odds are good that many would agree
        that it should contain:
          * A description of the method ("Creates an instance of SomeClass" in this
            case);
          * A description of each argument (arg1, arg2), with default values where
            applicable (arg3), including the expected/allowed types and
            value-constraints if there are any;
          * A description of the argument-list (*args) and any type- or
            value-expectations/constraints it might have;
          * A description of the keyword-arguments (**kwargs) allowed/expected and
            *their* type- or value-expectations/constraints, if any;
          * What the method returns, if anything; and
          * What exceptions might be raised, maybe with descriptions of *why* they'd be
            raised.
        """
        pass
What I Want To Accomplish
For my money, any process for generating code documentation should have the following characteristics:
- Documentation information should reside in the codebase itself;
- It should be as close to the code it's about as possible;
- The process for documenting anything should be as consistent as possible;
- It must be reasonably readable in the code;
- It cannot be dependent on external processes (though extensibility to leverage them is OK) and should be output-format-agnostic (but extensible to other formats);
- Most importantly: It's gotta be fairly simple, and not too time-consuming; developers would rather write code than documentation...
Something else that I think is important is the ability to access some or all of that documentation through the built-in __doc__ attribute of Python constructs. It may not be super-critical to everyone, but I find that I will at least occasionally launch a Python session, import something, and print its docstring:

>>> import re
>>> print re.compile.__doc__
Compile a regular expression pattern, returning a pattern object.
>>> print list.__doc__
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
>>> print list.append.__doc__
L.append(object) -- append object to end
>>>

When I have a brain-fart, that's usually faster than Googling something (and doesn't require network access for those rare occasions when I have no connection). So I'd like whatever documentation is generated to carry over to the __doc__ attribute as well.
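Carrying generated text over to __doc__ is feasible for functions, because a function object's __doc__ attribute can be reassigned at runtime. Here is a minimal sketch, written for Python 3 (the session above is Python 2); the function greet and the appended ARGUMENTS text are purely illustrative. Classes may be a different story on older Pythons (a new-style class's __doc__ was read-only in Python 2), so the sketch sticks to functions.

```python
import inspect


def greet(name):
    """Return a greeting for name."""
    return "Hello, " + name


# Function objects allow __doc__ to be reassigned after definition, so a
# generated docstring can extend (or replace) the hand-written one.
greet.__doc__ = greet.__doc__ + "\n\nARGUMENTS:\n - name: who to greet"

# help(greet) and inspect.getdoc(greet) both read the updated value.
print(inspect.getdoc(greet))
```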
With respect to several of these points, I have to give credit to Microsoft for their XML Documentation Comments implementation. A very simple example of it is:
/// <summary>
/// This class performs an important function.
/// </summary>
public class MyClass{}
This keeps the documentation in the codebase and right next to the code it's documenting. The only downside to taking this sort of approach in my code is that it would require parsing the actual source, plus some method of associating each piece of documentation text with the item being documented. That seems like an awful lot of work, even assuming that it's possible. At the same time, the fairly large collection of supported tags is nice, and is worth keeping in mind as I scope out my documentation process. As for the format itself, while an XML-based documentation structure sounds sexy, turning it into almost any useful output would introduce an external-process requirement, which is a show-stopper in my mind.
As I was writing this post, a couple of other nice-to-have items came to mind:
- Documentation should be concerned more with facts about the code (accepts arguments of this type, returns this or that value, etc.) than with prose, though there should be a reasonable allowance for more prose-like information.
- It should be possible to test, as part of a build process, whether some minimal level of documentation is provided.
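The build-process check in the last item is easy to prototype with the standard-library inspect module. What follows is a minimal sketch, written for Python 3, assuming that "minimal documentation" just means "has a non-empty docstring", which is a deliberately low bar. The helper undocumented_members is hypothetical, not part of any library.

```python
import inspect


def undocumented_members(module):
    """Return the names of public functions and classes defined in *module*
    that have no docstring (or only whitespace)."""
    missing = []
    for name, member in inspect.getmembers(module):
        if name.startswith("_"):
            continue  # skip private and dunder names
        if not (inspect.isfunction(member) or inspect.isclass(member)):
            continue  # only functions and classes are checked here
        if getattr(member, "__module__", None) != module.__name__:
            continue  # ignore names imported from other modules
        doc = member.__doc__  # read directly, so inherited docs don't count
        if not doc or not doc.strip():
            missing.append(name)
    return missing
```

A build script could call this for each module in a package and exit with a non-zero status whenever the returned list isn't empty.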
My Approach: Metadata via Decoration
When push comes to shove, documentation is a form of metadata. That is, it's data about other data, in this case about the functionality that's being documented. There's quite a range of programmatic elements that need documentation, including (but most definitely not limited to):
- Module- and package-files, containing:
  - Constants;
  - Exceptions;
  - Functions, which may have:
    - Any number of arguments (with or without default values);
    - A single argument-list;
    - A single keyword-arguments structure, which may or may not have specific key-name expectations;
    - Return-value(s), which might vary based on the internal process(es) undertaken; and
    - Exceptions that can be raised, and the circumstances that will raise them.
  - Class variations:
    - Concrete classes (whether "normal" or "nominally final");
    - Abstract classes;
    - Interfaces;
    - Properties; and
    - Instance, class- and static methods, which have the same potential documentation-needs as functions, above.
Rather than being written by hand as a consistently sequenced and formatted __doc__ value, all of the documentation metadata can simply be stored in association with the item being documented, ready to retrieve if and as needed for whatever purpose. One of those purposes, I think, should be the generation of a docstring (the __doc__ attribute of an element) that provides the detailed documentation for reading or printing, as noted earlier. Beyond that, with all of the documentation metadata available as a collected property of the code itself, it should be a relatively simple matter to read or extract it and generate whatever other format(s) are needed.
Under that assumption, the real challenge, then, is to come up with processes that allow documentation metadata to be quickly and easily generated, with a structure that is still meaningful in the actual code where the documentation is being specified.
Enter Python's decorator capabilities.
A Python decorator is a callable (usually a function) that accepts a callable as an argument and returns a callable to be used in its place. Since Python 2.6, decorators can be applied to classes as well as functions, which matters when it comes to documenting classes. A decorator should not be confused with the design pattern of the same name, though it may well be usable to implement that pattern. In typical usage, a decorator wraps the original callable with some other functionality and returns the wrapper. However, there is no reason that a decorator cannot return the same, original callable after performing some operations on it, such as adding metadata to a common data structure attached to that callable.
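Here is a minimal sketch of that "return the original" style of decorator, written for Python 3. The factory add_note and the _docmeta attribute name are hypothetical, invented for illustration; the point is only that the decorated function comes back unchanged, carrying a metadata dict.

```python
def add_note(text):
    """Hypothetical decorator factory: the decorator it returns records *text*
    on the decorated callable and hands back that same callable."""
    def decorator(func):
        # Create the metadata store the first time, then reuse it.
        if not hasattr(func, "_docmeta"):
            func._docmeta = {"notes": []}
        func._docmeta["notes"].append(text)
        return func  # the original object, not a wrapper
    return decorator


@add_note("listed first, applied second")
@add_note("listed second, applied first")
def example():
    """Original docstring, left untouched."""
    return 42


print(example._docmeta["notes"])
# ['listed second, applied first', 'listed first, applied second']
```

Because the function is returned as-is, its name, signature and docstring are untouched, and calling it costs nothing extra. The catch is ordering: stacked decorators are applied bottom-up, so an implementation that wants the collected metadata to follow source order needs to insert at the front of its lists rather than append.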
That is the basis for the documentation-generation that I'm going to write. Each element that needs documentation capabilities will have a corresponding callable that can be used to provide the data to add to the decorated item's metadata, will perform that addition, and return the otherwise unmodified original item. In re-reading this, I'm not sure that it's easily understood, so let me provide an example of the sort of structure I'm thinking of, at some level of detail:
# An example of documentation decorators on a function
@argument( 'arg1', 'Description of arg1' )
@argument( 'arg2', 'Description of arg2' )
@argument( 'arg3', 'Description of arg3' )
@arglist( 'Description of arglist' )
@arglistitem( 0, "What's expected in the first item of the arglist" )
@arglistitem( 1, "What's expected in the second item of the arglist" )
@kwargs( 'Description of kwargs' )
@kwargsitem( 'name1', 'Description of "name1" keyword argument' )
@kwargsitem( 'name2', 'Description of "name2" keyword argument' )
@returns( 'Description of returned value(s) for one case' )
@returns( 'Description of returned value(s) for another case' )
@raises( NotImplementedError, 'If called' )
def MyFunction( arg1, arg2, arg3=None, *arglist, **kwargs ):
    """Description of function (original docstring)"""
    # TODO: Generate actual implementation here...
    raise NotImplementedError( 'MyFunction is not yet implemented' )
In this structure, each of @argument, @arglist, @arglistitem, @kwargs and @kwargsitem is a call to a function that returns a decorator; that decorator attaches the names, descriptions, and other data about one of those arguments to a metadata structure that will be created and attached to the MyFunction function.
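As a rough sketch of one of those factories, here is a hypothetical argument written for Python 3. It assumes the metadata lives in a dict stored as a _docmeta attribute (a name invented for this sketch), and it uses inspect.signature to fill in hasDefault and defaultValue so the developer doesn't have to repeat them. Refusing unknown argument names is a design choice: it catches typos and stale documentation at import time, which also helps with the build-time documentation check mentioned earlier.

```python
import inspect


def argument(name, description):
    """Hypothetical @argument factory: returns a decorator that records one
    named argument's metadata on the decorated function."""
    def decorator(func):
        params = inspect.signature(func).parameters
        if name not in params:
            # Documentation for an argument that doesn't exist is an error
            raise ValueError("%s() has no argument named %r" % (func.__name__, name))
        default = params[name].default
        has_default = default is not inspect.Parameter.empty
        if not hasattr(func, "_docmeta"):
            func._docmeta = {}
        func._docmeta.setdefault("arguments", {})[name] = {
            "name": name,
            "description": description,
            "hasDefault": has_default,
            "defaultValue": default if has_default else None,
        }
        return func  # the unmodified original function
    return decorator


@argument("arg1", "Description of arg1")
@argument("arg3", "Description of arg3")
def MyFunction(arg1, arg2, arg3=None, *arglist, **kwargs):
    """Description of function (original docstring)"""
    raise NotImplementedError("MyFunction is not yet implemented")
```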
The other @... calls perform similar operations for the rest of the documentation information: what the function returns, and what exceptions could be raised. By the time all of these decorators have executed, that metadata structure would look something like this:
# The resulting metadata structure
{
    "arguments":{
        "arg1":{
            "name":"arg1",
            "description":"Description of arg1",
            "hasDefault":False,
            "defaultValue":None,
        },
        "arg2":{
            "name":"arg2",
            "description":"Description of arg2",
            "hasDefault":False,
            "defaultValue":None,
        },
        "arg3":{
            "name":"arg3",
            "description":"Description of arg3",
            "hasDefault":True,
            "defaultValue":None,
        },
    },
    "arglist":{
        "description":"Description of arglist",
        "items":[
            "What's expected in the first item of the arglist",
            "What's expected in the second item of the arglist",
        ],
    },
    "kwargs":{
        "description":"Description of kwargs",
        "items":{
            "name1":'Description of "name1" keyword argument',
            "name2":'Description of "name2" keyword argument',
        },
    },
    "returns":[
        "Description of returned value(s) for one case",
        "Description of returned value(s) for another case",
    ],
    "raises":{
        NotImplementedError:[
            "If called",
        ],
    },
    "originalDocstring":"Description of function (original docstring)",
    "sourceElement":MyFunction,
}
That structure could then be used quite easily to generate a __doc__-replacement value that maybe looks like this:
MyFunction( arg1, arg2[, arg3], *arglist, **kwargs )
Description of function (original docstring)

RETURNS:
 - Description of returned value(s) for one case
 - Description of returned value(s) for another case

ARGUMENTS:
 - arg1 ........... Description of arg1
 - arg2 ........... Description of arg2
 - arg3 ........... (Optional, defaults to None) Description of arg3
 - arglist ........ Description of arglist
   + What's expected in the first item of the arglist
   + What's expected in the second item of the arglist
 - kwargs ......... Description of kwargs. Specific keywords include:
   + name1 ........ Description of "name1" keyword argument
   + name2 ........ Description of "name2" keyword argument

RAISES:
 - NotImplementedError:
   + If called
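A partial renderer for that plain-text layout might look something like the following sketch, written for Python 3.7+ (it relies on dicts preserving insertion order). The name render_doc is hypothetical, the signature line is passed in rather than derived, and only the RETURNS, ARGUMENTS and RAISES sections are handled; arglist and kwargs would follow the same pattern.

```python
def render_doc(signature, meta):
    """Hypothetical renderer: builds plain text in the layout above from a
    metadata dict shaped like the structure shown earlier."""
    lines = [signature, meta.get("originalDocstring", "").strip(), ""]
    if meta.get("returns"):
        lines.append("RETURNS:")
        lines.extend(" - " + text for text in meta["returns"])
        lines.append("")
    if meta.get("arguments"):
        lines.append("ARGUMENTS:")
        for arg in meta["arguments"].values():
            # Pad each name with a dot leader so the descriptions line up
            leader = "." * max(3, 15 - len(arg["name"]))
            optional = ""
            if arg["hasDefault"]:
                optional = "(Optional, defaults to %r) " % (arg["defaultValue"],)
            lines.append(" - %s %s %s%s" % (arg["name"], leader, optional, arg["description"]))
        lines.append("")
    if meta.get("raises"):
        lines.append("RAISES:")
        for exc, reasons in meta["raises"].items():
            lines.append(" - %s:" % exc.__name__)
            lines.extend("   + " + reason for reason in reasons)
    return "\n".join(lines).rstrip()
```

Keeping the layout in one small function is what makes the "changing it should be quick and painless" claim plausible: reordering sections is a matter of moving the if-blocks.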
It would also be feasible to generate nice-looking HTML documentation from the same structure.
Creating documentation output in other formats (LaTeX, XML, etc.) should also be feasible with relatively little effort. The structure is also nearly JSON-ready: it's a dict of simple types, apart from the exception classes used as keys under "raises" and the function object stored in "sourceElement", which would need to be converted to names before Python's json module could dump it. If the plain-text formatting or sequencing is not to someone's liking, changing it should also be quick and painless.
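That conversion is small. Here is a sketch, written for Python 3 (__qualname__ doesn't exist in Python 2), with to_jsonable as a hypothetical helper name. It swaps exception classes for their names and the source element for a dotted module.qualname string, then hands the result to json.dumps; passing the original dict straight to json.dumps raises TypeError because of the class keys.

```python
import json


def to_jsonable(meta):
    """Return a copy of a documentation-metadata dict that json can dump.

    Assumes the shape shown earlier: exception classes are keys under
    "raises", and "sourceElement" holds the documented object itself.
    """
    result = dict(meta)  # shallow copy; the original stays untouched
    if "raises" in result:
        # json only accepts str, int, float, bool or None as keys
        result["raises"] = {exc.__name__: list(reasons)
                            for exc, reasons in result["raises"].items()}
    if "sourceElement" in result:
        # Functions and classes aren't serializable; store a dotted name
        element = result["sourceElement"]
        result["sourceElement"] = "%s.%s" % (element.__module__, element.__qualname__)
    return result


def sample():
    """Stand-in for a documented function."""


meta = {
    "returns": ["Description of returned value(s) for one case"],
    "raises": {NotImplementedError: ["If called"]},
    "originalDocstring": "Description of function (original docstring)",
    "sourceElement": sample,
}
print(json.dumps(to_jsonable(meta), indent=2))
```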
That only leaves figuring out how to implement it.