Thursday, March 16, 2017

Unit-testing and Code-coverage in Python [1]

I suspect that unit-testing, like documentation, is not thought of terribly fondly by many developers. It strikes me as one of those chores associated with the process: something that gets done not because it's fun, or sexy, or whatever, but because there's some policy in place that requires it. At best, perhaps, there is some recognition that it's a necessary evil, or something akin to an insurance policy, done so that if a question arises, the developer can point to the unit-tests and show that the code passes its automated testing. Or maybe those suites of tests can point to exactly where an issue is arising, making the process of identifying and fixing a bug faster (and letting the developer get back to solving more interesting challenges that much sooner).

In my particular case, it falls squarely into the necessary evil category. I've definitely seen the benefits of having well-tested code:

  • Over a total of thirteen project-years, across three projects and with four application-instances, the total number of code errors encountered by up to three or four hundred application-users could be counted on the fingers of two hands;
  • The run of the relevant test-suites, per build, takes maybe ten minutes total (some of which is down to external-system access that simply takes as long as it takes for several of the tests). That ten-minute period runs perhaps two or three thousand test-cases, and I'm not even sure how many assertions are involved, but it's almost certainly in the tens of thousands.
  • I've never counted how many issues have been surfaced by unit-test runs, but I can distinctly remember more than a few that would eventually have caused issues for end-users of the applications, and that I was able to fix before the problematic code was released.

I'd be lying if I said that I enjoyed writing unit-tests, but I'm absolutely convinced that they save time, effort, and frustration in the long run, particularly if the code being tested is complex, and/or doesn't get modified on a frequent basis once it's done.

What I Want To Accomplish

The laundry-list of the goals I want to achieve in my common unit-testing structure(s) is a short list, but perhaps a deep one:

  • I want to spend as little time as is necessary writing actual tests;
  • At the same time, I want to be able to assure that those tests are complete and thorough — that they cover all the variations I think are necessary for testing arguments, values, types, etc., and that every element in my code is tested;
  • I do not want to forbid skipping tests, either. While it doesn't happen often in my experience, it does happen that there is no meaningful or useful testing that can be done on a given code-element. Though such a skip should not just pass without comment, the ability to consciously skip a test should still be available (see the small sketch after the next paragraph).

The combination of the first two items might sound pretty daunting; at first blush, they feel like they could easily be mutually exclusive goals. The strategy I'm going to follow, though, will allow what I feel is a good balance between assuring 100% code-coverage (...that every element in my code is tested...) and not having to spend a ridiculous length of time writing those tests. Identifying the overall strategy took me a while, and I had to take a few steps back and look at what I was trying to accomplish at a more abstract level...
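As a side-note to that third item: Python's unittest module already provides a mechanism for exactly this kind of conscious, documented skip. A minimal sketch, with an entirely hypothetical test-case and skip-reason:

#!/usr/bin/env python

import unittest

class testSomeExternalDependency( unittest.TestCase ):
    """Hypothetical test-case for a code-element with nothing useful to test."""

    @unittest.skip( 'No meaningful local test is possible; verified manually' )
    def testConnect( self ):
        # Never executes; the skip (and its reason) is reported in the 
        # test-run results rather than passing silently
        pass

if __name__ == '__main__':
    unittest.main()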

How I'm Going to Accomplish This

The two questions that ended up defining my unit-testing strategy, after a fair time thinking about it, were:

How can I identify which code-elements need to be tested?
and
What needs to happen in order to test those elements to my satisfaction?

The first question requires making some decisions about what should be tested, then figuring out how to identify those code-elements in a programmatic fashion. Since I'm aiming for 100% code-coverage, at least as far as any classes or functions are concerned, the decision part is pretty simple:

  • All class-definitions; and
  • All function-definitions

Actually identifying the members to be tested then becomes a mostly straightforward chunk of code:
#!/usr/bin/env python

import inspect
import os
import sys

from pprint import pprint

sys.path.insert( 1, 
    os.path.expanduser( 'path-to-idic' )
)

import idic.serialization as TestModule
from idic.serialization import *

# Get all the classes available in the module
moduleMembers = inspect.getmembers( 
    TestModule, inspect.isclass )
# Get all the functions available in the module
moduleMembers += inspect.getmembers( 
    TestModule, inspect.isfunction )

pprint( moduleMembers )

# Create a dict of them, because that's slightly 
#   easier to work with, filtering down to local 
#   members only in the process
finalItems = dict(
    [
        m for m in moduleMembers
        if m[ 1 ].__module__ == TestModule.__name__
    ]
)

pprint( finalItems )
This yields two chunks of output, one for each pprint:
[
  ('HasSerializationDict', 
    <class 'idic.serialization.HasSerializationDict'>),
  ('IsJSONSerializable', 
    <class 'idic.serialization.IsJSONSerializable'>),
  ('UnsanitizedJSONWarning',
    <class 'idic.serialization.UnsanitizedJSONWarning'>),
  ('describe', 
    <class 'idic.doc_metadata.describe'>)
]
This first chunk shows all of the members that exist in the module, even those that aren't defined within it. That's why the describe class, from idic.doc_metadata, shows up at the end of the list: it's been imported by serialization, so it exists in the module as a member. Identifying the members that are local to the module being inspected requires the additional step of comparing something in each member to the TestModule that I'm concerned with, and removing from consideration any items that aren't local. I'm accomplishing that with the filter criteria in the dict creation:
if m[ 1 ].__module__ == TestModule.__name__
After that filtering has been done, the resulting dict no longer contains the describe class:
{
    'HasSerializationDict': 
        <class 'idic.serialization.HasSerializationDict'>,
    'IsJSONSerializable': 
        <class 'idic.serialization.IsJSONSerializable'>,
    'UnsanitizedJSONWarning': 
        <class 'idic.serialization.UnsanitizedJSONWarning'>
}
I'm not fully confident that the comparison of m[ 1 ].__module__ and TestModule.__name__ will always be bulletproof, though I cannot think of a case where the expected match-up would fail. At the same time, since the __module__ property is a somewhat magic string-value rather than a reference to the actual module, I'm... uneasy about that match-up. I'll trust it for now, I think, but I may well circle back to this concern later if a potentially-better way to deal with the comparison occurs to me.
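For whatever it's worth, one possibly-sturdier alternative (untested here) would be inspect.getmodule, which resolves an object back to an actual module reference instead of relying on the __module__ string:

# A possible alternative filter, comparing actual module-objects 
# rather than name-strings:
finalItems = dict(
    [
        m for m in moduleMembers
        if inspect.getmodule( m[ 1 ] ) is TestModule
    ]
)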

Still, that's a major step toward identifying required test-cases. If each class and function in a module should have a corresponding unittest.TestCase, the three items that come back in that dict could be used as a basis for assuming the names of those test-cases:

Relationship Between Module Member-names and Test-case Names

    Member-name                 Required Test-case Name
    ----------------------      --------------------------
    HasSerializationDict        testHasSerializationDict
    IsJSONSerializable          testIsJSONSerializable
    UnsanitizedJSONWarning      testUnsanitizedJSONWarning
So, the first code-coverage test can be performed by checking the unit-testing code's members, and asserting that there is a test-case class matching each required test-case name as calculated from the names of local members in the module being tested.
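A minimal sketch of what that first check might look like, assuming the finalItems dict from the scan above is available, and using a hypothetical test_serialization module as the unit-testing code being checked:

import unittest

# Assume "finalItems" was built by the inspect-based scan above, and that 
# "test_serialization" is a hypothetical module containing the unit-tests 
# for idic.serialization
import test_serialization

class testCodeCoverage( unittest.TestCase ):
    def testRequiredTestCasesExist( self ):
        for memberName in finalItems:
            requiredName = 'test%s' % memberName
            testCase = getattr( test_serialization, requiredName, None )
            # The required test-case class has to exist...
            self.assertTrue( testCase is not None,
                'No test-case class named %s was found' % requiredName )
            # ...and it has to actually be a unittest.TestCase subclass
            self.assertTrue(
                isinstance( testCase, type )
                and issubclass( testCase, unittest.TestCase ),
                '%s is not a unittest.TestCase' % requiredName )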

That feels like a pretty solid first step to me.

The next logical step would seem to be to perform the same sort of inspect-based analysis of each of the classes. The inspect module provides ways to identify members as methods or properties of a class. For purposes of this post, I'm going to spin off a copy of the original serialization module, adding the minimal JSONSerializableStub class described in my last post to it, in order to assure that I have an interface, an abstract class, and a concrete class. The stub-class has a couple of fields/properties added:

class JSONSerializableStub( IsJSONSerializable ):

    _jsonPublicFields = ( 'FieldName1', 'FieldName2' )

    FieldName1 = property()
    FieldName2 = property()

    def __init__( self ):
        IsJSONSerializable.__init__( self )
        # TODO: Whatever other initialization needs to happen

    def GetSerializationDict( self, sanitize=False ):
        # TODO: Actually implement code that reads the instance state 
        # and drops it into the "result" dict
        result = {}
        if sanitize:
            return self.SanitizeDict( result )
        return result

    @classmethod
    def FromDict( cls, data={}, **properties ):
        # Copy the incoming data so that neither the caller's dict nor the 
        # shared mutable default argument gets modified, then merge the 
        # keyword properties into that copy
        data = dict( data )
        data.update( properties )
        # Create the instance to be returned
        result = cls()
        # TODO: Populate the instance with state-data from "data"
        # Return the instance
        return result

JSONSerializableStub.RegisterLoadable()

With the original serialization-module classes, plus that stub-class, an optimal member-based unit-testing requirement-scan function would need to be able to determine that these test-cases and their member test-methods exist:

  • testHasSerializationDict
    • testFromDict
    • testGetSerializationDict
    • test__init__
  • testIsJSONSerializable
    • testFromJSON
    • testPythonNamespace
    • testRegisterLoadable
    • testSanitizeDict
    • testSanitizedJSON
    • test_GetPythonNamespace
    • test_GetSanitizedJSON
    • testwrapjsondump (testing that the decoration works)
    • testwrapjsondumps (testing that the decoration works)
    • testwrapjsonload (testing that the decoration works)
    • testwrapjsonloads (testing that the decoration works)
    • test__init__
  • testJSONSerializableStub
    • testFieldName1
    • testFieldName2
    • testFromDict
    • testGetSerializationDict
    • json.dump (testing that the decorated json.* functions work when applied to an instance)
    • json.dumps (testing that the decorated json.* functions work when applied to an instance)
    • json.load (testing that the decorated json.* functions work when applied to an instance)
    • json.loads (testing that the decorated json.* functions work when applied to an instance)
    • test__init__
UnsanitizedJSONWarning, since it doesn't have any local properties or methods, shouldn't have any test-methods, which makes the presence of a test-case class kind of pointless. I'm not sure yet how I want to handle that kind of circumstance, so I'll ponder that while I work out the rest.

The various abstract methods (instance- and class-level alike) can be usefully tested in their corresponding test-case classes, to assure that they raise the expected errors (NotImplementedError for the nominally-abstract classmethods). The TypeError that would be raised by a failed implementation of a required abstract instance-method likely falls into the category of trusting the language; if that fails, there are other, larger issues with the underlying language...
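A quick sketch of what one of those tests might look like, using placeholder names rather than the actual serialization-module API:

import unittest

class AbstractExample( object ):
    """Hypothetical class with a nominally-abstract classmethod."""
    @classmethod
    def FromSomething( cls, data ):
        # Nominally abstract: concrete classes are expected to override this
        raise NotImplementedError(
            '%s.FromSomething has not been implemented' % cls.__name__ )

class testAbstractExample( unittest.TestCase ):
    def testFromSomething( self ):
        # Assert that the nominally-abstract classmethod raises the 
        # expected error until/unless it's overridden
        self.assertRaises(
            NotImplementedError, AbstractExample.FromSomething, {} )

if __name__ == '__main__':
    unittest.main()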

Going back to the idea above of not having to spend any more time than is necessary in writing tests, my original thought was to use inspection to detect all of the test-methods in a test-case, and fail the coverage-test if any were missing that were expected, right? If I cannot identify which members of the original class need to be tested, and which are inherited from some other class in the class-tree, I end up having to write tests for every member of every class being tested. Now, to be fair, a lot of those might well end up being simple tests to see if, for example, the FromDict of IsJSONSerializable equals the FromDict of HasSerializationDict. I've actually worked with that sort of arrangement in the past, and while it's somewhat tedious, it's not horribly time-consuming or too difficult.
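A sketch of that sort of inherited-member test, assuming (as the scan-results later in this post suggest) that IsJSONSerializable inherits FromDict unchanged from HasSerializationDict:

import unittest

from idic.serialization import HasSerializationDict, IsJSONSerializable

class testIsJSONSerializable( unittest.TestCase ):
    def testFromDict( self ):
        # Bound (class)methods acquired from different classes won't 
        # compare equal directly, so compare the underlying 
        # function-objects instead
        self.assertEqual(
            IsJSONSerializable.FromDict.__func__,
            HasSerializationDict.FromDict.__func__ )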

It's just not as efficient as I'd like, since even in the best case scenario, it requires copying and pasting a lot of test-methods that just check inherited members. In testing the serialization code, it wouldn't be too bad — just test-methods for FromDict and GetSerializationDict methods. I can easily imagine, by the time I start getting into other parts of the framework where there's a lot more inheritance of base-class functionality across a lot of individual classes, that it could get really unwieldy. I'm specifically thinking of the various HTML form-elements, all of which will inherit from a Tag class, which will, in turn, inherit from a BaseNode class. Most of the interaction-oriented markup elements will also likely inherit from other mix-in classes to provide consistent interfaces for things like server-side input-validation hooks, etc., and all of those combined could very well blow the number of individual test-cases out a lot if I cannot come up with a way to identify local vs. inherited members consistently.

And that's even before any consideration about whether to test protected members... Or whether there would be any advantage to subclassing the various HTML-5 field-types from a different base class than the generic HTML field-types (in order to have a hook for automatically attaching client-side script-functionality to them to deal with custom UI processes, etc.).

Acquiring the members of any given class is not much more difficult than acquiring the class- and function-members of the module they reside in. The inspect.getmembers method can be used, passing the class and an optional predicate function to filter retrieved members down to just properties (data-descriptors), methods, or static methods (functions). This, however, is where things start to get tricky.

Filtering and Analyzing Members Based on MRO

Python classes, when they are defined, store their super-classes in the class' Method Resolution Order (MRO): the linearized sequence of the class itself and all of its super-classes, in the order that attribute-lookup will search them. That sequence can be accessed as a property of the class (__mro__, without even requiring any inspect-module function-calls), and each member of the immutable sequence is a reference to the super-class itself, with all that class' members. It's not terribly difficult to build out an iterative process that looks at all of the super-classes for a given class, and determines where a given class-member originates:

for item in sorted( finalItems.values(), 
    key=lambda mn: len( mn.__mro__ ) ):
    # The MRO of the item, minus itself
    itemMRO = list( item.__mro__[ 1: ] )
    # Build out a reversed sequence of super-classes to look in for members
    mroCheck = list( item.__mro__ )
    mroCheck.reverse()
    # Get all the item's properties
    properties = inspect.getmembers( item, inspect.isdatadescriptor )
    # Get all the item's methods
    methods = inspect.getmembers( item, inspect.ismethod )
    # Get all the item's functions (static methods fall in here)
    functions = inspect.getmembers( item, inspect.isfunction )
    # Create and populate data-structures that keep track of where members 
    # originate from, and what their implementation looks like. Initially 
    # populated with None values:
    memberSources = {}
    memberImplementations = {}
    for name, value in properties:
        memberSources[ name ] = None
        memberImplementations[ name ] = None
    for name, value in methods:
        memberSources[ name ] = None
        memberImplementations[ name ] = None
    for name, value in functions:
        memberSources[ name ] = None
        memberImplementations[ name ] = None
    print '='*80
    print '%s (%s)' % (item.__name__,  ', '.join( [ m.__name__ for m in itemMRO ] ) )
    print '='*80
    for source in mroCheck:
        for memberName in memberSources:
            # Look for the member in this specific class' own namespace, 
            # not the full inherited namespace that getmembers sees
            implementation = source.__dict__.get( memberName )
            if implementation:
                if memberImplementations[ memberName ] != implementation:
                    memberImplementations[ memberName ] = implementation
                    memberSources[ memberName ] = source
    localElements = sorted( [ key for key in memberSources if memberSources[ key ] == item ] )
    for localElement in localElements:
        print localElement

The items that get captured in localElements are pretty close to the optimal lists above:

================================================================================
HasSerializationDict (object)
================================================================================
FromDict
GetSerializationDict
__init__
__weakref__
================================================================================
IsJSONSerializable (HasSerializationDict, object)
================================================================================
FromJSON
PythonNamespace
RegisterLoadable
SanitizeDict
SanitizedJSON
_GetPythonNamespace
_GetSanitizedJSON
__init__
wrapjsondump
wrapjsondumps
wrapjsonload
wrapjsonloads
================================================================================
JSONSerializableStub (IsJSONSerializable, HasSerializationDict, object)
================================================================================
FieldName1
FieldName2
FromDict
GetSerializationDict
__init__
================================================================================
UnsanitizedJSONWarning (Warning, Exception, BaseException, object)
================================================================================
__weakref__
By comparison:
  • testHasSerializationDict
    • testFromDict
    • testGetSerializationDict
    • test__init__
  • testIsJSONSerializable
    • testFromJSON
    • testPythonNamespace
    • testRegisterLoadable
    • testSanitizeDict
    • testSanitizedJSON
    • test_GetPythonNamespace
    • test_GetSanitizedJSON
    • testwrapjsondump
    • testwrapjsondumps
    • testwrapjsonload
    • testwrapjsonloads
    • test__init__
  • testJSONSerializableStub
    • testFieldName1
    • testFieldName2
    • testFromDict
    • testGetSerializationDict
    • json.dump
    • json.dumps
    • json.load
    • json.loads
    • test__init__
The various json.* functions don't show up because they aren't actual members of the JSONSerializableStub class. Requiring tests against those functions would need some other programmatic approach, or would just require discipline to assure that they were tested for each IsJSONSerializable-derived class.

The one oddball item that shows up is the __weakref__ property in the members of HasSerializationDict and UnsanitizedJSONWarning. A full discussion of __weakref__ is well outside the scope of this post, and irrelevant from the perspective of unit-testing — it's a built-in property that shows up under various circumstances, and doesn't need to be unit-tested unless it's being overridden as a local class member.
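One simple, provisional way to handle that kind of built-in noise would be an explicit ignore-list, applied when deciding which local members actually require tests (the list's contents here are an assumption, not a definitive set):

# Built-in members that can show up as "local" but don't represent 
# testable implementation (assumed/provisional list)
IGNORED_MEMBERS = ( '__weakref__', '__dict__' )

testableElements = [
    name for name in localElements
    if name not in IGNORED_MEMBERS
]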

Barring the handling of classes like UnsanitizedJSONWarning that don't have any implementation to be tested, and the occasional odd built-in member (the __weakref__ property), that, I think, gives me enough of the analysis I need to assure the degree of code-coverage I want to enforce. The next step, then, would be implementing a common code-coverage test-case that can be easily imported into a unit-test module, and that will actually perform these checks and generate assertions for the presence of test-cases and -methods. That's likely going to be fairly long and detailed, to the point where it would make this post longer than I'd like, so I'll stop here for today and pick up again in my next post.
