Thursday, March 9, 2017

Serialization: JSON (for now) [2]

With today's post, I plan to wrap up the serialization module's members. I'd originally planned to also show a simple example of how to implement the serialization functionality they provide. The post grew long enough without it, though, and since I'd already planned on showing a deeper example, it makes more sense to roll the basic implementation guidelines into that deeper look.

The Other Serialization Class

There's just one more class in the serialization module that needs to be shown, but it's the one that will probably be used most often: IsJSONSerializable. IsJSONSerializable is the abstract class that concrete classes will inherit from in order to get the SanitizedJSON property it provides, and so that they can be registered for automatic handling by the json module's dump* and load* functions. The class-diagram for IsJSONSerializable looks like this:

It's also important to remember that IsJSONSerializable derives from (kind of implements) HasSerializationDict. It doesn't actually implement any of the functionality required by HasSerializationDict, but it does pass that requirement on to the concrete classes derived from it. HasSerializationDict is a pretty small interface:
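
In code terms, a rough sketch of that interface might look like the following. It is only an approximation inferred from how the two members are used in this post (GetSerializationDict on instances, FromDict on the class). The sanitize argument name and the Point class are placeholders of mine, not part of the module, and the sketch uses Python 3's abc.ABC rather than the Python 2 __metaclass__ form used in the module itself.

```python
import abc


class HasSerializationDict(abc.ABC):
    """Approximation of the interface: two members that derived classes must supply."""

    @abc.abstractmethod
    def GetSerializationDict(self, sanitize=False):
        """Return a dict of the instance's state-data ('sanitize' is an assumed name)."""

    @classmethod
    @abc.abstractmethod
    def FromDict(cls, data):
        """Create and return an instance from a dict of state-data."""


class Point(HasSerializationDict):
    # Hypothetical concrete class, for illustration only
    def __init__(self, x, y):
        self.x, self.y = x, y

    def GetSerializationDict(self, sanitize=False):
        return {'x': self.x, 'y': self.y}

    @classmethod
    def FromDict(cls, data):
        return cls(data['x'], data['y'])


# Round trip: instance -> dict -> new instance
copy = Point.FromDict(Point(1, 2).GetSerializationDict())
```

Because both members are abstract, trying to instantiate HasSerializationDict directly (or a subclass that skips either member) raises a TypeError.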

The IsJSONSerializable Abstract Class

There isn't a lot about IsJSONSerializable that I haven't already discussed, and none of it that I think is relevant before I show the actual code, so I'll just do that:

@describe.InitClass()
class IsJSONSerializable( HasSerializationDict, object ):
    """
Provides baseline functionality, interface requirements and type-identity for 
objects whose state-data can be serialized to and unserialized from JSON."""

    #-----------------------------------#
    # Abstraction through abc.ABCMeta   #
    #-----------------------------------#
    __metaclass__ = abc.ABCMeta

    #-----------------------------------#
    # Class attributes (and instance-   #
    # attribute default values)         #
    #-----------------------------------#

    _decoratedJSON = {}
    _jsonPublicFields = ()
    _registeredLoadables = {}

    #-----------------------------------#
    # Instance property-getter methods  #
    #-----------------------------------#

    @describe.AttachDocumentation()
    def _GetPythonNamespace( self ):
        """
Gets the Python namespace of the class that the instance is an instance of."""
        return '%s.%s' % ( self.__class__.__module__, 
            self.__class__.__name__ )

    @describe.AttachDocumentation()
    def _GetSanitizedJSON( self ):
        """
Gets a minimized JSON representation of the instance containing only those 
fields that are deemed safe for transmission over a network"""
        if not self.__class__._jsonPublicFields:
            raise AttributeError( '%s.SanitizedJSON cannot be retrieved, '
                'because %s has not specified any public JSON fields (fields '
                'allowed to be represented in a JSON dump). Correct this by '
                'populating the _jsonPublicFields class-attribute of %s' % ( 
                    self.__class__.__name__, self.__class__.__name__, 
                    self.__class__.__name__
                )
            )
        return json.dumps( self.GetSerializationDict( True ) )

    #-----------------------------------#
    # Instance property-setter methods  #
    #-----------------------------------#

    #-----------------------------------#
    # Instance property-deleter methods #
    #-----------------------------------#

    #-----------------------------------#
    # Instance Properties (abstract OR  #
    # concrete!)                        #
    #-----------------------------------#

    PythonNamespace = describe.makeProperty( 
        _GetPythonNamespace, None, None, 
        'the fully-qualified Python namespace of the instance\'s class',
        str, unicode
    )
    SanitizedJSON = describe.makeProperty( 
        _GetSanitizedJSON, None, None, 
        'a minimized JSON representation of the instance containing only '
        'those fields that are deemed safe for transmission over a network',
        unicode, str
    )

    #-----------------------------------#
    # Instance Initializer              #
    #-----------------------------------#
    @describe.AttachDocumentation()
    @describe.todo( 'Document __init__' )
    @describe.todo( 'Implement __init__' )
    def __init__( self ):
        """
Instance initializer"""
        # IsJSONSerializable is intended to be an abstract class,
        # and is NOT intended to be instantiated. Alter at your own risk!
        if self.__class__ == IsJSONSerializable:
            raise NotImplementedError( 'IsJSONSerializable is '
                'intended to be an abstract class, NOT to be instantiated.' )
        # Call parent initializers, if applicable.
        HasSerializationDict.__init__( self )
        # Set default instance property-values with _Del... methods as needed.
        # Set instance property values from arguments if applicable.
        # Other set-up

    #-----------------------------------#
    # Instance Garbage Collection       #
    #-----------------------------------#

    #-----------------------------------#
    # Instance Methods                  #
    #-----------------------------------#

    @describe.AttachDocumentation()
    @describe.argument( 'dictIn', 
        'the dictionary to sanitize',
        dict
    )
    @describe.keywordargs( 'additional values to add to dictIn' )
    def SanitizeDict( self, dictIn=None, **dictItems ):
        """
Returns a copy of the supplied dict, with any fields that aren't members of 
the class' _jsonPublicFields attribute removed."""
        if dictIn is not None and not isinstance( dictIn, dict ):
            raise TypeError( '%s.SanitizeDict expects a dictionary of '
                'values to sanitize, or None for its dictIn argument, but '
                'was passed "%s" (%s)' % ( self.__class__.__name__, 
                    dictIn, type( dictIn ).__name__ ) )
        # Work on a copy so that neither the caller's dict nor a shared 
        # default argument is ever modified
        merged = dict( dictIn or {} )
        merged.update( dictItems )
        return dict(
            [
                ( key, merged[ key ] ) for key in merged 
                if key in self.__class__._jsonPublicFields
            ]
        )

    #-----------------------------------#
    # Class Methods                     #
    #-----------------------------------#

    @classmethod
    @describe.AttachDocumentation()
    @describe.argument( 'jsonData', 
        'the JSON representation of the object to create and return',
        str, unicode
    )
    @describe.raises( TypeError, 'if passed a jsonData argument that is not a '
        'str or unicode value' )
    def FromJSON( cls, jsonData ):
        """
Creates and returns an instance of the class whose state-data is populated with 
the values from the supplied JSON serialization-data"""
        if type( jsonData ) not in ( str, unicode ):
            raise TypeError( '%s.FromJSON expects a str or unicode JSON '
                'construct, but was passed "%s" (%s)' % ( 
                cls.__name__, jsonData, type( jsonData ).__name__ )
            )
        return cls.FromDict( json.loads( jsonData ) )

    @classmethod
    def RegisterLoadable( cls ):
        """
Registers the class as a JSON-serializable type with IsJSONSerializable."""
        namespace = '%s.%s' % ( cls.__module__, cls.__name__ )
        if IsJSONSerializable._registeredLoadables.get( namespace ):
            raise RuntimeError( 'The %s class cannot be registered as JSON-'
                'loadable using the %s namespace: %s has already been '
                'registered as a JSON-loadable there' % ( cls.__name__, 
                namespace, 
                IsJSONSerializable._registeredLoadables[ namespace ].__name__ )
            )
        IsJSONSerializable._registeredLoadables[ namespace ] = cls

    #-----------------------------------#
    # Static Class Methods              #
    #-----------------------------------#

    @staticmethod
    @describe.argument( 'origfunc', 
        '[json.dump] The original function to wrap IsJSONSerializable checking'
        ' around'
    )
    @describe.raises( RuntimeError, 'if not passed json.dump' )
    @describe.returns( 'The replacement function for json.dump' )
    def wrapjsondump( origfunc ):
        """
Wraps checking for IsJSONSerializable-derived classes around the standard 
json.dump function. Note that this dumps *ALL* fields, so output is *NOT* 
sanitized for over-the-wire transit!"""
        if IsJSONSerializable._decoratedJSON.get( origfunc ):
            return IsJSONSerializable._decoratedJSON[ origfunc ]
        if origfunc != json.dump:
            raise RuntimeError( 'wrapjsondump expects json.dump as the '
                'function to decorate, but was passed %s' % ( origfunc ) )
        def _dump( obj, fp, skipkeys=False, ensure_ascii=True, 
            check_circular=True, allow_nan=True, cls=None, indent=None, 
            separators=None, encoding='utf-8', default=None, sort_keys=False, 
            **kw ):
            if isinstance( obj, IsJSONSerializable ):
                objNS = obj.PythonNamespace
                obj = obj.GetSerializationDict()
                obj[ '__namespace' ] = objNS
            return origfunc( obj, fp, skipkeys, ensure_ascii, check_circular, 
                allow_nan, cls, indent, separators, encoding, default, 
                sort_keys, **kw )
        IsJSONSerializable._decoratedJSON[ origfunc ] = _dump
        return _dump

    @staticmethod
    @describe.argument( 'origfunc', 
        '[json.dumps] The original function to wrap IsJSONSerializable checking'
        ' around'
    )
    @describe.raises( RuntimeError, 'if not passed json.dumps' )
    @describe.returns( 'The replacement function for json.dumps' )
    def wrapjsondumps( origfunc ):
        """
Wraps checking for IsJSONSerializable-derived classes around the standard 
json.dumps function. Note that this dumps *ALL* fields, so output is *NOT* 
sanitized for over-the-wire transit!"""
        if IsJSONSerializable._decoratedJSON.get( origfunc ):
            return IsJSONSerializable._decoratedJSON[ origfunc ]
        if origfunc != json.dumps:
            raise RuntimeError( 'wrapjsondumps expects json.dumps as the '
                'function to decorate, but was passed %s' % ( origfunc ) )
        def _dumps( obj, skipkeys=False, ensure_ascii=True, 
            check_circular=True, allow_nan=True, cls=None, indent=None, 
            separators=None, encoding='utf-8', default=None, sort_keys=False, 
            **kw ):
            if isinstance( obj, IsJSONSerializable ):
                warnings.warn( '%s is an instance derived from '
                    'IsJSONSerializable, and has "sanitized" JSON available '
                    'in its SanitizedJSON property.' % 
                        ( obj.__class__.__name__ ), 
                        UnsanitizedJSONWarning, stacklevel=2
                    )
                objNS = obj.PythonNamespace
                obj = obj.GetSerializationDict()
                obj[ '__namespace' ] = objNS
            return origfunc( obj, skipkeys, ensure_ascii, check_circular, 
                allow_nan, cls, indent, separators, encoding, default, 
                sort_keys, **kw )
        IsJSONSerializable._decoratedJSON[ origfunc ] = _dumps
        return _dumps

    @staticmethod
    @describe.argument( 'origfunc', 
        '[json.load] The original function to wrap IsJSONSerializable checking'
        ' around'
    )
    @describe.raises( RuntimeError, 'if not passed json.load' )
    @describe.returns( 'The replacement function for json.load' )
    def wrapjsonload( origfunc ):
        """
Replaces json.load with a function that hands processing off to a subclass of 
IsJSONSerializable for unserialization of the JSON data into an instance of the 
class when applicable"""
        if origfunc != json.load:
            raise RuntimeError( 'wrapjsonload expects json.load as the '
                'function to decorate, but was passed %s' % ( origfunc ) )
        if IsJSONSerializable._decoratedJSON.get( origfunc ):
            return IsJSONSerializable._decoratedJSON[ origfunc ]
        def _load( *args, **kw ):
            baseDict = origfunc( *args, **kw )
            try:
                objNS = baseDict.get( '__namespace' )
            except AttributeError:
                objNS = None
            if objNS:
                if type( baseDict ) != dict:
                    raise ValueError( 'Decorated override of json.load '
                        'expected a dict value to convert to an instance of '
                        'IsJSONSerializable, but the supplied JSON evaluated '
                        'to "%s" (%s)' % ( 
                            baseDict, type( baseDict ).__name__
                        )
                    )
                objClass = IsJSONSerializable._registeredLoadables.get( objNS )
                if objClass:
                    return objClass.FromDict( baseDict )
                raise RuntimeError( 'decorated override of json.load could not '
                    'find a valid object-namespace (%s) to work with: %s' % ( 
                    objNS, args[ 0 ] ) )
            return baseDict
        IsJSONSerializable._decoratedJSON[ origfunc ] = _load
        return _load

    @staticmethod
    @describe.argument( 'origfunc', 
        '[json.loads] The original function to wrap IsJSONSerializable checking'
        ' around'
    )
    @describe.raises( RuntimeError, 'if not passed json.loads' )
    @describe.returns( 'The replacement function for json.loads' )
    def wrapjsonloads( origfunc ):
        """
Replaces json.loads with a function that hands processing off to a subclass of 
IsJSONSerializable for unserialization of the JSON data into an instance of the 
class when applicable"""
        if origfunc != json.loads:
            raise RuntimeError( 'wrapjsonloads expects json.loads as the '
                'function to decorate, but was passed %s' % ( origfunc ) )
        if IsJSONSerializable._decoratedJSON.get( origfunc ):
            return IsJSONSerializable._decoratedJSON[ origfunc ]
        def _loads( *args, **kw ):
            baseDict = origfunc( *args, **kw )
            try:
                objNS = baseDict.get( '__namespace' )
            except AttributeError:
                objNS = None
            if objNS:
                if type( baseDict ) != dict:
                    raise ValueError( 'Decorated override of json.loads '
                        'expected a dict value to convert to an instance of '
                        'IsJSONSerializable, but the supplied JSON evaluated '
                        'to "%s" (%s)' % ( 
                            baseDict, type( baseDict ).__name__
                        )
                    )
                objClass = IsJSONSerializable._registeredLoadables.get( objNS )
                if objClass:
                    return objClass.FromDict( baseDict )
                raise RuntimeError( 'decorated override of json.loads could '
                    'not find a valid object-namespace (%s) to work with: %s' % 
                    ( objNS, args[ 0 ] )
                )
            return baseDict
        IsJSONSerializable._decoratedJSON[ origfunc ] = _loads
        return _loads

#---------------------------------------#
# Append to __all__                     #
#---------------------------------------#
__all__.append( 'IsJSONSerializable' )

From top to bottom, here are the pieces that I think are worth mentioning in a bit more detail...

The _jsonPublicFields class-attribute is left empty in IsJSONSerializable itself and is meant to be populated by derived classes. The property-getter method for the SanitizedJSON property (_GetSanitizedJSON) refuses to return anything until it has been populated, and SanitizeDict uses it to determine which of the class' properties/fields will be kept in the sanitized JSON output for instances of the class.

The _registeredLoadables class-attribute is where registered classes derived from IsJSONSerializable are kept track of. It shouldn't be altered by derived classes directly: everything that needs to be done with it is done by the inherited RegisterLoadable class-method. I'll go a bit more into that when I get to RegisterLoadable a bit later in this post.
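
To make the registry idea concrete, here is a stripped-down sketch of the same pattern on its own, written for Python 3. The Registry and Widget names are stand-ins I've made up for illustration; the real logic is the RegisterLoadable method shown above.

```python
class Registry(object):
    # One shared dict, owned by the base class, keyed by 'module.ClassName'
    _registeredLoadables = {}

    @classmethod
    def RegisterLoadable(cls):
        namespace = '%s.%s' % (cls.__module__, cls.__name__)
        # Always refer to the base class' dict explicitly, so that a subclass
        # can't accidentally end up with a private registry of its own
        if namespace in Registry._registeredLoadables:
            raise RuntimeError(
                '%s has already been registered' % namespace
            )
        Registry._registeredLoadables[namespace] = cls


class Widget(Registry):
    # Hypothetical derived class
    pass


Widget.RegisterLoadable()
registered = dict(Registry._registeredLoadables)

# A second registration under the same namespace is rejected
try:
    Widget.RegisterLoadable()
    duplicate_error = None
except RuntimeError as error:
    duplicate_error = str(error)
```

Keying on the fully-qualified name means two classes that happen to share a class-name in different modules can both be registered without colliding.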

_GetSanitizedJSON, the property-getter for the SanitizedJSON property, turned out to be a lot less complicated than I'd feared it would be. A good part of that, I suspect, is because all of the heavy lifting involved in sanitizing JSON output really has to happen in the GetSerializationDict method, which is required in derived classes but not dealt with at all in IsJSONSerializable itself.

I had several false starts while I was trying to work out a solid mechanism for providing access to sanitized JSON output. This is, I think, the third or perhaps fourth run I've taken at it, but this feels like it should do what I want it to do, based on some of the preliminary code I've started writing for the deep dive post that will follow this one.

The SanitizeDict method provides a common mechanism that will exist across all instances derived from IsJSONSerializable to generate a sanitized dictionary, using the instance's class' _jsonPublicFields attribute to determine what can be safely sent across the wire. Although implementations of GetSerializationDict in derived classes can (and arguably should) use SanitizeDict to filter down the generic GetSerializationDict results, there's no actual requirement that they do so. It's there as a convenient standard process, and will have been thoroughly tested, so there's an advantage to using it, but it's not required.

All told, this entire process also ensures that a developer has to make a conscious decision to include a field in the sanitized JSON output — anything that isn't explicitly allowed is removed.
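
The allow-list behavior is easy to see in isolation. This is a minimal standalone sketch of the filtering step only (Python 3), with a made-up sanitize_dict function, field list, and record standing in for SanitizeDict, _jsonPublicFields, and a real serialization dict:

```python
def sanitize_dict(data, public_fields):
    """Return a NEW dict containing only the keys listed in public_fields."""
    return {
        key: value for key, value in data.items()
        if key in public_fields
    }


# Hypothetical allow-list and state-data
PUBLIC_FIELDS = ('name', 'email')
record = {
    'name': 'Ann',
    'email': 'ann@example.com',
    'password_hash': 'abc123',   # never listed, so never emitted
}

safe = sanitize_dict(record, PUBLIC_FIELDS)
```

Here safe ends up with only the name and email keys, while record is left untouched. A field added to the class later stays out of the sanitized output until someone deliberately adds it to the allow-list.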

FromJSON uses the FromDict method (required by HasSerializationDict) of the derived class to create an instance of the class, populated with the data from the JSON, and returns it. Under normal circumstances it won't need to be called directly: the overridden json.load* functions from the last post parse the JSON themselves, and they know which class' FromDict to call based on the __namespace provided in the JSON. That assumes that the __namespace is provided, of course, but any JSON generated by the overridden json.dump* functions will have it.
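
The __namespace round trip can be sketched without touching the json module at all. The following is a simplified Python 3 illustration, not the module's code: the Person class, the _loadables dict, and the two helper functions are all hypothetical stand-ins for a registered class, _registeredLoadables, and the wrapped dumps/loads.

```python
import json

_loadables = {}  # stand-in for IsJSONSerializable._registeredLoadables


def namespace_of(cls):
    return '%s.%s' % (cls.__module__, cls.__name__)


class Person(object):
    # Hypothetical serializable class
    def __init__(self, name):
        self.name = name

    def GetSerializationDict(self):
        return {'name': self.name}

    @classmethod
    def FromDict(cls, data):
        return cls(data['name'])


_loadables[namespace_of(Person)] = Person


def dumps_with_namespace(obj):
    # Tag the state-data with the class' namespace on the way out
    data = obj.GetSerializationDict()
    data['__namespace'] = namespace_of(obj.__class__)
    return json.dumps(data)


def loads_with_namespace(text):
    # Use the tag, if there is one, to pick the class on the way back in
    data = json.loads(text)
    namespace = data.get('__namespace') if isinstance(data, dict) else None
    if not namespace:
        return data
    return _loadables[namespace].FromDict(data)


restored = loads_with_namespace(dumps_with_namespace(Person('Ann')))
untagged = loads_with_namespace('[1, 2]')
```

JSON without a __namespace (the untagged case) simply comes back as whatever the plain parser would have produced.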

RegisterLoadable is the class-method that registers the class as available for the overridden json.load* functions. In use, it should be called from the derived class as soon as that class is defined. For example:

class MyClass( IsJSONSerializable ):
    # ... the definition of the class ...
    pass

MyClass.RegisterLoadable()

The last item of note with respect to IsJSONSerializable is the disposition of the json-function decorators from the last post. I decided to move those into IsJSONSerializable as static methods rather than keeping them as free-standing functions in the serialization module. I can't really completely explain why I chose to do that, though — most of it was a gut feeling that it'd be safer or maybe easier to manage if they didn't have to be imported one at a time from outside the module, though I couldn't think of a use-case where I'd want or need to do that that made any sense. Still, I have a lingering feeling that it could happen, so for now I'm going to leave them where I've put them.

Other Things That the serialization Module Does

It may be obvious to state this: When a module is imported, all the code in it that can be executed is executed. That means that a module can call functions to perform tasks that might be needed for the module to provide all of the functionality it's supposed to. It may be less obvious that when a member of a module is imported (like, say, only IsJSONSerializable from the serialization module) all the module's code is still executed.
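
A quick way to convince yourself of that is to write a throwaway module to disk and import just one name from it. This Python 3 sketch does exactly that; the side_effect_demo module name and its contents are invented for the demonstration.

```python
import os
import shutil
import sys
import tempfile

# Source for a throwaway module whose body has a visible side effect
module_source = (
    "executed = []\n"
    "executed.append('module body ran')\n"
    "class Wanted(object):\n"
    "    pass\n"
)

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'side_effect_demo.py'), 'w') as handle:
    handle.write(module_source)

sys.path.insert(0, module_dir)
try:
    # Only ONE member is imported...
    from side_effect_demo import Wanted
finally:
    sys.path.remove(module_dir)
    shutil.rmtree(module_dir)

# ...but the whole module body was executed to get at it
side_effects = sys.modules['side_effect_demo'].executed
```

After this runs, side_effects holds the entry appended by the module body, even though only Wanted was asked for.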

Remember the IsJSONSerializable.wrapjson* static methods?

Those need to be fired off, assigning their replacement functions to all of the original json-module functions they are intended to replace, any time that IsJSONSerializable is in play. The serialization module does this by explicitly calling them (replacing the originals with the decorator-replacements) like this:

#-----------------------------------#
# Decoration/override of json.xxxxx #
#-----------------------------------#
json.dump = IsJSONSerializable.wrapjsondump( json.dump )
json.dumps = IsJSONSerializable.wrapjsondumps( json.dumps )
json.load = IsJSONSerializable.wrapjsonload( json.load )
json.loads = IsJSONSerializable.wrapjsonloads( json.loads )
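
For anyone who wants to experiment with this kind of replacement safely, here is a small Python 3 sketch of the same wrap-and-reassign idea that puts the original function back when it's done. It is not the module's implementation: wrap_dumps and Thing are hypothetical, and the wrapper forwards *args and **kwargs rather than spelling out every parameter, since the json functions' signatures differ between Python versions (Python 3's have no encoding parameter, for example).

```python
import functools
import json


def wrap_dumps(original):
    """Return a replacement for json.dumps that understands serializable objects."""
    @functools.wraps(original)
    def _dumps(obj, *args, **kwargs):
        getter = getattr(obj, 'GetSerializationDict', None)
        if callable(getter):
            # Swap the object for its state-data before handing off
            obj = getter()
        return original(obj, *args, **kwargs)
    return _dumps


class Thing(object):
    # Hypothetical serializable class
    def GetSerializationDict(self):
        return {'kind': 'thing'}


original_dumps = json.dumps
json.dumps = wrap_dumps(json.dumps)
try:
    patched_output = json.dumps(Thing())   # handled by the wrapper
    plain_output = json.dumps([1, 2])      # passes straight through
finally:
    # Always restore the original, even if something above fails
    json.dumps = original_dumps
```

The serialization module leaves its replacements in place permanently, which is the point of doing it at import time; the try/finally here is only so that the experiment doesn't leak into the rest of a session.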

I'll jump right in to a simple implementation of IsJSONSerializable (with all of the basic requirements and caveats) in my next post, as well as the promised deep dive into what an IsJSONSerializable implementation can actually do. In the meantime, here's the final code for the serialization module:
