OK. Now that the JavaScript interlude is done, it's time to get back to the Python framework... I've gone back and forth a few times now on what to tackle next, and though I'd really like to start working on some code to handle parsing, creating, and working with markup elements, I feel that I'd be remiss if I didn't start addressing some of the outstanding items in my coding standards. Specifically:
- It should be thoroughly tested
That brought me to the serialization module. It didn't have any concrete classes, but I could create an example of one for demonstration of unit-testing once I got done with the core serialization logic, and there were other... interesting aspects... to play with.
My Goals for Object Serialization
Since a lot of the projects that I'm contemplating are web-applications of some sort, and JSON is popular in that context, I want to be able to define some common functionality that will allow objects' state to be easily serialized into and unserialized from JSON. Python also has its pickle module, which I'll eventually want to incorporate, I suspect, but JSON will do for now.
One thing that I very much dislike about the Python JSON implementations that I've seen thus far is that while the json module's functions do a good job of dealing with built-in Python types, the mechanism for making an arbitrary instance of a class JSON-serializable seems cumbersome, requiring (apparently) the creation of dedicated JSONDecoder and JSONEncoder classes. Over and above that, I can easily imagine the need to be able to JSON-encode a given object in two distinct ways: one for public consumption over the web (making sure to sanitize out any sensitive data), and one for private use (allowing, if not mandating, that sensitive data be present and accounted for). That means that there would be a minimum of two custom classes that would have to be defined for each JSON-serializable class. Some thought would also have to go into determining how to switch between those variants. Finally, that concrete-class requirement would make it difficult to associate the encoder with the class it serves as an inner class, which was where my first thoughts led me.
Ugh.
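To make the complaint concrete, here is a minimal sketch of the encoder-per-variant approach, written for Python 3 so it runs on a current interpreter. The Account class and both encoder names are hypothetical, made up purely for illustration; only json.JSONEncoder and its default hook come from the standard library.

```python
import json

class Account(object):
    """Hypothetical class with one sensitive field."""
    def __init__(self, name, password_hash):
        self.name = name
        self.password_hash = password_hash

class PrivateAccountEncoder(json.JSONEncoder):
    """Encoder for local/private use: keeps every field."""
    def default(self, obj):
        if isinstance(obj, Account):
            return {'name': obj.name, 'password_hash': obj.password_hash}
        # Anything else falls through to the standard behavior (a TypeError)
        return json.JSONEncoder.default(self, obj)

class PublicAccountEncoder(json.JSONEncoder):
    """Encoder for over-the-wire use: drops the sensitive field."""
    def default(self, obj):
        if isinstance(obj, Account):
            return {'name': obj.name}
        return json.JSONEncoder.default(self, obj)

account = Account('alice', 'not-a-real-hash')
private_json = json.dumps(account, cls=PrivateAccountEncoder, sort_keys=True)
public_json = json.dumps(account, cls=PublicAccountEncoder, sort_keys=True)
```

That is two encoder classes for one small data class, before any decoding is considered, and the caller has to remember which cls to pass each time.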
Instead, I'm going to define an abstract class that requires a JSON-serializable object to have some required members that provide the serialization and unserialization functionality. My first thought was to require a ToJSON instance-method and a FromJSON class-method. Each of those would require an instance-level GetSerializationDict method and a class-scoped FromDict method at another level of abstraction (an interface that the abstract class inherits from).
Then I got to thinking... Serialization to and from JSON has at least two distinct uses that I can think of off the top of my head:
- Serializing/unserializing sanitized data for transmission over a network, something that a lot of web-services do; and
- Serializing/unserializing for local use, perhaps for writing data to a local file or a record in a database, for instance.
What if an object needs to be serialized for both these cases? A secure variant and a public variant? After considering some of the options, the approach I'm going to take is to provide a SanitizedJSON property on all classes derived from the abstract class. That will provide a bare-bones, already-minimized and sanitized JSON output for any instance of a derived class.
That means, however, that standard json.dump and json.dumps results have to be considered in the unsanitized data category. In the case of json.dump, since it writes to a file, that's likely not going to be a concern for any of the typical application structures and environments I've been exposed to. Output from json.dumps, though, might well be used to generate JSON data for unsafe purposes, so I'll want to raise some sort of warning when that happens: nothing that prevents it (or stops execution), but something that will hopefully alert a developer that there is an available option if it's needed.
All that said, here's what I'm going to define and implement for the serialization module:
- HasSerializationDict interface: Requires implementation of a GetSerializationDict instance-method and a FromDict class-method.
- IsJSONSerializable abstract class (extends/implements HasSerializationDict): Implements a SanitizedJSON property that uses the GetSerializationDict method required by HasSerializationDict, and requires a FromJSON class-method.
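As a rough picture of how those two layers might fit together, here is a sketch written for Python 3 (the post's own code targets Python 2.7). The member names come from the list above; the bodies, the compact separators, and the Person class are assumptions for illustration only, not the implementation that the later posts arrive at.

```python
import abc
import json

class HasSerializationDict(abc.ABC):
    """Interface layer: anything that can describe its state as a dict."""
    @abc.abstractmethod
    def GetSerializationDict(self, sanitize=False):
        raise NotImplementedError()

    @classmethod
    @abc.abstractmethod
    def FromDict(cls, data=None, **properties):
        raise NotImplementedError()

class IsJSONSerializable(HasSerializationDict):
    """Abstract layer: adds JSON on top of the dict representation."""
    @property
    def SanitizedJSON(self):
        # Assumption: "minimized" means compact separators
        return json.dumps(self.GetSerializationDict(sanitize=True),
                          separators=(',', ':'), sort_keys=True)

    @classmethod
    @abc.abstractmethod
    def FromJSON(cls, text):
        raise NotImplementedError()

class Person(IsJSONSerializable):
    """Hypothetical concrete class; email is treated as sensitive."""
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def GetSerializationDict(self, sanitize=False):
        result = {'name': self.name}
        if not sanitize:
            result['email'] = self.email
        return result

    @classmethod
    def FromDict(cls, data=None, **properties):
        values = dict(data or {})
        values.update(properties)  # keyword values override data
        return cls(values.get('name'), values.get('email'))

    @classmethod
    def FromJSON(cls, text):
        return cls.FromDict(json.loads(text))

person = Person('Ann', 'ann@example.com')
sanitized = person.SanitizedJSON
```

The point of the split is that the sanitize decision lives in one place, GetSerializationDict, and every output format built on top of it inherits that decision.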
There will be a few other items in the final structure, but those are the main points I think I need to cover for now. Chief among those other points: I still want to allow the standard json-module functions to be callable on IsJSONSerializable-derived objects without raising errors. I've played around with a few ideas to make that workable, and have something that I think will work well. I'll go into that in depth later, after writing out the interface and abstract class, because demonstrating that it works will require a fair chunk of code all by itself.
The First Two Serialization Classes
In previous posts, I started with the class-diagrams and interface specifics, then worked through the implementation. In this case, I'm going to start with the code, since a lot of the exploration I did to arrive at my final solution yielded fully- or near-fully implemented code.
The UnsanitizedJSONWarning Warning
At some point, once I get the json module functions' relationship with IsJSONSerializable worked out, I'll want to raise a warning if json.dumps is being called against an instance of a derived class. Python's warnings module provides the base functionality I need for that, and there are several Warning-derived classes already available. None of them really look like they're quite what I'd like to see when that warning situation occurs, though, so I'm going to create a custom Warning of my own:
#-----------------------------------#
# Defined exceptions.               #
#-----------------------------------#

@describe.InitClass()
class UnsanitizedJSONWarning( Warning ):
    """
    A warning to be raised when json.dumps is used to generate
    potentially-unsanitized JSON output"""
    pass

__all__.append( 'UnsanitizedJSONWarning' )
The Warning class is derived from the standard Exception class. My experience has been that although I've created custom Exceptions with some frequency, I've never had to do anything more complex than this. Usually, if I feel the need to make a custom Exception, it's because I've determined that I need to be able to raise one and catch it elsewhere while allowing the except that catches it to differentiate between different exception-types. In this case, it's mostly because I'd like the warning that I'm going to surface to give some indication of what the warning's actually about. Something like this:
UnsanitizedJSONWarning: [class-name] has "sanitized" JSON available in obj.SanitizedJSON.
This custom Warning will allow that.
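A quick way to see what a custom Warning subclass buys is to issue one and capture it. The sketch below is Python 3 and uses only the standard warnings module. The warn_unsanitized helper is hypothetical, and the class is re-declared without the describe decorator so the example stands alone.

```python
import warnings

class UnsanitizedJSONWarning(Warning):
    """Stand-alone copy of the warning class, minus the describe decorator."""
    pass

def warn_unsanitized(class_name):
    """Hypothetical helper that issues the warning for a given class name."""
    warnings.warn(
        '%s has "sanitized" JSON available in obj.SanitizedJSON.' % class_name,
        UnsanitizedJSONWarning,
        stacklevel=2  # attribute the warning to the caller, not this helper
    )

# Capture warnings instead of letting them print to stderr
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    warn_unsanitized('ExampleClass')

first = caught[0]
```

Because the category is its own class, filters and handlers can target it specifically without touching any other warnings the application raises.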
The HasSerializationDict Interface
The first item to go over is the HasSerializationDict interface. The intention around it is to provide some (minimal) requirements for all classes whose instances can be serialized in pretty much any fashion, by requiring that those instances be capable of generating a dictionary representation of their state. Both JSON serialization (now) and pickle-based serialization (some time later, maybe) can use a dict data-structure as a common mid-point for serializing and unserializing objects, so that just seemed a logical starting-point. Since it's an interface (at least nominally), there's not much to the code:
@describe.InitClass()
class HasSerializationDict( object ):
    """
    Provides interface requirements and type-identity for objects that are
    required to implement serialization dictionaries: a dict representation of the
    instance used as a process-step for serializing object state"""

    #-----------------------------------#
    # Abstraction through abc.ABCMeta   #
    #-----------------------------------#
    __metaclass__ = abc.ABCMeta

    #-----------------------------------#
    # Static interface attributes (and  #
    # default values?)                  #
    #-----------------------------------#

    #-----------------------------------#
    # Abstract Properties               #
    #-----------------------------------#

    #-----------------------------------#
    # Instance Initializer              #
    #-----------------------------------#
    @describe.AttachDocumentation()
    def __init__( self ):
        """
        Instance initializer"""
        # HasSerializationDict is intended to be an interface,
        # and is NOT intended to be instantiated. Alter at your own risk!
        if self.__class__ == HasSerializationDict:
            raise NotImplementedError( 'HasSerializationDict is '
                'intended to be an interface, NOT to be instantiated.' )
        # Call parent initializers, if applicable.
        # Other set-up

    #-----------------------------------#
    # Abstract Instance Methods         #
    #-----------------------------------#
    @abc.abstractmethod
    @describe.AttachDocumentation()
    @describe.argument( 'sanitize',
        'indicates whether the returned dictionary should be "sanitized," '
        'the implementation of which is up to the derived class',
        bool
    )
    def GetSerializationDict( self, sanitize=False ):
        """
        Returns a dict representation of the instance."""
        raise NotImplementedError( '%s.GetSerializationDict has not been '
            'implemented as required by HasSerializationDict' %
            ( self.__class__.__name__ )
        )

    #-----------------------------------#
    # Abstract Class Methods            #
    #-----------------------------------#
    @classmethod
    @describe.AttachDocumentation()
    @describe.argument( 'data',
        'the state-data to be used to create the new instance',
        dict
    )
    @describe.keywordargs(
        'keyword arguments representation of the state-data to be used to '
        'create the new instance. NOTE: If provided, these override any values '
        'provided in the data argument!'
    )
    @describe.raises( NotImplementedError,
        'if called by a derived object that has not overridden the nominally-'
        'abstract method' )
    def FromDict( cls, data={}, **properties ):
        """
        [Nominally-abstract class-method] Returns an instance of the class whose state-
        data has been populated with the values provided in the data and/or properties
        supplied."""
        raise NotImplementedError( '%s.FromDict has not been implemented as '
            'required by HasSerializationDict' % ( cls.__name__ ) )

    #-----------------------------------#
    # Static Class Methods              #
    #-----------------------------------#

#---------------------------------------#
# Append to __all__                     #
#---------------------------------------#
__all__.append( 'HasSerializationDict' )
It's worth noting that the FromDict class-method is not decorated as an abstractmethod. Python 2.7 doesn't support decorating a method as both a classmethod and an abstractmethod. Python 3.3 and later do allow classmethod to be stacked on top of abc.abstractmethod, but I'm working in 2.7 here, and both methods are built to raise a NotImplementedError in any event, so even if FromDict isn't implemented in a derived class, it will raise that error as soon as it's called.
That's something that should happen during unit-testing, and I'll show how I'm planning to deal with that once I start down the unit-testing code that I mentioned before.
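The difference between a nominally-abstract class-method and a truly abstract one shows up in when the failure happens. The Python 3 sketch below contrasts the two. All four class names and the failure_of helper are hypothetical, and the stacked-decorator form relies on Python 3.3 or later.

```python
import abc

class NominalInterface(abc.ABC):
    """Mirrors the post's approach: FromDict only fails when it is called."""
    @classmethod
    def FromDict(cls, data=None, **properties):
        raise NotImplementedError(
            '%s.FromDict has not been implemented' % cls.__name__)

class StrictInterface(abc.ABC):
    """Python 3.3+ approach: classmethod stacked on top of abstractmethod."""
    @classmethod
    @abc.abstractmethod
    def FromDict(cls, data=None, **properties):
        raise NotImplementedError(
            '%s.FromDict has not been implemented' % cls.__name__)

class ForgotNominal(NominalInterface):
    pass  # does not override FromDict

class ForgotStrict(StrictInterface):
    pass  # does not override FromDict

def failure_of(callable_):
    """Returns the name of the exception raised by callable_, or None."""
    try:
        callable_()
    except Exception as error:
        return type(error).__name__
    return None

instantiate_nominal = failure_of(ForgotNominal)    # instantiation succeeds
call_nominal = failure_of(ForgotNominal.FromDict)  # fails at call time
instantiate_strict = failure_of(ForgotStrict)      # fails at instantiation
```

With the nominal form, a missing override is only caught if something actually calls FromDict, which is why covering it in unit tests matters.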
Integrating the json Module Functions
I'll be honest: I struggled with how to try and make instances of IsJSONSerializable tie in nicely with the json module's dumping- and loading-functions. I poked around a lot of websites, read through a fair number of Stack Overflow articles, and took a lot of fairly long walks to try and get some right-brain creativity to kick in. For a good, long while, the problem looked insurmountable.
Then, while reviewing the doc_metadata posts prior to their publication, I started wondering if I could apply a decorator to those functions. So I tried it, and it worked!
After a fair bit of tinkering with the idea, I ended up with four functions, one each to wrap around one of the json.* functions that I was concerned with. What I'm actually doing with them doesn't feel like a typical Python decorator to me, but (amusingly enough) does feel like an application of the Decorator design pattern. I'll go over each in as much detail as seems relevant...
def wrapjsondump( origfunc ):
    """
    Wraps checking for IsJSONSerializable-derived classes around the standard
    json.dump function. Note that this dumps *ALL* fields, so output is *NOT*
    sanitized for over-the-wire transit!"""
    # Return the existing replacement if origfunc has already been wrapped
    if IsJSONSerializable._decoratedJSON.get( origfunc ):
        return IsJSONSerializable._decoratedJSON[ origfunc ]
    if origfunc != json.dump:
        raise RuntimeError( 'wrapjsondump expects json.dump as the '
            'function to decorate, but was passed %s' % ( origfunc ) )
    def _dump( obj, fp, skipkeys=False, ensure_ascii=True,
        check_circular=True, allow_nan=True, cls=None, indent=None,
        separators=None, encoding='utf-8', default=None, sort_keys=False,
        **kw ):
        if isinstance( obj, IsJSONSerializable ):
            # Swap the instance for its dict representation, tagged with
            # the namespace needed to find its class again on load
            objNS = obj.PythonNamespace
            obj = obj.GetSerializationDict()
            obj[ '__namespace' ] = objNS
        return origfunc( obj, fp, skipkeys, ensure_ascii, check_circular,
            allow_nan, cls, indent, separators, encoding, default,
            sort_keys, **kw )
    IsJSONSerializable._decoratedJSON[ origfunc ] = _dump
    return _dump
All of these decorator-functions accept an original function (origfunc), and define a replacement function (_dump in this case). The original function persists inside the replacement function because of the way Python's closures work, leaving it accessible within the scope of the replacement, but able to be overridden outside that scope. Each of the replacement functions was written to use the same signature as the function it replaces.
A step-by-step breakdown of what happens may be useful. I'll use this function as the example, but the process is very similar with the other three:
- Somewhere in some code, json.dump = wrapjsondump( json.dump ) is called;
  - The decorator checks to see if origfunc has already been decorated by looking up the replacement function in IsJSONSerializable._decoratedJSON. If it has been, the decorator immediately returns that found function. I'm not sure that this is doing exactly what I want/need, but until I get a chance to test it more thoroughly than I have at this point, I'm satisfied that it seems to be working. The decision to store the look-up in IsJSONSerializable was made based on the realization that it would be available any place that an instance that required the use of the decorated functions would exist (they'd have to be subclasses of IsJSONSerializable, after all).
  - A check is performed to make sure that the decorator is being applied to the appropriate original function.
  - The replacement function is defined (_dump in this case);
  - The replacement function is added to IsJSONSerializable._decoratedJSON, using the origfunc itself as the key.
  - The replacement function is returned and assigned to json.dump (because the initial call that started all of this was json.dump = wrapjsondump( json.dump )). From that point on, any call to json.dump will instead be handed off to the replacement _dump function.
- Later, somewhere else, a call to json.dump is made, passing an object to be serialized:
  - Since json.dump has been replaced with _dump by the decorator, _dump is called instead:
    - The supplied object (obj) is checked, to see if it's an instance of IsJSONSerializable:
      - If it is, then a dict is built out, starting with the results of obj.GetSerializationDict(), adding a '__namespace' key to it, and the original object is replaced with the dict
    - The original function (json.dump) is called, passing the (possibly-modified) object, and
    - The results are returned.
In the final analysis, the only reason this approach works is because the original function that is being replaced is still (in fact, only) accessible inside the scope of the function it's being replaced with. If that sounds weird to you, you're not alone. I couldn't come up with any simpler way to explain it, and I'm not sure that I'm qualified to explain why it works without that explanation eventually devolving into mumbling about closures in functions.
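The closure behavior described above can be seen in a stripped-down form. This is a Python 3 sketch, not the post's code: _decorated stands in for IsJSONSerializable._decoratedJSON, Serializable is a hypothetical stand-in for IsJSONSerializable, and the replacement takes keyword arguments only, for brevity. It also leaves json.dump itself untouched rather than rebinding it.

```python
import io
import json

_decorated = {}  # stand-in for IsJSONSerializable._decoratedJSON

class Serializable(object):
    """Hypothetical stand-in for IsJSONSerializable."""
    PythonNamespace = 'example.Serializable'

    def __init__(self, **state):
        self._state = dict(state)

    def GetSerializationDict(self, sanitize=False):
        return dict(self._state)

def wrap_json_dump(origfunc):
    # Step 1: hand back the existing replacement if there is one
    if origfunc in _decorated:
        return _decorated[origfunc]
    # Step 2: make sure the right function is being wrapped
    if origfunc is not json.dump:
        raise RuntimeError('expected json.dump, got %r' % (origfunc,))
    # Step 3: define the replacement; origfunc is captured by the closure
    def _dump(obj, fp, **kw):
        if isinstance(obj, Serializable):
            namespace = obj.PythonNamespace
            obj = obj.GetSerializationDict()
            obj['__namespace'] = namespace
        return origfunc(obj, fp, **kw)
    # Steps 4 and 5: remember the replacement, then return it
    _decorated[origfunc] = _dump
    return _dump

wrapped_dump = wrap_json_dump(json.dump)
buffer = io.StringIO()
wrapped_dump(Serializable(name='widget'), buffer, sort_keys=True)
output = buffer.getvalue()
```

One thing the sketch makes visible: the look-up is keyed on the original function, so it only short-circuits a repeat call that is handed the original again, not one that is handed the replacement.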
The wrapjsondumps function is very similar to wrapjsondump. That's not surprising, I think, since they perform the same basic function, just to different outputs:
def wrapjsondumps( origfunc ):
    """
    Wraps checking for IsJSONSerializable-derived classes around the standard
    json.dumps function. Note that this dumps *ALL* fields, so output is *NOT*
    sanitized for over-the-wire transit!"""
    # Return the existing replacement if origfunc has already been wrapped
    if IsJSONSerializable._decoratedJSON.get( origfunc ):
        return IsJSONSerializable._decoratedJSON[ origfunc ]
    if origfunc != json.dumps:
        raise RuntimeError( 'wrapjsondumps expects json.dumps as the '
            'function to decorate, but was passed %s' % ( origfunc ) )
    def _dumps( obj, skipkeys=False, ensure_ascii=True,
        check_circular=True, allow_nan=True, cls=None, indent=None,
        separators=None, encoding='utf-8', default=None, sort_keys=False,
        **kw ):
        if isinstance( obj, IsJSONSerializable ):
            # TODO: Figure out how to generate an exception-like warning
            #       instead of printing this message
            warnings.warn( '%s is an instance derived from '
                'IsJSONSerializable, and has "sanitized" JSON available '
                'in its SanitizedJSON property.' %
                ( obj.__class__.__name__ ),
                UnsanitizedJSONWarning, stacklevel=2
            )
            objNS = obj.PythonNamespace
            obj = obj.GetSerializationDict()
            obj[ '__namespace' ] = objNS
        return origfunc( obj, skipkeys, ensure_ascii, check_circular,
            allow_nan, cls, indent, separators, encoding, default,
            sort_keys, **kw )
    IsJSONSerializable._decoratedJSON[ origfunc ] = _dumps
    return _dumps
The significant differences are the signature of the replacement function (it has to match the signature of the original json.dumps function, which has no fp argument), and the warning that gets raised if obj is an instance of IsJSONSerializable. That's this chunk of code:
if isinstance( obj, IsJSONSerializable ):
    # TODO: Figure out how to generate an exception-like warning
    #       instead of printing this message
    warnings.warn( '%s is an instance derived from '
        'IsJSONSerializable, and has "sanitized" JSON available '
        'in its SanitizedJSON property.' %
        ( obj.__class__.__name__ ),
        UnsanitizedJSONWarning, stacklevel=2
    )
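On the TODO in that chunk: one possible route to an exception-like warning is the standard warnings filter mechanism, which can escalate a chosen category into a raised exception. The Python 3 sketch below shows that idea only. The noisy_dumps_stub and call_strictly functions are hypothetical, and whether this is the behavior the TODO has in mind is an assumption.

```python
import warnings

class UnsanitizedJSONWarning(Warning):
    """Stand-alone copy of the warning class for this example."""
    pass

def noisy_dumps_stub(class_name):
    """Hypothetical stand-in for the wrapped json.dumps: it only warns."""
    warnings.warn(
        '%s has "sanitized" JSON available in its SanitizedJSON property.'
        % class_name,
        UnsanitizedJSONWarning, stacklevel=2
    )
    return '{}'

def call_strictly(class_name):
    """Runs the stub with UnsanitizedJSONWarning escalated to an error."""
    with warnings.catch_warnings():
        # 'error' turns matching warnings into raised exceptions
        warnings.simplefilter('error', UnsanitizedJSONWarning)
        try:
            return noisy_dumps_stub(class_name)
        except UnsanitizedJSONWarning as warning:
            return 'blocked: %s' % warning

result = call_strictly('ExampleClass')
```

The appeal of doing it through filters is that the wrapper keeps calling warnings.warn as written, and the decision to treat it as fatal stays with the application (or a test suite).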
The load-related functions, though they follow the same decorator-pattern as the dump-centric ones, have a very different internal process. In order for a load to be able to create an actual instance of the class it was serialized from, there's got to be some way to make an association. That's what the '__namespace' in the functions above is for, but that may not be enough by itself. The other piece of the puzzle is the idea of registering each JSON-loadable class, and keeping track of those classes so that they can be quickly identified, and their FromDict methods can be called. Since I haven't detailed IsJSONSerializable yet, there's no context for how that works (it turned into a circular reference), but it does work, at least with the limited testing I've done so far.
def wrapjsonload( origfunc ):
    """
    Replaces json.load with a function that hands processing off to a subclass of
    IsJSONSerializable for unserialization of the JSON data into an instance of the
    class when applicable"""
    if origfunc != json.load:
        raise RuntimeError( 'wrapjsonload expects json.load as the '
            'function to decorate, but was passed %s' % ( origfunc ) )
    # Return the existing replacement if origfunc has already been wrapped
    if IsJSONSerializable._decoratedJSON.get( origfunc ):
        return IsJSONSerializable._decoratedJSON[ origfunc ]
    def _load( *args, **kw ):
        baseDict = origfunc( *args, **kw )
        # Non-dict results (lists, strings, numbers) have no .get
        try:
            objNS = baseDict.get( '__namespace' )
        except AttributeError:
            objNS = None
        if objNS:
            if type( baseDict ) != dict:
                raise ValueError( 'Decorated override of json.load '
                    'expected a dict value to convert to an instance of '
                    'IsJSONSerializable, but the supplied JSON evaluated '
                    'to "%s" (%s)' % (
                        baseDict, type( baseDict ).__name__
                    )
                )
            objClass = IsJSONSerializable._registeredLoadables.get( objNS )
            if objClass:
                return objClass.FromDict( baseDict )
            raise RuntimeError( 'decorated override of json.load could not '
                'find a valid object-namespace (%s) to work with: %s' % (
                    objNS, args[ 0 ] ) )
        return baseDict
    IsJSONSerializable._decoratedJSON[ origfunc ] = _load
    return _load
Here's a walkthrough of what happens when the replacement function for json.load is called:
- A call to json.load is made, with JSON to be unserialized.
  - Since json.load has been replaced with _load by the decorator, _load is called instead:
    - The original function (preserved, again, within the scope of the replacement function) is called to get a dict.
    - That dict is checked for a '__namespace' key:
      - If the namespace exists, then the dictionary of registered loadable classes (IsJSONSerializable._registeredLoadables) is checked for a match
      - If there is a match, then the found class' FromDict class-method is called, and the results returned
      - If the namespace doesn't have a registered class, then a RuntimeError is raised
    - If there is no '__namespace' key (or the JSON didn't evaluate to a dict at all), then the value that was initially retrieved is returned instead
This process has added a few things to the IsJSONSerializable interface and implementation requirements:
- A method for registering IsJSONSerializable classes;
- Some class-level attributes in IsJSONSerializable for keeping track of registered IsJSONSerializable classes, keyed on their Python namespace;
- A way to find that Python namespace;
The wrapjsonloads function is, apart from the signature of the replacement function itself and what original function it's expecting, identical to wrapjsonload. There's not much about it to comment on, but I'm going to show it in the interests of being thorough:
def wrapjsonloads( origfunc ):
    """
    Replaces json.loads with a function that hands processing off to a subclass of
    IsJSONSerializable for unserialization of the JSON data into an instance of the
    class when applicable"""
    if origfunc != json.loads:
        raise RuntimeError( 'wrapjsonloads expects json.loads as the '
            'function to decorate, but was passed %s' % ( origfunc ) )
    # Return the existing replacement if origfunc has already been wrapped
    if IsJSONSerializable._decoratedJSON.get( origfunc ):
        return IsJSONSerializable._decoratedJSON[ origfunc ]
    def _loads( *args, **kw ):
        baseDict = origfunc( *args, **kw )
        # Non-dict results (lists, strings, numbers) have no .get
        try:
            objNS = baseDict.get( '__namespace' )
        except AttributeError:
            objNS = None
        if objNS:
            if type( baseDict ) != dict:
                raise ValueError( 'Decorated override of json.loads '
                    'expected a dict value to convert to an instance of '
                    'IsJSONSerializable, but the supplied JSON evaluated '
                    'to "%s" (%s)' % (
                        baseDict, type( baseDict ).__name__
                    )
                )
            objClass = IsJSONSerializable._registeredLoadables.get( objNS )
            if objClass:
                return objClass.FromDict( baseDict )
            raise RuntimeError( 'decorated override of json.loads could '
                'not find a valid object-namespace (%s) to work with: %s' %
                ( objNS, args[ 0 ] )
            )
        return baseDict
    IsJSONSerializable._decoratedJSON[ origfunc ] = _loads
    return _loads
This is a bit longer than I'd like already, and this feels like a reasonable break-point, so I'll pick up again in my next post with the IsJSONSerializable abstract class, and a fairly detailed look at what needs to be done to build a derived class. I'm also planning on showing an example structure that I'll be able to show in action (if only through the command-line), but I think that'll be long enough to warrant its own post.