Tuesday, January 10, 2017

How I Write Code

I've been a developer (mostly web-applications) for longer now than I'd care to admit. In the nearly two decades I've been doing this kind of thing, I've acquired several habits that could be considered coding standards that I adhere to, even if they may not be typical best practices in the industry at large. Some of them are, I think, though I've had experience professionally where they were acknowledged but not followed, or set aside because of time/budget priorites. All of the following, though, I consider to be important enough that I plan to follow them while I work through the code I'm presenting here.

Yes, I Know Python's not Java...

Let me get this out of the way first...

Over the last several years, as I've contemplated various programmatic constructs and concepts and how to implement them in Python, as I've Googled them I've run across any number of discussions that follow a pattern like:

How can I [do something] in Python?
You shouldn't, It's not Pythonic.
How can I [do something] in Python?
You shouldn't, it violates duck-typing.
How can I [do something] in Python?
Why would you want to? Python's not Java
I'll be honest: These tend to irritate me, sometimes a lot. While there is certainly some benefit to keeping Python code pythonic, trying to figure out what, exactly, that really means is not easy, and is often at least a little bit inconsistent from one person's viewpoint to another. I did, eventually, run across an article that makes a solid attempt to define what pythonic really means, and it's worth a read, in my opinion. I noted with some interest, though, that one of the first things stated was that
Pythonic means something like idiomatic Python, but now we'll need to describe what that actually means.
before going on to actually explain and discuss what some of the idioms of Python actually are, and how they've evolved over time (and, maybe, are continuing to evolve). There is, maybe, enough insight into the then-current idioms discussed there to get a good handle on recognizing other, perhaps newer idioms.

That same article also mentions The Zen of Python, which can be found by executing import this in a Python shell (numbered for later reference):

The Zen of Python, by Tim Peters
  1. Beautiful is better than ugly.
  2. Explicit is better than implicit.
  3. Simple is better than complex.
  4. Complex is better than complicated.
  5. Flat is better than nested.
  6. Sparse is better than dense.
  7. Readability counts.
  8. Special cases aren't special enough to break the rules.
  9. Although practicality beats purity.
  10. Errors should never pass silently.
  11. Unless explicitly silenced.
  12. In the face of ambiguity, refuse the temptation to guess.
  13. There should be one – and preferably only one – obvious way to do it.
  14. Although that way may not be obvious at first unless you're Dutch.
  15. Now is better than never.
  16. Although never is often better than right now.
  17. If the implementation is hard to explain, it's a bad idea.
  18. If the implementation is easy to explain, it may be a good idea.
  19. Namespaces are one honking great idea – let's do more of those!
These may be good general characteristics for identifying some of the Pythonic idioms. They are cited frequently enough (usually without explanation or context as applied to answering a How can I [do something] in Python question) that it appears many believe them to be adequate.

<soapbox>

But, particularly with respect to practicality beating purity, (#9), there are other considerations that I believe are at least as important. Perhaps even more important, if the alternative is code that is unusable, brittle or difficult to maintain.

</soapbox>

My Coding Imperatives

In general, I believe that all code should be written to meet the following criteria (setting aside any specific functionality or implementation guidelines). They aren't in any particular order, but I've numbered them for reference later:

  1. It should be as easy to use as possible;
  2. It should be as hard to misuse as possible;
  3. It should be as well- and consistently documented as possible;
  4. It should be as general-purpose as possible, balanced against the functionality it was intended to provide;
  5. It should thoroughly tested;
  6. It should be as unsurprising as possible;
These are very generic and very global, but each of them is a step towards improving code quality and usefulness. If code isn't useful, or doesn't have some minimum degree of quality, then I start to wonder why it even exists.

The third item (regarding documentation) is something that I'm going to address with a documentation module in one (or more) of the posts here. I'm also considering ways to integrate documentation-requirements into the thoroughly tested item (#5), but I don't know how far I'm going to get with that, or if it'll even really be necessary, since I plan on incorporating the documentation-process into code-templates as soon as I have it worked out.

Guiding Principles

The balance of those items, though, have led me to formulate a set of guiding principles that I follow with any code that I write that's going to be available to anyone other than myself. I frequently adhere to them even for code that I'm not planning on sharing or distributing, particularly if I expect that I'll need to set it aside for any length of time — after a while, I'll be as much a stranger to my own code as someone who's never seen it before. At that point, any of the old adages about commenting your code (because the sanity you save may be your own) apply as much to the application of these principles:

  1. Manage/control all public interface entities/members
  2. Raise errors as close to their ultimate source as possible
  3. If specific types are expected, test for those where they're expected
  4. If specific types are expected, assure that some common type-identity is defined or available to use to test the expectation
  5. Restrict inheritance until there is a demonstrable need for it
  6. Leverage inheritance to keep functionality in one and only one place

The specific shape of these — how they are implemented — may vary significantly depending on the language or platform(s) in use. They may even take some of their final shape from business- or project-level requirements. Here's how I see them applying to what I'm planning to do here, and specifically in Python...

Manage/control all public interface entities/members

The public interface of a class is important. Ultimately, it controls all avenues of interaction with each and every instance of the class. Python doesn't have real protected- or private-member capabilities — it's enforced by convention (and name mangling for private members), which does not prevent access to those members. Taken together, these facts open up a whole range of possibilities for bad interactions with instance members, though probably only with respect to instance attributes (properties). In application, what this usually means is that I'll define all public properties of a class using the formal property provided by Python, with getter, setter and deleter methods. The class-templates that I've shown already (the ones that allow for concrete properties, at least) have already hinted at this in their structure. An implemented (but very basic) properties-structure would look something like this for a property named PropertyName somewhere in a class:

_propertyName = None  # The internal storage of the property

def _GetPropertyName( self ):
    """
Gets the PropertyName property-value of the instance."""
    return self._propertyName

def _SetPropertyName( self, value ):
    """
Sets the PropertyName property-value of the instance."""
    # TODO: Type-check the value argument, raising an error if bad.
    # TODO: Value-check the value argument, raising an error if bad.
    self._propertyName = value

def _DelPropertyName( self ):
    """
"Deletes" the PropertyName property-value of the instance by setting it 
to None."""
    self._propertyName = None

PropertyName = property( _GetPropertyName, None, None,
    'Gets the PropertyName value of the instance.' )

In my expereience, most property-getter methods (_GetPropertyName) that don't refer to some other object, or that don't implement some sort of lazy instantiation or -loading strategy are very naïve. They almost have to be, though, since all they usually need to do is retrieve the interal-storage value that the property wraps, and outside of a few special cases, I rarely find myself doing more than this example shows. Most of my deleter-methods, barring variations in the deleted value being set are also pretty naïve...

The setter-method shown (_SetPropertyName) is also naïve as shown, but that naiveté would go away as soon as the type- and value-checks noted in the TODO comments were implemented. The implementations of those checks ties directly into my next guiding principle, so I'll cover that in more detail shortly. At least one of those checks (checking for types, usually) shows up in almost every setter-method I write, eventually.

Raise errors as close to their ultimate source as possible;
If specific types are expected, test for those where they're expected; and
If specific types are expected, assure that some common type-identity is defined or available to use to test the expectation;

These tend to be expressed in my code as type- and value-checking of function and method arguments more than anything else. That, in turn, tends to look a lot like emulating static typing of values and arguments, as is found in Java and all of the real C variants. To be brutally honest, there's a fair amount of truth to that observation (accusation?) — but I strongly believe that there's a significant contribution to several of my goals (#2, #5 and maybe #6, though that's often situational). A simple implementation, using the same setter-method shown above, and checking for a non-empty string or unicode value or None would look something like this:

def _SetPropertyName( self, value ):
    """
Sets the PropertyName property-value of the instance."""
    if value != None and type( value ) not in ( str, unicode ):
        raise TypeError( '%s.PropertyName expects a non-empty str or '
            'unicode value, or None, but was passed "%s" (%s).' % ( 
                self.__class__.__name__, value, 
                type( value ).__name__
            )
        )
    if value != None and not value:
        raise ValueError( '%s.PropertyName expects a non-empty str '
            'or unicode value, or None, but was passed "%s" (%s).' % ( 
                self.__class__.__name__, value, 
                type( value ).__name__
            )
        )
    self._propertyName = value

It's not difficult to imagine a scenario where, say, an instance property-value gets set to some value, other code happens, and somewhere along the line, perhaps a lot further in, an error gets raised because that initial value-setting was invalid. This kind of thing happens all the time, especially during the development phase of a project. Although Python's error-reporting (and traceback) functionality is pretty solid, this sort of bug can be difficult and painful to deal with. If, on the other hand, the initial value-set process actively checks for valid values (by type and/or actual value) and raises an error right then and there, it doesn't have a chance to get buried in other code after silently setting up the imminent failure. While I like Python's dynamic typing, and I like not having to specify a type for each and every variable, property and argument, there are times when it's advantageous to do so, and this sort of scenario is, hands down, one of those times.

There are some trade-offs that have to be made, though, if the more open, dynamic nature of Python code is to be preserved while still actively type- and value-checking potential points of failure. First and foremost among them, I think, is that if type-checking is going to be used, there must be some very abstract types defined for any checks that aren't looking for baseline or primitive types. That is: If a method expects a number for a given argument, checking the type of that argument is easily managed by looking to see if the value is of a numeric type (int, long, float, etc.). If an argument is expecting some other type, say an XML node (whether it's a tag, a text-node, CDATA or a comment), there must be some low-level type or interface (maybe something called IsDOMNode for example) defined that can be used to actually check the incoming value-type. That's not difficult, but it requires planning and/or discipline to make sure it gets done.

Restrict inheritance until there is a demonstrable need for it

This is, I gather, a pretty controversial practice, and not just in the realm of Python code. The typical argument against it that I see in Python-oriented discussions is usually some variant of the argument that we're all consenting adults here, so let me subclass as I see fit. OK. Fair enough. As the other consenting adult in this relationship, my concern would be that someone might want to subclass something in my code, even legitimately, that flat-out was not intended to be subclassed. That someone might well be me some time later, after I've long since forgotten why a class wasn't deemed safe or desirable to subclass. Since I'm planning on distributing the actual code, there's nothing to prevent it happening, but if it's going to happen, I'm going to require that doing so requires at least looking at the original code. I'll commit to making sure I explain why it's not intended to be extended in the code itself. After that, and hopefully after actually looking at the explanation, any other consenting adult can make their own decision, hopefully an informed one, with the understanding that they do so at their own risk.

Leverage inheritance to keep functionality in one and only one place

Python is unusual (maybe unique) in that it allows nearly unrestricted multiple inheritance — there are a few restrictions, to be sure, but they are nowhere near as stringent as the ones found in other languages, where, typically, any given class can only inherit from one other class, concrete or abstract, and any number of interfaces. Inheritance can be something of a bugaboo, breaking encapsulation (there are several explanations and examples of how this can happen that can be found easily – take your pick). I have settled on what I believe to be a pretty solid balance between using inheritance and preserving encapsulation by using small abstract classes as mixins for very specific, usually simple, pieces of common functionality. Technically, this still presents the inheritance-breaks-encapsulation risk, but if the functionality is simple, cohesive, and sensible, the ability to keep all of its code in one place is, to my thinking, a good trade-off for that risk.

Less-Imperative Items and Next Steps

I generally like to have templates and/or snippets set up for development, so that I don't have to spend (read as waste) a lot of time creating starting-points for common items like modules and classes. I mentioned incorporating the documentation-process into code-templates earlier, but before I can lock down any of the starting-point templates noted above, I need to work out how I'm going to provide in-code documentation so that it can be included in those templates. There's a fair amount of thought that has to go into that, and I'm not sure if I'll be able to get any real code interspersed into it, but I'm going to try...

No comments:

Post a Comment