Tuesday, April 18, 2017

PEP-8 vs. My Coding Standards

Before diving into the markup module of the idic framework, I wanted to take a quick look at coding standards from an "official" perspective/standpoint. Specifically Python's PEP-8 standards, since they've been a topic of some interest for me recently. I'm jumping the gun a bit — as I'm writing this, most of the markup module is actually mostly complete, but I haven't posted any of it, but bear with me, please...

Today's post ended up being something that felt better as a global page in the blog (I'll likely refer to it from time to time elsewhere/later), so I've copied it as such: PEP-8 vs. My Coding Standards. As I update my thoughts on this topic, that page will get updated…

If you've been reading the code I've been sharing here, and are aware of the PEP-8 standard, you may have noticed that I'm not adhering to all of the PEP-8 guidelines. A few of them I've consciously chosen to not abide by, for various reasons that I feel fall into its A Foolish Consistency is the Hobgoblin of Little Minds criteria. A handful (I think) are hold-overs from working in projects that had very specific (sometimes rigid) standards for variable names. Those habits were formed over the course of years. In one case that helped form those habits, there was an even expectation that the build-process for the associated codebase would actively check for compliance with those style-guidelines, and if they weren't followed, the build would fail. While that helped reinforce those guidelines, as well as they habits they formed, I think that took things just a bit too far.

That same hobgoblins section raises some thoughts for me about application of rules around coding standards (naming-conventions in particular). Setting aside my admitted habits, I tend to think along lines that I thought were best expressed by a favorite author as:

Look, that's why there's rules, understand? So that you think before you break 'em.
Terry Pratchett, from Thief of Time

PEP-8 itself sets some priorities that I feel fall pretty much in line with that:
Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important. (emphasis mine).
Yes, there are rules. Yes, they are there for a reason. But they aren't set in stone, and there may be valid reasons to bend or even break them.

Still, style-guidelines are valuable, particularly in an environment where more than one person is going to be in the position of reading the code. Since I know I have some habits that would be considered bad by PEP-8 standards, I'm going to run down the list, noting which ones I've kept, which ones I've dropped (and why), and which ones I should probably adopt that I missed somewhere along the line.

Code Layout

Many of the PEP-8 standards about code layout I follow to the best of my ability to do so:

  • Indentation of 4 spaces per level Yes
  • Using spaces for indentation, not tabs Yes
  • Line length of 79 characters max: Usually
  • Line-lengths of 72 characters for long/multiline strings: Probably not — but they will conform to the more general 79-character line-length
A few I actively try to adhere to, but likely with mixed results:
  • Line-breaks before operators
  • Blank-line usage
  • Imports (though I think I'm pretty consistently adhering, I haven't gone and looked)
  • Wildcard imports (probably more consistently not adhering, simply because the IDE I'm using isn't smart enough to add individual import-items in to the code, even though it is smart enough to recognize they exist)

The one remaining item in the Code Layout standards (Module-level "dunder" names), I know I'm not adhering to. They show up in my module- and package-header template files:

#-----------------------------------#
# File metadata                     #
#-----------------------------------#
__author__  =       'Brian D. Allbee'
__version__ =       '0.1'
__copyright__ =     'Copyright 2017, Some rights reserved by Brian D. Allbee'
__license__ =       ('This work is licensed under a Creative Commons '
                      'Attribution-ShareAlike 4.0 International License. '
                      '(http://creativecommons.org/licenses/by-sa/4.0/)')
__credits__ =       ['Brian D. Allbee']
__maintainer__ =    'Brian D. Allbee'
__email__ =         'brian.allbee+module_name@gmail.com'
__status__ =        'Development'
I don't know whether I'll bother with changing those, frankly. On one hand, I find that I can more easily read these they way they are right now, though so far I haven't had much need to read them. On the other hand... Well, I don't really have an on the other hand scenario that I can point to.

String Quotes

I suspect that most developers working in Python will have naturally picked up the guidelines here — Using one type of quote (whether single or double) is probably habitual, and switching to the other to avoid having to escape the preferred quote is something that's pretty easy, provided that you know it's going to happen when you start typing.

Me, I tend to use single-quotes out of habit, and I stick to that pretty well. In the code I've shown so far, there's one pattern where I suspect I break from that pretty consistently: The descriptions of arguments and such in my documentation decorators. That's usually going to happen because I don't know if I'll need to use single- or double-quotes within the description-string until I've written it, by which time it's almost always going to be faster to escape the one or two characters in the string (or, sometimes, to just type \ then and there) than to replace the characters at the ends of the string.

Whitespace in Expressions and Statements

Spaces inside (), [] and {} are going to be a pretty consistent offense (if such it is) on my part. I'm not likely to change this, I'm afraid, at least not soon, because while readability counts in general (PEP-20/The Zen of Python [execute import this in a Python session]), the simple truth of the matter is that right now my ability to read it counts a bit more than anyone else's, and I find it easier to read with those spaces. If things get to a point where anyone but me is actually using my code, I may set something up in the build-process to strip out those additional spaces, but I don't expect that to happen for a while.

Separate, but related: Odds are good that I've got several instances of whitespace at the ends of lines, if only because as I'm shuffling code around, I usually just don't think about removing any that sneak in.

Comments

Keeping comments current and relevant is something that I try to do, but I'd be very surprised if there weren't at least a few here and there where something changed but an associated comment didn't get changed accordingly. The longer a codebase is undergoing active development, the more likely that sort of disjoint is, I suspect.

Block-comments are usually going to be OK, I think — though there will, I'm sure, be chunks of comments that won't conform because they're items I need to differentiate quickly and easily. Usually that will be chunks of actual code, commented out because it's not needed for functionality reasons, but could be useful for testing or logging while code-changes are being made. In those cases, they'll usually be commented out at the beginning of the line, making it easier to differentiate between a real comment and some code that's been commented out but kept for some reason.

I try not to use inline comments, though there are probably a few here and there in some of my older code and projects. I doubt that there are many in the code here, if only because it's just not old enough for them to start accumulating yet.

Documentation Strings

May be moot, since I plan to use my documentation decorators in all my code, and they will generate results that are more consistent (and more verbose) by design.

Naming Conventions

If there is any single category of guidelines that my code is going to violate, this is going to be it, but there's a reason behind it. The target audience, such as there is one, for most of the code I'm expecting to write for this blog could be described pretty well as web application developers. To my thinking, that means there's a pretty good chance that those target-audience users will be familiar with JavaScript and its conventions. Some classes (particularly in the markup-generation parts of the framework) I actually want to mirror their JavaScript equivalents as closely as makes sense. JavaScript uses CapWords (or CamelCase) in its class-names, and mixedCase for object properties and methods (using the names of the styles as listed in PEP-8). In portions of the framework, then, that overlap that desire to mirror a JavaScript API equivalent, I'm using naming-conventions that will be more familiar to developers with JavaScript experience. That's a conscious choice on my part, and since I expect/hope that most of this code will be used by members of that web application development group, it has leaked out across to other sections of code that aren't as directly related...

Outside of those JavaScript-related cases:

  • Short, all-lower-case names for modules and packages — Yes;
  • CapWords for classes (which holds true in both contexts) — Yes;
  • Function and class-member names — No (but there's a reason)
Functions and class-members (whether methods or properties) that do not have a JavaScript equivalent will also use CapWords naming, because there will be cases where classes have JavaScript and non-JavaScript members alike, and I'd like to have some sort of readability-level hinting in those member-names indicating whether they're JavaScript equivalents or not. The recommended convention (lower_case_with_underscores) I find to be less readable in general, particularly in cases where those members exist in a class that also has JavaScript-equivalent members. It's not a great compromise, to be sure, but I think it's the best compromise under the circumstances.

Given that the framework is all one project, and that

Consistency within a project is more important.
that more or less mandates that all of the framework's codebase should at least prefer my non-standard, non-PEP-8-compliant naming conventions for the sake of consistency. Applications built on top of that framework may not share that preference, though, and shouldn't reflexively abide by them.

Programming Recommendations

PEP-8 also contains a pretty substantial list of Programming Recommendations. Many of them are moot for my efforts, at least so far — they relate to capabilities of Python that I'm simply not using (yet). Most of the rest I stick to pretty well, I think, though even in those cases, there may be additional relevant information or concerns:

  • Singleton comparison (e.g., x is not None vs. x != None): Probably moot, since I generally don't use None except in boolean-compatible ways
  • Deriving custom exceptions from Exception: Yes (to be honest, I wasn't aware that there was an underlying exception-class deeper than Exception, so I pretty much got lucky in this case)
  • Catching specific exceptions: Yes, though I know there are a few instances of bare except code that I'll need to track down and reconcile by specifying the base Exception
There are a couple of other areas that I'll need to take a closer look at:
  • Newer exception-binding (e.g., except Exception as error vs. except Exception, err). I hadn't seen this structure until I started this review, so I'll probably need to go looking for the older-style cases and correcting them.
  • Making sure try/except/finally is used to assure resource clean-up where it's relevant. I'm pretty sure I'm already doing that (out of habit), but another look-through wouldn't hurt

Checking Compliance with pylint

There's a tool called pylint that can be used to check and rate Python code. It checks a variety of items across an entire code-base, including many (if not all) of the various PEP-8 standards, and generates a report of all the various items that it finds, plus a rating of the code it analyzed on a 1-to-10 scale. Knowing that I have a slew of varied violations likely, I was expecting it to pretty much report that the code I was writing was somewhere in what might be called a horrible state, and that expectation was borne out when I ran it against the idic package just before the markup module was finished:

% errors / warnings by module
-----------------------------

+-------------------+------+--------+---------+-----------+
|module             |error |warning |refactor |convention |
+===================+======+========+=========+===========+
|idic.markup        |56.00 |39.89   |39.53    |49.95      |
+-------------------+------+--------+---------+-----------+
|idic.doc_metadata  |44.00 |42.13   |44.19    |30.71      |
+-------------------+------+--------+---------+-----------+
|idic.unit_testing  |0.00  |12.92   |11.63    |14.70      |
+-------------------+------+--------+---------+-----------+
|idic.serialization |0.00  |5.06    |4.65     |4.63       |
+-------------------+------+--------+---------+-----------+


Messages
--------

+-------------------------------+------------+
|message id                     |occurrences |
+===============================+============+
|bad-whitespace                 |5259        |
+-------------------------------+------------+
|bad-continuation               |1319        |
+-------------------------------+------------+
|trailing-whitespace            |1212        |
+-------------------------------+------------+
|invalid-name                   |538         |
+-------------------------------+------------+
|protected-access               |91          |
+-------------------------------+------------+
|unidiomatic-typecheck          |88          |
+-------------------------------+------------+
|line-too-long                  |27          |
+-------------------------------+------------+
|(There are more)                            |
+--------------------------------------------+
The bad-whitespace, trailing-whitespace and invalid-name items were more or less expected to be widespread, for the reasons I mentioned above. I'll have to look at the details on the bad-continuation items, to see if they are really a problem, or if they're something in my coding-style that I'm doing for an identifiable reason. The protected-access items are, I'm sure, places where I'm directly accessing a protected member of one class-instance from within an instance of another class, probably in cases where I need to set a property-value in the first from within the second, and that property is read-only from a public-interface perspective, but I'll take a deeper look at some point later to determine if that's really a problem. The unidiomatic-typecheck items will warrant a closer look as well, since I'm not sure what those entail. At and below the count/level of the line-too-long items, I'm not terribly concerned — so long as the code is functional and testable, it seems likely that they are minor, tweaky little things. That said, there are 45 reported item-types, and 224 individual items, so those will probably warrant a closer look before I wrap up the modules in the current idic package-structure. The final rating, using pylint defaults is reported as:
Global evaluation
-----------------
Your code has been rated at -18.45/10

On a scale from 0 to 10, -18.45 sounds pretty bad, but I suspect that a lot of that bad raing will go away once the bad-whitespace, trailing-whitespace and invalid-name items are ignored. Doing that is pretty straightforward, and can even be set up so that the rule-set for the idic package is all stored in one place, so that a pylint run that has specific exceptions relating to the idic code-base won't contaminate pylint runs against other packages and modules:

  • Create a custom pylint configuration-file:
    $ pylint --generate-rcfile > pylint.idic.rc
  • Add the bad-whitespace, trailing-whitespace, invalid-name, and protected-access messages to the disable list (I prepended hem in my case, since there were already quite a few in the list generated in the pylint.idic.rc file generated by pylint itself):
    [MESSAGES CONTROL]
    # ...
    
    # Disable the message, report, category or checker with the given id(s). You
    # can either give multiple identifiers separated by comma (,) or put this
    # option multiple times (only on the command line, not in the configuration
    # file where it should appear only once).
    
    # ...
    
    disable=bad-whitespace,trailing-whitespace,invalid-name,protected-access # ...
    
  • Re-run pylint using the package-specific configuration-file:
    pylint --rcfile=pylint.idic.rc idic
That yields:
% errors / warnings by module
-----------------------------

+-------------------+------+--------+---------+-----------+
|module             |error |warning |refactor |convention |
+===================+======+========+=========+===========+
|idic.markup        |56.00 |55.17   |39.53    |63.63      |
+-------------------+------+--------+---------+-----------+
|idic.doc_metadata  |44.00 |26.44   |44.19    |16.95      |
+-------------------+------+--------+---------+-----------+
|idic.unit_testing  |0.00  |10.34   |11.63    |14.67      |
+-------------------+------+--------+---------+-----------+
|idic.serialization |0.00  |8.05    |4.65     |4.76       |
+-------------------+------+--------+---------+-----------+


Messages
--------

+-------------------------------+------------+
|message id                     |occurrences |
+===============================+============+
|bad-continuation               |1319        |
+-------------------------------+------------+
|unidiomatic-typecheck          |88          |
+-------------------------------+------------+
|line-too-long                  |27          |
+-------------------------------+------------+
|(There are more)                            |
+--------------------------------------------+
... and a rating of 4.38:
Global evaluation
-----------------
Your code has been rated at 4.38/10 (previous run: -18.45/10, +22.83)
If all of the various bad-continuation items are either corrected, or deemed to be something I'm doing for a reason, and go away (simulated by adding that message to the configuration-file's disable list), the results are still better:
% errors / warnings by module
-----------------------------

+-------------------+------+--------+---------+-----------+
|module             |error |warning |refactor |convention |
+===================+======+========+=========+===========+
|idic.markup        |56.00 |55.17   |39.53    |48.28      |
+-------------------+------+--------+---------+-----------+
|idic.doc_metadata  |44.00 |26.44   |44.19    |31.61      |
+-------------------+------+--------+---------+-----------+
|idic.unit_testing  |0.00  |10.34   |11.63    |17.82      |
+-------------------+------+--------+---------+-----------+
|idic.serialization |0.00  |8.05    |4.65     |2.30       |
+-------------------+------+--------+---------+-----------+


Messages
--------

+-------------------------------+------------+
|message id                     |occurrences |
+===============================+============+
|unidiomatic-typecheck          |88          |
+-------------------------------+------------+
|line-too-long                  |27          |
+-------------------------------+------------+
|(There are more)                            |
+-------------------------------+------------+
and a rating of
Global evaluation
-----------------
Your code has been rated at 8.62/10 (previous run: 4.38/10, +4.24)
I'm going to leave the bad-continuation items in place until I have a chance to look at them in more detail, though.

From my perspective, knowing that my coding-practices are going to generate some pylint errors and warnings, I'm OK with disabling some of them (at least so far). If I were using a different IDE, one that handled certain tasks differently (I won't go so far as to say better), or if the code-base that I'm writing weren't intended to break some of the rules, it might well be a very different story.

I'd also give some serious consideration to including a pylint run as part of a quality-check during a build or commit process, assuming that I could figure out a good/useful way to integrate them. For now, I'll just run pylint periodically to see what the results look like, and as a check for any egregious omissions or errors on my part...

No comments:

Post a Comment