Cleaning up legacy python code

Python is growing in popularity which is a great thing! And with that growth we now are seeing more and more legacy python project. Occasionally you are going to inherit one of these “legacy” projects. Here’s some tips on how to get it under control.

What is a legacy project?

Over the years I have heard lots of definitions of what a legacy project is. Here’s a few of the gems I’ve heard used to describe these projects:

  • An “older” project that has been around forever
  • A code base without any kind of tests
  • The project that no one wants to work on
  • “Everyone who worked on this left the company years ago…”

And sometimes the code isn’t old. A lot of times a project will be done real quick and put into production before everyone realizes that there’s a better way. (e.g. using a different framework) By the time that happens the “legacy” project is doing an adequate job and management is afraid to touch it. This is a pretty legitimate concern to higher ups in a company: “If it ain’t broke, why try to fix it?

So, what are the best things to do with a legacy project?

Make sure it is in version control

Before you do anything, make sure the project is in some kind of version control system. Too often I have seen “proof of concept” programs that were thrown together and pushed into production without any thought given to making sure that the code was somewhere safe.

This is doubly true if the code is anyway important to business operations: You do not want to be the last person who was seen with the only copy of the code. If for some reason this project is not in git/subversion/etc. put it in there NOW. If your company doesn’t have a version control system, beg your manager to invest in one ASAP.

Delete commented out code

Once a project is under version control, one of my favorite tasks is to delete any commented out code.

The older the code base the more commented out code there tends to be. I’m not talking about 1 or 2 lines here and there. I encounter dozens to hundreds of lines of code commented out on a fairly regular basis.

Commented out code is a waste of cognitive time. New developers (like you) who look at the code will see all of that code and try to understand why it is there but hidden in a block of comments.

In my experience it is there because something in the project changed and someone is hedging their bets that the old code will be needed again. So it gets commented out and then haunts the code base forever and ever.

Once the code is under source control, the fact that it got deleted will be recorded in the repository. If for some reason it becomes necessary to revive that code, any developer on the team can go and revisit the commit history and pull out just what is needed. (Spoiler alert: Most of the time you will never need that commented out code.)

Running tests/adding tests

Now that your legacy python project is under version control, we can start do some more interesting things. One of the first things I like to do is to try and run any unit tests that might be there.

A WORD OF CAUTION: Sometimes a project will have unit tests that aren’t quite “unit” tests. In other words, beware of any tests that might reach out and talk to a live system. I was bitten by this recently when I ran a set of unit tests that were doing destructive things to a production cache system. Thankfully we were able to recover it quickly, but I still get grief about it every few weeks.

If there are no tests, this is a great time to add some. Most bosses are cool with test because they usually don’t impact the existing code. Of course, check first before you add anything.

As you are running the tests, consider using to see how well the code base is being covered by the tests. If there are any “critical” spots where the tests aren’t hitting, those should be the first spots you should write tests for.


Something to consider doing at this point is running some type of linter over the code to see how “healthy” it is.

A linter is a bit of software that examines code and looks for anything suspicious or “wrong”. Pylint is a great python specific linter that can look at python code and offer some suggestions.

By default pylint is pretty verbose and will flag all kinds of things that might not really be that important (such as lines longer than 79 chars). There are ways to control the output, and you should check here for more information about that.

So what should you look for in the pylint output? Personally I like to look for:

  • Unused variables
  • Anything that is notes as a potential bug

If this project is going to be worked on, then these are things that you probably should consider fixing them as you add features. Removing unused variables is no brainer. It reduces visual noise and helps developers reason with the code.

Things that are flagged as potential bugs: These need to be handled carefully. Sometimes the code is working in spite of the bug.

Flake8? Not so fast!

Since I’ve mentioned pylint and it’s awesomeness, I should also mention formatters like pep8 or my favorite flake8. These tools can be used to reformat python code to make it more pep8 complaint.

While this is normally a good thing, I don’t recommend doing this on a legacy project right away. My reason for saying this is because if the code is working (especially if it is in production) any changes made to it should be minimal so that you preserve the code as it is.

While most of the time the tools will not modify the code in any destructive way, I have seen strings that were too long get mangled a little bit. This can lead to confusion about what the line was actually doing.

My personal approach would be to have unit tests in place first, and then apply flake8. Also, if you are developing new feature to add to the code base then you should be using flake8.

More idiomatic python

By this point you probably have a really good grip on your legacy project. Depending on what the future holds for the project (new features, or just maintenance) you might want to consider revisiting some of the suggestions from pylint.

One thing I have seen pop up in legacy code projects is code that is “unpythonic”. This includes things like badly formed for loops, checking if “something == None”, and other spots where things could just be more idiomatic.

This is a topic I really enjoy learning more about as I feel I am still learning the true way. If you would like to learn more, I highly recommend reading Effective Python as it is full of great examples of pythonic coding techniques.

Wrapping up

With the exploding popularity of python we are now starting to see more and more legacy python projects. Thanks to some basic tools and the beauty of the language itself, this doesn’t have to be a scary proposition like it is with other languages. Java, I am looking directly at you.

Pssst… Quick favor?

I’m putting together a course on Debugging Python code. If you want to get in on this, sign up below to learn more!

Sign up to learn more!

python __debug__

The other day I stumbled upon Python __debug__ and I thought I would share some interesting things I learned about it.

What is python __debug__?

It is a constant that Python uses to determine if calls to assert should result in code being generated. If you have the -O optimization flag  set, then assert calls will not be “triggered” in your code, even if the condition it is testing is true.

Interesting fact: according to the documentation it is one of two constants that will raise a SyntaxError exception if you attempt to assign something to it. (The other constant that does this is None.)

For example, in python you can do this:

How to assign false to true in python

Totally legal. Not smart, but legal.

And that is totally legal in Python 2.x. (Python 3 wisely does not allow this!) I would not recommend doing this, as your co-workers will hunt you down to express their “displeasure”. But if you try that with __debug__ or None, you’ll get an error:

can't assign things to __debug__ or None

The only two “true” constants in python

Normally I’m not a fan of these types of language tricks, but this looked pretty cool to me and I thought I would share. I do find it interesting that in Python 3 they have True and False locked down to be true constants. Honestly, I thought there would be more language keywords that would be protected like that.


Improving your python: using pylint and flake8 in emacs

python using pylint flake8 in emacsIn a previous post I mentioned an issue I had with some python code that failed in a way I hadn’t expected.

Long story short, I was in the wrong which does happen from time to time. Aside from the actual bug, there was another failure: I thought my tools were checking my work. It turned out this was not the case!

I’ve been a big fan of flake8 for some time. Its ability to find PEP8 issues is a big help, and most of the time I find that it tends to root out problem code before it gets too bad. When I was dealing with that bug I was using PyCharm. I’ve since started using emacs, but would that have made a difference?

It turns out if I had been using pylint I might have caught this issue. I’m placing the emphasis on might there because there’s a bit of overlap between flake8 and pylint that I wasn’t aware of. Let’s explore them a little bit more. Continue reading

Becoming a better programmer

Becoming a better programmer one step at a time

Navy SEALs jump from the ramp of a C-17 Globemaster III over Fort Picket Maneuver Training Center, Va. (Air Force photo by Staff Sgt. Brian Ferguson)

In the Navy SEALs they have a saying: Everyday you have to earn your trident. The trident is the symbol that sailors earn as they complete the training that makes them part of the elite SEALs. It is possible to lose one’s trident however. To prevent a behavior that might cause this, the SEALs remind themselves that everyday they have to “earn (the right to wear) their trident”.

As I spend more time in the programming world I have come to realize there is great wisdom in this approach. A good friend of mine once told me that “Experience and skills are expiring assets.” In other words, if you don’t use them, you loose them. Just because your job title has the word “Senior” in it, you don’t automatically get a pass. You need to earn that title every day.

So as a programmer, how can you earn your place? How can you improve who you were yesterday? What does it take to make sure you are becoming a better programmer? Here’s what I’ve been doing. Continue reading

On moving from Java into Python

Before coming to Python, I did a lot of work in Java. Java is pretty good language and From Java into Pythonenvironment, but it is different than Python. Beyond the language syntax there’s a ton little differences to be aware of. Sometimes when we move from Java into Python it shows by some of the things that we do.

Here’s some things I’ve learned over the years, (or things that I’ve stubbed my toes on recently). Continue reading

Debugging like Elon Musk

Every time I think I’m doing pretty good with my projects, I take look over and see what Elon Musk is doing. Then I instantly feel like I’m wasting my life. That guy does such huge things and he does them so quickly it is mind boggling. What is his secret? Can I be like that?

The answers is yes, if you use “First Principles” reasoning. I have found that when you apply first principles to debugging software that you will get to better solutions. So what is first principles? Here’s Elon explaining the how and what of it:

So here’s the take away:

Continue reading

Adventures with minimongo

python minimongo is the best choice

Too many choices!

Over the last few years I have wound up using mongo as my data store on several projects. In fact, it has been a while since I have written any SQL! For my latest project I again reached for mongo to hold my data, but this time I decided to use the Python minimongo project as my ODM (Object Document Mapper).

What is minimongo?

An ODM is to no-SQL what an ORM (Object Relation Model) is to a SQL database. For Python, minimongo is a small lightweight wrapper around mongo. Minimongo is built on Continue reading

Why flask

“Why flask? Why not Django?”

Recently I was asked this after introducing a new subsystem at work. Originally part of a monolithic Django app, this was new micro-service was one of the first pieces to be split out on its own. I had chosen Flask to be the web framework, and now it was time to explain why…

I’m not anti-Django, but…

Continue reading