Using functools.partial to refactor repetitive code

The other day a friend made a comment about iterative development and it got me thinking. In some situations it is a good approach to get things going, but there is a dark side to it: crufty, nasty code. Functions that we fear to touch. Code that screams out for a refactoring.

It got me thinking about the code I hacked together for Remote Matcher. It’s shiny and new, but does it have a dark side?

How bad could it be?

For this project I developed “iteratively”, and I decided I needed to stop and see what shape the code was in. It definitely needed some attention, and not just because there were todo comments saying “THIS IS TERRIBLE. PLEASE CLEAN IT UP”. (I literally put that in the code. Twice.)

Here’s a quick enumeration of the sins of this code:

  • Repeated strings (like, we check for a string, then go and use that string again on the next line)
  • A bunch of elif statements that grows every time a new data source is added
  • There are several long constants that get imported (and they will grow every time a source is added)
  • The same 2 functions are called over and over, but with slightly differing parameters
  • Despite what this file’s name suggests, there’s business logic starting to leak into its functions… even though we have a dedicated module that is supposed to handle that logic!

And that’s in just 25 lines of code.


Clearly things need to change. I have 2 new sources I want to add to the system, and the thought of that function growing by at least 4 more lines really made me mad.

The strings could be consolidated, but that wouldn’t help with the leaking logic or the growth of the if statement. Usually I’m ok with a little bit of repetition in code, but at this point we were clearly spiraling out of control. I kept thinking that if I could get this code into a dict and then do a lookup, I could probably get things under control.

As I thought more about this I had a flash of insight: I could use Python’s functools module to help with the function invocation!

I decided to take a swing at the approach and it worked! Rather than try to explain what I did, I made a video showing my approach. Here’s me walking and talking my way through this refactoring:
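To give a flavor of the technique, here is a minimal before-and-after sketch. The source names, URLs, and fetch/parse helpers below are made up for illustration; they are not the actual Remote Matcher code:

```python
from functools import partial

# Illustrative stand-ins for the real fetching and parsing code.
WWR_URL = "https://example.com/wwr.rss"
REMOTEOK_URL = "https://example.com/remoteok.json"

def fetch(url, parser):
    """Pretend to download a source and hand the payload to a parser."""
    return parser("payload from %s" % url)

def parse_rss(payload):
    return ["rss job: " + payload]

def parse_json(payload):
    return ["json job: " + payload]

# Before: an elif chain that grows with every new source.
def get_jobs_before(source):
    if source == "weworkremotely":
        return fetch(WWR_URL, parser=parse_rss)
    elif source == "remoteok":
        return fetch(REMOTEOK_URL, parser=parse_json)
    raise ValueError("unknown source: %s" % source)

# After: a dict lookup of partials. Adding a source is one line,
# and the dispatch stays free of per-source logic.
SOURCES = {
    "weworkremotely": partial(fetch, WWR_URL, parser=parse_rss),
    "remoteok": partial(fetch, REMOTEOK_URL, parser=parse_json),
}

def get_jobs(source):
    return SOURCES[source]()
```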

Parting thoughts

Although the total line count didn’t go down tremendously in the video, the code in the file is on the path to being more streamlined, with less business logic laced into it.

The root cause of this was me hacking on it to “just get it working”. Since I knew I was going to have 2 similar-but-different data sources, I didn’t put a lot of thought into “correct” software architecture principles early on. Thankfully I revisited this code before it got too nasty.

So, the moral of the story: revisit your code and look for opportunities to simplify and consolidate things. Also, Python’s functools module is pretty awesome! Things like partial sound like magic, but when you need them they work perfectly.

Testing AppEngine cron jobs locally

Lately I’ve been doing a lot with Google AppEngine. It has a lot of great features, but to get them you have to give up a few things. Sadly, I discovered that includes the ability to locally run “protected” API endpoints. At least until I discovered this one strange trick to make everything work…

The setup

So AppEngine applications need an app.yaml file that defines a lot of things needed to run the code. It also defines the routing for the app’s endpoints, and who is allowed to access them. (Basically either administrators, or the whole world.)

My app makes use of a cron.yaml file to periodically ping certain endpoints in the app. The catch is that I don’t want just anyone hitting those endpoints; a bad actor could hammer a sensitive endpoint and kill my API access.
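For illustration, a cron.yaml entry looks something like this (the endpoint and schedule here are made up, not my actual ones):

```yaml
cron:
- description: scan job sites for new postings
  url: /tasks/scan
  schedule: every 30 minutes
```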


Thankfully, Google recognized this and allows you to set up endpoints in the app.yaml file with a login: parameter. Setting this to “admin” tells AppEngine that only logged-in users with admin rights on the domain are allowed to hit that endpoint.
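In app.yaml that looks something like this (the handler path and script name are illustrative):

```yaml
handlers:
- url: /tasks/.*
  script: main.app
  login: admin   # only domain admins may hit these endpoints
```

Handily, requests from AppEngine’s cron service are treated as admin requests, so your scheduled jobs can still reach these protected endpoints in production.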

Yay! I don’t have to write any custom login/user management code. But….

The problem

If you are running the code locally, say while doing development, you are probably going to need to hit those endpoints to make sure the damn thing is working. Right?

Well, the local dev server doesn’t know who is and isn’t logged into Google… because it is only running on localhost! Therefore having the login set to “admin” means you will never be able to access that endpoint.

Boo Hoo, HTTP 302 for you.

So, what do we do? Commenting out the login: field will let you access it locally, but what if you accidentally deploy that into production? (Spoiler alert: you are screwed.)

Run to the console

Although dev_appserver.py is the cause of our problems, it also turns out to be the solution!

When dev_appserver.py boots, it not only starts your app, it also starts a lightweight admin app. This admin app runs on localhost:8000 by default and provides all kinds of useful tools, like a DataStore viewer and… a cron utility!

Going to localhost:8000/cron brings up a page that lists all of the cron jobs registered by your application, the schedule each one runs on, and…. wait for it…. a button to kick off that job!

Yes, clicking that button makes the admin console trigger your cron job so that you can run it and see the results locally! Yay for debugging locally, not in production!

Other tricks

The admin console is pretty awesome and has lots of other useful tricks up its sleeves. Here’s some of what I use it for:

  • Doing quick checks on entities stored in the DataStore
  • Faking incoming XMPP and SMTP messages (I’ve never tried this, but it looks pretty cool for one-off testing)
  • A memcache viewer/editor
  • An interactive console

That last one is pretty sweet. Since I can’t seem to start up an IPython terminal AND connect it to my app, this is the next best thing. From the webpage you can type in some Python code and it will execute it for you.

Perfect for those times when you just want to delete all of your entries because you had a horrible misspelling in one of the field names.

Not that I’ve ever done that.

If you are curious to see the app I built using AppEngine, check out RemoteMatcher! It is a remote job aggregator that scans a bunch of job sites and only emails you the ones that match your interests. No more scanning tons of boards; just check your inbox for the best matches.

Running a daily mailing list with Python and MailChimp

So I’m a really big fan of Stoic philosophy. I really like the way it prepares us for troubles in life, and I thought it would be really cool to have a daily email go out and give you a shot of Stoic inspiration for the day. And since I liked the idea, why not start a mailing list and share it with others?

The first step was to go to MailChimp and set up a mailing list. Getting people onto your mailing list is a huge topic and I won’t really go into detail here, but if you’re interested in learning more, tweet me at @nloadholtes and I’ll whip up a post for you. (Here’s the list if you want to join it.)

The next thing needed was to organize these quotes in a way that was usable. I’m using a Google spreadsheet because it’s just really easy to put stuff there. Simpler to maintain than a database, this choice turned out to be a pretty good move! There are Python libraries that can easily manipulate these spreadsheets.

My (basic) Workflow

Every Sunday evening I would go through my list of quotes in the spreadsheet. I usually just did a sort on the “date_used” column, then chose a quote that I had not used in a long time and set it into an email template that would go out on a given day.

Doing this was an extremely manual process. In the beginning I could get it done fairly quickly, in about 20 minutes. But after a while that got very old, and there were a few days where I actually missed setting up the emails because I just didn’t have the time or energy on Sunday night.

Another problem I ran into was human error. When you are copying and pasting from a spreadsheet into a separate browser window, it’s very easy to lose track of which quote you’re working on and what day it is supposed to go out. Additionally, there was a weird mental stress that popped up, but more on that later.

Putting this process into a script seemed like a very obvious way to make my life easier.

Python + MailChimp API = <3

The script I wound up writing randomly chooses 5 older quotes that have not been used in the past 90 days. It takes those quotes and generates an API call to MailChimp for each one to create an email campaign, one for each weekday. As it chooses a quote, it updates the “date_used” cell with the date the quote will be published on. Here are the things you need to make this happen:

  • A MailChimp account (free is fine)
  • A google spreadsheet with quotes (See this example sheet, and make a copy!)
    • An API key for access to that spreadsheet. (You will need read and write access, see this documentation for details)
    • The “key” id for the spreadsheet (this is the long string in the URL of the spreadsheet)
  • `pip install mailchimp3 gspread` to get the Mailchimp library and the GSpread library

With those pieces, you are ready to rock! Here’s what my code looks like:
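A condensed sketch of that flow, assuming sheet columns named “quote”, “author”, and “date_used”, plus illustrative sender details (the real script’s internals differ a bit):

```python
import json
import os
import random
import sys
from datetime import datetime, timedelta

import gspread
from mailchimp3 import MailChimp

SHEET_KEY = "<long-id-from-the-sheet-url>"   # hardcoded, as confessed below
LIST_ID = "<your-mailchimp-list-id>"

def get_quotes(start_date):
    """Pick 5 quotes not used in the last 90 days, one per weekday."""
    gc = gspread.service_account()  # credential loading simplified here
    rows = gc.open_by_key(SHEET_KEY).sheet1.get_all_records()
    cutoff = start_date - timedelta(days=90)
    eligible = [r for r in rows if not r["date_used"]
                or datetime.strptime(r["date_used"], "%Y-%m-%d") < cutoff]
    picked = random.sample(eligible, 5)
    for offset, quote in enumerate(picked):
        quote["send_date"] = (start_date + timedelta(days=offset)).strftime("%Y-%m-%d")
        # ...also write send_date back into the sheet's "date_used" cell...
    return picked

def create_campaign(client, quote):
    """Create, fill in, and schedule one MailChimp campaign."""
    campaign = client.campaigns.create(data={
        "type": "regular",
        "recipients": {"list_id": LIST_ID},
        "settings": {"subject_line": quote["quote"][:50],
                     "from_name": "Daily Stoic",        # illustrative
                     "reply_to": "you@example.com"},    # illustrative
    })
    client.campaigns.content.update(campaign_id=campaign["id"], data={
        "html": "<p>%s</p><p>- %s</p>" % (quote["quote"], quote["author"]),
    })
    client.campaigns.actions.schedule(campaign_id=campaign["id"], data={
        "schedule_time": quote["send_date"] + "T11:00:00+00:00",
    })

def main():
    start_date = datetime.strptime(sys.argv[1], "%Y-%m-%d")
    quotes = get_quotes(start_date)
    with open("quotes.json", "w") as f:   # saved for re-runs / the Facebook script
        json.dump(quotes, f)
    client = MailChimp(mc_api=os.environ["MAILCHIMP_API_KEY"])
    for quote in quotes:
        create_campaign(client, quote)

if __name__ == "__main__":
    main()
```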

This code is a little hacky because I threw it together slowly over several months. At first, I was just getting the quotes and printing them to the screen. Then eventually I modified it to start posting them to MailChimp. The most recent change dumps the quote data into a JSON file that I then feed into another script that handles posting to Facebook. (Let me know if you want a post about that.)

How it works

The MailChimp and Google credentials are read from environment variables, but the spreadsheet key and a few other things are hardcoded. This is just how I did it; ideally those hardcoded things would also be parameters or environment variables. (Translation: don’t do what I did there!)

The main method gets a “start” date parameter from the command line. (The script assumes it will generate 5 days’ worth of quotes at a time, which is my normal cadence.)

That date is then passed into the get_quotes() function which eventually returns a list of dicts containing the quotes for the week.

That list is then serialized so another script can use it, or in case I need to re-run this with the same data. The list is then iterated through, and each “day” in the list is fed into the create_campaign() function, which generates the email.

The final step is having the email scheduled for delivery on the appropriate day.

After this runs, you can log into your MailChimp account and see the emails all scheduled for delivery.

And at this point, everything is set! I have found MailChimp to be very reliable and the scheduled emails have gone out without a hitch for over a year.

Some numbers

As of this writing I have 109 people on the mailing list. MailChimp has some very generous quotas at the free level; I have yet to bump into them with this list.

Another thing I like about MailChimp: the reporting page is pretty nice and straightforward for these types of campaigns. Your numbers may vary, but the reports make it easy to see how each campaign did.

Considering this is a small list on a very niche topic, and is running on a free plan, this is pretty nice!

Earlier I mentioned the time savings. Before my script, the MailChimp portion of this took about 20 minutes to do manually. Now that it is automated, all I have to do is type in the correct date and run the script, which normally takes about 30 seconds to complete. 🙂 At this point I should probably create a cronjob and just use that to kick off the process automatically every Sunday.

Another interesting thought: I used to stress a little about picking the “right” quote for the day. By handing this responsibility to Python’s random.sample() function, I no longer worry about it. Instead I too get the pleasant surprise of seeing a random quote every weekday.

Quick Note: I haven’t done the cronjob YET because I still haven’t fully automated the Facebook script that cross posts these quotes. Once I get that “fixed” the whole process will become hands off.


Making a special string with __str__

“Reuse!”
- The battle cry of Object Oriented aficionados

Occasionally you really want to use a library so that you don’t have to write your own version of whatever the library provides. But, there’s just one little thing that it doesn’t do. Here’s a story of when this happened to me and how I managed to get around it in a creative manner!

At work we are using Elasticsearch as a datastore for some logging. For “reasons”, Elasticsearch doesn’t encourage the use of TTL (time to live) on its records; instead they encourage you to name your indexes after today’s date and then delete an index once it is past your TTL.

And this is ok. But… if you want to use a library like logzio-python-handler, this can be a problem. That library has some awesome capabilities, but one limitation: it expects the index you are writing to will be static and unchanging.

If you have a long-running server process, this can be a problem. You don’t want your logs from August 4th being written into the July 14th index just because that was when you started the server. You want your logs written to their daily index! But you have to supply a string to the library for it to know where to write. What?!?!?

It would be really impractical to create a new logging handler object every time I needed to write a log message!

I need a magic string

So when I was faced with this problem recently, I thought about it for a few minutes. It occurred to me that if I could pass a function to the library and let that function get called to generate the correct string, that would solve my problem.

See, a string is an object. And when an object is being printed out, Python calls the __str__() method on it to get that string. So all I needed to do was create my own object with its own special __str__ method! Here’s what I did:
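In sketch form, it looks like this (the URL scheme here is illustrative, not the exact format from my code):

```python
from datetime import date

class MagicURL:
    """An object that builds its string form lazily, at str() time."""

    def __init__(self, base_url):
        self.base_url = base_url

    def __str__(self):
        # Re-evaluated every time someone formats this "string", so the
        # index name always reflects the current date.
        return "%s/logs-%s" % (self.base_url, date.today().strftime("%Y.%m.%d"))
```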

When the logzio logging handler runs, it calls that MagicURL’s __str__() method, which figures out today’s date, plugs it into the URL, and returns that to the framework. At that point the messages will be written to the correct index.

The advantage of this is that as your app stays up for days and weeks (it does, doesn’t it?), the logging messages will automatically roll forward into the new index every day.

The other huge advantage here is that you don’t have to change the library in any way. You are simply passing in an object with special behavior and letting the library be a black box.

Here’s roughly what it looks like in action (the listener URL below is a placeholder):
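```python
# Hypothetical usage; the listener URL is a placeholder.
url = MagicURL("https://listener.example.com:8071")

# Anywhere the library calls str() on its "url" setting, it gets today's value:
print(str(url))   # e.g. https://listener.example.com:8071/logs-2017.08.04

# Tomorrow, the same object renders tomorrow's index -- no new handler needed.
```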

The end result is that we got to use this library (instead of trying to re-implement it ourselves) and we got the behavior we needed out of it. A win-win!

Wrapping up

The next time you see something that “just takes a string”, remember that you can define the string with a little bit of magic. The __str__ method lets you inject runtime logic into places it wouldn’t normally go!

Python Debugging

Python is an awesome language and environment to work in. And thanks to some great tools, Python debugging can actually be fun!

Let’s look at some of the things that separate Python debugging from debugging in other languages:

Interactive debugging

Compared to other languages like Java, Python values interactive tools like the REPL. The REPL (Read-Evaluate-Print-Loop) allows Python developers to “experiment” on code without having to go through the usual write/save/compile/run cycle.

This feature carries over into the built-in Python debugger, pdb. With pdb you can do all of the normal debug operations like stepping into code, but you can also run arbitrary Python code right at the prompt!
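For example, getting into the debugger takes one line (Python 3.7+ also ships a built-in breakpoint() shortcut):

```python
import pdb

def divide(a, b):
    pdb.set_trace()  # execution pauses here and drops you into the (Pdb) prompt
    return a / b

# At the prompt you can step with "n"/"s", inspect with "p a", or run
# arbitrary Python like "a * 2" -- the REPL experience, mid-program.
print(divide(4, 2))
```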

Command line first

With everything moving to “the cloud” these days, the command line is becoming more important than ever. Since most Python debugging tools are built on top of pdb, it is now super convenient to use the debugger on a remote machine.

Simply ssh into your remote machine and boom, you can start using pdb just like you would on your local machine.

Hopefully this isn’t something you will need to do often, but as we all know sometimes things happen in production that just don’t happen on your local dev machine. It is great to have this option!


While pdb is pretty cool as it is, there are other choices and options to make it even more awesome! Here are some command line tools that can make your Python debugging experience more enjoyable:

  • pdb++ — Just `pip install pdbpp` and you will get a new coat of paint on pdb with tab completion, colors, and more!
  • PuDB — A cool text-based GUI for debugging
  • better_exceptions — A pretty printer for your exceptions

And of course there are more visually oriented tools, for those who prefer working in Integrated Development Environments (IDEs). Here are some great ones that I have used:

  • PyCharm — My preferred Python IDE. Lots of great things in this tool, and I highly recommend it to everyone.
  • Wing IDE — Another popular IDE I have used off and on over the years.
  • Eclipse — Is there anything Eclipse can’t do? With the installation of a few plugins it becomes a decent Python IDE.

Each of these offers the ability to set breakpoints, examine the stack, and all kinds of other debugging goodness, all in a nice, easy-to-read format. If you are just starting out with Python, I highly recommend checking them out to help guide you as you learn the language.

More on Python Debugging

I’ve collected my best tips on Python debugging into an e-book called “Adventures In Python Debugging”. Check it out! There’s a free 5-day email course if you would like to get a sample of the book and learn more.

The curse of knowledge: Finding os.getenv()

Recently I was working with a co-worker on an unusual nginx problem. While working on the nginx issue, we happened to look at some of my Python code. My co-worker normally does not do a lot of Python development; she tends to do more on the node.js side. But this look at the Python code led to a rather interesting conversation.

The code we were looking at had some initialization stuff, which prompted my co-worker to ask, “Hey, why are you using os.environ.get() to read in environment variables? Why aren’t you using os.getenv()?” I stared blankly for a second and said “huh?”

I was a bit puzzled by this question because this developer is really good with node and also with Ruby. Perhaps she was thinking of a function from a different language and not Python, I thought to myself. Together we looked it up real quick, and much to my surprise I discovered there actually is a function in the standard library called os.getenv(), and it does exactly what you would think: it gets an environment variable if it exists, and returns None (or a specified default) if it doesn’t.

Using os.getenv() is a few characters shorter than os.environ.get(), and in the code we were looking at it just looked better. Since the code didn’t need to modify the environment variables, it just made sense to use it. But it got me thinking: I’ve been working in Python for a few years now; how did I not know about this?
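For reference, here are the two forms side by side:

```python
import os

# Equivalent for reads; os.getenv() is simply a shorthand.
db_host = os.environ.get("DB_HOST", "localhost")
db_host = os.getenv("DB_HOST", "localhost")

# os.environ is still what you want when you need to modify the environment:
os.environ["DB_HOST"] = "db.internal"
```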

You don’t know what you don’t know

For me this was a real educational moment. It is very easy to think that we know it all, especially with things we use day-in and day-out. But you should never think you know everything about a language, even if you are an expert. The people around you, even if they are experts in different languages or technologies, still have something interesting to offer you and your code.

Have a conversation with someone who is either junior or senior to your skill level. Very quickly one of you will discover something new. For example, the junior person could discover a new approach to solving a problem, and the senior person could gain a new perspective.

The second situation is one that I really identify with. As you become more “senior” in most things, you begin to suffer from “the curse of knowledge”: your knowledge advances to a point where you can no longer tell what is and isn’t obvious to a beginner. The danger is that you develop a new set of assumptions about everything and you stop questioning things the way you used to.

If you are not aware of this, it can lead to some nasty things. (Think arrogance, blind spots in the code/system, etc.) It can also lead to conversations that unintentionally intimidate others out of participating effectively in your development process. No matter how you slice it, this is a very bad thing.

Having a second set of eyes, especially one from a different background, can really help surface issues in your code. That is always useful. In this case I was very fortunate to get some insight into code that was working, but perhaps a little inefficient. Now I have code that will look a lot better when it gets to code review.

Learn from this

So, today go and talk with someone who has different areas of knowledge or experience levels than you. Something good will probably come of it soon.


Debugging Flask, requests, curl, and form data

Here’s a recent situation I found myself in where some HTTP form data was not appearing like we expected.

Debugging Flask

The basic setup is this: a Django process is replaying some HTTP traffic to another system, which is written in Flask. The issue was that some of the incoming requests had form data that wasn’t making it to the other system.

To help troubleshoot this, I created a simple Flask app that would echo the headers, body, and form fields it saw on incoming requests. Let’s call this the receiving program. The idea was that we could point our relay app at that address and dump out everything, so we could see what the issue was.
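A minimal version of that receiving program looks something like this (the route and port are illustrative):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/hello", methods=["POST"])
def echo():
    # Dump everything Flask parsed out of the incoming request.
    # (Read the raw body first: accessing request.form can consume the stream.)
    print("headers:", dict(request.headers))
    print("body:   ", request.get_data())
    print("form:   ", dict(request.form))
    print("files:  ", {name: f.filename for name, f in request.files.items()})
    return "ok"

if __name__ == "__main__":
    app.run(port=5000)
```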

The first thing I noticed was that our form POSTs did not have any of the form fields I was expecting. There was nothing in the request.form field, and the request body was empty too.

At this point I was concerned that I was missing something in how Flask was reading the request, or in how the relay was sending it. To narrow it down, I chose to use curl to send requests to my receiving program.

This revealed what turned out to be the first problem: the receiving program was looking for form data, but the replay program wasn’t sending it. When I ran a curl command like this:

curl http://receiver/hello --data '{"my":"form","data":"blah"}'

I would see the receiver print out the data. So that pointed to my replay code as the source of the problem.

Sending form data with requests

The replay code uses the most excellent Requests library to do its HTTP communication. Requests is very easy to use; most of the time just doing a requests.post(url, data=<your data to send>) is all you need. But for form data there is another option.

It turns out you can also send multipart form data by swapping out the data parameter for the files parameter. This is where my debugging went off the rails for an hour.
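For reference, the two parameters produce very different requests (the URL is illustrative):

```python
import requests

payload = {"my": "form", "data": "blah"}

# data= sends a form-encoded body (Content-Type: application/x-www-form-urlencoded).
requests.post("http://receiver:5000/hello", data=payload)

# files= sends a multipart body (Content-Type: multipart/form-data).
# Parts that carry a filename show up in Flask's request.files, not request.form.
requests.post("http://receiver:5000/hello", files={"report": ("report.txt", "file contents")})
```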

The wrong path

My original code was using the data parameter, but I wasn’t seeing anything pop out in the receiver. Putting 2 and 2 together, I managed to get 153: I figured I must be using the wrong parameter, so I replaced data with files and retested.

To my surprise, the receiving program was still not seeing any form data! Looking at request.form in the Flask code revealed it was empty!

After using pdbpp to step through the code and inspect the request object more closely, I made a surprising discovery: the data I sent was in the request.files field!

Thoroughly confused, I killed the receiving program and replaced it with the nc command. Netcat (nc) is a handy utility that can send or receive data on a socket. I had reached a point where I didn’t understand why or how Flask was getting the data and manipulating my HTTP request.

Invoking the command:

nc -l 5000

Makes nc listen on port 5000. As it listens, it dumps out whatever it receives. Since HTTP is a plain-text protocol, I could see exactly what my replay code was sending. In this case it was sending something like this (a reconstruction; the boundary and header details are illustrative):
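```
POST /hello HTTP/1.1
Host: receiver:5000
Content-Type: multipart/form-data; boundary=1f2e3d4c

--1f2e3d4c
Content-Disposition: form-data; name="my"; filename="my"

form
--1f2e3d4c
Content-Disposition: form-data; name="data"; filename="data"

blah
--1f2e3d4c--
```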

Which looks pretty different compared to what curl was sending (again, a reconstruction):
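```
POST /hello HTTP/1.1
Host: receiver:5000
Content-Type: application/x-www-form-urlencoded
Content-Length: 27

{"my":"form","data":"blah"}
```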

The big difference is that one has the markers for multipart and the other doesn’t. What gives?

The multipart is just that: “multiple parts”, as when you are sending things mixed together in the same request, like HTML and images. The plain form (the 2nd example) doesn’t have that because we are declaring in the header that the entire request is going to be one type. For my replay code, this is what we were doing in the first place, and it was correct.

Where’s the beef?

So at this point we have walked in a giant circle. It turns out I was sending the data correctly, but it wasn’t being seen. What gives?

Going back and investigating the original replay code, I focused on the logic where we handle form-encoded requests. It turned out we had a nasty bug in how we detected and handled form data.

To identify requests with form data, we were looking at the Content-Type header for “form-data”. The code looked like this:

if request.content_type == "form-data":

This is a bit of a problem because the accepted Content-Types for form data have a lot more text in them. (Specifically, “application/x-www-form-urlencoded” and “multipart/form-data”.) This resulted in us never looking at the request.form field to get the data! For the morbidly curious: the next few lines took data from request.body, which is blank if the Content-Type is set to some kind of form data.

Further down the line, when it was time to replay the data, we took what happened to be a properly formatted Content-Type and passed along an empty string in the data field.

As soon as I changed the logic to:

if "form" in request.content_type:

The code started working as expected. It detected the form data properly and put it into the correct spot before transmitting to the receiving program.

The lessons learned

First and foremost, make sure you are sending the data you think you are. 🙂 Other lessons:

  • Even though form data can look like the body of an HTTP request, Flask will treat it differently if the Content-Type is set correctly
  • Using curl to send “correct” requests is a great way to confirm your code is sending the data you think it is.
  • Debugging Flask sometimes means using other tools. Using netcat/nc to dump out the data is an even better way to make sure you are really sending what you think you are sending.