Installing OpenCV for Python on OS X

OpenCV is a computer vision library. It is a really powerful library and has bindings for Python. The only thing that it doesn’t have is a good explanation of how to make the python bindings work! Here is how I got it to work on my Mac (running OS X 10.9.5) inside of a virtualenv.

  1. Install cmake (brew install cmake) and the normal development tools
  2. Download and unzip the OpenCV code
  3. Change directories into the OpenCV directory
  4. Type in “cmake CMakeLists.txt”
  5. Type in “make -j5” (this will use 5 threads and make the code build pretty fast)
  6. Type in “make opencv_python2”
  7. Type in “sudo make install” (to install all of the code)

At this point the python code has been installed to the main system python in /usr/local/lib/python2.7/site-packages, which is not very helpful if you are using a virtualenv (which is what you should be using if you are working in python!).

The next step is to copy the OpenCV files from the global directory into your virtualenv. This is done by typing in the following:

cp /usr/local/lib/python2.7/site-packages/cv2.so <path_to_your_virtualenv>/lib/python2.7/site-packages/

This will copy the .so created during the build to your virtualenv, which will make it accessible to your python code. At this point you should be able to fire up the python interpreter in your virtualenv and type in:

import cv2

and it should work. Happy Computer Visioning!
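As a postscript: if you want a slightly stronger sanity check than a bare import, a couple of lines like these will confirm both that the bindings load and where they load from (the version string will be whatever you built; cv2.__version__ is available in OpenCV 2.4 and later):

import cv2

print(cv2.__version__)  # the version you just built
print(cv2.__file__)     # should point inside your virtualenv's site-packages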

Making a plan B

There’s an old saying that “no plan survives first contact with the enemy”.

There is a lot of truth in that statement in many situations. It would seem to say that we shouldn’t bother making plans, but I see it a different way: make plans that are flexible.

Rigidly following a course of action is rarely a good idea. Recognizing that something isn’t working is something a lot of people are able to do, but the critical second step, actually making a change, is rarely taken.

Having a Plan B is usually a good idea for anything important. But if your backup plan is just as rigid as the first plan, you will have the same problems. A better approach is to make sure your plans can adapt to the difficulties you encounter.

For example, if you are debugging software and nothing is working, try stopping what you are doing and approaching it from a wildly different angle. You will still be accomplishing your goal (debugging the software), just from a different direction.

This ability to change up your approach is the ultimate Plan B. It allows you to keep moving forward and maintain your momentum. You still reach the same destination, but hopefully faster. The root idea is to try to overcome your functional fixedness.

Here are some random examples:

  • You have a flat tire. Your spare is flat too. How do you get the car/tire to the repair shop?
    • Call a tow truck?
    • Use a bicycle pump to get just enough air in the tire for you to drive it to the shop?
    • Take the tire off the car and get a friend to drive you to the shop?
  • You need to edit a large file on your computer, but you don’t have enough free disk space.
    • Delete other old files to free up space?
    • Try and hook up a USB thumb drive and do the work there?
    • See if there’s a way to do the work without copying?

There are tons of situations in daily life that can be tackled in new ways. All it takes is the ability to remain fluid in our approaches to solving them.

Making Progress

I have a problem. Actually two problems.

One problem is that I have this feeling that I can’t shake. A feeling that I’m not doing enough, or getting enough things done. The second problem is that I have this strange hesitation to post on my blog these days.

The second problem is related to twitter: I tend to post things there more frequently. Short thoughts, less friction. The first problem is a bit more of a challenge.

Most people by default tend to look at their own achievements in a less-than-flattering light. I think the main reason for this is we tend to “forget” things as we get further in time from them. For example, if you had a really great day at work, but then had 2 weeks that were really bad, you won’t see the really great day as the accomplishment that it was: instead you are overwhelmed with the most recent events (which in this case were not so great).

I’ve decided to tackle both of these problems by blogging about things that I do shortly after I do them. This way I’ll kill two birds with one stone: more frequent updates, and a record of the cool and fun things I’ve done recently.

Now having said all that, here’s what I did in January:

Raspberry Pi – ZNC

Last year I got two Raspberry Pi computers. Very cool little machines: they are about the size of a credit card and cost only $35. But… what should I do with them? The answer finally hit me in the form of IRC.

IRC seems to be making a comeback; a lot of interesting/cool people tend to hang out on it. So, not wanting to be late to the party, I decided I would hang out there too.

The problem is I could be connecting to IRC from one of several different machines (work machines, home machines, phone, etc.). The solution is to use an “IRC bouncer” like znc. Znc acts as a single sign-on point: I log into it, and it keeps my connections to the IRC network alive. It also logs the conversations, so I can scroll back on different machines and never miss anything.

The only catch is that znc needs to be connected to the internet all the time to maintain the connection. Since I didn’t want to keep my power-hog home PC running all the time, the flyweight Raspberry Pi suddenly seemed like the ideal server. It is low power (it runs off of a micro USB connection), and it runs Linux. The perfect combo!

So with a little research (several other people had the same idea) and a little bit of time I was able to quickly accomplish the following:

  • Set up a Raspberry Pi so it is on the internet 24/7
  • Set up a dynamic DNS program so I can get to the Pi (even if my home network gets a new IP address)
  • Set up znc and have it connect to the IRC networks of my choice
  • Set up SSL so that everything on the IRC side is encrypted

The last one is my favorite so far. With all of the security talk going on, a little more encryption is a great idea.

I’ve got a few more network-aware apps that I’m thinking about putting on the Pi, but this is a great start. And the next time I do something I’ll have it posted here! A win-win.

Running python tests randomly with randomize

Recently I was having a conversation with a co-worker about some test problems we were having. While we were brainstorming, the idea of running the tests randomly came up. A quick search showed that someone else had been thinking about this issue too, but it looked like it never went anywhere.

The code that we found was here on Google Code, but from the discussion it seems like the code was never included in nose, nor was it submitted to PyPI. There was what looked like one repository on GitHub that had this code, but it too wasn’t in the best shape.

So… I decided to grab the code from the issue on Google Code and start up a new GitHub repo for it. I added several tests, fixed one little bug that I found, and today I released it to the world.

If you are doing testing with nose in python, you can check out randomize, a nose plugin for running tests in a random order. The main reason you would want to do this is to ensure that there are no dependencies between your tests. This is very important for unit tests: they should be isolated and independent. This plugin helps confirm that isolation.

How it works: Basically, as classes are read into the testing framework, the plugin gets called and applies Python’s random.shuffle() to the tests to produce a random order, as sketched below. One shortcoming of the plugin is that it only shuffles the tests, not the classes that hold them. (If anyone is interested in implementing this, please feel free to send me a pull request!)
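To make the mechanics concrete, here is a minimal sketch of the core idea (the names here are mine, not the plugin’s actual internals):

import random

# Stand-ins for the test methods nose collects from one test class
collected = ["test_login", "test_logout", "test_signup", "test_reset"]

random.shuffle(collected)  # an in-place shuffle: the heart of the plugin
print(collected)           # e.g. ['test_signup', 'test_reset', ...]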

Installation is simple. On the command line just type in:

pip install randomize

and then you’ll have it installed and ready to run. To use the plugin, all you will need to do is this:

nosetests --randomize

And that will invoke it. When it runs it will print out a seed number and then begin executing the tests. If for some reason you need to re-run the tests (say to troubleshoot a test failure), all you need to do is run:

nosetests --randomize --seed=<the seed number>

and that will re-run the tests in the same order.
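This works because the shuffle is driven by a seeded random number generator, and the same seed always produces the same order. A tiny demonstration of the idea (again a sketch of mine, not the plugin’s code):

import random

tests = ["test_login", "test_logout", "test_signup", "test_reset"]

def order_for(seed):
    rng = random.Random(seed)  # a seeded generator gives a reproducible shuffle
    shuffled = list(tests)
    rng.shuffle(shuffled)
    return shuffled

# The same seed always yields the same order, which is what makes
# re-running with --seed reproduce a failing run.
assert order_for(12345) == order_for(12345)
print(order_for(12345))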

Why you need tests instead of comments

Today I saw this blog post from Timothy Fitz: I hate comments

My first reaction was to dismiss this as another anti-documentation post by a coder who prefers doing nothing but coding.

But…

Timothy makes some really good points. His intent isn’t to downplay documentation (something that programmers are notorious for skimping on), but rather to play up the need to keep your code honest.

The best example is when he notes that a unit test could replace a comment about what a function “should never do”. That is a really insightful statement. Instead of leaving a comment, leave a unit test. If you are using a continuous integration system, it should detect if someone ever violates the intent of the function. A broken test is much more likely to be seen than a potentially out-of-date comment.
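For instance (a hypothetical function and test of my own, not an example from Timothy’s post), instead of a comment saying “this should never return None”, you can write a test that enforces it:

import unittest

def lookup_price(item):
    # The old approach was a comment here: "this should never return None"
    prices = {"apple": 1.25, "bread": 2.50}
    return prices.get(item, 0.0)  # unknown items fall back to 0.0

class TestLookupPrice(unittest.TestCase):
    def test_never_returns_none(self):
        # The "should never" is now a check that CI will actually enforce
        self.assertIsNotNone(lookup_price("apple"))
        self.assertIsNotNone(lookup_price("no-such-item"))

if __name__ == "__main__":
    unittest.main()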

I still think comments aren’t the “trash” that many programmers believe they are, but I do think that a unit test would be of more value to the code base overall.

The funny thing is that if you get into an argument with a programmer about comments and you counter with “Well, let’s replace it with a unit test!” I think you will find the programmer suddenly backpedaling. If there’s one thing a lot of programmers like less than writing comments, it is writing unit tests.

Which is a shame, because unit tests that give you high code coverage are so useful when you want to keep your system running correctly.

Today’s Django lesson

Today I was working on re-doing my main website. I have built the site in Django, and then when I’m ready to publish I use the static generator module to create an html snapshot of the pages that I then upload to the server. Basically I wanted the fun of using Django, but the ease of serving up static html pages.

The last time I did this, the version of Django was 0.9 and today’s version is 1.3.1. As I started work on the site, I discovered some things here and there that had changed in the years in between those versions. The one that bit me the hardest was this: I could view the main page (/), but any other page (including index) would give me a 404 error.

This was really weird because the mappings existed in the urls.py file, but a 404 error means it couldn’t be found. If it was a programmatic error you would expect a 500 error, but I never saw one.

Slowly but surely I tore the system apart looking to see what could cause mapped urls to disappear in Django. Eventually I discovered the root cause was how static resources are handled.

One of the changes in the newer versions of Django was added functionality for handling static resources. The biggest change (at least as far as my ancient code was concerned) was the introduction of the STATIC_URL setting. According to the documentation, this is the prefix for static resources (like CSS and JavaScript) that are referenced by the web pages that Django builds.

In my laziness many years ago, I just plopped the main CSS file (and everything else) in the root of the web directory so that the html pages didn’t have to have paths in them. So when I was adding the STATIC_URL variable to make sure my CSS file would actually load, I set it this way:

STATIC_URL = '/'

…because it wouldn’t let me use an empty string (''). Well, it turns out that this was wrong wrong wrong. Django was looking at the incoming requests as I tried to visit the pages of the site and checking the static directory to see if there was anything there (which there wasn’t, aside from my CSS file).

Once I realized this, I changed it to:

STATIC_URL = '/static/'

…which basically forces you to create a directory to hold your static files. Which is right right right right. Basically Django was forcing me to clean up my act and code the site in a much cleaner manner. As soon as I did that all of the 404 errors went away and I was able to hit every page in my urls.py file.
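For anyone hitting the same wall, here is roughly what the relevant settings ended up looking like (a Django 1.3-era sketch; the paths are placeholders, not my actual layout):

# settings.py

# Prefix for static resources. Setting this to '/' made Django check for
# a static file on every incoming URL, which is what caused my 404s.
STATIC_URL = '/static/'

# Filesystem directory that holds those files (placeholder path)
STATIC_ROOT = '/path/to/mysite/static/'

Templates can then reference files via the prefix, e.g. {{ STATIC_URL }}css/main.css (assuming the static context processor is enabled).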

Choices – The hidden powers of python

The other day I saw this interesting posting on Stack Overflow: What are the lesser-known but useful features of the Python programming language? As someone who loves to work in Python and is always looking for a better way to get things done, I had to read it right away.

I was pretty pleased with myself that I knew a lot of the things listed, and I was really happy when I ran into some nuggets that I knew nothing about. For example, I did not know that you could do this in Python:

1 < x < 10

To me, that is really awesome because not only is it what you would write in a mathematical sense, it is also much more concise, and closer to what you would see in a pseudo-code representation of a program. That is the power of Python: taking an idea (a computer program) and presenting it in a readable manner. Being able to use the above kind of expression just takes us further down that path of readability, and that is a good thing. And yes, you can write crappy code in Python, but at least you have the choice: many languages simply have one way to do things. Choices are good.
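A quick sketch of it in action:

x = 5

print(1 < x < 10)    # True -- equivalent to (1 < x) and (x < 10)
print(1 < x < 3)     # False
print(0 <= x <= 10)  # chaining works with any of the comparison operators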

And speaking of choices, I thought this feature was very cool, even if I would never use it:

from __future__ import braces

Despite how it looks, this doesn’t actually let you use {} instead of whitespace in a Python program: it’s an Easter egg, and the interpreter flatly refuses. I know some people can’t stand the indentation rules of the language and feel more comfortable when they are “braced” (for impact?), but the fact that the language ships a built-in joke about it? That is 1 million pounds of awesome in a 5 pound sack.
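Here is what actually happens when you try it:

>>> from __future__ import braces
  File "<stdin>", line 1
SyntaxError: not a chance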

There are a ton of other neat tricks there; go and read up and see what you can do to make your code work more for you! Choice is a great thing. With Python, you have lots of great choices you can make. And if you decide to make no choices, you still wind up with pretty readable code, which is always a good thing.

Big Data, Big Opportunity

There is a really great article about how data is the new commodity, in the same way that we look at oil. One thing they both have in common is that they are out there; it’s just a question of who is willing to go and dig them up.

Information is quickly piling up all over the place, and I agree with the article that the people who are able to capitalize on this are the ones that will get the big payoff. I especially like the idea of calling these start-ups “wildcats”; that perfectly captures the wild-west atmosphere that is about to take hold.

The neat thing is that a lot of this information is out there for free; the real value is in how people aggregate those individual data streams into new and often unexpected products. Take twitter, for example (are you following me on twitter?): it is a conduit to what is going on in the hive mind of the internet. This site seems to be gathering up the trends on twitter and then adding news articles about some of the things that are hot.

That is pretty neat: data is generated in the form of people tweeting about Topic X; as X becomes more “important” (in this case, more people discuss it so that it rises above other topics) it gets published to the “trending” list. The website then looks at that list and adds more data to the conversation by reporting news about Topic X. That way the separate data points are tied together to show that there is a relationship between them, which in the process makes the data more valuable to the end users (by supplying more context, etc.).

Big data is going to lead to a lot of big opportunities. All we have to do is find the data, combine it in the right way, and perform the right analysis on it. And unlike big oil, big data is going to be around for a very long time.

Python in the Amazon cloud

I have a couple of ideas rolling around in my head, and recently I decided to take advantage of Amazon’s AWS service since they are offering a “free” tier. (Basically it’s free if you are below certain generous limits on resource utilization.) Since python is my weapon of choice these days, I set about figuring out how to get python up and running on an EC2 instance.

If you are going to go down the Linux route with AWS (and honestly, why wouldn’t you?), then you are in luck, since python ships with most modern distributions. But while python is “batteries included”, there are some really nice third-party projects that are worth taking advantage of. Installing these projects is where you start to find out there are differences between the EC2 AMIs.

My first attempt was with the default Amazon AMI. It seemed like a good choice, but the more I played with it, the more I realized that it was pretty bare-bones. easy_install is included, but when you run it to install something you hit permission problems.

Because this is supposed to be a nice experiment, I only played around with it for an hour or so. My patience exhausted, I went looking for an alternative and ran right into the perfect match: ActiveState has an AMI specifically for python developers.

This is awesome because ActiveState puts out a really nice Windows package for Python (and other lesser languages). I fired up their AMI instance and have found it to be really nice. I put my python project out there earlier this month and it ran without a problem… until the code crashed because I forgot to put a try/except block in place. :)

Seriously though, I really like this AMI. Not only is a nice python distribution there, but they also include gcc, so if you need to compile something you don’t have to go far. The easy_install script runs like a dream, and I haven’t had any problems with it at all. In fact, my whole experiment (which seems to be getting attention mostly from blog aggregators at the moment) has only cost me a total of $0.19. Yes, only 19 cents! And 1 cent of that was from the ActiveState Python AMI (the other 18 cents was from my playing around with the Amazon AMI that wasn’t that great).

Now to get back to my ideas, and sling some python code out into the cloud…

Dev Twitters to follow

In an earlier post, I talked about how twitter is becoming the new “plan file”. Since that post, I’ve joined twitter, and I try to use it as a plan file for my own development activities. 140 characters isn’t a lot of space, so in a way I think it is more honest: you have to get to your point pretty quickly. Which isn’t to say that you can’t goof off with it…

But of course, I’m not the best or most interesting developer on twitter. Here’s a list of developers I follow because they post really interesting stuff:

  • John Carmack – id Software, creator of Doom, Quake, and lots of other awesome stuff
  • Ryan C. Gordon (icculus) – Awesome dude, ports a lot of games to linux
  • Ted Dziuba – He’s got opinions, and he isn’t shy about sharing them
  • Zed Shaw – An old-school hacker, he’s always up to something interesting

There’s a ton more out there, go and check it out!