Why you need tests instead of comments
Posted by Nick on 12/17/2012 filed in Programming, Software Development, TDD
Today I saw this blog post from Timothy Fitz: I hate comments
My first reaction was to dismiss this as another anti-documentation post by a coder who prefers doing nothing but coding.
But…
Timothy makes some really good points. His intent isn’t to downplay documentation (something that programmers are notorious for skimping on), but rather to play up the need to keep your code honest.
The best example is when he notes that a unit test could replace a comment about what a function “should never do”. That is a really insightful statement. Instead of leaving a comment, leave a unit test. If you are using a continuous integration system, it should detect if someone ever violates the intent of the function. A broken test is much more likely to be seen than a potentially out-of-date comment.
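A minimal sketch of that idea in Python, using the standard unittest module. The function normalize_scores() and its “should never modify the caller’s list” rule are hypothetical examples, not taken from Timothy’s post; the point is that the rule lives in an assertion a CI server will run, rather than in a comment nobody rereads.

```python
import unittest


def normalize_scores(scores):
    """Hypothetical example: scale scores so the largest becomes 1.0.

    Instead of a comment saying "this should never modify the caller's
    list", the test below enforces that rule.
    """
    top = max(scores)
    return [s / top for s in scores]


class NormalizeScoresTest(unittest.TestCase):
    def test_never_mutates_input(self):
        original = [2.0, 4.0, 8.0]
        snapshot = list(original)
        normalize_scores(original)
        # The "should never do" rule, written as an assertion instead of a comment.
        self.assertEqual(original, snapshot)

    def test_scales_to_one(self):
        self.assertEqual(normalize_scores([2.0, 4.0, 8.0]), [0.25, 0.5, 1.0])
```

Running python -m unittest (or nosetests) on the file would flag the change immediately if someone later rewrote the function to scale the list in place.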
I still think comments aren’t the “trash” that many programmers believe they are, but I do think that a unit test would be of more value to the code base overall.
The funny thing is that if you get into an argument with a programmer about comments and you counter with “Well, let’s replace it with a unit test!” I think you will find the programmer suddenly backpedaling. If there’s one thing a lot of programmers like less than writing comments, it is writing unit tests.
Which is a shame, because unit tests that give you high code coverage are so useful when you want to keep your system running correctly.
Today’s Django lesson
Posted by Nick on 12/27/2011 filed in django, Python, Web
Today I was working on re-doing my main website. I have built the site in Django, and then when I’m ready to publish I use the static generator module to create an html snapshot of the pages that I then upload to the server. Basically I wanted the fun of using Django, but the ease of serving up static html pages.
The last time I did this, the version of Django was 0.9; today’s version is 1.3.1. As I started work on the site I discovered some things here and there that had changed in the years between those versions. The one that bit me the hardest was this: I could view the main page (/), but any other page (including index) would give me a 404 error.
This was really weird because the mappings existed in the urls.py file, but a 404 error means the page couldn’t be found. If it were a programming error you would expect a 500 error, but I never saw one.
Slowly but surely I tore the system apart looking to see what could cause mapped urls to disappear in Django. Eventually I discovered the root cause was how static resources are handled.
One of the changes in the newer versions of Django was added functionality for handling static resources. The biggest change (at least as far as my ancient code was concerned) was the introduction of the STATIC_URL variable. According to the documentation, this is the prefix for static resources (like CSS and JavaScript) that are referenced by the web page Django is building.
In my laziness many years ago, I just plopped the main CSS file (and everything else) in the root of the web directory so that the html pages didn’t need paths in their links. So when I added the STATIC_URL variable to make sure my CSS file would actually load, I set it this way:
STATIC_URL = '/'
…because it wouldn’t let me use an empty string (''). Well, it turns out that this was wrong wrong wrong. As I tried to visit the pages of the site, Django was looking at each incoming request and checking the static directory for a matching file (there wasn’t one; all that directory held was my CSS file).
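A toy model of the failure, under a simplifying assumption: that a request is handed to static-file serving whenever its path starts with STATIC_URL. This is not Django’s actual code, and the helper name treated_as_static() is hypothetical, but it shows why a prefix of / swallows ordinary page URLs.

```python
def treated_as_static(path, static_url):
    """Toy model (not Django's implementation): a request goes to
    static-file serving when its path starts with the STATIC_URL prefix."""
    return path.startswith(static_url)


# With "/" every page URL matches the prefix; with "/static/" only real assets do.
for prefix in ("/", "/static/"):
    for path in ("/about/", "/index/", "/static/site.css"):
        print(prefix, path, treated_as_static(path, prefix))
```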
Once I realized this, I changed it to:
STATIC_URL = '/static/'
…which basically forces you to create a directory to hold your static files. Which is right right right right. Basically Django was forcing me to clean up my act and code the site in a much cleaner manner. As soon as I did that all of the 404 errors went away and I was able to hit every page in my urls.py file.
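For reference, a minimal settings.py fragment in that spirit. STATIC_URL, STATIC_ROOT, and STATICFILES_DIRS are settings used by django.contrib.staticfiles (which arrived in Django 1.3); the project path and folder names below are placeholders chosen for illustration.

```python
import os

# Placeholder absolute path to the project; adjust for your machine.
PROJECT_DIR = "/path/to/mysite"

# URL prefix for static assets: must end with a slash, and "/" alone
# would make every page URL look like a static file request.
STATIC_URL = "/static/"

# Where `manage.py collectstatic` gathers files for deployment (placeholder).
STATIC_ROOT = os.path.join(PROJECT_DIR, "collected_static")

# Source folders searched for static files during development (placeholder).
STATICFILES_DIRS = (
    os.path.join(PROJECT_DIR, "static"),
)
```

Assuming the static context processor is enabled, a template can then reference a stylesheet as {{ STATIC_URL }}site.css, which renders as /static/site.css (site.css being a hypothetical file name).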
Choices – The hidden powers of python
Posted by Nick on 2/6/2011 filed in Programming, Python, Software Development
The other day I saw something about this interesting posting on Stack Overflow: What are the lesser-known but useful features of the Python programming language? As someone who loves to work in Python and is always looking for a better way to get things done, I had to read this article right away.
I was pretty pleased with myself that I knew a lot of the things listed, and I was really happy when I ran into some nuggets that I knew nothing about. For example, I did not know that you could do this in Python:
1 < x < 10
To me, that is really awesome because not only is it what you would write in a mathematical sense, but it is also much more concise and what you would see in a pseudo-code representation of a program. That is the power of Python: taking an idea (a computer program) and presenting it in a readable manner. Being able to use the above type of expression just takes us further down that path of readability, and that is a good thing. And yes, you can write crappy code in Python, but at least you have the choice: many languages simply have one way to do things. Choices are good.
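Python evaluates a chained comparison such as a < b < c as a < b and b < c, with b evaluated only once. A quick demonstration; in_open_interval() and middle() are just illustrative helper names:

```python
def in_open_interval(x, low=1, high=10):
    # Same as (low < x) and (x < high), but x is evaluated only once.
    return low < x < high


print([n for n in range(12) if in_open_interval(n)])  # [2, 3, 4, 5, 6, 7, 8, 9]

# Show the "evaluated once" part with a function that records each call.
calls = []


def middle():
    calls.append(1)
    return 5


print(1 < middle() < 10, len(calls))  # True 1
```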
And speaking of choices, I thought this feature was very cool, even if I would never use it:
from __future__ import braces
It looks like it should allow the use of {} instead of whitespace in a Python program, but it is actually an Easter egg: the interpreter refuses with “SyntaxError: not a chance”. I know some people can’t stand the indentation rules of the language and feel more comfortable when they are “braced” (for impact?), but the fact that the language developers built a joke answer to that request right into the interpreter? That is 1 million pounds of awesome in a 5 pound sack.
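You can check what that import actually does without stopping your program by compiling it from a string, since the SyntaxError is raised at compile time:

```python
# The __future__ import is checked by the compiler, so compile the
# statement from a string and catch the SyntaxError it raises.
try:
    compile("from __future__ import braces", "<demo>", "exec")
except SyntaxError as err:
    message = err.msg
else:
    message = None

print(message)  # not a chance
```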
There are a ton of other neat tricks there; go and read up and see what you can do to make your code work harder for you! Choice is a great thing. With Python, you have lots of great choices you can make. And if you decide to make no choices, you still wind up with pretty readable code, which is always a good thing.
Big Data, Big Opportunity
Posted by Nick on 2/1/2011 filed in analytics, data, Statistics, Web
There is a really great article about how data is the new commodity, in the same way that we look at oil. One thing they both have in common is that they are out there; it is just a question of who is willing to go and dig them up.
Information is quickly piling up all over the place, and I agree with the article that the people who are able to capitalize on this are the ones that will get the big payoff. I especially like the idea of calling these start-ups “wildcats”, that perfectly captures the wild west atmosphere that is going to start happening.
The neat thing is that a lot of this information is out there for free; the real value is in how people are going to aggregate those individual data streams into new and often unexpected products. Take Twitter for example (are you following me on Twitter?): it is a conduit to what is going on in the hive mind of the internet. This site seems to be gathering up the trends on Twitter and then adding news articles about some of the things that are hot.
That is pretty neat: data is generated in the form of people tweeting about Topic X. As X becomes more “important” (in this case, more people discuss it so that it rises above other topics), it gets published to the “trending” list. This website then looks at that list and adds more data to the conversation by reporting news about Topic X. That way the separate data points are tied together to show that there is a relationship between them, and in the process the data becomes more valuable to the end users (by supplying more context, etc.).
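The pipeline just described (count mentions, keep the top topics, attach related news) can be sketched as a toy model. The tweets, topics, and headlines below are placeholder data, and this is not how Twitter or that site actually compute trends:

```python
from collections import Counter

# Placeholder data: each tweet has already been tagged with a topic.
tweets = ["topic_x", "topic_y", "topic_x", "topic_z", "topic_x", "topic_y"]

# Placeholder news index keyed by topic.
news = {
    "topic_x": ["Headline about topic_x"],
    "topic_z": ["Headline about topic_z"],
}


def trending_with_news(topics, headlines, top_n=2):
    """Return (topic, mention_count, related_headlines) for the top_n topics."""
    counts = Counter(topics)
    return [
        (topic, count, headlines.get(topic, []))
        for topic, count in counts.most_common(top_n)
    ]


for topic, count, related in trending_with_news(tweets, news):
    print(topic, count, related)
```

The join step is where the extra value comes from: a raw count says topic_x is popular, while the attached headline hints at why.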
Big data is going to lead to a lot of big opportunities. All we have to do is find the data, combine it in the right way, and perform the right data analysis on it. And unlike oil, big data is going to be around for a very long time.
Python in the Amazon cloud
Posted by Nick on 12/30/2010 filed in Linux, Programming, Python
I have a couple of ideas rolling around in my head, and recently I decided to take advantage of Amazon’s AWS service since they are offering a “free” tier. (Basically it’s free if you stay below certain generous limits on resource utilization.) Since python is my weapon of choice these days, I set about figuring out how to get python up and running on an EC2 instance.
If you are going to go down the Linux route with AWS (and honestly why wouldn’t you?), then you are in luck, since python ships on most modern distributions. But while python is “batteries included”, there are some really nice projects out there that are worth taking advantage of. Installing these projects is where you start to find out that there are differences between the EC2 AMIs.
My first attempt was with the default Amazon AMI. It seemed like a good choice, but the more I played with it, the more I realized that it was pretty bare-bones. easy_install is included, but when you run it to install something, you run into permission problems.
Because this is supposed to be a nice experiment, I only played around with it for an hour or so. My patience exhausted, I went looking for an alternative and ran right into the perfect match: ActiveState has an AMI specifically for python developers.
This is awesome because ActiveState puts out a really nice Windows package for Python (and other lesser languages). I fired up their AMI instance and have found it to be really nice. I put my python project out there earlier this month and it ran without a problem… until the code crashed because I forgot to put a try/except block in place.
Seriously though, I really like this AMI. Not only is there a nice python distribution, but they also included gcc, so if you need to compile something you don’t have to go far. The easy_install script runs like a dream, and I haven’t had any problems with it at all. In fact, my whole experiment (which seems to be getting attention mostly from blog aggregators at the moment) has only cost me a total of $0.19. Yes, only 19 cents! And 1 cent of that was from the ActiveState Python AMI (the other 18 cents was from my playing around with the Amazon AMI that wasn’t that great).
Now to get back to my ideas, and sling some python code out into the cloud…
Dev Twitters to follow
Posted by Nick on 12/7/2010 filed in Programming
In an earlier post, I talked about how Twitter is becoming the new “plan file”. Since that post, I’ve joined Twitter, and I try to use it as a plan file for my own development activities. 140 characters isn’t a lot of space, so in a way I think it is more honest: you have to get to your point pretty quickly. Which isn’t to say that you can’t goof off with it…
But of course, I’m not the best or most interesting developer on Twitter. Here’s a list of developers I follow because they post really interesting stuff:
- John Carmack – id Software, creator of Doom, Quake, and lots of other awesome stuff
- Ryan C. Gordon (icculus) – Awesome dude, ports a lot of games to Linux
- Ted Dziuba – He’s got opinions, and he isn’t shy about sharing them
- Zed Shaw – An old-school hacker, he’s always up to something interesting
There’s a ton more out there, go and check it out!
Mercurial: forking from one place to another
Posted by Nick on 11/6/2010 filed in mercurial, Software Development
Let’s say there’s a cool project at http://hg.example.com/coolproject that you want to fork, and you want to store the fork on BitBucket. How do you do this?
It is pretty easy to do! First clone the project to your local system:
hg clone http://hg.example.com/coolproject
You now have a local copy. Then log into your BitBucket account and create a new repository; for this example, let’s call it coolproject-fork. To push your local copy to BitBucket you can do this:
hg push http://bitbucket.org/your_username/coolproject-fork
Then your local copy will be pushed out to BitBucket. But… what do you do if you want to pull from the original and push to your private fork? Running hg outgoing (hg out for short) from your local copy will show that it is trying to push to http://hg.example.com, not BitBucket. To fix this, you have to modify the hgrc in the project and tell it that the push path is different from the pull path.
Go into the .hg directory in the coolproject directory and open up the hgrc file. Make these changes:
[paths]
default = http://hg.example.com/coolproject
default-push = http://bitbucket.org/your_username/coolproject-fork
Now you can do hg pull to get updates from the original project (on example.com) and when you do hg push it will push your changes back to BitBucket!
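To make the rule behind those two entries concrete, here is a simplified model written with Python’s configparser: a plain pull uses default, and a plain push uses default-push when it is set, falling back to default otherwise. This is only an illustration of the [paths] semantics; Mercurial parses hgrc itself and supports syntax configparser doesn’t, and resolve_paths() is a hypothetical helper.

```python
import configparser

# Contents of coolproject/.hg/hgrc from the example above.
HGRC = """\
[paths]
default = http://hg.example.com/coolproject
default-push = http://bitbucket.org/your_username/coolproject-fork
"""


def resolve_paths(hgrc_text):
    """Simplified model: return (pull_url, push_url) for a bare hg pull / hg push."""
    # interpolation=None so '%' characters in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    paths = parser["paths"]
    pull_url = paths.get("default")
    # Push falls back to `default` when `default-push` is not set.
    push_url = paths.get("default-push", pull_url)
    return pull_url, push_url


print(resolve_paths(HGRC))
```

From the command line, hg paths lists the same entries, which is a quick way to confirm the edit took.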
Slow and steady wins the race
Posted by Nick on 10/18/2010 filed in Programming, Software Development
Every now and then I’ll have a conversation with a developer and the topic of “Why doesn’t everyone switch to X?” comes up. For X, substitute any of the following “new” things: version control, language, database, OS, methodology, pattern.
It has taken me a while, but I have learned through experience that just because something is new, it doesn’t mean you should jump into it.
A lot of people don’t realize the value that an established process/technology provides, that stability can be priceless. For example, people who jumped whole-hog into JavaFX are probably starting to regret that choice.
Now, having said that, there is a break-even point: the older a project is, the more one should look at specific fixes for specific problems. Switching from Java to .NET? Not important. Switching from COBOL to Java/.NET/Ruby? Important. Changing as few variables at one time as possible? Vitally important!
Version control really seems to be finally getting the attention it deserves lately, and that is a great thing. Disciplined open source projects have been setting a great example for developers and businesses alike for some time, and many of them have long relied on “older” technologies like CVS. A few years ago there was a flood of people breaking away from CVS to use Subversion, and these days there’s a push to go to distributed version control solutions like Git or Mercurial.
This great article talks about one group who made that decision and the work they did to ensure their history (and buildable versions of older releases) was ported over into the brave new world of Git.
I am very impressed with the level of maturity the project leaders showed with that conversion: they did not just drink the git-workflow Kool-Aid; instead they decided to change only one thing: the version control system.
Side note: One of the major turn-offs for me about the Git community was their insistence on adopting their workflow model when using the git tool. While I welcome new points of view and ideas, a while ago it seemed like it was “git’s way or the highway” when it came to using that system. Mercurial, on the other hand, seemed a little more open to the typical svn/cvs way of doing development. That is one of the reasons why I personally chose to work with Mercurial. In the time since then my personal workflow has changed a little bit to be more like the DVCS approach. In time I might become less of an svn developer and more of a DVCS developer. Having said that, I still don’t care for the Git staging area.
My hat is off to them for keeping their project under their control while modernizing a significant piece of infrastructure. My hope is that Postgres will stand as an example of how to grow without starting over. Once the developers, testers, users, and other stakeholders get used to the new version control system, then they can make the next move (whatever that may be).
I think that a few years down the road this will be seen as an important decision by the project leadership: in the event a bug is found, they will be able to look at the history and see when it entered the system, while at the same time reaping the benefits of the DVCS revolution.
There is no gate address for Sci-Fi on Tuesdays
Posted by Nick on 10/6/2010 filed in Entertainment, Space, TV & Movies
Well, it looks like the ratings for Stargate Universe continue to swirl around the black hole of cancellation.
Tuesday nights just are not meant for sci-fi. Repeat after me: Friday night. That’s when all of the hard-workin’ geeks out there (like myself) just want to kick back with a cold beverage and let the TV watch them. Tuesday night is the most boring night of the week. Tuesday is just the graveyard that Monday gets buried in.
I’m not a fan of SGU (I stopped watching after the mid-season finale last season), but the show has some hope. I think they could do a lot to make it super exciting like SG-1 (and to a lesser extent SGA) used to be… But handicapping the show by putting it on Tuesday night? Wow, that is like a nail in the coffin.
I am not sure what the advertising for the show was like this season, I sure have not seen non-stop promos for it. Maybe SyFy wants the show to tank… That’s a bit of a shame, the franchise is a great concept. It seems like they tried too hard to make it into Battle-Stargate-Galactica with this incarnation, but hey, at least it wasn’t vampires. (Atlantis, I am looking sternly in your direction…)
At any rate, seeing this post has inspired me to compare the ratings for the various shows to see if this is an isolated thing, or is the series really sinking. My current hypothesis is that it is probably not doing real well.
Hopefully the next iteration (if there is one) will avenge the fallen.
TDD and honesty in code
Posted by Nick on 9/10/2010 filed in Astronomy, Python, Software Development, TDD
I’m working on a little project to help keep track of satellites.
Specifically, I’m taking an existing code base (the AIAA SGP4 code which is written in C/C++), wrapping it with python, and then eventually putting a web front end on it. The end goal is that astronomers will be able to use this code/website to check their observations to see if there was a satellite in their field of view when they took their measurements.
Since I’m using a known library to create something new, this seemed like a really great opportunity to use some Test Driven Development (TDD) to ensure that as I build up the parts of the system, the numbers stay true and the code stays honest.
This is working out really well. Combining Python’s unittest module with nosetests allows me to quickly bang out a test to make sure that the positional numbers the SGP4 code generates are what the documentation says they should be. By using TDD I can map out where the code needs to go a little better: as I find something that is missing, I can write a quick test that will fail. When I run nosetests, that failure is glaring at me, telling me where more code is needed.
For example, I have one set of test data that gives the observer’s location in standard latitude and longitude coordinates. My code isn’t quite expecting this; the MPC format is a little bit different. This is a problem that is easy to fix, but I have a plethora of problems at the moment and I need to make sure this one gets taken care of properly. So I opened up the python file that tests the Observatory class and added a failing test called testLatLongToXYZ(). When I run the tests I now see that this is a failing method, which gives me the mental kick in the pants to go and implement that method, even if it is just a stub.
Why do I say just a stub? Because putting in the stub lets me do two things: 1) I’ve now got something there, and 2) I can now update the test to call that stub. The secret is that the test should now call the stub and expect it to pass, but the stub should always return something (None, False, -1, etc.) to indicate that it failed.
This acts as a pointer to an area that needs improvement in the code. This is what TDD is all about. You keep repeating this process until you converge on a “correct” answer which is a method that does just what it needs to do. In theory, and so far my practice confirms this, the resulting code should be smaller and more accurate.
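One place that cycle might converge, as a sketch rather than the project’s actual code: a single-purpose function that converts geodetic latitude, longitude, and height to Earth-centered, Earth-fixed x, y, z. It assumes the WGS-84 ellipsoid and plain geodetic input; the MPC observatory format mentioned above uses a different representation and would need its own conversion. The function name lat_long_to_xyz() is hypothetical.

```python
import math
import unittest

# WGS-84 ellipsoid constants (assumed reference ellipsoid).
WGS84_A_KM = 6378.137               # equatorial radius, km
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def lat_long_to_xyz(latitude_deg, longitude_deg, altitude_km=0.0):
    """Convert geodetic latitude/longitude/height to Earth-centered,
    Earth-fixed x, y, z in kilometers. Does this one conversion only."""
    lat = math.radians(latitude_deg)
    lon = math.radians(longitude_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A_KM / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + altitude_km) * math.cos(lat) * math.cos(lon)
    y = (n + altitude_km) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + altitude_km) * math.sin(lat)
    return x, y, z


class LatLongToXYZTest(unittest.TestCase):
    def test_equator_prime_meridian(self):
        x, y, z = lat_long_to_xyz(0.0, 0.0)
        self.assertAlmostEqual(x, WGS84_A_KM, places=6)
        self.assertAlmostEqual(y, 0.0, places=6)
        self.assertAlmostEqual(z, 0.0, places=6)

    def test_north_pole_is_polar_radius(self):
        x, y, z = lat_long_to_xyz(90.0, 0.0)
        polar_radius = WGS84_A_KM * (1 - WGS84_F)
        self.assertAlmostEqual(x, 0.0, places=6)
        self.assertAlmostEqual(z, polar_radius, places=6)
```

Keeping the function to this one job is what makes it easy to check against known reference points like the equator and the pole.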
For me this is a great thing because it encourages code that is more honest. What is honest code? To me, it is code that does what it says and nothing else. The shorter the code is, the more focused it is. The python language is very expressive and allows you to do a lot without saying a lot. This power can lead you to try and do too much in one function. Once you have a function that not only converts Lat and Long, but also converts polar coordinates and writes satellite data to the database, can you really honestly call that function convertLatLongToXYZ()?
This is especially valuable in the scientific context of this astronomy-related code. The code is open source, and in the event there is a problem with it, as people dig into the code they need to have confidence that the code is relatively well written. At a high level, the tests give them a numerical sense of ease that it is working when the tests run and show the proper numbers being returned. At a low level, seeing that the code is broken up into small logical sections gives the confidence that once a problem is found it can be corrected, tested, and shown to have minimal side effects in other parts of the program.
As the lone developer on this project, this type of check-and-balance gives me a lot of hope that I’ll be able to produce something that is accurate and useful to the community at large. Openness and honesty help build confidence and trust. If you are interested in the project, be sure to check it out:
ObsSatId: A python astronomy project to wrap the SGP4 code to check for satellites