<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Possibility and Probability &#187; Math</title>
	<atom:link href="http://ironboundsoftware.com/blog/category/math/feed/" rel="self" type="application/rss+xml" />
	<link>http://ironboundsoftware.com/blog</link>
	<description>Droplets of Yes and No</description>
	<lastBuildDate>Wed, 28 Dec 2011 01:37:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>Big Data, Big Opportunity</title>
		<link>http://ironboundsoftware.com/blog/2011/02/01/big-data-big-opportunity/</link>
		<comments>http://ironboundsoftware.com/blog/2011/02/01/big-data-big-opportunity/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 03:38:10 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://ironboundsoftware.com/blog/?p=396</guid>
		<description><![CDATA[There is a really great article about how data is the new commodity, in the same way that we look at oil. One thing they both have in common is that they are out there; it is just a question of who is willing to go and dig it up. Information is quickly piling up all over the place, and [...]]]></description>
			<content:encoded><![CDATA[<p>There is <a href="http://gigaom.com/2011/02/01/mining-the-tar-sands-of-big-data/">a really great article about how data is the new commodity</a>, in the same way that we look at oil. One thing they both have in common is that they are out there; it is just a question of who is willing to go and dig it up.</p>
<p>Information is quickly piling up all over the place, and I agree with the article that the people who are able to capitalize on this are the ones that will get the big payoff. I especially like the idea of calling these start-ups &#8220;wildcats&#8221;; that perfectly captures the wild west atmosphere that is going to start happening.</p>
<p>The neat thing is that a lot of this information is out there for free. The real value is in how people are going to aggregate those individual data streams into new and often unexpected products. Take Twitter for example (<a href="http://twitter.com/nloadholtes">are you following me on Twitter?</a>): it is a conduit to what is going on in the hive mind of the internet. This site seems to be <a href="http://trendyontwitter.blogspot.com/">gathering up the trends on Twitter</a> and then adding news articles about some of the things that are hot.</p>
<p>That is pretty neat: Data is generated in the form of people tweeting about Topic X, and as X becomes more &#8220;important&#8221; (in this case more people discuss it so that it rises above other topics) it gets published to the &#8220;trending&#8221; list. This website then goes in, looks at that list, and adds more data to the conversation by reporting news about topic X. That way the separate data points are tied together to show that there is a relationship between them, which in the process makes the data more valuable to the end users (by supplying more context, etc.).</p>
<p>Big data is going to lead to a lot of big opportunities. All we have to do is find the data, combine it in the right way, and perform the right data analysis on it. And unlike big oil, big data is going to be around a very long time.</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2011/02/01/big-data-big-opportunity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stargate Universe: I&#8217;m not the only one not watching</title>
		<link>http://ironboundsoftware.com/blog/2010/06/16/stargate-universe-im-not-the-only-one-not-watching/</link>
		<comments>http://ironboundsoftware.com/blog/2010/06/16/stargate-universe-im-not-the-only-one-not-watching/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 03:14:23 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[Entertainment]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Space]]></category>
		<category><![CDATA[TV & Movies]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=348</guid>
		<description><![CDATA[I&#8217;m a huge fan of SG-1. I own all 10 seasons on DVD. Even the last one, which *really* wasn&#8217;t the greatest. When Stargate Universe was announced I was skeptical. I wasn&#8217;t big on Stargate Atlantis, but I decided that the new series deserved a fair shake. So I watched the first half of the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a huge fan of SG-1.</p>
<p>I own all 10 seasons on DVD. Even the last one, which *really* wasn&#8217;t the greatest.</p>
<p>When Stargate Universe was announced I was skeptical. I wasn&#8217;t big on Stargate Atlantis, but I decided that the new series deserved a fair shake.</p>
<p>So I watched the first half of the season. It went from &#8220;Not bad&#8221; to &#8220;Ehhh&#8221; to &#8220;*sigh*&#8221; to &#8220;Why bother&#8221; pretty quickly for me. Since it was taking a 4-month break between the first and second halves, and I didn&#8217;t really care about the major characters, I never watched the second half.</p>
<p>Today on <a href="http://www.gateworld.net/news/2010/06/sgus-season-one-ratings-report/">the most excellent Stargate site, Gateworld.net</a>, they released some ratings numbers. It looks like I wasn&#8217;t alone in my decision to tune out. Check out this nifty plot I made of the ratings over time:</p>
<p><img src="https://spreadsheets.google.com/oimg?key=0AnYsR527TmLRdE9vTjNtQk1fNXdudG9qSE1zYUZFY1E&amp;oid=1&amp;zx=leguu09yg78s" alt="" /></p>
<p>*sigh*&#8230;..</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2010/06/16/stargate-universe-im-not-the-only-one-not-watching/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The iPad and the German Tank Problem</title>
		<link>http://ironboundsoftware.com/blog/2010/03/15/the-ipad-and-the-german-tank-problem/</link>
		<comments>http://ironboundsoftware.com/blog/2010/03/15/the-ipad-and-the-german-tank-problem/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 02:06:21 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[Apple]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Probability]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=337</guid>
		<description><![CDATA[Based on order numbers the day the iPad was made available for pre-order, several sites were speculating that Apple was selling about 50,000 iPads per hour. This reminded me of something I heard about back when the iPhone was just getting started: The German Tank Problem. It&#8217;s basically a way of statistically estimating the maximum [...]]]></description>
			<content:encoded><![CDATA[<p>Based on order numbers the day the iPad was made available for pre-order, several sites were speculating that Apple was selling about 50,000 iPads per hour.</p>
<p>This reminded me of something I heard about back when the iPhone was just getting started: The <a href="http://en.wikipedia.org/wiki/German_tank_problem">German Tank Problem</a>.</p>
<p>It&#8217;s basically a way of statistically estimating the maximum number of &#8220;items&#8221; that have been produced based on the serial numbers of the item. This method works when the serial numbers are in sequential order, and basically allows you to produce a somewhat realistic estimation (as opposed to a wild-ass guess).</p>
<p>Estimation is one of those skills that most people (myself included) could stand to improve a little bit. In the Wikipedia link above, this is shown by the intelligence estimates the Allies had for the number of tanks the Germans were producing in the middle of WWII. The initial estimates were really high (1,000 per month), but statistics based on the serial numbers of crankcases from captured or destroyed German tanks showed that the number might be lower (around 200 per month, or 1/5th of the original). After the war, when the factory records were looked at, the true number was a lot closer to 200 than 1,000.</p>
<h2>Math FTW</h2>
<p>The formula is pretty easy:</p>
<p style="padding-left: 30px;">N=m + (m/k) -1</p>
<p>Where <strong>m</strong> is the largest serial number observed, and <strong>k</strong> is the number of serial numbers seen. The <a href="http://en.wikipedia.org/wiki/Variance">variance</a> is roughly equal to:</p>
<p style="padding-left: 30px;">(N^2)/(k^2)</p>
<p>Which means that the <a href="http://en.wikipedia.org/wiki/Standard_deviation">standard deviation</a> is roughly:</p>
<p style="padding-left: 30px;">N/k</p>
<p>So, how does this apply to the iPad&#8217;s initial orders? The data points are 2 known orders spaced 50,000 numbers apart (keeping in mind these order numbers also probably included orders for items other than iPads). Plugging those two numbers into the equation, we get N:</p>
<blockquote><p>
&gt;&gt;&gt; m=50000<br />
&gt;&gt;&gt; k=2<br />
&gt;&gt;&gt; N = m + (m/k)-1<br />
&gt;&gt;&gt; N<br />
74999<br />
&gt;&gt;&gt;</p></blockquote>
<p>Wow, almost 75,000&#8230; That seems like a really big number. The question we should then ask ourselves is &#8220;How realistic is this number?&#8221; Using the standard deviation and variance we could find out how spread out our numbers are:</p>
<blockquote><p>&gt;&gt;&gt; (N**2)/(k**2)<br />
1406212500L<br />
&gt;&gt;&gt; N/k<br />
37499<br />
&gt;&gt;&gt;</p></blockquote>
<p>Wow. Those numbers are huge. And that is a very bad thing. The bigger the standard deviation and variance, the less accurate the estimation. Another way to approach this analysis is to look at the confidence interval and see how big it is. The Wikipedia article has a handy formula for finding the <a href="http://en.wikipedia.org/wiki/German_tank_problem#Confidence_intervals">confidence interval</a> which leads us to the estimation that there are (based on <strong>k</strong>=2 and <strong>m</strong>=50,000) between 50,000 and 225,000 iPads ordered!</p>
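<p>As a rough sketch of that confidence-interval calculation: the function below assumes the one-sided interval from the Wikipedia article under its continuous approximation, where the population lies between <strong>m</strong> and <strong>m</strong> divided by alpha^(1/<strong>k</strong>) with confidence 1 - alpha. The 95% level (alpha = 0.05) and the name <code>upper_bound</code> are assumptions for illustration; the post does not state which level was used, but 95% gives roughly 224,000, in the same ballpark as the 225,000 quoted above.</p>

```python
def upper_bound(m, k, alpha=0.05):
    """Upper end of a one-sided (1 - alpha) confidence interval.

    Continuous approximation: with k serial numbers observed and the
    largest equal to m, the population is taken to lie in
    [m, m / alpha**(1/k)]. The 0.05 default is an assumption.
    """
    return m / alpha ** (1.0 / k)


m, k = 50000, 2
low, high = m, upper_bound(m, k)
print("between %d and %d" % (low, round(high)))
```

<p>The lower end is simply <strong>m</strong>, since at least that many serial numbers must exist.</p>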
<h2>Those numbers are lying</h2>
<p>Why? There are two reasons: The main one is that our sample size (of 2 orders) is waaaaaaaay too small. It&#8217;s like trying to guess how big your grocery bill is by averaging the price of two items, and then multiplying it by the number of things you bought. That estimation will be way off.</p>
<p>As an example, if there were 20 orders to work with (<strong>k</strong>=20), the high end of the confidence interval would shrink to 58,000. But that leads to the second reason why we can&#8217;t trust these numbers:</p>
<p>We don&#8217;t know the lower bound.</p>
<p>In other words, yes, there could have been 50,000 orders between the first and last data points. But what if only half of them were for iPads? That would mean that <strong>m</strong> is actually 25,000 which would drastically skew the numbers down. Remember that <strong>N</strong> that was almost 75,000? With <strong>m</strong> at 25,000 (and keeping <strong>k </strong>= 2) <strong>N</strong> drops to 37,499 which is half the original estimate!</p>
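<p>To see how sensitive the estimate is to <strong>m</strong> and <strong>k</strong>, here is a minimal Python 3 sketch that reuses the formulas from earlier in the post. The function names are made up for illustration, and the <strong>k</strong>=20 row is a hypothetical sample, not real order data.</p>

```python
def estimate_total(m, k):
    """Point estimate N = m + m/k - 1 from the formula above."""
    return m + m / k - 1


def rough_spread(m, k):
    """Rough standard deviation N/k, per the approximation above."""
    return estimate_total(m, k) / k


# (m, k) pairs: the original guess, half the orders being iPads,
# and a hypothetical sample of 20 orders.
for m, k in [(50000, 2), (25000, 2), (50000, 20)]:
    print(m, k, estimate_total(m, k), rough_spread(m, k))
```

<p>Halving <strong>m</strong> halves the estimate, while a larger <strong>k</strong> pulls both the estimate and its spread down toward the largest number actually seen.</p>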
<h2>So how many were sold?</h2>
<p>That&#8217;s a really good question. Knowing Apple and how people love their products, I bet they sold a TON of iPads. But based off these rough numbers we see in the news, we can&#8217;t really draw a good conclusion. We can get a couple of estimates which are better than nothing, but they are so numerically shaky (huge standard deviation and enormously questionable confidence interval) that they strain believability. More data points will help establish an upper bound (i.e. the estimated maximum number sold), but without a lower bound to keep us grounded the numbers will still look really huge.</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2010/03/15/the-ipad-and-the-german-tank-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why I like the SICP book</title>
		<link>http://ironboundsoftware.com/blog/2007/11/01/why-i-like-the-sicp-book/</link>
		<comments>http://ironboundsoftware.com/blog/2007/11/01/why-i-like-the-sicp-book/#comments</comments>
		<pubDate>Fri, 02 Nov 2007 02:00:25 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[Lisp]]></category>
		<category><![CDATA[Math]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/2007/11/01/why-i-like-the-sicp-book/</guid>
		<description><![CDATA[The Structure and Interpretation of Computer Programs is turning out to be a rather awesome read. What makes it so cool for me is that I&#8217;m taking a class in Numerical Analysis, and in sections 1.2 and 1.3 the examples read like something right out of the class. The really odd thing for me is [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://mitpress.mit.edu/sicp/full-text/book/book.html">Structure and Interpretation of Computer Programs</a> is turning out to be a rather awesome read. What makes it so cool for me is that I&#8217;m taking a class in Numerical Analysis, and in sections 1.2 and 1.3 the examples read like something right out of the class. The really odd thing for me is that this &#8220;double teaming&#8221; of my brain is paying off in that I&#8217;m learning the Lispy things, and at the same time I&#8217;m learning the Numerical things. Each one builds on the other, reinforcing the lessons learned. Very recursive, just like Lisp.</p>
<p>One of the things I have learned just today is that loops are actually allowed in Lisp. All of the recursive thinking I&#8217;ve been doing lately has made me think that there&#8217;s always a way to write a loop in a recursive manner. But I was looking at <a href="http://en.wikipedia.org/wiki/Neville%27s_algorithm">Neville&#8217;s Method</a> the other day and it struck me that, with its nested loops, it could be quite a challenge to write in a recursive manner. While trying to find an example of a nested loop in Lisp, I found all kinds of cool links I thought I would share:</p>
<ul>
<li><a href="http://www.notam02.no/internt/cm-sys/cm-1.4/doc/contrib/lispstyle.html">Lisp style tips for the beginner</a></li>
<li><a href="http://www.psg.com/~dlamkins/sl/chapter05.html">Successful Lisp</a></li>
<li><a href="http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/math/clmath/0.html">CLMath</a></li>
</ul>
<p>Go check them out, they are great resources!</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2007/11/01/why-i-like-the-sicp-book/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Graphing eBay users to find fraud</title>
		<link>http://ironboundsoftware.com/blog/2006/12/05/graphing-ebay-users-to-find-fraud/</link>
		<comments>http://ironboundsoftware.com/blog/2006/12/05/graphing-ebay-users-to-find-fraud/#comments</comments>
		<pubDate>Wed, 06 Dec 2006 01:55:56 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=260</guid>
		<description><![CDATA[Here&#8217;s an interesting article about data mining and auction fraud. Graphs, and what you can do with them, never cease to amaze me. The article talks about how looking at the relationships between users on eBay can help uncover fraud and the accomplices that help keep it going. They do this by seeing if the [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s an interesting article about <a href="http://www.sciencedaily.com/releases/2006/12/061205143326.htm">data mining and auction fraud</a>.</p>
<p>Graphs, and what you can do with them, never cease to amaze me. The article talks about how looking at the relationships between users on eBay can help uncover fraud and the accomplices that help keep it going. They do this by seeing if the relationship between groups of users turns into a <a href="http://mathworld.wolfram.com/BipartiteGraph.html">bipartite graph</a>. That is, there is a concentration of links between two groups of users, and few links with other users.</p>
<p>The links between users refer to the &#8220;feedback scores&#8221; that eBay users use to determine a user&#8217;s trustworthiness. Usually, when someone begins ripping others off, they get bad feedback until it reaches a point where no one will do business with them. This is the way that community sites commonly work (i.e. letting the population of users determine each other&#8217;s rankings). The issue is that sometimes there are people lurking in the shadows assisting the fraudster, but because they are never the target of the bad feedback, they are able to keep going, supporting new fraudsters (i.e. a new user id).</p>
<p>This setup allows a scammer to set up a new user id and get its feedback levels boosted quickly without having to engage in a lot of &#8220;legitimate&#8221; transactions. Think of it as passing a baton in a relay race: Instead of one person running a mile, why not let several people sprint as fast as they can for a quarter-mile and then hand off to someone &#8220;fresh&#8221;.</p>
<p>Think about it: When was the last time you checked someone&#8217;s feedback ratings on eBay? Probably right before your last purchase/bid. But, when was the last time you checked the people who gave the feedback to see what their reputation was? Probably never. I know I had never thought of this before reading the article.</p>
<p>This technique produces a graphical representation of this relationship which stands out quickly to a user. (Also, as a side note, there are mathematical formulas that would/should see this also.) A normal user would probably have a relationship graph (over 2 or more degrees) that looks like a star-burst pattern. A potential scammer would show up in a clustered bipartite graph. This would give the community of users (eBay bidders) a powerful tool to determine who is the real deal and who is trying to give them a wooden nickel. Very cool stuff. Graph theory to the rescue!</p>
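<p>To make the bipartite idea concrete, here is a small sketch of the textbook check: try to split the users into two groups with a breadth-first search, and fail as soon as a link lands inside one group. This is not the method from the article, which is presumably more sophisticated since real feedback data would only be approximately bipartite. The user ids and the <code>is_bipartite</code> name are made up for illustration.</p>

```python
from collections import deque


def is_bipartite(links):
    """Return True if the users in `links` split into two groups
    such that every link runs between the groups.

    `links` maps each user id to the set of users they exchanged
    feedback with (hypothetical data; every user must appear as a key).
    """
    side = {}
    for start in links:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            user = queue.popleft()
            for other in links[user]:
                if other not in side:
                    side[other] = 1 - side[user]
                    queue.append(other)
                elif side[other] == side[user]:
                    return False  # link inside one group
    return True


# Two "fraudster" ids that only trade with two "accomplice" ids.
ring = {
    "fraud1": {"acc1", "acc2"},
    "fraud2": {"acc1", "acc2"},
    "acc1": {"fraud1", "fraud2"},
    "acc2": {"fraud1", "fraud2"},
}
# Three ordinary users who have all traded with each other.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

print(is_bipartite(ring), is_bipartite(triangle))
```

<p>The ring of fraudsters and accomplices passes the check, while the three ordinary users who all traded with each other do not.</p>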
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/12/05/graphing-ebay-users-to-find-fraud/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Penny Stock analysis</title>
		<link>http://ironboundsoftware.com/blog/2006/11/30/penny-stock-analysis/</link>
		<comments>http://ironboundsoftware.com/blog/2006/11/30/penny-stock-analysis/#comments</comments>
		<pubDate>Fri, 01 Dec 2006 01:51:33 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Probability]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Thinking]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=258</guid>
		<description><![CDATA[The stock market is one of those things that really intrigues me. An open system where everyone can see what&#8217;s going on, perhaps make some money, and perhaps influence the direction of the stock. It&#8217;s a system that is ripe for data mining, something that seems to be equal parts analytical skill, part fortune teller, [...]]]></description>
			<content:encoded><![CDATA[<p>The stock market is one of those things that really intrigues me. An open system where everyone can see what&#8217;s going on, perhaps make some money, and perhaps influence the direction of the stock. It&#8217;s a system that is ripe for data mining, something that seems to be equal parts analytical skill, part fortune teller, part industry expert, and often times being plain lucky.</p>
<p>I&#8217;ve talked with <a href="http://hipegg.blogspot.com/">Hip Egg and Jym Khana</a> about stocks before and one topic that I bring up every now and then is penny stocks. As I&#8217;m sure most people with an email account know, there&#8217;s a ton of stock related spam going around these days. Most of it appears to be the pump-and-dump variety in which the scammers hope that people will purchase the suggested stock, causing the price to rise so that they can sell their shares (that they purchased before sending out the email) at an inflated price. This technique has been around forever, but it seems to be the flavor of the month for scam and con artists.</p>
<p>The main questions that we usually talk about are a) Does anyone actually get rich doing this? and b) Just how &#8220;influence-able&#8221; are these low priced stocks? Well, today has been a banner day for answers: I came across a few articles talking about the scams:</p>
<ul>
<li><a href="http://www.crummy.com/features/StockSpam/">Stock Spam Effectiveness Monitor</a> (via <a href="http://joelonsoftware.com">Joel</a>)</li>
<li><a href="http://blog.wired.com/business/2006/11/when_youre_an_o.html">Spammers as scammers</a></li>
<li><a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=920553">Stock Touts and Corresponding Market Activity</a></li>
</ul>
<p>While reading these I saw the simplest idea yet to help stop those spams: Simply watch the stocks and see who bought a lot of stock before the email was sent, and who sold a lot right around the sell date in the email.</p>
<p>That idea is pure genius. It targets a potentially large group of people, but the odds are that a pattern will emerge showing a small group of people moving from one stock to another. At a minimum those groups would be a starting point for a fraud investigation. More than likely, those would be the people responsible for sending out the emails. And since the spammers are kind enough to send these messages to just about everyone on the planet, it shouldn&#8217;t take too long to gather a nice body of evidence (or actionable intelligence). From what I understand, scams like this have been hard to track in the past because the scammers can move quickly. But now that they are announcing their moves in advance, it should be pretty easy to set up a system to monitor spams, then watch the stock activity&#8230; It just seems so simple that it should work like a champ!</p>
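<p>A minimal sketch of what such a monitor might look like, assuming trade records could be obtained at all (in practice only a regulator or an exchange would have them). The record layout, the one-day window and the two-campaign threshold are all invented for illustration.</p>

```python
from collections import Counter

# Hypothetical records: (account, ticker, day, shares); negative = sale.
trades = [
    ("acct1", "AAA", 1, 5000), ("acct1", "AAA", 6, -5000),
    ("acct1", "BBB", 10, 8000), ("acct1", "BBB", 15, -8000),
    ("acct2", "AAA", 4, 200),
    ("acct3", "BBB", 9, 3000), ("acct3", "BBB", 30, -3000),
]
# Hypothetical spam campaigns: (ticker, day emailed, touted sell day).
campaigns = [("AAA", 3, 6), ("BBB", 12, 15)]


def suspects(trades, campaigns, window=1, min_hits=2):
    """Accounts that bought before the email and sold within `window`
    days of the touted sell date, in at least `min_hits` campaigns."""
    hits = Counter()
    for ticker, emailed, sell_day in campaigns:
        bought = {a for a, t, d, s in trades
                  if t == ticker and s > 0 and emailed > d}
        sold = {a for a, t, d, s in trades
                if t == ticker and 0 > s and window >= abs(d - sell_day)}
        for account in bought.intersection(sold):
            hits[account] += 1
    return sorted(a for a, n in hits.items() if n >= min_hits)


print(suspects(trades, campaigns))
```

<p>Only the account that repeats the buy-early, sell-on-cue pattern across both campaigns is flagged; a single match is treated as coincidence.</p>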
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/11/30/penny-stock-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spicing up iTunes playlists</title>
		<link>http://ironboundsoftware.com/blog/2006/11/10/spicing-up-itunes-playlists/</link>
		<comments>http://ironboundsoftware.com/blog/2006/11/10/spicing-up-itunes-playlists/#comments</comments>
		<pubDate>Fri, 10 Nov 2006 23:46:33 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[Entertainment]]></category>
		<category><![CDATA[ipod]]></category>
		<category><![CDATA[Music]]></category>
		<category><![CDATA[Probability]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=255</guid>
		<description><![CDATA[I really like my iPod. I&#8217;ve got a 4GB Mini that I keep in the car (hooked up to the stereo) and I listen to that instead of the radio. I have a couple of playlists that account for about 2GB of songs on there. You would think with that many songs all would be [...]]]></description>
			<content:encoded><![CDATA[<p>I really like my <a href="http://apple.com/ipod">iPod</a>. I&#8217;ve got a 4GB Mini that I keep in the car (hooked up to the stereo) and I listen to that instead of the radio. I have a couple of playlists that account for about 2GB of songs on there. You would think with that many songs all would be good. And most of the time it is, but after a while, you will listen to all of those songs and start to hear the same ones over and over and over. Even your most favorite of songs will begin to grate on your nerves.</p>
<p>So how do you prevent this?</p>
<p>For me the secret has been to set up two new play lists. One is dedicated to new songs (i.e. just bought or ripped), and the other is for songs that haven&#8217;t been played in a while.</p>
<p>With iTunes, you can set up a play list that will select songs based on certain fields. For me, most of my play lists revolve around the &#8220;rating&#8221; of the song. 1 Star means I don&#8217;t like it, 5 Stars means it&#8217;s the best thing I&#8217;ve ever heard. As a consequence, I have a lot of songs that fall into the 3 to 4 star range. Randomly choosing songs from this pool is OK, but for some reason I always seem to wind up with the same core groups of songs, and like I said earlier, they are starting to get stale.</p>
<p>It turns out that there is another field that iTunes keeps track of, the &#8220;last time played&#8221;. This is interesting because now we can build a play list based not only on how much we like the song, but also how long it has been since we have heard it. Combining the two ideas together leads to an interesting new play list. Here&#8217;s a picture of how I have mine set up:</p>
<p align="center"><img id="image264" alt="playlist.png" src="http://ironboundsoftware.com/blog/wp-content/uploads/2006/11/playlist.png" /></p>
<div align="left">With this play list feeding into my iPod when I sync I get a nice selection of &#8220;fresh&#8221; songs almost every time. Since I have about 800 songs rated between 3 and 5 stars, this gives me a good size pool of songs to pull from. And since the play list is time dependent, what is in the list today will be different than what is in the list 2 weeks from now.</div>
<p>The real beauty of this play list is that as songs are listened to, their &#8220;last played&#8221; date is set to now, and when I sync up the next time, a new song will take its place. This way, I can keep listening to the songs I like, but don&#8217;t have to worry about stale songs because the play list is always being refreshed.</p>
<p>And as I listen to songs on the Mac, this updates the last played dates also, so the net effect is I&#8217;m adding a lot of chaotic variability to the play list. Which in turn means that the songs on the play list will tend to be more &#8220;random&#8221; because there are two sources of input (the iPod and iTunes) that are influencing the results of what gets picked.</p>
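<p>The selection rule behind a play list like this can be sketched in a few lines of Python. The exact settings are in the screenshot above, so the thresholds here (3 stars or better, not played in the last 14 days) and the song data are assumptions for illustration only.</p>

```python
from datetime import datetime, timedelta


def fresh_songs(songs, now, min_stars=3, idle_days=14):
    """Pick songs rated at least `min_stars` that have not been played
    in the last `idle_days` days (both thresholds are assumptions)."""
    cutoff = now - timedelta(days=idle_days)
    return [title for title, stars, last_played in songs
            if stars >= min_stars and cutoff > last_played]


now = datetime(2006, 11, 10)
# Hypothetical library: (title, star rating, last played).
songs = [
    ("Song A", 4, datetime(2006, 11, 9)),   # liked, but played yesterday
    ("Song B", 5, datetime(2006, 9, 1)),    # liked and not heard lately
    ("Song C", 1, datetime(2006, 1, 1)),    # stale, but disliked
]
print(fresh_songs(songs, now))

# Playing Song B resets its "last played" date, so it drops out
# of the list on the next sync.
songs[1] = ("Song B", 5, now)
print(fresh_songs(songs, now))
```

<p>The first call picks only Song B; once it has been played, the second call comes back empty until other songs age past the cutoff.</p>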
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/11/10/spicing-up-itunes-playlists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Netflix analysis</title>
		<link>http://ironboundsoftware.com/blog/2006/10/29/netflix-analysis/</link>
		<comments>http://ironboundsoftware.com/blog/2006/10/29/netflix-analysis/#comments</comments>
		<pubDate>Sun, 29 Oct 2006 22:16:57 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Thinking]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=250</guid>
		<description><![CDATA[Looking through the Netflix prize data, I saw something that made me do a double-take. Miss Congeniality seemed to be the most rated movie in the database. That struck me as odd, because I always imagined a movie like Lord Of The Rings would be the #1 most rated movie (since the folks that [...]]]></description>
			<content:encoded><![CDATA[<p>Looking through the Netflix prize data, I saw something that made me do a double-take. Miss Congeniality seemed to be the most rated movie in the database. That struck me as odd, because I always imagined a movie like Lord Of The Rings would be the #1 most rated movie (since the folks that I think Netflix is most popular with are into those movies).</p>
<p>Today I saw this analysis on the Netflix forums:<br />
<a href="http://www.netflixprize.com/community/viewtopic.php?pid=800#p800">Netflix Prize: Forum / Miss Congeniality</a>. In this posting there is a breakdown of the movie ratings, and it explains a few things, like which movies are the most loved or the most hated. It&#8217;s a pretty interesting read (especially if you are feeling lazy and don&#8217;t want to do the SQL, which is the category I fall into).</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/10/29/netflix-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thinking about measuring consumer response</title>
		<link>http://ironboundsoftware.com/blog/2006/09/03/thinking-about-measuring-consumer-response/</link>
		<comments>http://ironboundsoftware.com/blog/2006/09/03/thinking-about-measuring-consumer-response/#comments</comments>
		<pubDate>Sun, 03 Sep 2006 23:11:12 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Probability]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Thinking]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=232</guid>
		<description><![CDATA[Recently I got an interesting promotion in the mail. Home Depot sent the wife and I a gift card that was loaded with a &#8220;mystery amount&#8221; between $1 and $10,000. To find out how much the card is worth, you have to go to a Home Depot and redeem it. Our card turned out to [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I got an interesting promotion in the mail. <a href="http://homedepot.com">Home Depot</a> sent the wife and I a gift card that was loaded with a &#8220;mystery amount&#8221; between $1 and $10,000. To find out how much the card is worth, you have to go to a Home Depot and redeem it. Our card turned out to be worth $1. How much did we wind up spending? $75. As we were walking out of the store I started thinking about this promotion and it hit me how genius it was.</p>
<p>Think about it: They are giving away money in exchange for people shopping at their stores. But they aren&#8217;t going to lose money, and if anything they will gain huge insights into the consumer base in an area!</p>
<p>For example, let&#8217;s say they mail out 10,000 of the cards. And let&#8217;s say there are only two values for the cards, $1 and $10,000. The smart thing would be to send out as few of the $10k cards as possible, so in this exercise we&#8217;ll assume 1 was sent out. That means that the other 9,999 cards were all worth $1, which means the total value of the &#8220;prize&#8221; is $19,999. That is, assuming that all of the cards are used (which in real life probably wouldn&#8217;t happen for a variety of reasons), and ignoring other things like the cost of the mailings, the time of the staff to prepare things, etc.</p>
<p>A lot of people who get these cards are probably like me: they have a bunch of small projects around the house, but they just don&#8217;t have any motivation to do any work on them (because they need parts, they need time, etc., etc., etc.). This card comes along and they decide &#8220;Hey, I could win $10k, why not go down there and get some of the supplies I need anyway, and then I can find out what this is worth!&#8221;.</p>
<p>So let&#8217;s assume that 50% of the people who get the cards decide to use them. My wife and I wound up spending $75, but I&#8217;m sure that other people will spend more, and others will spend less. For this little exercise, let&#8217;s figure the average amount spent is $50 per person. That works out to $250,000 spent at the store! Even if one of the cards happened to be the winning $10k card, Home Depot will still have brought in over $200k of business. If the average spent per customer was higher and/or more people used the cards, the amount would be even higher.</p>
<p>So, for a $20k investment, the business got back $250k in sales. For a company so big, that&#8217;s probably a drop in the bucket, but the effect is something that can&#8217;t be ignored. And that&#8217;s not even the good part!</p>
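<p>The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the 50% redemption rate and the $50 average spend are the guesses made in this post, and the function names are made up for illustration.</p>
<blockquote>
<pre>
```python
# Back-of-the-envelope campaign math using the assumed figures from the post.
CARDS_MAILED = 10000
GRAND_PRIZE = 10000       # value of the single big card, in dollars
SMALL_PRIZE = 1           # value of every other card, in dollars
REDEMPTION_RATE = 0.5     # assumed share of recipients who visit the store
AVERAGE_SPEND = 50        # assumed average purchase per visitor, in dollars


def prize_pool(cards, grand_prize, small_prize, grand_cards=1):
    """Total face value of all cards if every one were redeemed."""
    return grand_cards * grand_prize + (cards - grand_cards) * small_prize


def expected_sales(cards, redemption_rate, average_spend):
    """Sales generated by the recipients who actually show up."""
    return cards * redemption_rate * average_spend


pool = prize_pool(CARDS_MAILED, GRAND_PRIZE, SMALL_PRIZE)
sales = expected_sales(CARDS_MAILED, REDEMPTION_RATE, AVERAGE_SPEND)
print("Prize pool: $%d" % pool)
print("Sales:      $%d" % sales)
print("Sales per prize dollar: %.1f" % (sales / pool))
```
</pre>
</blockquote>
<p>Changing REDEMPTION_RATE or AVERAGE_SPEND shows how sensitive the result is to those two guesses.</p>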
<p>For those of you who&#8217;ve never seen one of these gift cards, they are basically a plastic card with a bar code on the back. The bar code is scanned at the register, which queries the main database to find out the value of the card. Pretty standard stuff. But then it occurred to me that the card was, I think, addressed by name to my wife and me. I could be wrong, it might have said &#8220;resident&#8221;, but at any rate the ZIP code was on the address.</p>
<p>What this means is that there is now (potentially) a connection between how much was spent at the store and what ZIP code the customer lives in. This is a gold mine of data: it tells you so much about your customer base. Off the top of my head I could think of the following:</p>
<ul>
<li>Which store did they go to? (i.e. did they drive past one that is closer to where they live?) Did the customer drive past any competitors to get to the store?</li>
<li>How was the turnout from areas where a competitor&#8217;s store is closer than a Home Depot store?</li>
<li>For a given ZIP code, how much was spent?</li>
<li>Is there a popular item in a certain area?</li>
</ul>
<p>Continuing with the example above, imagine the results of this campaign showed a higher turnout from certain parts of town. Those parts could be investigated to find out what kind of homes are there. It could be that the homes in that area are older and thus more likely to be getting &#8220;fixed up&#8221;. Likewise, if the turnout was low in another area, research could turn up that the houses are newer and less likely to need a large amount of work (i.e. the owners are less likely to spend higher amounts of money).</p>
<p>There&#8217;s a ton more that could be derived, but the important thing is that the lessons learned can be applied to the next marketing campaign. There are all kinds of targets that can be aimed for: a higher turnout, a larger purchase per customer, moving a certain type of product (like paint), and so on.</p>
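<p>As a rough illustration of the per-ZIP questions above, here is a small Python sketch that computes turnout and spending by ZIP code. Everything in it is hypothetical: the ZIP codes, store names, dollar amounts, and the summarize_by_zip() helper are invented for the example and do not come from any real campaign.</p>
<blockquote>
<pre>
```python
from collections import defaultdict

# Hypothetical redemption records: (ZIP code on the mailing, store, dollars spent).
REDEMPTIONS = [
    ("11111", "Store A", 75.00),
    ("11111", "Store A", 20.00),
    ("22222", "Store B", 140.00),
    ("22222", "Store A", 60.00),
    ("33333", "Store B", 15.00),
]

# Hypothetical number of cards mailed to each ZIP code.
CARDS_MAILED = {"11111": 10, "22222": 4, "33333": 10}


def summarize_by_zip(redemptions, cards_mailed):
    """Return {zip: (turnout rate, total spent, average spent per visit)}."""
    totals = defaultdict(float)
    visits = defaultdict(int)
    for zip_code, _store, spent in redemptions:
        totals[zip_code] += spent
        visits[zip_code] += 1
    result = {}
    for zip_code, mailed in cards_mailed.items():
        count = visits[zip_code]
        average = totals[zip_code] / count if count else 0.0
        result[zip_code] = (count / mailed, totals[zip_code], average)
    return result


summary = summarize_by_zip(REDEMPTIONS, CARDS_MAILED)
for zip_code in sorted(summary):
    turnout, total, average = summary[zip_code]
    print("%s turnout=%.0f%% total=$%.2f average=$%.2f"
          % (zip_code, turnout * 100, total, average))
```
</pre>
</blockquote>
<p>With real data, the same grouping could be keyed on store or product category to get at the other questions in the list.</p>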
<p>If nothing else, the whole thing has shown that I&#8217;m willing to put a lot of thought into something if it will keep me from having to do home repairs. <img src='http://ironboundsoftware.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/09/03/thinking-about-measuring-consumer-response/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is the randomness of randint()?</title>
		<link>http://ironboundsoftware.com/blog/2006/07/02/what-is-the-randomness-of-randint/</link>
		<comments>http://ironboundsoftware.com/blog/2006/07/02/what-is-the-randomness-of-randint/#comments</comments>
		<pubDate>Mon, 03 Jul 2006 03:21:05 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Probability]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=223</guid>
		<description><![CDATA[Recently I discovered the random.randint() function in Python. Basically you call it with 2 ints, a low value and a high value. It will return an integer in that range (inclusive). I was playing around with it and I thought it seemed to be giving me the same number awfully often, so I whipped up [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I discovered the <a href="http://www.python.org/doc/current/lib/module-random.html">random.randint()</a> function in <a href="http://python.org">Python</a>. Basically you call it with 2 ints, a low value and a high value. It will return an integer in that range (inclusive). I was playing around with it and I thought it seemed to be giving me the same number awfully often, so I whipped up a test: call that method 1 million times, record the values, then repeat 6 times.</p>
<p>I&#8217;m using randint() to simulate dice so I&#8217;m curious to see if the number distribution is even across the numbers 1 through 6. Below is my test code:</p>
<blockquote>
<pre>
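import random

ONE_MILLION = 1000000

def d6():
    # Assumed body of the wrapper: randint() includes both endpoints.
    return random.randint(1, 6)

for run in range(6):
    counts = [0, 0, 0, 0, 0, 0, 0, 0]  # slots 0 and 7 are guards
    for roll in range(ONE_MILLION):
        counts[d6()] += 1
    print(' , '.join([str(c) for c in counts]))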
</pre>
</blockquote>
<p>Each time d6() (my wrapper around randint()) is called, it returns a number from 1 to 6. This is used as an index into the counts list, and the number there is incremented by one. I have 0s on both sides of the 1-6 slots just to make sure it really is returning a correctly bounded value. The numbers in each row should sum to 1 million.</p>
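<p>The guard-slot idea can be checked on its own with a smaller run. The sketch below is written for a current Python 3 interpreter; the roll_counts() helper, the rng parameter, and the fixed seed are additions for repeatability and are not part of the original test.</p>
<blockquote>
<pre>
```python
import random


def d6(rng=random):
    """Roll one six-sided die; randint() includes both endpoints."""
    return rng.randint(1, 6)


def roll_counts(rolls, seed=None):
    """Tally rolls into 8 slots; slots 0 and 7 are guards that should stay 0."""
    rng = random.Random(seed)
    counts = [0] * 8
    for _ in range(rolls):
        counts[d6(rng)] += 1
    return counts


counts = roll_counts(60000, seed=1)
print(counts)
print("guards untouched:", counts[0] == 0 and counts[7] == 0)
print("rolls accounted for:", sum(counts))
```
</pre>
</blockquote>
<p>If randint() ever returned a value outside 1-6, it would either show up in a guard slot or raise an IndexError.</p>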
<p>By running this 6 times, I should get an idea of where the numbers are falling to make sure there is an even distribution. (Truly random numbers will have an even distribution over the long term; if they are grouping around one number, then the random number generator is not doing a good job.) I took the total of each column (which should be very close to 1 million) and then found the percent error, ((amount &#8211; expected) / expected) * 100, omitting the absolute value that is usually used. The average of the percent errors was 0, although that part is guaranteed: every roll lands in exactly one column, so the signed errors always cancel out. The more telling number is the size of each error, and every column total is within about 0.11% of the expected 1 million. This leads me to believe that the distribution of random numbers generated by the randint() function is sufficiently random for my uses.</p>
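<p>For anyone who wants to redo the percent error step without a spreadsheet, here is a short Python sketch. The column totals are copied from the spreadsheet in this post; the percent_error() helper name is just for illustration.</p>
<blockquote>
<pre>
```python
# Column totals copied from the spreadsheet in this post (faces 1 through 6).
TOTALS = [999696, 1000128, 999373, 998898, 1001131, 1000774]
EXPECTED = 1000000  # 6 runs of 1 million rolls, spread evenly over 6 faces


def percent_error(amount, expected):
    """Signed percent error: ((amount - expected) / expected) * 100."""
    return (amount - expected) / expected * 100.0


errors = [percent_error(total, EXPECTED) for total in TOTALS]
for face, err in enumerate(errors, start=1):
    print("face %d: %+.4f%%" % (face, err))

# Every roll lands in exactly one column, so the signed deviations cancel.
print("sum of deviations:", sum(total - EXPECTED for total in TOTALS))
print("largest |error|: %.4f%%" % max(abs(err) for err in errors))
```
</pre>
</blockquote>
<p>The signed deviations add up to exactly zero, so the largest absolute error is the figure worth watching.</p>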
<p>Now that I have stated this, I have no more excuses but to continue on with coding the game that will use said function in a dice throwing function. <img src='http://ironboundsoftware.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Below is the spreadsheet of my data as generated by <a href="http://spreadsheets.google.com">Google Spreadsheets</a>.</p>
<p><meta name='trixrows' content='10'/><meta name='trixr1' content='0'/><meta name='trixr2' content='10'/><meta name='trixdiv' content='20'/><meta name='trixcnt' content='1'/><meta name='trixlast' content='10'/><body style='border:0px;margin:0px'><br />
<style>.g {text-indent:3px;padding-right:3px;overflow:hidden;white-space:nowrap;letter-spacing:0;word-spacing:0;background:#FFFFFF; z-index:1;border-top:0px none;border-left:0px none;border-bottom:1px solid #CCC;border-right:1px solid #CCC;} .s2{background:white;font-family:Arial;font-size:100.0%;font-weight:bold;font-style:normal;color:#000000;text-decoration:none;text-align:left;vertical-align:bottom;white-space:normal;overflow:hidden;text-indent:0px;padding-left:3px;border-top:0px none;border-left:0px none;border-bottom:1px solid #CCC;border-right:1px solid #CCC;} .s0{background:white;font-family:Arial;font-size:100.0%;font-weight:normal;font-style:normal;color:#000000;text-decoration:none;text-align:right;vertical-align:bottom;white-space:normal;overflow:hidden;text-indent:0px;padding-left:3px;border-top:0px none;border-left:0px none;border-bottom:1px solid #CCC;border-right:1px solid #CCC;} .s1{background:white;font-family:Arial;font-size:100.0%;font-weight:normal;font-style:normal;text-decoration:none;vertical-align:bottom;white-space:normal;overflow:hidden;text-indent:0px;padding-left:3px;border-top:0px none;border-left:0px none;border-bottom:1px solid #CCC;border-right:1px solid #CCC;} </style>
<table border=0 cellpadding=0 cellspacing=0 id='tblMain'>
<tr>
<td>
<table border=0 cellpadding=0 cellspacing=0 class='tblGenFixed' style='font-size:10pt;' id='tblMain_0'>
<tr>
<td class='cAll' style='height:0px;width:0px;'></td>
<td class='cAll' style='height:0px;width:92px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
<td class='cAll' style='height:0px;width:64px;'></td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s0'>0</td>
<td class='g s0'>166367</td>
<td class='g s0'>166368</td>
<td class='g s0'>166846</td>
<td class='g s0'>166996</td>
<td class='g s0'>167006</td>
<td class='g s0'>166417</td>
<td class='g s0'>0</td>
<td class='g s1'></td>
<td class='g s0'>1000000</td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s0'>0</td>
<td class='g s0'>165463</td>
<td class='g s0'>166853</td>
<td class='g s0'>166669</td>
<td class='g s0'>166644</td>
<td class='g s0'>167031</td>
<td class='g s0'>167340</td>
<td class='g s0'>0</td>
<td class='g s1'></td>
<td class='g s0'>1000000</td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s0'>0</td>
<td class='g s0'>167284</td>
<td class='g s0'>166470</td>
<td class='g s0'>167052</td>
<td class='g s0'>166227</td>
<td class='g s0'>166123</td>
<td class='g s0'>166844</td>
<td class='g s0'>0</td>
<td class='g s1'></td>
<td class='g s0'>1000000</td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s0'>0</td>
<td class='g s0'>166893</td>
<td class='g s0'>166893</td>
<td class='g s0'>165958</td>
<td class='g s0'>166655</td>
<td class='g s0'>167011</td>
<td class='g s0'>166590</td>
<td class='g s0'>0</td>
<td class='g s1'></td>
<td class='g s0'>1000000</td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s0'>0</td>
<td class='g s0'>166887</td>
<td class='g s0'>166370</td>
<td class='g s0'>166124</td>
<td class='g s0'>166672</td>
<td class='g s0'>167160</td>
<td class='g s0'>166787</td>
<td class='g s0'>0</td>
<td class='g s1'></td>
<td class='g s0'>1000000</td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s0'>0</td>
<td class='g s0'>166802</td>
<td class='g s0'>167174</td>
<td class='g s0'>166724</td>
<td class='g s0'>165704</td>
<td class='g s0'>166800</td>
<td class='g s0'>166796</td>
<td class='g s0'>0</td>
<td class='g s1'></td>
<td class='g s0'>1000000</td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s2'>Total:</td>
<td class='g s0'>999696</td>
<td class='g s0'>1000128</td>
<td class='g s0'>999373</td>
<td class='g s0'>998898</td>
<td class='g s0'>1001131</td>
<td class='g s0'>1000774</td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s2'>% Err:</td>
<td class='g s0'>-0.0304</td>
<td class='g s0'>0.0128</td>
<td class='g s0'>-0.0627</td>
<td class='g s0'>-0.1102</td>
<td class='g s0'>0.1131</td>
<td class='g s0'>0.0774</td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
</tr>
<tr>
<td class='rAll'>
<p style='height:17px;'/></td>
<td class='g s2'>Avg % Err:</td>
<td class='g s0'>0</td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
<td class='g s1'></td>
</tr>
</table>
</td>
</tr>
</table>
<p></body></p>
<p>By the way, this data was generated with Python 2.4.3.</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/07/02/what-is-the-randomness-of-randint/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

