<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Possibility and Probability &#187; analytics</title>
	<atom:link href="http://ironboundsoftware.com/blog/category/analytics/feed/" rel="self" type="application/rss+xml" />
	<link>http://ironboundsoftware.com/blog</link>
	<description>Droplets of Yes and No</description>
	<lastBuildDate>Wed, 28 Dec 2011 01:37:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>Big Data, Big Opportunity</title>
		<link>http://ironboundsoftware.com/blog/2011/02/01/big-data-big-opportunity/</link>
		<comments>http://ironboundsoftware.com/blog/2011/02/01/big-data-big-opportunity/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 03:38:10 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://ironboundsoftware.com/blog/?p=396</guid>
		<description><![CDATA[There is a really great article about how data is the new commodity, in the same way that we look at oil. One thing they both have in common is that they are out there; it is just a question of who is willing to go and dig it up. Information is quickly piling up all over the place, and [...]]]></description>
			<content:encoded><![CDATA[<p>There is <a href="http://gigaom.com/2011/02/01/mining-the-tar-sands-of-big-data/">a really great article about how data is the new commodity</a>, in the same way that we look at oil. One thing they both have in common is that they are out there; it is just a question of who is willing to go and dig it up.</p>
<p>Information is quickly piling up all over the place, and I agree with the article that the people who are able to capitalize on this are the ones who will get the big payoff. I especially like the idea of calling these start-ups &#8220;wildcats&#8221;; that perfectly captures the Wild West atmosphere that is about to take hold.</p>
<p>The neat thing is that a lot of this information is out there for free; the real value is in how people are going to aggregate those individual data streams into new and often unexpected products. Take Twitter for example (<a href="http://twitter.com/nloadholtes">are you following me on twitter?</a>): it is a conduit to what is going on in the hive mind of the internet. This site seems to be <a href="http://trendyontwitter.blogspot.com/">gathering up the trends on Twitter</a> and then adding news articles about some of the things that are hot.</p>
<p>That is pretty neat: data is generated in the form of people tweeting about Topic X, and as X becomes more &#8220;important&#8221; (in this case, more people discuss it so that it rises above other topics) it gets published to the &#8220;trending&#8221; list. This website then goes in, looks at that list, and adds more data to the conversation by reporting news about Topic X. That way the separate data points are tied together to show that there is a relationship between them, which in the process makes the data more valuable to the end users (by supplying more context, etc.).</p>
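<p>Here is a minimal Python sketch of that kind of pipeline, under assumptions of my own: tweets arrive as plain strings containing hashtags, and news headlines are already available in a dictionary keyed by topic. The function names <code>trending_topics</code> and <code>attach_news</code> and all of the sample data are hypothetical; this is not how the linked site actually works.</p>

```python
from collections import Counter


def trending_topics(tweets, top_n=2):
    """Count hashtag mentions and return the top_n most discussed topics."""
    counts = Counter()
    for tweet in tweets:
        for word in tweet.split():
            if word.startswith("#") and len(word) != 1:
                counts[word.lower().rstrip(".,!?")] += 1
    return [topic for topic, _ in counts.most_common(top_n)]


def attach_news(topics, headlines_by_topic):
    """Pair each trending topic with any known headlines (the extra context)."""
    return {topic: headlines_by_topic.get(topic, []) for topic in topics}


# Hypothetical sample data, invented purely for illustration.
tweets = [
    "Loving the #superbowl ads",
    "#superbowl party tonight!",
    "Reading about #bigdata",
    "#superbowl #bigdata mashups",
    "Coffee time",
]
headlines = {"#superbowl": ["Game day preview"]}

topics = trending_topics(tweets)
print(attach_news(topics, headlines))
```

<p>The interesting step is the second one: the counting only finds what is popular, while the join against a second data source is what adds the context.</p>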
<p>Big data is going to lead to a lot of big opportunities. All we have to do is find the data, combine it in the right way, and perform the right data analysis on it. And unlike big oil, big data is going to be around for a very long time.</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2011/02/01/big-data-big-opportunity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Metrics: The kick in the ass that is the key to productivity</title>
		<link>http://ironboundsoftware.com/blog/2010/07/12/metrics-the-kick-in-the-ass-that-is-the-key-to-productivity/</link>
		<comments>http://ironboundsoftware.com/blog/2010/07/12/metrics-the-kick-in-the-ass-that-is-the-key-to-productivity/#comments</comments>
		<pubDate>Tue, 13 Jul 2010 02:27:34 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Getting Things Done]]></category>
		<category><![CDATA[Organization]]></category>
		<category><![CDATA[Productivity]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=354</guid>
		<description><![CDATA[What you can measure, you can manage. I just watched a new video from Giles where he talks about how you can improve your programming productivity. It&#8217;s a really good short video that hits the nail on the head. If you want to make a change, guessing about what to fix won&#8217;t cut it. You [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p><em>What you can measure, you can manage.</em></p></blockquote>
<p>I just watched a new video from Giles where he talks about how you can <a href="http://gilesbowkett.blogspot.com/2010/07/secrets-of-superstar-programmer_12.html">improve your programming productivity</a>. It&#8217;s a really good short video that hits the nail on the head. If you want to make a change, guessing about what to fix won&#8217;t cut it. You need to measure what you are doing, and then adjust accordingly.</p>
<p>The quote at the top was in the video, and I have to say I don&#8217;t think I had ever heard it before. But as soon as I heard it I knew it was 100% correct. Think about it: everyone who is in charge of things in your life (bosses, teachers, parents, etc.) tracks what you do to some extent. When you step out of line, they know about it and are able to let you know. Why? Because they are &#8220;measuring&#8221; your progress via grades, work done, chores completed, etc.</p>
<p>So if it works for them, why can&#8217;t it work for you? Track the things that are important to you, and see if you can make a positive change.</p>
<p>I&#8217;ve been hemming and hawing lately about tracking things like my programming projects or seeing if my neighborhood association really is increasing the house values. The time for action is now.</p>
<p>My first step: Putting a widget on this blog to track my <a href="http://bitbucket.org">BitBucket</a> RSS feed. If I&#8217;m going to work on <a href="http://bitbucket.org/nloadholtes">a project out in the open</a>, why not let everyone know about it? This way if I&#8217;m not being productive, it will be pretty visible.</p>
<p>(As a side note, this is something that Giles has mentioned before that I really believe in: If you are a programmer, you should have some project out in the public eye. Open source is a good thing. Contributing to open source is a great thing. Being known as a programmer who contributes to open source software is the best thing.)</p>
<p>My next step: Start treating time tracking on my projects as a first class citizen. I&#8217;m starting a new sprint tomorrow: I&#8217;m going to track my time better. Also, I&#8217;m going to add some tasks to my <a href="http://bitbucket.org/nloadholtes/obssatid">Satellite Tracking project</a> so I can make sure I&#8217;m on task when I&#8217;m working on it.</p>
<p>Thanks Giles, that video was a good kick in the ass. <img src='http://ironboundsoftware.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2010/07/12/metrics-the-kick-in-the-ass-that-is-the-key-to-productivity/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Matching resumes to jobs</title>
		<link>http://ironboundsoftware.com/blog/2009/10/03/matching-resumes-to-jobs/</link>
		<comments>http://ironboundsoftware.com/blog/2009/10/03/matching-resumes-to-jobs/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 21:50:48 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=309</guid>
		<description><![CDATA[Have you ever looked at a job posting and tried to figure out if you are a good match for that job? I&#8217;ve written a Google App Engine application to try and help people figure that out. Paste in a copy of your resume and a copy of the job description, and it will try [...]]]></description>
			<content:encoded><![CDATA[<p>Have you ever looked at a job posting and tried to figure out if you are a good match for that job?</p>
<p>I&#8217;ve written a Google App Engine application to try to help people figure that out. Paste in a copy of your resume and a copy of the job description, and it will try to figure out how good a match you would be for that job.</p>
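<p>For the curious, one very simple way to score a match like this is plain word overlap. The sketch below only illustrates that general idea and should not be read as the scoring the app uses; the function name <code>match_score</code> and the sample text are made up for the example.</p>

```python
import re


def match_score(resume, job_description):
    """Return the fraction of distinct job-description words found in the resume."""
    resume_words = set(re.findall(r"[a-z0-9+#]+", resume.lower()))
    job_words = set(re.findall(r"[a-z0-9+#]+", job_description.lower()))
    if not job_words:
        return 0.0
    return len(resume_words.intersection(job_words)) / len(job_words)


# Hypothetical inputs, invented for illustration.
resume = "Python developer with Django and Google App Engine experience"
job = "Seeking Python Django developer"
print(match_score(resume, job))  # 0.75
```

<p>A score like this ignores synonyms, word order, and how important each word is, so it is only a rough first pass.</p>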
<p>Check it out: <a href="http://app.ironboundsoftware.com">http://app.ironboundsoftware.com</a></p>
<p>I&#8217;m really impressed with the Google App Engine environment (go Python!) and had fun writing this. Hopefully this will help people out in their job hunt. Times are tough, and hopefully this little application will help someone get into the perfect job for them.</p>
<p>Try it out and let me know what you think!</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2009/10/03/matching-resumes-to-jobs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Technorati skipping smaller blogs?</title>
		<link>http://ironboundsoftware.com/blog/2007/01/04/technorati-skipping-smaller-blogs/</link>
		<comments>http://ironboundsoftware.com/blog/2007/01/04/technorati-skipping-smaller-blogs/#comments</comments>
		<pubDate>Fri, 05 Jan 2007 03:03:11 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Thinking]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=263</guid>
		<description><![CDATA[I&#8217;ve been reading The MineThatData Blog for a few weeks now, and the other day there was an interesting article about understanding the traffic that a website receives. The article talked about using sites like Alexa, Blog Juice, Bloglines, and Technorati to measure a site&#8217;s popularity. Overall it seems like a good approach to aggregate [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading The <a href="http://minethatdata.blogspot.com/2006/12/fully-understanding-traffic-your-site.html">MineThatData Blog</a> for a few weeks now, and the other day there was an interesting article about understanding the traffic that a website receives. The article talked about using sites like <a href="http://www.alexa.com/">Alexa</a>, <a href="http://www.text-link-ads.com/blog_juice/">Blog Juice</a>, <a href="http://www.bloglines.com/">Bloglines</a>, and <a href="http://www.technorati.com/">Technorati</a> to measure a site&#8217;s popularity.</p>
<p>Overall it seems like a good approach to aggregate this data together to get the &#8220;big picture&#8221; of where one&#8217;s website stands in the web. I&#8217;ve tried this but I&#8217;ve noticed that Technorati doesn&#8217;t seem to report the numbers I expect it would.</p>
<p>For example: I got a link to my blog from Hip Egg a few weeks ago, and this link has not been reported on Technorati. I know this blog is small potatoes in the grand scope of the universe, but it strikes me as odd that my more recent updates are featured in my &#8220;favorites&#8221; (as are Hip Egg&#8217;s posts), yet the link from him hasn&#8217;t shown up.</p>
<p>My working assumption is that Technorati has some kind of filter where lower popularity sites aren&#8217;t &#8220;updated&#8221; as often as the bigger sites. Either that or the link database is broken. But links for other sites seem to be working, although I&#8217;m not watching them as closely as I watch my own stats. <img src='http://ironboundsoftware.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>At any rate I wonder how many other blogs (or sites in general) are suffering from this problem. Metrics for websites are difficult to agree on, so a site or sites where a reputation can be established and measured becomes more and more important. Aggregating data from multiple sites is a good start, but if there are too many &#8220;issues&#8221; with how a site is ranked, then the data becomes suspect and it becomes harder to get a clear picture of what&#8217;s going on.</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2007/01/04/technorati-skipping-smaller-blogs/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Graphing eBay users to find fraud</title>
		<link>http://ironboundsoftware.com/blog/2006/12/05/graphing-ebay-users-to-find-fraud/</link>
		<comments>http://ironboundsoftware.com/blog/2006/12/05/graphing-ebay-users-to-find-fraud/#comments</comments>
		<pubDate>Wed, 06 Dec 2006 01:55:56 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=260</guid>
		<description><![CDATA[Here&#8217;s an interesting article about data mining and auction fraud. Graphs, and what you can do with them, never cease to amaze me. The article talks about how looking at the relationships between users on eBay can help uncover fraud and the accomplices that help keep it going. They do this by seeing if the [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s an interesting article about <a href="http://www.sciencedaily.com/releases/2006/12/061205143326.htm">data mining and auction fraud</a>.</p>
<p>Graphs, and what you can do with them, never cease to amaze me. The article talks about how looking at the relationships between users on eBay can help uncover fraud and the accomplices that help keep it going. They do this by seeing if the relationship between groups of users turns into a <a href="http://mathworld.wolfram.com/BipartiteGraph.html">bipartite graph</a>. That is, there is a concentration of links between two groups of users, and few links with other users.</p>
<p>The links between users refer to the &#8220;feedback scores&#8221; that eBay users use to determine a user&#8217;s trustworthiness. Usually, when someone begins ripping others off, they get bad feedback until it reaches a point where no one will do business with them. This is the way that community sites commonly work (i.e. letting the population of users determine each other&#8217;s rankings). The issue is that sometimes there are people lurking in the shadows assisting the fraudster, but because they are never the target of the bad feedback, they are able to keep going, supporting new fraudsters (i.e. a new user ID).</p>
<p>This setup allows a scammer to set up a new user ID and get its feedback levels boosted quickly without having to engage in a lot of &#8220;legitimate&#8221; transactions. Think of it as passing a baton in a relay race: instead of one person running a mile, why not let several people sprint as fast as they can for a quarter-mile and then hand off to someone &#8220;fresh&#8221;?</p>
<p>Think about it: When was the last time you checked someone&#8217;s feedback ratings on eBay? Probably right before your last purchase/bid. But, when was the last time you checked the people who gave the feedback to see what their reputation was? Probably never. I know I had never thought of this before reading the article.</p>
<p>This technique produces a graphical representation of this relationship which stands out quickly to a user. (As a side note, there are mathematical formulas that would/should detect this as well.) A normal user would probably have a relationship graph (over 2 or more degrees) that looks like a star-burst pattern. A potential scammer would show up in a clustered bipartite graph. This would give the community of users (eBay bidders) a powerful tool to determine who is the real deal and who is trying to give them a wooden nickel. Very cool stuff. Graph theory to the rescue!</p>
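<p>To make the &#8220;two tightly linked groups&#8221; idea concrete, here is a small Python sketch. It is my own simplification, not the algorithm from the article: it flags pairs of users who are not linked to each other but share several of the same feedback partners, which is the signature of a dense bipartite cluster. The function name <code>shared_partner_pairs</code>, the threshold, and the user IDs are all hypothetical.</p>

```python
from itertools import combinations


def shared_partner_pairs(feedback_links, min_shared=2):
    """Find pairs of unlinked users whose feedback partners overlap heavily.

    Users on the same side of a dense two-group (bipartite) cluster are not
    linked to each other but share many of the same partners.
    """
    partners = {}
    for a, b in feedback_links:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    flagged = []
    for u, v in combinations(sorted(partners), 2):
        if v in partners[u]:
            continue  # directly linked users sit on opposite sides
        shared = partners[u].intersection(partners[v])
        if len(shared) >= min_shared:
            flagged.append((u, v, sorted(shared)))
    return flagged


# Hypothetical feedback links (pairs of users who exchanged feedback).
ring = [("fraud1", "helper1"), ("fraud1", "helper2"),
        ("fraud2", "helper1"), ("fraud2", "helper2")]
star = [("seller", "buyer1"), ("seller", "buyer2"),
        ("seller", "buyer3"), ("seller", "buyer4")]

print(shared_partner_pairs(ring))
print(shared_partner_pairs(star))  # []
```

<p>The star-burst of an ordinary seller produces nothing, because the buyers only have the one seller in common, while the ring lights up on both sides.</p>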
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/12/05/graphing-ebay-users-to-find-fraud/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Penny Stock analysis</title>
		<link>http://ironboundsoftware.com/blog/2006/11/30/penny-stock-analysis/</link>
		<comments>http://ironboundsoftware.com/blog/2006/11/30/penny-stock-analysis/#comments</comments>
		<pubDate>Fri, 01 Dec 2006 01:51:33 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Probability]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Thinking]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=258</guid>
		<description><![CDATA[The stock market is one of those things that really intrigues me. An open system where everyone can see what&#8217;s going on, perhaps make some money, and perhaps influence the direction of the stock. It&#8217;s a system that is ripe for data mining, something that seems to be equal parts analytical skill, fortune telling, [...]]]></description>
			<content:encoded><![CDATA[<p>The stock market is one of those things that really intrigues me. An open system where everyone can see what&#8217;s going on, perhaps make some money, and perhaps influence the direction of the stock. It&#8217;s a system that is ripe for data mining, something that seems to be equal parts analytical skill, fortune telling, industry expertise, and oftentimes plain luck.</p>
<p>I&#8217;ve talked with <a href="http://hipegg.blogspot.com/">Hip Egg and Jym Khana</a> about stocks before, and one topic that I bring up every now and then is penny stocks. As I&#8217;m sure most people with an email account know, there&#8217;s a ton of stock-related spam going around these days. Most of it appears to be the pump-and-dump variety, in which the scammers hope that people will purchase the suggested stock, causing the price to rise so that they can sell their shares (which they purchased before sending out the email) at an inflated price. This technique has been around forever, but it seems to be the flavor of the month for scam and con artists.</p>
<p>The main questions that we usually talk about are a) Does anyone actually get rich doing this? and b) Just how &#8220;influence-able&#8221; are these low-priced stocks? Well, today has been a banner day for answers. I came across these links talking about the scams:</p>
<ul>
<li><a href="http://www.crummy.com/features/StockSpam/">Stock Spam Effectiveness Monitor</a> (via <a href="http://joelonsoftware.com">Joel</a>)</li>
<li><a href="http://blog.wired.com/business/2006/11/when_youre_an_o.html">Spammers as scammers</a></li>
<li><a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=920553">Stock Touts and Corresponding Market Activity</a></li>
</ul>
<p>While reading these I saw the simplest idea yet to help stop those spams: Simply watch the stocks and see who bought a lot of stock before the email was sent, and who sold a lot right around the sell date in the email.</p>
<p>That idea is pure genius. It targets a potentially large group of people, but the odds are that a pattern will emerge showing a small group of people moving from one stock to another. At a minimum those groups would be a starting point for a fraud investigation. More than likely, those would be the people responsible for sending out the emails. And since the spammers are kind enough to send these messages to just about everyone on the planet, it shouldn&#8217;t take too long to gather a nice body of evidence (or actionable intelligence). From what I understand, scams like this have been hard to track in the past because the scammers can move quickly. But now that they are announcing their moves in advance, it should be pretty easy to set up a system to monitor spams, then watch the stock activity&#8230; It just seems so simple that it should work like a champ!</p>
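<p>A rough Python sketch of that monitoring idea follows. Everything about the data is an assumption on my part: per-trader trade records are not public, and the tuple layouts, the three-day window, the tickers, and the trader names are all hypothetical. The function simply looks for traders who bought shortly before a spam email went out and sold around the email&#8217;s suggested sell date, and who did so for more than one campaign.</p>

```python
from collections import defaultdict
from datetime import date, timedelta


def suspicious_traders(trades, campaigns, window_days=3, min_campaigns=2):
    """Return traders who bought just before a spam email and sold around its
    sell date, for at least min_campaigns different spam campaigns."""
    window = timedelta(days=window_days)
    hits = defaultdict(set)
    for ticker, sent_on, sell_on in campaigns:
        bought, sold = set(), set()
        for trader, traded_ticker, side, traded_on in trades:
            if traded_ticker != ticker:
                continue
            if side == "buy" and sent_on > traded_on >= sent_on - window:
                bought.add(trader)
            elif side == "sell" and window >= abs(traded_on - sell_on):
                sold.add(trader)
        for trader in bought.intersection(sold):
            hits[trader].add(ticker)
    return sorted(t for t, tickers in hits.items() if len(tickers) >= min_campaigns)


# Hypothetical campaigns: (ticker, date the spam was sent, sell date in the email).
campaigns = [
    ("PNY1", date(2006, 11, 10), date(2006, 11, 13)),
    ("PNY2", date(2006, 11, 20), date(2006, 11, 22)),
]
# Hypothetical trades: (trader, ticker, side, trade date).
trades = [
    ("trader_a", "PNY1", "buy", date(2006, 11, 8)),
    ("trader_a", "PNY1", "sell", date(2006, 11, 13)),
    ("trader_a", "PNY2", "buy", date(2006, 11, 18)),
    ("trader_a", "PNY2", "sell", date(2006, 11, 22)),
    ("trader_b", "PNY1", "buy", date(2006, 11, 9)),
    ("trader_b", "PNY1", "sell", date(2006, 11, 14)),
    ("trader_c", "PNY2", "buy", date(2006, 11, 21)),
    ("trader_c", "PNY2", "sell", date(2006, 11, 22)),
]

print(suspicious_traders(trades, campaigns))  # ['trader_a']
```

<p>Requiring a repeat across campaigns is what keeps an unlucky one-time buyer (like <code>trader_b</code> here) off the list.</p>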
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/11/30/penny-stock-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>One person vs. Traffic Waves</title>
		<link>http://ironboundsoftware.com/blog/2006/11/24/one-person-vs-traffic-waves/</link>
		<comments>http://ironboundsoftware.com/blog/2006/11/24/one-person-vs-traffic-waves/#comments</comments>
		<pubDate>Fri, 24 Nov 2006 22:17:11 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Cars]]></category>
		<category><![CDATA[The coming apocalypse]]></category>
		<category><![CDATA[Thinking]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=257</guid>
		<description><![CDATA[Thanks to reddit, I read this article today about how one person can change traffic waves. This is a topic I wind up thinking about a lot as I sit in traffic. Compared to most people (if you believe the news reports), my commute isn&#8217;t too terribly bad (i.e. mine is less than 30 minutes [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to <a href="http://reddit.com/">reddit</a>, I read this article today about how <a href="http://amasci.com/amateur/traffic/trafexp.html">one person can change traffic waves</a>. This is a topic I wind up thinking about a lot as I sit in traffic. Compared to most people (if you believe the news reports), my commute isn&#8217;t too terribly bad (i.e. mine is less than 30 minutes most days), but reading this article did make me think.</p>
<p>Traffic waves are what happen when there is a slowdown for some reason in the flow of traffic. The cause could be an accident, a glare from the sun that blinds people, or just about anything that causes traffic to slow down. As people slow down, the drivers behind them also have to slow down. As the first drivers pass the &#8220;distraction&#8221; that caused them to slow down, they begin to speed up. But this speed-up does not get propagated to the other drivers right away, so the drivers further back in the pack are still going slow (and thus causing the drivers behind them to slow down). The result is a &#8220;standing wave&#8221; where the cars slow down. As long as there are more cars heading toward the wave, the wave will persist (assuming the original distraction is gone); once the rate of cars coming toward the wave slows, the wave breaks down and disappears.</p>
<p>Anyway&#8230; The article has an interesting idea of using police patrol cars to help break up the wave by having them in the traffic (several miles before the slowdown) driving at a &#8220;slower&#8221; speed than the normal traffic flow. Because people are not likely to drive fast past a cop, this effectively slows the rate of cars flowing into the wave, which helps to break it up. It&#8217;s a really interesting idea, and I think it could really work.</p>
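<p>The inflow-versus-outflow reasoning above can be sketched as a toy queue model in Python. This is a deliberately crude illustration, not a real traffic simulation: the function name <code>wave_length_over_time</code> and all of the numbers are hypothetical, and the only rule is that each minute some cars arrive at the back of the wave while a fixed number escape from the front.</p>

```python
def wave_length_over_time(inflow_per_min, outflow_per_min, start_cars=0):
    """Track how many cars are stuck in the slow zone, minute by minute.

    Each minute the given number of cars arrive, and at most
    outflow_per_min cars leave from the front.
    """
    cars = start_cars
    history = []
    for arriving in inflow_per_min:
        cars = max(0, cars + arriving - outflow_per_min)
        history.append(cars)
    return history


# Hypothetical numbers: 30 cars per minute can leave the front of the wave.
heavy = wave_length_over_time([40, 40, 40, 40], 30)
paced = wave_length_over_time([40, 40, 20, 20, 20, 20], 30)
print(heavy)  # [10, 20, 30, 40]
print(paced)  # [10, 20, 10, 0, 0, 0]
```

<p>In the second run the arrival rate drops after two minutes (the patrol car doing its job, so to speak) and the backlog drains away on its own.</p>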
<p>One thing that I&#8217;m not so sure about is the author&#8217;s assertion that he could achieve the same effect by driving at a steady rate (i.e. avoiding stop-and-go and trying to keep a good buffer distance between himself and the car in front of him). His idea is that once he does this it encourages the drivers behind him to do the same thing. I&#8217;m not sure I agree with this; I see a lot of impatient drivers on a daily basis. Maybe it&#8217;s just here in Atlanta, but if there&#8217;s half a car length in front of you, and your lane is moving, someone is going to try and get in there.</p>
<p>Having said that, I do like the spirit of the idea and I&#8217;m going to try it out next week as I drive in traffic. Who knows, maybe a few other people will read that article and try the same thing. Anything that keeps the traffic moving is a good thing in my book.</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/11/24/one-person-vs-traffic-waves/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Netflix analysis</title>
		<link>http://ironboundsoftware.com/blog/2006/10/29/netflix-analysis/</link>
		<comments>http://ironboundsoftware.com/blog/2006/10/29/netflix-analysis/#comments</comments>
		<pubDate>Sun, 29 Oct 2006 22:16:57 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Thinking]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=250</guid>
		<description><![CDATA[Looking through the Netflix prize data, I saw something that made me do a double-take. Miss Congeniality seemed to be the most rated movie in the database. That struck me as odd, because I always imagined a movie like Lord Of The Rings would be the #1 most rated movie (since the folks that [...]]]></description>
			<content:encoded><![CDATA[<p>Looking through the Netflix prize data, I saw something that made me do a double-take. Miss Congeniality seemed to be the most rated movie in the database. That struck me as odd, because I always imagined a movie like Lord Of The Rings would be the #1 most rated movie (since the folks that I think Netflix is most popular with are into those movies).</p>
<p>Today I saw this analysis on the Netflix forums:<br />
<a href="http://www.netflixprize.com/community/viewtopic.php?pid=800#p800">Netflix Prize: Forum / Miss Congeniality</a>. In this posting there is a breakdown of the movie ratings, and it explains a few things, such as which movies are the most loved or the most hated. It&#8217;s a pretty interesting read (especially if you are feeling lazy and don&#8217;t want to do the SQL, which is the category I fall into).</p>
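<p>For anyone who does want to do the SQL, the &#8220;most rated&#8221; question boils down to a single GROUP BY. The Python sketch below runs it against an in-memory SQLite database. The <code>ratings</code> table layout and the sample rows are my own invention for illustration; the real prize data would first have to be loaded into a table like this.</p>

```python
import sqlite3

# Hypothetical table and rows, standing in for the loaded prize data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie TEXT, customer_id INTEGER, stars INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        ("Movie A", 1, 4), ("Movie A", 2, 3), ("Movie A", 3, 5),
        ("Movie B", 1, 5), ("Movie B", 2, 5),
        ("Movie C", 3, 1),
    ],
)

# Count ratings per movie, most rated first; ties are broken by title.
most_rated = conn.execute(
    "SELECT movie, COUNT(*) AS n, AVG(stars) AS avg_stars "
    "FROM ratings GROUP BY movie ORDER BY n DESC, movie"
).fetchall()
print(most_rated)
```

<p>Sorting the same result by <code>avg_stars</code> instead of <code>n</code> would give the most loved and most hated lists.</p>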
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/10/29/netflix-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Protecting minors by mining MySpace</title>
		<link>http://ironboundsoftware.com/blog/2006/10/16/protecting-minors-by-mining-myspace/</link>
		<comments>http://ironboundsoftware.com/blog/2006/10/16/protecting-minors-by-mining-myspace/#comments</comments>
		<pubDate>Tue, 17 Oct 2006 03:01:44 +0000</pubDate>
		<dc:creator>Nick Loadholtes</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Thinking]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.ironboundsoftware.com/blog/?p=246</guid>
		<description><![CDATA[This is a very interesting article: Wired News: MySpace Predator Caught by Code Finally, a mashup that does something useful. As the article points out there are certain patterns of usage that probably would raise red flags. What surprises me the most though is people aren&#8217;t excited about this type of technology being used. Given [...]]]></description>
			<content:encoded><![CDATA[<p>This is a very interesting article: <a href="http://wired.com/news/technology/0,71948-0.html?tw=wn_index_1">Wired News: MySpace Predator Caught by Code</a></p>
<p>Finally, a mashup that does something useful. <img src='http://ironboundsoftware.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> As the article points out there are certain patterns of usage that probably would raise red flags. What surprises me the most though is people aren&#8217;t excited about this type of technology being used.</p>
<p>Given the large size of the user base of MySpace, it is pretty impractical to have a set of eyes on every user and every posting that goes on. Granted, there is a real possibility of false hits when doing an automated search like this. But given that a computer can sift through the set of millions of possible hits and narrow it down to a few hundred (which can then be followed up by a human), to me it is a no brainer.</p>
<p>The best part of doing an automated data mining scan of a site like <a href="http://myspace.com">MySpace</a> would be that it&#8217;s a computer, not a person, doing the scanning. Computers don&#8217;t make judgements, or laugh at your music choices; they just scan. To me it seems like this would be the best of all possible worlds: let the computer make the rough pass over the site, and pick out the most &#8220;questionable&#8221; users/postings for human follow-up. That way the &#8220;invasion of privacy&#8221; (if there even is such a thing on the internet anymore) is limited. And by making sure that a human is doing the follow-up, we hopefully remove the problem of the over-zealous filter that assumes everyone is bad.<br />
I&#8217;m curious to see what the reaction is when the reporter releases the code that led to the investigations in the story. Will MySpace adopt its usage? Will vigilante surveillance groups pop up and patrol the internet? Interesting times lie ahead&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://ironboundsoftware.com/blog/2006/10/16/protecting-minors-by-mining-myspace/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

