Toby Segaran is the author of the O'Reilly book on Collective Intelligence and the Director of Software Development at Genstruct, a biotechnology company.
He loves applying machine-learning and data-mining algorithms to everything ranging from pharmaceutical trials to the Technorati Top 100. He is also the creator of the free web applications Tasktoy and Lazybase.
Huge sets of data are generated every day by people using online applications, whether they're blogging, shopping, or just clicking on links. The fields of data mining and machine learning offer many techniques for analyzing and interpreting these datasets, making it possible to draw new conclusions from the data and build predictive models.
This talk applies that idea to several analyses of publicly available data: how bloggers and buyers cluster together, what message boards tell us about psychographics, predictive models for hotness and home prices, and other insights.
For each analysis, I'll show how the data was collected, give an overview of how the algorithm works, and present some results.