Data Smart, Ch8, Forecasting Seasonal Demand For Replica Swords

Executive Summary: Chapter 8 of John Foreman’s book, Data Smart, turns to forecasting demand for a fictional replica sword manufacturing business. The author focuses on an exponential smoothing method that takes trend and seasonality into account (ETS), known as the Holt-Winters method. The code to generate the forecast in R is very concise …

Data Smart, Ch7, Predicting Pregnancy with Ensemble Models – this time using R’s caret package

Executive Summary: In chapter 7 of John Foreman’s book, Data Smart, he again predicts the pregnancy status of Retail Mart’s customers based on their shopping habits. This time he uses ensemble techniques, specifically bagging and boosting, to build his predictive models. Since the author also provides the R code for a logistic regression and random …

Data Smart, Ch6, Predicting Customer Pregnancies with Logistic Regression

Executive Summary: In chapter 6 of the book Data Smart, by John Foreman, Chief Data Scientist at Mailchimp, the synthesized challenge is to predict which of a retailer’s customers are pregnant based on a dataset of their shopping records. A logistic regression model is used. The model is trained on the shopping records of 500 …

Data Smart, Ch5, Network Graphs and Community Detection

Executive Summary: Chapter 5 of John Foreman’s book, Data Smart, looks at data that can be arranged as a network graph of related data points. It uses a cluster analysis technique called Modularity Maximization to optimize cluster assignments for the graph data. We can implement the same process succinctly in R, making use of functions in the R igraph and lsa packages. …

Data Smart, Ch2, Customer Segmentation With R Using K-Medians Clustering

Executive Summary: This is a walk-through of a customer segmentation process using R’s skmeans package to perform k-medians clustering. The dataset examined is the one used in chapter 2 of John Foreman’s book, Data Smart. The approach followed is that outlined by the author. The major difference is that the author, as per his teaching objectives, built his solution …

Being ‘Data Smart’ with Predixion Insight

Figure 1: Gartner Magic Quadrant for Advanced Analytics Platforms, 2015 & Data Smart

Executive Summary: I signed up for a trial of Predixion’s predictive analytics software, Predixion Insight, and decided to test it on some of the data problems posed in John Foreman’s book, Data Smart. Specifically, I examined the application of Insight’s classification, segmentation and forecast models. What I …

Review of Data Smart (the book) by John Foreman – It’s Excellent

I highly recommend John Foreman’s book: ‘Data Smart – Using Data Science to Transform Data into Insight’. The author’s approach is unique – he teaches data science skills without teaching programming. His approach works because he limits the newness of each subject item to one dimension, that being the data science technique at hand. Each skill is introduced in the …

Predixion Delivers At ‘The Last Mile Of Analytics’

Agenda: ‘From Data Science to Business Impact’, with Jamie MacLennan, Co-founder and CTO of Predixion, at the Data Science Dojo meetup, Redmond, WA, Feb. 3, 2015. In Brief: Predixion has developed an impressive, cloud-based, user-friendly predictive analytics framework that can be used on the data organizations have today. The session consisted of: an update on their distinctive and …

Hadoop Essentials – The Eight Things You Need To Know

Agenda: Hadoop Essentials Live, by Cloudera, Seattle, December 18, 2014. In Brief: Cloudera prepared and presented a comprehensive slide deck (200 slides) describing Hadoop and its ecosystem. Here are eight lessons from the workshop that are worth learning and remembering. 1. What is Hadoop? Hadoop is a software framework for storing, processing and analyzing large volumes of data. Its key features are that it …

Zillow opens the kimono – reveals R, Python and Graphlab Create underneath

Meetup: ‘Data Science at Zillow – the Zestimate and Beyond’, at the Python Data Science Meetup, Seattle, Jan 27th, 2015. Slide deck: http://slidesha.re/1ALRbvU In Brief: Zillow described their 20TB dataset and the technology they use to estimate house values for more than 110 million homes in the US. Zillow uses the statistical programming language R for both prototyping and production. The use of Python at Zillow is …
