For older Meetup reports, please scroll down.
The main speaker this evening, Rolfe Lindberg of Double Down Interactive, gave the most impressive and convincing presentation I have seen on BI systems this year. Underlying the story of a young company’s success and a BI function’s progression is a tale of great execution. Rolfe’s presentation is available here: BI Buildout.
Double Down is a social-gaming company. It makes casino-style apps, the most successful being slot-machine games. It is the #3 top-grossing Facebook app and the #4 top-grossing iPad app. Annual sales are $300M, up 600% from 2011. Number of employees: >200. Double Down was purchased by IGT, a maker of real-world gambling equipment, for $500M in 2012.
Rolfe started the BI function at Double Down just two and a half years ago, as leader of a team of two people. Since then, the function has grown to a team of nine, BI has come to be seen as a core business driver, and its services are used daily by all facets of the business.
One of the first BI implementations was the generation and delivery of simple business metrics to a web browser, accessible to key stakeholders at any time. This replaced a manual query system and a daily email performance summary. The automated browser dashboard was built by one MySQL programmer and Rolfe, acting as PHP programmer and designer. The new service proved extremely popular (an important initial win for the BI function) and was heavily used, leading to many calls for additional BI services.
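The original dashboard was built in PHP against MySQL; as a rough illustration of the idea, here is a minimal sketch in Python using an in-memory SQLite table. The table layout, column names, and sample figures are all hypothetical, standing in for whatever event tables Double Down actually queried.

```python
import sqlite3

def daily_metrics(conn, day):
    """Summarise one day's activity: total revenue and distinct active users."""
    row = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0), COUNT(DISTINCT user_id) "
        "FROM events WHERE day = ?",
        (day,),
    ).fetchone()
    return {"day": day, "revenue": row[0], "active_users": row[1]}

# Hypothetical sample data standing in for the real game-event tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user_id TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2013-06-01", "u1", 4.99),
        ("2013-06-01", "u2", 0.0),
        ("2013-06-01", "u1", 9.99),
    ],
)
print(daily_metrics(conn, "2013-06-01"))
```

A small web layer rendering this dictionary on demand is all it takes to replace a daily email summary with an always-current page.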
The next major advance was the commissioning and installation of a data warehouse, ultimately hosted on Amazon AWS servers. Session-level game data was made available for analysis from early 2012. Currently, 2 TB of data at the slot-machine spin level is generated daily; however, this data is aggregated up to session level before analysis.
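The spin-to-session rollup can be pictured as a simple group-by. This is a toy sketch, not Double Down's pipeline: the field names and the per-session measures (spin count, amount wagered, amount won) are assumptions for illustration.

```python
from collections import defaultdict

def sessions_from_spins(spins):
    """Roll spin-level records up to one summary row per (user, session)."""
    sessions = defaultdict(lambda: {"spins": 0, "wagered": 0, "won": 0})
    for spin in spins:
        s = sessions[(spin["user_id"], spin["session_id"])]
        s["spins"] += 1
        s["wagered"] += spin["bet"]
        s["won"] += spin["payout"]
    return dict(sessions)

# Hypothetical spin records; real spin data would carry many more fields.
spins = [
    {"user_id": "u1", "session_id": "s1", "bet": 10, "payout": 0},
    {"user_id": "u1", "session_id": "s1", "bet": 10, "payout": 25},
    {"user_id": "u2", "session_id": "s9", "bet": 5, "payout": 5},
]
print(sessions_from_spins(spins))
```

Aggregating this way shrinks the data by orders of magnitude while keeping the measures that most analyses need.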
Double Down soon realized that their tabular DWH would not scale to handle the data being generated and, after a five-month search, they were able to hire a data architect to design a star-schema Enterprise Data Warehouse (EDW). In one week, they built a working instance in Amazon’s Redshift columnar database. Redshift proved 10X to 100X faster than their existing SQL database, and Double Down selected it as the technology for their EDW. It also meant spectacular price savings: Rolfe benchmarked it at 0.3% of the cost of a comparable SAP solution. Through smart technology and business decisions, Double Down’s technology solution cost a mere $100k!
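A star schema keeps measures in a central fact table keyed to small dimension tables. The sketch below shows the pattern with SQLite standing in for Redshift; the table and game names are invented for illustration, not Double Down's actual schema.

```python
import sqlite3

# A toy star schema: one fact table joined to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_game (game_id INTEGER PRIMARY KEY, game_name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE fact_session (
    game_id INTEGER REFERENCES dim_game,
    date_id INTEGER REFERENCES dim_date,
    spins INTEGER, revenue REAL
);
INSERT INTO dim_game VALUES (1, 'Lucky Sevens'), (2, 'Gold Rush');
INSERT INTO dim_date VALUES (10, '2013-06-01');
INSERT INTO fact_session VALUES (1, 10, 200, 4.99), (1, 10, 50, 0.0), (2, 10, 75, 1.99);
""")
# A typical analytic query: totals per game via a fact-to-dimension join.
rows = conn.execute("""
    SELECT g.game_name, SUM(f.spins), SUM(f.revenue)
    FROM fact_session f JOIN dim_game g USING (game_id)
    GROUP BY g.game_name ORDER BY g.game_name
""").fetchall()
print(rows)
```

A columnar store like Redshift executes this kind of scan-and-aggregate query especially well, since only the referenced columns are read.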
Other comments re: technology and vendors:
Tableau is used in Double Down by their marketing department for some complementary reports.
Talend is used for ETL functions and to automate SQL routines.
MongoDB is currently in use; the aim is to migrate the entire data warehouse to Amazon Redshift in mid-2014, in order to improve data quality.
Google Analytics, Medio and, especially, Kontagent (now Upsight) were given consideration for analytics services. However, Double Down chose not to transfer this activity outside the company, as the knowledge inherent in the data and the data organization was deemed to be a strategic asset. Data analysts and statisticians have since been hired. The application of analytics at Double Down has so far been descriptive in nature. It is anticipated that predictive analytics will play a greater role in the future.
R is used for statistical programming and analysis.
Rolfe referred to BI guru Wayne Eckerson’s blog as a source of advice. Rolfe’s specific pointers, based on his experience at Double Down, are:
- Start small
- Focus on business needs to drive technology decisions; the function should report to a high-level business exec
- Listening to third-party technology vendor proposals can be instructive, but if you do not fully understand the proposal, are not convinced of the benefits, or believe you can execute better yourself, do not proceed
- It is important to build a well-thought-out EDW
- BI developers and analysts should be in the same organization
- Experienced BI people are hard to find; you may have to grow your own. Double Down is hiring
The progression of Double Down’s BI function from set-up to core-function was compellingly told and is a very useful reference. A job well done.
Tim Flood, Account Director, Digital Analytics at Syntasa, started the evening’s session with a well thought-out and comprehensive overview of BI/Data Analytics Education Resources at the present time. Tim’s presentation is available here: Data analytics education resources
The learning continues,
Topic 1: Microsoft’s Savas Parastatidis gave a great demonstration of Cortana, Bing’s digital, voice-activated personal assistant. The communication with Cortana appeared very close to natural language. Cortana makes extensive use of online resources and the user’s personal data repository to optimize its responses. Note: Cortana still functions when the phone is offline, but with diminished capability. Cortana is extensible. Savas encouraged developers with an eye on the future to look at Schema.org.
Topic 2: Steve McPherson from Amazon’s Elastic MapReduce team spoke about Amazon Kinesis, a real-time processing service for streaming data. EMR is Hadoop in the cloud. Kinesis online docs are available here. The flow is:
S3 -> HDFS -> DynamoDB -> Redshift -> (Glacier) -> RDS
# Amazon S3 is Amazon’s basic data storage in the cloud.
# DynamoDB is Amazon’s managed NoSQL database service
# Redshift is Amazon’s low-latency, columnar-stored, massively parallel Data Warehouse-as-a-Service. Click on the link for a great explanation video.
# Glacier is Amazon’s data archiving service.
Amazon Kinesis takes in large streams of data records that can then be consumed in real time by multiple data-processing applications, which can run on Amazon Elastic Compute Cloud (EC2) instances. Data output goes to Amazon’s Simple Storage Service (S3), Redshift, or Elastic MapReduce.
The type of data used in an Amazon Kinesis use case includes IT infrastructure log data, application logs, social media, market data feeds, web clickstream data, and more. Because intake and processing happen in real time, the processing is typically lightweight.
The delay between the time a record is added to the stream and the time it can be retrieved (put-to-get delay) is less than 10 seconds.
What is streaming data?
- continuous, sequential, granular
- typically machine generated: sensor readings, server reports, …
- examples: Netflix records 80 billion events per day.
- Flow: sensors -> recording service -> aggregator/sequencing service (e.g. Kafka, Flume, Scribe) -> continuous processor (e.g. Spark, Storm) -> storage (Phoenix) -> analytics and reporting
- High availability is required; a connection failure can lead to permanent data loss.
- Scalability required
- Physical elasticity required -> need flexible capacity to accommodate demand
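The flow above ends in a continuous processor that maintains running aggregates over the stream. As a toy stand-in for a system like Spark Streaming or Storm, the sketch below counts events per sensor and emits a snapshot every few events; the event shape and window size are assumptions for illustration.

```python
from collections import Counter

def process_stream(events, window=3):
    """Consume an ordered event stream and emit a per-sensor count
    snapshot after every `window` events -- a toy continuous processor."""
    counts = Counter()
    snapshots = []
    for i, event in enumerate(events, start=1):
        counts[event["sensor"]] += 1
        if i % window == 0:
            snapshots.append(dict(counts))  # periodic aggregate "flush"
    return snapshots

# Hypothetical sensor readings arriving in sequence.
events = [{"sensor": s} for s in ["a", "b", "a", "c", "a", "b"]]
print(process_stream(events))
```

Real systems add the pieces the bullets call out: replicated ingestion for availability, partitioning for scale, and elastic capacity to absorb bursts.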
Kinesis manages splits and merges to handle scale. Capacity is provisioned in units called shards; each shard supports up to 1 MB/s of data in and up to 2 MB/s out.
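Kinesis routes each record to a shard by taking the MD5 hash of its partition key as a 128-bit integer and matching it against each shard's hash-key range. Assuming evenly split shards, that reduces to the simple range lookup sketched below (a simplification of the real service, where split and merge operations can make ranges uneven).

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Map a partition key to a shard index the way Kinesis does:
    MD5 the key, read it as a 128-bit integer, and find which
    (evenly split) shard hash-key range contains it."""
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    shard_size = 2 ** 128 // num_shards
    return min(h // shard_size, num_shards - 1)

# Records sharing a partition key always land on the same shard,
# which preserves per-key ordering.
for key in ["user-1", "user-2", "user-3"]:
    print(key, "-> shard", shard_for_key(key, 4))
```

This is why the choice of partition key matters: too few distinct keys concentrates traffic on a handful of shards and wastes provisioned throughput.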
An Amazon objective is that the ‘man in the street’ can run AWS functions, such as Kinesis. It should be possible to use Hive, Pig or SQL scripts to retrieve information.
Steve gave an example of Kinesis in operation on a website using a Log4j appender.
The learning continues,