Hosted by the Knock Data user group, the night will cover two main topics. The first is Apache Spark, a fast and general engine for large-scale data processing whose programs can run up to 100x faster than Hadoop MapReduce in memory. This is a hands-on session, and no Spark or Scala experience is required. The Apache Spark architecture will be explained along the way, and the only requirement is a laptop with Docker installed.
The second half of the evening will focus on machine learning, in particular its potential for exploring news articles. An end-to-end solution, including web scraping, natural language processing, visualization and insight extraction, will be presented in the form of a web app built with R Shiny.
17:45 – 18:00 Arrive
18:00 – 18:35 Big Data Analytics with Apache Spark by Rockie Yang @ ThinkBigAnalytics
18:35 – 19:00 Break with food and drinks
19:00 – 19:35 Machine Learning Project about Exploring News Articles by Konrad Ilczuk & Keven Wang @ ThinkBigAnalytics
19:35 – More drinks and a chance to ask questions