Product Review

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.

You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find these patterns useful for working on your own data applications.

Patterns include:

  • Recommending music and the Audioscrobbler data set
  • Predicting forest cover with decision trees
  • Anomaly detection in network traffic with K-means clustering
  • Understanding Wikipedia with Latent Semantic Analysis
  • Analyzing co-occurrence networks with GraphX
  • Geospatial and temporal data analysis on the New York City Taxi Trips data
  • Estimating financial risk through Monte Carlo simulation
  • Analyzing genomics data and the BDG project
  • Analyzing neuroimaging data with PySpark and Thunder


Similar Products

  • Learning Spark: Lightning-Fast Big Data Analysis
  • Spark: The Definitive Guide: Big Data Processing Made Simple
  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
  • Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale
  • High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
  • Deep Learning with Python
  • Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
  • Practical Statistics for Data Scientists: 50 Essential Concepts
  • Deep Learning (Adaptive Computation and Machine Learning series)
  • Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems