In this seminar you will get an overview of the most important Spark components and the architecture of a Spark application.
Today, ever more data needs to be stored and processed in ever shorter time, and conventional frameworks and algorithms quickly reach their limits. Apache Spark - a framework for the distributed processing and calculation of large amounts of data - provides the solution. You will apply your newly gained knowledge in practical exercises and write your first jobs in Python. Among other things, you will use Spark Core - the foundation of Spark's parallel processing - analyze data using Spark SQL, and learn the key configuration parameters, including those related to YARN. You will also get a brief introduction to streaming (Spark Streaming), machine learning (MLlib) and graph processing (GraphX).
- Overview of Spark
- Spark Core
- Spark architecture
- Spark SQL
- Spark Streaming, MLlib and GraphX
- You know Apache Spark and its most important components.
- You have learned how to program Spark in Python through hands-on exercises.
- You know the structure of a Spark application and the most important configuration parameters.
- You will be able to implement your first Spark solutions.
Participation in our seminar "Python Programming" (P-PYTH-01) or comparable knowledge, and participation in our seminar "Hadoop Basics" (HADOOP-01) or comparable knowledge
Application developers, data engineers, data scientists, system integrators, IT architects, IT consultants