Course: Data Platforms: Spark to Snowflake

Apache Spark

- [Dr. Berman] Apache Spark, which can sit on top of Hadoop, is also a big data analytics engine, or platform. It keeps as much of the data in memory as possible. This means that it's generally faster than Hadoop's MapReduce, especially for iterative work such as running machine learning algorithms, the kind of workload for which it was originally designed. Unlike Hadoop, Spark does not depend on its own cluster management system; it attaches to a number of pre-existing ones, including Hadoop's YARN. Also unlike Hadoop, Spark does not have its own distributed data store, but once again it can attach to a number of existing data stores, including the one supplied with Hadoop.

Let's talk about some Spark concepts. In Spark, there's a driver, which sends jobs to various executors. The jobs are divided into stages, and the data is partitioned, with one task running per partition. The tasks run in parallel whenever possible, though not always. At the heart of Spark is the resilient…
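To make the driver, partition, and task ideas a little more concrete, here is a toy sketch in plain Python. It is not Spark's API and leaves out everything that makes Spark a real distributed engine; the names `partition`, `run_task`, and `run_stage` are hypothetical and exist only for this illustration. The "executors" are assumed to be a small thread pool, so with four partitions and two workers, only some tasks run at the same time, which mirrors the "in parallel whenever possible" point.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split the data into num_partitions chunks (round-robin by index)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def run_task(chunk):
    """One task: process a single partition (here, a sum of squares)."""
    return sum(x * x for x in chunk)


def run_stage(data, num_partitions=4, num_executors=2):
    """Toy 'driver': partition the data, run one task per partition on a
    pool of workers standing in for executors, then combine the results."""
    partitions = partition(data, num_partitions)
    # With fewer workers than partitions, some tasks wait for a free worker.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(run_task, partitions))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_stage(list(range(10))))  # sum of squares of 0..9
```

The result does not depend on how many partitions or workers are used; only the amount of work that can overlap changes.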
