课程: Data Platforms: Spark to Snowflake

免费学习该课程!

今天就开通帐号,24,700 门业界名师课程任您挑!

Apache Hadoop

加息靴子落地A股未见回升,量能、均线成两大制约因素

课程: Data Platforms: Spark to Snowflake

Apache Hadoop

百度 也许他并不是有意这样,但是在球队落后的情况下这样漫不经心的踢球,球迷很容易认为他是吊儿郎当没有把恒大胜负当回事。

- [Instructor] Now let's talk about Apache Hadoop. Apache Hadoop is a platform for processing big data. It is also an implementation of the MapReduce programming model. The term Hadoop refers to both the Hadoop implementation of the MapReduce program model and tools in the Hadoop ecosystem. Hadoop distributes work across multiple machines, referred to as nodes, in a cluster. It writes intermediate data to disk as a strategy for dealing with large data that can't all sit in memory. So if we have our input files on disk and we're going to use Hadoop to process them, the first step is to split the files into partitions. We then run various transformations on these partitions and what's known as the map phase. In between each transformation, intermediate files are written to disk, and then the next transformation step reads from these files on disk. Finally, the data is combined to produce the final output. This final combination is referred to as the reduce phase.

内容