Tuesday, June 19, 2018

HADOOP MAPREDUCE VS. APACHE SPARK





Both Hadoop MapReduce and Apache Spark are open-source platforms for writing Big Data applications, and the need for them is evident in the current Big Data landscape. They can be compared on many aspects, such as processing speed, real-time capabilities, ease of use, and fault tolerance.
  1. Hadoop is a framework that makes it possible to store Big Data in an effective, distributed way. The Hadoop system has two core components, HDFS (Hadoop Distributed File System) and MapReduce, while YARN (Yet Another Resource Negotiator) is an additional component that improves execution speed and resource scheduling. The MapReduce framework further eases parallel access to that data.


  2. Apache Spark is an independent data processing engine built for real-time applications. Spark can be deployed on top of any distributed file system, such as HDFS. Analogous to YARN, Spark has its own cluster resource manager, known as the Standalone manager; it is not as mature as YARN, so it is rarely used in production. A minimal word-count sketch contrasting the two engines follows this list.
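To make the contrast concrete, here is a minimal word count in Spark's native Scala API. This is a sketch, assuming a local master and an illustrative HDFS path; the MapReduce equivalent would need separate Mapper, Reducer, and driver classes.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The local master and the input path are illustrative placeholders.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val counts = sc.textFile("hdfs:///data/input.txt") // Spark reads directly from HDFS
      .flatMap(_.split("\\s+"))                        // split lines into words
      .map(word => (word, 1))                          // emit (word, 1) pairs, like a mapper
      .reduceByKey(_ + _)                              // sum counts per word, like a reducer

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```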
Following is a detailed analysis of Hadoop MapReduce and Apache Spark:
  1. Speed of Operation:
Spark provides a faster data processing engine; it is designed for workloads and applications that require fast computation. Apache Spark is commonly cited as running up to 100 times faster than MapReduce when the data fits in memory, and around 10 times faster on disk.
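The in-memory advantage shows up most clearly in iterative jobs. The sketch below uses an illustrative input path and a made-up refinement step; the point is that cache() keeps the data in memory so later iterations avoid re-reading from disk.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IterativeJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("IterativeJob").setMaster("local[*]"))

    // cache() keeps the parsed values in memory after the first pass,
    // so the ten iterations below do not hit the disk again.
    val values = sc.textFile("hdfs:///data/values.txt") // illustrative path
      .map(_.toDouble)
      .cache()

    var estimate = 0.0
    for (_ <- 1 to 10) {
      estimate = values.map(v => (v + estimate) / 2).mean() // hypothetical refinement step
    }
    println(s"estimate = $estimate")
    sc.stop()
  }
}
```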


  2. Graph Processing:
Spark ships with a graph computation library called GraphX that simplifies such workloads. In-memory computation combined with built-in graph support lets it outperform conventional MapReduce programs. Many graph algorithms, such as PageRank, iterate repeatedly over the same data. MapReduce runs each iteration as a separate job, reading the same data from disk and HDFS every time; this repeated I/O increases latency and makes graph processing on Hadoop inefficient compared to Apache Spark.
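As an illustration, GraphX exposes PageRank as a single call on a loaded graph. The edge-list path below is a placeholder; the iterations run over in-memory vertex and edge partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

object PageRankExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PageRank").setMaster("local[*]"))

    // Load a graph from a file of "srcId dstId" pairs (path is illustrative).
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")

    // Iterate PageRank until ranks change by less than the given tolerance.
    val ranks = graph.pageRank(0.0001).vertices

    // Print the five highest-ranked vertices.
    ranks.sortBy(_._2, ascending = false).take(5).foreach(println)
    sc.stop()
  }
}
```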


  3. User-Friendliness:
Generally speaking, Apache Spark is easier to use than Hadoop. It ships with Application Programming Interfaces (APIs) for Scala (its native language), Java, and Python, as well as Spark SQL, which makes it more user-friendly than Hadoop MapReduce. Its interactive shell provides instant feedback, improving the development experience further. Hadoop, by contrast, is programmed predominantly in Java, which makes it considerably harder to work with, and higher-level abstraction layers are usually needed on top of the platform.
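For example, in the interactive spark-shell the SparkSession `spark` is predefined and queries return results immediately. The JSON file and column names below are assumptions for illustration.

```scala
// Entered line by line in spark-shell, where `spark` already exists.
val people = spark.read.json("hdfs:///data/people.json") // illustrative path
people.createOrReplaceTempView("people")

// Spark SQL answers interactively -- the instant feedback noted above.
spark.sql("SELECT name, age FROM people WHERE age > 30").show()
```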


  4. Cost:
Being open-source projects, both frameworks are free to use, but Apache Spark comes with certain hidden costs. Spark keeps as much of the working set as possible in Random Access Memory (RAM), and RAM is significantly more expensive than disk storage, which drives up the cost of a Spark deployment. A Hadoop cluster will therefore generally be cheaper than an equivalent Spark cluster. Many organizations are willing to overlook this cost factor when they need a more general-purpose, real-time processing engine.
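The RAM cost is something you confront directly when sizing executors. A sketch of the relevant configuration keys follows; the values are purely illustrative, not recommendations.

```scala
import org.apache.spark.SparkConf

// Executor memory is where Spark's hardware cost concentrates.
val conf = new SparkConf()
  .setAppName("MemorySizedJob")
  .set("spark.executor.memory", "8g")    // RAM reserved per executor (illustrative)
  .set("spark.executor.instances", "10") // 80 GB of cluster RAM for this one job
  .set("spark.memory.fraction", "0.6")   // share of heap for execution and caching
```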


  5. Fault Tolerance:
Apache Spark achieves fault tolerance through RDDs (Resilient Distributed Datasets) and related models, which minimize network I/O. If a partition of an RDD is lost, Spark rebuilds it from the lineage information it already holds rather than copying data over the network. Hadoop, on the other hand, achieves fault tolerance through replication, and MapReduce relies on the JobTracker and TaskTracker to recover from task failures. Thus both open-source platforms come with a certain amount of built-in fault tolerance.
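The lineage that makes this recovery possible can be inspected directly. A minimal sketch, assuming an illustrative input path:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LineageDemo").setMaster("local[*]"))

    val counts = sc.textFile("hdfs:///data/input.txt") // illustrative path
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)

    // Each RDD records the chain of transformations that produced it.
    // A lost partition is recomputed from this lineage, not restored from a replica.
    println(counts.toDebugString)
    sc.stop()
  }
}
```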



  6. Security Aspects:

Hadoop MapReduce offers more security features than Apache Spark. Hadoop supports Kerberos authentication, which considerably strengthens its security. Spark is more exposed to external threats, as it only supports authentication through a shared secret. To beef up security, organizations are encouraged to run Spark on top of the Hadoop Distributed File System, so that Spark deployments can benefit from HDFS-level security.
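Spark's shared-secret authentication is switched on through two standard configuration keys, shown below; the secret value is a placeholder.

```scala
import org.apache.spark.SparkConf

// spark.authenticate and spark.authenticate.secret are standard Spark
// configuration keys; replace the placeholder with a strong secret.
val conf = new SparkConf()
  .setAppName("SecuredJob")
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "replace-with-a-strong-secret")
```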
