Using HCP with Hadoop v1.2.pdf

File uploaded by Jeff Lundberg on Jun 10, 2013. Last modified by Roguen - DevNet Answer Curator on Feb 21, 2017.
Version 3

This document describes how to set up Apache Hadoop to use Hitachi Content Platform (HCP) as the source and/or target for its operations. It is left to the reader to decide whether it makes sense to run Hadoop against S3-compatible storage, as this gives up a prominent feature of Hadoop: data locality (MapReduce jobs run on the same nodes where the data is stored, avoiding extensive network traffic).


However, if a lot of the data to be processed is already located on HCP, it might be more efficient to run MapReduce jobs that read directly from HCP than to first copy the data into HDFS. Another point is that HCP provides data reliability and redundancy out of the box, making the three-fold storage overhead of HDFS's default triple replication unnecessary.
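As a rough illustration of what such a setup involves, Hadoop 1.x can address S3-compatible storage through its s3n:// filesystem, whose credentials are configured in core-site.xml. The snippet below is a sketch with placeholder values; the host name hcp-tenant.example.com, the namespace name, and the credential values are assumptions and must be replaced with the details of your own HCP tenant (HCP's S3-compatible interface derives its access keys from the HCP user account rather than using AWS keys).

```xml
<!-- core-site.xml (sketch, hypothetical values):
     credentials for Hadoop's s3n:// filesystem -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <!-- placeholder: the access key for your HCP data access account -->
  <value>HCP-ACCESS-KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <!-- placeholder: the matching secret key -->
  <value>HCP-SECRET-KEY</value>
</property>
```

Since the s3n filesystem in Hadoop 1.x is backed by the JetS3t library, the S3 endpoint can be pointed away from AWS and at the HCP system via a jets3t.properties file on the classpath, e.g.:

```
# jets3t.properties (sketch): direct S3 requests to the HCP system
# instead of Amazon S3 ("hcp-tenant.example.com" is a placeholder)
s3service.s3-endpoint=hcp-tenant.example.com
```

With that in place, a job or shell command would reference data as, for example, s3n://namespace/path, where the bucket portion maps to a namespace on HCP. Treat all names above as illustrative; the authoritative property names and endpoint details are in the sections that follow and in the HCP documentation.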