Hadoop installation notes

Target environment: CentOS 5.4 (on VMware)


Install the Sun JDK.

[root@localhost ~]# wget -O jdk-6u17-linux-i586-rpm.bin "http://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_Developer-Site/en_US/-/USD/VerifyItem-Start/jdk-6u17-linux-i586-rpm.bin?BundledLineItemUUID=x.JIBe.pGdUAAAEloNgdaDYE&OrderID=fdZIBe.pyo0AAAElkNgdaDYE&ProductID=lBFIBe.oSOMAAAEkGehn5G0y&FileName=/jdk-6u17-linux-i586-rpm.bin"
[root@localhost ~]# sh jdk-6u17-linux-i586-rpm.bin
[root@localhost ~]# java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)

Download and extract Hadoop (0.20.1, the latest release as of 2009/11/27).

[hadoop@localhost ~]$ wget http://ftp.kddilabs.jp/infosystems/apache/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
[hadoop@localhost ~]$ tar -zxvf hadoop-0.20.1.tar.gz

Set up ssh so that the hadoop user can log in without a password.

[hadoop@localhost ~]$ ssh-keygen -t rsa -P ""
[hadoop@localhost ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
[hadoop@localhost ~]$ chmod 600 .ssh/authorized_keys
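
If key-based login still asks for a password, file permissions are a common cause. The following is a minimal sketch of a check for the conventional modes (700 on .ssh, 600 on authorized_keys). check_ssh_perms is a hypothetical helper name, not part of OpenSSH or Hadoop, and it assumes GNU stat as shipped with CentOS.

```shell
# Hypothetical helper: verify the conventional permissions for key-based login.
# Assumes GNU stat (stat -c), as found on CentOS.
check_ssh_perms() {
    local dir="${1:-$HOME/.ssh}"
    local dir_mode file_mode
    dir_mode=$(stat -c '%a' "$dir") || return 1
    file_mode=$(stat -c '%a' "$dir/authorized_keys") || return 1
    if [ "$dir_mode" != "700" ]; then
        echo "fix: chmod 700 $dir (currently $dir_mode)"
        return 1
    fi
    if [ "$file_mode" != "600" ]; then
        echo "fix: chmod 600 $dir/authorized_keys (currently $file_mode)"
        return 1
    fi
    echo "ok"
}

# Example call (run as the hadoop user):
#   check_ssh_perms ~/.ssh
```

Once this prints "ok", running ssh localhost should log in without a password prompt.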

Set JAVA_HOME in conf/hadoop-env.sh (paths from here on are relative to the extracted hadoop-0.20.1 directory).
Also increase HADOOP_HEAPSIZE.

# The java implementation to use.  Required.
export JAVA_HOME=/usr/java/default

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000
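
Before starting any daemons it can help to confirm that the JAVA_HOME value above really contains a java binary. A small sketch, where check_java_home is a hypothetical helper name introduced only for this note:

```shell
# Hypothetical helper: succeed only if <dir>/bin/java exists and is executable.
check_java_home() {
    local home="$1"
    if [ -x "$home/bin/java" ]; then
        echo "ok: $home/bin/java"
    else
        echo "missing: $home/bin/java"
        return 1
    fi
}

# Example call with the value used in hadoop-env.sh:
#   check_java_home /usr/java/default
```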

Configure conf/core-site.xml.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-0.20.1/tempdir</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

Configure conf/hdfs-site.xml.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Configure conf/mapred-site.xml.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
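
To double-check what was written into the three site files, a property value can be read back from the shell. This is only a rough sketch: get_prop is a hypothetical helper, and it relies on the layout used above (each <value> on the line directly after its <name>), so it is not a general XML parser.

```shell
# Hypothetical helper: print the <value> that follows a given <name> line.
# Only works for files laid out like the ones above; not a real XML parser.
get_prop() {
    local file="$1" name="$2"
    awk -v n="<name>$name</name>" '
        index($0, n) { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }' "$file"
}

# Example calls from the hadoop-0.20.1 directory:
#   get_prop conf/core-site.xml fs.default.name
#   get_prop conf/mapred-site.xml mapred.job.tracker
```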

Format the HDFS NameNode.

[hadoop@localhost hadoop-0.20.1]$ ./bin/hadoop namenode -format

Start Hadoop.

[hadoop@localhost hadoop-0.20.1]$ ./bin/start-all.sh
[hadoop@localhost hadoop-0.20.1]$ /usr/java/default/bin/jps
2614 NameNode
2808 SecondaryNameNode
2703 DataNode
3022 TaskTracker
3117 Jps
2894 JobTracker
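
All five daemons in the listing above should be present after start-all.sh. The check can be scripted roughly as follows. missing_daemons is a hypothetical helper that reads jps output on standard input and prints the name of each expected daemon it does not find.

```shell
# Hypothetical helper: read jps output on stdin and print every expected
# Hadoop 0.20 daemon that is not listed. Prints nothing if all are running.
missing_daemons() {
    local jps_output daemon
    jps_output=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Example call:
#   /usr/java/default/bin/jps | missing_daemons
```

If a daemon is reported as missing, its log file under the logs directory is usually the first place to look.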

Run the sample job.

[hadoop@localhost hadoop-0.20.1]$ mkdir input
[hadoop@localhost hadoop-0.20.1]$ emacs input/file1
[hadoop@localhost hadoop-0.20.1]$ ./bin/hadoop dfs -copyFromLocal input input
[hadoop@localhost hadoop-0.20.1]$ ./bin/hadoop dfs -ls input
[hadoop@localhost hadoop-0.20.1]$ ./bin/hadoop jar hadoop-0.20.1-examples.jar wordcount input output
[hadoop@localhost hadoop-0.20.1]$ ./bin/hadoop dfs -cat output/part-r-00000
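
To sanity-check the job output, the same count can be approximated locally with standard tools. This is a sketch under the assumption that the wordcount example splits on whitespace only and writes word<TAB>count lines sorted by word. local_wordcount is a hypothetical helper name.

```shell
# Hypothetical helper: count whitespace-separated words in a local file
# and print "word<TAB>count", sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Example call, to compare against output/part-r-00000:
#   local_wordcount input/file1
```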