Wednesday, April 9, 2014

Installing OpenTSDB 2.0 "next" on an HBase cluster in Amazon's Elastic MapReduce (EMR) service

This post assumes you already have an HBase cluster installed and running on EMR.

Note: This is not intended as an endorsement of either the performance or the cost-effectiveness of using EMR to back your OpenTSDB deployment. Rather, there are definite use cases for being able to bring up a cluster quickly and install OpenTSDB on it. If you have one of those, this should get it done.


  1. ssh to the master node of your cluster. If you took the defaults when creating the cluster, you can identify the master in the EC2 console by its security group: the master is in ElasticMapReduce-Master, while the other nodes are in ElasticMapReduce-Slave. Connect as the hadoop user:

         ssh -i nameOfYourKeyPair.pem hadoop@emrMasterPublicDnsName
  2. telnet to the local ZooKeeper port to make sure ZooKeeper is installed and running:

         telnet localhost 2181

     You should see:

         Trying 127.0.0.1...
         Connected to localhost.
         Escape character is '^]'.

     which means the connection succeeded. Now enter ZooKeeper's four-letter stat command:

         stat

     and you should see something like:

       Zookeeper version: 3.4.5-1392090, built 09/30/12 17:52 GMT
       Clients:
        /10.XXX.XXX.XXX:50333[1](queued=0,recved=4048,sent=4048)
        /10.XXX.XXX.XXX:50360[1](queued=0,recved=4052,sent=4052)
        /127.0.0.1:37169[0](queued=0,recved=1,sent=0)
        /10.XXX.XXX.XXX:50343[1](queued=0,recved=12209,sent=12212)
        /10.XXX.XXX.XXX:50331[1](queued=0,recved=4053,sent=4054)
        /10.XXX.XXX.XXX:45165[1](queued=0,recved=12174,sent=12175)
        /10.XXX.XXX.XXX:45160[1](queued=0,recved=4079,sent=4087)
        /10.XXX.XXX.XXX:50350[1](queued=0,recved=4259,sent=4260)
        /10.XXX.XXX.XXX:45168[1](queued=0,recved=12174,sent=12175)
        /10.XXX.XXX.XXX:36065[1](queued=0,recved=4057,sent=4057)
        /10.XXX.XXX.XXX:50336[1](queued=0,recved=8114,sent=8114)
        /10.XXX.XXX.XXX:50332[1](queued=0,recved=4051,sent=4051)
        /10.XXX.XXX.XXX:50335[1](queued=0,recved=4140,sent=4153)
        /10.XXX.XXX.XXX:50337[1](queued=0,recved=20258,sent=24309)
        /10.XXX.XXX.XXX:36062[1](queued=0,recved=4077,sent=4085)

       Latency min/avg/max: 0/1/206
       Received: 101746
       Sent: 105832
       Connections: 15
       Outstanding: 0
       Zxid: 0x1ff1
       Mode: standalone
       Node count: 34
       Connection closed by foreign host.

    So far, so good.
  3. Install git:

         sudo yum install git
  4. From the directory where you'd like the OpenTSDB repo to live (git will create an opentsdb subdirectory there), run:
         git clone https://github.com/OpenTSDB/opentsdb.git
  5. To build OpenTSDB, we need to add some more basic dev tools to the base system.  The following command is the nuclear option for adding dev tools:
         sudo yum groupinstall 'Development Tools'
  6. And we also need gnuplot:

         sudo yum install gnuplot
  7. Change into the newly cloned OpenTSDB directory:
         cd opentsdb
  8. Fetch all of the branches from GitHub (a fresh clone already has them, so this only matters if your clone is older):
         git fetch
  9. Check out the "next" branch:
         git checkout next
  10. Now you've got the right code and the right tools. Let 'er rip:
         ./build.sh
  11. When complete, change directory into the build directory:
         cd ./build
  12. To verify success, look for the .jar file created by the build process. In this case, the build produced a file named tsdb-2.0.0.jar and a script named tsdb:
         ls tsdb*
  13. And with that, we now have OpenTSDB built and ready to install.  To install it, run:
    sudo make install

  14. Now we'll create the required tables in HBase. The good folks at OpenTSDB made this easy by providing a script that does the heavy lifting for us, and by default it enables (LZO) compression on the tables it creates. First change directory to the location of the script, then execute it:

    cd ../src

    ./create_table.sh 


  15. With the tables created, configure the OpenTSDB process by editing the opentsdb.conf file (it sits in the same src directory) and moving it somewhere the process can find it:

    vi ./opentsdb.conf

    and set the following variables (the values shown are safe starting points, but choose values appropriate to your system and configuration):

         tsd.http.cachedir = /dev/shm/tsd
         tsd.http.staticroot = /home/hadoop/opentsdb/
         tsd.storage.hbase.zk_quorum = localhost

    Then move the file to a directory where OpenTSDB looks for its configuration:

    sudo mv ./opentsdb.conf /etc  



  16. Now that you're fully configured, start OpenTSDB from the top of the repo (it runs in the foreground, so keep this session open):

    cd ..

    ./build/tsdb tsd 


  17. You now have OpenTSDB running and accepting requests (both telnet-style and HTTP!) on port 4242, assuming you kept the default port in opentsdb.conf.
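The telnet check in step 2 can also be run non-interactively, which is handy in a setup script. The sketch below relies on ZooKeeper answering four-letter commands such as stat on its client port, and it assumes the nc (netcat) utility is installed on the master node (an assumption; it may not be there by default). zk_mode and zk_check are hypothetical helpers, not part of ZooKeeper or OpenTSDB.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read ZooKeeper "stat" output on stdin and print the
# value of its "Mode:" line (e.g. standalone, leader or follower).
# Exits non-zero if no Mode: line is present.
zk_mode() {
  awk -F': *' '/^[[:space:]]*Mode:/ { print $2; found = 1 }
               END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: send the four-letter "stat" command to the ZooKeeper
# server at host:port and print its mode. ASSUMPTION: nc (netcat) is installed.
zk_check() {
  local host="${1:-localhost}" port="${2:-2181}"
  echo stat | nc "$host" "$port" | zk_mode
}
```

For example, zk_check localhost 2181 prints the server's mode; for a single server like the one in step 2, whose stat output shows "Mode: standalone", that would be standalone. A non-zero exit status means no Mode: line came back.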
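The manual ls check in step 12 can be turned into a check that fails loudly. A minimal sketch: check_tsdb_build is a hypothetical helper, and it only assumes the file names described in step 12, a tsdb-*.jar plus an executable tsdb script in the build directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: confirm that build.sh left a tsdb-*.jar and an
# executable tsdb script in the given build directory (default: ./build).
check_tsdb_build() {
  local dir="${1:-build}"
  local jars=("$dir"/tsdb-*.jar)
  # With no match the glob stays literal, so test that the file really exists.
  if [[ ! -f "${jars[0]}" ]]; then
    echo "no tsdb-*.jar found in $dir" >&2
    return 1
  fi
  if [[ ! -x "$dir/tsdb" ]]; then
    echo "no executable tsdb script in $dir" >&2
    return 1
  fi
  echo "found ${jars[0]}"
}
```

Run it from the repository root as check_tsdb_build build (or with no argument, since build is the default).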
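Step 14 works as written when the environment is already set up, but create_table.sh takes its settings from environment variables: it exits with an error unless HBASE_HOME is set, and COMPRESSION selects the codec (LZO unless overridden; NONE turns compression off). The wrapper below is a hypothetical sketch, and its default HBase location, /home/hadoop/hbase, is an assumption about the EMR image, so check where HBase actually lives on your cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around OpenTSDB's src/create_table.sh.
# ASSUMPTION: HBase is installed under /home/hadoop/hbase unless HBASE_HOME
# is already set. COMPRESSION defaults to LZO, mirroring the script's own
# default; set COMPRESSION=NONE if LZO is not available on your cluster.
create_tsdb_tables() {
  local script="${1:-./create_table.sh}"
  local hbase_home="${HBASE_HOME:-/home/hadoop/hbase}"
  # Fail early with a clear message instead of a confusing HBase shell error.
  if [[ ! -x "$hbase_home/bin/hbase" ]]; then
    echo "hbase not found under $hbase_home; set HBASE_HOME" >&2
    return 1
  fi
  env HBASE_HOME="$hbase_home" COMPRESSION="${COMPRESSION:-LZO}" "$script"
}
```

From the src directory, create_tsdb_tables creates the tables with the script's default LZO compression, while COMPRESSION=NONE create_tsdb_tables creates them uncompressed, which may be the safer choice if the LZO codec is not installed on your cluster.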
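The edits in step 15 can also be scripted instead of made in vi. set_conf below is a hypothetical helper, not an OpenTSDB tool: it rewrites an existing "key = value" line for the key (commented out or not) or appends a new one, and it assumes GNU sed for the in-place -i flag.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: set "key = value" in an OpenTSDB-style .conf file.
# An existing line for the key, commented out or not, is rewritten in place;
# otherwise the setting is appended. ASSUMPTION: GNU sed (for -i).
# Values must not contain "|", "&" or backslashes, which are special to sed.
set_conf() {
  local key="$1" value="$2" file="$3"
  local ekey
  ekey=$(printf '%s' "$key" | sed 's/\./\\./g')   # match dots literally
  local pattern="^#*[[:space:]]*${ekey}[[:space:]]*="
  if grep -qE "$pattern" "$file"; then
    sed -i -E "s|${pattern}.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

From the src directory, set_conf tsd.http.cachedir /dev/shm/tsd opentsdb.conf applies the first of the step 15 values; repeat it for tsd.http.staticroot and tsd.storage.hbase.zk_quorum before moving the file to /etc.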
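Step 17 can be confirmed from a script rather than by eye. The sketch below polls the TSD's port using bash's built-in /dev/tcp redirection, so it needs neither telnet nor nc; wait_for_port is a hypothetical helper, and its 30-attempt default is an arbitrary choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: poll until something accepts TCP connections on
# host:port, using bash's /dev/tcp redirection (no nc or telnet needed).
# Returns 0 once a connection succeeds, 1 after the given number of attempts.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" i
  for (( i = 1; i <= attempts; i++ )); do
    # The subshell opens the connection and closes it again on exit.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    if (( i < attempts )); then
      sleep 1
    fi
  done
  return 1
}
```

Since ./build/tsdb tsd runs in the foreground, one option is to start it from the repo root with nohup ./build/tsdb tsd > tsd.log 2>&1 & and then run wait_for_port localhost 4242 && curl http://localhost:4242/api/version, which queries the version endpoint of OpenTSDB 2.0's HTTP API.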