Install Red Sqirl

Red Sqirl is a web-based application: once you've installed it on your Hadoop cluster, you can carry out all your data analytics in your browser using its drag-and-drop interface.

Red Sqirl packages give you an easy-to-use interface to the Hadoop algorithms and software that you already have on your cluster. You install the package for Spark, Pig, Hama, etc. from within Red Sqirl and use it to analyse your data.

There are two ways to use Red Sqirl on your Hadoop cluster: Online and Offline.


Online Red Sqirl



Online Red Sqirl can be set up in less than 10 minutes and gives you the ability to download software packages instantly from inside the application.


▸ 1. What you'll need

To use Red Sqirl online, there are a few things you'll need first:

  • A Linux web server

  • An Apache Hadoop cluster with Apache Oozie installed

  • An account on the server (or using LDAP) for every Red Sqirl user

  • SSH on the web server with password authentication enabled for localhost

  • HDFS Home directory for each user (/user/$user)

  • A direct internet connection, with redsqirl.com whitelisted on your firewall

  • At least one of the following installed on your Hadoop cluster

    • Apache Hive
    • Apache Pig
    • Apache Hama
    • Apache Spark

  • To use Spark in Red Sqirl, you will need the following Python libraries on all data nodes:

    • python-dateutil
    • numpy (used by Spark for machine learning: official documentation)
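
The Python library requirement above can be checked on each data node with a short script. This is a minimal sketch and not part of Red Sqirl: the helper names are made up for illustration, and it assumes python-dateutil is importable under its usual module name, dateutil. The second helper only builds the standard "hdfs dfs -test -d" command for a user's HDFS home directory; it does not run it.

```python
import importlib.util

# Module names are an assumption: python-dateutil is imported as "dateutil".
REQUIRED_MODULES = ["dateutil", "numpy"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported on this node."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def hdfs_home_check_command(user):
    """Build (but do not run) a command that tests for /user/<user> on HDFS."""
    return ["hdfs", "dfs", "-test", "-d", f"/user/{user}"]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python libraries:", ", ".join(missing))
    else:
        print("All required Python libraries are importable.")
    # "myuser" is a placeholder account name.
    print("HDFS check:", " ".join(hdfs_home_check_command("myuser")))
```

Run it with the same Python interpreter that Spark uses on the node, since a library installed for a different interpreter will not be found.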

If you don't have access to a Hadoop cluster you can try Red Sqirl on Docker here


▸ If your Hadoop is Kerberized

Red Sqirl now supports Hadoop secure mode. If your cluster is Kerberized, you'll need to create a passwordless principal for each of your users and copy the corresponding keytabs to the Red Sqirl server.

The principal should follow the format _USER/_HOST@_REALM, for example myuser/myhost.example.com@EXAMPLE.COM.

Each keytab should be accessible only to the user it was created for (permission 400). By default, Red Sqirl expects the keytab files at /etc/security/keytabs/redsqirl/user-_USER.keytab.
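
The naming and permission rules above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and it assumes that _USER in the default path is replaced by the OS user name.

```python
import os
import stat

# Default location quoted above; _USER is assumed to be the OS user name.
KEYTAB_TEMPLATE = "/etc/security/keytabs/redsqirl/user-{user}.keytab"


def principal(user, host, realm):
    """Format a principal as _USER/_HOST@_REALM."""
    return f"{user}/{host}@{realm}"


def keytab_path(user):
    """Return the default keytab path for `user`."""
    return KEYTAB_TEMPLATE.format(user=user)


def has_mode_400(path):
    """True if the file's permission bits are exactly 400 (owner read-only)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o400
```

For example, principal("myuser", "myhost.example.com", "EXAMPLE.COM") gives myuser/myhost.example.com@EXAMPLE.COM, and has_mode_400(keytab_path("myuser")) reports whether that user's keytab has the expected permission.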

In a Kerberized environment, Red Sqirl must run on a server on which Hadoop is installed, and the Hadoop conf folder must be inside the Hadoop home folder.


▸ 2. Architecture

Red Sqirl runs as a web application on Tomcat. When you log in, it creates another process owned by the logged-in user and makes key components available over RMI. Every action in the application runs through that user's process to avoid permission conflicts.

▸ 3. Steps to install

Step 1: Click the Red Sqirl download for the version of Hadoop that you are running

Red Sqirl version 1.5

Compatible Hadoop Version            Compatible Hive Version   Red Sqirl Release Date
2.7.3                                1.2.1                     19 September 2017
2.6.0                                1.1.0                     19 September 2017
2.4.0                                1.1.0                     19 September 2017
hadoop-1.0.3-mapr-4.1.0-hive-1.1.0   1.1.0                     19 September 2017
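
If you are unsure which download matches your cluster, the lookup can be sketched as below. This is a rough illustration with hypothetical function names; it assumes the first line of the "hadoop version" command output has the form "Hadoop x.y.z". The MapR build is left out because its name is not a plain version number.

```python
import re

# Hadoop -> Hive pairs copied from the Red Sqirl 1.5 compatibility list.
COMPATIBLE_HIVE = {"2.7.3": "1.2.1", "2.6.0": "1.1.0", "2.4.0": "1.1.0"}


def parse_hadoop_version(output):
    """Extract 'x.y.z' from the first line of `hadoop version` output."""
    match = re.match(r"Hadoop (\d+\.\d+\.\d+)", output.strip())
    return match.group(1) if match else None


def matching_download(output):
    """Return (hadoop, hive) for a listed release, or None if not listed."""
    version = parse_hadoop_version(output)
    if version in COMPATIBLE_HIVE:
        return version, COMPATIBLE_HIVE[version]
    return None
```

Pass it the text printed by "hadoop version". A result of None means the detected version is not one of the plain Hadoop releases listed, so check the list by hand.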


Step 2: As the Tomcat user, unzip the download in the directory where you want to install it

Step 3: Run the script bin/install.sh. If necessary, it will ask for your Tomcat port and webapps directory. The script then starts Tomcat

Step 4: Go to http://myserver:portNumber/redsqirl in your browser, where you should see a login page

Step 5: Create a user account on the Red Sqirl Analytics Store, confirm the registration by email and sign in with your new credentials

Step 6: You will then be asked to install the recommended default package and any other packages you want

Step 7: Configure Red Sqirl from the Settings tab

Step 8: You can then sign out of the admin page and sign in to Red Sqirl using your server account details (OS user name and password)

Congratulations! You now have Red Sqirl installed.
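
If the login page from Step 4 does not appear, a quick script can tell you whether the server answers at all. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and port 8080 in the usage note is an assumption (it is Tomcat's default, so use whatever port you gave to bin/install.sh).

```python
import urllib.error
import urllib.request


def redsqirl_url(host, port):
    """Build the login URL in the form given in Step 4."""
    return f"http://{host}:{port}/redsqirl"


def login_page_reachable(url, timeout=5):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
```

For example, login_page_reachable(redsqirl_url("myserver", 8080)) returns False if Tomcat is not running or the port is wrong.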

   

Offline Red Sqirl



Offline Red Sqirl lets you use Red Sqirl without a network connection. Installation may take slightly longer than the Online Red Sqirl installation.


▸ 1. What you'll need

To use Red Sqirl offline, there are a few things you'll need first:

  • A Linux web server

  • An Apache Hadoop cluster with Apache Oozie installed

  • An account on the server (or using LDAP) for every Red Sqirl user

  • SSH on the web server with password authentication enabled for localhost

  • HDFS Home directory for each user (/user/$user)

  • At least one of the following installed on your Hadoop cluster

    • Apache Hive
    • Apache Pig
    • Apache Hama
    • Apache Spark

  • To use Spark in Red Sqirl, you will need the following Python libraries on all data nodes:

    • python-dateutil
    • numpy (used by Spark for machine learning: official documentation)

If you don't have access to a Hadoop cluster you can try Red Sqirl on Docker here


▸ If your Hadoop is Kerberized

Red Sqirl now supports Hadoop secure mode. If your cluster is Kerberized, you'll need to create a passwordless principal for each of your users and copy the corresponding keytabs to the Red Sqirl server.

The principal should follow the format _USER/_HOST@_REALM, for example myuser/myhost.example.com@EXAMPLE.COM.

Each keytab should be accessible only to the user it was created for (permission 400). By default, Red Sqirl expects the keytab files at /etc/security/keytabs/redsqirl/user-_USER.keytab.

In a Kerberized environment, Red Sqirl must run on a server on which Hadoop is installed, and the Hadoop conf folder must be inside the Hadoop home folder.


▸ 2. Architecture

Red Sqirl runs as a web application on Tomcat. When you log in, it creates another process owned by the logged-in user and makes key components available over RMI. Every action in the application runs through that user's process to avoid permission conflicts.

▸ 3. Steps to install

Step 1: Click the Red Sqirl download for the version of Hadoop that you are running

Red Sqirl version 1.5

Compatible Hadoop Version            Compatible Hive Version   Red Sqirl Release Date
2.7.3                                1.2.1                     19 September 2017
2.6.0                                1.1.0                     19 September 2017
2.4.0                                1.1.0                     19 September 2017
hadoop-1.0.3-mapr-4.1.0-hive-1.1.0   1.1.0                     19 September 2017


Step 2: As the Tomcat user, unzip the download in the directory where you want to install it

Step 3: Run the script bin/install.sh. If necessary, it will ask for your Tomcat port and webapps directory. The script then starts Tomcat

Step 4: Before you can use Red Sqirl Offline, you'll need to create a Red Sqirl user account on redsqirl.com in order to receive the licence keys and download packages. Go to redsqirl.com and click register on the top right of the menu

Step 5: Once logged in, click on 'request new software key' and fill in the form. You'll need to choose the Red Sqirl version you downloaded. Your licence key will be generated automatically

Step 6: The next page will show you the packages available. You'll need to download the Red Sqirl package for each of the Hadoop technologies that you have on your Hadoop cluster. For example, choose the Red Sqirl Pig package (redsqirl-pig) and generate the package key if you have Apache Pig on your Hadoop cluster.

Step 7: Once you've generated the package key, you'll then need to download the package itself and the keys you just generated from the Module Key Manager page

Step 8: Now you're ready to set up Offline Red Sqirl. Disconnect from your network and go to http://myserver:portNumber/redsqirl

Step 9: Sign into Red Sqirl using your server account details (OS name and password)

Step 10: On the License key page, upload your license key file (licenseKey.properties). After pressing OK, you'll be redirected to the Packages page

Step 11: Upload the Red Sqirl package (.zip) and any other packages you've downloaded on this page

Step 12: Configure Red Sqirl from the Settings tab

Step 13: You'll then need to sign out and sign in again with your server account details (OS user name and password)

Congratulations! You now have Red Sqirl installed.
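
Because the server is disconnected from the network in Step 8, it can help to confirm beforehand that everything downloaded in Steps 5 to 7 is present and intact. The sketch below is an illustration, not a Red Sqirl tool: the function name and the idea of keeping all downloads in one directory are assumptions. It looks for licenseKey.properties and checks that each .zip file is a readable ZIP archive.

```python
import os
import zipfile

LICENCE_FILE = "licenseKey.properties"  # file name given in Step 10


def check_offline_bundle(directory):
    """Summarise which offline install files are present in `directory`."""
    names = sorted(os.listdir(directory))
    zips = [n for n in names if n.lower().endswith(".zip")]
    # is_zipfile only checks that the file looks like a ZIP archive;
    # it does not validate what the package contains.
    valid = [n for n in zips if zipfile.is_zipfile(os.path.join(directory, n))]
    return {
        "licence_key_present": LICENCE_FILE in names,
        "valid_packages": valid,
        "broken_packages": [n for n in zips if n not in valid],
    }
```

Any file reported under broken_packages was probably truncated during download and should be fetched again before you go offline.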

Tutorials



Now that you have Red Sqirl installed, you can follow along with our video and written tutorials here