Wednesday 13 June 2018

Apache NiFi Installation on Ubuntu

Step 1: Download the NiFi package from the Apache NiFi website and extract it into the desired directory.

Apache NiFi download page: https://nifi.apache.org/download.html

Two distributions are available:

  1. ends with tar.gz - for Linux
  2. ends with zip - for Windows
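The tar.gz can also be fetched directly from a terminal with wget. The exact mirror URL should be copied from the download page; the Apache archive path below is only an illustration for the 1.6.0 release:

wget https://archive.apache.org/dist/nifi/1.6.0/nifi-1.6.0-bin.tar.gz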

Screenshot for reference:



Extract the distribution:

Command: tar -xvf /home/mano/Hadoop_setup/nifi-1.6.0-bin.tar.gz

Screenshot for reference:


Step 2: Configuration

NiFi provides several configuration options, which can be set in the conf/nifi.properties file.

At present, I'm only changing the nifi.ui.banner.text property.
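As a minimal sketch, the property can be edited directly in conf/nifi.properties under the extracted NiFi directory (the banner value shown here is just an example of your own choosing):

nano nifi-1.6.0/conf/nifi.properties

# inside conf/nifi.properties - text shown at the top of the NiFi UI
nifi.ui.banner.text=My NiFi Dev Instance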





Step 3: Starting Apache NiFi:
In a terminal window, navigate to the NiFi directory and run one of the commands below:

  • bin/nifi.sh run - Launches the application in the foreground; exit by pressing Ctrl-C.
  • bin/nifi.sh start - Launches the application in the background.
  • bin/nifi.sh status - Checks the application status.
  • bin/nifi.sh stop - Shuts down the application.

i) bin/nifi.sh run:

ii) bin/nifi.sh start:

iii) bin/nifi.sh status:

iv) bin/nifi.sh stop:
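If the web UI does not come up shortly after bin/nifi.sh start, startup progress can be watched in the application log (logs/nifi-app.log under the NiFi directory is the default location):

tail -f logs/nifi-app.log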



Step 4: Apache NiFi Web User Interface:
After Apache NiFi has started, the web user interface (UI) can be used to create and monitor our dataflows.


To use Apache NiFi, open a web browser and navigate to http://localhost:8080/nifi
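The same endpoint can also be checked from the terminal (assuming the default nifi.web.http.port=8080 has not been changed in nifi.properties):

curl -I http://localhost:8080/nifi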




Friday 1 June 2018

Steps to install Apache Spark on Ubuntu


Step 1: Download Apache Spark distribution

Use the link below to download the Spark distribution ==> http://spark.apache.org/downloads.html






Download from the terminal using the command below:

wget http://www-eu.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz



Step 2: Untar the Spark distribution

tar xzf spark-2.3.0-bin-hadoop2.7.tgz
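Step 3 below points SPARK_HOME at /usr/local/spark, so one option (an assumption here; any directory works as long as SPARK_HOME matches it) is to move the extracted folder there:

sudo mv spark-2.3.0-bin-hadoop2.7 /usr/local/spark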

 
Step 3: Set up the environment variables:

export SPARK_HOME=/usr/local/spark

Follow the steps below to set the Spark environment variables in the .bashrc file.

nano .bashrc
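A minimal sketch of the entries to add at the end of .bashrc, assuming Spark was moved to /usr/local/spark as above:

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin

Then reload the file so the current shell picks up the changes:

source ~/.bashrc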


 

Step 4: Launch the Spark shell / PySpark context:

Scala API command line:
Run spark-shell to enter the Scala context.

Python API command line:
Run pyspark to enter the Python context.

R API command line:
Run sparkR to enter the R context.
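As a quick end-to-end check, the example jobs bundled with the distribution can also be run; SparkPi is one of them and prints an approximation of Pi near the end of its output:

$SPARK_HOME/bin/run-example SparkPi 10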

Step 5: Spark UI:
Enter the URL below in a browser to check Spark execution details, the DAG visualization, and other debugging information. The UI is available while a Spark application (for example spark-shell) is running.

URL ==> http://localhost:4040
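While spark-shell (or pyspark) is running in another terminal, the UI can also be verified from the command line:

curl -I http://localhost:4040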
 
Done. This is a great starting point for further data processing with Apache Spark.

