Tuesday, 10 October 2017

Scala tuples

Scala tuple:
A Scala tuple is a class that combines a fixed number of items so that they can be passed around as a whole.

Note:

  • Unlike an array or list, a tuple can hold objects of different types, but it is also immutable.
  • A tuple isn't actually a collection; it's a series of classes named Tuple2, Tuple3, etc., through Tuple22.

Example:

Think of a tuple as a little bag or container you can use to hold things and pass them around.

1)Creating tuples:
Tuples can be created in two ways:
  1. Enclosing the elements in parentheses.
  2. Creating a tuple with ->
i)Enclosing the elements in parentheses:
 
Ex:
val tupleex = (1, "Mano")

To see the tuple's class:
tupleex.getClass()


ii)Creating a tuple with ->:
We can also create a two-element tuple with ->, which is mostly useful when building a Map collection.
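A minimal sketch; the second Map entry is an invented value for illustration:

val pair = 1 -> "Mano"                       // equivalent to (1, "Mano")
val people = Map(1 -> "Mano", 2 -> "Ravi")   // -> reads naturally in Map literals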


2)Accessing tuple elements:

Tuple elements can be accessed in different ways:
  1. Using an underscore with the position
  2. Using variable names to access tuple elements
  3. Iterating over a Scala tuple
  4. The tuple toString method
i)Using an underscore with the position:
We can access tuple elements using an underscore syntax. The first element is accessed with _1, the second element with _2, and so on.
Ex:
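For example, with the tupleex tuple created earlier:

val tupleex = (1, "Mano")
tupleex._1   // Int = 1
tupleex._2   // String = Mano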


ii)Using variable names to access tuple elements:
When referring to a Scala tuple, we can also assign names to its elements.

Let's try this when returning miscellaneous elements from a method.

Create a method that returns a tuple:

def tuplemeth = (1, "Mano", 24)

Create variables to hold the tuple elements:
val (id, name, age) = tuplemeth

We can ignore elements by using an underscore placeholder for the ones we want to skip:
val (id, name, _) = tuplemeth



iii)Iterating over a Scala tuple:
As mentioned, a tuple is not a collection; it doesn't descend from any of the collection traits or classes. However, we can treat it a little like a collection by using its productIterator method.
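For example, using the tuplemeth tuple from above:

tuplemeth.productIterator.foreach(println)
// prints:
// 1
// Mano
// 24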


iv)The tuple toString method:
The tuple toString method gives you a readable string representation of a tuple.
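For example:

tuplemeth.toString   // String = (1,Mano,24)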





Sunday, 1 October 2017

2)Spark- Resilient Distributed Dataset

What is RDD?
  • RDD, the Resilient Distributed Dataset, is Spark's core abstraction.
  • It is an immutable distributed collection of objects; Spark distributes the data in an RDD to different nodes across the cluster to achieve parallelization.
Note:
resilient - able to be recomputed from lineage history, which in turn makes it fault tolerant.


In simple terms, a Resilient Distributed Dataset is a:
  • Collection
  • Distributed
  • In-memory
  • Resilient
How to create RDDs?
There are two ways to create RDDs:
  1. Parallelizing an existing collection
  2. Referencing an external dataset
1.Parallelizing an existing collection:
An RDD can be created in your driver program by calling SparkContext's parallelize method on an existing iterable or collection.

Example: python spark
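A minimal sketch, assuming sc is the SparkContext provided by the PySpark shell:

data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)   # distribute the list across the cluster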



 Example: scala spark
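The equivalent sketch in the Scala shell, again assuming sc is the SparkContext:

val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)   // distribute the array across the cluster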



Note: The number of partitions can be set as the second parameter of the parallelize method:
sc.parallelize(list, 10)
2.Referencing an external dataset:
RDDs can be created by referencing a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, Cassandra, or any data source offering a Hadoop InputFormat.

Text file RDDs can be created using SparkContext’s textFile method.

Below are the methods for reading different types of files:
  • textFile(path)
  • sequenceFile(path)
  • objectFile(path)
 
Example: python spark
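A minimal sketch; data.txt is a hypothetical input file path:

distFile = sc.textFile("data.txt")   # RDD of the file's lines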


 Example: scala spark
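The Scala equivalent, with the same hypothetical path:

val distFile = sc.textFile("data.txt")   // RDD[String] of the file's lines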



Note:
SparkContext.wholeTextFiles lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs.

RDD Operations:
RDDs support two types of operations:
  1. transformations
  2. actions
i)transformations:
Transformations create a new dataset from an existing one.

Example:
map is a transformation that passes each dataset element through a function and returns a new RDD representing the results.

Example in python spark:
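A sketch building on the distFile RDD from above:

lineLengths = distFile.map(lambda s: len(s))   # new RDD of per-line lengths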


Example in scala spark:
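And in Scala:

val lineLengths = distFile.map(s => s.length)   // new RDD of per-line lengths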




ii)actions:
Actions return a value to the driver program after running a computation on the dataset.

Example:

reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program.


Example in python spark:
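Continuing the sketch above:

totalLength = lineLengths.reduce(lambda a, b: a + b)   # action: total characters, returned to the driver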



Example in scala spark:
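And in Scala:

val totalLength = lineLengths.reduce((a, b) => a + b)   // action: total characters, returned to the driver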



Note:
  • All transformations in Spark are lazy: they do not compute their results right away. 
  • The results are computed only when an action needs them.
  • By default, each transformed RDD may be recomputed each time you run an action on it.
We can also persist an RDD in memory using the persist (or cache) method.
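For example, continuing the Scala sketch above:

lineLengths.persist()   // keep this RDD in memory after it is first computed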


Saturday, 30 September 2017

1)About Apache Spark

Apache Spark:

Apache Spark is an in-memory cluster computing technology that increases the processing speed of an application.

Spark can use Hadoop for storage only, since it has its own cluster management and computation engine.

Note:

  • Designed for fast computation. 
  • It is based on the Hadoop MapReduce model and extends it to efficiently support more types of computations, including interactive queries and stream processing.
  • It reduces the management burden of maintaining separate tools.

Features of Apache Spark:
  1. Speed
  2. Supports multiple languages
  3. Advanced Analytics
  4. Runs Everywhere

1.Speed:

  • Spark helps run an application in a Hadoop cluster up to 100 times faster in memory, and 10 times faster when running on disk. 
  • To achieve this processing speed, it stores intermediate processing data in memory.


2.Supports multiple languages:

  • Spark provides built-in APIs in Java, Scala, Python, and R.

3.Advanced Analytics:

  • Combine SQL, streaming, and complex analytics  
  • Spark supports not only 'map' and 'reduce' but also SQL queries, streaming data, machine learning (ML), and graph algorithms.

4.Runs Everywhere:

  • Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.

Components/Modules of Spark:
The following are the components/modules of Apache Spark:

  1. Spark core
  2. Spark SQL
  3. Spark Streaming
  4. MLlib (Machine Learning Library)
  5. GraphX

1)Spark core:
Spark Core provides in-memory computing and the ability to reference datasets in external storage systems.

2)Spark SQL:
Spark SQL is a component on top of Spark Core that introduces a new data abstraction called SchemaRDD.

Note:
Provides support for structured and semi-structured data.

3)Spark Streaming:
Spark Streaming runs on top of Spark Core and performs streaming analytics on small batches of data.

4)MLlib (Machine Learning Library):
MLlib is a distributed machine learning framework on top of Spark.

Note:
Spark MLlib is nine times as fast as the Hadoop disk-based version of Apache Mahout

5)GraphX:
GraphX is a distributed graph-processing framework on top of Spark

Spark Architecture Execution flow:
Spark Architecture Execution flow includes the following below

  • Driver Program
  • Cluster Manager
  • Worker nodes
  • Executor


Apache Spark - Cluster modes:
Below are the cluster modes in Apache Spark
  • Local mode
  • YARN
  • Mesos
  • Standalone


Thursday, 28 September 2017

7)Python if...elif...else Statements

Python if...elif...else Statements

The if…elif…else statement is used in Python for decision making.

i)if Statement:
Syntax:
if test expression:
    statement(s)
if Statement Flowchart:


Example:
x = 10
if x > 0:
    print(x,"is positive numner")

ii)if...else Statement
Syntax:
if test expression:
    Body of if
else:
    Body of else
if...else Statement Flowchart:

Example:
x = 10
if x > 0:
    print(x,"is positive numner")
else:
    print(x,"is either zero or negative number")

iii)if...elif...else statement:

Syntax:
if test expression:
    Body of if
elif test expression:
    Body of elif
else:
    Body of else
if...elif...else Flowchart:




Example:
x = 10
if x > 0:
    print(x,"is positive numner")
elif x < 0:
    print(x,"is negative number")
else:
    print(x,"is zero")






6)Python Operators

Python Operators
Operators are special symbols in Python that carry out computations on the values of operands.

Type of operators in Python:
  1. Arithmetic operators
  2. Comparison (Relational) operators
  3. Logical (Boolean) operators
  4. Bitwise operators
  5. Assignment operators
  6. Special operators

1.Arithmetic operators:
Arithmetic operators are used to perform mathematical operations like addition, subtraction, multiplication etc.

Arithmetic operators table:


Operator | Meaning | Example
+ | Add two operands or unary plus | x + y, +2
- | Subtract right operand from the left or unary minus | x - y, -2
* | Multiply two operands | x * y
/ | Divide left operand by the right one (always results in a float) | x / y
% | Modulus - remainder of the division of left operand by the right | x % y (remainder of x/y)
// | Floor division - division that rounds the result down to the nearest whole number | x // y
** | Exponent - left operand raised to the power of the right | x**y (x to the power y)


Examples:
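A quick sketch, with x = 15 and y = 4 as illustrative sample values:

x, y = 15, 4
print(x + y)   # 19
print(x - y)   # 11
print(x * y)   # 60
print(x / y)   # 3.75
print(x % y)   # 3
print(x // y)  # 3
print(x ** y)  # 50625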



2.Comparison operators:

Comparison operators are used to compare values. They return either True or False.

Comparison operators table:


Operator | Meaning | Example
> | Greater than - True if left operand is greater than the right | x > y
< | Less than - True if left operand is less than the right | x < y
== | Equal to - True if both operands are equal | x == y
!= | Not equal to - True if operands are not equal | x != y
>= | Greater than or equal to - True if left operand is greater than or equal to the right | x >= y
<= | Less than or equal to - True if left operand is less than or equal to the right | x <= y

Examples:
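A quick sketch with illustrative sample values:

x, y = 10, 12
print(x > y)   # False
print(x < y)   # True
print(x == y)  # False
print(x != y)  # True
print(x >= y)  # False
print(x <= y)  # True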



3.Logical operators:

Logical operators are the and, or, not operators.

Logical operators table:


Operator | Meaning | Example
and | True if both the operands are true | x and y
or | True if either of the operands is true | x or y
not | True if operand is false (complements the operand) | not x

Examples:
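A quick sketch:

x, y = True, False
print(x and y)  # False
print(x or y)   # True
print(not x)    # False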



4.Bitwise operators:

Bitwise operators act on operands as if they were strings of binary digits, operating bit by bit.

Bitwise operators table:


In the examples below, assume a = 60 (binary 0011 1100) and b = 13 (binary 0000 1101).

Operator | Description | Example
& (Binary AND) | Copies a bit to the result if it exists in both operands | (a & b) = 12 (means 0000 1100)
| (Binary OR) | Copies a bit if it exists in either operand | (a | b) = 61 (means 0011 1101)
^ (Binary XOR) | Copies the bit if it is set in one operand but not both | (a ^ b) = 49 (means 0011 0001)
~ (Binary Ones Complement) | Unary; has the effect of 'flipping' bits | (~a) = -61 (means 1100 0011 in 2's complement form, as a signed binary number)
<< (Binary Left Shift) | The left operand's value is moved left by the number of bits specified by the right operand | a << 2 = 240 (means 1111 0000)
>> (Binary Right Shift) | The left operand's value is moved right by the number of bits specified by the right operand | a >> 2 = 15 (means 0000 1111)


Examples:
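A quick sketch with the same values:

a, b = 60, 13      # 0011 1100, 0000 1101
print(a & b)   # 12
print(a | b)   # 61
print(a ^ b)   # 49
print(~a)      # -61
print(a << 2)  # 240
print(a >> 2)  # 15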





5.Assignment operators:
Assignment operators are used to assign values to variables.

Assignment operators table:


Operator | Example | Equivalent to
= | x = 5 | x = 5
+= | x += 5 | x = x + 5
-= | x -= 5 | x = x - 5
*= | x *= 5 | x = x * 5
/= | x /= 5 | x = x / 5
%= | x %= 5 | x = x % 5
//= | x //= 5 | x = x // 5
**= | x **= 5 | x = x ** 5
&= | x &= 5 | x = x & 5
|= | x |= 5 | x = x | 5
^= | x ^= 5 | x = x ^ 5
>>= | x >>= 5 | x = x >> 5
<<= | x <<= 5 | x = x << 5

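A quick sketch:

x = 5
x += 3    # x = x + 3 -> 8
x *= 2    # x = x * 2 -> 16
print(x)  # 16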


6.Special operators:
There are two types of special operators:

  1. Identity operators
  2. Membership operators

i)Identity operators:
Identity operators are used to check whether two values (or variables) are located in the same part of memory.

Identity operators Table:


Operator | Meaning | Example
is | True if the operands are identical (refer to the same object) | x is True
is not | True if the operands are not identical (do not refer to the same object) | x is not True
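A quick sketch; note that the first result relies on CPython caching small integers, an implementation detail:

x1, y1 = 5, 5
print(x1 is y1)   # True (both names refer to the same cached int object)
x2 = [1, 2, 3]
y2 = [1, 2, 3]
print(x2 is y2)   # False (equal values, but distinct list objects)
print(x2 == y2)   # True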

ii)Membership operators:
in and not in are the membership operators in Python. They are used to test whether a value or variable is found in a sequence (string, list, tuple, set and dictionary).

Membership operators Table:


Operator | Meaning | Example
in | True if value/variable is found in the sequence | 5 in x
not in | True if value/variable is not found in the sequence | 5 not in x

Example:
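A quick sketch:

x = [1, 2, 3, 4, 5]
print(5 in x)        # True
print(10 not in x)   # True
s = "Hello world"
print('H' in s)      # True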








