August 1, 2016

The commands listed below are commonly used in the Hadoop command-line interface.

Hadoop Listing Files

Hadoop Upload/Download Files

Hadoop File Management

Hadoop Reading and Writing Files

Hadoop Namenode Commands

Hadoop fsck Commands

Hadoop Job Commands

Hadoop dfsadmin Comman...

July 22, 2016

It’s all about Apache Sqoop…

Sqoop provides a mechanism to connect external systems, such as an EDW (Enterprise Data Warehouse, for example Amazon Redshift) or relational database management systems like MySQL, Oracle, and MS SQL Server, and to transfer data between the Hadoop system and t...

July 19, 2016

Installing Cloudera Manager 5.4.1 in VirtualBox/Linux/CentOS

A step-by-step guide to a clean installation of Cloudera Manager in VirtualBox

Step 1 - OS Installation: 

This is a guide to installing Cloudera Manager 5.4.1 in Oracle VirtualBox. The first and foremost...

July 14, 2016

Explaining MapReduce with an example...

MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. In this article, we will see how MapReduce works in the Hadoop e...
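As a rough illustration of the programming model described in this teaser, here is a minimal single-process sketch in Python. It does not use Hadoop or any Hadoop API. Word count is assumed as the example because the article's own example is not visible here, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map and reduce steps would run in parallel on different nodes, and the framework would handle the grouping between them. This sketch only shows how data flows through the three stages.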

July 13, 2016

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. Multiple DataNodes act as slaves.

NameNode: It is the master node that controls the whole...

July 13, 2016

Overview of Hadoop Cluster

A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment. 

Hadoop Cluster Operational Process: Divid...

July 13, 2016

Big data projects are complex and require innovative solutions. Traditional data warehousing, processing, and ETL approaches, in and of themselves, are not effective answers to the increasing volumes and complexity of the data being generated. The increasing complexity...

June 25, 2016

In today's constantly evolving world of big data, it is difficult to know what the best solution is for effectively processing and storing large amounts of data. The continuously changing datascape makes implementing successful big data solutions difficult, resulting in...

June 22, 2016

An enterprise data hub uses a Hadoop platform as a central repository. This is the next step in the evolution of data warehouses: instead of the traditional model, in which data is extracted, transformed, and loaded from one repository to another, it is transformed i...

June 15, 2016

IT project management is a complex process that helps teams achieve project goals through project planning, facilitating team collaboration, problem solving, budgeting, allocating resources and risk mitigation. In order to plan a project, the project manager must first...
