Big data is a relative term. If big data refers to the “volume” of transactions and transaction history, then hundreds of terabytes may count as “big data” for a company, while petabytes of logs may count as small for a government agency.
So what is Big Data?
Big data refers to the massive datasets that are collected from a variety of data sources to serve business needs and reveal new insights for better decision making. According to IBM sources, e-business and consumer life create 2.5 exabytes of data per day. Big Data Analytics looks for hidden correlations and patterns in that data.
Google stores its data on millions of servers around the world. Every day, around 10 million text messages are sent via WhatsApp. Facebook has millions of active accounts whose users share content, photos and videos with friends. These data mostly relate to human behavior and interactions, and Information Technology analysts store and analyze them to extract information that is useful for running the business.
Big Data Analytics is the result of three major trends in computing:
1. Mobile Computing
2. Social Networking
3. Cloud Computing
Mobile computing uses hand-held devices such as mobile phones and tablets; social networking runs on platforms such as Facebook and Pinterest; and cloud computing lets one rent the hardware needed for storage and computation.
Big Data Characteristics: –
Big data has five characteristics:
1. Volume: It is the amount of data generated every second.
2. Velocity: It is the speed at which data is generated and processed to meet requirements.
3. Variety: It covers the various types (text, image, audio, video, etc.) and nature of data.
4. Veracity: It is the accuracy of the information.
5. Variability: Inconsistency in the data set makes the data difficult to handle and manage.
Types of Big Data: –
1. Operational data: Datasets whose rows correspond to transactions in a table are known as operational data. These are structured.
2. Semi-structured data: XML data is semi-structured. It does not have fixed fields but uses tags to separate data elements.
3. Unstructured data: Information found in emails, audio, video, images, logs, blogs, forums, social networking sites, click streams, sensors, statistical data centers and mobile phone applications is unstructured. These data are required for making business decisions in organizations.
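The difference between the three types is easier to see side by side. The following is a minimal Python sketch using only the standard library; the order records, field names and sample sentence are made up for illustration and do not come from any real system.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured (operational) data: every row has the same fixed fields.
structured = io.StringIO("order_id,customer,amount\n1,alice,30.5\n2,bob,12.0\n")
rows = list(csv.DictReader(structured))

# Semi-structured data: tags separate the values, but the set of
# fields can differ from one record to the next.
semi_structured = """
<orders>
  <order id="1"><customer>alice</customer><amount>30.5</amount></order>
  <order id="2"><customer>bob</customer><note>gift wrap</note></order>
</orders>
"""
orders = [
    {"id": order.get("id"), **{child.tag: child.text for child in order}}
    for order in ET.fromstring(semi_structured)
]

# Unstructured data: free text with no fields at all; any structure
# has to be inferred by the analysis itself.
unstructured = "Bob emailed to say the parcel arrived late but the gift wrap was nice."
word_count = len(unstructured.split())

print(rows[0])
print(orders[1])
print(word_count)
```

Note that the second XML order has a `note` but no `amount`, which a fixed-column table could not express without leaving cells empty.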
Technologies Available for Big Data: –
Several newer technologies address big data:
1. Hadoop: It manages the storage and processing of big data.
2. Hadoop database (HBase): It is a non-relational database. It is a backend system to store data in column-oriented tables.
3. Mahout: It is a data mining library.
4. Hive: It is a data warehouse platform built on top of Hadoop.
5. NoSQL: It is a family of non-relational databases used for storing big data.
6. Hadoop Distributed File System (HDFS): It manages the storage and retrieval of data.
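Hadoop's processing side is commonly associated with the MapReduce programming model. As a rough illustration of that model only, here is a word count written in plain Python. It does not use any Hadoop API, it runs on a single machine, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on many machines, each working on the part of the data stored nearby; the sketch only shows how the work is split into independent steps.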
Advantages of “Big Data” Analytics: –
Big Data Analytics is advantageous in the following ways when compared to the traditional analytical model:
1. Scalability: Data can be added with little administration.
2. Support for unstructured data: Text, images, audio and videos can all be included.
3. There is no limit on how much data can be stored or for how long.
4. No pre-processing is required to store data.
5. Protection against hardware failure: Multiple copies of data are automatically replicated in the database.
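To see why keeping multiple copies protects against hardware failure, consider the following simplified Python sketch. It assumes three copies per block and a naive round-robin placement; the node names, block names and both functions are hypothetical, and real systems use more sophisticated placement policies.

```python
def place_replicas(block_ids, nodes, replication_factor=3):
    """Assign each block to `replication_factor` distinct nodes, round-robin."""
    if replication_factor > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [
            nodes[(i + r) % len(nodes)] for r in range(replication_factor)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if set(holders) - failed}


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(["blk-1", "blk-2", "blk-3"], nodes)

# Two of the four nodes fail at the same time.
survivors = readable_blocks(placement, failed_nodes={"node-a", "node-b"})
print(sorted(survivors))
```

With three copies spread over four nodes, every block remains readable even after two nodes are lost, because at least one copy always sits on a node that is still running.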