Big data refers to data sets whose size is beyond the ability of
commonly used software tools to capture, curate, manage, and process
within a tolerable elapsed time. The term covers collections of large and
complex data sets that are difficult to process using traditional database
management tools or traditional data processing applications. Big data is
commonly characterized by the "3 Vs": volume, velocity, and variety.
Data sets grow in size in part because they are increasingly
being gathered from many sources such as information-sensing mobile devices,
aerial sensory technologies (remote sensing), software logs, cameras,
microphones, radio-frequency identification readers, and wireless sensor
networks.
As the volume of collected data grows day by day, it becomes difficult
to work with using most relational database management systems and desktop
statistics and visualization packages, requiring instead "massively parallel
software running on tens, hundreds, or even thousands of servers". The
challenges include capture, curation, storage, search, sharing, transfer,
analysis, and visualization. An organization that gathers data at this scale
therefore needs to manage it with a distributed approach.
Distributed System in Big Data Technology
A distributed system is a collection of independent computers that appears to
its users as a single coherent system. A distributed system is one in which
components located at networked computers communicate and coordinate their
actions only by passing messages.
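To make "coordinate only by passing messages" concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not a real distributed system: a thread stands in for a remote computer, in-memory queues stand in for the network, and the names `worker` and `count_words` are made up for this example.

```python
import queue
import threading

def worker(inbox, outbox):
    # The worker shares no variables with the caller. Everything it knows
    # arrives as a message on its inbox, and everything it reports leaves
    # as a message on its outbox.
    while True:
        message = inbox.get()
        if message is None:  # None is our (assumed) "shut down" message
            break
        outbox.put((message, len(message.split())))

def count_words(documents):
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for doc in documents:
        inbox.put(doc)   # send one request message per document
    inbox.put(None)      # ask the worker to stop
    thread.join()
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results

print(count_words(["big data", "distributed systems pass messages"]))
```

On real hardware the queues would be replaced by network connections, but the shape stays the same: requests go out as messages and results come back as messages.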
Distributed systems play an important role in managing the big
data problems that prevail today. In the distributed approach, data
are placed on multiple machines and are made available to the user as if they
were on a single system. A distributed system makes good use of hardware and
other resources spread across multiple locations and machines.
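The idea of placing data on several machines while showing the user a single system can be sketched as follows. This is a toy illustration, not how any particular product works: each "machine" is simulated by a Python dictionary, and the class name `ShardedStore` and the node names are hypothetical. Choosing a node by hashing the key is one common technique, assumed here for simplicity.

```python
import hashlib

class ShardedStore:
    """Toy key-value store that spreads keys over several simulated nodes."""

    def __init__(self, node_names):
        # Each node is a plain dict here; in practice it would be a
        # separate machine reached over the network.
        self.names = sorted(node_names)
        self.nodes = {name: {} for name in self.names}

    def _node_for(self, key):
        # A stable hash ensures the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.names[int(digest, 16) % len(self.names)]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(100):
    store.put(f"doc-{i}", f"contents of document {i}")

# The caller never needs to know which node holds a document.
print(store.get("doc-42"))
print({name: len(data) for name, data in store.nodes.items()})
```

The user only calls `put` and `get`, and the store decides where each item lives. One known limitation of this simple modulo scheme is that adding or removing a node remaps most keys.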
Example: How Google uses distributed systems to manage data for search engines
Because a large amount of data accumulates on the web every
day, it is difficult to manage all documents on a centralized server. To
overcome this big data problem, search engine companies like Google use
distributed servers. In a distributed search engine there is no central server.
Unlike in traditional centralized search engines, work such as
crawling, data mining, indexing, and query processing is distributed among
several peers in a decentralized manner, with no single point of
control. Several distributed servers are set up in different locations.
Information is served to the user from nearby servers. Mirror
servers perform different types of caching operations as required.
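The division of indexing and query processing among peers can be sketched in a few lines of Python. This is only an illustration of the general idea and makes no claim about how Google's systems are actually built: the `Peer` class, the helper functions, and the sample documents are all invented for this example, and the peers live in one process instead of on separate servers.

```python
from collections import defaultdict

class Peer:
    """One simulated server holding an inverted index for its own documents."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> ids of documents containing it

    def add(self, doc_id, text):
        for word in text.lower().split():
            self.index[word].add(doc_id)

    def search(self, word):
        return set(self.index.get(word.lower(), set()))

def build_peers(docs, n_peers):
    # Indexing is divided up: each document is assigned to exactly one peer.
    peers = [Peer() for _ in range(n_peers)]
    for doc_id, text in docs.items():
        peers[doc_id % n_peers].add(doc_id, text)
    return peers

def search_all(peers, word):
    # Query processing is divided too: every peer searches its own part
    # of the index, and the partial results are merged.
    hits = set()
    for peer in peers:
        hits |= peer.search(word)
    return sorted(hits)

docs = {
    0: "big data needs distributed systems",
    1: "search engines index the web",
    2: "distributed search engines split the index",
    3: "mirror servers cache popular pages",
}
peers = build_peers(docs, n_peers=2)
print(search_all(peers, "distributed"))
print(search_all(peers, "index"))
```

No single peer holds the whole index, yet a query still sees every document because the results from all peers are combined.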