
Google's BigTable

POSTED 29th June 09, by Mike Griffiths.

Google is a search engine that many of us take for granted. It launched ten years ago with a simple idea: to save a copy of the Internet. The search engine works by 'crawling' web sites, which simply means that it visits a web site and then follows every link it finds there. Using this method, most web sites on the Internet can be discovered, provided the crawler starts from a substantial initial list of sites to visit.
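The crawling idea can be sketched as a breadth-first walk over links. This is only a minimal illustration and not how Google's crawler is actually built: the `LINKS` dictionary is a made-up stand-in for the web, and `crawl` is a hypothetical name.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the links found on it.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
    "e.example": ["a.example"],  # no page links here
}

def crawl(seeds):
    """Visit every page reachable from the seed list, breadth-first."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in LINKS.get(page, []):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return order

print(crawl(["a.example"]))
# ['a.example', 'b.example', 'c.example', 'd.example']
```

Note that `e.example` is never found because nothing links to it, which is why a crawler needs a large starting list and why only "most" sites can be discovered this way.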

There are a lot of impressive things about Google that most people wouldn't even think about. The average database that we build at RNM holds a few thousand rows of data. Once you get into hundreds of thousands of rows, a database may start to become sluggish, especially when searching. When has Google ever been sluggish? And they don't have a few thousand entries; they have tens of billions.

Anyone who has done GCSE or A-level IT/Computing will have been taught that there are two types of database: flat-file and relational. Anyone worth their salt will avoid a flat-file database because of its numerous inherent problems. A relational database stores data in a practical way and minimises empty fields and repeated data.

Google being Google, they decided they needed something new. Google already run their own operating system on all of their servers, along with a desktop version inside their offices. When faced with the problem of giant-sized databases, they decided to create their own database storage engine as well.

Known as BigTable, the engine is very complicated but benefits from lightning-fast speed and reliability. It is designed as a distributed system that scales across many machines and stores massive amounts of data, measured in petabytes. According to Kevin Kelly, writing in the New York Times, "the entire works of humankind, from the beginning of recorded history, in all languages" would amount to 50 petabytes of data. By comparison, Google processes about 20 petabytes of data every day. A petabyte is roughly one million gigabytes; the hard drive in your computer probably holds around 80 gigabytes.
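To put those figures side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above and assumes decimal units (1 petabyte = 1,000,000 gigabytes); the variable names are just for illustration.

```python
GB_PER_PB = 1000 ** 2    # decimal units: 1 PB = 1,000,000 GB
GIB_PER_PIB = 1024 ** 2  # binary units: 1 PiB = 1,048,576 GiB

drive_gb = 80            # the example hard drive mentioned above
daily_pb = 20            # data Google processes per day, as quoted above
archive_pb = 50          # Kelly's estimate for all the works of humankind

# How many 80 GB drives it takes to hold a single petabyte.
drives_per_pb = GB_PER_PB / drive_gb

# How many days of processing add up to the size of the whole archive.
days_to_match_archive = archive_pb / daily_pb

print(drives_per_pb)          # 12500.0
print(days_to_match_archive)  # 2.5
```

In other words, one petabyte is about 12,500 of those hard drives, and Google works through the equivalent of everything humankind has ever produced every two and a half days.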

Like all Google servers, the machines that BigTable runs on aren't super-expensive supercomputers; they are just lots of ordinary commodity computers.

BigTable stores data for lots of Google services, including its indexing service (the search), satellite imagery for Google Earth, Google Finance, Reader, Blogger and YouTube.

Unlike traditional databases, which have a fixed set of columns and an unbounded number of rows, BigTable is described as a sparse, distributed, sorted map: each value is looked up by a row key, a column key and a timestamp. It has characteristics of both row-oriented and column-oriented databases.
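The "sorted map" idea is easier to see in code. The sketch below is a toy, in-memory illustration only, not BigTable's real interface: the class name `SortedTable`, its methods and the example row and column names are all made up for this post. It keeps row keys in sorted order and stores each value under a row key, a column key and a timestamp.

```python
import bisect

class SortedTable:
    """Toy sorted map: (row, column, timestamp) -> value, ordered by row key."""

    def __init__(self):
        self._rows = []   # row keys, always kept in sorted order
        self._cells = {}  # row -> {column -> {timestamp -> value}}

    def put(self, row, column, timestamp, value):
        if row not in self._cells:
            bisect.insort(self._rows, row)  # insert while preserving order
            self._cells[row] = {}
        self._cells[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None if absent."""
        versions = self._cells.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, end):
        """Return the row keys in the range [start, end), in sorted order."""
        lo = bisect.bisect_left(self._rows, start)
        hi = bisect.bisect_left(self._rows, end)
        return self._rows[lo:hi]

table = SortedTable()
table.put("com.example.www", "contents:", 1, "<html>v1</html>")
table.put("com.example.www", "contents:", 2, "<html>v2</html>")
table.put("com.example.blog", "anchor:home", 1, "Blog")
table.put("org.example", "contents:", 1, "<html>org</html>")

print(table.get("com.example.www", "contents:"))  # <html>v2</html>
print(table.scan("com.", "com/"))
# ['com.example.blog', 'com.example.www']
```

Two things stand out compared with a relational table: rows do not need to share the same columns, and because the row keys are sorted, related rows sit next to each other and can be read as a range.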

BigTable itself runs on the Google File System (GFS). Each table is split into pieces called tablets, each roughly 100 to 200 megabytes in size, and BigTable stores the locations of these tablets in a special table within itself. Normally this would be very unreliable and slow, but because BigTable has been deliberately designed this way, Google have been able to overcome a lot of the obstacles associated with it.
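Splitting a sorted table into pieces (the BigTable paper calls them tablets) and keeping a small index of where each piece ends can be sketched as follows. This is a simplified illustration under loud assumptions: the function names are hypothetical, the split is by row count rather than by size in megabytes, and everything lives in one Python list instead of on many machines.

```python
import bisect

def split_into_tablets(sorted_rows, max_rows):
    """Cut a sorted list of row keys into contiguous ranges ("tablets")."""
    return [sorted_rows[i:i + max_rows]
            for i in range(0, len(sorted_rows), max_rows)]

def build_index(tablets):
    """Record the last row key of each tablet, like a tiny metadata table."""
    return [tablet[-1] for tablet in tablets]

def locate(index, row):
    """Return the position of the tablet whose range would hold `row`."""
    pos = bisect.bisect_left(index, row)
    return min(pos, len(index) - 1)  # keys past the end go to the last tablet

rows = ["a", "b", "c", "d", "e", "f", "g"]
tablets = split_into_tablets(rows, 3)
index = build_index(tablets)

print(tablets)             # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
print(index)               # ['c', 'f', 'g']
print(locate(index, "e"))  # 1
```

Because the index is itself just sorted data, it can be stored in the same kind of table as everything else, which is the sense in which BigTable keeps track of its own pieces "within itself".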

There have been some implementations of the BigTable design outside Google, notably an open-source project called Hypertable.

You can learn more about BigTable by reading Google's white paper.

TAGS: BIGTABLE, GOOGLE
