Petabyte Age

Everything in computing needs storage – blogs, instant messaging, social networking and personal documents all reside either on our own computers or on someone else's, e.g. Gmail in the case of email. As the amount of data available increases, storage requirements and the units used to measure them grow as well.

Units of storage in computing start with the byte (8 bits). A little more than a thousand bytes (1024 bytes, to be exact) make a Kilobyte (KB), 1024 KB make a Megabyte (MB), and 1024 MB make a Gigabyte (GB), the most common unit of storage today. The same multiplication by 1024 goes on to define the Terabyte and then the Petabyte.
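As a quick illustration, here is a minimal Python sketch of this multiply-by-1024 rule, printing the size of each unit in bytes (the unit names are those listed above):

```python
# Each unit is 1024 times the previous one, starting from the byte.
units = ["KB", "MB", "GB", "TB", "PB"]

size = 1
for power, name in enumerate(units, start=1):
    size *= 1024                      # multiply by 1024 at each step
    print(f"1 {name} = 1024^{power} bytes = {size:,} bytes")

# The last line printed shows that
# 1 PB = 1024^5 bytes = 1,125,899,906,842,624 bytes (roughly 10^15 bytes).
```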

The Petabyte is considered a milestone in the scientific approach – to the extent that our time is sometimes called the Petabyte Age. What sets this huge amount of data apart from the previously available limited data is the prediction that in the Petabyte Age, scientific researchers would no longer need to create hypotheses and models and then test whether they are correct.

For example, instead of hypothesizing that a certain age group is more susceptible to health risks, or that a certain geographical area is likely to be hit by riots or political uncertainty for a certain reason, and then testing this against some data, advanced data mining could be used. Mining over petabytes of data would allow crunching a virtually unlimited inflow of information – for instance, scanning news items from around the world to pinpoint trouble areas along with trends and issues of 'high importance or severity' – without the need to identify their underlying causes. This type of 'geo-tagging' has already started in the form of projects such as Google Zeitgeist and the Europe Media Monitor (EMM). In the Petabyte Age, therefore, the age-old scientific method of hypothesize, model, test is poised to be replaced by what huge amounts of data tell us. In short, inferences drawn from huge data collected from around the world would need no explanatory models, as the numbers would speak for themselves – rapid monitoring of epidemics, prediction of wars, voting patterns and so on. In his article titled 'The End of Theory', Chris Anderson, editor-in-chief of Wired, writes: 'Science can advance even without coherent models, unified theories, or really any mechanistic explanation at all'. More radical views have even called the Petabyte Age the End of Science, while others have dismissed it as too futuristic.

Units have already been defined beyond the Petabyte – these include the Exabyte, Zettabyte, Yottabyte and Brontobyte, each obtained by multiplying the previous unit by 1024, starting from the Petabyte. But only time will tell whether the Petabyte Age, with its ability to process zillions of data points and to aggregate information from numerous sources and sensors using processing clouds, will change science or not.
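The earlier sketch extends naturally past the Petabyte; again a minimal Python illustration of the repeated multiplication by 1024, using the unit names listed above:

```python
# Units beyond the Petabyte, each 1024 times the previous one.
beyond_petabyte = ["Exabyte", "Zettabyte", "Yottabyte", "Brontobyte"]

size = 1024 ** 5                      # one Petabyte, in bytes
for power, name in enumerate(beyond_petabyte, start=6):
    size *= 1024
    print(f"1 {name} = 1024^{power} bytes = {size:,} bytes")
```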