Monday, August 07, 2006

 

MapReduce

I’ve been familiar with the concept of MapReduce for some time (a less generic form of it was the basis of a paper I wrote circa 1988, “Sorting with near-linear speedup on tightly-coupled multi-processors”), although I’ve never used a functional programming language in anger. I’ve just finished reading an excellent research paper by Jeffrey Dean and Sanjay Ghemawat (both at Google) titled “MapReduce: Simplified Data Processing on Large Clusters”. MapReduce is a programming paradigm and an associated implementation for processing large datasets. The key point is that “Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.”
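
To make the idea a little more concrete, here is a minimal single-process sketch of the map and reduce steps, using the word-count problem that is often used to introduce the paradigm. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are my own illustrative choices and not Google's API, and nothing here is distributed or fault-tolerant; a real implementation would run the map and reduce calls across many machines and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: combine all the counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    # Each document is mapped independently, which is what makes
    # the map phase easy to spread over many machines.
    intermediate = chain.from_iterable(map(map_words, documents))
    grouped = shuffle(intermediate)
    # Each key is reduced independently, so this phase parallelizes too.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
```

Because `map_words` and `reduce_counts` have no side effects and share no state, the calls can in principle be executed in any order or in parallel, which is the property the quoted sentence relies on.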

This is a very accessible paper regardless of your computing background, and it is well worth reading, if only to get a glimpse of how the Google distributed indexing engine performs its work.


    
