Processing extremely large datasets quickly exhausts the resources of most data centres owned and administered by a single institution. Unwilling or unable to pay for an expensive, proprietary supercomputer, many institutions opt instead for clusters of servers networked together into a distributed pool of computing resources, on which parallel programs run effectively. These servers typically run Linux and other open-source software: the Open Grid Engine job scheduler, for instance, accepts a program, schedules it, and allocates resources such as CPUs, disks, and software licences, all while hiding the complexity of the system from the user. Such a cluster need not be maintained by a single institution; indeed, some of the largest and most successful “home-grown” supercomputers are shared by several institutions. An example is the Open Science Grid project, “a multi-disciplinary partnership to federate local, regional, community and national cyberinfrastructures to meet the needs of research and academic communities at all scales.”

Such partnerships not only save on costs, bringing the power of supercomputing to those who could not afford it on their own (and thereby democratizing the practice of science); their common platform and toolsets also help the parties involved share data with one another more easily.

