Data pours out of the LHC detectors at a blistering rate. Even after filtering out 99% of it, in 2017 we're expecting to gather around 50 petabytes of data. That's 50 million gigabytes, equivalent to nearly 15 million high-definition (HD) movies.
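As a rough check on those figures, the conversion can be worked through in a few lines of Python. This is only a sketch: it assumes decimal (SI) units, where 1 petabyte is one million gigabytes, and the per-movie size is simply what the article's two numbers imply, not a figure quoted by CERN.

```python
# Assumption: decimal (SI) units, 1 PB = 10**6 GB.
GB_PER_PB = 1_000_000

def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * GB_PER_PB

total_gb = petabytes_to_gigabytes(50)   # expected 2017 data volume
hd_movies = 15_000_000                  # "nearly 15 million" HD movies
gb_per_movie = total_gb / hd_movies     # movie size implied by the comparison

print(total_gb)                 # 50000000
print(round(gb_per_movie, 2))   # 3.33
```

In other words, the comparison implies an HD movie of a little over 3 gigabytes.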

The scale and complexity of data from the LHC are unprecedented. This data needs to be stored, easily retrieved and analysed by physicists all over the world. This requires massive storage facilities, global networking, immense computing power, and, of course, funding.

CERN does not have the computing or financial resources to crunch all of the data on site, so in 2002 it turned to grid computing to share the burden with computer centres around the world.

The result, the Worldwide LHC Computing Grid (WLCG), is a distributed computing infrastructure arranged in tiers – giving a community of over 10,000 physicists near real-time access to LHC data. The WLCG builds on the ideas of grid technology initially proposed in 1999 by Ian Foster and Carl Kesselman.

Using the Grid

With more than 10,000 LHC physicists across the four main experiments – ALICE, ATLAS, CMS and LHCb – actively accessing and analysing data in near real-time, the computing system designed to handle the data has to be very flexible.

WLCG provides seamless access to computing resources, including data storage capacity, processing power, sensors, visualization tools and more. Users make job requests from one of the many entry points into the system. A job entails the processing of a requested set of data, using software provided by the experiments.

The computing Grid establishes the identity of the user, checks their credentials, and searches for available sites that can provide the resources requested. Users do not have to worry about where the computing resources are coming from – they can tap into the Grid's computing power and access storage on demand.
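The steps described above (identify the user, check credentials, find sites with the requested resources) can be sketched in Python. This is a toy illustration, not the WLCG middleware: the `Site` and `Job` classes, the `find_sites` function, the site names and the rule that a site must already hold the requested dataset are all hypothetical and introduced only for this example.

```python
from dataclasses import dataclass

@dataclass
class Site:
    """A hypothetical computing site and the resources it currently has free."""
    name: str
    free_cpu_cores: int
    free_storage_tb: float
    datasets: frozenset

@dataclass
class Job:
    """A hypothetical job request: who is asking, for which data, with what resources."""
    user: str
    dataset: str
    cpu_cores: int
    storage_tb: float

def find_sites(job, sites, authorised_users):
    """Check the user's credentials, then return the names of sites that can run the job."""
    # Step 1: establish identity and check credentials (here, a simple allow-list).
    if job.user not in authorised_users:
        raise PermissionError(f"user {job.user!r} is not authorised")
    # Step 2: search for sites that can provide the requested resources.
    # Assumption: a site qualifies only if it also holds the requested dataset.
    return [
        site.name
        for site in sites
        if site.free_cpu_cores >= job.cpu_cores
        and site.free_storage_tb >= job.storage_tb
        and job.dataset in site.datasets
    ]

sites = [
    Site("site-a", 200, 50.0, frozenset({"run2-sample"})),
    Site("site-b", 8, 500.0, frozenset({"run2-sample"})),     # too few free cores
    Site("site-c", 1000, 100.0, frozenset({"other-sample"})),  # lacks the dataset
]
job = Job(user="alice", dataset="run2-sample", cpu_cores=64, storage_tb=10.0)

matches = find_sites(job, sites, authorised_users={"alice"})
print(matches)  # ['site-a']
```

The user only describes what the job needs; choosing where it runs is left to the matching step, which is the point of the paragraph above.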

LHC Season 2

The LHC is now running at the new energy frontier of 13 teraelectronvolts (TeV) - nearly double the energy of collisions in the LHC's first three-year run. These collisions, which occur up to 40 million times every second, send showers of particles through the detectors.

With every second of run-time, gigabytes of data will come pouring into the Tier0 - CERN Data Centre - to be stored, sorted and shared with physicists worldwide. During Run 1, CERN was storing 1 gigabyte per second, with the occasional peak of 6 gigabytes per second. For Run 2, what was once the "peak" is now considered average, and CERN could even go beyond 10 gigabytes per second if needed.
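To give a feel for these rates, the short Python sketch below converts them into daily volumes. It assumes, purely for illustration, that each rate is sustained for a full 24 hours and that 1 terabyte is 1,000 gigabytes; real running is not continuous, so these are upper bounds rather than actual storage figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def terabytes_per_day(gigabytes_per_second):
    """Daily volume in terabytes if the given rate were sustained for 24 hours."""
    return gigabytes_per_second * SECONDS_PER_DAY / 1000  # 1 TB = 1000 GB (decimal)

rates = [
    ("Run 1 typical", 1),
    ("Run 1 peak / Run 2 average", 6),
    ("Run 2 possible", 10),
]
for label, rate in rates:
    print(f"{label}: {rate} GB/s is about {terabytes_per_day(rate):.1f} TB/day")
# Run 1 typical: 1 GB/s is about 86.4 TB/day
# Run 1 peak / Run 2 average: 6 GB/s is about 518.4 TB/day
# Run 2 possible: 10 GB/s is about 864.0 TB/day
```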

This of course translates into far greater amounts of data sent out across the Worldwide LHC Computing Grid, to be made available to physicists across the globe. The computing requirements of ALICE, ATLAS, CMS and LHCb are evolving and increasing in conjunction with the experiments’ physics programmes and the improved precision of the detectors’ measurements.

See the article "CERN Computing ready for data torrent" for what CERN has been doing to prepare its computing for LHC Season 2.

Future challenges and requirements are the result of great successes. Grid performance has been excellent, and the experiments have not only recorded data well but have also found that their detectors could do even more. The experiment collaborations now want to capitalize on this potential. With such a wealth of data, they can be thankful for a worldwide computing grid that shows global collaboration at its best.