Data pours out of the LHC detectors at a blistering rate. Even after filtering out 99% of it, in 2018 we're expecting to gather around 50 to 70 petabytes of data. That's 50-70 million gigabytes, the equivalent to around 15-20 million high-definition (HD) movies.

The scale and complexity of data from the LHC is unprecedented. This data needs to be stored, easily retrieved and analysed by physicists all over the world. This requires massive storage facilities, global networking, immense computing power, and, of course, funding.

CERN does not have the computing or financial resources to crunch all of the data on site, so in 2002 it turned to grid computing to share the burden with computer centres around the world. 

The result, the Worldwide LHC Computing Grid (WLCG), is a distributed computing infrastructure arranged in tiers – giving a community of over 10,000 physicists near real-time access to LHC data. The WLCG builds on the ideas of grid technology initially proposed in 1999 by Ian Foster and Carl Kesselman.

CERN currently provides around 20% of the global computing resources.

Using the Grid

With more than 10,000 LHC physicists across the four main experiments – ALICEATLASCMS and LHCb – actively accessing and analysing data in near real-time, the computing system designed to handle the data has to be very flexible.

WLCG provides seamless access to computing resources which include data storage capacity, processing power, sensors, visualization tools and more. Users make job requests from one of the many entry points into the system. A job will entail the processing of a requested set of data, using software provided by the experiments

The computing Grid establishes the identity of the user, checks their credentials, and searches for available sites that can provide the resources requested. Users do not have to worry about where the computing resources are coming from – they can tap into the Grid's computing power and access storage on demand.

LHC Season 2

The LHC is in its last year of "Run 2", and is running at the new energy frontier of 13 teraelectonvolts (TeV) - nearly double the energy of collisions in the LHC's first three-year run. These collisions, which occurs up to 40 million times every second, sends showers of particles through the detectors.

With every second of run-time, gigabytes of data will come pouring into the Tier0 - CERN Data Centre - to be stored, sorted and shared with physicists worldwide. During Run 1, CERN was storing 1 gigabyte-per-second (Gb/s), with the occasional peak of 6 gigabytes-per-second. For Run 2, CERN is now seeing global transfer rates of over 35 Gb/s.

This of course translates into masses amounts more of data sent out across the Worlwide LHC Computing Grid, to be made available to physicists across the globe. The computing requirements of ALICE, ATLAS, CMS and LHCb are evolving and increasing in conjunction with the experiments’ physics programmes and the improved precision of the detectors’ measurements. 

See the article on what CERN has been doing to prepare its computing for LHC Season 2 here in "CERN Computing ready for data torrent"

The Future

With the LHC’s computing now well on track with Run 2 needs, the WLCG collaboration is looking further into the future, already focusing on the two phases of upgrades planned for the LHC. The first phase (2019–2020) will see major upgrades of ALICE and LHCb, as well as increased luminosity of the LHC. The second phase – the High Luminosity LHC project (HL-LHC), in 2024–2025 – will upgrade the LHC to a much higher luminosity and increase the precision of the substantially improved ATLAS and CMS detectors.

The requirements for data and computing will grow dramatically during this time, with rates of 500 PB/year expected for the HL-LHC. The needs for processing are expected to increase more than 10 times over and above what technology evolution will provide. As a consequence, partnerships such as those with CERN openlab and other programmes of R&D are essential to investigate how the computing models could evolve to address these needs. They will focus on applying more intelligence into filtering and selecting data as early as possible. Investigating the distributed infrastructure itself (the grid) and how one can best make use of available technologies and opportunistic resources (grid, cloud, HPC, volunteer, etc), improving software performance to optimise the overall system.

See more at http://cerncourier.com/cws/article/cern/65031