At present, much of this development has happened in specific collaborations. Even when this is not possible, experience with one type of problem will provide insights into how to approach other types of problem.
- A Roadmap for HEP Software and Computing R&D for the 2020s.
The following programme of work is foreseen. Particle identification and particle properties: in calorimeters or Time Projection Chambers (TPCs), where the data can be represented as a 2D or 3D image, or even in 4D including timing information, the problems can be cast as a computer vision task. Deep Learning (DL), a class of ML algorithm in which neural networks are used to reconstruct images from pixel intensities, is a good candidate for identifying particles and extracting many parameters. Promising DL architectures for these tasks include convolutional, recurrent, and adversarial neural networks.
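The computer-vision framing above can be illustrated with the basic building block of a convolutional network. This is a minimal sketch in plain NumPy, using a toy 8×8 "calorimeter image" and a hand-written averaging filter; in a real CNN the filters would be learned from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 8x8 "calorimeter image": a bright 2x2 energy deposit on a quiet background.
image = np.zeros((8, 8))
image[3:5, 3:5] = 1.0

# A 3x3 blob-detection kernel; a trained CNN would learn such filters from data.
kernel = np.ones((3, 3)) / 9.0

# Convolution followed by a ReLU non-linearity, as in one CNN layer.
feature_map = np.maximum(conv2d(image, kernel), 0.0)

# The response peaks where the filter overlaps the energy deposit.
peak = np.unravel_index(np.argmax(feature_map), feature_map.shape)
```

The peak of the feature map localises the energy deposit; stacking many such learned filters, with pooling between layers, is what gives CNNs their power on image-like detector data.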
A proof of concept and comparison of DL architectures should be finalised. Particle identification can also be explored to tag the flavour of jets in collider experiments. The investigation of these concepts, which connect to Natural Language Processing [41], has started at the LHC and is to be pursued on the same timescale. A desirable data format for ML applications should have the following attributes: high read/write speed for efficient training, sparse readability without loading the entire dataset into RAM, compressibility, and widespread adoption by the ML community.
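The "sparse readability" attribute above can be demonstrated with memory-mapped files, one common mechanism for slicing mini-batches out of a large on-disk dataset without loading it all into RAM. This is an illustrative sketch using NumPy's `.npy` format and hypothetical file names, not a statement about any specific HEP format.

```python
import os
import tempfile
import numpy as np

# Hypothetical column of one million float32 values, written once to disk.
path = os.path.join(tempfile.mkdtemp(), "energies.npy")
np.save(path, np.arange(1_000_000, dtype=np.float32))

# Memory-map the file: training code can slice mini-batches on demand
# without ever materialising the full dataset in RAM.
col = np.load(path, mmap_mode="r")
batch = np.asarray(col[500_000:500_256])  # reads only the requested pages
```

Formats popular in the ML community (HDF5, TFRecord, Parquet) offer the same property via chunked, independently readable blocks, which is what makes bridging HEP formats to them attractive.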
The thorough evaluation of the different data formats and their impact on ML performance in the HEP context must be continued, and it is necessary to define a strategy for bridging or migrating HEP formats to the chosen ML format(s), or vice versa. Computing resource optimisations: managing large-volume data transfers is one of the challenges facing current computing facilities. Networks play a crucial role in data exchange, so a network-aware application layer may significantly improve experiment operations.
ML is a promising technology for identifying anomalies in network traffic, predicting and preventing network congestion, detecting bugs via analysis of self-learning networks, and optimising WAN paths based on user access patterns.
ML as a service (MLaaS): current cloud providers rely on an MLaaS model exploiting interactive machine-learning tools to make efficient use of resources; however, this is not yet widely used in HEP. To use these tools more efficiently, sufficient and appropriately tailored hardware and instances other than SWAN will be identified. Detector anomaly detection: data taking is continuously monitored by physicists taking shifts to assess the quality of the incoming data, largely using reference histograms produced by experts.
A whole class of ML algorithms called anomaly detection can be useful for automating this important task. Such unsupervised algorithms are able to learn from data and produce an alert when deviations are observed. By monitoring many variables at the same time, such algorithms are sensitive to subtle signs forewarning of imminent failure, so that pre-emptive maintenance can be scheduled.
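A minimal unsupervised scheme of the kind described above compares each monitored quantity against its own recent history and raises an alert on large deviations. This sketch uses a simple z-score test with toy variable names; production systems would use richer models (autoencoders, clustering), but the monitoring logic is the same.

```python
import statistics

def detect_anomalies(history, current, n_sigma=5.0):
    """Flag monitored quantities whose current value deviates from the
    running history by more than n_sigma standard deviations."""
    alerts = {}
    for name, past in history.items():
        mu = statistics.fmean(past)
        sigma = statistics.stdev(past)
        if sigma > 0 and abs(current[name] - mu) > n_sigma * sigma:
            alerts[name] = (current[name], mu, sigma)
    return alerts

# Toy run history for two hypothetical data-quality variables.
history = {
    "mean_cluster_energy": [10.0, 10.1, 9.9, 10.05, 9.95],
    "track_multiplicity": [50.0, 51.0, 49.0, 50.5, 49.5],
}
# Latest monitoring snapshot: one variable has drifted badly.
current = {"mean_cluster_energy": 10.02, "track_multiplicity": 80.0}

alerts = detect_anomalies(history, current)
```

Because many variables are checked at once, combinations of small, individually unremarkable drifts can also be caught by multivariate extensions of the same idea.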
These techniques are already used in industry. Simulation: recent progress in high-fidelity fast generative models, such as Generative Adversarial Networks (GANs) [70] and Variational Autoencoders (VAEs) [86], which are able to sample high-dimensional feature distributions by learning from existing data samples, offers a promising alternative for Fast Simulation. A simplified first attempt at using such techniques in simulation saw orders-of-magnitude increases in speed over existing Fast Simulation techniques, but has not yet reached the required accuracy.
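The core idea of a generative fast-simulation surrogate, learning a distribution from full-simulation samples and then drawing new events cheaply, can be sketched in one dimension with inverse-transform sampling from an empirical CDF. This is a deliberately simple stand-in for a GAN or VAE, using a toy gamma distribution in place of real full-simulation output.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Full simulation" output: expensive-to-produce energy deposits (toy data).
full_sim = rng.gamma(shape=2.0, scale=5.0, size=100_000)

# "Training": learn the empirical CDF of the full-simulation sample once.
sorted_sample = np.sort(full_sim)
cdf = np.linspace(0.0, 1.0, sorted_sample.size)

def fast_sim(n):
    """Generate new events cheaply by inverse-transform sampling."""
    u = rng.uniform(0.0, 1.0, size=n)
    return np.interp(u, cdf, sorted_sample)

generated = fast_sim(50_000)
```

GANs and VAEs generalise this to high-dimensional, correlated detector responses, where no explicit CDF can be tabulated; the accuracy challenge mentioned above is precisely in reproducing those correlations and distribution tails.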
Triggering and real-time analysis: one of the challenges is the trade-off between algorithm complexity and performance under strict inference-time constraints. To deal with the increasing event complexity at the HL-LHC, the use of sophisticated ML algorithms will be explored at all trigger levels, building on the pioneering work of the LHC collaborations.
A critical part of this work will be to understand which ML techniques allow us to maximally exploit future computing architectures. Sustainable Matrix Element Method (MEM): MEM is a powerful technique that can be utilised for making measurements of physical model parameters and direct searches for new phenomena. As it is very computationally intensive, its use in HEP is limited.
Although the use of neural networks for numerical integration is not new, it is a technical challenge to design a network sufficiently rich to encode the complexity of the ME calculation for a given process over the phase space relevant to the signal process. Tracking: pattern recognition is always a computationally challenging step. Several efforts in the HEP community have started to investigate ML algorithms for track pattern recognition on many-core processors. The scientific reach of data-intensive experiments is limited by how fast data can be accessed and digested by computational resources.
Changes in computing technology and large increases in data volume require new computational models [92], compatible with budget constraints. The integration of newly emerging data analysis paradigms into our computational model has the potential to enable new analysis methods and increase scientific output. The field, as a whole, has a window in which to adapt our data access and data management schemes to ones that are more suited and optimally matched to advanced computing models and a wide range of analysis applications.
The LHC experiments currently provision and manage about an exabyte of storage, approximately half of which is archival and half traditional disk storage. Other experiments that will soon start data taking have similar needs.
The HL-LHC storage requirements per year are expected to jump by a factor close to 10, which is a growth rate faster than can be accommodated by projected technology gains. Storage will remain one of the major cost drivers for HEP computing, at a level roughly equal to the cost of the computational resources. The combination of storage and analysis computing costs may restrict scientific output and the potential physics reach of the experiments, so new techniques and algorithms are likely to be required.
In devising experiment computing models for this era, many factors have to be taken into account. In particular, the increasing availability of very high-speed networks may reduce the need for CPU and data co-location. Such networks may allow for more extensive use of data access over the Wide-Area Network (WAN), which may provide failover capabilities and global, federated data namespaces, and will have an impact on data caching.
Shifts in data presentation and analysis models, such as the use of event-based data streaming along with more traditional dataset-based or file-based data access, will be particularly important for optimising the utilisation of opportunistic computing cycles on HPC facilities, commercial cloud resources, and campus clusters. This can potentially resolve currently limiting factors such as job eviction.
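The benefit of event-based streaming for eviction-prone opportunistic resources can be sketched with a simple generator: delivery at event granularity means a preempted job loses at most the event in flight and can resume from a bookmark rather than reprocessing a whole file. File names and granularity here are illustrative only.

```python
def event_stream(files, events_per_file=3):
    """Hypothetical event-based delivery: yield (file, event_index) pairs
    one at a time instead of handing out whole files."""
    for f in files:
        for i in range(events_per_file):
            yield (f, i)

processed = []
for event in event_stream(["run1.root", "run2.root"]):
    processed.append(event)
    if len(processed) == 4:  # simulate job eviction mid-file
        break

# A restarted job can resume from the recorded bookmark rather than
# reprocessing run2.root from the start.
bookmark = processed[-1]
```

With file-based delivery the same eviction would discard all partial work on `run2.root`; the finer granularity is what makes opportunistic HPC, cloud, and campus-cluster cycles usable.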
The experiments will significantly increase both the data rate and the data volume. The computing systems will need to handle this with as small a cost increase as possible and within evolving storage technology limitations. The significantly increased computational requirements of the HL-LHC era will also place new demands on data access. Specifically, the use of new types of computing resources (cloud, HPC) that have different dynamic availability and characteristics will require more dynamic data management and access systems.
Applications employing new techniques, such as training for machine learning or high rate data query systems, will likely be employed to meet the computational constraints and to extend physics reach.
These new applications will place new requirements on how and where data is accessed and produced. Specific applications, such as training for machine learning, may require use of specialised processor resources, such as GPUs, placing further requirements on data. As with computing resources, the landscape of storage solutions is trending towards heterogeneity.
The ability to incorporate new storage technologies into existing data delivery models as they become available is a challenge that we must be prepared for. Volatile data sources would impact many aspects of the system: catalogues, job brokering, monitoring and alerting, accounting, and the applications themselves.
Currently, tape is extensively used to hold data that cannot economically be made available online. While the data remains accessible, it comes with a high latency penalty, limiting effective data access. We suggest investigating either separate direct-access archives or alternative archive organisations.
This is especially relevant when access latency is proportional to storage density. Either approach would also need to evaluate reliability risks and the effort needed to provide data stability. For this work, we should exchange experiences with communities that rely on large tape archives for their primary storage. Cost reductions in the maintenance and operation of storage infrastructure can be realised through convergence of the major experiments and resource providers on shared solutions.
This does not necessarily mean promoting a monoculture, as different solutions will be adapted to certain major classes of use cases, type of site, or funding environment. There will always be a judgement to make on the desirability of using a variety of specialised systems, or of abstracting the commonalities through a more limited, but common, interface. Reduced costs and improved sustainability will be further promoted by extending these concepts of convergence beyond HEP and into the other large-scale scientific endeavours that will share the infrastructure in the coming decade e.
Efforts must be made as early as possible, during the formative design phases of such projects, to create the necessary links. Finally, all changes undertaken must not make the ease of access to data any worse than it is under current computing models. We must also be prepared to accept the fact that the best possible solution may require significant changes in the way data is handled and analysed.
Data organisation is essentially how data is structured as it is written. Most data is written in files, in ROOT format, typically with a column-wise organisation of the data. The records corresponding to these columns are compressed. The internal details of this organisation are visible only to individual software applications. In the past, the key challenge for data management was the transition to use distributed computing in the form of the grid. The experiments developed dedicated data transfer and placement systems, along with catalogues, to move data between computing centres.
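The column-wise organisation described above can be sketched with a toy columnar store: each column ("branch") is compressed independently, so an analysis reading only one variable never decompresses the others. This is an illustrative sketch using the standard library, not ROOT's actual on-disk layout.

```python
import struct
import zlib

def pack_column(values):
    """Serialise a list of floats as float32 and compress the column."""
    raw = struct.pack(f"{len(values)}f", *values)
    return zlib.compress(raw)

def unpack_column(blob, n):
    """Decompress one column and recover its n float32 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"{n}f", raw))

# Toy event data with two hypothetical branches.
events = {"pt": [45.2, 13.1, 88.0], "eta": [0.3, -1.2, 2.1]}

# Column-wise write: each branch is an independently compressed record.
store = {name: pack_column(col) for name, col in events.items()}

# Column-wise read: touch only the bytes of the requested branch.
pt = unpack_column(store["pt"], 3)
```

Per-column compression also tends to compress better than row-wise layouts, because values within a column are statistically similar; this is one reason the format choice interacts so strongly with both storage cost and analysis speed.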