Shirley Ho

Submitted by:

Adrian Liu

General area of research:

Cosmology and machine learning

McGill courses:

Any of the astro courses (especially 321!) and computational physics

Why you chose to feature this researcher:

Shirley does great work combining cosmology with cutting-edge techniques in machine learning.

More info:

No.

Highlighting research:

Describe in detail the research from this individual that you would like to highlight:

How does this research relate to an undergraduate curricular topic, and to teachable concepts in physics?

What is the significance of this research? This can mean within the particular field, as well as broader societal relevance.

Do you wish to upload figures and/or images relevant to understanding the research?

Speaker notes:

Introduction slide / General speaker notes:

Synopsis of work:

Shirley Ho is a cosmologist and astrophysicist who studies dark matter, dark energy, and cosmological models. She recently led her team in developing a deep learning algorithm called the “Deep Density Displacement Model” (D³M) that predicts structure formation of the Universe. D³M generates complex 3D cosmological simulations of dark matter particles evolving under gravity. The model produces accurate results, outperforming “second order Lagrangian perturbation theory” (2LPT), a recent fast approximation widely used by cosmologists, in terms of errors, summary statistics, and computational cost.

Researcher's background:

Shirley Ho received her bachelor’s degree in Physics and Computer Science from the University of California, Berkeley. As an undergraduate she completed several thesis projects, first in particle physics and then on weak lensing of the Cosmic Microwave Background, and she wrote two cosmology papers as a senior. Ho received her PhD in astrophysical sciences from Princeton University, where she wrote her thesis, “Baryons, Universe and Everything Else in Between”, under the supervision of David Spergel. From 2008 to 2012 Ho held a postdoctoral position at Lawrence Berkeley National Laboratory as a Chamberlain Fellow and Seaborg Fellow. She joined Carnegie Mellon University in 2011 as an assistant professor, becoming a Cooper-Siegel Assistant Professor in 2015 and an Associate Professor with tenure in 2016. In the same year, she joined Lawrence Berkeley National Laboratory as a Senior Scientist. In 2018, she became the leader of the Cosmology X Data Science group at the Center for Computational Astrophysics (CCA) at the Flatiron Institute. She holds faculty positions at New York University and Carnegie Mellon University and was named Interim Director of the CCA in 2021. She is the recipient of several awards, including the NASA Group Achievement Award (2011), the Macronix Prize (Outstanding Young Researcher Award) from the International Organization of Chinese Physicists and Astronomers (2014), the Carnegie Science Award (2015), and the International Astrostatistics Association Fellowship (2020).

Societal relevance:

Ho describes understanding the dark components of the Universe (dark matter and dark energy) and its beginning as “the two most outstanding problems of contemporary cosmology”. She aims to address these problems by studying the distribution of galaxies, quasars, and neutral hydrogen in the Universe. The development of the Cosmic Web is studied through observations from astrophysical experiments and through computer simulations; by comparing the Observed Universe to the Predicted Universe, scientists can test the accuracy of their cosmological models. Ho’s recent work in applying machine learning techniques to cosmological problems provides an alternative to computationally expensive simulations. D³M is a deep learning algorithm that makes accurate predictions of structure formation; it is trained on pre-run simulations in which dark matter particles are evolved under gravity to produce a Cosmic Web. The initial conditions of those simulations were generated with a specific choice of cosmological parameters that affect the large-scale structure of the Universe. Although D³M was trained on a single choice of values for two cosmological parameters, it makes accurate predictions for a variety of other values, reducing the need for large amounts of training data. In particular, it makes more accurate predictions of non-linear structure formation than the popular 2LPT approximation, despite being trained on data from more linear regimes. Ho’s team hopes to uncover how D³M is able to extrapolate so well beyond its training data; doing so could help advance artificial intelligence and machine learning more broadly.

General citations and resources:

https://arxiv.org/abs/1811.06533

https://www.cmu.edu/physics/people/faculty/ho.html

https://en.wikipedia.org/wiki/Shirley_Ho

https://www.simonsfoundation.org/people/shirley-ho/

https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/

Slide 1: Energy density of the Universe

Science details:

In the early 1900s, the Universe was believed to be composed entirely of normal (baryonic) matter, meaning matter made up of protons, neutrons, etc. By the 1970s, it was thought that the Universe’s matter was about 85% dark matter and 15% baryonic matter. Dark matter is believed to interact with baryonic matter only gravitationally, warping space with its mass; it does not appear to absorb, reflect, or emit electromagnetic radiation. Observations of galaxies show that their inferred masses are roughly ten times larger than the mass of their visible components (stars, dust, and gas). The additional mass is attributed to dark matter, which was further confirmed by observations of gravitational lensing. Lensing was predicted by Einstein’s theory of general relativity: mass distorts the space surrounding it and bends the trajectory of light, producing effects like “Einstein rings”, where a background source appears displaced or smeared into a ring. Currently, the Universe is believed to be 68% dark energy, 27% dark matter, and 5% baryonic matter. In the 1990s, observations of distant supernovae showed that the Universe’s expansion is accelerating, when it had previously been expected to decelerate over time; this was the first evidence for dark energy. Dark energy is the name given to whatever drives this accelerated expansion, and it affects the large-scale structure of the Universe. The nature of dark matter and dark energy is not well understood, so many experiments in cosmology and astrophysics are designed to probe these components. From these observations, scientists can analyze the data and refine their cosmological models.

Citations and resources:

https://wmap.gsfc.nasa.gov/universe/uni_matter.html

https://www.youtube.com/watch?v=fXhgMRZjDuM

https://en.wikipedia.org/wiki/Dark_energy

https://en.wikipedia.org/wiki/Dark_matter

https://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy

Figures:

Left: Pie chart of the energy density of the Universe today. The Universe is 4.6% normal (baryonic) matter, 24% dark matter, and 71.4% dark energy. https://wmap.gsfc.nasa.gov/universe/uni_matter.html

Right: Image of a gravitational lens mirage taken by the Hubble Space Telescope's Wide Field Camera 3. A luminous red galaxy is pictured in the center. Its mass distorts the light from a distant blue galaxy to form a ring (Einstein ring) around it. https://en.wikipedia.org/wiki/Gravitational_lens#/media/File:A_Horseshoe_Einstein_Ring_from_Hubble.JPG

Slide 2: Simulating the Universe

Science details:

Cosmologists would like to be able to compare the Observed Universe to the Predicted Universe, because there are many constraints (both physical and financial) on how much of the Universe scientists can observe in their experiments. If the models behind them are correct, theoretical predictions should replicate observed phenomena. This poses a particular problem for the complex physics involved in large-scale volumes: the computations become extremely expensive when the system is composed of a large number of particles that interact gravitationally and hydrodynamically, and even more so when gas interactions, supernovae, the formation of metals, and the proportions of matter/dark matter/dark energy are also modelled. “Uchuu”, for example, one of the largest simulations of the Universe ever run, evolved 2.1 trillion particles in a computational cube with sides 9.63 billion light years in length; it cost 20 million CPU hours (the equivalent running time on a single CPU) and generated 3 Petabytes of data. Such simulations allow scientists to test cosmological models, constrain parameters such as the densities and properties of dark matter and dark energy, and study how galaxies form. A simulation can be made less computationally expensive by simplifying the system, by implementing deep learning, or both.
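
To make the cost argument concrete, here is a minimal sketch (NumPy, toy units, made-up particle numbers; not any production cosmology code) of direct-summation gravity: every time step requires the force on each particle from every other particle, so the work grows as N², which is why trillion-particle runs like Uchuu need supercomputers and cleverer algorithms (tree and particle-mesh methods).

```python
# Minimal toy N-body step: direct summation of softened Newtonian gravity (G = m = 1).
# The (N, N) pairwise arrays are the point: cost per step scales as N^2.
import numpy as np

def gravity_step(pos, vel, dt=0.01, soft=0.05):
    """Advance N equal-mass particles one kick-drift step under mutual gravity."""
    diff = pos[None, :, :] - pos[:, None, :]              # (N, N, 3) pairwise separations
    dist2 = np.sum(diff ** 2, axis=-1) + soft ** 2        # softened to avoid singularities
    acc = np.sum(diff / dist2[..., None] ** 1.5, axis=1)  # sum the pull from every other particle
    vel = vel + acc * dt                                  # kick
    pos = pos + vel * dt                                  # drift
    return pos, vel

rng = np.random.default_rng(42)
N = 512                                                   # already ~260,000 pair interactions per step
pos = rng.uniform(0.0, 1.0, size=(N, 3))                  # particles scattered in a unit box
vel = np.zeros((N, 3))
for _ in range(10):                                       # ten time steps of the toy universe
    pos, vel = gravity_step(pos, vel)
print(pos.shape)
```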

Citations and resources:

https://en.wikipedia.org/wiki/Lambda-CDM_model

https://www.youtube.com/watch?v=fXhgMRZjDuM

https://en.wikipedia.org/wiki/CPU_time

https://phys.org/news/2021-09-largest-virtual-universe-free-explore.html

Figures:

Video of Uchuu simulation. Shows large scale structures: cosmic web, filaments, clusters, and supernovae. https://www.youtube.com/watch?v=R7nV6JEMGAo

Slide 3: Machine learning: neural networks

Science details:

Neural networks are a type of machine learning algorithm made up of artificial neurons (or nodes) that process information. A simple neural network takes an input, passes it through a hidden layer of neurons that perform functions on it, and passes the result to the output. A “deep” neural network is simply one with many hidden layers. The connections between neurons are given by weights, which are initially set to random numbers; a positive/negative weight indicates an excitatory/inhibitory connection. The weights are used to linearly combine the inputs to a neuron, and a non-linear “activation function” then controls the amplitude of the output.
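
As an illustration, here is a minimal sketch (NumPy, with made-up random weights; the 3-4-2 layer sizes simply mirror the figure on this slide) of a single forward pass: each neuron linearly combines its inputs using weights, adds a bias, and a non-linear activation function controls the output.

```python
# Tiny feed-forward network: 3 inputs -> hidden layer of 4 neurons -> 2 outputs.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)   # weights/biases start as random numbers
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

def activation(z):
    return np.tanh(z)                     # non-linear activation; tanh squashes output to [-1, 1]

def forward(x):
    hidden = activation(W1 @ x + b1)      # weighted sum of inputs plus bias, then non-linearity
    return activation(W2 @ hidden + b2)   # same pattern from hidden layer to output

print(forward(np.array([0.5, -1.0, 2.0])))  # output of this untrained (random) network
```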

Citations and resources:

https://en.wikipedia.org/wiki/Deep_learning

https://en.wikipedia.org/wiki/Neural_network

https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html

Figures:

Left: Simplest neural network. 3 input nodes are shown on the left, which are connected to 4 nodes that make up the hidden layer in the center. The nodes of the hidden layer connect to 2 nodes that make up the output on the right. https://www.edge-ai-vision.com/2015/11/using-convolutional-neural-networks-for-image-recognition/

Right: Venn diagram showing Deep Learning within Machine Learning, which is within Artificial Intelligence. https://levity.ai/blog/difference-machine-learning-deep-learning

Slide 4: Convolutional neural networks: architecture

Science details:

Convolutional neural networks (CNNs) are a type of neural network in which some of the hidden layers are “convolutional”: they convolve their inputs before passing information to the next layer. Such a layer takes an input, a tensor with dimensions (number of inputs)✕(input height)✕(input width)✕(input channels), and produces a “feature map” with dimensions (number of inputs)✕(feature map height)✕(feature map width)✕(feature map channels). CNNs may also have “pooling” layers, which reduce the dimensions of the data by combining the outputs of several neurons in one layer into a single neuron in the next. Together these layers perform pattern detection, which makes CNNs useful for image analysis, with deeper networks able to detect more sophisticated patterns. Each layer in the CNN receives an input from the previous layer, applies a function determined by a “filter” (a set of weights and biases), and passes the output to the next layer. The algorithm learns by adjusting these weights and biases.
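
A minimal sketch (pure NumPy, for illustration only; real CNN libraries implement this far more efficiently) of what one convolutional layer followed by a pooling layer does to a small single-channel image. The 3x3 kernel plays the role of the “filter” of weights described above, and the ReLU is one common choice of activation.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over a single-channel image (no padding); strictly a cross-correlation,
    which is what CNN 'convolution' layers actually compute."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Downsample by keeping the maximum of each non-overlapping size x size block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size                       # trim so blocks fit evenly
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                                  # toy single-channel "image"
kernel = rng.standard_normal((3, 3))                        # one learnable 3x3 filter (weights)
bias = 0.1                                                  # learnable bias

feature_map = np.maximum(convolve2d(image, kernel) + bias, 0.0)  # ReLU activation
pooled = max_pool2d(feature_map)                            # pooling layer reduces the dimensions
print(feature_map.shape, pooled.shape)                      # (6, 6) -> (3, 3)
```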

Citations and resources:

https://en.wikipedia.org/wiki/Deep_learning

https://en.wikipedia.org/wiki/Convolutional_neural_network

https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html

Figures:

Schematic of a CNN that classifies images of vehicles. The input image is a photograph of a car. The convolutional layers convolve (green) and pool (orange) the image data. The convolutional layers are followed by fully connected layers (gray). The output class is a set of vehicles (car, bus, train, plane, chip), with “car” highlighted in green. https://www.nvidia.com/en-us/glossary/data-science/computer-vision/

Slide 5: Convolutional neural networks: datasets

Science details:

A machine learning algorithm should be able to make predictions by building a mathematical model from data. This requires the data to be divided into subsets: training, validation, and test data. The training data is used to fit the model, while the validation data is used to check that the model is not overfitting to the training data (which CNNs in particular are prone to doing), keeping the model useful for making predictions on other datasets. The model is trained repeatedly on the training data through supervised learning, wherein the inputs are paired with “target” outputs. At each pass over the training data, or “epoch”, the algorithm adjusts the model’s parameters (such as neuron weights) to reduce the error between predictions and targets, typically using optimization methods such as gradient descent or stochastic gradient descent. After each epoch, the fitted model is evaluated on the validation data. This step is used to tune the model’s “hyperparameters”: parameters that are not determined by training but that affect the algorithm’s learning (such as the number of hidden layers, the number of neurons in a layer, etc.). Without the validation step, the model can overfit the training data and be unable to make predictions on other datasets. The final model is then evaluated on test data that it has not seen before, which determines whether overfitting has occurred.
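
To make the workflow concrete, here is a minimal sketch (NumPy, a toy linear model fit by full-batch gradient descent rather than a CNN; all data are synthetic) of the training/validation/test loop described above. The learning rate stands in for a hyperparameter that one would tune by watching the validation loss.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 300)
y = 3.0 * x - 1.0 + 0.2 * rng.standard_normal(300)   # noisy "truth" the model should learn

# Split the data into training, validation, and test subsets.
x_train, y_train = x[:200], y[:200]
x_val,   y_val   = x[200:250], y[200:250]
x_test,  y_test  = x[250:], y[250:]

def mse(w, b, xs, ys):
    return np.mean((w * xs + b - ys) ** 2)

w, b = 0.0, 0.0                      # the model's "weights", initialized simply
lr = 0.1                             # learning rate: a hyperparameter tuned on validation data
best = (np.inf, w, b)

for epoch in range(200):             # one pass over the training data per epoch
    grad_w = np.mean(2 * (w * x_train + b - y_train) * x_train)   # gradient descent step
    grad_b = np.mean(2 * (w * x_train + b - y_train))
    w, b = w - lr * grad_w, b - lr * grad_b

    val_loss = mse(w, b, x_val, y_val)      # evaluate on validation data after each epoch
    if val_loss < best[0]:                  # keep the parameters that generalize best
        best = (val_loss, w, b)

_, w, b = best
print("test loss:", mse(w, b, x_test, y_test))  # final, unbiased check for overfitting
```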

Citations and resources:

https://en.wikipedia.org/wiki/Convolutional_neural_network

https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets

https://en.wikipedia.org/wiki/Supervised_learning

Figures:

Flowchart of the process of building a machine learning model using training, validation, and test data. The initial steps are described by a loop: train the model on training data, evaluate the model on validation data, tweak the model according to the results on validation data, (then repeat). The final model is chosen based on which performed best on the validation data. Then results are confirmed on test data. Adapted from https://www.v7labs.com/blog/train-validation-test-set

Slide 6: D³M: introduction

Science details:

N-body simulations are a way of predicting structure formation in the Universe. They generate data that are snapshots of the simulated Universe at various times, which is both computationally expensive and demands a large amount of storage for the produced data. Ho’s group built a deep neural network called the “Deep Density Displacement Model” (D³M) that predicts structure formation of the Universe. It produces accurate results at a fraction of the computing power required by the analytical approximation. It also outperforms 2LPT, the benchmark model widely used by cosmologists as a fast approximation of a full N-body simulation.

Citations and resources:

https://www.youtube.com/watch?v=fXhgMRZjDuM

https://www.youtube.com/watch?v=FPExx_jIH7E

https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/

https://arxiv.org/abs/2012.05472

https://arxiv.org/abs/1811.06533

Figures:

3D plot of displacement errors, measured in millions of light-years, of D³M (left) and 2LPT (right). The maximum errors in 2LPT are much higher than in D³M. https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/

Slide 7: D³M: input and output data

Science details:

D³M learns from data that are pre-run numerical simulations predicting the large scale structure of the Universe. The input data come from an analytical approximation, the Zel’dovich Approximation (ZA), a model based on perturbation theory. It begins with a grid of N particles and evolves each of them on a linear trajectory from its initial displacement. Because it is a simple linear model, it is accurate when the displacements are small. The resulting displacement field (the difference between the final and initial positions of the particles) from ZA is often used to generate the initial conditions of N-body simulations. The data are 3D boxes of N particles (often visualized as 2D slices), totalling 10,000 input-output pairs, where each pair shares the same fundamental cosmological parameters (density of dark matter, amount of dark energy, and so on).
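
A toy sketch (NumPy, reduced to 1D with invented mode amplitudes; not the actual ZA code) of the idea behind the Zel’dovich Approximation and of what a displacement field is: particles start on a uniform grid and move along straight lines set by an initial displacement pattern scaled by a growth factor, and the displacement field is simply final position minus initial grid position.

```python
import numpy as np

N = 64
q = np.linspace(0.0, 1.0, N, endpoint=False)   # initial (Lagrangian) grid positions
rng = np.random.default_rng(0)

# Build a smooth random initial displacement pattern psi(q) from a few long-wavelength modes
# (amplitudes and phases are made up for illustration).
psi = np.zeros(N)
for k in (1, 2, 3):
    psi += rng.normal(scale=0.02 / k) * np.sin(2 * np.pi * k * q + rng.uniform(0, 2 * np.pi))

D = 1.0                                        # linear growth factor (grows with cosmic time)
x = q + D * psi                                # ZA: particles move on linear trajectories

displacement_field = x - q                     # what D3M takes in (from ZA) and is trained to predict (from FastPM)
print(displacement_field[:5])
```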

The target output is the displacement field produced by “FastPM”. FastPM takes the same ZA displacement field input and produces an approximate N-body simulation wherein all N particles are evolved under gravity. The resulting displacement field is accurate enough to use as a target output for D³M, meaning D³M learns from the pre-run simulation.

As a benchmark, Ho’s group compared D³M against “second order Lagrangian perturbation theory” (2LPT), a fast analytical approximation of an N-body simulation that is commonly used in cosmology; the error comparison is covered on the results slide.

Citations and resources:

https://www.youtube.com/watch?v=fXhgMRZjDuM

https://www.youtube.com/watch?v=FPExx_jIH7E

https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/

https://arxiv.org/abs/2012.05472

https://arxiv.org/abs/1811.06533

Figures:

Fig: 2D particle distributions (top row) and displacement vectors (bottom row) from four models. The colors represent displacement errors (in Mpc/h, h is the Hubble parameter) calculated with respect to the target ground truth from FastPM (far left). From left to right: (a) FastPM, (b) Zel’dovich approximation (ZA), (c) second order Lagrangian perturbation theory (2LPT), (d) deep density displacement model (D³M). The error bars show that high-density regions have higher errors for models (b-d), with D³M having the smallest error bars of the three models. https://arxiv.org/abs/1811.06533

Slide 8: D³M: results

Science details:

Recall the setup from the previous slide: D³M takes a Zel’dovich Approximation (ZA) displacement field as input and is trained to reproduce the displacement field produced by the approximate N-body code FastPM, using 10,000 such input-output pairs.

A recent method commonly used in cosmology to approximate N-body simulations is “second order Lagrangian perturbation theory” (2LPT). This is a fast analytical approximation that Ho’s group used as a benchmark to compare with D³M. They found that D³M outperformed the benchmark 2LPT. One of the ways that they evaluated the results was by the errors in the final displacement field, which were calculated with respect to the FastPM displacement field. While the majority of the errors from 2LPT and D³M were close to 0 Mpc, in high-density regions the maximum errors from 2LPT were almost a factor of 10 larger than those from D³M.
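
A small sketch (NumPy, with random stand-in arrays rather than real FastPM or D³M outputs) of the error measure described above: the per-particle distance between a model’s predicted displacements and the reference displacements, summarized by its typical and maximum values.

```python
import numpy as np

rng = np.random.default_rng(3)
n_particles = 32 ** 3
reference = rng.standard_normal((n_particles, 3))                      # stand-in for FastPM displacements
prediction = reference + 0.05 * rng.standard_normal((n_particles, 3))  # stand-in for a model's output

error = np.linalg.norm(prediction - reference, axis=1)   # distance between predicted and true displacements
print(f"median error: {np.median(error):.3f}, max error: {error.max():.3f}")
```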

They originally chose values for two cosmological parameters to use as input for ZA when producing the training data: the primordial amplitude of the scalar perturbations from cosmic inflation, and the ratio of matter density to total energy density. Both have unknown exact values, and both affect the large-scale structure of the Universe. They then tested D³M on ZA inputs constructed with different values of those parameters, without re-training the network. The results were surprising: D³M still made accurate predictions, which Ho’s group said is “highly unexpected and remains a mystery”. This means that D³M could generate more simulations without needing additional training data, making it even more computationally efficient.

Citations and resources:

https://www.youtube.com/watch?v=fXhgMRZjDuM

https://www.youtube.com/watch?v=FPExx_jIH7E

https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/

https://arxiv.org/abs/2012.05472

https://arxiv.org/abs/1811.06533

Figures:

Fig: 2D particle distributions (top row) and displacement vectors (bottom row) from four models. The colors represent displacement errors (in Mpc/h, h is the Hubble parameter) calculated with respect to the target ground truth from FastPM (far left). From left to right: (a) FastPM, (b) Zel’dovich approximation (ZA), (c) second order Lagrangian perturbation theory (2LPT), (d) deep density displacement model (D³M). The error bars show that high-density regions have higher errors for models (b-d), with D³M having the smallest error bars of the three models. https://arxiv.org/abs/1811.06533

Planning, notes, references

Evolution of topics (5-7 slides??)

  1. Intro to cosmology: what the universe is made of (with timeline)
  2. What cosmologists are working on: simulations of galaxies to compare with observations. This is computationally expensive (so many CPU hours, give the 2000 hours example from the paper). But ML is comparable or even outperforms
  3. Intro to ML:
    1. Deep learning, CNNs
  4. Shirley Ho work
    1. Inputs, outputs, 3D slices, etc. I guess like the setup and goals
    2. Evaluated with statistics (power spectrum, bispectrum) - results are sick

Notes:

Paper: “From Dark Matter to Galaxies with Convolutional Networks”

Abstract

  • What is the “nature of dark matter or the reason of unexpected accelerated expansion of the Universe? Need (1) data from observations and (2) theoretical model that allows fast comparison between observations and theory.”
  • Modeling is computationally expensive - meaning it requires millions of CPU hours.
  • Paper: “deep learning to map 3D galaxy distribution in hydrodynamic simulations and its underlying dark matter distributions. They have developed a two-phase convolutional neural network architecture to generate fast galaxy catalogs, and compare results against a standard cosmological technique, which it outperforms.”

Keywords

  • Convolutional neural networks, high sparsity, galaxy prediction, hydrodynamic simulation, dark matter

Intro: cosmology

  • Want origins and evolution of the universe (big bang to today and future). Want physical rules and parameters. Surveying large volumes of the universe and use simulations to compare with observations to get the most information.
  • Important type of simulation: gravo-hydrodynamical simulations: reproduce the formation and evolution of galaxies
  • “Forces involved in evolving trillions of galaxies over billions of light years: gravity, electromagnetics, hydrodynamics. Existing gravo-hydrodynamical cosmological simulations with all the physics can only simulate a small fraction of the universe and still requires 19 million CPU hours = 2000 years on one single CPU.“
  • “On the other hand, the standard cosmological model provides us with a solution to this challenge: most of the matter in the Universe is made up of dark matter, and the large scale cosmic structure of the Universe can be modeled quite accurately when we evolve dark matter through time with only physics of gravity. When we do add the gas into the mix, gas usually traces the matter density, and for large enough dark matter halos, gas falls to the center of dark matter halos, subsequently cool down and form stars and galaxies. In other words, dark matter halos form the skeleton inside which galaxies form, evolve, and merge. Hence, the behaviors, such as growth, internal properties, and spatial distribution of galaxies, are likely to be closely connected to the behaviors of dark matter halos.”
  • CNN = convolutional neural network to map 3D matter field in N-body simulation to galaxies in a full hydrodynamical simulation. Supervised learning. Output is sparse compared to input. Evaluate results using statistic = power spectrum, bispectrum, and compare to the benchmark method commonly used in cosmology. They show more accurate galaxy distribution than the benchmark re: positions, no. galaxies, power spectrum and bispectrum of galaxies. And it’s scalable (more volumes of realistic galaxies in a short time).

ML in cosmology

  • CNN used in image classification, detection, and segmentation. Sometimes ML techniques are comparable or outperform cosmological models.

Now fill this in, take down refs+figs as you go

Expand topics

  1. Intro to cosmology: what the universe is made of (with timeline)

https://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy (dark matter/energy with percents, pretty pics)

Components of the universe: in the 1900s, the Universe was believed to be composed entirely of normal (baryonic) matter, meaning matter made up of protons, neutrons, etc. In the 1970s, the Universe was believed to be composed of about 85% dark matter and 15% baryonic matter. (Phrase differently? Other source?) Dark matter is believed to only interact gravitationally with baryonic matter and warp space with its mass, meaning it does not appear to absorb, reflect, or emit electromagnetic radiation (wiki dm). Currently, the Universe is believed to be 68% dark energy, 27% dark matter, and 5% baryonic matter. (NASA CITE) Observations of supernovae were the first indicators of the existence of dark energy, which showed that the Universe’s expansion is accelerating - when it was previously thought that the expansion should decelerate over time. Dark energy describes the unknown force that causes this expansion and it affects the large-scale structure of the Universe. (wiki de) The nature of dark matter and dark energy is not well understood, so many experiments in cosmology and astrophysics are designed to observe these features. From these observations, scientists can analyze the data and create cosmological models.

(You could add a slide for redshift and supernova stuff, if there’s time and energy)

(Could have a picture of MACHOs warping space - the smile thing, DM isn’t emitting light but it warps the light that’s there).

Refs:

https://www.youtube.com/watch?v=fXhgMRZjDuM

https://en.wikipedia.org/wiki/Dark_energy

https://en.wikipedia.org/wiki/Dark_matter

https://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy (dark matter/energy with percents, pretty pics)

  1. What cosmologists are working on: evolution and distribution of galaxies. This is computationally expensive (so many CPU hours, give the 2000 hours example from the paper). But ML is comparable or even outperforms

Break up slides? = (1) cosmologists want to compare observations to predictions. (2) Simulations can cost __ CPU hours, for example - the animation. (3) Ho’s group used deep learning to simulate <dark matter and gravity?> in the Universe at the present time and compared it to the benchmark simulation widely used by cosmologists. (?)

Cosmologists would like to be able to compare the Observed Universe to the Predicted Universe. Simulations are necessary because there are many constraints (both physical and financial) on how much of the Universe scientists can observe in their experiments. Theoretical predictions should replicate observed phenomena if the models behind them are correct. This poses a particular problem when considering the complex physics involved in large-scale volumes: the computations become extremely expensive when the system is composed of a large number of particles that interact gravitationally and hydrodynamically. A simulation that includes these forces, as well as gas interactions, supernovae, formation of metals, and the proportions of matter/dark matter/dark energy over a small volume of the Universe can cost 30 million CPU hours (the amount of time for a single CPU to run). (find that simulation and put it in a slide). The simulations should predict cosmological parameters, such as the densities of dark matter and dark energy, the properties of dark matter and dark energy, how galaxies form, and more. A simulation can be made less computationally expensive by simplifying the system or by implementing deep learning (or both).

(The video focuses on massive particles that interact gravitationally with dark energy, to start. This simplifies the simulation and allows for faster computation but still contains a lot of information because of the presence of gravity, which all things interact with.)

Refs:

https://en.wikipedia.org/wiki/Lambda-CDM_model

https://www.youtube.com/watch?v=fXhgMRZjDuM

https://en.wikipedia.org/wiki/CPU_time

The inputs: “pre-run numerical simulations that predict large scale structure of the universe”. N-body simulations (but Ho’s deep model outperforms it).

https://www.youtube.com/watch?v=fXhgMRZjDuM “In this lecture, Shirley Ho will discuss her team’s work building a deep neural network that learns from a set of pre-run numerical simulations and predicts the large scale structure of the universe. Extensive analysis demonstrates that their deep-learning technique outperforms the commonly used fast approximate simulation method in predicting cosmic structure in the non-linear regime. They also show that their method can accurately extrapolate far beyond its training data and predict structure formation for significantly different cosmological parameters. This ability to extrapolate outside its training set is highly unexpected and remains a mystery.”

  1. Intro to ML: Deep learning, CNNs, gradient descent (?), (they used Unet which is like Resnet?)
    1. Neural networks (pics of simple thing), CNNs (is there a more complex diagram but not too complex?)
    2. Training/validation/test data (plot of fit for training/test data, maybe something more cosmo-related??)

“Deep learning is a specialized form of machine learning.” “Deep learning algorithms scale with data, whereas shallow learning converges. Shallow learning refers to machine learning methods that plateau at a certain level of performance when you add more examples and training data to the network.” “In machine learning, you manually choose features and a classifier to sort images. With deep learning, feature extraction and modeling steps are automatic.”

https://www.mathworks.com/discovery/deep-learning.html (this also has nice pics of the network)

Ho’s group used a convolutional neural network.

Neural networks are a type of machine learning algorithm made up of artificial neurons (or nodes) that process information. A simple neural network would take an input and pass it through a hidden layer of neurons that perform functions on the input and pass the result to the output. A “deep” neural network is simply one with many hidden layers. The connections between neurons are given by weights, which are initially set to be random numbers. A positive/negative weight indicating an excitatory/inhibitory connection. The weights are used to linearly combine inputs to a neuron, then a non-linear “activation function” controls the amplitude of the output.

Within machine learning algorithms are convolutional neural networks (CNNs): a unique type of neural network where some of the hidden layers are “convolutional” which means they convolve their inputs before passing information to the next layer. These layers are responsible for pattern detection, which makes CNNs useful for image analysis with deeper networks being able to perform more sophisticated pattern detection. Each layer acts by receiving an input from a previous layer, performing a specific function determined by a vector of weights and biases, and passing the output to the next layer. The algorithm “learns” by adjusting these biases and weights.

Refs:

https://en.wikipedia.org/wiki/Deep_learning

https://en.wikipedia.org/wiki/Convolutional_neural_network

https://en.wikipedia.org/wiki/Neural_network

https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html

The machine learning algorithm should be able to make predictions by building a mathematical model from data. This requires the data to be divided into subsets: training, validation, and test data. The training data is used to train the model while the validation data is used to ensure that the model is not overfitting to the training data, which CNNs in particular are prone to doing, thus making the model more optimized for making predictions for other datasets. The model is initially trained repeatedly on training data through “supervised learning”. At each iteration, the algorithm looks for the optimal parameters (such as neuron weights) to use for the model. These are then evaluated using methods such as gradient descent or stochastic gradient descent. At each iteration, the fitted model is tested on validation data then evaluated on its performance. This step tunes the model’s “hyperparameters”: parameters that are not determined from training but affect the algorithm’s learning (such as the number of hidden layers, the number of neurons in a layer, etc.). Without the validation step, the model can overfit to the training data and be unable to make predictions on other datasets. The final model is then evaluated using test data that it has not seen before. This step will determine whether overfitting has occurred.

Refs:

https://en.wikipedia.org/wiki/Convolutional_neural_network

https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets

https://en.wikipedia.org/wiki/Supervised_learning

Does it need more? I’m not sure… Yes - D3M has a TARGET OUTPUT so I need to explain what that is so it doesn’t come out of left field later.

  1. Gravo-hydrodynamical simulations (the physics?) actually I don’t know how necessary this is…

Other paper: https://academic.oup.com/mnras/article-pdf/286/2/384/3851859/286-2-384.pdf  “Astrophysical problems require the use of numerical techniques to account for hydrodynamic effects because analytical approaches are restricted to systems with special symmetries. These techniques are a powerful tool for the study of fully non-linear hierarchical clustering when a gaseous component is present. Smoothed particle hydrodynamics (SPH) is a widely used numerical method for the treatment of gas dynamics. … These implementations have provided the first insights in a more consistent treatment of systems with gas and dark matter.” I think this paragraph is also in the wiki: https://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics

  1. Shirley Ho work
    1. Goals: higher level goal is to simulate the universe and compare their deep learning algorithm (D3M) with the benchmark simulator (2LPT).
    2. Setup: explain ZA fields and FastPM (input and target output) and compare with 2LPT (explain what these things are).
    3. Results: error bars are great, can use other cosmo parameters in D3M trained on one set.

I think the header thing should be something like Ho’s group produced simulations of the Universe at the present time using deep learning at a cost of  ___ CPU hours compared to the ___ which cost ___ CPU hours.

(Higher level of what they did and what their goals/results were, plus the premise)

N-body simulations are a way of predicting structure formation in the universe. The simulations generate data that are snapshots of the simulated Universe at various times (at “different cosmological times (redshift)”). This is both computationally expensive and demands a large storage space for the produced data.

Ho’s group built a deep neural network called the “Deep Density Displacement Model” (D³M) that predicts structure formation of the Universe and produces accurate results for a fraction of the computing power required for the analytical approximation. (maybe put this with the details of the next slide because it talks about the cosmological parameters? Or just repeat that’s fine too.) D³M is also able to predict structure formation beyond its training data which Ho’s group says is “highly unexpected and remains a mystery”.

The input data are the initial conditions of an N-body simulation that evolve matter particles under gravity and with dark energy. These are 2D slices of thousands of 3D boxes of the Universe (Mpc???). Each input is paired to a target output that D3M learns from, where each pair has the same fundamental cosmological parameters (density of dark matter, amount of dark energy, and so on). These outputs are the resulting positions and velocities of the pre-run simulation. (pre-run sim takes ___ CPU’s)

What’s the data (input, target output, benchmark)? D³M learns from data that are pre-run numerical simulations which predict the large scale structure of the universe. The input data are simulations from an analytical approximation: the Zel’dovich Approximation (ZA) which is a model based on perturbation theory. It begins with a grid of N particles and evolves each of them on linear trajectories from their initial displacements. Because it is a simple linear model, it is accurate when the displacements are small. The resulting displacement field (difference between final and initial positions of the particles) from ZA is often used to generate the initial conditions of N-body simulations. The target output is the displacement field produced by “FastPM”. FastPM takes the same ZA displacement field input and produces an approximate N-body simulation wherein all N particles are evolved under gravity. The resulting displacement field is accurate enough to use as a target output for D³M, meaning D³M learns from the pre-run simulation. A recent method commonly used in cosmology to approximate N-body simulations is “second order Lagrangian perturbation theory” (2LPT). This is a fast analytical approximation that Ho’s group used as a benchmark to compare with D³M.

From talk: inputs don’t get velocities, outputs are positions and velocities after evolution under gravity.

  • Input = ZA field (positions of N particles of matter) which is based on linear perturbation theory.
  • Target output = FastPM takes ZA as its input and approximates an N-body simulation: the particles are evolved under gravity to determine their final positions and velocities.
  • Benchmark = 2LPT is a fast analytical approximation that is widely used in cosmology

Ho’s group found that D3M outperformed the benchmark 2LPT. One of the ways that they evaluated the results was by the errors in the final displacement field. While the majority of the errors from 2LPT were close to 0 Mpc, the maximum errors were ~5??? Mpc. The maximum errors from D3M, however, were close to 0.7 Mpc - almost a factor of 10 better than 2LPT.

They originally made a choice of two cosmological parameters to use as input for ZA to produce the training data. Specifically the primordial amplitude of the scalar perturbation from cosmic inflation, and the ratio of matter to total energy density, both of which have unknown true exact values and have an effect on the large-scale structure of the Universe. They then thought to try using different values of those cosmological parameters to construct the ZA input but without re-training D3M. Their results were surprising: D3M was still able to make accurate predictions which Ho’s group said is “highly unexpected and remains a mystery”. This means that D3M could generate more simulations without needing added training data, making it even more computationally efficient.

https://arxiv.org/abs/2012.05472 (“learning the evolution”)

https://arxiv.org/abs/1811.06533 (“learning to predict”)

https://www.youtube.com/watch?v=fXhgMRZjDuM “In this lecture, Shirley Ho will discuss her team’s work building a deep neural network that learns from a set of pre-run numerical simulations and predicts the large scale structure of the universe. Extensive analysis demonstrates that their deep-learning technique outperforms the commonly used fast approximate simulation method in predicting cosmic structure in the non-linear regime. They also show that their method can accurately extrapolate far beyond its training data and predict structure formation for significantly different cosmological parameters. This ability to extrapolate outside its training set is highly unexpected and remains a mystery.”

Paper - “Learning to Predict the Cosmological Structure Formation” https://arxiv.org/abs/1811.06533:

“a variation on the architecture of a well-known deep learning model, can efficiently transform the first order approximations of the displacement field and approximate the exact solutions, thereby producing accurate estimates of the large-scale structure.” “Significance Statement = To understand the evolution of the Universe requires a concerted effort of accurate observation of the sky and fast prediction of structures in the Universe. N-body simulation is an effective approach to predicting structure formation of the Universe, though computationally expensive. Here we build a deep neural network to predict structure formation of the Universe. It outperforms the traditional fast analytical approximation, and accurately extrapolates far beyond its training data. Our study proves that deep learning is an accurate alternative to the traditional way of generating approximate cosmological simulations. Our study also used deep learning to generate complex 3D simulations in cosmology. This suggests deep learning can provide a powerful alternative to traditional numerical simulations in cosmology”. “The outcome of a typical N-body simulation depends on both the initial conditions and on cosmological parameters which affect the evolution equations. A striking discovery is that D3M, trained using a single set of cosmological parameters generalizes to new sets of significantly different parameters, minimizing the need for training data on a diverse range of cosmological parameters.”

Setup: input and output of the deep neural network (“D^3M”) are similar. Input = displacement field from Zeldovich Approximation (ZA) (10,000 pairs of ZA approximations), output = output of N-body simulation (32^3 N-body particles in a 128 h^-1 Mpc = 600 million ly volume). The displacement vectors are the differences of a particle position at redshift z=0 (present time) and its lagrangian position on a uniform grid. ZA evolves the particles from their initial displacements along linear trajectories. If they are accurate, the displacement is small meaning ZA constructs the initial conditions of N-body simulations. A faster option (of N-body simulations) is the second-order Lagrangian perturbation theory (“2LPT”), which introduces a quadratic correction to the particles’ trajectories. It’s used in many cosmo analyses and can be compared with real astronomical data. 2LPT is a way to generate a relatively accurate description of large-scale structure, and then the authors compare it to D^3M. (They used a displacement field rather than a density field to eliminate ambiguity: this is because both fields can describe the same distribution of particles, but under certain conditions different displacement fields can produce identical density fields). Fig 1 = displacement vector-field and resulting density field from D3M: shows structures like clusters, filaments, and voids.

They evaluate with power spectrum, stochasticity (1-r^2), transfer function T(k).
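
A toy sketch (NumPy only, random Gaussian fields as stand-ins, not the paper’s actual pipeline) of these summary statistics: a binned power spectrum P(k), the transfer function T(k) = sqrt(P_pred / P_true), and the stochasticity 1 - r(k)^2, where r is the cross-correlation coefficient between the predicted and true fields.

```python
import numpy as np

def power_spectra(delta_a, delta_b, nbins=8):
    """Binned auto- and cross-spectra of two 3D fields on a periodic box (toy units)."""
    n = delta_a.shape[0]
    fa, fb = np.fft.rfftn(delta_a), np.fft.rfftn(delta_b)
    kx, ky, kz = np.fft.fftfreq(n), np.fft.fftfreq(n), np.fft.rfftfreq(n)
    kmag = np.sqrt(kx[:, None, None] ** 2 + ky[None, :, None] ** 2 + kz[None, None, :] ** 2)
    edges = np.linspace(0.0, kmag.max(), nbins + 1)
    which = np.clip(np.digitize(kmag.ravel(), edges) - 1, 0, nbins - 1)   # k-bin index of each mode
    counts = np.maximum(np.bincount(which, minlength=nbins), 1)

    def binned(power):   # average |mode|^2 within each k-bin
        return np.bincount(which, weights=power.ravel(), minlength=nbins) / counts

    return (binned(np.abs(fa) ** 2),            # P_aa
            binned(np.abs(fb) ** 2),            # P_bb
            binned((fa * np.conj(fb)).real))    # P_ab (cross-spectrum)

rng = np.random.default_rng(0)
truth = rng.standard_normal((32, 32, 32))                        # stand-in for the FastPM field
prediction = truth + 0.1 * rng.standard_normal((32, 32, 32))     # stand-in for a model's output

p_pred, p_true, p_cross = power_spectra(prediction, truth)
transfer = np.sqrt(p_pred / p_true)              # T(k): ratio of fluctuation amplitudes
r = p_cross / np.sqrt(p_pred * p_true)           # cross-correlation coefficient r(k)
stochasticity = 1.0 - r ** 2                     # power in the prediction uncorrelated with the truth
print(np.round(transfer, 3))
print(np.round(stochasticity, 3))
```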

Generalizing to new cosmological parameters: they trained the model using one chosen set of cosmo parameters (A_s=primordial amplitude of the scalar perturbation from cosmic inflation, Omega_m=fraction of the total energy density that is matter at present time a.k.a. “matter density parameter”). The true parameters are unknown, and different choices change the large-scale structure of the universe (Fig. 4: top/bottom panels = particle distribution/displacement field for different A_s on the right, Omega_m on the left). D^3M trained with one set of parameters in conjunction with ZA (ZA depends on A_s, Omega_m) but can predict structure formation for many other choices of A_s and Omega_m - this could lead to more simulations for a range of parameters without the need for a large amount of training data.