Download the slides with speaker notes here:
Preview them as PDF:
General speaker notes
Shirley Ho is a cosmologist and astrophysicist who studies dark matter, dark energy, and cosmological models. She recently led her team in developing a deep learning algorithm called the “Deep Density Displacement Model” (D³M) that predicts structure formation of the Universe. D³M generates complex 3D cosmological simulations by evolving dark matter particles under gravity. The model produces accurate results, even outperforming “second order Lagrangian perturbation theory” (2LPT), a recent fast approximation widely used by cosmologists, in terms of error bars, statistics, and computational cost.
Ho received her bachelor’s degree in Physics and Computer Science from the University of California, Berkeley. As an undergraduate she completed several thesis projects, first researching particle physics and then working on weak lensing of the Cosmic Microwave Background; she wrote two papers in cosmology as a senior. Ho received her PhD in astrophysical sciences from Princeton University, where she wrote her thesis, “Baryons, Universe and Everything Else in Between”, under the supervision of David Spergel. From 2008 to 2012 Ho held a postdoctoral position at the Lawrence Berkeley Laboratory as a Chamberlain fellow and Seaborg fellow. She joined Carnegie Mellon University in 2011 as an assistant professor, becoming the Cooper-Siegel Assistant Professor in 2015 and then an Associate Professor with tenure in 2016. In the same year, she joined Lawrence Berkeley Lab as a Senior Scientist. In 2018, she became the leader of the Cosmology X Data Science group at the Center for Computational Astrophysics (CCA) at the Flatiron Institute. She holds faculty positions at New York University and Carnegie Mellon University and was named the Interim Director of the CCA in 2021. She is the recipient of several awards, including the NASA Group Achievement Award (2011), the Macronix Prize: The Outstanding Young Researcher Award from the International Organization of Chinese Physicists and Astronomers (2014), the Carnegie Science Award (2015), and the International Astrostatistics Association Fellowship (2020).
Dark energy is the name given to the unknown force that drives cosmic expansion. The first evidence of dark energy came from observations of distant supernovae, which showed that the Universe does not expand at a constant rate. Supernovae are used as “standard candles”: astronomical objects whose intrinsic brightness at the source is known. Measuring the rate of cosmic expansion involves comparing the light’s observed brightness, its known brightness at the source, and its recorded redshift. Recently, cosmologists have begun to use gravitational wave sources as “standard sirens” to directly determine the rate of cosmic expansion. Soares-Santos searches for gravitational wave-emitting events and also employs traditional probes such as galaxy clusters and gravitational lensing to further the understanding of the accelerated expansion of the Universe.
Slide-specific speaker notes:
Science details:
In the early 1900s, the Universe was believed to be composed entirely of normal (baryonic) matter, meaning matter made up of protons, neutrons, etc. By the 1970s, it was thought that the Universe’s matter was about 85% dark matter and 15% baryonic matter. Dark matter is believed to interact with baryonic matter only gravitationally, warping space with its mass; it does not appear to absorb, reflect, or emit electromagnetic radiation. Observations of galaxies show that their inferred masses are about ten times larger than the mass accounted for by their visible components (stars, dust, and gas). The additional mass is attributed to dark matter, a picture confirmed by observations of gravitational lensing. Lensing was first predicted by Einstein’s theory of general relativity: mass distorts the space surrounding it, causing effects like “Einstein rings”, where light’s trajectory is bent so that its source appears to be in a different location. Currently, the Universe is believed to be 68% dark energy, 27% dark matter, and 5% baryonic matter. In the 1990s, observations of supernovae were the first indicators of the existence of dark energy: they showed that the Universe’s expansion is accelerating, when it was previously thought that the expansion should decelerate over time. Dark energy describes the unknown force that causes this acceleration, and it affects the large-scale structure of the Universe. The nature of dark matter and dark energy is not well understood, so many experiments in cosmology and astrophysics are designed to observe these features. From these observations, scientists can analyze the data and create cosmological models.
Citations and resources:
https://wmap.gsfc.nasa.gov/universe/uni_matter.html
https://www.youtube.com/watch?v=fXhgMRZjDuM
https://en.wikipedia.org/wiki/Dark_energy
https://en.wikipedia.org/wiki/Dark_matter
https://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy
Figures:
Left: Pie chart of the energy density of the Universe today. The Universe is 4.6% baryonic matter, 24% dark matter, and 71.4% dark energy. https://wmap.gsfc.nasa.gov/universe/uni_matter.html
Right: Image of a gravitational lens mirage taken by the Hubble Space Telescope's Wide Field Camera 3. A luminous red galaxy is pictured in the center. Its mass distorts the light from a distant blue galaxy to form a ring (Einstein ring) around it. https://en.wikipedia.org/wiki/Gravitational_lens#/media/File:A_Horseshoe_Einstein_Ring_from_Hubble.JPG
Science details:
Cosmologists would like to be able to compare the Observed Universe to the Predicted Universe. This is because there are many constraints (both physical and financial) on how much of the Universe scientists can observe in their experiments. Theoretical predictions should replicate observed phenomena if the models behind them are correct. This poses a particular problem when considering the complex physics involved in large-scale volumes: the computations become extremely expensive when the system is composed of a large number of particles that interact gravitationally and hydrodynamically.
“Uchuu”, for example, is the largest simulation of the Universe that includes these forces, as well as gas interactions, supernovae, formation of metals, and the proportions of matter/dark matter/dark energy. It evolved 2.1 trillion particles in a computational cube with sides 9.63 billion light-years long, cost 20 million CPU hours (the total processor time summed over all cores, equivalent to a single CPU running for 20 million hours), and generated 3 petabytes of data.
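To make the CPU-hours figure concrete, a back-of-the-envelope conversion to wall-clock time can help; note that the parallel core count below is a hypothetical example for illustration, not Uchuu’s actual configuration.

```python
# CPU hours measure total core time, so wall-clock time depends on
# how many cores run in parallel. Core count here is hypothetical.
cpu_hours = 20_000_000
cores = 40_000                      # hypothetical parallel core count
wall_hours = cpu_hours / cores      # 500.0 hours
wall_days = wall_hours / 24         # roughly 21 days
print(wall_hours, wall_days)
```

Even spread across tens of thousands of cores, a simulation of this size runs for weeks, which is why cheaper approximations are so valuable.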
Simulations of the Universe should predict cosmological parameters, such as the densities of dark matter and dark energy, the properties of dark matter and dark energy, how galaxies form, and more. A simulation can be made less computationally expensive by simplifying the system or by implementing deep learning (or both).
Citations and resources:
https://en.wikipedia.org/wiki/Lambda-CDM_model
https://www.youtube.com/watch?v=fXhgMRZjDuM
https://en.wikipedia.org/wiki/CPU_time
https://phys.org/news/2021-09-largest-virtual-universe-free-explore.html
Figures:
Video of Uchuu simulation. Shows large scale structures: cosmic web, filaments, clusters, and supernovae. https://www.youtube.com/watch?v=R7nV6JEMGAo
Science details:
Neural networks are a type of machine learning algorithm made up of artificial neurons (or nodes) that process information. A simple neural network takes an input, passes it through a hidden layer of neurons that each perform a function on it, and passes the result to the output. A “deep” neural network is simply one with many hidden layers. The connections between neurons are given by weights, which are initially set to random numbers; a positive weight indicates an excitatory connection, a negative weight an inhibitory one. The weights are used to linearly combine the inputs to a neuron, and then a non-linear “activation function” controls the amplitude of the output.
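The description above can be sketched in a few lines of NumPy: one forward pass through a network with 3 inputs, a hidden layer of 4 neurons, and 2 outputs (the same shape as the figure below). The layer sizes and the sigmoid activation are illustrative choices, not tied to any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Non-linear activation function that controls the output amplitude."""
    return 1.0 / (1.0 + np.exp(-z))

# Weights start as random numbers; positive entries act as excitatory
# connections, negative entries as inhibitory ones.
W_hidden = rng.standard_normal((4, 3))  # hidden layer: 4 neurons, 3 inputs each
b_hidden = np.zeros(4)
W_out = rng.standard_normal((2, 4))     # output layer: 2 neurons, 4 inputs each
b_out = np.zeros(2)

def forward(x):
    # Each neuron linearly combines its inputs using the weights,
    # then applies the non-linear activation function.
    h = sigmoid(W_hidden @ x + b_hidden)
    return sigmoid(W_out @ h + b_out)

y = forward(np.array([0.5, -1.0, 2.0]))
print(y.shape)  # (2,)
```

Training would then consist of adjusting the weight matrices so the outputs match desired targets.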
Citations and resources:
https://en.wikipedia.org/wiki/Deep_learning
https://en.wikipedia.org/wiki/Neural_network
https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html
Figures:
Left: Simplest neural network. 3 input nodes are shown on the left, which are connected to 4 nodes that make up the hidden layer in the center. The nodes of the hidden layer connect to 2 nodes that make up the output on the right. https://www.edge-ai-vision.com/2015/11/using-convolutional-neural-networks-for-image-recognition/
Right: Venn diagram showing Deep Learning within Machine Learning, which is within Artificial Intelligence. https://levity.ai/blog/difference-machine-learning-deep-learning
Science details:
Convolutional neural networks (CNNs) are a special type of neural network in which some of the hidden layers are “convolutional”, meaning they convolve their inputs before passing information to the next layer. A convolutional layer takes an input, a tensor with dimensions (number of inputs)✕(input height)✕(input width)✕(input channels), to a “feature map” with dimensions (number of inputs)✕(feature map height)✕(feature map width)✕(feature map channels). CNNs may also have “pooling” layers, which reduce the dimensions of the data by combining the outputs of several neurons in a given layer into one neuron in the next layer. Together these layers are responsible for pattern detection, which makes CNNs useful for image analysis, with deeper networks able to perform more sophisticated pattern detection. Each layer receives an input from the previous layer, performs a specific function determined by a “filter” (a small array of weights, plus a bias), and passes the output to the next layer. The algorithm learns by adjusting these weights and biases.
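A minimal sketch of what a single convolutional layer and a pooling layer do to a one-channel image; the 8✕8 input, the hand-written edge filter, and the 2✕2 max pooling are illustrative choices, not from the slides.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the filter (a small grid of weights) across the image,
    taking a weighted sum at each position to produce a feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Pooling: combine each size x size patch of neurons into one,
    reducing the dimensions of the feature map."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.default_rng(1).random((8, 8))   # 8x8 single-channel input
edge_filter = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])           # responds to vertical edges
fmap = convolve2d(image, edge_filter)             # 6x6 feature map
pooled = max_pool(fmap)                           # 3x3 after 2x2 pooling
print(fmap.shape, pooled.shape)
```

In a real CNN the filter weights are learned rather than hand-written, and many filters run in parallel to produce multiple feature-map channels.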
Citations and resources:
https://en.wikipedia.org/wiki/Deep_learning
https://en.wikipedia.org/wiki/Convolutional_neural_network
https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html
Figures:
Schematic of a CNN that classifies images of vehicles. The input image is a photograph of a car. The convolutional layers convolve (green) and pool (orange) the image data. The convolutional layers are followed by fully connected layers (gray). The output class is a set of vehicles (car, bus, train, plane, ship), with “car” highlighted in green. https://www.nvidia.com/en-us/glossary/data-science/computer-vision/
Science details:
A machine learning algorithm makes predictions by building a mathematical model from data. This requires the data to be divided into subsets: training, validation, and test data. The training data is used to fit the model, while the validation data is used to check that the model is not overfitting to the training data (something CNNs in particular are prone to), keeping the model able to make predictions on other datasets. The model is initially trained repeatedly on training data through supervised learning, wherein the inputs are paired with “target” outputs. At each pass over the training data, or “epoch”, the algorithm searches for the optimal parameters of the model (such as neuron weights) using optimization methods like gradient descent or stochastic gradient descent. After each epoch, the fitted model is evaluated on validation data. This step tunes the model’s “hyperparameters”: parameters that are not determined from training but affect the algorithm’s learning (such as the number of hidden layers, the number of neurons in a layer, etc.). Without the validation step, the model can overfit to the training data and be unable to make predictions on other datasets. The final model is then evaluated on test data that it has not seen before; this step determines whether overfitting has occurred.
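The workflow above can be sketched with a toy linear model standing in for a real CNN. The 70/15/15 split, the candidate learning rates, and the epoch count are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)

# Divide the data into training, validation, and test subsets.
X_train, y_train = X[:140], y[:140]
X_val, y_val = X[140:170], y[140:170]
X_test, y_test = X[170:], y[170:]

def mse(w, Xs, ys):
    """Mean squared error of the linear model with weights w."""
    return float(np.mean((Xs @ w - ys) ** 2))

best_w, best_val = None, np.inf
# The learning rate is a hyperparameter: it is tuned using the
# validation data, never the test data.
for lr in (0.01, 0.1):
    w = np.zeros(3)
    for epoch in range(100):            # each pass over the training data
        grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad                  # gradient descent step
    val_err = mse(w, X_val, y_val)      # evaluate on validation data
    if val_err < best_val:
        best_w, best_val = w, val_err

# The final model is evaluated once on unseen test data
# to check whether overfitting has occurred.
print(mse(best_w, X_test, y_test))
```

The key discipline the flowchart below illustrates is the same one enforced here: the test set is touched exactly once, after all tuning is finished.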
Citations and resources:
https://en.wikipedia.org/wiki/Convolutional_neural_network
https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets
https://en.wikipedia.org/wiki/Supervised_learning
Figures:
Flowchart of the process of building a machine learning model using training, validation, and test data. The initial steps are described by a loop: train the model on training data, evaluate the model on validation data, tweak the model according to the results on validation data, (then repeat). The final model is chosen based on which performed best on the validation data. Then results are confirmed on test data. Adapted from https://www.v7labs.com/blog/train-validation-test-set
Science details:
N-body simulations are a way of predicting structure formation in the Universe. The simulations generate data that are snapshots of the simulated Universe at various times, which is computationally expensive and demands large storage space for the produced data. Ho’s group built a deep neural network called the “Deep Density Displacement Model” (D³M) that predicts structure formation of the Universe. It produces accurate results for a fraction of the computing power required by the analytical approximation, and it outperforms 2LPT, the benchmark model widely used by cosmologists as a fast approximation of an N-body simulation.
Citations and resources:
https://www.youtube.com/watch?v=fXhgMRZjDuM
https://www.youtube.com/watch?v=FPExx_jIH7E
https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/
https://arxiv.org/abs/2012.05472
https://arxiv.org/abs/1811.06533
Figures:
3D plot of displacement errors, measured in millions of light-years, of D³M (left) and 2LPT (right). The maximum errors in 2LPT are much higher than in D³M. https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/
Science details:
D³M learns from data that are pre-run numerical simulations which predict the large scale structure of the Universe. The input data are simulations from an analytical approximation: the Zel’dovich Approximation (ZA) which is a model based on perturbation theory. It begins with a grid of N particles and evolves each of them on linear trajectories from their initial displacements. Because it is a simple linear model, it is accurate when the displacements are small. The resulting displacement field (difference between final and initial positions of the particles) from ZA is often used to generate the initial conditions of N-body simulations. The data are 2D slices of 3D boxes of N particles, totalling 10,000 input-output pairs where each pair has the same fundamental cosmological parameters (density of dark matter, amount of dark energy, and so on).
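A minimal one-dimensional sketch of the Zel’dovich idea described above: particles start on a uniform grid and move along straight-line trajectories set by their initial displacement field. The sinusoidal toy displacement field and growth-factor values here are stand-ins for real cosmological initial conditions.

```python
import numpy as np

N = 64
q = np.linspace(0.0, 1.0, N, endpoint=False)   # initial grid positions

# Toy displacement field psi(q); in practice this is drawn from
# the cosmological initial conditions.
psi = 0.01 * np.sin(2 * np.pi * q)

def zeldovich_positions(growth_factor):
    """Each particle moves linearly: x = q + D(t) * psi(q),
    where D(t) is the linear growth factor."""
    return q + growth_factor * psi

x_early = zeldovich_positions(0.1)   # small displacements: ZA is accurate
x_late = zeldovich_positions(10.0)   # large displacements: ZA breaks down

# The displacement field (final minus initial positions) is the
# quantity D3M takes as input.
displacement = x_late - q
print(displacement.max())
```

Because the trajectories are linear in the growth factor, the approximation is cheap but degrades once displacements become large, which is why a full N-body follow-up (or a learned surrogate like D³M) is needed.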
The target output is the displacement field produced by “FastPM”. FastPM takes the same ZA displacement field input and produces an approximate N-body simulation wherein all N particles are evolved under gravity. The resulting displacement field is accurate enough to use as a target output for D³M, meaning D³M learns from the pre-run simulation.
A recent method commonly used in cosmology to approximate N-body simulations is “second order Lagrangian perturbation theory” (2LPT). This is a fast analytical approximation that Ho’s group used as a benchmark to compare with D³M. They found that D³M outperformed the benchmark 2LPT. One of the ways that they evaluated the results was by the errors in the final displacement field, which were calculated with respect to the FastPM displacement field. While the majority of the errors from 2LPT and D³M were close to 0 Mpc, in high-density regions the maximum errors from 2LPT were almost a factor of 10 larger than those from D³M.
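The evaluation described above can be sketched as follows: compare each model’s predicted displacement field against the FastPM target by the length of the per-particle error vectors. The random arrays here are stand-ins for real model outputs, with noise levels chosen only to mimic 2LPT’s larger errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles = 1000

# Stand-in displacement fields: FastPM is treated as ground truth,
# and the two "predictions" differ only in their error level.
target = rng.standard_normal((n_particles, 3))
pred_2lpt = target + 0.5 * rng.standard_normal((n_particles, 3))
pred_d3m = target + 0.05 * rng.standard_normal((n_particles, 3))

def displacement_errors(pred, truth):
    """Per-particle error: length of the difference vector, in the
    same units as the displacements (Mpc/h in the paper)."""
    return np.linalg.norm(pred - truth, axis=1)

err_2lpt = displacement_errors(pred_2lpt, target)
err_d3m = displacement_errors(pred_d3m, target)
print(err_2lpt.max(), err_d3m.max())
```

In the paper the same comparison is made per region, showing that 2LPT’s maximum errors concentrate in high-density regions while D³M’s stay small.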
To produce the training data, they originally chose fixed values for two cosmological parameters used as input for ZA: the primordial amplitude of the scalar perturbations from cosmic inflation and the ratio of matter to total energy density, both of which have unknown exact values and affect the evolution of the large-scale structure of the Universe. They then tried constructing the ZA input with different values of those cosmological parameters, without re-training D³M. The results were surprising: D³M was still able to make accurate predictions, which Ho’s group said is “highly unexpected and remains a mystery”. This means that D³M can generate more simulations without needing added training data, making it even more computationally efficient.
Citations and resources:
https://www.youtube.com/watch?v=fXhgMRZjDuM
https://www.youtube.com/watch?v=FPExx_jIH7E
https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/
https://arxiv.org/abs/2012.05472
https://arxiv.org/abs/1811.06533
Figures:
Fig: 2D particle distributions (top row) and displacement vectors (bottom row) from four models. The colors represent displacement errors (in Mpc/h, h is the Hubble parameter) calculated with respect to the target ground truth from FastPM (far left). From left to right: (a) FastPM, (b) Zel’dovich approximation (ZA), (c) second order Lagrangian perturbation theory (2LPT), (d) deep density displacement model (D³M). The error bars show that high-density regions have higher errors for models (b-d), with D³M having the smallest error bars of the three models. https://arxiv.org/abs/1811.06533
Slides by: Katherine Savard