Download the slides with speaker notes here:
Preview them as PDF:
General speaker notes
Shirley Ho is a cosmologist and astrophysicist who studies dark matter, dark energy, and cosmological models. She recently led her team in developing a deep learning algorithm called the “Deep Density Displacement Model” (D³M) that predicts structure formation of the Universe. D³M generates complex 3D cosmological simulations by evolving dark matter particles under gravity. The model produces accurate results, even outperforming “second order Lagrangian perturbation theory” (2LPT), a recent fast approximation widely used by cosmologists, in terms of error bars, statistics, and computational cost.
Ho received her bachelor’s degree in Physics and Computer Science from the University of California, Berkeley. As an undergraduate she completed several thesis projects, first researching particle physics and then working on weak lensing of the Cosmic Microwave Background; she wrote two papers in cosmology as a senior. Ho received her PhD in astrophysical sciences from Princeton University, where she wrote her thesis, “Baryons, Universe and Everything Else in Between”, under the supervision of David Spergel. From 2008 to 2012 Ho held a postdoctoral position at the Lawrence Berkeley Laboratory as a Chamberlain fellow and Seaborg fellow. She joined Carnegie Mellon University in 2011 as an assistant professor, becoming the Cooper-Siegel Assistant Professor in 2015 and then an Associate Professor with tenure in 2016. In the same year, she joined Lawrence Berkeley Lab as a Senior Scientist. In 2018, she became the leader of the Cosmology X Data Science group at the Center for Computational Astrophysics (CCA) at the Flatiron Institute. She holds faculty positions at New York University and Carnegie Mellon University and was named the Interim Director of the CCA in 2021. She is the recipient of several awards, including the NASA Group Achievement Award (2011), the Macronix Prize: The Outstanding Young Researcher Award from the International Organization of Chinese Physicists and Astronomers (2014), the Carnegie Science Award (2015), and the International Astrostatistics Association Fellowship (2020).
Dark energy is the name given to the unknown force that drives cosmic expansion. The first evidence of dark energy came from observations of distant supernovae, which showed that the Universe does not expand at a constant rate. Supernovae are used as “standard candles”: astronomical objects whose intrinsic brightness at the source is known. Measuring the rate of cosmic expansion involves comparing the light’s observed brightness, its known brightness at the source, and its recorded redshift. Recently, cosmologists have begun to use gravitational wave sources as “standard sirens” to directly determine the rate of cosmic expansion. Soares-Santos searches for gravitational wave-emitting events and also employs traditional probes such as galaxy clusters and gravitational lensing to further the understanding of the accelerated expansion of the Universe.
Slide-specific speaker notes:
Science details:
In the early 1900s, the Universe was believed to be composed entirely of normal (baryonic) matter, meaning matter made up of protons, neutrons, etc. By the 1970s, it was thought that the Universe’s matter was about 85% dark matter and 15% baryonic matter. Dark matter is believed to interact with baryonic matter only gravitationally, warping space with its mass; it does not appear to absorb, reflect, or emit electromagnetic radiation. Observations of galaxies show that their inferred masses are about ten times larger than the mass accounted for by their visible components (stars, dust, and gas). The additional mass is attributed to dark matter, a picture confirmed by observations of gravitational lensing. Lensing was first predicted by Einstein’s theory of general relativity: mass distorts the space surrounding it, causing effects like “Einstein rings”, where light’s trajectory is bent so that its source appears to be in a different location. Currently, the Universe is believed to be 68% dark energy, 27% dark matter, and 5% baryonic matter. In the 1990s, observations of supernovae were the first indicators of the existence of dark energy: they showed that the Universe’s expansion is accelerating, when it was previously thought that the expansion should decelerate over time. Dark energy describes the unknown force that causes this acceleration, and it affects the large-scale structure of the Universe. The nature of dark matter and dark energy is not well understood, so many experiments in cosmology and astrophysics are designed to observe these features. From these observations, scientists can analyze the data and create cosmological models.
Citations and resources:
https://wmap.gsfc.nasa.gov/universe/uni_matter.html
https://www.youtube.com/watch?v=fXhgMRZjDuM
https://en.wikipedia.org/wiki/Dark_energy
https://en.wikipedia.org/wiki/Dark_matter
https://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy
Figures:
Left: Pie chart of the energy density of the Universe today. The Universe is 4.6% baryonic matter, 24% dark matter, and 71.4% dark energy. https://wmap.gsfc.nasa.gov/universe/uni_matter.html
Right: Image of a gravitational lens mirage taken by the Hubble Space Telescope's Wide Field Camera 3. A luminous red galaxy is pictured in the center. Its mass distorts the light from a distant blue galaxy to form a ring (Einstein ring) around it. https://en.wikipedia.org/wiki/Gravitational_lens#/media/File:A_Horseshoe_Einstein_Ring_from_Hubble.JPG
Science details:
Cosmologists would like to be able to compare the Observed Universe to the Predicted Universe. This is because there are many constraints (both physical and financial) on how much of the Universe scientists can observe in their experiments. Theoretical predictions should replicate observed phenomena if the models behind them are correct. This poses a particular problem when considering the complex physics involved in large-scale volumes: the computations become extremely expensive when the system is composed of a large number of particles that interact gravitationally and hydrodynamically.
“Uchuu”, for example, is the largest simulation of the Universe that includes these forces, as well as gas interactions, supernovae, formation of metals, and the proportions of matter/dark matter/dark energy. It evolved 2.1 trillion particles in a computational cube with sides 9.63 billion light-years long, cost 20 million CPU hours (the total processor time summed over all cores, equivalent to a single CPU running for 20 million hours), and generated 3 petabytes of data.
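To make the CPU-hours figure concrete, a back-of-the-envelope conversion to wall-clock time can help; note that the parallel core count below is a hypothetical example for illustration, not Uchuu’s actual configuration.

```python
# CPU hours measure total core time, so wall-clock time depends on
# how many cores run in parallel. Core count here is hypothetical.
cpu_hours = 20_000_000
cores = 40_000                      # hypothetical parallel core count
wall_hours = cpu_hours / cores      # 500.0 hours
wall_days = wall_hours / 24         # roughly 21 days
print(wall_hours, wall_days)
```

Even spread across tens of thousands of cores, a simulation of this size runs for weeks, which is why cheaper approximations are so valuable.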
Simulations of the Universe should predict cosmological parameters, such as the densities of dark matter and dark energy, the properties of dark matter and dark energy, how galaxies form, and more. A simulation can be made less computationally expensive by simplifying the system or by implementing deep learning (or both).
Citations and resources:
https://en.wikipedia.org/wiki/Lambda-CDM_model
https://www.youtube.com/watch?v=fXhgMRZjDuM
https://en.wikipedia.org/wiki/CPU_time
https://phys.org/news/2021-09-largest-virtual-universe-free-explore.html
Figures:
Video of Uchuu simulation. Shows large scale structures: cosmic web, filaments, clusters, and supernovae. https://www.youtube.com/watch?v=R7nV6JEMGAo
Science details:
Neural networks are a type of machine learning algorithm made up of artificial neurons (or nodes) that process information. A simple neural network takes an input, passes it through a hidden layer of neurons that each perform a function on it, and passes the result to the output. A “deep” neural network is simply one with many hidden layers. The connections between neurons are given by weights, which are initially set to random numbers; a positive weight indicates an excitatory connection, a negative weight an inhibitory one. The weights are used to linearly combine the inputs to a neuron, and then a non-linear “activation function” controls the amplitude of the output.
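The description above can be sketched in a few lines of NumPy: one forward pass through a network with 3 inputs, a hidden layer of 4 neurons, and 2 outputs (the same shape as the figure below). The layer sizes and the sigmoid activation are illustrative choices, not tied to any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Non-linear activation function that controls the output amplitude."""
    return 1.0 / (1.0 + np.exp(-z))

# Weights start as random numbers; positive entries act as excitatory
# connections, negative entries as inhibitory ones.
W_hidden = rng.standard_normal((4, 3))  # hidden layer: 4 neurons, 3 inputs each
b_hidden = np.zeros(4)
W_out = rng.standard_normal((2, 4))     # output layer: 2 neurons, 4 inputs each
b_out = np.zeros(2)

def forward(x):
    # Each neuron linearly combines its inputs using the weights,
    # then applies the non-linear activation function.
    h = sigmoid(W_hidden @ x + b_hidden)
    return sigmoid(W_out @ h + b_out)

y = forward(np.array([0.5, -1.0, 2.0]))
print(y.shape)  # (2,)
```

Training would then consist of adjusting the weight matrices so the outputs match desired targets.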
Citations and resources:
https://en.wikipedia.org/wiki/Deep_learning
https://en.wikipedia.org/wiki/Neural_network
https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html
Figures:
Left: Simplest neural network. 3 input nodes are shown on the left, which are connected to 4 nodes that make up the hidden layer in the center. The nodes of the hidden layer connect to 2 nodes that make up the output on the right. https://www.edge-ai-vision.com/2015/11/using-convolutional-neural-networks-for-image-recognition/
Right: Venn diagram showing Deep Learning within Machine Learning, which is within Artificial Intelligence. https://levity.ai/blog/difference-machine-learning-deep-learning
Science details:
Convolutional neural networks (CNNs) are a special type of neural network in which some of the hidden layers are “convolutional”, meaning they convolve their inputs before passing information to the next layer. A convolutional layer takes an input, a tensor with dimensions (number of inputs)✕(input height)✕(input width)✕(input channels), to a “feature map” with dimensions (number of inputs)✕(feature map height)✕(feature map width)✕(feature map channels). CNNs may also have “pooling” layers, which reduce the dimensions of the data by combining the outputs of several neurons in a given layer into one neuron in the next layer. Together these layers are responsible for pattern detection, which makes CNNs useful for image analysis, with deeper networks able to perform more sophisticated pattern detection. Each layer receives an input from the previous layer, performs a specific function determined by a “filter” (a small array of weights, plus a bias), and passes the output to the next layer. The algorithm learns by adjusting these weights and biases.
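A minimal sketch of what a single convolutional layer and a pooling layer do to a one-channel image; the 8✕8 input, the hand-written edge filter, and the 2✕2 max pooling are illustrative choices, not from the slides.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the filter (a small grid of weights) across the image,
    taking a weighted sum at each position to produce a feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Pooling: combine each size x size patch of neurons into one,
    reducing the dimensions of the feature map."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.default_rng(1).random((8, 8))   # 8x8 single-channel input
edge_filter = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])           # responds to vertical edges
fmap = convolve2d(image, edge_filter)             # 6x6 feature map
pooled = max_pool(fmap)                           # 3x3 after 2x2 pooling
print(fmap.shape, pooled.shape)
```

In a real CNN the filter weights are learned rather than hand-written, and many filters run in parallel to produce multiple feature-map channels.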
Citations and resources:
https://en.wikipedia.org/wiki/Deep_learning
https://en.wikipedia.org/wiki/Convolutional_neural_network
https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html
Figures:
Schematic of a CNN that classifies images of vehicles. The input image is a photograph of a car. The convolutional layers convolve (green) and pool (orange) the image data. The convolutional layers are followed by fully connected layers (gray). The output class is a set of vehicles (car, bus, train, plane, ship), with “car” highlighted in green. https://www.nvidia.com/en-us/glossary/data-science/computer-vision/
Science details:
A machine learning algorithm makes predictions by building a mathematical model from data. This requires the data to be divided into subsets: training, validation, and test data. The training data is used to fit the model, while the validation data is used to check that the model is not overfitting to the training data (something CNNs in particular are prone to), keeping the model able to make predictions on other datasets. The model is initially trained repeatedly on training data through supervised learning, wherein the inputs are paired with “target” outputs. At each pass over the training data, or “epoch”, the algorithm searches for the optimal parameters of the model (such as neuron weights) using optimization methods like gradient descent or stochastic gradient descent. After each epoch, the fitted model is evaluated on validation data. This step tunes the model’s “hyperparameters”: parameters that are not determined from training but affect the algorithm’s learning (such as the number of hidden layers, the number of neurons in a layer, etc.). Without the validation step, the model can overfit to the training data and be unable to make predictions on other datasets. The final model is then evaluated on test data that it has not seen before; this step determines whether overfitting has occurred.
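The workflow above can be sketched with a toy linear model standing in for a real CNN. The 70/15/15 split, the candidate learning rates, and the epoch count are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)

# Divide the data into training, validation, and test subsets.
X_train, y_train = X[:140], y[:140]
X_val, y_val = X[140:170], y[140:170]
X_test, y_test = X[170:], y[170:]

def mse(w, Xs, ys):
    """Mean squared error of the linear model with weights w."""
    return float(np.mean((Xs @ w - ys) ** 2))

best_w, best_val = None, np.inf
# The learning rate is a hyperparameter: it is tuned using the
# validation data, never the test data.
for lr in (0.01, 0.1):
    w = np.zeros(3)
    for epoch in range(100):            # each pass over the training data
        grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad                  # gradient descent step
    val_err = mse(w, X_val, y_val)      # evaluate on validation data
    if val_err < best_val:
        best_w, best_val = w, val_err

# The final model is evaluated once on unseen test data
# to check whether overfitting has occurred.
print(mse(best_w, X_test, y_test))
```

The key discipline the flowchart below illustrates is the same one enforced here: the test set is touched exactly once, after all tuning is finished.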
Citations and resources:
https://en.wikipedia.org/wiki/Convolutional_neural_network
https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets
https://en.wikipedia.org/wiki/Supervised_learning
Figures:
Flowchart of the process of building a machine learning model using training, validation, and test data. The initial steps are described by a loop: train the model on training data, evaluate the model on validation data, tweak the model according to the results on validation data, (then repeat). The final model is chosen based on which performed best on the validation data. Then results are confirmed on test data. Adapted from https://www.v7labs.com/blog/train-validation-test-set
Science details:
N-body simulations are a way of predicting structure formation in the Universe. The simulations generate data that are snapshots of the simulated Universe at various times, which is computationally expensive and demands large storage space for the produced data. Ho’s group built a deep neural network called the “Deep Density Displacement Model” (D³M) that predicts structure formation of the Universe. It produces accurate results for a fraction of the computing power required by the analytical approximation, and it outperforms 2LPT, the benchmark model widely used by cosmologists as a fast approximation of an N-body simulation.
Citations and resources:
https://www.youtube.com/watch?v=fXhgMRZjDuM
https://www.youtube.com/watch?v=FPExx_jIH7E
https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/
https://arxiv.org/abs/2012.05472
https://arxiv.org/abs/1811.06533
Figures:
3D plot of displacement errors, measured in millions of light-years, of D³M (left) and 2LPT (right). The maximum errors in 2LPT are much higher than in D³M. https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/
Science details:
D³M learns from data that are pre-run numerical simulations which predict the large scale structure of the Universe. The input data are simulations from an analytical approximation: the Zel’dovich Approximation (ZA) which is a model based on perturbation theory. It begins with a grid of N particles and evolves each of them on linear trajectories from their initial displacements. Because it is a simple linear model, it is accurate when the displacements are small. The resulting displacement field (difference between final and initial positions of the particles) from ZA is often used to generate the initial conditions of N-body simulations. The data are 2D slices of 3D boxes of N particles, totalling 10,000 input-output pairs where each pair has the same fundamental cosmological parameters (density of dark matter, amount of dark energy, and so on).
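A minimal one-dimensional sketch of the Zel’dovich idea described above: particles start on a uniform grid and move along straight-line trajectories set by their initial displacement field. The sinusoidal toy displacement field and growth-factor values here are stand-ins for real cosmological initial conditions.

```python
import numpy as np

N = 64
q = np.linspace(0.0, 1.0, N, endpoint=False)   # initial grid positions

# Toy displacement field psi(q); in practice this is drawn from
# the cosmological initial conditions.
psi = 0.01 * np.sin(2 * np.pi * q)

def zeldovich_positions(growth_factor):
    """Each particle moves linearly: x = q + D(t) * psi(q),
    where D(t) is the linear growth factor."""
    return q + growth_factor * psi

x_early = zeldovich_positions(0.1)   # small displacements: ZA is accurate
x_late = zeldovich_positions(10.0)   # large displacements: ZA breaks down

# The displacement field (final minus initial positions) is the
# quantity D3M takes as input.
displacement = x_late - q
print(displacement.max())
```

Because the trajectories are linear in the growth factor, the approximation is cheap but degrades once displacements become large, which is why a full N-body follow-up (or a learned surrogate like D³M) is needed.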
The target output is the displacement field produced by “FastPM”. FastPM takes the same ZA displacement field input and produces an approximate N-body simulation wherein all N particles are evolved under gravity. The resulting displacement field is accurate enough to use as a target output for D³M, meaning D³M learns from the pre-run simulation.
A recent method commonly used in cosmology to approximate N-body simulations is “second order Lagrangian perturbation theory” (2LPT). This is a fast analytical approximation that Ho’s group used as a benchmark to compare with D³M. They found that D³M outperformed the benchmark 2LPT. One of the ways that they evaluated the results was by the errors in the final displacement field, which were calculated with respect to the FastPM displacement field. While the majority of the errors from 2LPT and D³M were close to 0 Mpc, in high-density regions the maximum errors from 2LPT were almost a factor of 10 larger than those from D³M.
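The evaluation described above can be sketched as follows: compare each model’s predicted displacement field against the FastPM target by the length of the per-particle error vectors. The random arrays here are stand-ins for real model outputs, with noise levels chosen only to mimic 2LPT’s larger errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles = 1000

# Stand-in displacement fields: FastPM is treated as ground truth,
# and the two "predictions" differ only in their error level.
target = rng.standard_normal((n_particles, 3))
pred_2lpt = target + 0.5 * rng.standard_normal((n_particles, 3))
pred_d3m = target + 0.05 * rng.standard_normal((n_particles, 3))

def displacement_errors(pred, truth):
    """Per-particle error: length of the difference vector, in the
    same units as the displacements (Mpc/h in the paper)."""
    return np.linalg.norm(pred - truth, axis=1)

err_2lpt = displacement_errors(pred_2lpt, target)
err_d3m = displacement_errors(pred_d3m, target)
print(err_2lpt.max(), err_d3m.max())
```

In the paper the same comparison is made per region, showing that 2LPT’s maximum errors concentrate in high-density regions while D³M’s stay small.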
To produce the training data, they originally chose fixed values for two cosmological parameters used as input for ZA: the primordial amplitude of the scalar perturbations from cosmic inflation and the ratio of matter to total energy density, both of which have unknown exact values and affect the evolution of the large-scale structure of the Universe. They then tried constructing the ZA input with different values of those cosmological parameters, without re-training D³M. The results were surprising: D³M was still able to make accurate predictions, which Ho’s group said is “highly unexpected and remains a mystery”. This means that D³M can generate more simulations without needing added training data, making it even more computationally efficient.
Citations and resources:
https://www.youtube.com/watch?v=fXhgMRZjDuM
https://www.youtube.com/watch?v=FPExx_jIH7E
https://www.simonsfoundation.org/2019/06/26/ai-universe-simulation/
https://arxiv.org/abs/2012.05472
https://arxiv.org/abs/1811.06533
Figures:
Fig: 2D particle distributions (top row) and displacement vectors (bottom row) from four models. The colors represent displacement errors (in Mpc/h, h is the Hubble parameter) calculated with respect to the target ground truth from FastPM (far left). From left to right: (a) FastPM, (b) Zel’dovich approximation (ZA), (c) second order Lagrangian perturbation theory (2LPT), (d) deep density displacement model (D³M). The error bars show that high-density regions have higher errors for models (b-d), with D³M having the smallest error bars of the three models. https://arxiv.org/abs/1811.06533
Slides by: Katherine Savard