Here are some non-exhaustive notes on the frontiers of my understanding of various topics. I read about Feynman doing something similar; I’ve found that keeping these notes has been helpful for me since I started in late 2017.

If you know a lot about something that I list here and you want to tell me about it, I’d love to hear your explanation!

Contents:

- Biology
- Computer science
- Economics
- Engineering
- History
- International relations/political science
- Machine learning
- Math
- Paul Christiano’s AI safety stuff
- Philosophy
- Physics
- Reinforcement learning
- Statistical mechanics
- Statistics

## Biology

- The thing in biology I most want to understand is evolutionary dynamics. In particular, I’d love to know more about how evolution compares to gradient descent as an optimization process.
- In particular, Claire tells me that there are various genes that control the rate of mutation, and that organisms can evolve to have higher or lower rates of mutation. This is super interesting and reminds me of adaptively setting learning rates or something.
- (I’d like to consider second-derivative methods as well as first-derivative methods)
- Damon tells me that there are all kinds of other interesting things in evolutionary dynamics that I don’t know about. For example, I don’t know about the main differences between sexual and asexual reproduction and how they affect evolution.
- relatively unimportant:
- What exactly is protein folding?
- What exactly is Adenosine triphosphate (ATP)? How does it work?
- I should more clearly understand how DNA works. It’s something like:
- DNA gets turned into RNA somehow
- You use ribosomes to express the DNA as proteins somehow?
- The base pairs code for proteins by using three consecutive letters to give you one of twenty amino acids. How does this work?
- How does DNA get reproduced?
- How do we sequence DNA?
- What precisely do they do in bioinformatics?

## Computer science

- If I want to hear someone better when they’re talking to me on Skype, does it help if I turn off my video? What about if they turn off theirs?
- Programming language theory, type theory, etc. I would love to learn more about this stuff and would love to spend an evening with someone walking me through doing some basic stuff with it on my computer.
- Haskell stuff that I sort of know:
- Generalised algebraic data types
- Existential quantification
- Rank n types
- Refinement types
- what do these not have that dependent types have
- Lots of higher-kinded types: I’m solid on monads, but need to think more about comonads, applicatives, etc
- Liquid Haskell
- Dependent types—I really would enjoy learning me some Idris or whatever.
- Lambda-pi calculus
- Type theory
- Game of life: Is there biological life inside Game of Life? Eg, self-replicating structures. Also, what is the very large scale, very long term behavior of GoL?
- I don’t know the fast Fourier transform algorithm

## Economics

- Why has economic growth looked exponential?
- or maybe it’s hyperbolic?
- Why is Moore’s law true?

## Engineering

- Materials science
- Why are some things brittle, vs being ductile or malleable? (My guess is that it’s the valence electrons that allow this—as you move the metal, you have to shift it between local minima in the energy of the structure, which is why it resists you moving it, but why it doesn’t shatter or crack.)
- What shapes of crystals do different things have? Why? What effects does this have on their strain resistance etc?
- How do you model strain in materials? I have a guess but I don’t know if it’s correct. (My guess is that you describe the strain energy in the material as something like a quadratic function in the distance it is from equilibrium in some sense.)
- Why is it that if you break an object and then push the two halves together, it doesn’t fuse?
- My guess is that this is because there are tiny little irregularities in the material which don’t match when you put it back together. Or maybe it oxidizes quickly.
- Why don’t cells break when I squeeze my fingers together? Intuitively it seems like they’re small so they should be fragile. I think the intuition is something like “but the force is spread over many many of them”
- Electrical engineering
- I want to understand all of the components of a computer, both functionally (as in, how to model their behavior) and from a physical perspective. Here are some components, but there are probably more:
- capacitor
- resistor
- inductor
- diode
- transistor
- MOSFET?
- DMZ told me that there’s some neat duality between resistance and something else. Or between current and voltage or something. I would love to know what this is.

## History

- It seems like the American internment of Japanese during World War 2 was totally unhelpful for national security (though also not harmful). Why did it happen anyway? Possibilities:
- People in the government thought it was a good idea
- People in the government didn’t really think about whether it would be helpful and were just reacting emotionally to their fear of the Japanese
- It was extremely popular and the government gave in to public pressure
- I wrote about this here. I’m still interested in learning more about how much people talked about the Niihau incident at the time.
- Questions about WW2:
- I would like a better picture of how the strategy and tactics of WW2 warfare works
- Before the rise of the Nazis, how popular was anti-semetism?
- How did Germans feel about the Holocaust when they heard about it?
- How did the Nazis feel about war crimes as they were committing them?
- To what extent did Truman understand that Hiroshima was a city rather than a military base, as is claimed here?
- Why did Americans engage in strategic bombing? How strategically reasonable does this look in retrospect?
- Why didn’t strategic bombing get banned before WW2?
- How did the attitude of the Japanese towards other Asians compare to the attitude of Nazis to Slavs?
- What’s the history of German, Slavic, and Japanese attitudes to other races?
- My impression of WW2 is that the US and UK were pretty morally upstanding compared to other nations, with the main stain on their reputation coming from their bombing of civilians in cities. Am I missing something here?

## International relations/political science

- theoretical IR stuff:
- Why did people argue for realism over neoliberalism?
- How much empirical evidence is there for constructivism?
- How do leaders generally value their own power vs the welfare of their nation?
- How much has the defense-dominance vs offence-dominance of military technology changed over time?
- Is Japan purposefully pretending to have a weaker economy than it actually has, to prevent other countries from imposing tariffs on it out of fear? Someone told me this and I don’t know if it’s true.
- How strong a democracy is America compared to other countries? What metrics could we use? I want this because it would be interesting to see if this allows predictions of future stability. Eg if the probability of a civil war in a given year is equal to one tenth of the probability that the head of the government lies to the judicial branch of the government that year, then we could use the more common event to forecast the less common event. Possible metrics:
- Rate of civil wars/coups
- Rate of rigging elections
- Rate of federal government lying to other branches of government. I know of two cases: Nixon and the sketchiness around Japanese internment. How many more are there?

## Machine learning

- Deep learning
- Adam, RmsProp
- Details of attentional models like the Transformer
- Variational auto encoders
- global average pooling
- Inception score
- Resnet
- Variational inference, evidence lower bound, etc
- I now know what VI is, I think
- Markov chain monte carlo
- I think I now understand this
- SVMs
- What’s the statistical interpretation of logistic regression? It’s something about a maximum likelihood Gaussian or something, but I don’t remember the details.
- What’s the statistical interpretation of an MLP?
- Neural net distillation
- What are the asympototic difference between performance of evolution and gradient descent? (and gradient descent with momentum)
- Hessian-free optimization

Things I don’t know from theoretical ML:

- Jacobian
- the matrix of partial derivatives in a function from vectors to vectors
- Hessian
- square matrix of second order partial derivatives of a function from vectors to scalars
- Kalman filters
- What’s the relationship between Markov random fields and the exponential family
- Marginal polytope?
- the geometric structure which consistent marginal distributions need to be on
- Are phase transitions (from Bethe lattices) related to anything in ML?
- What is the Dirichlet distribution
- It’s like the beta distribution, but for n != 2
- What does “linear Gaussian” mean
- What are a bunch of sufficient statistics for the exponential family?
- more generally, what does “sufficient statistic” mean
- How do you do that MLE thing for exponential distributions in general?
- how does it work for graphical models?
- Can you do some kind of dynamic programming thing where you find the best causal model of a bunch of data by mixing over PGMs of different connectivities?
- How do conjugate priors work in general for the exponential family?
- Naive Bayes
- What’s the connection between Markov random fields and the exponential family?

## Math

- calculus of variations, Lagrange multipliers
- I understood this for a brief period once!
- Lagrange multiplier type things for constrained optimization
- Why is group theory cool? This is embarrassing to admit: I like abstract algebra but I don’t actually know of concrete situations where it’s very helpful.
- Contour integration
- What are the Clebsch Gordan coefficients? How does this work?
- Representation theory
- I don’t know about the different types of infinities
- Category theory
- I should relearn the definitions of sum, product, and exponentials
- i now know sum and product, I think
- Sigma algebras
- What is the proper scoring rule for confidence intervals?
- answer is here but I haven’t read it
- Linear algebra
- I’m sure there’s a whole bunch of conceptual linear algebra points I don’t know.
- Spectral graph theory
- i’d love to learn this material about the spectral graph theory of planar graphs

Here’s my current state of knowledge:

- Calculus
- I am pretty confident that I know how differentiation works.
- I’m happy with multivariate calculus (though I get confused when I try to do things like converting a PDE into radial coordinates)
- I think I’m reasonably capable at understanding ODEs and PDEs, though I’m not good at solving them manually
- Linear algebra
- I know most of the concepts from intro linear algebra, plus perhaps some extra concepts from machine learning or statistics or quantum mechanics
- Abstract algebra
- I know very basic theory of groups, rings, fields, vector spaces, and categories.
- I know what representation theory is but not how to do useful things with it
- Logic
- I know some basics. Eg propositional logic, predicate logic, higher order logics, intuitionistic logic, satisfiability vs consistency, soundness vs completeness. I know what the arithmetical hierarchy is.
- Number theory
- I know very little
- Graph theory
- i know very little
- Real analysis
- I can do basic epsilon delta definitions. I know what metric spaces are.

## Paul Christiano’s AI safety stuff

Concepts I don’t understand

- Meta-execution
- learning the right model
- Annotated functional programming
- benign model-free RL
- benign model-based RL (https://ai-alignment.com/aligned-search-366f983742e9)

Concepts I do understand

- reliability and robustness
- oversight/reward learning
- Two approaches: IRL, learning from human feedback
- deliberation and amplification
- secure
- competitive
- scalable
- adversarial training
- ensembling and consensus
- Against mimicry
- When can mimicry go wrong? robot and blocks example
- Weak and strong HCH
- KWIK learning— knows what it knows

## Philosophy

- Type-A Physicalism

## Physics

- Classical mechanics
- I don’t totally understand Lagrangian mechanics of fields.
- I don’t know much about orbital mechanics
- Legendre transforms
- I think the most important hole is that I don’t really understand where Poisson brackets come from in Hamiltonian mechanics
- Cosmology
- I don’t quite know how we measure some stuff.
- Things I think I know: We know how far away stars are because many of them happen to be a very similar size to each other? (Why is that?) And then we know how fast they’re going because they have hydrogen spectra that is redshifted.
- Why did our solar system form planets instead of remaining a cloud of stuff? Why does mass congeal?
- I should read the first few chapters of the Loeb book about this
- How do we know the age of the universe?
- I think you can guess it by looking at how fast things are moving away from you as a function of how far away they are, and then extrapolating to see when they would have all been in the same place.
- What is the distribution of baryonic matter? This one is absurdly hard to look up.
- Special relativity
- What is the question whose answer is preserved by Lorentz boosts and rotations?
- The literal answer is “what is the time squared minus distance squared between the two events, where time is converted to distance by multiplication by the speed of light”. This follows from wanting the speed of light to appear as the same from all coordinate frames.
- How do spinors change under Lorentz transformations?
- General relativity
- I’m on the path to understanding this; I just need to read the relevant chapters of “The Physical World” more carefully. Example concepts that I am currently shaky on:
- Christoffel symbols
- Gaussian curvature
- What’s the Gaussian curvature of a quadratic bowl
- Riemann curvature tensor
- Einstein tensor
- Einstein field equation
- Stress-energy tensor
- Ricci curvature tensor
- https://en.wikipedia.org/wiki/Scalar_curvature "In general relativity, the scalar curvature is the Lagrangian density for the Einstein–Hilbert action. The Euler–Lagrange equations for this Lagrangian under variations in the metric constitute the vacuum Einstein field equations, and the stationary metrics are known as Einstein metrics. The scalar curvature of an n-manifold is defined as the trace of the Ricci tensor, and it can be defined as n(n − 1) times the average of the sectional curvatures at a point.”
- Linearized Einstein field equations
Gauss-Bonnet theorem

- After I learn more GR, here are some questions I want to be able to answer:
- Given a mass distribution, how do I calculate the geodesic of a light ray? Eg, calculate the path taken by light near a star.
- Also, how do I calculate the geodesic of a small test mass?
- Does the path of an object get closer and closer to the path of light as speed gets closer to c?
- How do I calculate the paths of planets with first-order corrections from GR? (Or second order corrections.)
- How much of a correction do I need in order to see gravitational waves?
- Does GR make sense in 2D?
- Quantum field theory
- I am still learning QFT. I am not currently able to reliably execute QFT calculations using Feynman diagrams, but I’m working on it.
- What’s the 4-momentum of a photon?
- Relatedly, what does the conjugate momentum of the Maxwell field mean?
- Under what conditions does a Lagrangian ensure that the L2 norm of a field stays constant over time
- I think this is just an easy Noether’s theorem application? I think it might be something like “if there’s a phase symmetry in this field, the total amount of thing stays constant”. But I don’t really know.
- What is gauge invariance?
- It’s something like “This field seems physically real, but everything is the same if the field has a constant factor added”. Or something like that. People seem to think it’s very physically important and I don’t understand why.
- Gauge covariant derivatives
- What does it mean when the Hamiltonian “couples” to something?
- It means that there’s a term in the potential energy part of the hamiltonian that relates the two things
- What does positive definiteness mean; why does energy have to be positive definite? P&S p77
- How does statistics arise from spin?
- this is covered in P&S in chapter 4
- Why do you get the Dirac equation for spin 1/2 things, but the Klein-Gordon equation for spin 0 things? What do you get for things with spins other than that? Is there some more general equation that takes a spin and gives you the right equation?
- "The Hamiltonian and Lagrangian which are rather abstract constructions
in classical mechanics get a very simple interpretation in relativistic quantum
mechanics. Both are proportional to the number of phase changes per
unit of time. The Hamiltonian runs over the time axis while the Lagrangian
runs over the trajectory of the moving particle, the t’-axis.” from here
- Why do same-charge particles repel each other in electromagnetism but attract in gravity?
- From a different page: "See Zee (Quantum Field Theory in a nutshell), Chapter 1.5, for a complete discussion.”
- Can you have stable muons inside a white dwarf? I don’t understand the arguments here.
- I want to have a list of all possible types of interaction in QFT, a la the discussion in P&S section 4.1
- Easy: Are the values of the Dirac wavefunction complex or real?
- they’re complex: it’s a bispinor
- Is QFT (eg the Klein-Gordon equation) really second order in time? That seems ugly.
- I don’t think so.
- Nonrelativistic quantum mechanics
- Perturbation theory
- I don’t know higher order perturbation theory—I only understand first order perturbation theory because it’s exactly the same as the variational principle AFAICT
- I should revise the use of time-dependent perturbation theory to predict transition probabilities
- Why do interactions between electrons and photons have to take particles to eigenstates?
- it’s something about how matrices diagonalize or something
- Ladder operators for the harmonic oscillator
- Scattering
- Rotation group as applied to the quantum mechanics of spin
- I think I still don’t really understand the angular momentum stuff in QM.
- I still don’t totally get Slater determinants
- Zeeman effect
- canonical commutation relations—why do some people start their explanation of QM with these?
- Ah, I understand this better now—it naturally follows from the Poisson brackets in Hamiltonian classical mechanics. But I still don’t really know where these come from. In general I don’t understand the Heisenberg picture very well.
- After I understand this properly, I should look at the justification for doing QFT with canonical quantization.
- Why does Galilean relativity lead to conservation of mass?
- Approximation methods:
- Quantum monte carlo methods
- This is an approach where you directly simulate the many-electron system. Instead of trying to explicitly represent the complete many-body wavefunction, you sample it.
- Variational Monte Carlo
- One of the largest gains in accuracy over writing the wave function separably comes from the introduction of the so-called Jastrow factor. In this formulation the wave function assumes independence except that it incorporates a term that depends on the distance between two particles in a configuration whose form is to be determined.
- Diffusion Monte Carlo
- "DMC is potentially numerically exact, meaning that it can find the exact ground state energy within a given error for any quantum system. When actually attempting the calculation, one finds that for bosons, the algorithm scales as a polynomial with the system size, but for fermions, DMC scales exponentially with the system size. This makes exact large-scale DMC simulations for fermions impossible; however, DMC employing a clever approximation known as the fixed-node approximation can still yield very accurate results.”
- Density functional theory
- What exactly is the effective Hamiltonian? Am I right that the electronic density function is a function from position to $$\mathbb{R}$$?
- What behavior is present in the approximations made in DFT in practice? How much of normal chemistry is present?
- Hellmann-Feynman theorem
- I still don’t why things look different. That is: white, black, green, transparent, reflective.
- Part of the story is that a photon can only be absorbed if it corresponds to a gap in energy levels. In glass, the band gap is big and so visible light isn’t energetic enough to knock an electron into an excited state.
- But this would predict that metals would be black. ???
- I think that the search term “electromagnetic response” is part of the key here.
- You can allegedly use the Kubo formula for calculating the linear response of an observable quantity from a perturbation.
- Density matrices
- Condensed matter physics
- i basically don’t know any of this stuff.
- I’m told Fermi liquid theory is cool.
- I was recommended "Fundamentals of Condensed Matter Physics” by Cohen and Louie
- Nuclear physics
- There’s a shell model and a liquid drop model. The shell model takes into account the fact that the nuclei have to occupy states with energy that increases quadratically and capacity that increases according to 2n**2. But these models aren’t super solid. How not accurate are they? Is the inaccuracy just because we don’t know how the strong nuclear force works? How did we try to model them in the 1940s?
- I now understand these better. In particular I understand that the models are semi-empirical—we don’t even try to model the interactions from first principles, we just try to fit empirical parameters into our models to match reality as well as we can.
- How do atomic bombs work? How complicated are the rules required to see what reactions are possible?
- How good is the liquid drop model at predicing all observed binding energies?
- What is enriched uranium?
- Computing stuff
- I don’t understand what negentropy is
- Quantum computing
- I basically don’t remember anything about quantum computing
- Acoustics
- Why does Jeff Kaufman’s pipe sound like it does? What is the physics that tells me how to guess the resonant frequency of any pipe system? What is the analogy between air pipe resonance and circuit resonance?
- Helmholtz resonance
- Misc
- How do lasers work?
- How do LEDs work?
- How do semiconductors work?
- Crazy speculative stuff
- Why don’t magnetic monopoles exist? What would happen if they did?

## Reinforcement learning

- Interesting questions:
- How good at Atari is a randomly initialized conv net with a linear policy?
- If you have your q-function predict immediate reward and returns of a state-action pair separately, does that improve performance (a la distributional RL, duelling Q nets, etc)
- Can you use distributional RL for exploration?
- Distributional RL—which auxillary tasks work well? Does predicting variance and skew work?
- Do people use RL algorithms which actually iterate on hidden state (cf Dario on model-free vs model-based)
- How good are LSTM q-functions on Atari compared to the traditional stacked-frame approach?
- What’s AlphaZero’s elo per watt
- there’s a paper called ReLuplux or something, to do with dual spaces and linear programming and verifying neural nets
- How well does RL do on tasks where linear policies don’t work at all?
- Generalized advantage estimation
- vanishing gradients
- I’ve forgotten the RL stuff I learned before. At some point I should draw a list of connections between RL ideas and things in deep RL.
- Eg td(lambda)
- WRT policy gradients: Finite difference methods, likelihood ratio methods, REINFORCE, natural policy gradients
- Variance reduction, in policy gradients
- local response normalization
- q-prop
- control variate
- How powerful is AlphaZero’s value function
- according to the paper, it’s better than most but not all human experts
- How does DeepMind’s Labyrinth agent work—what algorithm is it running?
- How do RL algorithms output continuous-valued actions?
- Josh told me that Mujoco can mostly be solved with linear policies. What does he mean?
- How exactly is AlphaZero’s network shaped
- How do the actor and critic in A2C differ from the nets in duelling q-nets? Do the actor and critic in A2C usually share non-final layers?
- yes, they usually share non-final layers
- According to this post Deep RL only works well and better than other things when (a) you can run simulations cheaply (so sample efficiency doesn't matter much) but (b) you can't plan against the true model (so you can't use normal control theory or tree search). Do conditions (a) and (b) ever actually obtain in the real world? Alternatively, is this claim just wrong? (HT Daniel Filan)

things formerly on this list:

- linear policies
- I wasn't quite clear on how convolution across multiple timesteps works. Now I understand
- Conv nets—I don’t really understand them very well and should do the tutorial
- I feel a little better now
- Action advantage/Advantage function
- I think I know this now! This is how much better an action is than other actions would be.
- Specifics of convolution: what are the arguments? I should consider rewriting these functions myself.
- Max pooling
- Python’s named_tuple things
- qfunc network vs target network
- Dueling DQN
- Double DQN
- model-free RL
- model-based RL
- batch normalization
- TRPO

## Statistical mechanics

- Equipartition theorem
- Apparently this implies the Dulong-Petit law—the heat capacity of a solid is proportional to 3 times the “gas constant” times the number of molecules. So elements with lower atomic weights have higher heat capacity per unit mass.
- This law overestimates the heat capacity, because nearby molecules have correlated motion.
- Thermal wavelength
- Maximum entropy thermodynamics
- Suppose I have a bunch of oxygen and a bunch of nitrogen in tanks next to each other, at the same temperature. This has lower entropy than the state where the gasses are at diffusive equilibrium. How in general do I extract energy from the process of mixing them?
- I don’t know about the different kind of cycles and stuff
- I have now decided that I don’t care much about this
- Deriving the Sackur-Tetrode equation for entropy from information theory

- "In addition to using the thermodynamic perspective of entropy, the tools of information theory can be used to provide an information perspective of entropy. The Sackur–Tetrode equation for entropy can be derived in information theoretic terms. The equation can be seen to consist of the sum of four entropies (missing information) due to positional uncertainty, momenta uncertainty, the quantum mechanical uncertainty principle and the indistinguishability of the particles.” this sounds pretty dope and I don’t understand it
- Calculating a black hole radius from no-hair and thermodynamics.
- $$dU = T \cdot dS - P \cdot dV + \mu \cdot dN$$
- I am sort of familiar with these facts now
- How do we use fundamental physical models to calculate:
- thermal conductivity
- phase transitions
- Why does a mix of oxygen and nitrogen start to liquify at below the freezing point of either oxygen or nitrogen? (This is explained in 5.4 of Schroeder.)
- I still don’t know anything about how engines and refrigerators work, but I don’t know if I care
- I still don’t quite understand how black body radiation works. I should read Manton and Mee section 10.10 properly.

Things I now understand:

- Some systems have their temperature fall as their energy rises. And allegedly some have their entropy fall as their energy rises? I don’t get this.
- Negative temperatures. For example, a paramagnet with more than half its bits in the odd side of things. These try to give their energy to any finite-temperature system that’s nearby.
- $$\frac1T = \frac{\partial S}{\partial U}$$
- Gibbs free energy is defined as $$G = U + PV - TS$$: this is the amount of energy required to create the thing and make space for it, if you’re allowed to use temperature from the environment.
- What proportion of atoms in a paramagnet have aligned spins, as a function of temperature?
- Enthalpy: the energy of a system plus the work required to make room for it, in an environment with constant pressure. Enthalpy H = U + P V.
- What precisely is a degree of freedom? How come I can’t turn my degree of freedom into more degrees of freedom by interpreting “horizontal position” with some weird ugly function from R -> R^2?
- How do fractional degrees of freedom work?
- The answer to this is that “degrees of freedom” are just an approximation to the underlying reality. What actually matters is all of the definite energy wavefunctions available, and their energy levels. You can’t turn your degree of freedom into more degrees of freedom with a space-filling curve because your degree of freedom was just being approximated as a continuous space and if you apply a space-filling curve then you break the approximation. The reason we bother talking about “degrees of freedom” at all is that often in practice we actually do have degrees of freedom that can be modelled pretty well as uncorrelated. If our degrees of freedom of a system (let’s call them a and b) are "orthogonal", this means that we can write the energy of the system as the sum of a function of a and a function of b.
- Fractional degrees of freedom occur when your energy function can’t be separated out into a function of many of its degrees of freedom.
- An example model of fractional degrees of freedom is the Ising model.
- Imagine you have a cylinder that's a centimeter wide and 1000km tall, with a base here on Earth somewhere. You have some amount of gas in it. How does pressure vary with height in the cylinder?
- I think this is the barometric pressure equation.
- What is the distribution of velocities and positions in this cylinder?
- When we extract energy from increasing entropy (eg by mixing two substances), where does the energy come from? Kinetic energy of the molecules?
- yes

## Statistics

- Time series modelling
- ANOVA
- correlation coefficient
- Variance explained