Two Sides of ML Research
a place for research. image generated by ideogram
There are two extrema in the continuum of ml research as I see it — structural research and developmental research. Structural research puts a structure on a problem in a way that didn't exist previously. In image generation, this takes the form of GANs, score-based diffusion models, and consistency models. For language, it could be one of the first rlhf papers. In reinforcement learning, more fundamental algorithm research also takes this form. What I call developmental research focuses on further developing existing structure: making diffusion models more efficient, improving RL training for language models, or decreasing the time for generation. And yet, neither of these types of inquiry can live without the other.
what makes structural research?
In my opinion, structural research must be derived from two types of “laws” — either the laws of math or the laws of code. The laws of math are self-explanatory: one can draw upon fundamental ideas from other fields in order to derive a structure in machine learning. A good example of this is score-based diffusion models. Song’s SDE paper builds on ideas from simulated annealing and diffusion processes in mathematics, which gave rise to the fundamental structure of score-based diffusion. The laws of code are more obscure. These are fundamental laws that cannot be written down as mathematical formulas, but rather come at the next-lowest level of abstraction that we have for machine learning. They can be discovered through observation of experiments. To isolate these laws, people have come up with synthetic experiments (like physics of language models or scaling laws for transformers) which observe core properties of the laws of code. Good structural research understands these laws and leverages them to create effective frameworks.
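To make the “laws of math” point concrete, here is a rough sketch of the structure that line of work introduces (my paraphrase in standard notation, not something stated in this post): data is perturbed by a forward SDE,

dx = f(x, t)\,dt + g(t)\,dw,

and samples are generated by running the corresponding reverse-time SDE,

dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\,d\bar{w},

where the score \nabla_x \log p_t(x) is approximated by a neural network. The generative framework falls out of a mathematical fact about reversing diffusion processes — structure derived from the laws of math.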
Most structural research that is done is founded on the laws of math (in fact, at the time of writing, almost all of it is). But as we move to larger models, which are less interpretable, I predict that structural research must also shift toward the laws of code — by understanding key principles and coming up with structures that exploit empirical model strengths, we can make further progress.
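As a simplified example of what a “law of code” looks like (my notation, loosely following the empirical scaling-law literature such as Kaplan et al.): language model loss is observed to fall off roughly as a power law in parameter count,

L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N},

where N_c and \alpha_N are constants fit to experiments rather than derived from first principles. Nothing in the mathematics of transformers forces this form; it is a regularity we only know through observation, yet it shapes how new structure gets designed.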
Structural research is a necessary, but not sufficient, component on the path towards AGI — these new frameworks spur breakthroughs in fundamental thinking, but often fail to achieve the empirical results that we will need. This gives rise to the second face of research: developmental research.
what makes developmental research?
Developmental research builds off of a structure that already exists. Think about an improvement to an RL algorithm, or an efficient inference technique. This type of research starts from an existing structure and refines, or develops, it. Doing good developmental research is also hard. One must focus on correcting existing weaknesses in the aforementioned structure without losing sight of the foundation that underlies it. Beyond that, performance and how the new advancement plays out in the real world are king. The key to good developmental research is finding important empirical (or mathematical) flaws in the problem structure and improving on them. I believe that analysis of these fundamental flaws, whether through intuition, controlled experiments, or derivation, can result in impactful developmental research.
the third side of ml research
There is another, equally important side of ml research that I omitted, because I think it’s orthogonal to the spectrum I just outlined. This is exploratory research. Exploration, as in the natural sciences, is the first step to explaining a phenomenon. Much interpretability research falls into this category. This type of research is a prerequisite to what I see as the structural/developmental cycle. To even begin structural research, you must first understand the behavior. This, as mentioned, is done through the laws of code or the laws of math. But a lot of work goes into building that understanding, and it forms the third side of ml research.
the yin and yang of ml research
Although it may be tempting to lean towards one type of research over the others, one must realize that these directions cannot live without each other. I believe that the path to AGI is not found solely through the next better-than-transformer architecture or the next inference optimization, but through a combined approach, blending all three types of research.
This is also just one way to slice research — this continuum is simply how I place my work in the broader world. There are many other ways, and over my life journey I hope to gain a deeper understanding of them.