Requests for Research 2024


Included below are a few interesting, concrete problems across machine learning that I think are worth exploring further. If you know of anyone working on these requests, or of working implementations of any of them, please point me towards them by email! Consider this far less a definitive set of open problems and more a list of things I want to be thinking about. For the comically inclined, consider this a winter arc post.

Machine Learning:

deep learning

Further exploration into these works on why the ~weights don’t move~ during training, and the implications for pruning, selective quantization, or distillation: https://arxiv.org/abs/2012.02550, https://x.com/jxbz/status/1845146670365016427, https://arxiv.org/abs/1902.06720, https://arxiv.org/abs/2409.20325

Adapting NTK analysis for deep networks, or combining it with sparse autoencoders, to understand how capabilities develop across various training regimes. Would love to have a “profiler” of sorts which informs various curriculum learning strategies based on how weights “saturate” at lower scales (a rough sketch of what I mean is below).
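Purely as a sketch of the kind of profiler I have in mind (nothing here comes from the linked papers; the layer-level relative-displacement metric is just my own first stab), tracking how far each layer’s weights have moved from initialization already gives a crude picture of which layers are saturating:

```python
import torch

def weight_movement_profile(model: torch.nn.Module, init_state_dict: dict) -> dict:
    """Per-layer relative displacement of weights from their initialization.

    Layers whose weights barely move are candidates for pruning or aggressive
    quantization; layers that move a lot are presumably where new capability
    is being written.
    """
    profile = {}
    current = model.state_dict()
    for name, w0 in init_state_dict.items():
        if not torch.is_floating_point(w0):
            continue  # skip integer buffers (e.g. BatchNorm counters)
        w = current[name].float()
        w0 = w0.float()
        profile[name] = ((w - w0).norm() / (w0.norm() + 1e-12)).item()
    return profile

# Hypothetical usage: snapshot at init, train, then inspect which layers moved.
# init_weights = {k: v.detach().clone() for k, v in model.state_dict().items()}
# ... training loop ...
# for name, rel_move in sorted(weight_movement_profile(model, init_weights).items(),
#                              key=lambda kv: kv[1]):
#     print(f"{name:60s} {rel_move:.4f}")
```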

inference soft-weights

I was optimistic about inference-time improvements before o1-style reasoning became all the rage. That belief has been vindicated by in-context learning results, and I think the most personally interesting (and personally accessible) research will come from what can wryly be written off as “prompt optimization.” To the compute wrappers :)

I have a hypothesis: there is some notable relationship between LLM output quality and the perplexity of the input prompt as scored by the LLM itself (i.e., how surprising the prompt would be if the model had written it).

I don’t know what the relationship will be (positively or negatively proportional), but since the transformer is coaxed toward next-token completion even when SFT’d for QA, I imagine the “information density” of the proposal distribution might have an impact on outputs for similarly constrained questions (and this is quite possibly best seen in the varentropy of the generated outputs, I think).
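One cheap way to start poking at this (GPT-2 is just a stand-in here, and the quality metric and correlation test are left entirely open) is to score each prompt’s perplexity under the answering model and correlate it against whatever output-quality measure you trust:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; ideally the same model that answers the prompts
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def prompt_perplexity(prompt: str) -> float:
    """Perplexity the model assigns to the prompt itself (lower = more 'model-like')."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # mean next-token cross-entropy over the prompt
    return torch.exp(loss).item()

# Hypothetical usage: correlate prompt perplexity with output-quality scores.
# prompts, quality_scores = load_eval_set()  # not defined here
# ppls = [prompt_perplexity(p) for p in prompts]
# e.g. scipy.stats.spearmanr(ppls, quality_scores)
```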

More curriculum-learning SFT ought to be done with negative examples on a backspace token (a toy sketch of what such an example might look like is below). Perhaps humans are more sample efficient because we can more precisely recognize when we are wrong... hmm... (A similar problem befell 2021 crypto meme coin valuations’ inability to observe the EMH: a lack of liquidity for cheap short borrowing to calibrate prices.)
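Just to make the shape of the data concrete (the <backspace> token, the one-retraction-per-token convention, and the toy arithmetic example are all assumptions of mine, not an established recipe), a negative-example SFT datum might be built like this:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
# Assumed convention: a dedicated special token that retracts the previous token.
# (The model's embedding matrix would need resizing to match; omitted here.)
tokenizer.add_special_tokens({"additional_special_tokens": ["<backspace>"]})

def make_backspace_example(prompt: str, wrong: str, right: str) -> str:
    """Build an SFT target that shows an error, its retraction, and the fix."""
    n_wrong = len(tokenizer(wrong, add_special_tokens=False).input_ids)
    retraction = " <backspace>" * n_wrong  # one <backspace> per token to retract (assumed semantics)
    return f"{prompt}{wrong}{retraction} {right}"

print(make_backspace_example("Q: What is 17 * 24?\nA: ", "398", "408"))
# -> "Q: What is 17 * 24?\nA: 398 <backspace> ... 408"
#    (the number of <backspace> tokens depends on how the wrong answer tokenizes)
```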

Need to refine this, but: improvements on entropy sampling (if entropy sampling is even valuable). Could the issue be with our definition of “best”? Even though training encourages deterministic sequence generations as being logically correct, maybe we want to encourage sequences with the highest information density, where density isn’t just the inverse of entropy but the delta in the variance of entropy. Seems promising (October 9, 2024). Could also weight varentropy by semantic dissimilarity from the rest of the selected text, but that veers too far.
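For reference, the two quantities this note keeps gesturing at are the entropy of the next-token distribution and its varentropy (the variance of per-token surprisal); here is a sketch of computing both from raw logits, with the actual sampling rule left open:

```python
import torch
import torch.nn.functional as F

def entropy_and_varentropy(logits: torch.Tensor):
    """Entropy and varentropy of the next-token distribution given raw logits [..., vocab]."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    surprisal = -log_probs                                  # per-token surprisal, in nats
    entropy = (probs * surprisal).sum(-1)                   # E[surprisal]
    varentropy = (probs * (surprisal - entropy.unsqueeze(-1)) ** 2).sum(-1)  # Var[surprisal]
    return entropy, varentropy

# Hypothetical use inside a decoding loop: track the step-to-step delta of
# varentropy and use it to bias token or sequence selection -- the exact rule
# is the open question above.
```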

THIS FORMALLY MAKES THIS A WIP POST. Will explore tomorrow and formulate my ideas further.

Note to self: Broadly, anything that explores and better informs our understanding of a model’s representation of something, and how it expands over it, would be nice. This is super broad, and I am going to add a “thank you to Sasha” note after finishing this for refining my thinking.

More for myself:

Conclusion

Finally, if I were going to distill what I know about ML (warning: not a lot!) down to a few key resources I wish I had engaged with sooner, I’d point you to the appendix of Mohri’s Foundations of Machine Learning (really, give it all a shot), Karpathy’s GPT lectures, and Umar Jamil.

Interact with code as much as you can (I should be interacting with code right now; what am I doing writing this blog post?) and make sure you’re trying to do something new ASAP. I still intend to go through transformer-circuits.pub, which is a treasure trove of resources I’m incredibly thankful for.

If any of this was poorly constructed or misguided, please reach out at itsonlysoleil, and if any of it spoke to you (and you live in the SF/Bay Area), you might consider joining our reading group each Saturday at 2:30.