Index

AI

Questions

  • How do we design benchmarks that resist shortcut learning and actually measure how strong a model is at, say, NLP?
  • Should AI progress be made by chasing SOTA on benchmarks, or are there alternative routes?
  • How can we measure the “sentience” of an AI? Parrots are nowhere near as good at language as GPT-3, yet we intuit that they have more consciousness than GPT-3 does.
    • Why is that?
    • Is this intuition correct?
  • Has the AI Spring/Winter pattern cropped up in any other fields? Is this a pattern unique to AI, or is there some underlying cause for this over-optimism?
  • Why is symmetry so prevalent in nature? Can symmetry be harnessed for neural network design?
  • Could you do policy iteration on a neural network? (A minimal tabular sketch follows this list.)
    • How do you adapt it to continuous state and action spaces?
  • What comes after tokenization?
    • The Bitter Lesson will probably consume it and replace it with something lower-level and more learnable.
  • Why is it that a transformer seems infinitely scalable?
  • Why don’t curricula really do anything? It’s mad that a model learns calculus and basic algebra at the same time, and that’s seemingly okay?!
  • How do you combine AI with humans in a productive manner?
  • How do you ground AI in the real world?
  • What is wrong with the classic AI agent formulation?
  • What does AI for love look like?
  • Can we use AI for paper replication and verification, along with finding the papers that a given paper disagrees with?
    • Really, this is like accelerating the Kuhnian revolution: AIs should just accumulate and make extremely clear what the contradictions and problems in a field are.
  • How could AI help us avoid monoculture?
  • Do we care more about Pass@K than about a single attempt (Pass@1)? (See the Pass@K sketch after this list.)
    • This has to do not only with creativity but with error correction: the model should be able to correct itself. It ties into the papers showing that training an LLM on wrong paths that are then corrected is more beneficial than only showing it the right way, because the model learns to backtrack. You have to let it explore and then figure out how to self-correct. What does this look like in text modelling?
  • Can you train a model purely on not outputting the wrong answers (sketched after this list)? Does this converge to the same thing as training on positive examples? What is the difference?
  • Does training on verifiable rewards lead to overall better performance on unverifiable fields as well?
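On the policy-iteration question above, a minimal sketch of classic tabular policy iteration for reference; the toy MDP (its transition matrix P and reward matrix R) is made up for illustration. The open question is which parts of the evaluate/improve loop survive once the value function and policy become neural networks and the state/action spaces become continuous.

```python
# Tabular policy iteration on a tiny, hypothetical 2-state / 2-action MDP.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a, s'] = transition probability, R[s, a] = expected reward (illustrative numbers)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly for the current policy.
    P_pi = P[np.arange(n_states), policy]
    R_pi = R[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to the one-step lookahead values.
    Q = R + gamma * P @ V
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal policy:", policy, "state values:", V)
```

With function approximation, the exact evaluation step becomes fitted evaluation (e.g. TD learning) and the greedy improvement becomes an argmax or policy-gradient update, which is roughly where actor-critic methods sit.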
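On the Pass@K question, a small sketch of the standard unbiased Pass@K estimator used in code-generation evaluations: given n samples per problem of which c are correct, it estimates the probability that at least one of k drawn samples passes. The sample counts in the example are made up.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator: pass@k = 1 - C(n - c, k) / C(n, k), i.e. 1 - P(all k samples fail)."""
    if n - c < k:  # fewer than k failing samples, so at least one success is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical problem where 3 of 20 samples are correct.
print(pass_at_k(20, 3, 1))   # pass@1  = 0.15
print(pass_at_k(20, 3, 10))  # pass@10 ≈ 0.89: many attempts leave room for search and self-correction
```

The gap between Pass@1 and Pass@K is one way to quantify how much headroom is left for exploration, backtracking, and self-correction on top of single-shot generation.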
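On training purely against wrong answers, a minimal sketch in the spirit of unlikelihood-style objectives, assuming PyTorch; the function name and setup are illustrative. The asymmetry with positive training is visible in the loss itself: it penalises probability mass on known-bad tokens but says nothing about where that mass should go instead.

```python
import torch
import torch.nn.functional as F

def negative_only_loss(logits: torch.Tensor, bad_token_ids) -> torch.Tensor:
    """logits: (vocab_size,) for one position; bad_token_ids: tokens known to be wrong."""
    probs = F.softmax(logits, dim=-1)
    # -log(1 - p(bad)) grows as the model assigns mass to bad tokens; unlike cross-entropy
    # on a gold token, it does not specify which of the remaining tokens should gain mass.
    return -torch.log(1.0 - probs[bad_token_ids] + 1e-8).sum()

logits = torch.randn(10)                    # hypothetical 10-token vocabulary
print(negative_only_loss(logits, [3, 7]))
```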

To-read

Bibliography

  • Chiang, T. (2023, February 9). ChatGPT Is a Blurry JPEG of the Web. The New Yorker. https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
  • Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J., Rytting, C., & Wingate, D. (2022). Out of One, Many: Using Language Models to Simulate Human Samples (arXiv:2209.06899). arXiv. http://arxiv.org/abs/2209.06899
  • Benton, G. W., Maddox, W. J., Lotfi, S., & Wilson, A. G. (2021). Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling (arXiv:2102.13042). arXiv. http://arxiv.org/abs/2102.13042
  • Chan, S. C. Y., Santoro, A., Lampinen, A. K., Wang, J. X., Singh, A., Richemond, P. H., McClelland, J., & Hill, F. (2022). Data Distributional Properties Drive Emergent In-Context Learning in Transformers (arXiv:2205.05055). arXiv. http://arxiv.org/abs/2205.05055
  • Cong, Y., & Zhao, M. (2022). Big Learning: A Universal Machine Learning Paradigm? (arXiv:2207.03899). arXiv. http://arxiv.org/abs/2207.03899
  • Delétang, G., Ruoss, A., Grau-Moya, J., Genewein, T., Wenliang, L. K., Catt, E., Hutter, M., Legg, S., & Ortega, P. A. (2022). Neural Networks and the Chomsky Hierarchy (arXiv:2207.02098). arXiv. http://arxiv.org/abs/2207.02098
  • Dohan, D., Xu, W., Lewkowycz, A., Austin, J., Bieber, D., Lopes, R. G., Wu, Y., Michalewski, H., Saurous, R. A., Sohl-dickstein, J., Murphy, K., & Sutton, C. (2022). Language Model Cascades (arXiv:2207.10342). arXiv. http://arxiv.org/abs/2207.10342
  • Ha, D., & Tang, Y. (2022). Collective Intelligence for Deep Learning: A Survey of Recent Developments (arXiv:2111.14377). arXiv. http://arxiv.org/abs/2111.14377
  • Haluptzok, P., Bowers, M., & Kalai, A. T. (2022). Language Models Can Teach Themselves to Program Better (arXiv:2207.14502). arXiv. http://arxiv.org/abs/2207.14502
  • Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D. de L., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., Driessche, G. van den, Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., … Sifre, L. (2022). Training Compute-Optimal Large Language Models (arXiv:2203.15556). arXiv. http://arxiv.org/abs/2203.15556
  • Jaderberg, M., Czarnecki, W. M., Osindero, S., Vinyals, O., Graves, A., Silver, D., & Kavukcuoglu, K. (2017). Decoupled Neural Interfaces using Synthetic Gradients (arXiv:1608.05343). arXiv. http://arxiv.org/abs/1608.05343
  • Lehman, J., Gordon, J., Jain, S., Ndousse, K., Yeh, C., & Stanley, K. O. (2022). Evolution through Large Models (arXiv:2206.08896). arXiv. http://arxiv.org/abs/2206.08896
  • Liu, Z., Kitouni, O., Nolte, N., Michaud, E. J., Tegmark, M., & Williams, M. (2022). Towards Understanding Grokking: An Effective Theory of Representation Learning (arXiv:2205.10343). arXiv. http://arxiv.org/abs/2205.10343
  • McDermott, D. (1976). Artificial intelligence meets natural stupidity. ACM SIGART Bulletin, 57, 4–9. https://doi.org/10.1145/1045339.1045340
  • Power, A., Burda, Y., Edwards, H., Babuschkin, I., & Misra, V. (2022). Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets (arXiv:2201.02177). arXiv. http://arxiv.org/abs/2201.02177
  • Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., Clopath, C., Costa, R. P., de Berker, A., Ganguli, S., Gillon, C. J., Hafner, D., Kepecs, A., Kriegeskorte, N., Latham, P., Lindsay, G. W., Miller, K. D., Naud, R., Pack, C. C., … Kording, K. P. (2019). A deep learning framework for neuroscience. Nature Neuroscience, 22(11), 1761–1770. https://doi.org/10.1038/s41593-019-0520-2
  • Sejnowski, T. (2022). Large Language Models and the Reverse Turing Test (arXiv:2207.14382). arXiv. http://arxiv.org/abs/2207.14382
  • Tay, Y., Dehghani, M., Abnar, S., Chung, H. W., Fedus, W., Rao, J., Narang, S., Tran, V. Q., Yogatama, D., & Metzler, D. (2022). Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? (arXiv:2207.10551). arXiv. http://arxiv.org/abs/2207.10551
  • Sutton, R. (2019, March 13). The Bitter Lesson. Retrieved September 30, 2021, from http://www.incompleteideas.net/IncIdeas/BitterLesson.html
  • Vogelstein, J. T., Verstynen, T., Kording, K. P., Isik, L., Krakauer, J. W., Etienne-Cummings, R., Ogburn, E. L., Priebe, C. E., Burns, R., Kutten, K., Knierim, J. J., Potash, J. B., Hartung, T., Smirnova, L., Worley, P., Savonenko, A., Phillips, I., Miller, M. I., Vidal, R., … Yang, W. (2022). Prospective Learning: Back to the Future (arXiv:2201.07372). arXiv. http://arxiv.org/abs/2201.07372
  • Zador, A. M. (2019). A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications, 10(1), 3770. https://doi.org/10.1038/s41467-019-11786-6
  • Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization (arXiv:1710.09412). arXiv. http://arxiv.org/abs/1710.09412