UCSD HDSI-TILOS "LLM Meets Theory" Workshop 2024

Invited Speakers

(alphabetically ordered)

Pierre Baldi

Distinguished Professor

Information and Computer Sciences

UCI

Sébastien Bubeck

Sr. Principal Research Manager

Machine Learning Foundations

Microsoft Research

David Danks

Professor

HDSI & Philosophy

UCSD

Heng Ji

Professor

Computer Science

UIUC, Amazon Scholar

Nanyun (Violet) Peng

Assistant Professor

Computer Science

UCLA

Dan Roth

Eduardo D. Glandt Distinguished Professor

Computer Science

Amazon AWS AI / UPenn

Terry Sejnowski

Francis Crick Chair

The Salk Institute

Mahdi Soltanolkotabi

Associate Professor

ECE and Computer Science

USC

Click to see the talk and speaker details

Terry Sejnowski

Title: Brains and AI
Abstract: The talk will focus on state space models for transformers.
Bio: Terrence Sejnowski is a pioneer in computational neuroscience and his goal is to understand the principles that link brain to behavior. His laboratory uses both experimental and modeling techniques to study the biophysical properties of synapses and neurons and the population dynamics of large networks of neurons. New computational models and new analytical tools have been developed to understand how the brain represents the world and how new representations are formed through learning algorithms for changing the synaptic strengths of connections between neurons. He has published over 300 scientific papers and 12 books, including The Computational Brain, with Patricia Churchland.

Heng Ji

Title: LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
Abstract: Today's large language models (LLMs) typically train on short text segments (e.g., <4K tokens) due to the quadratic complexity of their Transformer architectures. As a result, their performance suffers drastically on inputs longer than those encountered during training, substantially limiting their applications in real-world tasks involving long contexts such as encoding scientific articles, code repositories, or long dialogues. Through theoretical analysis and empirical investigation, this work identifies three major factors contributing to this length generalization failure. Our theoretical analysis further reveals that commonly used techniques like truncating the attention window or relative positional encodings are inadequate to address them. Answering these challenges, we propose LM-infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts. LM-infinite is highly flexible and can be used with most modern LLMs off-the-shelf. Without any parameter updates, it allows LLMs pre-trained with 2K or 4K-long segments to generalize to up to 200M length inputs while retaining perplexity. It also improves performance on downstream tasks such as Passkey Retrieval and Qasper in the zero-shot setting. LM-infinite brings substantial efficiency improvements: it achieves 2.7 times$ decoding speed up and 7.5 times memory saving over the original model.
Bio: Heng Ji is a professor at Computer Science Department, and an affiliated faculty member at Electrical and Computer Engineering Department and Coordinated Science Laboratory of University of Illinois Urbana-Champaign. She is an Amazon Scholar. She is the Founding Director of Amazon-Illinois Center on AI for Interactive Conversational Experiences (AICE). She received her B.A. and M. A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially on Multimedia Multilingual Information Extraction, Knowledge-enhanced Large Language Models, Knowledge-driven Generation and Conversational AI. She was selected as a Young Scientist to attend the 6th World Laureates Association Forum, and selected to participate in DARPA AI Forward in 2023. She was selected as "Young Scientist" and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. She was named as part of Women Leaders of Conversational AI (Class of 2023) by Project Voice. The other awards she received include "AI's 10 to Watch" Award by IEEE Intelligent Systems in 2013, NSF CAREER award in 2009, PACLIC2012 Best paper runner-up, "Best of ICDM2013" paper award, "Best of SDM2013" paper award, ACL2018 Best Demo paper nomination, ACL2020 Best Demo Paper Award, NAACL2021 Best Demo Paper Award, Google Research Award in 2009 and 2014, IBM Watson Faculty Award in 2012 and 2014 and Bosch Research Award in 2014-2018. She was invited to testify to the U.S. House Cybersecurity, Data Analytics, & IT Committee as an AI expert in 2023. She was invited by the Secretary of the U.S. Air Force and AFRL to join Air Force Data Analytics Expert Panel to inform the Air Force Strategy 2030, and invited to speak at the Federal Information Integrity R&D Interagency Working Group (IIRD IWG) briefing in 2023. She has coordinated the NIST TAC Knowledge Base Population task since 2010. She was the associate editor for IEEE/ACM Transaction on Audio, Speech, and Language Processing, and served as the Program Committee Co-Chair of many conferences including NAACL-HLT2018 and AACL-IJCNLP2022. She is elected as the North American Chapter of the Association for Computational Linguistics (NAACL) secretary 2020-2023.

Pierre Baldi

Title: The Quarks of Attention
Abstract: Attention plays a fundamental role in both natural and artificial intelligence systems. In deep learning, several attention-based neural network architectures have been proposed to tackle problems in natural language processing (NLP) and beyond, including transformer architectures which currently achieve state-of-the-art performance in NLP tasks and power modern large language models (LLMs) In this presentation we will: 1) identify and classify the most fundamental building blocks (quarks) of attention, both within and beyond the standard model of deep learning; 2) identify how these building blocks are used in all current attention-based architectures, including transformers; 3) demonstrate how transformers can effectively be applied to new problems in physics, from particle physics to astronomy; and 4) present a mathematical theory of attention capacity where, paradoxically, one of the main tools in the proofs is itself an attention mechanism. We will showcase some recent applications of LLMs and discuss open questions.
Bio: Pierre Baldi earned MS degrees in Mathematics and Psychology from the University of Paris, and a PhD in Mathematics from the California Institute of Technology. He is currently Distinguished Professor in the Department of Computer Science, Director of the Institute for Genomics and Bioinformatics, and Associate Director of the Center for Machine Learning and Intelligent Systems at the University of California Irvine. The long term focus of his research is on understanding intelligence in brains and machines. He has made several contributions to the theory of AI and deep learning, and developed and applied AI and deep learning methods for the natural sciences, to address problems in physics (e.g., exotic particle detection) , chemistry (e.g., reaction prediction), and bio-medicine (e.g., protein structure prediction, biomedical imaging analysis). He recently published his fifth book: Deep Learning in Science, Cambridge University Press (2021). His honors include the 1993 Lew Allen Award at JPL, the 2010 E. R. Caianiello Prize for research in machine learning, and election to Fellow of the AAAS, AAAI, IEEE, ACM, and ISCB. He has also co-founded several startup companies.

Sébastien Bubeck

Title: Small Language Models
Abstract: How far can you go with a 1B parameters transformer? Turns out, pretty far if you feed it the right data.
Bio: Seb Bubeck used to spend his time thinking about the theory of machine learning. With the advent of miraculous LLMs, he switched to building LLMs to get some insights into how the miracle happens.

Mahdi Soltanolkotabi

Title: Feature Learning in Simple Neural Networks and Prompt-tuned Transformers
Abstract: One of the major transformations in modern learning is that contemporary models trained through gradient descent have the ability to learn versatile representations that can then be applied effectively across a broad range of down-stream tasks. Existing theory however suggests that neural networks, when trained via gradient descent, behave similar to kernel methods that fail to learn representations that can be transferred. In the first part of this talk I will try to bridge this discrepancy by showing that gradient descent on neural networks can indeed learn a broad spectrum of functions that kernel methods struggle with, by acquiring task-relevant representations. In the second part of the talk I will focus on feature learning in prompt-tuning which is an emerging strategy to adapt large language models (LLM) to downstream tasks by learning a (soft-)prompt parameter from data. We demystify how prompt-tuning enables the model to focus attention to context-relevant information/features.
Bio: Mahdi Soltanolkotabi is the director of the center on AI Foundations for the Sciences (AIF4S) at the University of Southern California. He is also an associate professor in the Departments of Electrical and Computer Engineering, Computer Science, and Industrial and Systems engineering where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. Mahdi is the recipient of the Information Theory Society Best Paper Award, Packard Fellowship in Science and Engineering, an NIH Director’s new innovator award, a Sloan Research Fellowship, an NSF Career award, an Airforce Office of Research Young Investigator award (AFOSR-YIP), the Viterbi school of engineering junior faculty research award, and faculty awards from Google and Amazon. His research focuses on developing the mathematical foundations of modern data science via characterizing the behavior and pitfalls of contemporary nonconvex learning and optimization algorithms with applications in deep learning, large scale distributed training, federated learning, computational imaging, and AI for scientific and medical applications.

Nanyun (Violet) Peng

Title: Towards Empowering Large Language Models with Creativity
Abstract: Recent advances in large auto-regressive language models have demonstrated strong results in generating natural languages and significantly improved the performances for applications such as dialogue systems, machine translation, and summarization. However, the auto-regressive paradigm trains models to capture the surface patterns (i.e. sequences of words) following the left-to-right order, instead of capturing underlying semantics and discourse structures. It is also hard to impose structural or content control/contraints to the model. In this talk, I will present our recent works on controllable natural language generation that go beyond the prevalent auto-regressive formulation with the goal to improve controllability and creativity. We propose novel insertion-based generation models and controllable decoding-time algorithms to steer models to better conform to constraints, with applications to creative poetry generation, lyric generation, and keyword-to-text generation.
Bio: Nanyun (Violet) Peng is an Assistant Professor of Computer Science at the University of California, Los Angeles. She received her Ph.D. in Computer Science from Johns Hopkins University, Center for Language and Speech Processing. Her research focuses on the generalizability of NLP models, with applications to creative language generation, low-resource information extraction, and zero-shot cross-lingual transfer. Her works have won the Outstanding Paper Award at NAACL 2022, the Best Paper Awards at AAAI 2022 DLG workshop, EMNLP 2023 PAN-DL workshop, and have been featured at the IJCAI 2022 early career spotlight. Her research has been supported by NSF, DARPA, IARPA, NIH grants and various industrial faculty awards.

Dan Roth

Title: Reasoning Myths about Language Models: What is Next?
Abstract: The rapid progress made over the last few years in generating linguistically coherent natural language has blurred, in the mind of many, the difference between natural language generation, understanding, and the ability to reason with respect to the world. Nevertheless, robust support of high-level decisions that depend on natural language understanding, and one that requires dealing with “truthfulness” are still beyond our capabilities, partly since most of these tasks are very sparse, often require grounding, and may depend on new types of supervision signals. I will discuss some of the challenges underlying reasoning and argue that we should focus on LLMs as orchestrators – coordinating and managing multiple models, applications, and services, as a way to execute complex tasks and processes. I will discuss some of the challenges and present some of our work in this space, focusing on supporting task decomposition and planning.
Bio: Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, a VP/Distinguished Scientist at AWS AI, and a Fellow of the AAAS, the ACM, AAAI, and the ACL. In 2017 Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.” Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory. He was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR), has served as the Program Chair for AAAI, ACL and CoNLL, and as a Conference Chair for a few top conferences. Roth has been involved in several startups; most recently he was a co-founder and chief scientist of NexLP, a startup that leverages the latest advances in Natural Language Processing (NLP), Cognitive Analytics, and Machine Learning in the legal and compliance domains. NexLP was acquired by Reveal in 2020. Prof. Roth received his B.A Summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.

David Danks

Title: The impossibility of governing (current) LLMs
Abstract: LLMs are spreading rapidly across all sectors, resulting in many calls to govern them, whether for safety, public benefit, economic gains, or some other goal. In this talk, I will argue that the standard tools of governance (regulation, post hoc testing, and so forth) will always be inadequate for LLMs as we currently understand them. That is, LLMs simply cannot be governed in the usual ways, at least not without massive changes. I will suggest what some of those changes might be, and also explore how we might move towards more governable LLMs.
Bio: David Danks is Professor of Data Science & Philosophy at University of California, San Diego. He both builds AI systems, and also examines the ethical and policy issues around AI and robotics.

08:00 - 08:45	Arrival and Light Breakfast
08:45 - 09:00	Opening Remarks by Rajesh Gupta (HDSI), Yusu Wang (TILOS), and Misha Belkin
09:00 - 10:00	Terry Sejnowski (The Salk Institute): Brains and AI
10:00 - 11:00	Heng Ji (UIUC, Amazon Scholar): LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
11:00 - 11:30	Break
11:30 - 12:30	Pierre Baldi (UCI): The Quarks of Attention
12:30 - 02:00	Lunch Break
02:00 - 03:00	Sébastien Bubeck (Microsoft Research): Small Language Models
03:00 - 04:00	Mahdi Soltanolkotabi (USC) Feature Learning in Simple Neural Networks and Prompt-tuned Transformers
04:00 - 05:00	Panel Discussion: Heng Ji, Pierre Baldi, and Mahdi Soltanolkotabi, Moderator: Sanjoy Dasgupta (UCSD)

09:30 - 10:00	Arrival
10:00 - 11:00	Nanyun (Violet) Peng (UCLA): Towards Empowering Large Language Models with Creativity
11:00 - 12:00	Dan Roth (Amazon AWS AI / UPenn): Reasoning Myths about Language Models: What is Next?
12:00 - 01:30	Lunch Break
01:30 - 02:30	David Danks (UCSD): The impossibility of governing (current) LLMs
02:30 - 03:30	Panel Discussion: Terry Sejnowski, Nanyun (Violet) Peng, Dan Roth, and David Danks, Moderator: Jingbo Shang (UCSD)

Welcome

Registration

Important Dates

Schedule

Friday, March 15

Saturday, March 16

Invited Speakers

Terry Sejnowski

Heng Ji

Pierre Baldi

Sébastien Bubeck

Mahdi Soltanolkotabi

Nanyun (Violet) Peng

Dan Roth

David Danks

Organizers

Venue

Contact

Welcome

Registration

Important Dates

Schedule

Friday, March 15

Saturday, March 16

Invited Speakers

Terry Sejnowski

Heng Ji

Pierre Baldi

Sébastien Bubeck

Mahdi Soltanolkotabi

Nanyun (Violet) Peng

Dan Roth

David Danks

Organizers

Sponsors

Venue

Contact