I am a Staff Research Scientist at Abridge AI, applying my research in multilingual large language models to reinvent healthcare one conversation at a time. Previously, I was a Senior Research Scientist at Meta AI, working on natural language processing research. My current academic research focuses on understanding how foundation models learn.
Prior to Meta, I worked on information retrieval, machine translation, and speech recognition as an Applied Scientist at Amazon (AWS) AI. At Amazon, I worked with a group of wonderful scientists whose creativity and enthusiasm are the primary reasons why I have not abandoned society to peddle art NFTs from a sizeable digital estate in the Fashion Street district of Decentraland.
I also worked as a Software Engineer at Yahoo and obtained my MS degree in Computer Science from UC San Diego, where I was advised by Prof. Gary Cottrell. During my undergrad, I was fortunate enough to work with Prof. Michael J. Tarr, exploring how humans perceive faces and scenes.
I am interested in:
- Robust Machine Learning, in particular models that are robust to noise and distribution shift.
- ML for Social Good, in particular tackling issues in fairness, bias, and misinformation.
- Multilingual Systems, with a focus on zero-shot transfer to low-resource languages.
- Foundation Models, Generative AI, and unsupervised methods for representation learning.
Please send all research and job-related inquiries to davisblaine.liang(at)gmail.com.
| Date | News |
| --- | --- |
| Sep 2, 2023 | We are releasing the Belebele dataset, a first-of-its-kind multilingual reading comprehension dataset spanning 122 language variants, 27 language families, and 29 scripts. [Paper] [Github] [Tweet] |
| Aug 28, 2023 | I had a great time chatting with the New York Times about generative AI and the role of ML talent in supercharging the field of healthcare. |
| Apr 2, 2023 | I’m excited to announce that I’m joining Abridge AI to work on reinventing healthcare for doctors and patients alike! |
| Jan 28, 2023 | We are releasing XLM-V, a multilingual model with a 1-million-token vocabulary [Link]. The model is also open-sourced in HuggingFace Transformers. |
| Feb 22, 2022 | After four years at Amazon, I’ll be moving on to a new role. I’ll officially be joining Meta AI (formerly Facebook AI) as a Senior Research Scientist in March! |
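Both releases live on the Hugging Face Hub. Below is a minimal sketch of how one might load them, assuming the hub IDs `facebook/belebele` and `facebook/xlm-v-base`, the `eng_Latn` config, the `test` split, and the field names shown in the comments; see the linked paper and GitHub pages for authoritative usage.

```python
# Minimal sketch, not official quickstart code: loading the two releases
# above from the Hugging Face Hub. Hub IDs, config/split names, and field
# names are assumptions based on the public release pages.
from datasets import load_dataset
from transformers import pipeline

# Belebele: multiple-choice machine reading comprehension,
# with one config per language variant (e.g. "eng_Latn").
belebele = load_dataset("facebook/belebele", "eng_Latn", split="test")
sample = belebele[0]
print(sample["question"])               # assumed field name
print(sample["flores_passage"][:120])   # passages are drawn from FLORES

# XLM-V: masked language model with a 1-million-token multilingual vocabulary.
fill_mask = pipeline("fill-mask", model="facebook/xlm-v-base")
for pred in fill_mask("Paris is the <mask> of France.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```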
Please refer to my Google Scholar profile for a full list of publications.
- XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models. EMNLP 2023.
- The BELEBELE Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants. arXiv preprint arXiv:2308.16884, 2023.
- Attention-Guided Generative Models for Extractive Question Answering. arXiv preprint arXiv:2110.06393, 2021.
- Embedding-Based Zero-Shot Retrieval through Query Generation. arXiv preprint arXiv:2009.10270, 2020.
- Masked Language Model Scoring. ACL 2020.
- Improve Transformer Models with Better Relative Position Embeddings. Findings of EMNLP 2020.
- Decoding and Diversity in Machine Translation. NeurIPS Resistance AI Workshop 2020.
- Learning Noise-Invariant Representations for Robust Speech Recognition. IEEE SLT, 2020.
- Deep Automated Multi-Task Learning. IJCNLP 2017.