Davis Liang

Staff Research Scientist at Abridge AI (Formerly Meta AI, Amazon AI)


I am a Staff Research Scientist at Abridge AI, working on applying my research in multilingual large language models to reinvent healthcare one conversation at a time. Previously, I was a Senior Research Scientist at Meta AI, working on natural language processing research. My current academic pursuits involve understanding the underpinnings of how foundation models learn.

Prior to Meta, I worked on information retrieval, machine translation, and speech recognition as an Applied Scientist at Amazon (AWS) AI. At Amazon, I worked with a group of wonderful scientists whose creativity and enthusiasm are the primary reasons why I have not abandoned society to peddle art NFTs from a sizeable digital estate in Fashion Street district, Decentraland.

I also worked as a Software Engineer at Yahoo and obtained my MS degree in Computer Science from UC San Diego, where I was advised by Prof. Gary Cottrell. During my undergrad, I was fortunate enough to work with Prof. Michael J Tarr exploring how humans perceive faces and scenes.

Research Interests

I am interested in:

  • Robust Machine Learning, in particular machine learning models that are robust to noise and distribution shift.
  • ML for Social Good, in particular tackling issues in fairness, bias, and misinformation.
  • Multilingual Systems, with a focus on zero-shot transfer to low-resource languages.
  • Foundation Models, Generative AI, and unsupervised methods for representation learning.


Please send all research and job-related inquiries to davisblaine.liang(at)gmail.com.


Sep 2, 2023 We are releasing the Belebele dataset, a first-of-its-kind multilingual reading comprehension dataset spanning 122 language variants, 27 language families, and 29 scripts. [Paper] [Github] [Tweet]
Aug 28, 2023 I had a great time chatting with the New York Times about generative AI and the role of ML talent in supercharging the field of healthcare.
Apr 2, 2023 I’m excited to announce that I’m joining Abridge AI to work on reinventing healthcare for doctors and patients alike!
Jan 28, 2023 We are releasing XLM-V, a multilingual model with a 1 million token vocabulary [Link]. The model is also open-sourced in HuggingFace Transformers.
Feb 22, 2022 After four years at Amazon, I’ll be moving on to a new role. I’ll officially be joining Meta AI (formerly Facebook AI) as a Senior Research Scientist in March!

Selected Publications

Please refer to my Google Scholar for a full list of publications.

(*=equal contribution)

  1. Arxiv
    XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models
    Liang, Davis, Gonen, Hila, Mao, Yuning, Hou, Rui, Goyal, Naman, Ghazvininejad, Marjan, Zettlemoyer, Luke, and Khabsa, Madian
    EMNLP 2023
  2. Arxiv
    The BELEBELE Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
    Bandarkar, Lucas, Liang, Davis, Muller, Benjamin, Artetxe, Mikel, Shukla, Satya Narayan, Husa, Donald, Goyal, Naman, Krishnan, Abhinandan, Zettlemoyer, Luke, and Khabsa, Madian
    arXiv preprint arXiv:2308.16884 2023
  3. Arxiv
    Attention-guided generative models for extractive question answering
    Xu, Peng*, Liang, Davis*, Huang, Zhiheng, and Xiang, Bing
    arXiv preprint arXiv:2110.06393 2021
  4. Arxiv
    Embedding-based Zero-shot Retrieval through Query Generation
    Liang, Davis*, Xu, Peng*, Shakeri, Siamak, Santos, Cicero Nogueira dos, Nallapati, Ramesh, Huang, Zhiheng, and Xiang, Bing
    arXiv preprint arXiv:2009.10270 2020
  5. ACL
    Masked language model scoring
    Salazar, Julian, Liang, Davis, Nguyen, Toan Q, and Kirchhoff, Katrin
    ACL 2020
  6. EMNLP Findings
    Improve transformer models with better relative position embeddings
    Huang, Zhiheng, Liang, Davis, Xu, Peng, and Xiang, Bing
    EMNLP Findings 2020
  7. Resistance AI
    Decoding and Diversity in Machine Translation
    Roberts, Nicholas, Liang, Davis, Neubig, Graham, and Lipton, Zachary C
    NeurIPS Resistance AI Workshop 2020
  8. SLT
    Learning noise-invariant representations for robust speech recognition
    Liang, Davis, Huang, Zhiheng, and Lipton, Zachary C
  9. IJCNLP
    Deep automated multi-task learning
    Liang, Davis, and Shu, Yan
    IJCNLP 2017