Davis Liang

I am a Staff Research Scientist at Abridge AI, working on applying my research in multilingual large language models to reinvent healthcare one conversation at a time. Previously, I was a Senior Research Scientist at Meta AI, working on natural language processing research. My current academic pursuits involve understanding the underpinnings of how foundation models learn.

Prior to Meta, I worked on information retrieval, machine translation, and speech recognition as an Applied Scientist at Amazon (AWS) AI. At Amazon, I worked with a group of wonderful scientists whose creativity and enthusiasm are the primary reasons why I have not abandoned society to peddle art NFTs from a sizeable digital estate in Fashion Street district, Decentraland.

I also worked as a Software Engineer at Yahoo and obtained my MS degree in Computer Science from UC San Diego, where I was advised by Prof. Gary Cottrell. During my undergrad, I was fortunate enough to work with Prof. Michael J Tarr exploring how humans perceive faces and scenes.

Research Interests

I am interested in:

Robust Machine Learning, in particular machine learning models that are robust to noise and distribution shift.
ML for Social Good, in particular tackling issues in fairness, bias, and misinformation.
Multilingual Systems, with a focus on zero-shot transfer to low-resource languages.
Foundation Models, Generative AI, and unsupervised methods for representation learning.

Contact

Please send all research and job-related inquiries to davisblaine.liang(at)gmail.com.

News

Sep 2, 2023	We are releasing the Belebele dataset, a first-of-its-kind multilingual reading comprehension dataset spanning 122 language variants, 27 language families, and 29 scripts. [Paper] [GitHub] [Tweet]
Aug 28, 2023	I had a great time chatting with the New York Times about generative AI and the role of ML talent in supercharging the field of healthcare.
Apr 2, 2023	I’m excited to announce that I’m joining Abridge AI to work on reinventing healthcare for doctors and patients alike!
Jan 28, 2023	We are releasing XLM-V, a multilingual model with a 1 million token vocabulary [Link]. The model is also open-sourced in Hugging Face Transformers.
Feb 22, 2022	After four years at Amazon, I’ll be moving on to a new role. I’ll officially be joining Meta AI (formerly Facebook AI) as a Senior Research Scientist in March!

Selected Publications

Please refer to my Google Scholar profile for a full list of publications.

(*=equal contribution)

Arxiv

XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models

Liang, Davis, Gonen, Hila, Mao, Yuning, Hou, Rui, Goyal, Naman, Ghazvininejad, Marjan, Zettlemoyer, Luke, and Khabsa, Madian

EMNLP 2023

@article{liang2023xlmv,
  title = {XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models},
  author = {Liang, Davis and Gonen, Hila and Mao, Yuning and Hou, Rui and Goyal, Naman and Ghazvininejad, Marjan and Zettlemoyer, Luke and Khabsa, Madian},
  journal = {EMNLP},
  year = {2023},
  abbr = {Arxiv},
  html = {https://arxiv.org/pdf/2301.10472.pdf},
  bibtex_show = {true},
  selected = {true}
}

Arxiv

The BELEBELE Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

Bandarkar, Lucas, Liang, Davis, Muller, Benjamin, Artetxe, Mikel, Shukla, Satya Narayan, Husa, Donald, Goyal, Naman, Krishnan, Abhinandan, Zettlemoyer, Luke, and Khabsa, Madian

arXiv preprint arXiv:2308.16884 2023

@article{bandarkar2023belebele,
  title = {The BELEBELE Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants},
  author = {Bandarkar, Lucas and Liang, Davis and Muller, Benjamin and Artetxe, Mikel and Shukla, Satya Narayan and Husa, Donald and Goyal, Naman and Krishnan, Abhinandan and Zettlemoyer, Luke and Khabsa, Madian},
  journal = {arXiv preprint arXiv:2308.16884},
  year = {2023},
  abbr = {Arxiv},
  html = {https://arxiv.org/pdf/2308.16884.pdf},
  bibtex_show = {true},
  selected = {true}
}

Arxiv

Attention-guided generative models for extractive question answering

Xu, Peng*, Liang, Davis*, Huang, Zhiheng, and Xiang, Bing

arXiv preprint arXiv:2110.06393 2021

@article{xu2021attention,
  title = {Attention-guided generative models for extractive question answering},
  author = {Xu, Peng* and Liang, Davis* and Huang, Zhiheng and Xiang, Bing},
  journal = {arXiv preprint arXiv:2110.06393},
  year = {2021},
  abbr = {Arxiv},
  html = {https://arxiv.org/pdf/2110.06393.pdf},
  bibtex_show = {true},
  selected = {true}
}

Arxiv

Embedding-based Zero-shot Retrieval through Query Generation

Liang, Davis*, Xu, Peng*, Shakeri, Siamak, Santos, Cicero Nogueira dos, Nallapati, Ramesh, Huang, Zhiheng, and Xiang, Bing

arXiv preprint arXiv:2009.10270 2020

@article{liang2020embedding,
  title = {Embedding-based Zero-shot Retrieval through Query Generation},
  author = {Liang, Davis* and Xu, Peng* and Shakeri, Siamak and Santos, Cicero Nogueira dos and Nallapati, Ramesh and Huang, Zhiheng and Xiang, Bing},
  journal = {arXiv preprint arXiv:2009.10270},
  year = {2020},
  abbr = {Arxiv},
  html = {https://arxiv.org/pdf/2009.10270.pdf},
  bibtex_show = {true},
  selected = {true}
}

ACL

Masked language model scoring

Salazar, Julian, Liang, Davis, Nguyen, Toan Q, and Kirchhoff, Katrin

ACL 2020

@article{salazar2019masked,
  title = {Masked language model scoring},
  author = {Salazar, Julian and Liang, Davis and Nguyen, Toan Q and Kirchhoff, Katrin},
  journal = {ACL},
  year = {2020},
  abbr = {ACL},
  html = {https://arxiv.org/pdf/1910.14659.pdf},
  bibtex_show = {true},
  selected = {true}
}

EMNLP Findings

Improve transformer models with better relative position embeddings

Huang, Zhiheng, Liang, Davis, Xu, Peng, and Xiang, Bing

EMNLP Findings 2020

@article{huang2020improve,
  title = {Improve transformer models with better relative position embeddings},
  author = {Huang, Zhiheng and Liang, Davis and Xu, Peng and Xiang, Bing},
  journal = {EMNLP Findings},
  year = {2020},
  abbr = {EMNLP Findings},
  html = {https://arxiv.org/pdf/2009.13658.pdf},
  bibtex_show = {true},
  selected = {true}
}

Resistance AI

Decoding and Diversity in Machine Translation

Roberts, Nicholas, Liang, Davis, Neubig, Graham, and Lipton, Zachary C

NeurIPS Resistance AI Workshop 2020

@article{roberts2020decoding,
  title = {Decoding and Diversity in Machine Translation},
  author = {Roberts, Nicholas and Liang, Davis and Neubig, Graham and Lipton, Zachary C},
  journal = {NeurIPS Resistance AI Workshop},
  year = {2020},
  abbr = {Resistance AI},
  html = {https://arxiv.org/pdf/2011.13477.pdf},
  bibtex_show = {true},
  selected = {true}
}

SLT

Learning noise-invariant representations for robust speech recognition

Liang, Davis, Huang, Zhiheng, and Lipton, Zachary C

IEEE SLT 2018

@article{liang2018learning,
  title = {Learning noise-invariant representations for robust speech recognition},
  author = {Liang, Davis and Huang, Zhiheng and Lipton, Zachary C},
  journal = {IEEE SLT},
  abbr = {SLT},
  html = {https://arxiv.org/pdf/1807.06610.pdf},
  bibtex_show = {true},
  selected = {true}
}

IJCNLP

Deep automated multi-task learning

Liang, Davis, and Shu, Yan

IJCNLP 2017

@article{liang2017deep,
  title = {Deep automated multi-task learning},
  author = {Liang, Davis and Shu, Yan},
  journal = {IJCNLP},
  year = {2017},
  abbr = {IJCNLP},
  html = {https://arxiv.org/pdf/1709.05554.pdf},
  bibtex_show = {true},
  selected = {true}
}