Quantifying representation bias in the inputs and outputs of LLMs on multimodal medical data (tabular and textual)


This project aims to study representation bias and healthcare disparities in medical data and in a small-scale LLM built on that data. The main focus of the project will be hands-on data analysis, developing a small-scale LLM, and understanding its behavior. Students will be given access to real datasets from the medical domain and a pre-trained small-scale LLM as a starting point. Three undergraduate students are preferred, though the project can also proceed with two students under a limited budget.
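To give prospective students a flavor of the analysis involved, below is a minimal sketch of one common way to quantify representation bias in tabular data: comparing each demographic group's share in a dataset to a reference share. This is an illustration only, not the project's actual method; the file name (medical_records.csv), column name (sex), and reference distribution are hypothetical placeholders.

```python
# Illustrative sketch: measure over- or under-representation of demographic
# groups in a tabular medical dataset relative to a reference population.
# All names and numbers here are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("medical_records.csv")               # hypothetical dataset
group_share = df["sex"].value_counts(normalize=True)  # observed share per group

# Hypothetical reference distribution (e.g., from census or patient records)
reference = pd.Series({"female": 0.51, "male": 0.49})

# Representation ratio: 1.0 means a group appears as often as expected;
# values far from 1.0 indicate over- or under-representation.
representation_ratio = (group_share / reference).dropna()
print(representation_ratio.sort_values())
```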

  • Faculty: Ke Yang
  • Department: Computer Science 
  • Open Positions: 3
  • Mode: Hybrid
  • Hours per week: 10
Requirements and Responsibilities: 

Needed Skills: Experience with data analysis, logical thinking, and interpreting experimental results is preferred. There are no prerequisite courses or hard requirements on prior experience.

Student Responsibilities: Students will be responsible for understanding, and potentially improving, the LLM under the faculty’s guidance. They will be the main developers, with a PhD student serving as their peer leader.