This project studies representation bias and healthcare disparity in medical data and in a small-scale LLM built on that data. The main focus will be hands-on data analysis, developing a small-scale LLM, and understanding its behavior. Students will be given access to real datasets from the medical domain and a pre-trained small-scale LLM as a starting point. Three undergraduate students are preferred; the project can also run with two students if the budget is limited.
Needed Skills: Experience with data analysis, logical thinking, and interpreting experimental results is preferred. There are no prerequisite courses or hard requirements for prior experience.
Student Responsibilities: Students will be responsible for understanding the LLM and potentially developing a better one under the faculty’s guidance. They will be the main developers, and a PhD student will serve as their peer leader.