Posted on September 18, 2024 by UTSA School of Data Science

School of Data Science represented for second year at University of Chicago research program

Senior computer science major Perfect Sylvester was one of 15 students from across the country accepted into the second annual Data Science for Social Impact (DSSI) Summer Experience program, an 8-week paid research opportunity hosted by the University of Chicago. Last year, three UTSA students attended the DSSI program. This year, Sylvester was alone in representing the School of Data Science.

The path that led Sylvester to train large language models in Chicago this summer was a winding one indeed.

For one thing, she left her home in Nigeria in 2020 to study medicine, not computers, and in the United Kingdom, not the U.S.

“Since high school I’ve always been on a premed track,” Sylvester said. “I was going to be a doctor and actually got into med school, and then started and realized I wasn’t sure I wanted to do this.” 

In 2021, she transferred to Texas A&M University-Corpus Christi, where she eventually decided to study computer science instead of medicine, although she kept a minor in biology. While at A&M, she considered specialties such as cybersecurity and software programming before ultimately settling on data science and transferring to UTSA in 2023.

“I like research, coming from my bio background, so data science seemed like the best bet. Even if I was going into healthcare I still would’ve done research,” Sylvester explained. “So I needed something that related to me as a person, and that’s what data science is.”

Sylvester first heard about the DSSI program from Jianwei Niu, Interim Executive Director of the School of Data Science, after Sylvester had been awarded a scholarship from Reboot Representation. She happened to mention that she was searching for internship opportunities, and Niu suggested she apply for the program in Chicago. This was around April 14, nearly two weeks after the April 1 application deadline. But Sylvester applied anyway, and to her surprise she was accepted.

“I was happy. I was elated,” she said. “But that meant I had to commit to it and stop all my other applications for internships, and that was scary because this was my first experience working, so I wasn’t necessarily sure what it was going to be like. I was really scared having to pick, but I’m glad I did pick the program at UChicago.”

Once in Chicago, Sylvester met the other 14 students, a diverse group representing fields ranging from data and computer science to business and economics. The students were also of varying ages and at different stages of their lives and studies, Sylvester noted, which she said made for a particularly well-rounded cohort.

“We were really close, a very tight-knit cohort,” she said. “And I got to talk to people and see everybody’s strengths and weaknesses, and just how diverse everybody was, so it was great.”

Sylvester also got to experience the city of Chicago, a place she had never visited before.

“Chicago was wonderful,” she recalled, “and I got to experience the city – the very large, beautiful city – and also got some tours to learn the history of the city, and the culture, and the different people who lived there.”

But while Sylvester was able to socialize and sightsee, this was no relaxing vacation. As soon as they arrived, the students were thrown straight into an intensive training program, designed to ensure everyone’s skills were up to the challenging projects they’d be engaging in. This was a crash course in data science and programming concepts, compressing a 16-week university-level course into just three weeks.

“Learning brand new data science concepts and Python at the same time was very challenging,” Sylvester said, “but after those first couple weeks it was actually great. And then when we started applying it to the project, I could see why we needed to do all of this.”

Her group’s project involved fine-tuning large language models, artificial intelligence (AI) programs capable of understanding language by processing large amounts of data. The models her group worked on were for the Center for Good Food Purchasing, which creates metrics that give large institutions ways to better leverage their buying power based on factors such as environmental sustainability, animal welfare, and nutrition. The task was to help train the AI to classify food products based on their different labels. While that may sound simple, the scale of the data involved was immense.

“Think of an Excel sheet,” Sylvester explained. “There’s 140,000 rows of food products that need to be classified, and then there’s 20 to 25 columns. Then we have to work on different classifiers; we focused on four for the whole five-week project to basically finetune the model to be able to classify each food product based on different labels.”

How many labels? Up to about seven hundred per classifier. That’s a lot of data.

While the DSSI program proved challenging, Sylvester found it to be an amazing experience. With her newfound skills and increased confidence, she says she’s looking forward to her last year at UTSA taking more advanced data science classes and encourages other UTSA students to participate in the program next year.

“I would advise anybody to apply for it, especially if you like data science and you want to have more hands-on experience, or research experience, or just introduction to it in general,” she said. “It’s definitely a great, wonderful experience and you’ll love it.”

Sylvester says she is especially grateful to Niu, and to Reboot Representation for the scholarship, without which she would not have had the same educational opportunities. Her experiences as a scholarship winner and DSSI participant have left her with a final piece of wisdom for her university peers:

“Always speak up, always network, and always, always, always, get close to the faculty members – they will be your lifesavers. Who’s going to write your referrals and your references? They are, and you need somebody who genuinely knows you and knows your work ethic and how much work you put in and what you’re like.”

For more information on the University of Chicago Data Science for Social Impact Summer Experience, visit https://datascience.uchicago.edu/outreach/data-science-for-social-impact-network/summer-experience/.

— UTSA School of Data Science