Training Large Language Models at the Speed of Light


As large language models (LLMs), the technology behind chatbots, translation tools, and other AI applications, become increasingly complex, the amount of computation required to train them has skyrocketed. Traditional electronic processors struggle to meet this demand efficiently. This project explores silicon photonics, an emerging technology that uses light rather than electricity to process and transmit data. Optical signals can carry data with higher bandwidth and lower energy per bit than electrical signals, making silicon photonics a potential game-changer for training large AI models.

Objectives:

  1. Build a Training Framework Using Silicon Photonics: Develop a system that uses photonic (light-based) processors to train LLMs, focusing on making training both faster and more energy-efficient.
  2. Optimize Training for Speed and Scalability: Implement techniques that allow the model training to run smoothly across multiple photonic processors, ensuring the system can handle large datasets and model sizes.
  3. Test Against Current Systems: Compare the performance of the silicon photonics setup to traditional electronic hardware by measuring speed, energy use, and overall effectiveness.

Key Areas of Focus for the Team:

  1. Understanding Photonics for Data Processing: Since this project introduces a unique type of hardware, the first step will be learning how photonics processes information differently from electronic systems and adapting algorithms to suit these new components.
  2. Efficient Parallel Processing Techniques: To handle the scale of LLM training, we’ll explore different methods of splitting tasks across multiple processors, allowing for faster and more balanced data processing (a minimal sketch follows this list).
  3. Testing and Benchmarking Results: Using specific benchmarks, we’ll test how well our photonic system performs with various LLMs and compare it to conventional setups, looking at speed, power use, and accuracy.
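
To make the parallel-processing focus area concrete, here is a minimal, hypothetical Python/PyTorch sketch of data parallelism: every worker holds an identical copy of the model, computes gradients on its own shard of the batch, and the gradients are then averaged before an optimizer step. Ordinary CPU tensors stand in for the photonic (or electronic) processors, and all names (TinyModel, n_workers) are illustrative rather than part of any existing project code.

    import torch
    import torch.nn as nn

    class TinyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(16, 4)
        def forward(self, x):
            return self.linear(x)

    n_workers = 2                                   # stand-ins for photonic processors
    replicas = [TinyModel() for _ in range(n_workers)]
    for r in replicas[1:]:                          # start every replica from identical weights
        r.load_state_dict(replicas[0].state_dict())

    batch, target = torch.randn(8, 16), torch.randn(8, 4)
    x_shards, y_shards = batch.chunk(n_workers), target.chunk(n_workers)

    for r, x, y in zip(replicas, x_shards, y_shards):   # each worker trains on its own shard
        nn.functional.mse_loss(r(x), y).backward()

    # average gradients across workers: the role an all-reduce plays in a real cluster
    for params in zip(*(r.parameters() for r in replicas)):
        mean_grad = torch.stack([p.grad for p in params]).mean(dim=0)
        for p in params:
            p.grad = mean_grad.clone()
    # applying the same optimizer step to every replica would now keep the copies in sync

In a real multi-processor run, the averaging loop would be handled by a communication primitive such as an all-reduce (for example, via PyTorch's DistributedDataParallel); the explicit loop here simply makes that step visible.
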
  • Faculty: Dharanidhar Dang
  • Department: Electrical & Computer Engineering
  • Open Positions: 3
  • Mode: Hybrid
  • Hours per week: 10-12
Requirements and Responsibilities: 

Needed Skills: For this project, the ideal undergraduate fellows should possess a combination of foundational technical skills, familiarity with AI principles, and a strong desire to learn and tackle complex problems. The following technical and research skills are recommended:

  • Programming and Software Development
    Languages: Proficiency in Python is essential, especially for AI and data science work. Familiarity with C++ or other low-level languages is a bonus for hardware integration.
    Libraries and Tools: Experience with libraries such as PyTorch or TensorFlow for neural network development, as well as NumPy and Pandas for data manipulation, will be highly beneficial.
  • Basic Knowledge of Machine Learning and Deep Learning
    Core Concepts: Understanding of machine learning fundamentals, especially deep learning concepts like neural networks, backpropagation, and model training.
    Model Types: Familiarity with large language models and Transformers will be helpful, as this project specifically focuses on distributed training for these architectures.
  • Experience with Data Handling and Preprocessing
    Data Manipulation: Ability to preprocess large datasets (e.g., text or image data) and create efficient data pipelines (a minimal pipeline sketch follows this skills list).
    Data Management: Familiarity with data storage formats and handling large datasets effectively for high-performance computing.
  • Parallel and Distributed Computing Basics
    Parallel Processing: A basic understanding of data and model parallelism concepts, which are essential for distributed model training.
    Distributed Systems: Some experience or coursework related to distributed systems or cloud computing platforms (e.g., AWS or Google Cloud) is beneficial.
  • Hardware Awareness
    Interest in Emerging Hardware: Familiarity with hardware-accelerated computing (e.g., GPU processing) is helpful, as well as a willingness to learn about photonics-based processing and how it differs from traditional electronics.
    Basic Knowledge of Computer Architecture: A grasp of fundamental concepts in computer hardware (e.g., memory hierarchy, processing units) will support work on system integration.
  • Analytical and Problem-Solving Skills
    Data Analysis: Ability to perform data analysis using tools like Matplotlib, Seaborn, or even Excel, to interpret benchmarking results and identify trends.
    Debugging and Troubleshooting: Comfort with debugging code and solving software or hardware integration issues, essential for testing and optimizing the system.
  • Research and Documentation Skills
    Literature Review: Basic skills in searching academic databases and reading technical papers, as well as summarizing findings and relevant methods.
    Documentation and Technical Writing: Ability to document code, processes, and results in a clear, organized manner, making it easier for the team and future researchers to understand and replicate findings.
  • Willingness to Learn and Adapt: Openness to working with unfamiliar, photonics-based hardware and to picking up new tools and methods as the project evolves.
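
As a concrete reference point for the data-handling skills above, the following is a minimal, hypothetical Python/PyTorch sketch of a text-preprocessing pipeline: clean raw lines, build a word-level vocabulary, and batch fixed-length token IDs with a DataLoader. The helper names (clean, build_vocab, TextDataset) and the two-line corpus are illustrative only; a real pipeline for LLM training would use a subword tokenizer and far larger datasets.

    import re
    import torch
    from torch.utils.data import Dataset, DataLoader

    def clean(line: str) -> str:
        # lowercase and keep only letters, digits, and spaces
        return re.sub(r"[^a-z0-9 ]", " ", line.lower())

    def build_vocab(lines):
        words = sorted({w for line in lines for w in clean(line).split()})
        return {w: i + 1 for i, w in enumerate(words)}      # 0 is reserved for padding

    class TextDataset(Dataset):
        def __init__(self, lines, vocab, max_len=8):
            self.vocab, self.max_len = vocab, max_len
            self.lines = [clean(l).split() for l in lines]
        def __len__(self):
            return len(self.lines)
        def __getitem__(self, idx):
            ids = [self.vocab[w] for w in self.lines[idx]][: self.max_len]
            ids += [0] * (self.max_len - len(ids))           # pad to a fixed length
            return torch.tensor(ids)

    corpus = ["Photonic processors move data with light.",
              "Data parallelism splits each batch across workers."]
    loader = DataLoader(TextDataset(corpus, build_vocab(corpus)), batch_size=2)
    for batch in loader:
        print(batch.shape)                                   # torch.Size([2, 8])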

Student Responsibilities:

  1. Data Preparation and Preprocessing
    Responsibilities: Source and preprocess large datasets (such as text and image datasets) tailored for training language models. This includes tasks such as data cleaning, organizing data into manageable formats, and preparing data pipelines compatible with photonics-based systems.
  2. Algorithm Development and Optimization
    Responsibilities: Assist in adapting and optimizing existing algorithms to work efficiently with photonic hardware. This includes working on data parallelism and model parallelism strategies, as well as exploring algorithms that take advantage of the high bandwidth and low latency of photonic circuits.
  3. System Integration and Testing
    Responsibilities: Help integrate the silicon photonic processors with the AI training framework. Fellows will test the system to identify potential bottlenecks or errors and ensure smooth data flow across distributed nodes.
  4. Performance Benchmarking and Analysis
    Responsibilities: Conduct performance benchmarking to measure and compare training speed, energy efficiency, and accuracy against traditional electronic hardware. This includes setting up benchmark experiments, collecting data, and analyzing results (a minimal benchmarking sketch follows this list).
  5. Documentation and Code Maintenance
    Responsibilities: Document procedures, results, and code, ensuring reproducibility and clarity for all project stakeholders. Fellows will also contribute to maintaining and updating the codebase as the project evolves.
  6. Dissemination of Findings
    Responsibilities: Collaborate on writing reports, preparing research papers, and creating presentation materials. Fellows may also have the opportunity to present findings at conferences or internal seminars.
  7. Collaborative Research and Learning
    Responsibilities: Participate in regular team meetings, share insights, and engage in discussions on project progress and findings. Fellows will work closely with the supervising professor and gain exposure to collaborative research dynamics.
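
As a concrete reference point for the benchmarking responsibility above, the following is a minimal, hypothetical Python sketch that times identical training steps at several batch sizes and plots training throughput with Matplotlib. The small model, the batch sizes, and the "CPU baseline" label are placeholders; real benchmarks would run the same workload on both the photonic prototype and the electronic baseline and would also record energy use and accuracy.

    import time
    import torch
    import torch.nn as nn
    import matplotlib.pyplot as plt

    def steps_per_second(model, batch, n_steps=50):
        # time a fixed number of forward/backward/update steps on the given batch
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        start = time.perf_counter()
        for _ in range(n_steps):
            opt.zero_grad()
            loss = model(batch).pow(2).mean()               # placeholder loss
            loss.backward()
            opt.step()
        return n_steps / (time.perf_counter() - start)

    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
    batch_sizes = [32, 64, 128, 256]
    throughput = [steps_per_second(model, torch.randn(b, 64)) for b in batch_sizes]

    plt.plot(batch_sizes, throughput, marker="o", label="CPU baseline (placeholder)")
    plt.xlabel("batch size")
    plt.ylabel("training steps per second")
    plt.legend()
    plt.savefig("benchmark.png")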