June 18, 2025

Day 17- Refining the Foundation: Laying the Groundwork for Model Training

By Ayomide Jeje

What I Learned

Today at my internship, I made meaningful progress on the ECG diagnostic AI project we’ve been developing. The focus of the day was on refining the dataset and setting a solid foundation for model training. I began by reviewing the cleaned datasets that were previously split into training, validation, and testing sets. My main goal during this phase was to ensure that the subclass-to-superclass mappings we had defined were correctly applied across all dataset partitions. This required a detailed check of the diagnostic label column, where I verified that each original subclass label was properly mapped to one of the broader diagnostic categories we had agreed upon—such as NORM, MI, STTC, HYP, or CD. Once I confirmed the integrity of the mappings, I updated the diagnostic labels in the CSV files accordingly. Although this step might seem minor, it’s actually one of the most critical parts of preparing data for deep learning models. Clean, well-labeled data ensures that the neural network will not be confused by unnecessary complexity and can focus on learning the core patterns associated with each diagnostic category. This standardization also improves the model’s ability to generalize to unseen ECG data and helps reduce noise during training.

Blockers

No Blockers

Reflection

Reflecting on today’s work at my internship, I feel a strong sense of steady and meaningful progress. Although the tasks I focused on weren’t the most glamorous parts of AI development, they were foundational and crucial. A large portion of my day was spent meticulously cleaning up the dataset labels and verifying that the subclass-to-superclass mappings were correctly applied across the training, validation, and testing datasets. This process required close attention to detail, as even a single mislabel could negatively affect the performance and reliability of the AI model we’re building. As I worked through the CSV files, I began to appreciate how much of AI success actually depends on the quality of the data. It’s easy to get caught up in the excitement of model architecture and training algorithms, but today reminded me that no matter how sophisticated the model is, it can’t compensate for poorly prepared or inconsistent data. Ensuring that every ECG diagnostic label was mapped to its correct superclass category—such as MI, NORM, STTC, HYP, or CD—wasn’t just a clerical task. It was a step toward making sure our neural network will be learning from well-structured and meaningful input.