Navigation

  • Home
  • About Me
  • About My Mentors
  • About My Project
  • My Blog

June 12, 2025

Day 13- Loading the Waveform

By Ayomide Jeje

What I Learned

Today at my internship, I made significant progress in understanding and working with ECG waveform data. The primary focus of the day was centered around Loading the Waveform from the PTB-XL dataset sampled at 500 Hz. This involved writing and reviewing code that systematically loops through the .dat and .hea files to extract and save raw ECG signals, specifically from Lead I. I also ensured that a new directory structure was being created to store these processed waveform files for later use in filtering and modeling. Throughout this task, I deepened my understanding of why we prioritize the 500 Hz dataset over the 100 Hz version. The higher frequency allows for greater signal resolution, which is crucial for preserving important clinical features during denoising and model training. Additionally, I clarified the purpose of filtering out frequencies below 50 Hz and above 450 Hz—this range helps eliminate baseline wander and high-frequency noise, making the data cleaner and more reliable for deep learning models.An important part of the day involved asking critical questions to verify my understanding of the processes so far. I confirmed that we are not relying on CSV files for waveform extraction but instead accessing the raw data directly from the MIT-BIH format. I also clarified what we were saving at this stage, which includes the raw time-series data needed for the next preprocessing steps.

Blockers

No Blockers

Reflection

Today’s internship experience pushed me to engage more deeply with raw biomedical data than I ever have before. We focused on Step 2.2 of our project—loading ECG waveforms from the PTB-XL dataset sampled at 500 Hz. At first, it seemed like a straightforward task: loop through the files, extract Lead I, and save the signals. But as I dug into it, I realized how much foundational understanding I needed to build to really appreciate what we were doing.One of the key takeaways for me was learning why the 500 Hz version of the data is preferred over the 100 Hz version. It wasn’t just about using “higher quality” data—it’s about resolution and preserving subtle cardiac signals that could easily be lost at lower sampling rates. This helped me realize that even small decisions in preprocessing can make a big difference in model accuracy later on.

Tags: Loading the waveform