July 8, 2025
Day 31 - Model Optimization and Strategic Alignment – A Day of Progress and Guidance
What I Learned
Today’s internship activities were focused and technically challenging, with meaningful progress made on the multimodal deep learning pipeline for ECG classification.
The day started with a continued effort to train the 1D CNN + Transformer model on frequency-domain ECG data derived from the PTB-XL dataset. After some previous instability and suboptimal results, today’s goal was to refine the architecture and ensure better feature learning from the FFT-transformed signals.
I also ran multiple experiments comparing model accuracy with and without integrated metadata, while monitoring training logs and validation metrics. Adjustments were made to input shapes and model layers to resolve compatibility issues between the frequency-domain tensors and the Transformer block.
A key part of today involved meeting with my professor to review model performance and future plans. During this meeting, we discussed preparations for the 2D CNN pipeline for spectrogram-based learning, which will serve as the visual modality of the hybrid model. I’ll also begin experimenting with class weights or oversampling to counter class imbalance.
In all, today was about deep model refinement and strategic alignment — laying the groundwork for a more robust, high-performing ECG diagnostic model.
Blockers
No blockers.
Reflection
Today felt like a turning point in my internship journey. After spending the last few days wrestling with the performance of the 1D CNN + Transformer model, I finally took a step back to reflect on what wasn’t working and began moving toward a clearer, more structured plan.
Much of the day was spent testing the frequency-domain branch of the model. Despite earlier excitement about integrating FFT-transformed ECG signals, the results weren’t meeting expectations. Accuracy was still lagging behind the simpler 1D CNN trained on time-domain features. It was frustrating — not because the model failed, but because I knew it had potential. I realized that without deeper tuning or better integration of metadata, I was only scratching the surface.
The real clarity came during a meeting with my professor. We sat down to review the entire pipeline: what had worked, what hadn’t, and where we needed to go next. I walked him through the current model architecture, data pipeline, and results. He listened closely and asked questions that forced me to think more critically about the structure of my inputs, especially the role of metadata and how to fuse modalities effectively.
We talked about class imbalance — something I had noticed in passing but hadn’t fully addressed. His suggestion to experiment with class weighting or oversampling gave me a new line of thinking. We also agreed to aim for a stronger benchmark: 85–90% accuracy across all classes, using the full dataset and all available features (time, frequency, and metadata).