I had been working as a backend developer for four years when I joined Meritshot's Data Engineering program. I assumed it would just be another extension of coding: writing scripts, transforming data, and moving on. But reality hit me in the very first project.
We were asked to design a data pipeline for log data ingestion. My instinct was to write a simple Python script that loaded everything into a database. It worked fine on a sample file. But during the review, the mentor asked me: "What happens if 50 GB of logs arrive every hour?" That question exposed my blind spot. We then discussed distributed processing, fault tolerance, and why frameworks like Spark exist.
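To make that concrete, here is a minimal sketch of the kind of Spark job that discussion pointed toward. The paths, the JSON format, and the `timestamp` column are all assumptions for illustration; the point is that the same few lines scale by adding workers, not by rewriting the script.

```python
# Minimal PySpark sketch (hypothetical paths and schema).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log_ingestion").getOrCreate()

# Spark splits the input across executors, so growing volume is handled
# by adding workers rather than changing the code.
logs = spark.read.json("/data/raw_logs/*.json")

# Derive a date column so downstream queries can prune partitions.
logs = logs.withColumn("event_date", F.to_date(F.col("timestamp")))

# Write partitioned Parquet; a failed task is retried on another executor
# instead of forcing the whole job to start over.
logs.write.mode("append").partitionBy("event_date").parquet("/data/clean_logs")
```

Compared with my single-process script, the distribution, retries, and partitioning come from the framework rather than from code I would have to write and debug myself.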
Another turning point came when we studied orchestration. In software development I was used to cron jobs, but Airflow introduced me to DAGs, retries, backfills, and monitoring. Suddenly I understood that data engineering isn't just about moving data; it's about making sure it arrives on time, reliably, and in a scalable way.
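A rough sketch of what that looks like in practice, assuming a recent Airflow 2 install; the task functions and DAG name here are placeholders, since the interesting part is the schedule, retries, catchup for backfills, and the explicit dependency that shows up in the monitoring UI.

```python
# Hypothetical hourly pipeline DAG (task bodies are placeholders).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_logs(**context):
    # Placeholder: pull the hour's log files from object storage.
    pass


def load_to_warehouse(**context):
    # Placeholder: load the transformed logs into the warehouse.
    pass


with DAG(
    dag_id="hourly_log_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",                      # replaces the cron entry
    catchup=True,                            # enables backfills for missed hours
    default_args={
        "retries": 3,                        # automatic retries on failure
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    extract = PythonOperator(task_id="extract_logs", python_callable=extract_logs)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    extract >> load  # explicit dependency, visible in the Airflow UI
```

With cron I only ever knew whether the job ran; here, every retry, missed hour, and failed task is visible and recoverable.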
The real leap for me was moving from "can I write the code?" to "can this system survive at scale?" Meritshot gave me the structure to make that transition.