Advanced SQL Analytics & Optimization on Spotify Streaming Dataset

Advanced SQL Analytics & Optimization on Spotify Streaming Dataset

Date

Date

July 2025

July 2025

I built this project to push my SQL skills beyond the basics - not just writing correct queries, but writing efficient and scalable ones. Using a dataset of over 20,000 tracks, I explored a series of progressively complex data questions ranging from simple aggregations to advanced window functions and CTE-driven transformations.

Rather than treating this as a “one-query-per-result” exercise, I structured the project the way real analysts work:

  • I organized tasks by difficulty - Easy, Medium, and Hard with each stored in dedicated .sql files

  • I approached each question as a use case, often asking: “How would a business team interpret this?”

To dig deeper, I focused on query performance optimization. I compared:

  • A raw SQL query with calculations inside AVG()

  • Versus a refactored version that used a TEMP TABLE and pre-filtered logic

💡 Result? I reduced execution time by 55%! Optimization isn't just about indexing, it's about how you structure your logic.

I also wrapped the project with:

  • Query benchmark screenshots using EXPLAIN ANALYZE

  • Visual EDA with Pandas (coming soon!)

  • A Streamlit dashboard (coming soon!) for dynamic filtering and visual insight

If you're curious about practical SQL use cases, query tuning, or performance benchmarking — this is a project worth checking out.

Skills Demonstrated

  • Advanced SQL (Window Functions, CTEs, TEMP Tables)

  • Query Optimization & Benchmarking

  • EDA with Pandas & Jupyter

  • Structured Problem-Solving

  • Data Communication & Documentation

More projects

Got questions?

I’m always excited to collaborate on innovative and exciting projects!

Got questions?

I’m always excited to collaborate on innovative and exciting projects!