Title of the Talk
Building Scalable AI Pipelines on Databricks: Bridging Data Engineering and Production-Ready Machine Learning
Abstract
As organizations increasingly adopt AI-driven decision-making, the ability to build scalable, reliable, production-ready data pipelines has become critical. Databricks, with its unified Lakehouse architecture, brings data engineering and machine learning workflows together on a single, collaborative platform. This session explores how modern data engineering practices on Databricks support the development, deployment, and monitoring of AI pipelines at scale. It highlights key components: Delta Lake for reliable data storage, Apache Spark for distributed processing, and MLflow for managing the machine learning model lifecycle. Attendees will learn how to design end-to-end pipelines that handle data ingestion, transformation, feature engineering, and model deployment while maintaining data quality and governance. The session also discusses real-world challenges, including pipeline scalability, performance optimization, and the integration of AI workloads into existing data ecosystems. By bridging the gap between data engineering and AI, the talk shows how organizations can accelerate innovation, reduce operational complexity, and unlock the full potential of their data platforms using Databricks.
Brief Profile
Thirumal Raju Pambala is a Senior Data Engineer with over 16 years of experience designing and delivering large-scale data and AI solutions. He specializes in data engineering, MLOps, and cloud platforms including AWS, Azure, and GCP, with deep expertise in Databricks and Apache Spark. Throughout his career, Thirumal has helped organizations build production-ready machine learning pipelines that bridge the gap between data infrastructure and business intelligence. He is passionate about unifying data engineering and machine learning workflows to drive innovation and operational efficiency at scale.
