Career Profile

AWS Certified Data Engineer with 8 years of experience building end-to-end big data and cloud-native solutions across the banking, healthcare, and education domains. Proficient in designing real-time data pipelines, low-latency stream processing, and ML-integrated platforms using Kafka, Spark, and Flink across Cloudera, Snowflake, AWS, and Azure environments. Passionate about delivering scalable, secure, data-driven systems that power business intelligence and operational excellence.

Experience

Assistant Vice President

2022 - Present
OCBC Bank, Singapore
  • Developed a GenAI-powered Data Retrieval Bot with LangChain and a vector database (Milvus) that allows business users to retrieve data from enterprise data marts using natural language.
  • Worked extensively in Cloudera-based big data environments, managing distributed data processing workflows and ensuring seamless integration with enterprise data platforms.
  • Built a Real-Time Location-Based Offers & Geofencing system using Apache Flink and Kafka, resulting in a 50% increase in in-store redemptions.
  • Led the migration from batch to near real-time data processing using Apache Spark (Scala), reducing query latency by 40% and enabling real-time risk assessment, fraud detection, and transaction monitoring.
  • Implemented a metadata-driven ETL framework in Scala, streamlining ingestion and transformation workflows and improving operational efficiency by 20%.
  • Designed a high-throughput data processing platform based on Spark (Scala) and Kafka Streams, handling millions of daily financial events with 99.9% fault tolerance.
  • Developed Kafka connectors & transformers for high-volume schema-less data, improving schema evolution and system scalability.
  • Led the deployment of Cloudera Kafka & Kafka Connect, enhancing monitoring and real-time data processing.
  • Created end-to-end data pipelines from source systems to HDFS and Hive Data Warehouse, reducing data ingestion time by 50%.
  • Ingested financial campaign data from S3 via REST APIs into Cloudera Hadoop, enabling downstream analytics for segmentation and compliance reporting.
  • Developed interactive Power BI dashboards for securities transaction reports, improving visibility for compliance and internal audit teams, and reducing manual reporting effort by 60%.

Senior Software Developer

2021 - 2022
E-Zest Solutions, Pune
  • Led the development of a real-time Sepsis Patient Monitoring system using AWS Kinesis and AWS Redshift, enabling hospitals to detect early signs of sepsis by analyzing continuous patient vitals.
  • Optimized ETL pipelines using AWS Glue and AWS Redshift, reducing data processing time by 40% and enhancing retrieval efficiency.
  • Developed and secured REST APIs to integrate IoT-based medical devices using AWS Lambda and AWS DynamoDB, ensuring seamless real-time data flow for clinical decision support.
  • Developed a real-time Clickstream Study Pattern Detection & Recommendation System for a Nursing Education Platform, leveraging Kafka and Snowflake to analyze user behavior and recommend personalized learning content.
  • Implemented machine learning models for real-time student engagement tracking using Kafka Streams, improving course completion rates and optimizing personalized learning paths.
  • Implemented CI/CD pipelines using GitLab, automating deployments and improving healthcare data pipeline reliability by 30%.

Software Developer

2020 - 2021
Datametica, Pune
  • Led the migration of petabytes of data from on-premises to Azure Cloud (Blob Storage, Synapse, and Azure Data Factory), cutting infrastructure costs by 40%.
  • Developed Spark applications on Azure Databricks for complex data transformations, reducing processing time by 50%.
  • Built a data warehouse processing framework using Azure Synapse Analytics and Azure Data Factory, improving query performance by 35%.
  • Implemented highly scalable and fault-tolerant systems using Azure Functions, Cosmos DB, and Azure Kubernetes Service (AKS).

Software Developer

2017 - 2020
E-Zest Solutions, Pune
  • Designed and implemented scalable data pipelines using Apache Kafka, AWS Glue, and AWS Redshift, enabling real-time and batch data ingestion for enterprise analytics.
  • Developed a cloud-based data integration system for a MedTech client using AWS S3, AWS Lambda, and AWS Redshift Spectrum, enabling ingestion and analysis of wearable health device data and streamlining regulatory reporting for patient vitals.
  • Developed Scala-based REST APIs secured with OAuth 2.0 and Amazon API Gateway, ensuring secure system integration with AWS Redshift and other AWS services.
  • Optimized SQL queries and data modeling in Amazon Redshift, improving query performance and reducing compute costs by 30%.
  • Analyzed real-time clickstream data using Apache Kafka and Snowflake to derive business insights.
  • Led cloud migration from Oracle/MySQL to Amazon S3 and Amazon Redshift, improving analytics readiness and reducing infrastructure costs.
  • Worked with serialization frameworks like Avro, ORC, Parquet, and Protocol Buffers for efficient data storage and transmission.
  • Built CI/CD pipelines using GitLab, automating deployment workflows for data pipelines and microservices.

Certifications

AWS Certified Data Engineer - Associate

2025 - 2028
Amazon Web Services

Databricks - Accredited Generative AI Fundamentals

2025 - 2027
Databricks

Technical Skills

Big Data & Real-Time
Kafka (Streams, Connect), Apache Flink, AWS Kinesis, Azure Event Hubs, Spark, Hadoop, Hive, Snowflake, Redshift
ETL & Pipelines
Talend, AWS Glue, Apache Airflow, Azure Data Factory
AWS Services
S3, Glue, Kinesis, Lambda, Athena, RDS, Redshift, CloudWatch, IAM, EMR, Step Functions
Databases
MySQL, PostgreSQL, MongoDB, AWS RDS, DynamoDB, Azure SQL Database, Cosmos DB
Cloud & Infrastructure
AWS, Azure, Cloudera, Snowflake
ML & Analytics
LLMs, CML, LangChain, Scikit-learn, TensorFlow, Snowflake ML, SageMaker
Programming & APIs
Scala, Python, Java, REST APIs (Spring Boot, FastAPI)
Monitoring & Visualization
Elasticsearch, Kibana, Power BI, QuickSight
Data Modeling
Star schema, Snowflake schema, Data Mart Design