Hi, I'm Suhel Mehta, a Technology Lead specializing in Data Engineering with 4+ years of experience designing, building, and optimizing scalable data pipelines, ETL workflows, and real-time streaming systems. I have strong expertise in Apache Spark (PySpark), Apache Flink, Apache Airflow, and SQL, with hands-on experience in distributed systems, data quality, data governance, and cloud-based data platforms (Azure), along with a proven ability to build data infrastructure, develop REST-based microservices, and enable data-driven decision-making in production environments.
Technical Skillset
Languages: Python, Java, SQL
Big Data: Apache Spark (PySpark), Apache Flink, Kafka (familiar)
Data Engineering: Data Pipelines (Batch & Streaming), ETL, Data Warehousing, Data Lakes, Data Modeling, Data Schemas
Workflow Orchestration: Apache Airflow
Cloud & Platforms: Azure (Data Factory, EventHub), Databricks, GCP (familiar)
Databases: PostgreSQL
APIs & Backend: PostgREST, REST APIs, Microservices
DevOps & Tools: Kubernetes, Docker, Git, Linux
Monitoring & Reliability: Grafana, Logging, Alerting, SLA Management
Other: Data Quality, Data Validation, Data Governance, Security & Compliance, Distributed Systems
Professional Experience
Technology Lead | Infosys (Oct 2025 - Present)
- Architected and built scalable real-time data pipelines using Apache Flink (Java), processing 300+ events/sec with sub-second latency
- Designed systems for data ingestion, transformation, and processing of high-volume streaming datasets (EventHub/Kafka-like systems)
- Developed data infrastructure and data schemas to compute 40+ real-time metrics from 15+ data sources for production analytics
- Developed RESTful APIs using PostgREST on PostgreSQL, enabling backend data access and microservices integration
- Optimized pipeline performance, scalability, and resource utilization, reducing infrastructure cost by 10%
- Implemented monitoring, alerting, and reliability practices using Grafana to ensure high availability and minimal downtime
- Ensured data quality, validation, and integrity across streaming pipelines
Technology Analyst & System Engineer | Infosys (Jul 2021 - Sep 2025)
- Designed and implemented ETL pipelines for warehouse operations, improving inventory throughput across 10+ distribution centers
- Built a metadata-driven data pipeline framework for ingestion, transformation, and validation
- Ensured data quality, governance, and compliance through validation checks and monitoring
- Collaborated with stakeholders to translate business requirements into technical solutions
- Developed and maintained scalable ETL workflows on Databricks using PySpark for large-scale batch processing
- Supported data scientists and analysts with clean, reliable datasets for decision-making
- Improved pipeline efficiency and scalability through performance tuning and optimization
Key Projects & Open Source
- AI-Powered Coding Assistant: Developed an IDE extension for agentic code generation and autonomous Jira ticket resolution using specialized system prompts and tool-calling.
- Semantic Search RAG: Built a Retrieval-Augmented Generation (RAG) system using the Llama2 model and Haystack for a solution portal.
- Real-Time Streams: Contributed to a cloud-agnostic, open-source streaming framework certified on the Hortonworks platform, building a real-time pipeline with ML and IoT layers.
Achievements & Certifications
Certifications
- Databricks Certified Generative AI Engineer Associate
- Databricks Certified Data Engineer Associate
- Academy Accreditation - Databricks Lakehouse Fundamentals
- Google Cloud Certified Associate Cloud Engineer
Awards
- Awarded 8 Infosys Insta Awards and 8 Gracias Awards for outstanding technical contributions and stakeholder collaboration.
- Recognized as an Elite Performer (Top 3% globally).
Contact information:
Email: mehtasuhel@gmail.com
LinkedIn: https://www.linkedin.com/in/suhel-mehta/
GitHub: SuhelMehta9