Hi, I'm Suhel Mehta, a Technology Lead specializing in Data Engineering with 4+ years of experience designing, building, and optimizing scalable data pipelines, ETL workflows, and real-time streaming systems. I have strong expertise in Apache Spark (PySpark), Apache Flink, Apache Airflow, and SQL, with hands-on experience in distributed systems, data quality, data governance, and cloud-based data platforms (Azure), along with a proven ability to build data infrastructure, develop REST-based microservices, and enable data-driven decision-making in production environments.
Technical Skillset
Languages: Python, Java, SQL
Big Data: Apache Spark (PySpark), Apache Flink, Kafka (familiar)
Data Engineering: Data Pipelines (Batch & Streaming), ETL, Data Warehousing, Data Lakes, Data Modeling, Data Schemas
Workflow Orchestration: Apache Airflow
Cloud & Platforms: Azure (Data Factory, EventHub), Databricks, GCP (familiar)
Databases: PostgreSQL
APIs & Backend: PostgREST, REST APIs, Microservices
DevOps & Tools: Kubernetes, Docker, Git, Linux
Monitoring & Reliability: Grafana, Logging, Alerting, SLA Management
Other: Data Quality, Data Validation, Data Governance, Security & Compliance, Distributed Systems
Professional Experience
Technology Lead | Infosys (Oct 2025 - Present)
- Architected and built scalable real-time data pipelines using Apache Flink (Java), processing 300+ events/sec with sub-second latency
- Designed systems for data ingestion, transformation, and processing of high-volume streaming datasets (EventHub/Kafka-like systems)
- Developed data infrastructure and data schemas to compute 40+ real-time metrics from 15+ data sources for production analytics
- Developed RESTful APIs using PostgREST on PostgreSQL, enabling backend data access and microservices integration
- Optimized pipeline performance, scalability, and resource utilization, reducing infrastructure cost by 10%
- Implemented monitoring, alerting, and reliability practices using Grafana to ensure high availability and minimal downtime
- Ensured data quality, validation, and integrity across streaming pipelines
Technology Analyst & System Engineer | Infosys (Jul 2021 - Sep 2025)
- Designed and implemented ETL pipelines for warehouse operations, improving inventory throughput across 10+ distribution centers
- Built a metadata-driven data pipeline framework for ingestion, transformation, and validation
- Ensured data quality, governance, and compliance through validation checks and monitoring
- Collaborated with stakeholders to translate business requirements into technical solutions
- Developed and maintained scalable ETL workflows on Databricks using PySpark for large-scale batch processing
- Supported data scientists and analysts with clean, reliable datasets for decision-making
- Improved pipeline efficiency and scalability through performance tuning and optimization
Key Projects & Open Source
- AI-Powered Coding Assistant: Developed an IDE extension for agentic code generation and autonomous Jira ticket resolution using specialized system prompts and tool-calling.
- Semantic Search RAG: Built a Retrieval-Augmented Generation (RAG) system using the Llama2 model and Haystack for a solution portal.
- Real-Time Streams: Contributed to a cloud-agnostic, open-source streaming framework certified on the Hortonworks platform, building a real-time pipeline with ML and IoT layers.
Achievements & Certifications
Certifications
- Databricks Certified Generative AI Engineer Associate
- Databricks Certified Data Engineer Associate
- Academy Accreditation - Databricks Lakehouse Fundamentals
- Google Cloud Certified Associate Cloud Engineer
Awards
- Awarded 8 Infosys Insta Awards and 8 Gracias Awards for outstanding technical contributions and stakeholder collaboration.
- Recognized as an Elite Performer (Top 3% globally).
Contact information:
Email: mehtasuhel@gmail.com
LinkedIn: https://www.linkedin.com/in/suhel-mehta/
GitHub: SuhelMehta9