Ajay Gupta – Medium

Ajay Gupta

Cluster computing goes local with Spark Connect

Gone are days when Data/ML engineers need to repeatedly package their data processing logic in a Spark App and submit it to the cluster in…

3 min readJun 29, 2023

--

1

Cluster computing goes local with Spark Connect

--

1

Ajay Gupta

Z Order Optimization for Generic Multi Dimensional predicates

Data Skipping logic constitutes an integral part of Advanced Table formats storing huge data sets. Read this blog to understand how Z order…

4 min readJun 23, 2023

--

1

Z Order Optimization for Generic Multi Dimensional predicates

--

1

Ajay Gupta
in
Towards Data Science

Five Tips to Fasten Skewed Joins in Apache Spark

Skewed Joins lead to stragglers in a Spark Job bringing down the overall efficiency of the Job. Here are the five exclusive tips to address…

9 min readJun 17, 2022

--

Five Tips to Fasten Skewed Joins in Apache Spark

--

Ajay Gupta
in
Towards Data Science

Coalescing Vs. Dynamic Coalescing in Apache Spark

Use of Coalesce in Spark applications is set to increase with the default enablement of ‘Dynamic Coalescing’ in Spark 3.0. Now, you don’t…

7 min readJun 6, 2022

--

1

Coalescing Vs. Dynamic Coalescing in Apache Spark

--

1

Ajay Gupta

Linearizability And/Vs Serializability in Distributed Databases

Linearizability And Serializability together constitute the gold standard for consistency in distributed databases. Based on this gold…

6 min readMay 31, 2022

--

Linearizability And/Vs Serializability in Distributed Databases

--

Ajay Gupta

Map vs MapPartitions in Apache Spark

Spark has provided two very important transformations, viz., Map and MapPartitions, for developers to accomplish certain data processing…

4 min readMay 15, 2022

--

Map vs MapPartitions in Apache Spark

--

Ajay Gupta

Troubleshooting Stragglers in Your Spark Application

Stragglers in your Spark Application affect the overall application performance and waste premium resources.

6 min readNov 22, 2020

--

Troubleshooting Stragglers in Your Spark Application

--

Ajay Gupta
in
Towards Data Science

Four Ways to Filter a Spark Dataset Against a Collection of Data Values

Filtering a Spark Dataset against a collection of data values is commonly encountered in many data analytics flows. This particular story…

5 min readNov 2, 2020

--

Four Ways to Filter a Spark Dataset Against a Collection of Data Values

--

Ajay Gupta
in
Towards Data Science

Demystifying Joins in Apache Spark

This story is exclusively dedicated to the Join operation in Apache Spark, giving you an overall perspective of the foundation on which…

9 min readOct 22, 2020

--

1

Demystifying Joins in Apache Spark

--

1

Ajay Gupta

Guide to Spark Partitioning

Partitioning is one of the basic building blocks on which the Apache Spark framework has been built. Just setting the right partitioning…

2 min readOct 4, 2020

--

2

Guide to Spark Partitioning

--

2

Ajay Gupta

Ajay Gupta

Leading Data Engineering Initiatives @ Jio, Apache Spark Specialist, Author, LinkedIn: https://www.linkedin.com/in/ajaywlan/

Following

Help
Status
About
Careers
Blog
Privacy
Terms
Text to speech
Teams