Ajay Gupta – Medium

Cluster computing goes local with Spark Connect

Gone are the days when Data/ML engineers needed to repeatedly package their data processing logic in a Spark App and submit it to the cluster in…

Jun 29, 2023

Z Order Optimization for Generic Multi Dimensional predicates

Data Skipping logic constitutes an integral part of Advanced Table formats storing huge data sets. Read this blog to understand how Z order…

Jun 23, 2023

Published in TDS Archive

Five Tips to Fasten Skewed Joins in Apache Spark

Skewed Joins lead to stragglers in a Spark Job, bringing down the overall efficiency of the Job. Here are five exclusive tips to address…

Jun 17, 2022

Published in TDS Archive

Coalescing Vs. Dynamic Coalescing in Apache Spark

Use of Coalesce in Spark applications is set to increase with the default enablement of ‘Dynamic Coalescing’ in Spark 3.0. Now, you don’t…

Jun 6, 2022

Linearizability And/Vs Serializability in Distributed Databases

Linearizability And Serializability together constitute the gold standard for consistency in distributed databases. Based on this gold…

May 31, 2022

Map vs MapPartitions in Apache Spark

Spark has provided two very important transformations, viz., Map and MapPartitions, for developers to accomplish certain data processing…

May 15, 2022

Troubleshooting Stragglers in Your Spark Application

Stragglers in your Spark Application affect the overall application performance and waste premium resources.

Nov 22, 2020

Published in TDS Archive

Four Ways to Filter a Spark Dataset Against a Collection of Data Values

Filtering a Spark Dataset against a collection of data values is common in data analytics flows. This particular story…

Nov 2, 2020

Published in TDS Archive

Demystifying Joins in Apache Spark

This story is exclusively dedicated to the Join operation in Apache Spark, giving you an overall perspective of the foundation on which…

Oct 22, 2020

Guide to Spark Partitioning

Partitioning is one of the basic building blocks on which the Apache Spark framework has been built. Just setting the right partitioning…

Oct 4, 2020

Ajay Gupta

Leading Data Engineering Initiatives @ Jio, Apache Spark Specialist, Author, LinkedIn: https://www.linkedin.com/in/ajaywlan/
