Ajay Gupta · Cluster computing goes local with Spark Connect · Gone are the days when Data/ML engineers need to repeatedly package their data processing logic in a Spark App and submit it to the cluster in… · 3 min read · Jun 29, 2023
Ajay Gupta · Z Order Optimization for Generic Multi Dimensional predicates · Data Skipping logic constitutes an integral part of Advanced Table formats storing huge data sets. Read this blog to understand how Z order… · 4 min read · Jun 23, 2023
Ajay Gupta in Towards Data Science · Five Tips to Fasten Skewed Joins in Apache Spark · Skewed Joins lead to stragglers in a Spark Job bringing down the overall efficiency of the Job. Here are the five exclusive tips to address… · 9 min read · Jun 17, 2022
Ajay Gupta in Towards Data Science · Coalescing Vs. Dynamic Coalescing in Apache Spark · Use of Coalesce in Spark applications is set to increase with the default enablement of ‘Dynamic Coalescing’ in Spark 3.0. Now, you don’t… · 7 min read · Jun 6, 2022
Ajay Gupta · Linearizability And/Vs Serializability in Distributed Databases · Linearizability And Serializability together constitute the gold standard for consistency in distributed databases. Based on this gold… · 6 min read · May 31, 2022
Ajay Gupta · Map vs MapPartitions in Apache Spark · Spark has provided two very important transformations, viz., Map and MapPartitions, for developers to accomplish certain data processing… · 4 min read · May 15, 2022
Ajay Gupta · Troubleshooting Stragglers in Your Spark Application · Stragglers in your Spark Application affect the overall application performance and waste premium resources. · 6 min read · Nov 22, 2020
Ajay Gupta in Towards Data Science · Four Ways to Filter a Spark Dataset Against a Collection of Data Values · Filtering a Spark Dataset against a collection of data values is commonly encountered in many data analytics flows. This particular story… · 5 min read · Nov 2, 2020
Ajay Gupta in Towards Data Science · Demystifying Joins in Apache Spark · This story is exclusively dedicated to the Join operation in Apache Spark, giving you an overall perspective of the foundation on which… · 9 min read · Oct 22, 2020
Ajay Gupta · Guide to Spark Partitioning · Partitioning is one of the basic building blocks on which the Apache Spark framework has been built. Just setting the right partitioning… · 2 min read · Oct 4, 2020