Ajay Gupta: Cluster computing goes local with Spark Connect. Gone are the days when Data/ML engineers need to repeatedly package their data processing logic in a Spark App and submit it to the cluster in… (Jun 29, 2023)
Ajay Gupta: Z Order Optimization for Generic Multi Dimensional predicates. Data Skipping logic constitutes an integral part of Advanced Table formats storing huge data sets. Read this blog to understand how Z order… (Jun 23, 2023)
Ajay Gupta in Towards Data Science: Five Tips to Fasten Skewed Joins in Apache Spark. Skewed Joins lead to stragglers in a Spark Job bringing down the overall efficiency of the Job. Here are the five exclusive tips to address… (Jun 17, 2022)
Ajay Gupta in Towards Data Science: Coalescing Vs. Dynamic Coalescing in Apache Spark. Use of Coalesce in Spark applications is set to increase with the default enablement of ‘Dynamic Coalescing’ in Spark 3.0. Now, you don’t… (Jun 6, 2022)
Ajay Gupta: Linearizability And/Vs Serializability in Distributed Databases. Linearizability And Serializability together constitute the gold standard for consistency in distributed databases. Based on this gold… (May 31, 2022)
Ajay Gupta: Map vs MapPartitions in Apache Spark. Spark has provided two very important transformations, viz., Map and MapPartitions, for developers to accomplish certain data processing… (May 15, 2022)
Ajay Gupta: Troubleshooting Stragglers in Your Spark Application. Stragglers in your Spark Application affect the overall application performance and waste premium resources. (Nov 22, 2020)
Ajay Gupta in Towards Data Science: Four Ways to Filter a Spark Dataset Against a Collection of Data Values. Filtering a Spark Dataset against a collection of data values is commonly encountered in many data analytics flows. This particular story… (Nov 2, 2020)
Ajay Gupta in Towards Data Science: Demystifying Joins in Apache Spark. This story is exclusively dedicated to the Join operation in Apache Spark, giving you an overall perspective of the foundation on which… (Oct 22, 2020)
Ajay Gupta: Guide to Spark Partitioning. Partitioning is one of the basic building blocks on which the Apache Spark framework has been built. Just setting the right partitioning… (Oct 4, 2020)