Ajay Gupta

766 followers


Cluster computing goes local with Spark Connect

Gone are the days when Data/ML engineers need to repeatedly package their data processing logic in a Spark App and submit it to the cluster in…

Jun 29, 2023

Z Order Optimization for Generic Multi Dimensional predicates

Data Skipping logic constitutes an integral part of Advanced Table formats storing huge data sets. Read this blog to understand how Z order…

Jun 23, 2023

Published in TDS Archive

Five Tips to Fasten Skewed Joins in Apache Spark

Skewed Joins lead to stragglers in a Spark Job, bringing down the overall efficiency of the Job. Here are the five exclusive tips to address…

Jun 17, 2022

Published in TDS Archive

Coalescing Vs. Dynamic Coalescing in Apache Spark

Use of Coalesce in Spark applications is set to increase with the default enablement of ‘Dynamic Coalescing’ in Spark 3.0. Now, you don’t…

Jun 6, 2022

Linearizability And/Vs Serializability in Distributed Databases

Linearizability And Serializability together constitute the gold standard for consistency in distributed databases. Based on this gold…

May 31, 2022

Map vs MapPartitions in Apache Spark

Spark has provided two very important transformations, viz., Map and MapPartitions, for developers to accomplish certain data processing…

May 15, 2022

Troubleshooting Stragglers in Your Spark Application

Stragglers in your Spark Application affect the overall application performance and waste premium resources.

Nov 22, 2020

Published in TDS Archive

Four Ways to Filter a Spark Dataset Against a Collection of Data Values

Filtering a Spark Dataset against a collection of data values is commonly encountered in many data analytics flows. This particular story…

Nov 2, 2020

Published in TDS Archive

Demystifying Joins in Apache Spark

This story is exclusively dedicated to the Join operation in Apache Spark, giving you an overall perspective of the foundation on which…

Oct 22, 2020

Guide to Spark Partitioning

Partitioning is one of the basic building blocks on which the Apache Spark framework has been built. Just setting the right partitioning…

Oct 4, 2020

Leading Data Engineering Initiatives @ Jio, Apache Spark Specialist, Author, LinkedIn: https://www.linkedin.com/in/ajaywlan/
