Thanks, Aravind.

Section 3.1.2 of the quoted reference clearly states that spilling does happen, although it mentions this in the context of the reduce phase and not the map phase.

Earlier, the map phase was based on the hash shuffle writer, so spilling was not required: each reduce task had its own separate file, and records could be written straight to it.

However, with the sort shuffle writer, there is only one consolidated shuffle data file, sorted by reduce partition. Spilling is therefore required whenever the sort shuffle buffer fills up before the map task has finished writing.
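To make the difference concrete, here is a toy sketch in plain Python. It is not Spark's actual implementation, only an illustration of the idea under simplifying assumptions: keys are integers, "files" are in-memory lists, and the names `partition_of`, `hash_shuffle_write`, `sort_shuffle_write`, and `buffer_limit` are all hypothetical.

```python
import heapq

def partition_of(key, num_reducers):
    # Hypothetical partitioner: assumes integer keys.
    return key % num_reducers

def hash_shuffle_write(records, num_reducers):
    # Hash-style writer: one output "file" (a list here) per reduce partition.
    # Each record goes straight to its file, so nothing needs to be buffered.
    files = [[] for _ in range(num_reducers)]
    for key, value in records:
        files[partition_of(key, num_reducers)].append((key, value))
    return files

def sort_shuffle_write(records, num_reducers, buffer_limit):
    # Sort-style writer: a single output "file" ordered by reduce partition.
    # Records are buffered; when the buffer is full it is sorted and spilled.
    buffer, runs, spill_count = [], [], 0
    for key, value in records:
        buffer.append((partition_of(key, num_reducers), key, value))
        if len(buffer) >= buffer_limit:
            runs.append(sorted(buffer, key=lambda r: r[0]))  # spill a sorted run
            buffer = []
            spill_count += 1
    if buffer:
        # Whatever is left in memory is sorted and merged with the spilled runs.
        runs.append(sorted(buffer, key=lambda r: r[0]))
    # Merge all sorted runs into one consolidated data file.
    data_file = list(heapq.merge(*runs, key=lambda r: r[0]))
    return data_file, spill_count

records = [(i, f"v{i}") for i in range(10)]
data_file, spills = sort_shuffle_write(records, num_reducers=3, buffer_limit=4)
print(spills)  # 2
```

With ten records and a buffer of four, the buffer fills twice, so two runs are spilled before the final merge. The hash-style writer never needs this step because it never has to hold records back for sorting.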

Hope this helps.

Leading Data Engineering Initiatives @ Jio, Apache Spark Specialist, Author, LinkedIn: https://www.linkedin.com/in/ajaywlan/

Ajay Gupta
