Thanks, Aravind.

If you refer to section 3.1.2 in the quoted reference, it clearly states that spilling does happen, although it mentions this in the context of the reduce phase, not the map phase.

Earlier, the map phase was based on the hash shuffle writer, so spilling was not required: there was a separate file for each reduce task.

However, with the sort shuffle writer, there is only one consolidated shuffle data file, sorted by reduce partition. Spilling is therefore required whenever the sort shuffle buffer fills up before the map task has finished writing.
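
To make the difference concrete, here is a rough toy model in plain Python. It is not Spark's actual code: the function names, the `key % num_partitions` partitioner, and the `buffer_limit` parameter are all assumptions made up for illustration. The hash-style writer appends straight to one file per reduce partition, so nothing has to be held and sorted. The sort-style writer has to buffer records, spill a sorted run each time the buffer fills, and merge the runs into one file at the end.

```python
import heapq


def hash_shuffle_write(records, num_partitions):
    """Toy hash-style writer: one output 'file' (a list) per reduce partition."""
    files = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Each record goes directly to its partition's file; no buffering needed.
        files[key % num_partitions].append((key, value))
    return files


def sort_shuffle_write(records, num_partitions, buffer_limit):
    """Toy sort-style writer: one consolidated 'file' ordered by partition id.

    Returns (data_file, spill_count). buffer_limit is a hypothetical cap on
    the number of records held in memory at once.
    """
    buffer, runs, spill_count = [], [], 0
    for key, value in records:
        buffer.append((key % num_partitions, key, value))
        if len(buffer) >= buffer_limit:
            # Buffer is full: sort by partition id and spill it as one run.
            runs.append(sorted(buffer, key=lambda r: r[0]))
            buffer = []
            spill_count += 1
    if buffer:
        # Whatever is left in memory becomes the last sorted run.
        runs.append(sorted(buffer, key=lambda r: r[0]))
    # Merge all sorted runs into a single file ordered by partition id.
    data_file = list(heapq.merge(*runs, key=lambda r: r[0]))
    return data_file, spill_count


if __name__ == "__main__":
    records = [(k, str(k)) for k in range(10)]
    print(hash_shuffle_write(records, 3))
    print(sort_shuffle_write(records, 3, buffer_limit=4))
```

With a large enough `buffer_limit` the sort-style writer never spills, which matches the point above: spilling only comes into play when the buffer fills up partway through.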

Hope this helps.

Written by

Big Data Architect, Apache Spark Specialist,
