Databricks cleanup to solve QUOTA_EXCEEDED error

© Mariusz Rafało

This document lists the commands needed to clean up a Databricks Community Edition workspace. Databricks Community Edition features and limitations are available here.

The following error has two main causes, the volume of stored data and the number of stored files: QUOTA_EXCEEDED: You have exceeded the maximum number of allowed files on Databricks Community Edition. To ensure free access, you are limited to 10000 files and 10 GB of storage in DBFS. Please use dbutils.fs to list and clean up files to restore service.

This document covers both causes and presents the most common solutions.

Data volume used

Large files are mainly:

To check folder sizes and locate large files, use the following command:

%sh du --human-readable --max-depth=1 --exclude='/dbfs' /

Sample output with folder sizes:

.5M /etc
1.9G /databricks
4.0K /media
120K /tmp
4.0K /lib64
4.0K /srv
233M /root
2.6G /usr
51M /opt
4.0K /boot
du: cannot read directory '/proc/tty/driver': Permission denied
du: cannot read directory '/proc/1231/task/1231/fd': Permission denied
du: cannot read directory '/proc/1231/task/1231/fdinfo': Permission denied
du: cannot read directory '/proc/1231/task/1231/ns': Permission denied
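The raw output is unsorted and cluttered with "Permission denied" messages. As a minimal sketch (the function name largest_entries is hypothetical and not part of the original notebook), the same du call can be wrapped so that errors are hidden and the largest entries come first. It uses the short options -k and -d 1 instead of the long GNU options, so sizes are printed in KiB rather than in human-readable units.

```shell
# Hypothetical helper (an illustration, not from the original notebook):
# print the largest entries directly under a directory, biggest first.
largest_entries() {
  root="${1:-/}"     # directory to inspect
  count="${2:-10}"   # number of lines to show
  # -k: sizes in KiB, -d 1: one level deep.
  # 2>/dev/null hides "Permission denied" messages from unreadable paths.
  du -k -d 1 "$root" 2>/dev/null | sort -rn | head -n "$count"
}
```

In a %sh cell, a call such as largest_entries /dbfs/FileStore 5 would show the five largest entries, assuming DBFS is mounted at /dbfs as in the cleanup commands in this document. The first line is always the total for the inspected directory itself.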

Number of files

Files are generated mainly by plot functions and by library activity (log files). For example, running a Kafka consumer for several minutes causes the driver to produce hundreds of log files.
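Before deleting anything, it can help to see which folders hold the most files. The following is a rough sketch, assuming DBFS is reachable through the /dbfs mount used elsewhere in this document; the function name count_files_by_dir is hypothetical.

```shell
# Hypothetical helper (an illustration, not from the original notebook):
# count regular files under each immediate subdirectory, largest count first.
count_files_by_dir() {
  root="${1:-/dbfs}"
  for dir in "$root"/*/; do
    # Skip the unexpanded pattern when there are no subdirectories.
    [ -d "$dir" ] || continue
    n=$(find "$dir" -type f 2>/dev/null | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$n" "${dir%/}"
  done | sort -rn
}
```

Calling count_files_by_dir /dbfs/FileStore in a %sh cell would print one line per subfolder with its file count, which can then be compared against the 10000-file limit quoted in the error message.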

Clear plot data:

%sh rm -rf /dbfs/FileStore/plots/*.png

Clear tmp data:

%sh
rm -rf /dbfs/tmp/*
rm -rf /dbfs/local_disk0/tmp/*
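Because rm -rf removes everything without confirmation, a more cautious variant is to list candidate files first and delete only older ones. The sketch below is an assumption-laden alternative, not the author's method: the function name purge_old_files and the default age of 7 days are hypothetical, and it assumes the /dbfs mount reports file modification times reliably.

```shell
# Hypothetical helper (an illustration, not from the original notebook):
# list, or delete, regular files older than a given number of days.
purge_old_files() {
  dir="$1"                # directory to clean
  days="${2:-7}"          # age threshold in days (assumed default)
  mode="${3:-dry-run}"    # pass "delete" to actually remove files
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  if [ "$mode" = "delete" ]; then
    # -mtime +N matches files last modified more than N full days ago.
    find "$dir" -type f -mtime +"$days" -delete
  else
    # Dry run: only print what would be deleted.
    find "$dir" -type f -mtime +"$days" -print
  fi
}
```

For example, purge_old_files /dbfs/tmp 7 only prints the files that would be removed, and purge_old_files /dbfs/tmp 7 delete removes them.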

Further information

Additional information can be found on the Databricks forum.