Python: File System Storage Report

Recently I have been updating my scripting skills and learning Python. I have been using some of the great resources out there, such as Codecademy and the book “Dive Into Python”, but about halfway through I found I needed to build something real. So please forgive me if the code looks horrible or if things could be done a better way; this is literally my first Python program…

Anyway, on to the program. I needed a way to parse through terabytes of data and locate files that I could archive to different storage. The scan had to be light on resources, and it had to create a data set that I could easily report on later.

So I decided to create a program that would crawl the file system and dump the relevant data into a SQLite database that I could then query.
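To give a feel for the approach, here is a minimal sketch of a crawl-and-store loop. It is not the actual storage-report.py code: the `files` table, its columns, and the `scan` function are assumptions made up for illustration.

```python
import os
import sqlite3

# Hypothetical schema; the real storage-report.py table layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path  TEXT,
    size  INTEGER,
    mtime REAL,
    uid   INTEGER
)
"""

def scan(root, conn):
    """Walk root and insert one row per non-directory entry found."""
    conn.execute(SCHEMA)
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            conn.execute(
                "INSERT INTO files (path, size, mtime, uid) VALUES (?, ?, ?, ?)",
                (full, st.st_size, st.st_mtime, st.st_uid),
            )
            count += 1
    conn.commit()  # a single commit at the end keeps disk writes low
    return count
```

Because `os.walk` yields one directory at a time and only metadata is read, a loop like this never opens the files themselves, which is what keeps the scan cheap.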

Parser

The storage report utility allows you to first parse the file system and create a database:

./storage-report.py --db filesystem.db /mnt

The --db option creates a new database file with the name supplied (in your working directory, or at the path you give). If you do not specify --db, one will be created for you called .sqlite-sa.db.

When the utility parses files on the file system it inserts the data into the database, so if you re-run the script with the same database it will add the new scan to the old scan. This can be useful if you want to split up your scan. To start fresh, you can create a new database by specifying a new database name, or you can initialize the current one (of course, you could also just delete the old database file):

./storage-report.py --initdb

Once the scan is performed and the database is created, you can run several commands to output reports to the screen. Some of these reports could be useful as input to another program:

Old Files

Displays a summary view of all of your files, showing how many files there are and how much space they use, grouped by date range.

./storage-report.py --db filesystem.db --old
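A report like this boils down to counting and summing per age bucket. The sketch below assumes a hypothetical `files` table with `size` and `mtime` columns, and the bucket boundaries in `BUCKETS` are invented for the example; the real utility may use different ranges.

```python
import sqlite3
import time

DAY = 86400  # seconds per day

# Hypothetical age buckets: (label, min_days, max_days); None means open-ended.
BUCKETS = [("0-90 days", 0, 90), ("90-365 days", 90, 365), ("365+ days", 365, None)]

def age_summary(conn, now=None):
    """Return (label, file_count, total_bytes) for each age bucket."""
    now = time.time() if now is None else now
    rows = []
    for label, lo, hi in BUCKETS:
        newest = now - lo * DAY  # files must be at least lo days old
        oldest = None if hi is None else now - hi * DAY
        count, total = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(size), 0) FROM files "
            "WHERE mtime <= ? AND (? IS NULL OR mtime > ?)",
            (newest, oldest, oldest),
        ).fetchone()
        rows.append((label, count, total))
    return rows
```

Passing `now` explicitly makes the report reproducible, which helps when comparing two runs against the same database.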

List Files

Output a list of files that are older than the given number of days. This can be useful if you need to move a bunch of files to an archive.

./storage-report.py --db filesystem.db --list 365
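Conceptually this is a single query with a cutoff timestamp. A minimal sketch, again assuming a hypothetical `files` table with `path` and `mtime` columns (the function name `older_than` is made up):

```python
import sqlite3
import time

def older_than(conn, days, now=None):
    """Return paths whose modification time is more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    cur = conn.execute(
        "SELECT path FROM files WHERE mtime < ? ORDER BY path", (cutoff,)
    )
    return [row[0] for row in cur]
```

Printing one path per line makes the result easy to feed into another tool that does the actual move.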

File Extension By Date Range

Output a summary of extensions by date range, showing how many files of each type there are and how much space each extension uses.

./storage-report.py --db filesystem.db --ext
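One way to build such a summary is to derive the extension from each stored path and tally per (age group, extension). This sketch assumes a hypothetical `files` table and, to stay short, splits files into just two groups around a single cutoff; the real report may use more ranges.

```python
import os
import sqlite3
import time
from collections import defaultdict

def ext_summary(conn, cutoff_days=365, now=None):
    """Map (age_group, extension) to (file_count, total_bytes)."""
    now = time.time() if now is None else now
    cutoff = now - cutoff_days * 86400
    totals = defaultdict(lambda: [0, 0])
    for path, size, mtime in conn.execute("SELECT path, size, mtime FROM files"):
        # Lower-case so that .TXT and .txt are counted together.
        ext = os.path.splitext(path)[1].lower() or "(none)"
        group = "older" if mtime < cutoff else "newer"
        entry = totals[(group, ext)]
        entry[0] += 1
        entry[1] += size
    return {key: tuple(value) for key, value in totals.items()}
```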

File Extension No Date Range

Output the same per-extension summary, without the date range breakdown.

./storage-report.py --extnodate --db filesystem.db

Files By User

Reports the number of files and how much space is used by each user.

./storage-report.py --db filesystem.db --user
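A per-user report maps naturally onto a GROUP BY. The sketch below assumes the scan stored each file's numeric owner in a hypothetical `uid` column, and it resolves names with the standard `pwd` module where that is available; both `user_name` and `usage_by_user` are illustrative names.

```python
import sqlite3

try:
    import pwd  # Unix only
except ImportError:
    pwd = None

def user_name(uid):
    """Resolve a numeric uid to a login name, falling back to the number."""
    if pwd is not None:
        try:
            return pwd.getpwuid(uid).pw_name
        except KeyError:
            pass  # uid no longer exists on this machine
    return str(uid)

def usage_by_user(conn):
    """Return (user, file_count, total_bytes), largest consumer first."""
    cur = conn.execute(
        "SELECT uid, COUNT(*), SUM(size) FROM files "
        "GROUP BY uid ORDER BY SUM(size) DESC"
    )
    return [(user_name(uid), count, total) for uid, count, total in cur]
```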

Archive-able Directories

This report parses through the database, looking for paths that do not contain any files, either directly under the path or in its subfolders, that have been modified during the given time range. For example, say I want to locate folders that don't have any files under them (subfolders included) that have been modified within the last 365 days. This command will output those folders:

./storage-report.py --db filesystem.db --archive 365
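One way to find such folders is to push every file's modification time up to all of its ancestor directories, keeping the newest value seen for each, and then report the directories whose newest value is older than the cutoff. This is only a sketch under that assumption, using a hypothetical `files` table; it is not necessarily how storage-report.py does it.

```python
import os
import sqlite3
import time

def archivable_dirs(conn, days, now=None):
    """Return directories with no file modified in the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    newest = {}  # directory -> newest mtime of any file at or below it
    for path, mtime in conn.execute("SELECT path, mtime FROM files"):
        parent = os.path.dirname(path)
        while True:
            if mtime > newest.get(parent, float("-inf")):
                newest[parent] = mtime
            up = os.path.dirname(parent)
            if up == parent:  # reached the top of the path
                break
            parent = up
    return sorted(d for d, m in newest.items() if m < cutoff)
```

Note that this returns every qualifying directory, so a stale folder and its stale subfolders all appear in the output.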

Help

Of course, there is also a help option:

./storage-report.py --help

What's next?

Well, for now this covers almost everything I want. I am thinking about adding one more function that will do an md5sum only on files that share the same name, so that I can have a less resource-intensive duplicate file search (instead of performing an md5sum on all files). For now, though, I am going to get this up on GitHub to see if anyone wants it.
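The idea of hashing only files that share a name could look roughly like the sketch below. Nothing here exists in the utility yet; `md5_of` and `duplicates_by_name` are hypothetical names, and the input is simply a list of paths.

```python
import hashlib
import os
from collections import defaultdict

def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def duplicates_by_name(paths):
    """Return groups of paths with the same file name and the same MD5."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[os.path.basename(path)].append(path)
    dupes = []
    for same_name in by_name.values():
        if len(same_name) < 2:
            continue  # unique name: never read or hashed
        by_hash = defaultdict(list)
        for path in same_name:
            by_hash[md5_of(path)].append(path)
        dupes.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return dupes
```

The trade-off is that identical files stored under different names are missed, which is the price of not hashing everything.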

Download

You can download or contribute to the app through GitHub here:

Download Here
