Recently I have been updating my scripting skills and learning Python. I have been using some of the great resources out there, such as Codecademy and the book “Dive Into Python”, but after getting about halfway through I felt the need to build some apps. So please forgive me if the code looks horrible or if things could be done a better way; this is literally my first Python program…
Anyway, on to the program. I needed a way to parse through terabytes of data and locate files that I could archive to different storage. The scan had to be light on resources, and it had to create a data set that I could easily report on later.
So I decided to create a program that would crawl the file system and dump the relevant data into a SQLite database that I could then query.
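The crawl-and-store idea can be sketched with the standard library alone. This is a minimal illustration, not the utility's actual code: the table name `files` and its columns (`path`, `name`, `ext`, `size`, `mtime`, `uid`) are hypothetical choices made for the example.

```python
import os
import sqlite3
import stat


def create_schema(conn):
    """Create the hypothetical `files` table unless it already exists."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT, name TEXT, ext TEXT, size INTEGER, mtime REAL, uid INTEGER)"
    )


def scan(conn, root):
    """Walk `root`, insert one row per regular file, and return the row count."""
    create_schema(conn)
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip rather than abort the scan
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, FIFOs and device nodes
            ext = os.path.splitext(name)[1].lower()  # "" when there is no extension
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
                (path, name, ext, st.st_size, st.st_mtime, st.st_uid),
            )
            count += 1
    conn.commit()  # a single commit per scan keeps disk I/O low
    return count
```

Typical use would be `scan(sqlite3.connect("filesystem.db"), "/mnt")`. Only metadata from `lstat` is read, never file contents, which is what keeps a crawl like this cheap even over large trees.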
The storage report utility allows you to first parse the file system and create a database:
./storage-report.py --db filesystem.db /mnt
The --db option creates a new database with the name supplied (a new file in your working directory, or at the path you give). If you do not specify --db, one called .sqlite-sa.db is created for you.
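One way to get this "optional --db with a default file name" behaviour is the standard-library `argparse` module. The sketch below is an assumption about how such a command line could be parsed; only `--db` and the `.sqlite-sa.db` default come from the description above, and treating the scan path as an optional positional argument is a hypothetical choice.

```python
import argparse

DEFAULT_DB = ".sqlite-sa.db"  # default database name described in the post


def build_parser():
    """Hypothetical parser sketch: an optional --db plus an optional scan path."""
    parser = argparse.ArgumentParser(prog="storage-report.py")
    # When --db is omitted, fall back to the default database file
    parser.add_argument("--db", default=DEFAULT_DB,
                        help="SQLite database file to create or append to")
    # The path is optional so report-only runs can leave it out
    parser.add_argument("path", nargs="?",
                        help="directory tree to scan, e.g. /mnt")
    return parser
```

A side benefit of `argparse` is that it generates a `-h`/`--help` option automatically from the `help=` strings.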
Each time the utility parses the file system it inserts the data into the database, so if you re-run the script against the same database, the new scan is added to the old one. This is useful if you want to split up your scan. Otherwise, you can start fresh by specifying a new database name or by initializing the current one (or, of course, by simply deleting the old database file).
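The append-versus-initialize behaviour comes down to whether the table is kept or recreated. Here is a minimal sketch, assuming a hypothetical `files` table; `open_db` and `reset_db` are illustrative names, not functions from the utility.

```python
import sqlite3

# Hypothetical schema used only for this illustration
SCHEMA = ("CREATE TABLE IF NOT EXISTS files (path TEXT, name TEXT, ext TEXT,"
          " size INTEGER, mtime REAL, uid INTEGER)")


def open_db(filename):
    """Open (or create) the database; existing rows are kept, so new scans append."""
    conn = sqlite3.connect(filename)
    conn.execute(SCHEMA)
    return conn


def reset_db(conn):
    """Hypothetical 'initialize' step: throw away every earlier scan."""
    conn.execute("DROP TABLE IF EXISTS files")
    conn.execute(SCHEMA)
    conn.commit()
```

If you want to keep several scans in one file but still tell them apart, adding a scan timestamp column would be an alternative to wiping the table.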
Once the scan has finished and the database has been created, you can run several commands that print reports to the screen. Some of these reports could also serve as input to another program:
Displays a summary of all of your files, showing how many files there are and how much space they use, grouped by date range.
./storage-report.py --db filesystem.db --old
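A summary like this maps naturally onto a single SQL `GROUP BY` over age buckets. The sketch assumes a hypothetical `files` table with `size` and `mtime` (Unix seconds) columns, and the bucket boundaries (30, 90 and 365 days) are illustrative guesses, since the post does not say which ranges the utility uses.

```python
import sqlite3
import time

DAY = 86400  # seconds per day


def summary_by_age(conn, now=None):
    """Return [(bucket, file_count, total_bytes)], youngest bucket first."""
    now = time.time() if now is None else now
    # Hypothetical bucket boundaries; adjust to taste
    sql = """
        SELECT CASE
                 WHEN :now - mtime < 30  * :day THEN '< 30 days'
                 WHEN :now - mtime < 90  * :day THEN '30-89 days'
                 WHEN :now - mtime < 365 * :day THEN '90-364 days'
                 ELSE '365+ days'
               END AS bucket,
               COUNT(*)  AS files,
               SUM(size) AS bytes
        FROM files
        GROUP BY bucket
        ORDER BY MAX(mtime) DESC
    """
    return conn.execute(sql, {"now": now, "day": DAY}).fetchall()
```

Letting SQLite do the grouping avoids pulling every row into Python, which matters when the table holds millions of files.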
Outputs a list of files that are older than the given number of days. This can be useful if you need to move a batch of files to an archive.
./storage-report.py --db filesystem.db --list 365
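The "older than N days" list reduces to a cutoff timestamp and one indexed-friendly `WHERE` clause. This is a sketch against a hypothetical `files` table (`path`, `size`, `mtime` in Unix seconds), not the utility's own query.

```python
import sqlite3
import time

DAY = 86400  # seconds per day


def files_older_than(conn, days, now=None):
    """Return [(path, size)] for files not modified in the last `days` days, oldest first.

    Assumes a hypothetical `files` table with path, size and mtime columns.
    """
    now = time.time() if now is None else now
    cutoff = now - days * DAY
    return conn.execute(
        "SELECT path, size FROM files WHERE mtime < ? ORDER BY mtime",
        (cutoff,),
    ).fetchall()
```

Printing just the paths, one per line (`for path, _ in files_older_than(conn, 365): print(path)`), gives output that is easy to feed into another program such as an archiving script.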
File Extension By Date Range
Outputs a summary of file extensions by date range, showing how many files of each type there are and how much space each extension uses.
./storage-report.py --db filesystem.db --ext
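One possible shape for this report groups by extension and by the calendar year of last modification, using SQLite's `strftime` with the `unixepoch` modifier. Year-sized ranges are an assumption made for the example (the post does not specify its ranges), as is the hypothetical `files` table with `ext`, `size` and `mtime` columns.

```python
import sqlite3


def extensions_by_year(conn):
    """Return [(year, ext, file_count, total_bytes)], newest year first, then largest.

    Assumes a hypothetical `files` table; mtime holds Unix seconds.
    """
    sql = """
        SELECT strftime('%Y', mtime, 'unixepoch') AS year,
               ext,
               COUNT(*)  AS files,
               SUM(size) AS bytes
        FROM files
        GROUP BY year, ext
        ORDER BY year DESC, bytes DESC
    """
    return conn.execute(sql).fetchall()
```

Dropping `year` from the `SELECT` and `GROUP BY` gives the same summary with no date range at all.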
File Extension No Date Range
./storage-report.py --extnodate --db filesystem.db
Files By User
Reports the number of files and the amount of space used by each user.
./storage-report.py --db filesystem.db --user
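A per-user report can group on the numeric owner id recorded during the scan and translate it to a login name afterwards. This sketch assumes a hypothetical `files` table with `uid` and `size` columns, and it uses the Unix-only `pwd` module, so it will not run on Windows.

```python
import pwd
import sqlite3


def usage_by_user(conn):
    """Return [(user, file_count, total_bytes)], biggest consumer first."""
    rows = conn.execute(
        "SELECT uid, COUNT(*), SUM(size) FROM files GROUP BY uid ORDER BY SUM(size) DESC"
    ).fetchall()
    result = []
    for uid, count, total in rows:
        try:
            user = pwd.getpwuid(uid).pw_name  # map the numeric uid to a login name
        except KeyError:
            user = str(uid)  # no passwd entry, e.g. a deleted account
        result.append((user, count, total))
    return result
```

Storing the uid rather than the name during the scan keeps the crawl cheap; name lookups happen once per user at report time.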
This report parses through the database looking for folders in which no file, either directly inside the folder or in any of its subfolders, has been modified within the given number of days. For example, the following command lists the folders that contain no files (in the folder itself or in any subfolder) modified within the last 365 days:
./storage-report.py --db filesystem.db --archive 365
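One way to compute this is to propagate each file's modification time up to every ancestor folder, keep the newest time per folder, and report folders whose newest time falls before the cutoff. The sketch below assumes a hypothetical `files` table with `path` and `mtime` columns and a known scan root. Reporting only the top-most stale folder in each branch is a design choice made here to keep the output short. Note that folders containing no files at all never appear, because only files are stored.

```python
import os
import sqlite3
import time

DAY = 86400  # seconds per day


def newest_mtime_per_dir(rows, root):
    """Map each folder between a file's parent and `root` to the newest mtime beneath it."""
    root = os.path.normpath(root)
    newest = {}
    for path, mtime in rows:
        d = os.path.dirname(path)
        # Climb toward the root, stopping early once a folder already holds an
        # equal or newer time, because every folder above it does too.
        while newest.get(d, float("-inf")) < mtime:
            newest[d] = mtime
            parent = os.path.dirname(d)
            if d == root or parent == d:
                break  # reached the scan root (or "/")
            d = parent
    return newest


def archive_candidates(conn, root, days, now=None):
    """Return top-most folders under `root` with no file modified in the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * DAY
    root = os.path.normpath(root)
    newest = newest_mtime_per_dir(conn.execute("SELECT path, mtime FROM files"), root)
    stale = {d for d, m in newest.items() if m < cutoff}
    # Subfolders of a stale folder are stale too, so keep only the highest one per branch
    return sorted(d for d in stale if d == root or os.path.dirname(d) not in stale)
```

The early stop in the climb means each file usually touches only a few folders, so the pass stays close to linear in the number of rows.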
Of course, there is also a help option.
Well, for now this covers almost everything I want. I am thinking about adding one more function that runs md5sum only on files that share the same name, which would make for a less resource-intensive duplicate file search than hashing every file. For now, though, I am going to get this up on GitHub to see if anyone wants it.
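The planned duplicate search could look roughly like this: group paths by base name first, and compute MD5 only for names that occur more than once. This is a hypothetical sketch of the idea described above, not code from the utility; `duplicates_by_name` and `md5_of` are illustrative names, and the input paths could come from a query such as `SELECT path FROM files`.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in 1 MiB chunks so large files do not fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicates_by_name(paths):
    """Return {(name, md5): [paths]} for same-named files with identical content."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[os.path.basename(p)].append(p)
    dupes = {}
    for name, group in by_name.items():
        if len(group) < 2:
            continue  # a unique name cannot be a same-name duplicate: skip hashing
        by_hash = defaultdict(list)
        for p in group:
            by_hash[md5_of(p)].append(p)
        for digest, same in by_hash.items():
            if len(same) > 1:
                dupes[(name, digest)] = same
    return dupes
```

Adding file size to the grouping key would cut the hashing work further, since files of different sizes cannot have identical content. The trade-off of grouping by name is that identical files stored under different names are not found.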
You can download or contribute to the app on GitHub here: