A Small Utility App For Splitting Large Text Files
### Utility App For Splitting Large Files
A coworker recently asked for a data dump of a table. I put a 350MB .csv file on a shared directory, and for some reason Excel 2007 choked on the file. Hmm, that’s weird. So I wrote a utility to split the large file into smaller files.
The concept is that sometimes you might have a large file, such as a web server log, a large CSV from some app, or a dump from SSMS. If you actually need to open and view the file (e.g. in Excel), then you might want a set of smaller files instead.
It’s a simple console app that will take or ask you for 2 args:
- The file you want to split and create multiples from.
- The number of new files to create.
You can run this app from Windows Explorer or from the command line, optionally passing arguments to the app.
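The "take or ask" behavior could be sketched roughly as below. This is an illustration in Python, not the app's actual code; the function name `resolve_args`, the prompt wording, and the specific cleaning rules (trimming whitespace and surrounding quotes) are assumptions made for the example.

```python
import os


def resolve_args(argv, ask=input):
    """Return (source_path, num_files), prompting for anything not supplied.

    Hypothetical helper: `argv` holds the arguments after the program name,
    and `ask` is the function used to prompt (defaults to input()).
    """
    # Use command-line values when present; otherwise ask interactively.
    raw_path = argv[0] if len(argv) > 0 else ask("File to split: ")
    raw_count = argv[1] if len(argv) > 1 else ask("Number of files to create: ")

    # Clean: trim whitespace and any surrounding double quotes.
    path = raw_path.strip().strip('"')

    # Verify: the file must exist and the count must be a positive integer.
    if not os.path.isfile(path):
        raise ValueError(f"File not found: {path}")
    try:
        num_files = int(raw_count.strip())
    except ValueError:
        raise ValueError(f"Not a whole number: {raw_count!r}")
    if num_files < 1:
        raise ValueError("Number of files must be at least 1")
    return path, num_files
```

Called as `resolve_args(sys.argv[1:])`, this covers both launch styles: a double-click from Explorer supplies no arguments, so both values are prompted for, while a command-line run can pass one or both up front.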
### How Does It Work?
The app basically does this:
- gets, cleans, and verifies the args from the user.
- loads the contents of the original file.
- opens the source file and finds the number of lines.
- calculates the number of lines to write per split file.
- writes each chunk of the source to the new files.
### Line-Based
I took the approach of reading files by their lines rather than their bytes, because the files are assumed to be structured as rows/lines, and it’s easier to understand from the user’s point of view. The user can likely answer “how many files would you like?” more easily than “how many MB should each file be?”.
The app attempts to read all lines from the source file into memory. It calculates the number of lines per file using
linesPerFile = RoundUp(sourceLines / numFiles)
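As a worked example of the formula above, here is a small Python sketch of the round-up division. The function name `lines_per_file` is made up for illustration; only the formula itself comes from the app.

```python
def lines_per_file(source_lines: int, num_files: int) -> int:
    """Ceiling division: the smallest chunk size that fits every line
    into at most num_files files."""
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    # -(-a // b) rounds up using integers only, so very large line
    # counts are not affected by floating-point precision.
    return -(-source_lines // num_files)


print(lines_per_file(10, 3))  # 4 -> files of 4, 4 and 2 lines
print(lines_per_file(9, 3))   # 3 -> files of 3, 3 and 3 lines
```

One consequence of rounding up is that the last file is usually shorter than the others, and for small inputs the split can yield fewer files than requested: 10 lines split "6 ways" gives 2 lines per file, which fills only 5 files.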
For relatively large files, the app can run out of memory: in my tests during development, it hit an OutOfMemoryException when it attempted to read all lines of a 900 MB+ file. In that case, the app catches the exception and retries by lazily loading the lines while processing.
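The catch-and-retry idea might look something like the following Python sketch, where `MemoryError` stands in for .NET's OutOfMemoryException. The helper names (`read_all_lines`, `iter_lines`, `load_lines`) and the UTF-8 encoding are assumptions for the example, not details from the app.

```python
def read_all_lines(path):
    """Eager read: the whole file is held in memory as a list of lines."""
    with open(path, encoding="utf-8", newline="") as f:
        return f.readlines()


def iter_lines(path):
    """Lazy read: yield one line at a time, keeping line endings intact."""
    with open(path, encoding="utf-8", newline="") as f:
        yield from f


def load_lines(path, read_all=read_all_lines):
    """Return (lines, line_count), falling back to lazy loading.

    `lines` is a list on the fast path, or a one-shot iterator if the
    eager read ran out of memory.
    """
    try:
        lines = read_all(path)
        return lines, len(lines)
    except MemoryError:
        # Fallback: one streaming pass to count, then a fresh stream to read.
        count = sum(1 for _ in iter_lines(path))
        return iter_lines(path), count
```

The trade-off in this sketch is that the fallback reads the file twice (once to count, once to process), which costs extra I/O but keeps memory use to roughly one line at a time.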
The files are created by looping over the source: each pass takes the next x lines from the source file, where x is linesPerFile, and writes them to a new file.
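A minimal Python sketch of that loop follows. The function name `write_chunks` and the `_1`, `_2`, ... output naming scheme are hypothetical, since the post does not say how the app names its output files.

```python
import os
from itertools import islice


def write_chunks(lines, lines_per_file, source_path):
    """Write consecutive groups of lines_per_file lines to numbered files.

    `lines` can be a list or a lazy iterator. Returns the paths written.
    """
    if lines_per_file < 1:
        raise ValueError("lines_per_file must be at least 1")
    base, ext = os.path.splitext(source_path)
    remaining = iter(lines)
    written = []
    index = 1
    while True:
        # Take the next chunk; an empty chunk means the source is exhausted.
        chunk = list(islice(remaining, lines_per_file))
        if not chunk:
            break
        out_path = f"{base}_{index}{ext}"  # hypothetical naming scheme
        with open(out_path, "w", encoding="utf-8", newline="") as out:
            out.writelines(chunk)
        written.append(out_path)
        index += 1
    return written
```

Because the loop only ever pulls `lines_per_file` lines at a time, the same code works whether the source was loaded into memory up front or is being streamed lazily.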