Concatenate Text Files using Python

I have recently been working on natural language processing (Chinese in particular). At the very beginning, the 5-year Weibo corpus was quite dirty. After cleaning the data with Python, I wrote the results to a separate txt file for each year, because storing files with large amounts of words and sentences is time-consuming and I did not want the job to stop in the middle of the process.

This is why I need to concatenate the text files after all the processing has finished. A Google search turned up a question on this topic on Stack Overflow.

And here is the code.

from shutil import copyfileobj as cpobj

# Open the output once in binary mode, then append each input file in turn.
with open('output_file.txt', 'wb') as ofd:
    for f in ['seg1.txt', 'seg2.txt', 'seg3.txt']:
        with open(f, 'rb') as ifd:
            # Copy in 10 MB chunks to avoid reading a big file into memory at once.
            cpobj(ifd, ofd, 1024*1024*10)

Unfortunately, the best answer, provided by Meow, does not have the highest score (only 38).

Notes

  1. shutil - High-level file operations

  2. The UNIX cat command is our good friend.

    cat file1 file2 file3 > bigfile
  3. Use file.read(65536) to read 64 KB of data at a time, rather than iterating through the files line by line with a for loop.
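
Note 3 can be sketched roughly as follows. This is only a minimal illustration, not the code from the Stack Overflow answer: the function name concat_files and the constant CHUNK_SIZE are hypothetical names chosen here, and the files are opened in binary mode on the assumption that the bytes should be copied unchanged.

```python
CHUNK_SIZE = 65536  # 64 KB per read, as suggested in note 3 (hypothetical constant name)


def concat_files(paths, out_path, chunk_size=CHUNK_SIZE):
    """Append each file in paths to out_path, reading chunk_size bytes at a time.

    concat_files is a hypothetical helper written for illustration.
    """
    with open(out_path, 'wb') as out:
        for path in paths:
            with open(path, 'rb') as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:  # an empty bytes object means end of file
                        break
                    out.write(chunk)
```

Calling concat_files(['seg1.txt', 'seg2.txt', 'seg3.txt'], 'output_file.txt') should produce the same result as the shutil version above. The difference is that the read loop is written out by hand, so memory use stays bounded by the chunk size no matter how large each input file is.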
