Python detects its own memory footprint in real time

Recently doing text statistics,Implemented in Python,Encountered an interesting problem-how to save statistical results。

Writing directly into memory is really impossible,Memory exhausted after ten hours,The program was forced to close。If you write directly to the database,Each write is too slow,It's been more than ten hours,If you go on like this, you will have to count on the week,Not a solution。

At last,I thought of a solution that combines both-using memory as a buffer,After reaching a certain amount, all the current data will be merged into the hard disk at one time。

But then there is a threshold,How to determine the timing of synchronizing hard drives,Can usually be processed according to file granularity,For example, processing a corpus file is synchronized once ... but my corpus is large and small,The big one has 10GB,The memory exploded in no time,Later I want to use the amount of statistical data to judge ... but this is a bit difficult to estimate,Small, write frequently,The meaning of caching is not significant,It ’s too big, do n’t wait until the number of entries is reached,The memory is full。Also consider that in the future the program will run on devices with different configurations,It is a bit too unfriendly for other developers to calculate this threshold based on their own situation,So I thought of a way-it is better to let Python detect its own memory usage,If it is almost full (or the threshold is reached),Write to the hard disk once。

For other developers,It is easy to see how big the memory of your own device is,Set a reasonable threshold based on system operating conditions,Quite convenient。

Use Python to monitor your memory usage,need to use psutil This library to interact with the system,The basic logic is to get your own first pid ,Then according to this pid To get the process information from the system。

For example, my system is 32GB RAM,Then I set a 20GB is quite safe,Statistical corpus in Python,There is so much data that the process takes up 20GB of memory,Just write the current data to the hard disk,Synchronize statistics,Then clear the dictionary cache in the program to free up memory。


Original article written by Gerber drop-off:R0uter's Blog » Python detects its own memory footprint in real time

Reproduced Please keep the source and description link:

About the Author


The non-declaration,I have written articles are original,Reproduced, please indicate the link on this page and my name。

Leave a Reply

Your email address will not be published. Required fields are marked *