[phenixbb] iotbx.merging_statistics on Python 3: too memory consuming

Daniel Paley dwpaley at lbl.gov
Mon Nov 29 10:13:33 PST 2021

Ok, the problem occurs in the cif parser that gets tried
in reflection_file_reader.try_all_readers(file_name). In incomprehensible
antlr3 code, every word in the file gets processed into a token, so any
large file with many words per line will temporarily use gigabytes of
memory when constructing an iotbx.cif.reader, specifically on this line:

iotbx/cif/__init__.py:68: self.parser = ext.fast_reader(builder, input_string, file_path, strict)
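
To give a feel for why one object per word adds up, here is a minimal pure-Python sketch. It is not the antlr3 parser and says nothing exact about its memory use; the helper name peak_bytes_when_tokenizing and the synthetic input are made up for illustration. It only shows that turning every whitespace-separated word into its own object costs far more memory than the text itself.

```python
import tracemalloc


def peak_bytes_when_tokenizing(text):
    """Return (number of tokens, peak traced bytes) for a naive tokenizer.

    Hypothetical stand-in: each word becomes a (word, index) tuple,
    loosely mimicking one token object per word.
    """
    tracemalloc.start()
    tokens = [(word, i) for i, word in enumerate(text.split())]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(tokens), peak


# Synthetic input: 20000 lines with 12 numeric-looking words each.
line = " ".join(["12345.678"] * 12) + "\n"
text = line * 20000

n_tokens, peak = peak_bytes_when_tokenizing(text)
print("characters in text:", len(text))
print("tokens created:", n_tokens)
print("peak traced bytes:", peak)
```

The exact numbers depend on the Python version, but the peak should come out many times larger than the character count of the input.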

The narrowest fix would be for try_all_readers to try the cif reader only
if the file extension is .cif (ignoring capitalization). Is that an ok

Martin, for a temporary fix, you can apply this diff in cctbx_project:

diff --git a/iotbx/reflection_file_reader.py b/iotbx/reflection_file_reader.py
index 3c1274fb27..3548eb8415 100644
--- a/iotbx/reflection_file_reader.py
+++ b/iotbx/reflection_file_reader.py
@@ -126,6 +126,7 @@ def try_all_readers(file_name):
   except Exception: pass
   else: return ("shelx_hklf", content)
   try:
+    assert os.path.splitext(file_name)[1].lower() == '.cif'
     content = cif_reader(file_path=file_name)
     looks_like_a_reflection_file = False
     for block in content.model().values():
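
The extension test used in the diff above can be tried on its own, outside cctbx. This is only a sketch: has_cif_extension is a hypothetical helper name, not something in iotbx, and the file names are examples.

```python
import os


def has_cif_extension(file_name):
    """Return True if file_name ends in .cif, ignoring capitalization.

    Hypothetical helper; it wraps the same expression as the assert
    in the diff.
    """
    return os.path.splitext(file_name)[1].lower() == ".cif"


for name in ("model.cif", "DATA.CIF", "XDS_ASCII.HKL", "data.cif.gz"):
    print(name, has_cif_extension(name))
```

Note that os.path.splitext only looks at the last suffix, so a name like data.cif.gz would not count as cif under this check.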


On Mon, Nov 29, 2021 at 11:33 AM Daniel Paley <dwpaley at lbl.gov> wrote:

> Hi Martin,
> Are you able to share the hkl file? (Privately if necessary)
> I just wrote some steps for memory analysis here:
> https://github.com/cctbx/cctbx_project/blob/master/dox/rst/debug.md so
> might be able to help.
> Dan
> On Mon, Nov 29, 2021 at 11:30 AM Martin Malý <martin.maly at ibt.cas.cz>
> wrote:
>> Dear PHENIX & CCTBX developers and users,
>> I tried to calculate merging statistics with CCTBX tools using Python 3.
>> I realized that the memory requirements are much higher compared with
>> Python 2... I closed all programs first, so I had just 1.6 GB RAM used
>> of 7.6 GB total RAM. Then I ran these two lines of code in the
>> cctbx.python shell:
>> import iotbx.merging_statistics
>> i_obs = iotbx.merging_statistics.select_data(file_name="XDS_ASCII.HKL",
>> data_labels=None)
>> Python 2.7: The RAM usage went to 4.8 GB and then I was able to
>> calculate merging statistics.
>> Python 3.7: The module import was successful. Then the RAM usage went to
>> the full 7.6 GB and the process was killed by the operating system
>> (CentOS 7).
>> Please, do you have any suggestions for how to use this module "more
>> carefully" and save memory?
>> Thank you!
>> Best regards,
>> Martin Malý