Opened 15 years ago
Closed 13 years ago
#296 closed defect (fixed)
reduce memory usage
Reported by: | Peter Johansson | Owned by: | Peter Johansson |
---|---|---|---|
Priority: | major | Milestone: | svndigest 0.9 |
Component: | core | Version: | 0.6.4 |
Keywords: | | Cc: |
Description (last modified)
related to #358
I get the following error message:

Parsing /home/peter/projects/osd/subversion/subversion/svn/export-cmd.c
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
Aborted
svn info
returns
URL: http://svn.collab.net/repos/svn/trunk
Repository Root: http://svn.collab.net/repos/svn
Repository UUID: 65390229-12b7-0310-b90b-f21a5aa7ec8e
Revision: 28724
Node Kind: directory
Schedule: normal
Last Changed Author: glasser
Last Changed Rev: 28721
Last Changed Date: 2008-01-02 14:02:31 -0500 (Wed, 02 Jan 2008)
Change History (8)
comment:1 Changed 15 years ago
comment:2 Changed 15 years ago
I am starting to think this is an out-of-memory problem (in a more global sense).
StatsCollection holds three different StatsType, each holding statistics for each author (plus "all") for each LineType. There are 6 different LineTypes internally, namely the fundamental code, comment, empty, and copyright, plus comment_or_copy and total. Each of these combinations holds a vector of length rev. So in total we hold 3 (StatsType) × 6 (LineType) × #authors × #revisions elements. In the case of subversion there are 143 authors and ~34k revisions, which makes a total of 3 × 6 × 143 × 34k ≈ 87 million elements. Say that an unsigned int is 4 bytes; that makes the total memory consumption 350 MB. And this is only for one file. I count more than 1500 files in the project, which would make the total memory consumption 525 GB.
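For concreteness, here is the arithmetic above as a back-of-the-envelope program (the counts are the ones quoted in this comment; the 4-byte element size is an assumption about the platform):

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // Numbers quoted above for the subversion repository.
  const std::uint64_t stats_types = 3;
  const std::uint64_t line_types = 6;
  const std::uint64_t authors = 143;
  const std::uint64_t revisions = 34000;
  const std::uint64_t files = 1500;
  const std::uint64_t bytes_per_element = 4;  // assumed sizeof(unsigned int)

  const std::uint64_t per_file =
      stats_types * line_types * authors * revisions * bytes_per_element;
  std::cout << "per file: " << per_file / 1000000 << " MB\n";          // 350 MB
  std::cout << "total: " << per_file * files / 1000000000 << " GB\n";  // 525 GB
}
```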
This is an overestimation, though, because when an author has never touched a file there are no stats for him.
Nevertheless, the conclusion is that svndigest doesn't scale. I have a couple of suggestions:
- Process files sequentially. Currently, all files are first parsed and then the output for all files is printed. Instead, parse and print one file at a time; then there is no need to store the stats for more than one file, except that the stats are propagated up to the mother Node (see the first sketch after this list).
- Store data in a sparse container. Currently, data are stored in vectors such that vec[rev] gives the stats for revision rev. For most revisions the stats are the same as for the previous revision, because the file was not modified in that revision. Therefore one could perhaps save some memory by holding the stats in a map<rev, count> instead (see the second sketch after this list). But there is typically more overhead in a map than in a vector, so how much is the gain? Also, for high-level Nodes such as the Root Directory, the stats are typically not sparse and there is no gain.
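A minimal sketch of what the first suggestion could look like; all names here are hypothetical, not svndigest's actual API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-file statistics: one count per revision.
struct Stats {
  std::vector<unsigned> count_per_rev;
  void add(const Stats& other) {
    if (other.count_per_rev.size() > count_per_rev.size())
      count_per_rev.resize(other.count_per_rev.size());
    for (std::size_t r = 0; r < other.count_per_rev.size(); ++r)
      count_per_rev[r] += other.count_per_rev[r];
  }
};

Stats parse(const std::string& path);                      // omitted here
void print_report(const std::string& path, const Stats&);  // omitted here

// Parse and print one file at a time; only the aggregated stats of the
// mother Node stay alive, not the stats of every file.
void process(const std::vector<std::string>& files, Stats& node_stats) {
  for (const std::string& path : files) {
    Stats stats = parse(path);   // stats for this file only
    print_report(path, stats);
    node_stats.add(stats);       // propagate up to the mother Node
  }                              // per-file stats are released here
}
```

And a sketch of the sparse container from the second suggestion, again with hypothetical names. The value for a revision is the value stored for the last revision at or before it:

```cpp
#include <iterator>
#include <map>

// Store a count only for revisions in which the file actually changed.
class SparseCounts {
public:
  void set(unsigned rev, unsigned count) { counts_[rev] = count; }

  // Value at `rev` is the value of the last entry with key <= rev, or 0.
  unsigned at(unsigned rev) const {
    auto it = counts_.upper_bound(rev);  // first entry with key > rev
    if (it == counts_.begin())
      return 0;                          // no entry at or before rev
    return std::prev(it)->second;
  }

private:
  std::map<unsigned, unsigned> counts_;
};
```

Note that each std::map node typically carries pointer and balancing overhead on top of the key/value pair, so the sparse representation only pays off when a file is untouched in the large majority of revisions, which is exactly the concern about high-level Nodes above.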
comment:4 Changed 13 years ago
Milestone: | svndigest 0.x+ → svndigest 0.9 |
---|---|
Summary: | svndigest crashes on subversion → reduce memory usage |
comment:5 Changed 13 years ago
Description: | modified (diff) |
---|
comment:6 Changed 13 years ago
Description: | modified (diff) |
---|
comment:7 Changed 13 years ago
Owner: | changed from Jari Häkkinen to Peter Johansson |
---|---|
Status: | new → assigned |
With recent changes I will test on subversion again.
comment:8 Changed 13 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
svndigest --verbose --no-report now works on a subversion trunk working copy.
Perhaps the problem is that we try to create a container (vector) larger than its maximum size; we should check what max_size() returns.
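For reference, a minimal sketch of how to inspect that limit. Note that max_size() is only a theoretical upper bound; std::bad_alloc is usually thrown much earlier, as soon as the allocator cannot provide the requested contiguous block:

```cpp
#include <iostream>
#include <vector>

int main() {
  std::vector<unsigned> v;
  // Theoretical maximum number of elements this vector type can hold.
  std::cout << "vector<unsigned> max_size: " << v.max_size() << '\n';
}
```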