Opened 15 years ago

Closed 13 years ago

#296 closed defect (fixed)

reduce memory usage

Reported by: Peter Johansson
Owned by: Peter Johansson
Priority: major
Milestone: svndigest 0.9
Component: core
Version: 0.6.4
Keywords:
Cc:

Description (last modified by Jari Häkkinen)

Daughter tickets: #475 #476

related to #358

I get the following error message

Parsing /home/peter/projects/osd/subversion/subversion/svn/export-cmd.c
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc

svn info returns

Repository Root:
Repository UUID: 65390229-12b7-0310-b90b-f21a5aa7ec8e
Revision: 28724
Node Kind: directory
Schedule: normal
Last Changed Author: glasser
Last Changed Rev: 28721
Last Changed Date: 2008-01-02 14:02:31 -0500 (Wed, 02 Jan 2008)

Change History (8)

comment:1 Changed 15 years ago by Peter Johansson

Perhaps the problem is that we try to create a container (vector) larger than max size. Perhaps we should check what max size is.
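
A minimal sketch of such a check, assuming the per-revision counts are stored in a std::vector<unsigned int>. The helper name fits_in_vector is hypothetical and not part of svndigest:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not svndigest code: reports whether a
// std::vector<unsigned int> with n elements is allowed at all
// by this standard library implementation.
inline bool fits_in_vector(std::size_t n)
{
  // max_size() is a non-static member, so query a temporary empty vector.
  return n <= std::vector<unsigned int>().max_size();
}
```

Staying below max_size() only rules out one cause. An allocation can still throw std::bad_alloc when the system runs out of memory.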

comment:2 Changed 15 years ago by Peter Johansson

I am starting to think this is an out-of-memory problem (in a more global sense).

StatsCollection holds three different StatsType, each holding statistics for each author (plus all) for each LineType. There are 6 different LineType internally, namely the fundamental code, comment, empty, and copyright, plus comment_or_copy and total. Each of these combinations holds a vector of length rev. So in total we hold 3 (StatsType) × 6 (LineType) × # authors × # revisions elements. In the case of subversion there are 143 authors and ~34k revisions, which makes a total of 3 × 6 × 143 × 34k ≈ 87 million elements. Say that an unsigned int is 4 bytes; that makes the total memory consumption about 350 MB. And this is only for one file. I count more than 1500 files in the project, which would make the total memory consumption about 525 GB.

This is an overestimation, though, because when an author has never touched a file there are no stats for him.
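
The upper-bound estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the 4-byte element size is the same assumption as in the comment.

```cpp
#include <cstdint>

// Illustrative only: number of counters held for one file when every
// author has an entry for every revision (the worst case).
constexpr std::uint64_t elements_per_file(std::uint64_t stats_types,
                                          std::uint64_t line_types,
                                          std::uint64_t authors,
                                          std::uint64_t revisions)
{
  return stats_types * line_types * authors * revisions;
}

// Illustrative only: total bytes for a number of files of equal size.
constexpr std::uint64_t total_bytes(std::uint64_t elements,
                                    std::uint64_t bytes_per_element,
                                    std::uint64_t files)
{
  return elements * bytes_per_element * files;
}
```

With the subversion numbers, elements_per_file(3, 6, 143, 34000) gives 87,516,000 elements, and total_bytes(87516000, 4, 1500) gives 525,096,000,000 bytes.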

Nevertheless, the conclusion is that svndigest doesn't scale. I have a couple of suggestions:

  1. Process files sequentially. Currently all files are first parsed and then the output for all files is printed. Instead, parse and print one file at a time; then there is no need to store the stats for more than one file, except that the stats are propagated up to the mother Node.
  2. Store data in a sparse container. Currently, data are stored in vectors such that vec[rev] gives the stats for revision rev. For most revisions the stats are the same as for the previous revision, because the file was not modified in that revision. Therefore one could perhaps save some memory by holding the stats in a map<rev, count> instead. But there is typically more overhead per element in a map than in a vector, so how much is the gain? Also, for high-level Nodes such as the root directory, the stats are typically not sparse and there is no gain.
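
A rough sketch of suggestion 2, assuming counts are recorded in increasing revision order and that a count of 0 applies before the first recorded revision. The class SparseStats and its members are hypothetical and not the svndigest implementation:

```cpp
#include <cstddef>
#include <map>

// Hypothetical sparse replacement for vec[rev]: only revisions where
// the count changes are stored.
class SparseStats
{
public:
  // Record the count valid from revision rev onwards. Nothing is stored
  // when the count equals the one already in effect at rev.
  void set(unsigned int rev, unsigned int count)
  {
    if ((*this)(rev) != count)
      counts_[rev] = count;
  }

  // Count in effect at revision rev: the entry with the largest key
  // that is <= rev, or 0 if there is no such entry.
  unsigned int operator()(unsigned int rev) const
  {
    std::map<unsigned int, unsigned int>::const_iterator it =
      counts_.upper_bound(rev);
    if (it == counts_.begin())
      return 0;
    --it;
    return it->second;
  }

  // Number of entries actually stored.
  std::size_t stored() const { return counts_.size(); }

private:
  std::map<unsigned int, unsigned int> counts_;
};
```

A lookup costs O(log n) in the number of stored entries rather than O(1), and each map node carries pointer overhead on top of the two integers. The saving therefore depends on how rarely a file changes, which is the trade-off raised in the suggestion.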

comment:3 Changed 15 years ago by Peter Johansson

Description: modified (diff)

added relationship

comment:4 Changed 13 years ago by Peter Johansson

Milestone: svndigest 0.x+ → svndigest 0.9
Summary: svndigest crashes on subversion → reduce memory usage

comment:5 Changed 13 years ago by Peter Johansson

Description: modified (diff)

comment:6 Changed 13 years ago by Jari Häkkinen

Description: modified (diff)

comment:7 Changed 13 years ago by Peter Johansson

Owner: changed from Jari Häkkinen to Peter Johansson
Status: new → assigned

With recent changes I will test on subversion again.

comment:8 Changed 13 years ago by Peter Johansson

Resolution: fixed
Status: assigned → closed
svndigest --verbose --no-report

now works on a subversion trunk wc.
