Opened 14 years ago
Closed 12 years ago
#385 closed task (fixed)
faster copyright update
Reported by: | Peter Johansson | Owned by: | Peter Johansson |
---|---|---|---|
Priority: | major | Milestone: | svndigest 0.10 |
Component: | svncopyright | Version: | trunk |
Keywords: | Cc: |
Description
This is just an idea.
Requires ticket:307
Currently, copyright update uses AddStats to retrieve the needed information. When a cache is available this is fine, but when there is none we need to parse from the repository, which is expensive: it involves a blame call for every revision in the log. For the sake of copyright, this is not really needed. For example, if we have the following log
r2 | jari | 2005-12-20 15:54:03 -0500 (Tue, 20 Dec 2005) | 1 line
r5 | peter | 2005-12-29 15:52:24 -0500 (Thu, 29 Dec 2005) | 1 line
r6 | peter | 2005-12-29 16:00:00 -0500 (Thu, 29 Dec 2005) | 1 line
r8 | jari | 2005-12-30 02:09:36 -0500 (Fri, 30 Dec 2005) | 1 line
r12 | peter | 2005-12-30 08:33:47 -0500 (Fri, 30 Dec 2005) | 1 line
r21 | jari | 2005-12-30 18:20:00 -0500 (Fri, 30 Dec 2005) | 1 line
r81 | jari | 2006-03-11 18:47:40 -0500 (Sat, 11 Mar 2006) | 1 line
r84 | jari | 2006-03-13 17:04:34 -0500 (Mon, 13 Mar 2006) | 1 line
r96 | jari | 2006-03-28 18:00:51 -0500 (Tue, 28 Mar 2006) | 1 line
r138 | jari | 2006-08-03 17:40:15 -0400 (Thu, 03 Aug 2006) | 1 line
r146 | jari | 2006-08-10 04:14:58 -0400 (Thu, 10 Aug 2006) | 1 line
r149 | jari | 2006-08-12 05:11:46 -0400 (Sat, 12 Aug 2006) | 1 line
r153 | jari | 2006-08-13 18:15:55 -0400 (Sun, 13 Aug 2006) | 1 line
r156 | jari | 2006-08-14 14:49:36 -0400 (Mon, 14 Aug 2006) | 1 line
r157 | jari | 2006-08-14 14:53:21 -0400 (Mon, 14 Aug 2006) | 1 line
r160 | jari | 2006-08-19 17:19:06 -0400 (Sat, 19 Aug 2006) | 2 lines
r161 | jari | 2006-08-19 17:26:53 -0400 (Sat, 19 Aug 2006) | 1 line
r162 | jari | 2006-08-19 19:01:59 -0400 (Sat, 19 Aug 2006) | 1 line
r164 | jari | 2006-08-23 18:51:01 -0400 (Wed, 23 Aug 2006) | 1 line
r167 | jari | 2006-08-25 18:38:52 -0400 (Fri, 25 Aug 2006) | 1 line
r168 | jari | 2006-08-25 18:40:10 -0400 (Fri, 25 Aug 2006) | 1 line
r171 | jari | 2006-08-26 17:47:43 -0400 (Sat, 26 Aug 2006) | 3 lines
r172 | jari | 2006-08-26 18:15:31 -0400 (Sat, 26 Aug 2006) | 1 line
r192 | jari | 2006-09-07 07:27:50 -0400 (Thu, 07 Sep 2006) | 1 line
r195 | jari | 2006-09-07 09:44:35 -0400 (Thu, 07 Sep 2006) | 1 line
r219 | jari | 2006-12-24 16:39:35 -0500 (Sun, 24 Dec 2006) | 1 line
r220 | jari | 2006-12-24 17:53:05 -0500 (Sun, 24 Dec 2006) | 1 line
r279 | peter | 2007-05-06 05:39:47 -0400 (Sun, 06 May 2007) | 1 line
r358 | peter | 2007-06-06 18:42:50 -0400 (Wed, 06 Jun 2007) | 1 line
r368 | peter | 2007-06-12 15:05:14 -0400 (Tue, 12 Jun 2007) | 1 line
r383 | jari | 2007-06-24 18:04:58 -0400 (Sun, 24 Jun 2007) | 1 line
we can start by analyzing r2, where we find that jari added something in that changeset, and add him to the copyright for year 2005. Then we analyze r5 and conclude the same thing about peter. The next rev in the log is r6 by peter, but peter is already added to year 2005, so there is no need to analyze that rev. The same goes for revs 8, 12, and 21. The next rev of interest is r81, where we find that we should add jari to the copyright for year 2006. Then we can skip all remaining revs for 2006 because they all belong to jari and he is already added. The next rev to analyze is then r279... et cetera.
Obviously, this would be much faster than parsing every revision. We could also skip parsing the BlameStats (that alone is a factor of 2).
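The skipping rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not svndigest code: `parse_log`, `copyright_holders` and the `adds_content` callback are hypothetical names, and `adds_content(rev)` stands in for the expensive per-revision analysis (whatever check decides that the author actually added something in that changeset).

```python
import re
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Header line of `svn log` output, e.g.
# "r2 | jari | 2005-12-20 15:54:03 -0500 (Tue, 20 Dec 2005) | 1 line"
LOG_HEADER = re.compile(r"^r(\d+) \| (\S+) \| (\d{4})-\d{2}-\d{2} ")

Entry = Tuple[int, str, int]  # (revision, author, year)


def parse_log(lines: Iterable[str]) -> List[Entry]:
    """Extract (revision, author, year) from svn log header lines."""
    entries = []
    for line in lines:
        m = LOG_HEADER.match(line)
        if m:
            entries.append((int(m.group(1)), m.group(2), int(m.group(3))))
    return entries


def copyright_holders(
    entries: Iterable[Entry],
    adds_content: Callable[[int], bool],  # hypothetical expensive check
) -> Tuple[Dict[int, Set[str]], List[int]]:
    """Return ({year: authors}, revisions that actually had to be analyzed)."""
    holders: Dict[int, Set[str]] = {}
    analyzed: List[int] = []
    for rev, author, year in entries:
        if author in holders.get(year, set()):
            continue  # author already credited for this year: skip the rev
        analyzed.append(rev)
        if adds_content(rev):
            holders.setdefault(year, set()).add(author)
    return holders, analyzed
```

Applied to the log above, and assuming every analyzed changeset turns out to add something, only r2, r5, r81, r279 and r383 would need to be analyzed. If an analyzed changeset adds nothing, the author is not credited and the next revision by the same author in that year is analyzed instead.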
Some problems, or rather design decisions, we need to make. First, I put ticket:307 as a requirement. Otherwise we have to take into account the case where we update copyright and output a report in the same run.
Second is how to deal with the cache. We obviously wanna allow a cache for this kind of statistics too, but since it is a different kind of statistics we need to keep its cache separate from the other stats cache.
- If there is a copyright cache available, use it (that is obvious).
- No copyright cache available, but there is a cache for AddStats: should we then translate that into copyright info?
- Then comes the question about partial caches. Say that we have a partial cache for both AddStats and copyright. Which one should we then utilize?
Perhaps the best way is to keep Stats and Copyright separate, i.e., no crosstalk between the caches. Then the answers to the questions in 1) and 2) would be No and No. You lose some speed doing so, but the benefits are numerous.
First, the applications are more separate, which is probably clearer for the end-user. Having two sources for the copyright info (AddStats and the algorithm described above) would sooner or later become a maintenance nightmare. When modifying something we would need to ensure that the modification takes place in both places. Smells like bad design.
Second, by not mixing the two caches, it would be possible to configure the stats differently for the report and for the copyright. For example, how you wanna treat follow-copy. Perhaps you don't wanna follow-copy in a report, but for the copyright perhaps you do wanna follow-copy. This is a bit hypothetical as the follow-copy config is not available yet, but still... I guess I'm drifting into a discussion that belongs in ticket:307 so...
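To make the "no crosstalk" option concrete, here is a minimal sketch of a copyright-only cache that resumes from a partial state and never consults the AddStats cache. Everything in it is an assumption for illustration: the `CopyrightCache` layout, the `update` function and the `adds_content` callback are hypothetical, and entries are assumed to be `(revision, author, year)` tuples in ascending revision order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

Entry = Tuple[int, str, int]  # (revision, author, year), ascending revisions


@dataclass
class CopyrightCache:
    """Hypothetical per-file cache, kept apart from any stats cache."""
    last_rev: int = 0  # highest revision already covered by the cache
    holders: Dict[int, Set[str]] = field(default_factory=dict)


def update(
    cache: CopyrightCache,
    entries: Iterable[Entry],
    adds_content: Callable[[int], bool],  # hypothetical expensive check
) -> List[int]:
    """Bring the cache up to date; return the revisions that were analyzed."""
    analyzed: List[int] = []
    for rev, author, year in entries:
        if rev <= cache.last_rev:
            continue  # already covered by the (possibly partial) cache
        cache.last_rev = rev
        if author in cache.holders.get(year, set()):
            continue  # author already credited for this year
        analyzed.append(rev)
        if adds_content(rev):
            cache.holders.setdefault(year, set()).add(author)
    return analyzed
```

With this layout a partial copyright cache is simply one whose `last_rev` is behind the head of the log, so the question of which partial cache to trust does not arise: the copyright code only ever reads its own.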
Change History (6)
comment:1 Changed 13 years ago by
comment:2 Changed 13 years ago by
(In [1227]) refs #476 and #385. Move parsing to CopyrightVisitor. Only parse Files because Directories have no copyright to update.
comment:3 Changed 12 years ago by
Milestone: | svndigest 0.x+ → svndigest 0.10 |
---|---|
Owner: | changed from Jari Häkkinen to Peter Johansson |
Status: | new → assigned |
comment:4 Changed 12 years ago by
comment:6 Changed 12 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(In [1226]) refs #476 and #385. Remove --copyright switch in svndigest program.