---
layout: post
title: Removals stuff
date: '2007-11-24 23:33:00 +0000'
mt_id: 119
blog_id: 1
post_id: 119
basename: removals_stuff
categories:
- ftpmaster
---

<p>
Seems like my
<a href="http://blog.ganneff.de/blog/2007/11/22#ftpteam">latest
script</a> was something people wanted to have; lots of people seem to
like it, at least judging from the reactions I got.
Today I made a few modifications to it, making the information
displayed more accurate.
</p>

<p>
It was initially written to run on a host which doesn't have a projectb
(the PostgreSQL database that has all knowledge about the Debian
archive, i.e. <b>the</b> source of archive-related information), and as
such the script had to get its data from elsewhere. I had another little
Ruby script using GzipReader to go over Sources.gz and all the
Packages.gz files, building up a data structure consisting of some deeply
nested Hashes and Arrays. That takes (on ries, a pretty big machine) about
23 minutes to run, using 100% of one CPU, which I <b>did not like</b>,
especially as I would have to run this 80-line monster twice a day.
</p>
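<p>
To give an idea of what such a dump script does, here is a minimal sketch,
not the original 80-line script. It assumes the usual layout of Packages.gz
and Sources.gz: stanzas of "Field: value" lines separated by blank lines,
with continuation lines starting with whitespace. The helper name
<b>parse_index</b> and the shape of the resulting Hash are made up for
illustration.
</p>

```ruby
require 'zlib'

# Hypothetical helper: read a gzip-compressed Packages.gz or Sources.gz
# from an IO object and return
#   { package name => { version => { field name => value } } }
def parse_index(io)
  index = Hash.new { |hash, name| hash[name] = {} }
  fields = {}
  last = nil

  # Store the stanza collected so far and start a new one.
  flush = lambda do
    name = fields['Package']
    index[name][fields['Version']] = fields if name
    fields = {}
    last = nil
  end

  gz = Zlib::GzipReader.new(io)
  gz.each_line do |line|
    line = line.chomp
    if line.strip.empty?
      flush.call                                   # blank line ends a stanza
    elsif line.start_with?(' ', "\t")
      fields[last] << "\n" << line.strip if last   # continuation line
    elsif (m = /\A([^:\s]+):\s*(.*)\z/.match(line))
      last = m[1]
      fields[last] = m[2]
    end
  end
  flush.call                                       # last stanza may lack a trailing blank line
  gz.close
  index
end
```

<p>
It would be called along the lines of
<code>File.open('Packages.gz', 'rb') { |f| parse_index(f) }</code>,
once per index file. Every line of every index gets decompressed and
matched in Ruby, which is where the CPU time goes.
</p>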

<p>
But hey, we have projectb there. So I decided to rewrite my HTML
generation script to query postgres directly instead of loading an
844605-line (YAML-format) data dump (a dump of experimental, stable,
testing and unstable, with the data needed for removals.html).
</p>

<p>
The old version, using the data dumps, took some 10 seconds
at full CPU load to generate the HTML page; the new version using
postgres now takes 2 seconds at some 33% CPU load. Nice
improvement. (Both times exclude the time the SOAP interface for
bugs.debian.org takes to reply to my query, which varies between 3 and
15 seconds when I ask it for all open bugs against ftp.debian.org.)
</p>
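<p>
Subtracting the wait for an external service from a measured run time can
be done roughly like this. It is only a sketch: <b>time_excluding</b> is a
made-up helper, and the callable passed in stands for whatever actually
performs the SOAP query.
</p>

```ruby
require 'benchmark'

# Hypothetical helper: run the given block, handing it a timed wrapper
# around external_call. Returns wall-clock seconds for the whole block
# (:total), for the external calls alone (:external), and the
# difference (:own).
def time_excluding(external_call)
  external = 0.0
  timed = lambda do |*args|
    result = nil
    external += Benchmark.realtime { result = external_call.call(*args) }
    result
  end
  total = Benchmark.realtime { yield timed }
  { total: total, external: external, own: total - external }
end
```

<p>
Inside the block, the page generation calls the wrapper instead of the
query itself, so the 3 to 15 seconds of waiting end up in
<code>:external</code> and not in <code>:own</code>.
</p>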

<p>
Added benefit: the data about the packages shown is "live" data, not
data that's up to 12 hours out of date. :)
</p>