Re: mailing list archiver chewing patches

Lists: pgsql-hackerspgsql-www
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: mailing list archiver chewing patches
Date: 2010-01-08 23:51:42
Message-ID: 4B47C50E.2010600@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www


Tim Bunce's recent patch has been mangled apparently by the list
archives. He sent it as an attachment, and that's how I have it in my
mailbox, so why isn't it appearing as such in the web archive so that it
can be nicely downloaded? See
<http://archives.postgresql.org/message-id/20100108124613.GL2505@timac.local>.
It's happened to other people as well:
<http://archives.postgresql.org/message-id/4B02D3E4.1040107@hut.fi>

Reviewers and others shouldn't have to c&p patches from web pages,
especially when it will be horribly line wrapped etc. Can we stop this
happening somehow?

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-09 04:56:06
Message-ID: 20100109045606.GG3635@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Andrew Dunstan wrote:
>
> Tim Bunce's recent patch has been mangled apparently by the list
> archives. He sent it as an attachment, and that's how I have it in
> my mailbox, so why isn't it appearing as such in the web archive so
> that it can be nicely downloaded? See <http://archives.postgresql.org/message-id/20100108124613.GL2505@timac.local>.
> It's happened to other people as well:
> <http://archives.postgresql.org/message-id/4B02D3E4.1040107@hut.fi>
>
> Reviewers and others shouldn't have to c&p patches from web pages,
> especially when it will be horribly line wrapped etc. Can we stop
> this happening somehow?

Try this

http://archives.postgresql.org/msgtxt(dot)php?id=20100108124613(dot)GL2505(at)timac(dot)local

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-09 05:17:27
Message-ID: 20100109051727.GH3635@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Alvaro Herrera wrote:
> Andrew Dunstan wrote:
> >
> > Tim Bunce's recent patch has been mangled apparently by the list
> > archives. He sent it as an attachment, and that's how I have it in
> > my mailbox, so why isn't it appearing as such in the web archive so
> > that it can be nicely downloaded? See <http://archives.postgresql.org/message-id/20100108124613.GL2505@timac.local>.
> > It's happened to other people as well:
> > <http://archives.postgresql.org/message-id/4B02D3E4.1040107@hut.fi>
> >
> > Reviewers and others shouldn't have to c&p patches from web pages,
> > especially when it will be horribly line wrapped etc. Can we stop
> > this happening somehow?
>
> Try this
>
> http://archives.postgresql.org/msgtxt(dot)php?id=20100108124613(dot)GL2505(at)timac(dot)local

This was previously broken for a lot of emails, but I just fixed some of
it, and it seems to work for the vast majority of our emails (and
certainly for all emails that matter).

The other point related to this is that each email should have a link
pointing to its text/plain version. This used to be present, but it got
broken (I think) at the same time that the anti-email-harvesting measure
got broken. I'm going to look at that next.

Let me know if you find something broken with this style of link.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-09 22:27:48
Message-ID: 20100109222748.GD2481@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Sat, Jan 09, 2010 at 02:17:27AM -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > Andrew Dunstan wrote:
> > >
> > > Tim Bunce's recent patch has been mangled apparently by the list
> > > archives. He sent it as an attachment, and that's how I have it in
> > > my mailbox, so why isn't it appearing as such in the web archive so
> > > that it can be nicely downloaded? See <http://archives.postgresql.org/message-id/20100108124613.GL2505@timac.local>.
> > > It's happened to other people as well:
> > > <http://archives.postgresql.org/message-id/4B02D3E4.1040107@hut.fi>
> > >
> > > Reviewers and others shouldn't have to c&p patches from web pages,
> > > especially when it will be horribly line wrapped etc. Can we stop
> > > this happening somehow?
> >
> > Try this
> >
> > http://archives.postgresql.org/msgtxt(dot)php?id=20100108124613(dot)GL2505(at)timac(dot)local

That looks like it dumps the raw message. That'll cause problems for any
messages using quoted-printable encoding. I'd hazard a guess it also
won't do thing right thing for non-charset=us-ascii emails/attachments.

> This was previously broken for a lot of emails, but I just fixed some of
> it, and it seems to work for the vast majority of our emails (and
> certainly for all emails that matter).
>
> The other point related to this is that each email should have a link
> pointing to its text/plain version. This used to be present, but it got
> broken (I think) at the same time that the anti-email-harvesting measure
> got broken. I'm going to look at that next.
>
> Let me know if you find something broken with this style of link.

What's needed is a) a download link for each attachment, regardless of the
kind of attachment, and b) the download link should download the content
of the attachment in a way that's directly usable.

For example, see http://archives.postgresql.org/pgsql-hackers/2010-01/msg00589.php
Looking at the raw version of the original message
http://archives.postgresql.org/msgtxt(dot)php?id=757953(dot)70187(dot)qm(at)web29001(dot)mail(dot)ird(dot)yahoo(dot)com
That message has a patch as an attachment:
Content-Type: application/octet-stream; name="patch_bit.patch"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="patch_bit.patch"

It gets a link in the archive (because it's a non-text content-type I presume):
http://archives.postgresql.org/pgsql-hackers/2010-01/bin5ThVOJC3jI.bin
but the link doesn't work well. The url ends with .bin and the http
response content-type is Content-Type: application/octet-stream so
downloaders get a .bin file instead of the original .patch file.

It seems that people wanting to send in a patch have two options: send
it as text/(something) so it's readable on the archive web page but not
copy-n-paste'able because of wordwrapping, or set it as
application/octet-stream so it's downloadable but not readable on the
web page.

Let me know if I've misunderstood anything.

Some sugestions:
- Provide links for all attachments, whether text/* or not.
- For text/* types show the content inline verbatim, don't wrap the text.
- If the attachment has a Content-Disposition with a filename then
append that to the url. It could simply be a fake 'path info':
.../2010-01/bin5ThVOJC3jI.bin/patch_bit.patch
- Instead of "Description: Binary data" on the web page, give the
values of the Content-Type and Content-Disposition headers.

Tim.

p.s. For background... I'm writing an email to the dbi-users &
dbi-announce mailing lists (~2000 & ~5000 users last time I checked)
asking anyone who might be interested to help review the plperl feature
patch and encouraging them to contribute to the commitfest review
process for other patches. It's important that it's *very* easy for
these new-comers to follow simple instructions to get involved.
I was hoping to be able to use a archives.postgresql.org url to the
message with the patch to explain what's the patch does _and_ provide a
download link. It seems I'll have to upload the patch somewhere else.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-09 22:42:54
Message-ID: 4B49066E.6030405@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Tim Bunce wrote:
> It seems that people wanting to send in a patch have two options: send
> it as text/(something) so it's readable on the archive web page but not
> copy-n-paste'able because of wordwrapping, or set it as
> application/octet-stream so it's downloadable but not readable on the
> web page.
>
>
>

That is assuming that the MUA gives you the option of specifying the
attachment MIME type. Many (including mine) do not. It would mean an
extra step - I'd have to gzip each patch or something like that. That
would be unfortunate,as well as imposing extra effort, because it would
make the patch not display inline in many MUAs (again, like mine).

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-10 01:15:23
Message-ID: 20100110011523.GA9321@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Tim Bunce wrote:

> > > Try this
> > >
> > > http://archives.postgresql.org/msgtxt(dot)php?id=20100108124613(dot)GL2505(at)timac(dot)local
>
> That looks like it dumps the raw message. That'll cause problems for any
> messages using quoted-printable encoding. I'd hazard a guess it also
> won't do thing right thing for non-charset=us-ascii emails/attachments.

Yeah. Grab it and open it as an mbox.

> What's needed is a) a download link for each attachment, regardless of the
> kind of attachment, and b) the download link should download the content
> of the attachment in a way that's directly usable.

Yeah, well, that's a bit outside what I am able to do, unless you can
get a MHonArc expert somewhere who can help us figure out how to
set it up for these requirements.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-10 09:27:21
Message-ID: m2pr5i48ee.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> That is assuming that the MUA gives you the option of specifying the
> attachment MIME type. Many (including mine) do not. It would mean an extra
> step - I'd have to gzip each patch or something like that. That would be
> unfortunate,as well as imposing extra effort, because it would make the
> patch not display inline in many MUAs (again, like mine).

Bad MUA, change MUA, or what they say…

More seriously though, it's not the first time we're having some
difficulties with the MHonArc setup, and I think it's also related to
the poor thread following on the archives website at month boundaries.

MHonArc (http://hydra.nac.uci.edu/indiv/ehood/mhonarc.html) seems to be
about converting the mails into some HTML pages, and offering the web
interface to get to use them, with some indexing and searches
facilities.

Are our indexing and searches provided by MHonArc or maintained by the
community? How helpful considering alternatives, such as AOX (which runs
atop PostgreSQL and would offer anonymous IMAP facility over the
archives) would be?

Of course it'll boil down to who's maintaining the current solution and
how much time is allocated to this, the solution research and migration
would have to fit in there I suppose. Same as pgfoundry. But still,
should we talk about it?

Regards,
--
dim


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 00:18:22
Message-ID: 20100111001822.GA7436@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Dimitri Fontaine wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > That is assuming that the MUA gives you the option of specifying the
> > attachment MIME type. Many (including mine) do not. It would mean an extra
> > step - I'd have to gzip each patch or something like that. That would be
> > unfortunate,as well as imposing extra effort, because it would make the
> > patch not display inline in many MUAs (again, like mine).
>
> Bad MUA, change MUA, or what they say…
>
> More seriously though, it's not the first time we're having some
> difficulties with the MHonArc setup, and I think it's also related to
> the poor thread following on the archives website at month boundaries.

Absolutely. The month boundary problem boils down to the fact that
Mhonarc does not scale very well, so we can't have mboxes that are too
large. This is why most people split their archives per month, and then
each month is published as an independent Mhonarc output archive. It's
a horrid solution.

> Are our indexing and searches provided by MHonArc or maintained by the
> community?

Searches are completely external to mhonarc.

> How helpful considering alternatives, such as AOX (which runs
> atop PostgreSQL and would offer anonymous IMAP facility over the
> archives) would be?
>
> Of course it'll boil down to who's maintaining the current solution and
> how much time is allocated to this, the solution research and migration
> would have to fit in there I suppose. Same as pgfoundry. But still,
> should we talk about it?

There's some talk about writing our own archiving system,
database-backed. There have been a few false starts but no concrete
result so far. We need a lot more manpower invested in this problem.
If there's interest, let's talk about it.

My daugher was born yesterday and I'm having a bit of a calm before the
storm because she's not coming home until Tuesday or so (at this time of
the day, that is, because I have to take care of the other daughter).
I'll be probably away for (at least) a week when she does; and I'll
probably have somewhat of a shortage of spare time after that.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 09:46:10
Message-ID: 9837222c1001110146o36e5507eqa7096ea3687e54bc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

2010/1/11 Alvaro Herrera <alvherre(at)commandprompt(dot)com>:
> Dimitri Fontaine wrote:
>> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> > That is assuming that the MUA gives you the option of specifying the
>> > attachment MIME type. Many (including mine) do not. It would mean an extra
>> > step - I'd have to gzip each patch or something like that. That would be
>> > unfortunate,as well as imposing extra effort, because it would make the
>> > patch not display inline in many MUAs (again, like mine).
>>
>> Bad MUA, change MUA, or what they say…
>>
>> More seriously though, it's not the first time we're having some
>> difficulties with the MHonArc setup, and I think it's also related to
>> the poor thread following on the archives website at month boundaries.
>
> Absolutely.  The month boundary problem boils down to the fact that
> Mhonarc does not scale very well, so we can't have mboxes that are too
> large.  This is why most people split their archives per month, and then
> each month is published as an independent Mhonarc output archive.  It's
> a horrid solution.

Yeah.

>> Are our indexing and searches provided by MHonArc or maintained by the
>> community?
>
> Searches are completely external to mhonarc.

It is, but it's tied into the format of the URLs and the format of the
actual messages in order to be more efficient. But it should be fairly
easy to adapt it to some other base system if we want.

>> How helpful considering alternatives, such as AOX (which runs
>> atop PostgreSQL and would offer anonymous IMAP facility over the
>> archives) would be?
>>
>> Of course it'll boil down to who's maintaining the current solution and
>> how much time is allocated to this, the solution research and migration
>> would have to fit in there I suppose. Same as pgfoundry. But still,
>> should we talk about it?
>
> There's some talk about writing our own archiving system,
> database-backed.  There have been a few false starts but no concrete
> result so far.  We need a lot more manpower invested in this problem.
> If there's interest, let's talk about it.

Yeah, definitely, let's talk about it. Anything that gives us an
efficient backend with a good API is interesting (SQL is a reasonably
good API. Not so sure about IMAP, since it is a bit too focused on
single messages IIRC). Particularly, something that can separate
frontend and backend (can still be on the same machine of course, I'm
talking conceptually) seems to be a lot more flexible, which we'd
like.

As for AOX, my understanding is that it is no longer maintained, so
I'd be worried about choosing such a solution for a complex problem.
But it's open for discussion.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 10:18:47
Message-ID: 87my0lvta0.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Absolutely. The month boundary problem boils down to the fact that
> Mhonarc does not scale very well, so we can't have mboxes that are too
> large. This is why most people split their archives per month, and then
> each month is published as an independent Mhonarc output archive. It's
> a horrid solution.
>
>> Are our indexing and searches provided by MHonArc or maintained by the
>> community?
>
> Searches are completely external to mhonarc.

Changing the MHonArc solution would probably mean adapting them, I
guess, or proposing a new solution with compatible output for the
searching to still work…

>> How helpful considering alternatives, such as AOX (which runs
>> atop PostgreSQL and would offer anonymous IMAP facility over the
>> archives) would be?
>>
>> Of course it'll boil down to who's maintaining the current solution and
>> how much time is allocated to this, the solution research and migration
>> would have to fit in there I suppose. Same as pgfoundry. But still,
>> should we talk about it?
>
> There's some talk about writing our own archiving system,
> database-backed. There have been a few false starts but no concrete
> result so far. We need a lot more manpower invested in this problem.
> If there's interest, let's talk about it.

AOX is already a database backed email solution, offering an archive
page with searching. I believe the searching is baked by tsearch
indexing. That's why I think it'd be suitable.

They already archive and offer search over one of our mailing lists, and
from there it seems like we'd only miss the user interface bits:

http://archives.aox.org/archives/pgsql-announce

I hope the UI bits are not the most time demanding one.

Is there someone with enough time to install aox somewhere and have it
subscribed to our lists?

> My daugher was born yesterday and I'm having a bit of a calm before the
> storm because she's not coming home until Tuesday or so (at this time of
> the day, that is, because I have to take care of the other daughter).
> I'll be probably away for (at least) a week when she does; and I'll
> probably have somewhat of a shortage of spare time after that.

Ahaha :)
IME that's not the shortage of spare time which ruins you the most as
the lack of energy when you do have this little precious resource
again, very few piece of it atime.

Regards,
--
dim


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 11:53:08
Message-ID: 4B4B1124.3060509@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Hi,

Il 11/01/2010 11:18, Dimitri Fontaine ha scritto:
> AOX is already a database backed email solution, offering an archive
> page with searching. I believe the searching is baked by tsearch
> indexing. That's why I think it'd be suitable.
>
> They already archive and offer search over one of our mailing lists, and
> from there it seems like we'd only miss the user interface bits:
>
> http://archives.aox.org/archives/pgsql-announce
>
> I hope the UI bits are not the most time demanding one.
>
> Is there someone with enough time to install aox somewhere and have it
> subscribed to our lists?

I recall having tried AOX a long time ago but I can't remember the
reason why I was not satisfied. I guess I can give another try by
setting up a test ML archive.

>> My daugher was born yesterday and I'm having a bit of a calm before the
>> storm because she's not coming home until Tuesday or so (at this time of
>> the day, that is, because I have to take care of the other daughter).
>> I'll be probably away for (at least) a week when she does; and I'll
>> probably have somewhat of a shortage of spare time after that.

BTW, congrats Alvaro!

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 11:58:20
Message-ID: 937d27e11001110358t3fc48bcat4c2493eb8919afc4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Mon, Jan 11, 2010 at 5:23 PM, Matteo Beccati <php(at)beccati(dot)com> wrote:
> Hi,
>
> Il 11/01/2010 11:18, Dimitri Fontaine ha scritto:
>>
>> AOX is already a database backed email solution, offering an archive
>> page with searching. I believe the searching is baked by tsearch
>> indexing. That's why I think it'd be suitable.
>>
>> They already archive and offer search over one of our mailing lists, and
>> from there it seems like we'd only miss the user interface bits:
>>
>>   http://archives.aox.org/archives/pgsql-announce
>>
>> I hope the UI bits are not the most time demanding one.
>>
>> Is there someone with enough time to install aox somewhere and have it
>> subscribed to our lists?
>
> I recall having tried AOX a long time ago but I can't remember the reason
> why I was not satisfied. I guess I can give another try by setting up a test
> ML archive.

I tried it too, before I started writing the new prototype archiver
from scratch. I too forget why I gave up on it, but it was a strong
enough reason for me to start coding from scratch.

BTW, we only need to replace the archiver/display code. The search
works well already.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 13:13:47
Message-ID: 4B4B240B.2020508@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 11/01/2010 12:58, Dave Page ha scritto:
> On Mon, Jan 11, 2010 at 5:23 PM, Matteo Beccati<php(at)beccati(dot)com> wrote:
>> I recall having tried AOX a long time ago but I can't remember the reason
>> why I was not satisfied. I guess I can give another try by setting up a test
>> ML archive.
>
> I tried it too, before I started writing the new prototype archiver
> from scratch. I too forget why I gave up on it, but it was a strong
> enough reason for me to start coding from scratch.
>
> BTW, we only need to replace the archiver/display code. The search
> works well already.

It took me no more than 10 minutes to set up AOX and hook it up to a
domain. An email account is now subscribed to the hackers ML.

I'll try to estimate how hard it could be to write a web app that
displays the archive from the db, even though I'm not sure that this is
a good way to proceed.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 13:23:31
Message-ID: 87fx6cwzak.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> As for AOX, my understanding is that it is no longer maintained, so
> I'd be worried about choosing such a solution for a complex problem.
> But it's open for discussion.

Ouch.
--
dim


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 13:24:58
Message-ID: 876378wz85.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Dave Page <dpage(at)pgadmin(dot)org> writes:
>> I recall having tried AOX a long time ago but I can't remember the reason
>> why I was not satisfied. I guess I can give another try by setting up a test
>> ML archive.
>
> I tried it too, before I started writing the new prototype archiver
> from scratch. I too forget why I gave up on it, but it was a strong
> enough reason for me to start coding from scratch.
>
> BTW, we only need to replace the archiver/display code. The search
> works well already.

What the current archiver looks like? A PG database containing the raw
mails and attachements? It that's the case the missing piece would be to
plug a browsing UI atop of that, right?

Regards,
--
dim


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Matteo Beccati <php(at)beccati(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 13:35:07
Message-ID: 9837222c1001110535s7cfbe470j46b54ab9077b1dc2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

2010/1/11 Dimitri Fontaine <dfontaine(at)hi-media(dot)com>:
> Dave Page <dpage(at)pgadmin(dot)org> writes:
>>> I recall having tried AOX a long time ago but I can't remember the reason
>>> why I was not satisfied. I guess I can give another try by setting up a test
>>> ML archive.
>>
>> I tried it too, before I started writing the new prototype archiver
>> from scratch. I too forget why I gave up on it, but it was a strong
>> enough reason for me to start coding from scratch.
>>
>> BTW, we only need to replace the archiver/display code. The search
>> works well already.
>
> What the current archiver looks like? A PG database containing the raw
> mails and attachements? It that's the case the missing piece would be to
> plug a browsing UI atop of that, right?

No, the current archiver is a set of MBOX files that are processed
incrementally by mhonarc.

(yes, this is why it doesn't scale)

*search* is in a postgresql database, but it doesn't contain the
entire messages - doesn't have attachments for examples - only the
parts it has web-scraped off the the current archives.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 13:50:36
Message-ID: 87wrzovjgz.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Dimitri Fontaine <dfontaine(at)hi-media(dot)com> writes:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> As for AOX, my understanding is that it is no longer maintained, so
>> I'd be worried about choosing such a solution for a complex problem.
>> But it's open for discussion.
>
> Ouch.

It seems that the company baking the development is dead, but the
developpers are still working on the product on their spare time. New
release ahead.

They're not working on the archive UI part.
--
dim


From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 14:00:03
Message-ID: 20100111140003.GA32667@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

(Many thanks to Dimitri for bringing this thread to my attention.)

At 2010-01-11 10:46:10 +0100, magnus(at)hagander(dot)net wrote:
>
> As for AOX, my understanding is that it is no longer maintained, so
> I'd be worried about choosing such a solution for a complex problem.

I'll keep this short: Oryx, the company behind Archiveopteryx (aox), is
no longer around, but the software is still maintained. The developers
(myself included) are still interested in keeping it alive. It's been a
while since the last release, but it'll be ready soon. If you're having
any sort of problems with it, write to me, and I'll help you.

(That said, we're not working on the web interface. It did work, in its
limited fashion, but it's not feature complete; and I need to find some
paying work, so it's not a priority. That, and some health problems, are
also why I haven't been active on the pg lists for a while.)

Feel free to write to me off-list for more.

-- ams


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Matteo Beccati <php(at)beccati(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 14:20:19
Message-ID: 4B4B33A3.9000402@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Magnus Hagander wrote:
>
> No, the current archiver is a set of MBOX files that are processed
> incrementally by mhonarc.
>
> (yes, this is why it doesn't scale)
>
> *search* is in a postgresql database, but it doesn't contain the
> entire messages - doesn't have attachments for examples - only the
> parts it has web-scraped off the the current archives.
>

Fixing this mess and giving us decent archives with guaranteed
downloadable patches and good search would be a nice job for someone who
wants to contribute without having to cut or review core code.

cheers

andrew


From: Matteo Beccati <php(at)beccati(dot)com>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-11 22:23:41
Message-ID: 4B4BA4ED.8060908@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 11/01/2010 15:00, Abhijit Menon-Sen ha scritto:
> I'll keep this short: Oryx, the company behind Archiveopteryx (aox), is
> no longer around, but the software is still maintained. The developers
> (myself included) are still interested in keeping it alive. It's been a
> while since the last release, but it'll be ready soon. If you're having
> any sort of problems with it, write to me, and I'll help you.

That's good news indeed for the project, AOX seems to be working fine on
my server. I've had a few IMAP glitches, but it seems to live happily
with my qmail and stores the emails on the db, fulfilling my current needs.

So, I've decided to spend a bit more time on this and here is a proof of
concept web app that displays mailing list archives reading from the AOX
database:

http://archives.beccati.org/

Please take it as an exercise I've made trying to learn how to use
symfony this afternoon. It's not feature complete, nor probably very
scalable, but at least it features attachment download ;)

http://archives.beccati.org/pgsql-hackers/message/37

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Abhijit Menon-Sen <ams(at)toroid(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 04:54:03
Message-ID: 937d27e11001112054v479106a6p1ae81bf3f80ce208@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

2010/1/12 Matteo Beccati <php(at)beccati(dot)com>:
> So, I've decided to spend a bit more time on this and here is a proof of concept web app that displays mailing list archives reading from the AOX database:
>
> http://archives.beccati.org/

Seems to work.

> Please take it as an exercise I've made trying to learn how to use symfony this afternoon. It's not feature complete, nor probably very scalable, but at least it features attachment download ;)
>
> http://archives.beccati.org/pgsql-hackers/message/37

:-)

So just to put this into perspective and give anyone paying attention
an idea of the pain that lies ahead should they decide to work on
this:

- We need to import the old archives (of which there are hundreds of
thousands of messages, the first few years of which have, umm, minimal
headers.
- We need to generate thread indexes
- We need to re-generate the original URLs for backwards compatibility

Now there's encouragement :-)

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 09:05:56
Message-ID: 87bpgzbsln.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Dave Page <dpage(at)pgadmin(dot)org> writes:
> 2010/1/12 Matteo Beccati <php(at)beccati(dot)com>:
>> So, I've decided to spend a bit more time on this and here is a proof of concept web app that displays mailing list archives reading from the AOX database:
>>
>> http://archives.beccati.org/
>
> Seems to work.

Hehe, nice a beginning!

> So just to put this into perspective and give anyone paying attention
> an idea of the pain that lies ahead should they decide to work on
> this:
>
> - We need to import the old archives (of which there are hundreds of
> thousands of messages, the first few years of which have, umm, minimal
> headers.

Anyone having a local copy of this in his mailboxes? At some point there
were some NNTP gateway, so maybe there's a copy this way.

> - We need to generate thread indexes

We have CTEs :)

> - We need to re-generate the original URLs for backwards compatibility

I guess the message-id one ain't the tricky one... and it should be
possible to fill a relation table like
monharc_compat(message_id, list, year, month, message_number);

Then we'd need some help from the webserver (rewrite rules I guess) so
that the current URL is transformed to call a catch-all script:
http://archives.postgresql.org/pgsql-xxx/YYYY-MM/msg01234.php
-> http://archives.postgresql.org/compat.php?l=xxx&y=YYYY&m=MM&n=01234

In that compat.php script you then issue the following query or the like
to get the message_id, then use the newer infrastructure to get to
display it:

SELECT message_id
FROM monharc_compat
WHERE list = ? and year = ? and month = ? and message_number = ?;

Regards,
--
dim


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Matteo Beccati <php(at)beccati(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 09:30:41
Message-ID: 9837222c1001120130n67516f0fj317b5656b435f964@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Tue, Jan 12, 2010 at 10:05, Dimitri Fontaine <dfontaine(at)hi-media(dot)com> wrote:
> Dave Page <dpage(at)pgadmin(dot)org> writes:
>> 2010/1/12 Matteo Beccati <php(at)beccati(dot)com>:
>>> So, I've decided to spend a bit more time on this and here is a proof of concept web app that displays mailing list archives reading from the AOX database:
>>>
>>> http://archives.beccati.org/
>>
>> Seems to work.
>
> Hehe, nice a beginning!

The problem is usually with strange looking emails with 15 different
MIME types. If we can figure out the proper way to render that, the
rest really is just a SMOP.

(BTW, for something to actually be used In Production (TM), we want
something that uses one of our existing frameworks. So don't go
overboard in code-wise implementations on something else - proof of
concept on something else is always ok, of course)

>> So just to put this into perspective and give anyone paying attention
>> an idea of the pain that lies ahead should they decide to work on
>> this:
>>
>> - We need to import the old archives (of which there are hundreds of
>> thousands of messages, the first few years of which have, umm, minimal
>> headers.
>
> Anyone having a local copy of this in his mailboxes? At some point there
> were some NNTP gateway, so maybe there's a copy this way.

We have MBOX files.

IIRC, aox has an import function that can read MBOX files. The
interesting thing is what happens with the really old files that don't
have complete headers.

I don't think you can trust the NNTP gateway now or in the past,
messages are sometimes lost there. The mbox files are as complete as
anything we'll ever get.

>> - We need to generate thread indexes
>
> We have CTEs :)

Right. We still need the threading information, so we have something
to use our CTEs on :-)

But I assume that AOX already does this?

>> - We need to re-generate the original URLs for backwards compatibility
>
> I guess the message-id one ain't the tricky one... and it should be
> possible to fill a relation table like
>  monharc_compat(message_id, list, year, month, message_number);

Yeah. It's not so hard, you can just screen-scrape the current
archives the same way the search server does.

> Then we'd need some help from the webserver (rewrite rules I guess) so
> that the current URL is transformed to call a catch-all script:
>   http://archives.postgresql.org/pgsql-xxx/YYYY-MM/msg01234.php
> -> http://archives.postgresql.org/compat.php?l=xxx&y=YYYY&m=MM&n=01234

Or just a trivial regexp catch in any modern app platform.

> In that compat.php script you then issue the following query or the like
> to get the message_id, then use the newer infrastructure to get to
> display it:
>
>  SELECT message_id
>    FROM monharc_compat
>   WHERE list = ? and year = ? and month = ? and message_number = ?;

I'd rather see it redirect it to the new style URL, but it's the same
query, yes :-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 09:33:46
Message-ID: 9837222c1001120133k79f8f2desabf50d9afb276416@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Mon, Jan 11, 2010 at 15:00, Abhijit Menon-Sen <ams(at)toroid(dot)org> wrote:
> (Many thanks to Dimitri for bringing this thread to my attention.)
>
> At 2010-01-11 10:46:10 +0100, magnus(at)hagander(dot)net wrote:
>>
>> As for AOX, my understanding is that it is no longer maintained, so
>> I'd be worried about choosing such a solution for a complex problem.
>
> I'll keep this short: Oryx, the company behind Archiveopteryx (aox), is
> no longer around, but the software is still maintained. The developers
> (myself included) are still interested in keeping it alive. It's been a
> while since the last release, but it'll be ready soon. If you're having
> any sort of problems with it, write to me, and I'll help you.

Hmm. So if this means that the system is actually something we can
rely on long-term for the parsing and importing of messages into the
database, it may be an interesting optino still, so we don't have to
write that part ourselves.

I just want to end up with a non-maintained system. I doubt we, as a
community, want to take on maintaining a message parser in C++. I'd be
much more inclined to end up having to maintain something written in
python or perl in that case, since they'd probably rely much on
external modules that a lot of others rely on --> somebody else would
help maintain large parts of it..

> (That said, we're not working on the web interface. It did work, in its
> limited fashion, but it's not feature complete; and I need to find some
> paying work, so it's not a priority. That, and some health problems, are
> also why I haven't been active on the pg lists for a while.)

As long as the db structure is easy enough to parse and generate stuff
from, this may actually be a feature, because it will make it easier
to integrate with our other website stuff. If it's very low level and
leaves too much to work, well, then it's the opposite of course :-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 16:38:00
Message-ID: 1263314280.4362.7.camel@jd-desktop.unknown.charter.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
> 2010/1/12 Matteo Beccati <php(at)beccati(dot)com>:
> > So, I've decided to spend a bit more time on this and here is a proof of concept web app that displays mailing list archives reading from the AOX database:
> >
> > http://archives.beccati.org/
>
> Seems to work.
>
> > Please take it as an exercise I've made trying to learn how to use symfony this afternoon. It's not feature complete, nor probably very scalable, but at least it features attachment download ;)
> >
> > http://archives.beccati.org/pgsql-hackers/message/37
>
> :-)
>
> So just to put this into perspective and give anyone paying attention
> an idea of the pain that lies ahead should they decide to work on
> this:
>
> - We need to import the old archives (of which there are hundreds of
> thousands of messages, the first few years of which have, umm, minimal
> headers.
> - We need to generate thread indexes
> - We need to re-generate the original URLs for backwards compatibility
>
> Now there's encouragement :-)

Or, we just leave the current infrastructure in place and use a new one
for all new messages going forward. We shouldn't limit our ability to
have a decent system due to decisions of the past.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jd(at)commandprompt(dot)com
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Matteo Beccati <php(at)beccati(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 16:54:51
Message-ID: 17928.1263315291@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
>> So just to put this into perspective and give anyone paying attention
>> an idea of the pain that lies ahead should they decide to work on
>> this:
>>
>> - We need to import the old archives (of which there are hundreds of
>> thousands of messages, the first few years of which have, umm, minimal
>> headers.
>> - We need to generate thread indexes
>> - We need to re-generate the original URLs for backwards compatibility
>>
>> Now there's encouragement :-)

> Or, we just leave the current infrastructure in place and use a new one
> for all new messages going forward. We shouldn't limit our ability to
> have a decent system due to decisions of the past.

-1. What's the point of having archives? IMO the mailing list archives
are nearly as critical a piece of the project infrastructure as the CVS
repository. We've already established that moving to a new SCM that
fails to preserve the CVS history wouldn't be acceptable. I hardly
think that the bar is any lower for mailing list archives.

Now I think we could possibly skip the requirement suggested above for
URL compatibility, if we just leave the old archives on-line so that
those URLs all still resolve. But if we can't load all the old messages
into the new infrastructure, it'll basically be useless for searching
purposes.

(Hmm, re-reading what you said, maybe we are suggesting the same thing,
but it's not clear. Anyway my point is that Dave's first two
requirements are real. Only the third might not be.)

regards, tom lane


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: jd <jd(at)commandprompt(dot)com>, Matteo Beccati <php(at)beccati(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 17:34:45
Message-ID: 937d27e11001120934v7ddb944bj73e91c447d605ff7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Tue, Jan 12, 2010 at 10:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>> On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
>>> So just to put this into perspective and give anyone paying attention
>>> an idea of the pain that lies ahead should they decide to work on
>>> this:
>>>
>>> - We need to import the old archives (of which there are hundreds of
>>> thousands of messages, the first few years of which have, umm, minimal
>>> headers.
>>> - We need to generate thread indexes
>>> - We need to re-generate the original URLs for backwards compatibility
>>>
>>> Now there's encouragement :-)
>
>> Or, we just leave the current infrastructure in place and use a new one
>> for all new messages going forward. We shouldn't limit our ability to
>> have a decent system due to decisions of the past.
>
> -1.  What's the point of having archives?  IMO the mailing list archives
> are nearly as critical a piece of the project infrastructure as the CVS
> repository.  We've already established that moving to a new SCM that
> fails to preserve the CVS history wouldn't be acceptable.  I hardly
> think that the bar is any lower for mailing list archives.
>
> Now I think we could possibly skip the requirement suggested above for
> URL compatibility, if we just leave the old archives on-line so that
> those URLs all still resolve.  But if we can't load all the old messages
> into the new infrastructure, it'll basically be useless for searching
> purposes.
>
> (Hmm, re-reading what you said, maybe we are suggesting the same thing,
> but it's not clear.  Anyway my point is that Dave's first two
> requirements are real.  Only the third might not be.)

The third actually isn't actually that hard to do in theory. The
message numbers are basically the zero-based position in the mbox
file, and the rest of the URL is obvious.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Matteo Beccati <php(at)beccati(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 17:38:13
Message-ID: 1263317893.4362.95.camel@jd-desktop.unknown.charter.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Tue, 2010-01-12 at 11:54 -0500, Tom Lane wrote:

> > Or, we just leave the current infrastructure in place and use a new one
> > for all new messages going forward. We shouldn't limit our ability to
> > have a decent system due to decisions of the past.
>
> -1. What's the point of having archives? IMO the mailing list archives
> are nearly as critical a piece of the project infrastructure as the CVS
> repository. We've already established that moving to a new SCM that
> fails to preserve the CVS history wouldn't be acceptable. I hardly
> think that the bar is any lower for mailing list archives.
>
> Now I think we could possibly skip the requirement suggested above for
> URL compatibility, if we just leave the old archives on-line so that
> those URLs all still resolve. But if we can't load all the old messages
> into the new infrastructure, it'll basically be useless for searching
> purposes.
>
> (Hmm, re-reading what you said, maybe we are suggesting the same thing,
> but it's not clear. Anyway my point is that Dave's first two
> requirements are real. Only the third might not be.)

We are saying the same thing. Sorry if I wasn't clear.

Joshua D. Drake

>
> regards, tom lane
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jd <jd(at)commandprompt(dot)com>, Matteo Beccati <php(at)beccati(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 18:54:27
Message-ID: 9837222c1001121054j40bc9302obc1123f5f6c02503@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Tue, Jan 12, 2010 at 18:34, Dave Page <dpage(at)pgadmin(dot)org> wrote:
> On Tue, Jan 12, 2010 at 10:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>>> On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
>>>> So just to put this into perspective and give anyone paying attention
>>>> an idea of the pain that lies ahead should they decide to work on
>>>> this:
>>>>
>>>> - We need to import the old archives (of which there are hundreds of
>>>> thousands of messages, the first few years of which have, umm, minimal
>>>> headers.
>>>> - We need to generate thread indexes
>>>> - We need to re-generate the original URLs for backwards compatibility
>>>>
>>>> Now there's encouragement :-)
>>
>>> Or, we just leave the current infrastructure in place and use a new one
>>> for all new messages going forward. We shouldn't limit our ability to
>>> have a decent system due to decisions of the past.
>>
>> -1.  What's the point of having archives?  IMO the mailing list archives
>> are nearly as critical a piece of the project infrastructure as the CVS
>> repository.  We've already established that moving to a new SCM that
>> fails to preserve the CVS history wouldn't be acceptable.  I hardly
>> think that the bar is any lower for mailing list archives.
>>
>> Now I think we could possibly skip the requirement suggested above for
>> URL compatibility, if we just leave the old archives on-line so that
>> those URLs all still resolve.  But if we can't load all the old messages
>> into the new infrastructure, it'll basically be useless for searching
>> purposes.
>>
>> (Hmm, re-reading what you said, maybe we are suggesting the same thing,
>> but it's not clear.  Anyway my point is that Dave's first two
>> requirements are real.  Only the third might not be.)
>
> The third actually isn't actually that hard to do in theory. The
> message numbers are basically the zero-based position in the mbox
> file, and the rest of the URL is obvious.

The third part is trivial. The search system already does 95% of it.
I've already implemented exactly that kind of redirect thing on top of
the search code once just as a poc, and it was less than 30 minutes of
hacking. Can't seem to find the script ATM though, but you get the
idea.

Let's not focus on that part, we can easily solve that.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 19:56:29
Message-ID: 4B4CD3ED.5040207@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 12/01/2010 10:30, Magnus Hagander ha scritto:
> The problem is usually with strange looking emails with 15 different
> MIME types. If we can figure out the proper way to render that, the
> rest really is just a SMOP.

Yeah, I was expecting some, but all the message I've looked at seemed to
be working ok.

> (BTW, for something to actually be used In Production (TM), we want
> something that uses one of our existing frameworks. So don't go
> overboard in code-wise implementations on something else - proof of
> concept on something else is always ok, of course)

OK, that's something I didn't know, even though I expected some kind of
limitations. Could you please elaborate a bit more (i.e. where to find
info)?

Having played with it, here's my feedback about AOX:

pros:
- seemed to be working reliably;
- does most of the dirty job of parsing emails, splitting parts, etc
- highly normalized schema
- thread support (partial?)

cons:
- directly publishing the live email feed might not be desirable
- queries might end up being a bit complicate for simple tasks
- might be not easy to add additional processing in the workflow

>>> So just to put this into perspective and give anyone paying attention
>>> an idea of the pain that lies ahead should they decide to work on
>>> this:
>>>
>>> - We need to import the old archives (of which there are hundreds of
>>> thousands of messages, the first few years of which have, umm, minimal
>>> headers.
>>
>> Anyone having a local copy of this in his mailboxes? At some point there
>> were some NNTP gateway, so maybe there's a copy this way.
>
> We have MBOX files.
>
> IIRC, aox has an import function that can read MBOX files. The
> interesting thing is what happens with the really old files that don't
> have complete headers.
>
> I don't think you can trust the NNTP gateway now or in the past,
> messages are sometimes lost there. The mbox files are as complete as
> anything we'll ever get.

Importing the whole pgsql-www archive with a perl script that bounces
messages via SMTP took about 30m. Maybe there's even a way to skip SMTP,
I haven't looked into it that much.

>>> - We need to generate thread indexes
>>
>> We have CTEs :)
>
> Right. We still need the threading information, so we have something
> to use our CTEs on :-)
>
> But I assume that AOX already does this?

there are thread related tables and they seem to get filled when a SORT
IMAP command is issued, however I haven't found a way to get the
hierarchy out of them.

What that means is that we'd need some kind of post processing to
populate a thread hierarchy.

If there isn't a fully usable thread hierarchy I was more thinking to
ltree, mainly because I've successfully used it in past and I haven't
had enough time yet to look at CTEs. But if performance is comparable I
don't see a reason why we shouldn't use them.

>>> - We need to re-generate the original URLs for backwards compatibility
>>
>> I guess the message-id one ain't the tricky one... and it should be
>> possible to fill a relation table like
>> monharc_compat(message_id, list, year, month, message_number);
>
> Yeah. It's not so hard, you can just screen-scrape the current
> archives the same way the search server does.

Definitely an easy enough task.

With all that said, I can't promise anything as it all depends on how
much spare time I have, but I can proceed with the evaluation if you
think it's useful. I have a feeling that AOX is not truly the right tool
for the job, but we might be able to customise it to suit our needs. Are
there any other requirements that weren't specified?

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jd <jd(at)commandprompt(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 19:58:09
Message-ID: 4B4CD451.200@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 12/01/2010 19:54, Magnus Hagander ha scritto:
> On Tue, Jan 12, 2010 at 18:34, Dave Page<dpage(at)pgadmin(dot)org> wrote:
>> On Tue, Jan 12, 2010 at 10:24 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> "Joshua D. Drake"<jd(at)commandprompt(dot)com> writes:
>>>> On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
>>>>> So just to put this into perspective and give anyone paying attention
>>>>> an idea of the pain that lies ahead should they decide to work on
>>>>> this:
>>>>>
>>>>> - We need to import the old archives (of which there are hundreds of
>>>>> thousands of messages, the first few years of which have, umm, minimal
>>>>> headers.
>>>>> - We need to generate thread indexes
>>>>> - We need to re-generate the original URLs for backwards compatibility
>>>>>
>>>>> Now there's encouragement :-)
>>>
>>>> Or, we just leave the current infrastructure in place and use a new one
>>>> for all new messages going forward. We shouldn't limit our ability to
>>>> have a decent system due to decisions of the past.
>>>
>>> -1. What's the point of having archives? IMO the mailing list archives
>>> are nearly as critical a piece of the project infrastructure as the CVS
>>> repository. We've already established that moving to a new SCM that
>>> fails to preserve the CVS history wouldn't be acceptable. I hardly
>>> think that the bar is any lower for mailing list archives.
>>>
>>> Now I think we could possibly skip the requirement suggested above for
>>> URL compatibility, if we just leave the old archives on-line so that
>>> those URLs all still resolve. But if we can't load all the old messages
>>> into the new infrastructure, it'll basically be useless for searching
>>> purposes.
>>>
>>> (Hmm, re-reading what you said, maybe we are suggesting the same thing,
>>> but it's not clear. Anyway my point is that Dave's first two
>>> requirements are real. Only the third might not be.)
>>
>> The third actually isn't actually that hard to do in theory. The
>> message numbers are basically the zero-based position in the mbox
>> file, and the rest of the URL is obvious.
>
> The third part is trivial. The search system already does 95% of it.
> I've already implemented exactly that kind of redirect thing on top of
> the search code once just as a poc, and it was less than 30 minutes of
> hacking. Can't seem to find the script ATM though, but you get the
> idea.
>
> Let's not focus on that part, we can easily solve that.

Agreed. That's the part that worries me less.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 20:04:27
Message-ID: 9837222c1001121204m52adbf21k6260609f1c8768b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Tue, Jan 12, 2010 at 20:56, Matteo Beccati <php(at)beccati(dot)com> wrote:
> Il 12/01/2010 10:30, Magnus Hagander ha scritto:
>>
>> The problem is usually with strange looking emails with 15 different
>> MIME types. If we can figure out the proper way to render that, the
>> rest really is just a SMOP.
>
> Yeah, I was expecting some, but all the message I've looked at seemed to be
> working ok.

Have you been looking at old or new messages? Try grabbing a couple of
MBOX files off archives.postgresql.org from several years back, you're
more likely to find weird MUAs then I think.

>> (BTW, for something to actually be used In Production (TM), we want
>> something that uses one of our existing frameworks. So don't go
>> overboard in code-wise implementations on something else - proof of
>> concept on something else is always ok, of course)
>
> OK, that's something I didn't know, even though I expected some kind of
> limitations. Could you please elaborate a bit more (i.e. where to find
> info)?

Well, the framework we're moving towards is built on top of django, so
that would be a good first start.

There is also whever the commitfest thing is built on, but I'm told
that's basically no framework.

> Having played with it, here's my feedback about AOX:
>
> pros:
> - seemed to be working reliably;
> - does most of the dirty job of parsing emails, splitting parts, etc
> - highly normalized schema
> - thread support (partial?)

A killer will be if that thread support is enough. If we have to build
that completely ourselves, it'll take a lot more work.

> cons:
> - directly publishing the live email feed might not be desirable

Why not?

> - queries might end up being a bit complicate for simple tasks

As long as we don't have to hit them too often, which is solve:able
with caching. And we do have a pretty good RDBMS to run the queries on
:)

>> I don't think you can trust the NNTP gateway now or in the past,
>> messages are sometimes lost there. The mbox files are as complete as
>> anything we'll ever get.
>
> Importing the whole pgsql-www archive with a perl script that bounces
> messages via SMTP took about 30m. Maybe there's even a way to skip SMTP, I
> haven't looked into it that much.

Um, yes. There is an MBOX import tool.

>>>> - We need to generate thread indexes
>>>
>>> We have CTEs :)
>>
>> Right. We still need the threading information, so we have something
>> to use our CTEs on :-)
>>
>> But I assume that AOX already does this?
>
> there are thread related tables and they seem to get filled when a SORT IMAP
> command is issued, however I haven't found a way to get the hierarchy out of
> them.
>
> What that means is that we'd need some kind of post processing to populate a
> thread hierarchy.
>
> If there isn't a fully usable thread hierarchy I was more thinking to ltree,
> mainly because I've successfully used it in past and I haven't had enough
> time yet to look at CTEs. But if performance is comparable I don't see a
> reason why we shouldn't use them.

I'd favor CTEs if they are fast enough. Great flexibility.

>>>> - We need to re-generate the original URLs for backwards compatibility
>>>
>>> I guess the message-id one ain't the tricky one... and it should be
>>> possible to fill a relation table like
>>>  monharc_compat(message_id, list, year, month, message_number);
>>
>> Yeah. It's not so hard, you can just screen-scrape the current
>> archives the same way the search server does.
>
> Definitely an easy enough task.
>
> With all that said, I can't promise anything as it all depends on how much
> spare time I have, but I can proceed with the evaluation if you think it's
> useful. I have a feeling that AOX is not truly the right tool for the job,
> but we might be able to customise it to suit our needs. Are there any other
> requirements that weren't specified?

Well, I think we want to avoid customizing it. Using a custom
frontend, sure. But we don't want to end up customizing the
parser/backend. That's the road to unmaintainability.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 20:16:47
Message-ID: 20100112201647.GC18076@oak.highrise.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www


I'll note that the whole idea of a "email archive" interface might be a
very good "advocacy" project as well. AOX might not be a perfect fit,
but it could be a good learning experience... Really, all the PG mail
archives need is:

1) A nice normalized DB schema representing mail messages and their
relations to other message and "recipients" (or "folders")

2) A "injector" that can parse an email message, and de-compose it into
the various parts/tables of the DB schema, and insert it

3) A nice set of SQL queries to return message, parts, threads,
folders based on $criteria (search, id, folder, etc)

4) A web interface to view the messages/thread/parts #3 returns

The largest part of this is #1, but a good schema would be a very good
candidate to show of some of PG's more powerful features in a way that
"others" could see (like the movie store sample somewhere) , such as:
1) full text search
2) text vs bytea handling (thinking of all the mime parts, and encoding,
etc)
3) CTEs, ltree, recursion, etc, for threading/searching
4) Triggers for "materialized views" (for quick threading/folder queries)
5) expression indexes

a.

* Matteo Beccati <php(at)beccati(dot)com> [100112 14:56]:

> Having played with it, here's my feedback about AOX:
>
> pros:
> - seemed to be working reliably;
> - does most of the dirty job of parsing emails, splitting parts, etc
> - highly normalized schema
> - thread support (partial?)
>
> cons:
> - directly publishing the live email feed might not be desirable
> - queries might end up being a bit complicate for simple tasks
> - might be not easy to add additional processing in the workflow

> If there isn't a fully usable thread hierarchy I was more thinking to
> ltree, mainly because I've successfully used it in past and I haven't
> had enough time yet to look at CTEs. But if performance is comparable I
> don't see a reason why we shouldn't use them.

> With all that said, I can't promise anything as it all depends on how
> much spare time I have, but I can proceed with the evaluation if you
> think it's useful. I have a feeling that AOX is not truly the right tool
> for the job, but we might be able to customise it to suit our needs. Are
> there any other requirements that weren't specified?

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 20:37:50
Message-ID: 4B4CDD9E.7010204@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 12/01/2010 21:04, Magnus Hagander ha scritto:
> On Tue, Jan 12, 2010 at 20:56, Matteo Beccati<php(at)beccati(dot)com> wrote:
>> Il 12/01/2010 10:30, Magnus Hagander ha scritto:
>>>
>>> The problem is usually with strange looking emails with 15 different
>>> MIME types. If we can figure out the proper way to render that, the
>>> rest really is just a SMOP.
>>
>> Yeah, I was expecting some, but all the message I've looked at seemed to be
>> working ok.
>
> Have you been looking at old or new messages? Try grabbing a couple of
> MBOX files off archives.postgresql.org from several years back, you're
> more likely to find weird MUAs then I think.

Both. pgsql-hacker and -general are subscribed and getting new emails
and pgsql-www is just an import of the archives:

http://archives.beccati.org/pgsql-www/by/date (sorry, no paging)

(just fixed a 500 error that was caused by the fact that I've been
playing with the db a bit and a required helper table was missing)

>>> (BTW, for something to actually be used In Production (TM), we want
>>> something that uses one of our existing frameworks. So don't go
>>> overboard in code-wise implementations on something else - proof of
>>> concept on something else is always ok, of course)
>>
>> OK, that's something I didn't know, even though I expected some kind of
>> limitations. Could you please elaborate a bit more (i.e. where to find
>> info)?
>
> Well, the framework we're moving towards is built on top of django, so
> that would be a good first start.
>
> There is also whever the commitfest thing is built on, but I'm told
> that's basically no framework.

I'm afraid that's outside on my expertise. But I can get as far as
having a proof of concept and the required queries / php code.

>> Having played with it, here's my feedback about AOX:
>>
>> pros:
>> - seemed to be working reliably;
>> - does most of the dirty job of parsing emails, splitting parts, etc
>> - highly normalized schema
>> - thread support (partial?)
>
> A killer will be if that thread support is enough. If we have to build
> that completely ourselves, it'll take a lot more work.

Looks like we need to populate a helper table with hierarchy
information, unless Ahijit has a better idea and knows how to get it
from the aox main schema.

>> cons:
>> - directly publishing the live email feed might not be desirable
>
> Why not?

The scenario I was thinking at was the creation of a static snapshot and
potential inconsistencies that might occur if the threads get updated
during that time.

>> - queries might end up being a bit complicate for simple tasks
>
> As long as we don't have to hit them too often, which is solve:able
> with caching. And we do have a pretty good RDBMS to run the queries on
> :)

True :)

>>> I don't think you can trust the NNTP gateway now or in the past,
>>> messages are sometimes lost there. The mbox files are as complete as
>>> anything we'll ever get.
>>
>> Importing the whole pgsql-www archive with a perl script that bounces
>> messages via SMTP took about 30m. Maybe there's even a way to skip SMTP, I
>> haven't looked into it that much.
>
> Um, yes. There is an MBOX import tool.

Cool.

>> With all that said, I can't promise anything as it all depends on how much
>> spare time I have, but I can proceed with the evaluation if you think it's
>> useful. I have a feeling that AOX is not truly the right tool for the job,
>> but we might be able to customise it to suit our needs. Are there any other
>> requirements that weren't specified?
>
> Well, I think we want to avoid customizing it. Using a custom
> frontend, sure. But we don't want to end up customizing the
> parser/backend. That's the road to unmaintainability.

Sure. I guess my wording wasn't right... I was more thinking about
adding new tables, materialized views or whatever else might be missing
to make it fit out purpose.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 21:28:03
Message-ID: m28wc3ovx8.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Aidan Van Dyk <aidan(at)highrise(dot)ca> writes:

> I'll note that the whole idea of a "email archive" interface might be a
> very good "advocacy" project as well. AOX might not be a perfect fit,
> but it could be a good learning experience... Really, all the PG mail
> archives need is:
>
> 1) A nice normalized DB schema representing mail messages and their
> relations to other message and "recipients" (or "folders")

We're now hoping that this one will fit:

http://www.archiveopteryx.org/schema

> 2) A "injector" that can parse an email message, and de-compose it into
> the various parts/tables of the DB schema, and insert it

aox has that either as a bulk importer or as a MDA.

> 3) A nice set of SQL queries to return message, parts, threads,
> folders based on $criteria (search, id, folder, etc)

I guess Matteo's working on that…

> 4) A web interface to view the messages/thread/parts #3 returns

And that too.

> The largest part of this is #1, but a good schema would be a very good
> candidate to show of some of PG's more powerful features in a way that
> "others" could see (like the movie store sample somewhere) , such as:
> 1) full text search
> 2) text vs bytea handling (thinking of all the mime parts, and encoding,
> etc)
> 3) CTEs, ltree, recursion, etc, for threading/searching
> 4) Triggers for "materialized views" (for quick threading/folder queries)
> 5) expression indexes

And Tsearch, too, maybe. Oh and pg_trgm might be quite good at providing
suggestion as you type or "Did you mean?" stuff.

Regards,
--
dim


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 21:33:38
Message-ID: 20100112213338.GE18076@oak.highrise.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

* Dimitri Fontaine <dfontaine(at)hi-media(dot)com> [100112 16:28]:

> > 1) A nice normalized DB schema representing mail messages and their
> > relations to other message and "recipients" (or "folders")
>
> We're now hoping that this one will fit:
>
> http://www.archiveopteryx.org/schema

Yup, and it provides a lot more too, which could probably be safely
ignored.

> > 2) A "injector" that can parse an email message, and de-compose it into
> > the various parts/tables of the DB schema, and insert it
>
> aox has that either as a bulk importer or as a MDA.

Yup, LMTP is ideally suited for that too.

> > 3) A nice set of SQL queries to return message, parts, threads,
> > folders based on $criteria (search, id, folder, etc)
>
> I guess Matteo's working on that…

Right, but this is where I want to see the AOX schema "imporove"... In
ways like adding persistant tables for threading, which are updated by
triggers as new messages are delivered, etc. Documented queries that
show how to use CTEs, ltree, etc to get threaded views, good FTS support
(with indexes and triggers managing them), etc.

a.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 22:10:57
Message-ID: m2y6k3nfda.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Aidan Van Dyk <aidan(at)highrise(dot)ca> writes:
>> aox has that either as a bulk importer or as a MDA.
>
> Yup, LMTP is ideally suited for that too.

Yes.

>> > 3) A nice set of SQL queries to return message, parts, threads,
>> > folders based on $criteria (search, id, folder, etc)
>>
>> I guess Matteo's working on that…
>
> Right, but this is where I want to see the AOX schema "imporove"... In
> ways like adding persistant tables for threading, which are updated by
> triggers as new messages are delivered, etc. Documented queries that
> show how to use CTEs, ltree, etc to get threaded views, good FTS support
> (with indexes and triggers managing them), etc.

+1.

I just didn't understand how much your proposal fit into current work :)
--
dim


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 07:22:29
Message-ID: 4B4EC635.9010205@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Hi,

>>>> 3) A nice set of SQL queries to return message, parts, threads,
>>>> folders based on $criteria (search, id, folder, etc)
>>>
>>> I guess Matteo's working on that…
>>
>> Right, but this is where I want to see the AOX schema "imporove"... In
>> ways like adding persistant tables for threading, which are updated by
>> triggers as new messages are delivered, etc. Documented queries that
>> show how to use CTEs, ltree, etc to get threaded views, good FTS support
>> (with indexes and triggers managing them), etc.
>
> +1.
>
> I just didn't understand how much your proposal fit into current work :)

I'm looking into it. The link I've previously sent will most likely
return a 500 error for the time being.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 12:23:31
Message-ID: 4B4F0CC3.6030501@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 14/01/2010 08:22, Matteo Beccati ha scritto:
> Hi,
>
>>>>> 3) A nice set of SQL queries to return message, parts, threads,
>>>>> folders based on $criteria (search, id, folder, etc)
>>>>
>>>> I guess Matteo's working on that…
>>>
>>> Right, but this is where I want to see the AOX schema "imporove"... In
>>> ways like adding persistant tables for threading, which are updated by
>>> triggers as new messages are delivered, etc. Documented queries that
>>> show how to use CTEs, ltree, etc to get threaded views, good FTS support
>>> (with indexes and triggers managing them), etc.
>>
>> +1.
>>
>> I just didn't understand how much your proposal fit into current work :)
>
> I'm looking into it. The link I've previously sent will most likely
> return a 500 error for the time being.

A quick update:

I've extended AOX with a trigger that takes care of filling a separate
table that's used to display the index pages. The new table also stores
threading information (standard headers + Exchange headers support) and
whether or not the email has attachments.

Please check the updated PoC: http://archives.beccati.org/

pgsql-hackers and -general are currently subscribed, while -www only has
2003 history imported via aoximport (very fast!).

BTW, I've just noticed a bug in the attachment detection giving false
positives, but have no time to check now.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 13:39:44
Message-ID: 87ska8955r.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Matteo Beccati <php(at)beccati(dot)com> writes:
> I've extended AOX with a trigger that takes care of filling a separate table
> that's used to display the index pages. The new table also stores threading
> information (standard headers + Exchange headers support) and whether or not
> the email has attachments.
>
> Please check the updated PoC: http://archives.beccati.org/

Looks pretty good, even if some thread are still separated (this one for
example), and the ordering looks strange.

Seems to be right on tracks, that said :)

Thanks for your work,
--
dim


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 13:46:09
Message-ID: 937d27e11001140546h7101d3c6tc68a1ded52c6b9c0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Thu, Jan 14, 2010 at 7:09 PM, Dimitri Fontaine
<dfontaine(at)hi-media(dot)com> wrote:
> Matteo Beccati <php(at)beccati(dot)com> writes:
>> I've extended AOX with a trigger that takes care of filling a separate table
>> that's used to display the index pages. The new table also stores threading
>> information (standard headers + Exchange headers support) and whether or not
>> the email has attachments.
>>
>> Please check the updated PoC: http://archives.beccati.org/
>
> Looks pretty good, even if some thread are still separated (this one for
> example), and the ordering looks strange.
>
> Seems to be right on tracks, that said :)

Yup.

Matteo - Can you try loading up a lot more of the old mbox files,
particularly the very early ones from -hackers? It would be good to
see how it copes under load with a few hundred thousand messages in
the database.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 14:08:22
Message-ID: 4B4F2556.80907@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 14/01/2010 14:39, Dimitri Fontaine ha scritto:
> Matteo Beccati<php(at)beccati(dot)com> writes:
>> I've extended AOX with a trigger that takes care of filling a separate table
>> that's used to display the index pages. The new table also stores threading
>> information (standard headers + Exchange headers support) and whether or not
>> the email has attachments.
>>
>> Please check the updated PoC: http://archives.beccati.org/
>
> Looks pretty good, even if some thread are still separated (this one for
> example), and the ordering looks strange.

This one is separated as the first one is not in the archive yet, thus
to the system there are multiple parent messages. It shouldn't happen
with full archives. About sorting, here's the query I've used (my first
try with CTEs incidentally):

WITH RECURSIVE t (mailbox, uid, date, subject, sender, has_attachments,
parent_uid, idx, depth) AS (
SELECT mailbox, uid, date, subject, sender, has_attachments,
parent_uid, uid::text, 1
FROM arc_messages
WHERE parent_uid IS NULL AND mailbox = 15
UNION ALL
SELECT a.mailbox, a.uid, a.date, a.subject, a.sender,
a.has_attachments, a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
FROM t JOIN arc_messages a USING (mailbox)
WHERE t.uid = a.parent_uid
) SELECT * FROM t ORDER BY idx

Any improvements to sorting are welcome :)

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 14:32:13
Message-ID: 4B4F2AED.1020101@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 14/01/2010 14:46, Dave Page ha scritto:
> On Thu, Jan 14, 2010 at 7:09 PM, Dimitri Fontaine
> <dfontaine(at)hi-media(dot)com> wrote:
>> Matteo Beccati<php(at)beccati(dot)com> writes:
>>> I've extended AOX with a trigger that takes care of filling a separate table
>>> that's used to display the index pages. The new table also stores threading
>>> information (standard headers + Exchange headers support) and whether or not
>>> the email has attachments.
>>>
>>> Please check the updated PoC: http://archives.beccati.org/
>>
>> Looks pretty good, even if some thread are still separated (this one for
>> example), and the ordering looks strange.
>>
>> Seems to be right on tracks, that said :)
>
> Yup.
>
> Matteo - Can you try loading up a lot more of the old mbox files,
> particularly the very early ones from -hackers? It would be good to
> see how it copes under load with a few hundred thousand messages in
> the database.

Sure, I will give it a try in the evening or tomorrow.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 14:36:35
Message-ID: 937d27e11001140636i2d767ba1mc32e5eeb8130ce38@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Thu, Jan 14, 2010 at 8:02 PM, Matteo Beccati <php(at)beccati(dot)com> wrote:
> Il 14/01/2010 14:46, Dave Page ha scritto:
>>
>> On Thu, Jan 14, 2010 at 7:09 PM, Dimitri Fontaine
>> <dfontaine(at)hi-media(dot)com>  wrote:
>>>
>>> Matteo Beccati<php(at)beccati(dot)com>  writes:
>>>>
>>>> I've extended AOX with a trigger that takes care of filling a separate
>>>> table
>>>> that's used to display the index pages. The new table also stores
>>>> threading
>>>> information (standard headers + Exchange headers support) and whether or
>>>> not
>>>> the email has attachments.
>>>>
>>>> Please check the updated PoC: http://archives.beccati.org/
>>>
>>> Looks pretty good, even if some thread are still separated (this one for
>>> example), and the ordering looks strange.
>>>
>>> Seems to be right on tracks, that said :)
>>
>> Yup.
>>
>> Matteo - Can you try loading up a lot more of the old mbox files,
>> particularly the very early ones from -hackers? It would be good to
>> see how it copes under load with a few hundred thousand messages in
>> the database.
>
> Sure, I will give it a try in the evening or tomorrow.

Thanks :-)

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 14:47:02
Message-ID: 87r5ps7nh5.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Matteo Beccati <php(at)beccati(dot)com> writes:
> WITH RECURSIVE t (mailbox, uid, date, subject, sender, has_attachments,
> parent_uid, idx, depth) AS (
> SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid,
> uid::text, 1
> FROM arc_messages
> WHERE parent_uid IS NULL AND mailbox = 15
> UNION ALL
> SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments,
> a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
> FROM t JOIN arc_messages a USING (mailbox)
> WHERE t.uid = a.parent_uid
> ) SELECT * FROM t ORDER BY idx
>
> Any improvements to sorting are welcome :)

What I'd like would be to have it sorted by activity, showing first the
thread which received the later messages. I'm yet to play with CTE and
window function myself so without a database example to play with I
won't come up with a nice query, but I guess a more educated reader will
solve this without a sweat, as it looks easier than sudoku-solving,
which has been done already :)

Regards,
--
dim


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 15:06:33
Message-ID: 4B4F32F9.1040309@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 14/01/2010 15:47, Dimitri Fontaine ha scritto:
> Matteo Beccati<php(at)beccati(dot)com> writes:
>> WITH RECURSIVE t (mailbox, uid, date, subject, sender, has_attachments,
>> parent_uid, idx, depth) AS (
>> SELECT mailbox, uid, date, subject, sender, has_attachments, parent_uid,
>> uid::text, 1
>> FROM arc_messages
>> WHERE parent_uid IS NULL AND mailbox = 15
>> UNION ALL
>> SELECT a.mailbox, a.uid, a.date, a.subject, a.sender, a.has_attachments,
>> a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
>> FROM t JOIN arc_messages a USING (mailbox)
>> WHERE t.uid = a.parent_uid
>> ) SELECT * FROM t ORDER BY idx
>>
>> Any improvements to sorting are welcome :)
>
> What I'd like would be to have it sorted by activity, showing first the
> thread which received the later messages. I'm yet to play with CTE and
> window function myself so without a database example to play with I
> won't come up with a nice query, but I guess a more educated reader will
> solve this without a sweat, as it looks easier than sudoku-solving,
> which has been done already :)

Eheh, that was my first try as well. CTEs look very nice even though I'm
not yet very comfortable with the syntax. Anyway both for date and
thread indexes sort is the other way around, with newer posts/threads at
the bottom. Again I'll give it a try as soon as I find time to work
again on it.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 15:09:43
Message-ID: 9837222c1001140709n1b4cf564s2c90d7f266d56e33@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Thu, Jan 14, 2010 at 16:06, Matteo Beccati <php(at)beccati(dot)com> wrote:
> Il 14/01/2010 15:47, Dimitri Fontaine ha scritto:
>>
>> Matteo Beccati<php(at)beccati(dot)com>  writes:
>>>
>>> WITH RECURSIVE t (mailbox, uid, date, subject, sender, has_attachments,
>>> parent_uid, idx, depth) AS (
>>>   SELECT mailbox, uid, date, subject, sender, has_attachments,
>>> parent_uid,
>>> uid::text, 1
>>>   FROM arc_messages
>>>   WHERE parent_uid IS NULL AND mailbox = 15
>>>   UNION ALL
>>>   SELECT a.mailbox, a.uid, a.date, a.subject, a.sender,
>>> a.has_attachments,
>>> a.parent_uid, t.idx || '.' || a.uid::text, t.depth + 1
>>>   FROM t JOIN arc_messages a USING (mailbox)
>>>   WHERE t.uid = a.parent_uid
>>> ) SELECT * FROM t ORDER BY idx
>>>
>>> Any improvements to sorting are welcome :)
>>
>> What I'd like would be to have it sorted by activity, showing first the
>> thread which received the later messages. I'm yet to play with CTE and
>> window function myself so without a database example to play with I
>> won't come up with a nice query, but I guess a more educated reader will
>> solve this without a sweat, as it looks easier than sudoku-solving,
>> which has been done already :)
>
> Eheh, that was my first try as well. CTEs look very nice even though I'm not
> yet very comfortable with the syntax. Anyway both for date and thread
> indexes sort is the other way around, with newer posts/threads at the
> bottom. Again I'll give it a try as soon as I find time to work again on it.

Three tips around this,

1) don't be constrained by how things look now. Make something that's useful.

2) don't be constrained by the fact that we have two ways to view it
now (thread + date). we can easily do three, if different people like
different ways. As long as it's not so much it becomes a maintenance
burden

3) Remember to run your tests with lots of emails, some designs just
tend to fall apart over that (say a thread with 200+ emails in it)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: David Fetter <david(at)fetter(dot)org>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-14 18:36:09
Message-ID: 20100114183609.GB6859@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Thu, Jan 14, 2010 at 03:08:22PM +0100, Matteo Beccati wrote:
> Il 14/01/2010 14:39, Dimitri Fontaine ha scritto:
> >Matteo Beccati<php(at)beccati(dot)com> writes:
> >>I've extended AOX with a trigger that takes care of filling a separate table
> >>that's used to display the index pages. The new table also stores threading
> >>information (standard headers + Exchange headers support) and whether or not
> >>the email has attachments.
> >>
> >>Please check the updated PoC: http://archives.beccati.org/
> >
> >Looks pretty good, even if some thread are still separated (this one for
> >example), and the ordering looks strange.
>
> This one is separated as the first one is not in the archive yet,
> thus to the system there are multiple parent messages. It shouldn't
> happen with full archives. About sorting, here's the query I've used
> (my first try with CTEs incidentally):
>
> WITH RECURSIVE t (mailbox, uid, date, subject, sender,
> has_attachments, parent_uid, idx, depth) AS (
> SELECT mailbox, uid, date, subject, sender, has_attachments,
> parent_uid, uid::text, 1
> FROM arc_messages
> WHERE parent_uid IS NULL AND mailbox = 15
> UNION ALL
> SELECT a.mailbox, a.uid, a.date, a.subject, a.sender,
> a.has_attachments, a.parent_uid, t.idx || '.' || a.uid::text,
> t.depth + 1
> FROM t JOIN arc_messages a USING (mailbox)
> WHERE t.uid = a.parent_uid
> ) SELECT * FROM t ORDER BY idx

> Any improvements to sorting are welcome :)

This is probably better written as:

WITH RECURSIVE t (
mailbox,
uid,
date,
subject,
sender,
has_attachments,
"path"
)
AS (
SELECT
mailbox,
uid,
date,
subject,
sender,
has_attachments,
ARRAY[uid]
FROM
arc_messages
WHERE
parent_uid IS NULL AND
mailbox = 15
UNION ALL
SELECT
a.mailbox,
a.uid,
a.date,
a.subject,
a.sender,
a.has_attachments,
t."path" || a.uid,
FROM
t JOIN arc_messages a
ON (
a.mailbox = t.mailbox AND
t.uid = a.parent_uid
)
)
SELECT *
FROM t
ORDER BY "path";

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Matteo Beccati <php(at)beccati(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-15 19:38:50
Message-ID: 4B50C44A.101@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Hi everyone,

Il 14/01/2010 19:36, David Fetter ha scritto:
> On Thu, Jan 14, 2010 at 03:08:22PM +0100, Matteo Beccati wrote:
>> Il 14/01/2010 14:39, Dimitri Fontaine ha scritto:
>>> Matteo Beccati<php(at)beccati(dot)com> writes:>
>> Any improvements to sorting are welcome :)
>
> ...
> ARRAY[uid]
> ...

Thanks David, using an array rather than text concatenation is slightly
slower and uses a bit more memory, but you've been able to convince me
that it's The Right Way(TM) ;)

Anyway, I've made further changes and I would say that at this point the
PoC is feature complete. There surely are still some rough edges and a
few things to clean up, but I'd like to get your feedback once again:

http://archives.beccati.org

You will find that pgsql-general and -hackers are subscribed and getting
messages live, wihle -hackers-history and -www have been imported from
the archives (about 200k and 1.5k messages respectively at 50 messages/s).

Also, I'd need some help with the CTE query that was picking a wrong
plan and led me to forcibly disable merge joins inside the application
when executing it. Plans are attached.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/

Attachment Content-Type Size
cte-mergejoin.txt text/plain 2.4 KB
cte-index.txt text/plain 2.2 KB

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-16 10:48:50
Message-ID: m23a26qq99.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Matteo Beccati <php(at)beccati(dot)com> writes:
> Anyway, I've made further changes and I would say that at this point the PoC
> is feature complete. There surely are still some rough edges and a few
> things to clean up, but I'd like to get your feedback once again:
>
> http://archives.beccati.org

I've been clicking around and like the speedy feeling and the Thread
index appearing under any mail. Also getting the attachments seems to be
just working™. I've also checked than this "local thread" works on month
boundaries, so that you're POC is in a way already better than the
current archives solution.

Only missing is the search, but we have tsearch and pg_trgm masters not
far away…

> You will find that pgsql-general and -hackers are subscribed and getting
> messages live, wihle -hackers-history and -www have been imported from the
> archives (about 200k and 1.5k messages respectively at 50 messages/s).

Tried clicking over there and very far in the past indexes show no
messages. Here's an example:

http://archives.beccati.org/pgsql-hackers-history/1996-09/by/thread

> Also, I'd need some help with the CTE query that was picking a wrong plan
> and led me to forcibly disable merge joins inside the application when
> executing it. Plans are attached.

Sorry, not from me, still a CTE noob.
--
dim


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-16 13:21:58
Message-ID: 4B51BD76.9060500@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 16/01/2010 11:48, Dimitri Fontaine ha scritto:
> Matteo Beccati<php(at)beccati(dot)com> writes:
>> Anyway, I've made further changes and I would say that at this point the PoC
>> is feature complete. There surely are still some rough edges and a few
>> things to clean up, but I'd like to get your feedback once again:
>>
>> http://archives.beccati.org
>
> I've been clicking around and like the speedy feeling and the Thread
> index appearing under any mail. Also getting the attachments seems to be
> just working™. I've also checked than this "local thread" works on month
> boundaries, so that you're POC is in a way already better than the
> current archives solution.

Thanks for the feedback.

> Only missing is the search, but we have tsearch and pg_trgm masters not
> far away…

I haven't even looked at it as I was under the impression that the old
engine could still be used. If not, adding search support should be
fairly easy.

>> You will find that pgsql-general and -hackers are subscribed and getting
>> messages live, wihle -hackers-history and -www have been imported from the
>> archives (about 200k and 1.5k messages respectively at 50 messages/s).
>
> Tried clicking over there and very far in the past indexes show no
> messages. Here's an example:
>
> http://archives.beccati.org/pgsql-hackers-history/1996-09/by/thread

Yeah, there are a few messages in the archives with a wrong date header.
The list is generated using from min(date) to now(), so there are holes.
At some point I'll run a few queries to fix that.

>> Also, I'd need some help with the CTE query that was picking a wrong plan
>> and led me to forcibly disable merge joins inside the application when
>> executing it. Plans are attached.
>
> Sorry, not from me, still a CTE noob.

Actually the live db doesn't suffer from that problem anymore, but I've
able to reproduce the issue with a few days old backup running on a test
8.5alpha3 instance that still has a stock postgresql.conf.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-18 14:48:12
Message-ID: 4B5474AC.4090800@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 16/01/2010 14:21, Matteo Beccati ha scritto:
> Il 16/01/2010 11:48, Dimitri Fontaine ha scritto:
>> Matteo Beccati<php(at)beccati(dot)com> writes:
>>> Anyway, I've made further changes and I would say that at this point
>>> the PoC
>>> is feature complete. There surely are still some rough edges and a few
>>> things to clean up, but I'd like to get your feedback once again:
>
> [...]
>
>>> Also, I'd need some help with the CTE query that was picking a wrong
>>> plan
>>> and led me to forcibly disable merge joins inside the application when
>>> executing it. Plans are attached.
>>
>> Sorry, not from me, still a CTE noob.
>
> Actually the live db doesn't suffer from that problem anymore, but I've
> able to reproduce the issue with a few days old backup running on a test
> 8.5alpha3 instance that still has a stock postgresql.conf.

Following advice from Andrew "RodiumToad" Gierth, I raised cpu costs
back to the defaults (I did lower them following some tuning guide) and
that seems to have fixed the problem.

My question now is... what next? :)

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-18 14:55:00
Message-ID: 9837222c1001180655n7a7b90b5kc5cedf0157419b7b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

2010/1/18 Matteo Beccati <php(at)beccati(dot)com>:
> Il 16/01/2010 14:21, Matteo Beccati ha scritto:
>>
>> Il 16/01/2010 11:48, Dimitri Fontaine ha scritto:
>>>
>>> Matteo Beccati<php(at)beccati(dot)com> writes:
>>>>
>>>> Anyway, I've made further changes and I would say that at this point
>>>> the PoC
>>>> is feature complete. There surely are still some rough edges and a few
>>>> things to clean up, but I'd like to get your feedback once again:
>>
>> [...]
>>
>>>> Also, I'd need some help with the CTE query that was picking a wrong
>>>> plan
>>>> and led me to forcibly disable merge joins inside the application when
>>>> executing it. Plans are attached.
>>>
>>> Sorry, not from me, still a CTE noob.
>>
>> Actually the live db doesn't suffer from that problem anymore, but I've
>> able to reproduce the issue with a few days old backup running on a test
>> 8.5alpha3 instance that still has a stock postgresql.conf.
>
> Following advice from Andrew "RodiumToad" Gierth, I raised cpu costs back to the defaults (I did lower them following some tuning guide) and that seems to have fixed the problem.
>
> My question now is... what next? :)

If it wasn't for the fact that we're knee deep in two other major
projects for the infrastructure team right now, I'd be all over this
:-) But we really need to complete that before we put anything new in
production here.

What I'd like to see is one that integrates with our general layouts.

Also, I tink one of the main issues with the archives today that
people bring up is the inability to have threads cross months. I think
that should be fixed. Basically, get rid of the grouping by month for
a more dynamic way to browse.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Matteo Beccati <php(at)beccati(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-18 15:19:36
Message-ID: 874omje8zb.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Also, I tink one of the main issues with the archives today that
> people bring up is the inability to have threads cross months. I think
> that should be fixed. Basically, get rid of the grouping by month for
> a more dynamic way to browse.

Clic a mail in a thread within more than one given month. See the Thread
index for this email. It's complete, for both the month. Example here:

http://archives.beccati.org/pgsql-hackers-history/message/191438.html
http://archives.beccati.org/pgsql-hackers-history/message/191334.html

That said, the month boundary is artificial, so maybe having a X
messages per page instead would be better?

Regards,
--
dim


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-18 15:52:49
Message-ID: 20100118155249.GB3617@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Magnus Hagander wrote:
> 2010/1/18 Matteo Beccati <php(at)beccati(dot)com>:

> > My question now is... what next? :)

Gee, I disappear for a week and look what happens -- we get streaming
replication, a revamped archives site, and maybe something else that I
haven't seen yet. I love it :-)

> If it wasn't for the fact that we're knee deep in two other major
> projects for the infrastructure team right now, I'd be all over this
> :-) But we really need to complete that before we put anything new in
> production here.
>
> What I'd like to see is one that integrates with our general layouts.

Yeah. I think that should be relatively simple to add. There are some
other things that need a bit of rejiggering too, like the thread index
at the bottom getting too wide with large threads, for example here:
http://archives.beccati.org/pgsql-hackers-history/message/90000.html

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-18 17:31:08
Message-ID: 4B549ADC.3070501@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 18/01/2010 15:55, Magnus Hagander ha scritto:
> If it wasn't for the fact that we're knee deep in two other major
> projects for the infrastructure team right now, I'd be all over this
> :-) But we really need to complete that before we put anything new in
> production here.

Sure, that's completely understandable.

> What I'd like to see is one that integrates with our general layouts.

Shoudln't bee too hard, but I wouldn't be very keen on spending time on
layout related things that are going to be thrown away to due the
framework and language being different from what is going to be used on
production (symfony/php vs django/python).

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-18 17:35:07
Message-ID: 4B549BCB.1000103@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 18/01/2010 16:19, Dimitri Fontaine ha scritto:
> Magnus Hagander<magnus(at)hagander(dot)net> writes:
>> Also, I tink one of the main issues with the archives today that
>> people bring up is the inability to have threads cross months. I think
>> that should be fixed. Basically, get rid of the grouping by month for
>> a more dynamic way to browse.
>
> Clic a mail in a thread within more than one given month. See the Thread
> index for this email. It's complete, for both the month. Example here:
>
> http://archives.beccati.org/pgsql-hackers-history/message/191438.html
> http://archives.beccati.org/pgsql-hackers-history/message/191334.html

Thanks Dimitri, you beat me to it ;)

> That said, the month boundary is artificial, so maybe having a X
> messages per page instead would be better?

Not sure. Having date based pages helps out reducing the set of messages
that need to be scanned and sorted, increasing the likeliness of an
index scan. But I'm happy to examine other alternatives too.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-18 17:42:48
Message-ID: 9837222c1001180942s253eb081j8a16df6f8759828b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Mon, Jan 18, 2010 at 18:31, Matteo Beccati <php(at)beccati(dot)com> wrote:
> Il 18/01/2010 15:55, Magnus Hagander ha scritto:
>>
>> If it wasn't for the fact that we're knee deep in two other major
>> projects for the infrastructure team right now, I'd be all over this
>> :-) But we really need to complete that before we put anything new in
>> production here.
>
> Sure, that's completely understandable.
>
>> What I'd like to see is one that integrates with our general layouts.
>
> Shoudln't bee too hard, but I wouldn't be very keen on spending time on
> layout related things that are going to be thrown away to due the framework
> and language being different from what is going to be used on production
> (symfony/php vs django/python).

I don't know symfony, but as long as it's done in a template it is
probably pretty easy to move between different frameworks for the
layout part.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-18 17:44:28
Message-ID: 9837222c1001180944r40c9d1cfsd7592e5fe28c6a82@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Mon, Jan 18, 2010 at 18:35, Matteo Beccati <php(at)beccati(dot)com> wrote:
> Il 18/01/2010 16:19, Dimitri Fontaine ha scritto:
>>
>> Magnus Hagander<magnus(at)hagander(dot)net>  writes:
>>>
>>> Also, I tink one of the main issues with the archives today that
>>> people bring up is the inability to have threads cross months. I think
>>> that should be fixed. Basically, get rid of the grouping by month for
>>> a more dynamic way to browse.
>>
>> Clic a mail in a thread within more than one given month. See the Thread
>> index for this email. It's complete, for both the month. Example here:
>>
>>   http://archives.beccati.org/pgsql-hackers-history/message/191438.html
>>   http://archives.beccati.org/pgsql-hackers-history/message/191334.html
>
> Thanks Dimitri, you beat me to it ;)
>
>
>> That said, the month boundary is artificial, so maybe having a X
>> messages per page instead would be better?
>
> Not sure. Having date based pages helps out reducing the set of messages
> that need to be scanned and sorted, increasing the likeliness of an index
> scan. But I'm happy to examine other alternatives too.

I think we need to get rid of the months based pages. We can keep them
as an option, but they're not a good root thing. I'd rather have
something where you start at a certain point and see <n> before and
<n> after, so we keep the page to a reasonably short time. Keeping "30
days" there somewhere may make sense, but arbitrarily splitting at the
1st of each month doesn't follow the flow of discussions very well.

I think the first step has to be to figure out how we'd like it
presented. Only after that should we consider how to implement it to
get fast scans in the database...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-19 08:11:32
Message-ID: 4B556934.6000507@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 18/01/2010 18:42, Magnus Hagander ha scritto:
> On Mon, Jan 18, 2010 at 18:31, Matteo Beccati<php(at)beccati(dot)com> wrote:
>> Il 18/01/2010 15:55, Magnus Hagander ha scritto:
>>>
>>> If it wasn't for the fact that we're knee deep in two other major
>>> projects for the infrastructure team right now, I'd be all over this
>>> :-) But we really need to complete that before we put anything new in
>>> production here.
>>
>> Sure, that's completely understandable.
>>
>>> What I'd like to see is one that integrates with our general layouts.
>>
>> Shoudln't bee too hard, but I wouldn't be very keen on spending time on
>> layout related things that are going to be thrown away to due the framework
>> and language being different from what is going to be used on production
>> (symfony/php vs django/python).
>
> I don't know symfony, but as long as it's done in a template it is
> probably pretty easy to move between different frameworks for the
> layout part.

By default symfony uses plain PHP files as templates, but some plugins
allow using a templating engine instead. I guess I can give them a try.

--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-19 08:44:01
Message-ID: 9837222c1001190044s3024e5d6nf51db0d7c72b9436@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Tue, Jan 19, 2010 at 09:11, Matteo Beccati <php(at)beccati(dot)com> wrote:
> Il 18/01/2010 18:42, Magnus Hagander ha scritto:
>>
>> On Mon, Jan 18, 2010 at 18:31, Matteo Beccati<php(at)beccati(dot)com>  wrote:
>>>
>>> Il 18/01/2010 15:55, Magnus Hagander ha scritto:
>>>>
>>>> If it wasn't for the fact that we're knee deep in two other major
>>>> projects for the infrastructure team right now, I'd be all over this
>>>> :-) But we really need to complete that before we put anything new in
>>>> production here.
>>>
>>> Sure, that's completely understandable.
>>>
>>>> What I'd like to see is one that integrates with our general layouts.
>>>
>>> Shoudln't bee too hard, but I wouldn't be very keen on spending time on
>>> layout related things that are going to be thrown away to due the
>>> framework
>>> and language being different from what is going to be used on production
>>> (symfony/php vs django/python).
>>
>> I don't know symfony, but as long as it's done in a template it is
>> probably pretty easy to move between different frameworks for the
>> layout part.
>
> By default symfony uses plain PHP files as templates, but some plugins allow
> using a templating engine instead. I guess I can give them a try.

As long as the templating is separated from the code, it doesn't
matter if it's a dedicated templating engine or PHP. The point being,
focus on the contents and interface, porting the actual
HTML-generation is likely to be easy compared to that.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-30 16:08:08
Message-ID: 4B645968.6050700@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Il 19/01/2010 09:44, Magnus Hagander ha scritto:
> As long as the templating is separated from the code, it doesn't
> matter if it's a dedicated templating engine or PHP. The point being,
> focus on the contents and interface, porting the actual
> HTML-generation is likely to be easy compared to that.

I've been following the various suggestions. Please take a look at the
updated archives proof of concept:

http://archives.beccati.org/

The PoC is now integrated with the website layout and has a working
"Mailing lists" menu to navigate the available lists. The artificial
monthly breakdown has been removed and both thread and date sorting use
pagination instead.

The fancy tables are using the Ext JS framework as it was the only free
one I could find that features column layout for trees. I'm not
extremely happy about it, but it just works. Threads are loaded
asynchronously (AJAX), while date sorting uses regular HTML tables with
a bit of JS to get the fancy layout. This means that search engines
still have a way to properly index all the messages.

Last but not least, it's backwards compatibile with the /message-id/*
URI. The other one (/list/yyyy-mm/msg*.php) is implemented, but I just
realized that it has problems dealing with the old archive weirdness
(2009-12 shows also some messages dated aug 2009 nov 2009 or jan 2010
for -hackers).

That said, there are still a few visual improvements to be done, but
overall I'm pretty much satisfied.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-30 16:54:37
Message-ID: 20100130165437.GA2007@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Matteo Beccati wrote:
> Il 19/01/2010 09:44, Magnus Hagander ha scritto:
> >As long as the templating is separated from the code, it doesn't
> >matter if it's a dedicated templating engine or PHP. The point being,
> >focus on the contents and interface, porting the actual
> >HTML-generation is likely to be easy compared to that.
>
> I've been following the various suggestions. Please take a look at
> the updated archives proof of concept:
>
> http://archives.beccati.org/

I like this.

Sorry for being unable to get in touch with you on IM. It's been a
hectic time here with only very few pauses.

Some things:

* the list of lists and groups of lists are stored in two JSON files.
Should I send you a copy of them so that you can tweak your code to use
them? They are generated automatically from the wwwmaster database.

* We have a bunch of templates that you could perhaps have used, if you
hadn't already written all of it ... :-(

* While I don't personally care, some are going to insist that the site
works with Javascript disabled. I didn't try but from your description
it doesn't seem like it would. Is this easily fixable?

* The old monthly interface /list/yyyy-mm/msg*php is not really
necessary to keep, *except* that we need the existing URLs to redirect
to the corresponding new message page. I think we should be able to
create a database of URL redirects from the old site, using the
Message-Id URL style. So each message accessed using the old URL style
would require two redirects, but I don't think this is a problem. Do
you agree?

* We're using Subversion to keep the current code. Is your code
version-controlled? We'd need to import your code there, I'm afraid.

> Last but not least, it's backwards compatibile with the
> /message-id/* URI. The other one (/list/yyyy-mm/msg*.php) is
> implemented, but I just realized that it has problems dealing with
> the old archive weirdness (2009-12 shows also some messages dated
> aug 2009 nov 2009 or jan 2010 for -hackers).

I'm surprised about the Aug 2009 ones, but the others are explained
because the site divides the mboxes using one timezone and the time
displayed is a different timezone. We don't really control the first
one so there's nothing to do about it; but anyway it's not really
important.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-30 21:14:10
Message-ID: m2d40r5m8t.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Matteo Beccati <php(at)beccati(dot)com> writes:
> I've been following the various suggestions. Please take a look at the
> updated archives proof of concept:
>
> http://archives.beccati.org/

I like the features a lot, and the only remarks I can think about are
bikeschedding, so I'll let it to the web team when they integrate it. It
sure looks like a when rather than an if as far as I'm concerned.

In short, +1! And thanks a lot!
--
dim


From: Joe Conway <mail(at)joeconway(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-30 21:18:48
Message-ID: 4B64A238.1050307@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 01/30/2010 01:14 PM, Dimitri Fontaine wrote:
> Matteo Beccati <php(at)beccati(dot)com> writes:
>> I've been following the various suggestions. Please take a look at the
>> updated archives proof of concept:
>>
>> http://archives.beccati.org/
>
> I like the features a lot, and the only remarks I can think about are
> bikeschedding, so I'll let it to the web team when they integrate it. It
> sure looks like a when rather than an if as far as I'm concerned.
>
> In short, +1! And thanks a lot!

+1 here too. That looks wonderful!

Joe


From: Matteo Beccati <php(at)beccati(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-30 21:43:50
Message-ID: 4B64A816.1070008@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 30/01/2010 17:54, Alvaro Herrera wrote:
> Matteo Beccati wrote:
>> Il 19/01/2010 09:44, Magnus Hagander ha scritto:
>>> As long as the templating is separated from the code, it doesn't
>>> matter if it's a dedicated templating engine or PHP. The point being,
>>> focus on the contents and interface, porting the actual
>>> HTML-generation is likely to be easy compared to that.
>>
>> I've been following the various suggestions. Please take a look at
>> the updated archives proof of concept:
>>
>> http://archives.beccati.org/
>
> I like this.
>
> Sorry for being unable to get in touch with you on IM. It's been a
> hectic time here with only very few pauses.

Thanks :)

And no worries, I'm pretty sure you must be quite busy lately!

> Some things:
>
> * the list of lists and groups of lists are stored in two JSON files.
> Should I send you a copy of them so that you can tweak your code to use
> them? They are generated automatically from the wwwmaster database.
>
> * We have a bunch of templates that you could perhaps have used, if you
> hadn't already written all of it ... :-(

The templates and especially the integration with the current layout
still need to be rewritten when porting the code to python/Django, so I
I'm not sure if it's wise to spend more time on it at this stage.

Not sure about the JSON approach either. Maybe it's something that needs
to be further discussed when/if planning the migration of the archives
to Archiveopteryx.

> * While I don't personally care, some are going to insist that the site
> works with Javascript disabled. I didn't try but from your description
> it doesn't seem like it would. Is this easily fixable?

Date sorting works nicely even without JS, while thread sorting doesn't
at all. I've just updated the PoC so that thread sorting is not
available when JS is not available, while it still is the default
otherwise. Hopefully that's enough to keep JS haters happy.

> * The old monthly interface /list/yyyy-mm/msg*php is not really
> necessary to keep, *except* that we need the existing URLs to redirect
> to the corresponding new message page. I think we should be able to
> create a database of URL redirects from the old site, using the
> Message-Id URL style. So each message accessed using the old URL style
> would require two redirects, but I don't think this is a problem. Do
> you agree?

Sure. I was just hoping there was an even easier way (rescritct to
month, order by uid limit 1 offset X). I guess it wouldn't be hard to
write a script that populates a backward compatibility table. No need
for double redirects, it'd be just a matter of adding a JOIN or two to
the query.

> * We're using Subversion to keep the current code. Is your code
> version-controlled? We'd need to import your code there, I'm afraid.

I do have a local svn repository. Given it's just a PoC that is going to
be rewritten I don't think it should live in the official repo, but if
you think id does, I'll be glad to switch.

>> Last but not least, it's backwards compatibile with the
>> /message-id/* URI. The other one (/list/yyyy-mm/msg*.php) is
>> implemented, but I just realized that it has problems dealing with
>> the old archive weirdness (2009-12 shows also some messages dated
>> aug 2009 nov 2009 or jan 2010 for -hackers).
>
> I'm surprised about the Aug 2009 ones, but the others are explained
> because the site divides the mboxes using one timezone and the time
> displayed is a different timezone. We don't really control the first
> one so there's nothing to do about it; but anyway it's not really
> important.

It's not a big deal, the BC-table approach will take care of those
out-of-range messages. However there are a few messages in the hackers
archive (and most likely others) that have wrong date headers (e.g.
1980, 2036): we need to think what to do with them.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-31 12:38:11
Message-ID: 4B6579B3.6040209@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 30/01/2010 22:18, Joe Conway wrote:
> On 01/30/2010 01:14 PM, Dimitri Fontaine wrote:
>> Matteo Beccati<php(at)beccati(dot)com> writes:
>>> I've been following the various suggestions. Please take a look at the
>>> updated archives proof of concept:
>>>
>>> http://archives.beccati.org/
>>
>> I like the features a lot, and the only remarks I can think about are
>> bikeschedding, so I'll let it to the web team when they integrate it. It
>> sure looks like a when rather than an if as far as I'm concerned.
>>
>> In short, +1! And thanks a lot!
>
> +1 here too. That looks wonderful!

Thanks guys. Hopefully in the next few days I'll be able to catch up
with Alvaro to see how we can proceed on this.

Incidentally, I've just found out that the mailing lists are dropping
some messages. According to my qmail logs the AOX account never received
Joe's message yesterday, nor quite a few others:

M156252, M156259, M156262, M156273, M156275

and I've verified that it also has happened before. I don't know why,
but I'm pretty sure that my MTA was contacted only once for those
messages, while normally I get two connections (my own address + aox
address).

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-31 12:45:09
Message-ID: 9837222c1001310445r780f85ads20d0c42a8745d986@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Sat, Jan 30, 2010 at 22:43, Matteo Beccati <php(at)beccati(dot)com> wrote:
> On 30/01/2010 17:54, Alvaro Herrera wrote:
>> * While I don't personally care, some are going to insist that the site
>> works with Javascript disabled.  I didn't try but from your description
>> it doesn't seem like it would.  Is this easily fixable?
>
> Date sorting works nicely even without JS, while thread sorting doesn't at
> all. I've just updated the PoC so that thread sorting is not available when
> JS is not available, while it still is the default otherwise. Hopefully
> that's enough to keep JS haters happy.

I haven't looked at how it actually works, but the general requirement
is that it has to *work* without JS. It doesn't have to work *as
well*. That means serving up a page with zero contents, or a page that
you can't navigate, is not acceptable. Requiring more clicks to get
around the navigation and things like that, are ok.

>> * The old monthly interface /list/yyyy-mm/msg*php is not really
>> necessary to keep, *except* that we need the existing URLs to redirect
>> to the corresponding new message page.  I think we should be able to
>> create a database of URL redirects from the old site, using the
>> Message-Id URL style.  So each message accessed using the old URL style
>> would require two redirects, but I don't think this is a problem.  Do
>> you agree?
>
> Sure. I was just hoping there was an even easier way (rescritct to month,
> order by uid limit 1 offset X). I guess it wouldn't be hard to write a
> script that populates a backward compatibility table. No need for double
> redirects, it'd be just a matter of adding a JOIN or two to the query.

Once we go into production on this, we'll need to do some serious
thinking about the caching issues. And in any such scenario we should
very much avoid serving up the same content under different URLs,
since it'll blow away cache space for no reason - it's much better to
throw a redirct.

>> * We're using Subversion to keep the current code.  Is your code
>> version-controlled?  We'd need to import your code there, I'm afraid.
>
> I do have a local svn repository. Given it's just a PoC that is going to be
> rewritten I don't think it should live in the official repo, but if you
> think id does, I'll be glad to switch.

Note that the plan is to switch pgweb to git as well. So if you just
want to push the stuff up during development so people can look at it,
register for a repository at git.postgresql.org - or just set one up
at github which is even easier.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-31 14:09:17
Message-ID: 4B658F0D.7090406@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 31/01/2010 13:45, Magnus Hagander wrote:
> On Sat, Jan 30, 2010 at 22:43, Matteo Beccati<php(at)beccati(dot)com> wrote:
>> On 30/01/2010 17:54, Alvaro Herrera wrote:
>>> * While I don't personally care, some are going to insist that the site
>>> works with Javascript disabled. I didn't try but from your description
>>> it doesn't seem like it would. Is this easily fixable?
>>
>> Date sorting works nicely even without JS, while thread sorting doesn't at
>> all. I've just updated the PoC so that thread sorting is not available when
>> JS is not available, while it still is the default otherwise. Hopefully
>> that's enough to keep JS haters happy.
>
> I haven't looked at how it actually works, but the general requirement
> is that it has to *work* without JS. It doesn't have to work *as
> well*. That means serving up a page with zero contents, or a page that
> you can't navigate, is not acceptable. Requiring more clicks to get
> around the navigation and things like that, are ok.

As it currently stands, date sorting is the default and there are no
links to the thread view, which would otherwise look empty. We can
surely build a non-JS thread view as well, I'm just not sure if it's
worth the effort.

>>> * The old monthly interface /list/yyyy-mm/msg*php is not really
>>> necessary to keep, *except* that we need the existing URLs to redirect
>>> to the corresponding new message page. I think we should be able to
>>> create a database of URL redirects from the old site, using the
>>> Message-Id URL style. So each message accessed using the old URL style
>>> would require two redirects, but I don't think this is a problem. Do
>>> you agree?
>>
>> Sure. I was just hoping there was an even easier way (rescritct to month,
>> order by uid limit 1 offset X). I guess it wouldn't be hard to write a
>> script that populates a backward compatibility table. No need for double
>> redirects, it'd be just a matter of adding a JOIN or two to the query.
>
> Once we go into production on this, we'll need to do some serious
> thinking about the caching issues. And in any such scenario we should
> very much avoid serving up the same content under different URLs,
> since it'll blow away cache space for no reason - it's much better to
> throw a redirct.

Yes, that was my point. A single redirect to the only URL for the message.

>>> * We're using Subversion to keep the current code. Is your code
>>> version-controlled? We'd need to import your code there, I'm afraid.
>>
>> I do have a local svn repository. Given it's just a PoC that is going to be
>> rewritten I don't think it should live in the official repo, but if you
>> think id does, I'll be glad to switch.
>
> Note that the plan is to switch pgweb to git as well. So if you just
> want to push the stuff up during development so people can look at it,
> register for a repository at git.postgresql.org - or just set one up
> at github which is even easier.

The only reason why I used svn is that git support in netbeans is rather
poor, or at least that was my impression. I think it won't be a problem
to move to git, I probably just need some directions ;)

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-31 17:51:10
Message-ID: 9837222c1001310951r1a83db01i1db35cc8e2cfc13f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Sun, Jan 31, 2010 at 15:09, Matteo Beccati <php(at)beccati(dot)com> wrote:
> On 31/01/2010 13:45, Magnus Hagander wrote:
>>
>> On Sat, Jan 30, 2010 at 22:43, Matteo Beccati<php(at)beccati(dot)com>  wrote:
>>>
>>> On 30/01/2010 17:54, Alvaro Herrera wrote:
>>>>
>>>> * While I don't personally care, some are going to insist that the site
>>>> works with Javascript disabled.  I didn't try but from your description
>>>> it doesn't seem like it would.  Is this easily fixable?
>>>
>>> Date sorting works nicely even without JS, while thread sorting doesn't
>>> at
>>> all. I've just updated the PoC so that thread sorting is not available
>>> when
>>> JS is not available, while it still is the default otherwise. Hopefully
>>> that's enough to keep JS haters happy.
>>
>> I haven't looked at how it actually works, but the general requirement
>> is that it has to *work* without JS. It doesn't have to work *as
>> well*. That means serving up a page with zero contents, or a page that
>> you can't navigate, is not acceptable. Requiring more clicks to get
>> around the navigation and things like that, are ok.
>
> As it currently stands, date sorting is the default and there are no links
> to the thread view, which would otherwise look empty. We can surely build a
> non-JS thread view as well, I'm just not sure if it's worth the effort.

Hmm. I personally think we need some level of thread support for
non-JS as well, that's at least not *too* much of a step backwards
from what we have now. But others may have other thoughts about that?

>>>> * We're using Subversion to keep the current code.  Is your code
>>>> version-controlled?  We'd need to import your code there, I'm afraid.
>>>
>>> I do have a local svn repository. Given it's just a PoC that is going to
>>> be
>>> rewritten I don't think it should live in the official repo, but if you
>>> think id does, I'll be glad to switch.
>>
>> Note that the plan is to switch pgweb to git as well. So if you just
>> want to push the stuff up during development so people can look at it,
>> register for a repository at git.postgresql.org - or just set one up
>> at github which is even easier.
>
> The only reason why I used svn is that git support in netbeans is rather
> poor, or at least that was my impression. I think it won't be a problem to
> move to git, I probably just need some directions ;)

:-)

Well, it doesn't matter what type of repo it's in at this point, only
once it goes into production. The reason I suggested git at this point
is that we (the postgresql project) do provide git hosting at
git.postgresql.org, but we don't provide subversion anywhere. And I'm
certainly not going to suggest you use pgfoundry and cvs....

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 02:27:24
Message-ID: 20100201022724.GA2093@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Matteo Beccati wrote:

> Incidentally, I've just found out that the mailing lists are
> dropping some messages. According to my qmail logs the AOX account
> never received Joe's message yesterday, nor quite a few others:
>
> M156252, M156259, M156262, M156273, M156275
>
> and I've verified that it also has happened before. I don't know
> why, but I'm pretty sure that my MTA was contacted only once for
> those messages, while normally I get two connections (my own address
> + aox address).

Hmm, I see it here:
http://archives.postgresql.org/message-id/4B64A238.1050307%40joeconway.com
Maybe it was just delayed?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Matteo Beccati <php(at)beccati(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 08:06:14
Message-ID: 4B668B76.4050201@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 01/02/2010 03:27, Alvaro Herrera wrote:
> Matteo Beccati wrote:
>
>> Incidentally, I've just found out that the mailing lists are
>> dropping some messages. According to my qmail logs the AOX account
>> never received Joe's message yesterday, nor quite a few others:
>>
>> M156252, M156259, M156262, M156273, M156275
>>
>> and I've verified that it also has happened before. I don't know
>> why, but I'm pretty sure that my MTA was contacted only once for
>> those messages, while normally I get two connections (my own address
>> + aox address).
>
> Hmm, I see it here:
> http://archives.postgresql.org/message-id/4B64A238.1050307%40joeconway.com
> Maybe it was just delayed?

But not here:

http://archives.beccati.org/message-id/4B64A238.1050307%40joeconway.com

Anyway, I guess that on production we'll have a better way to inject
emails into Archiveopteryx rather than relying on a email subscription,
which seems a bit fragile. It's been ages since I last set up majordomo,
but I guess there should be a way to also pipe outgoing messages through
a script that performs the delivery to AOX.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 09:26:04
Message-ID: 9837222c1002010126l283a7f3ane276273acfa06cd4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

2010/2/1 Matteo Beccati <php(at)beccati(dot)com>:
> On 01/02/2010 03:27, Alvaro Herrera wrote:
>>
>> Matteo Beccati wrote:
>>
>>> Incidentally, I've just found out that the mailing lists are
>>> dropping some messages. According to my qmail logs the AOX account
>>> never received Joe's message yesterday, nor quite a few others:
>>>
>>> M156252, M156259, M156262, M156273, M156275
>>>
>>> and I've verified that it also has happened before. I don't know
>>> why, but I'm pretty sure that my MTA was contacted only once for
>>> those messages, while normally I get two connections (my own address
>>> + aox address).
>>
>> Hmm, I see it here:
>> http://archives.postgresql.org/message-id/4B64A238.1050307%40joeconway.com
>> Maybe it was just delayed?
>
> But not here:
>
> http://archives.beccati.org/message-id/4B64A238.1050307%40joeconway.com
>
> Anyway, I guess that on production we'll have a better way to inject emails into Archiveopteryx rather than relying on a email subscription, which seems a bit fragile. It's been ages since I last set up majordomo, but I guess there should be a way to also pipe outgoing messages through a script that performs the delivery to AOX.

Does the MBOX importer support incremental loading? Because majordomo
spits out MBOX files for us already.

One option could be to use SMTP with a subscription as the primary way
(and we could set up a dedicated relaying from the mailserver for this
of course, so it's not subject to graylisting or anything like that),
and then daily or so load the MBOX files to cover anything that was
lost?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 13:36:21
Message-ID: 4B66D8D5.6070309@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 01/02/2010 10:26, Magnus Hagander wrote:
> Does the MBOX importer support incremental loading? Because majordomo
> spits out MBOX files for us already.

Unfortunately the aoximport shell command doesn't support incremental
loading.

> One option could be to use SMTP with a subscription as the primary way
> (and we could set up a dedicated relaying from the mailserver for this
> of course, so it's not subject to graylisting or anything like that),
> and then daily or so load the MBOX files to cover anything that was
> lost?

I guess we could write a script that parses the mbox and adds whatever
is missing, as long as we keep it as a last resort if we can't make the
primary delivery a fail proof.

My main concern is that we'd need to overcomplicate the thread detection
algorithm so that it better deals with delayed messages: as it currently
works, the replies to a missing message get linked to the
"grand-parent". Injecting the missing message afterwards will put it at
the same level as its replies. If it happens only once in a while I
guess we can live with it, but definitely not if it happens tens of
times a day.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 14:03:52
Message-ID: 9837222c1002010603u5b51a22dqfb67799dc64241e4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

2010/2/1 Matteo Beccati <php(at)beccati(dot)com>:
> On 01/02/2010 10:26, Magnus Hagander wrote:
>>
>> Does the MBOX importer support incremental loading? Because majordomo
>> spits out MBOX files for us already.
>
> Unfortunately the aoximport shell command doesn't support incremental loading.
>
>> One option could be to use SMTP with a subscription as the primary way
>> (and we could set up a dedicated relaying from the mailserver for this
>> of course, so it's not subject to graylisting or anything like that),
>> and then daily or so load the MBOX files to cover anything that was
>> lost?
>
> I guess we could write a script that parses the mbox and adds whatever is missing, as long as we keep it as a last resort if we can't make the primary delivery a fail proof.
>
> My main concern is that we'd need to overcomplicate the thread detection algorithm so that it better deals with delayed messages: as it currently works, the replies to a missing message get linked to the "grand-parent". Injecting the missing message afterwards will put it at the same level as its replies. If it happens only once in a while I guess we can live with it, but definitely not if it happens tens of times a day.

That can potentially be a problem.

Consider the case where message A it sent. Mesasge B is a response to
A, and message C is a response to B. Now assume B is held for
moderation (because the poser is not on the list, or because it trips
some other thing), then message C will definitely arrive before
message B. Is that going to cause problems with this method?

Another case where the same thing will happen is if message delivery
of B gets for example graylisted, or is just slow from sender B, but
gets quickly delivered to the author of message A (because of a direct
CC). In this case, the author of message A may respond to it (making
message D),and this will again arrive before message B because author
A is not graylisted.

So the system definitely needs to deal with out-of-order delivery.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 14:10:14
Message-ID: 4B66E0C6.5050109@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 01/02/2010 15:03, Magnus Hagander wrote:
> 2010/2/1 Matteo Beccati<php(at)beccati(dot)com>:
>> My main concern is that we'd need to overcomplicate the thread detection algorithm so that it better deals with delayed messages: as it currently works, the replies to a missing message get linked to the "grand-parent". Injecting the missing message afterwards will put it at the same level as its replies. If it happens only once in a while I guess we can live with it, but definitely not if it happens tens of times a day.
>
> That can potentially be a problem.
>
> Consider the case where message A it sent. Mesasge B is a response to
> A, and message C is a response to B. Now assume B is held for
> moderation (because the poser is not on the list, or because it trips
> some other thing), then message C will definitely arrive before
> message B. Is that going to cause problems with this method?
>
> Another case where the same thing will happen is if message delivery
> of B gets for example graylisted, or is just slow from sender B, but
> gets quickly delivered to the author of message A (because of a direct
> CC). In this case, the author of message A may respond to it (making
> message D),and this will again arrive before message B because author
> A is not graylisted.
>
> So the system definitely needs to deal with out-of-order delivery.

Hmm, it looks like I didn't factor in direct CCs when thinking about
potential problems with the simplified algorithm. Thanks for raising that.

I'll be out of town for a few days, but I will see what I can do when I
get back.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 14:16:44
Message-ID: 9837222c1002010616t15ce8eds39a1b1851c3a4eee@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

2010/2/1 Matteo Beccati <php(at)beccati(dot)com>:
> On 01/02/2010 15:03, Magnus Hagander wrote:
>>
>> 2010/2/1 Matteo Beccati<php(at)beccati(dot)com>:
>>>
>>> My main concern is that we'd need to overcomplicate the thread detection algorithm so that it better deals with delayed messages: as it currently works, the replies to a missing message get linked to the "grand-parent". Injecting the missing message afterwards will put it at the same level as its replies. If it happens only once in a while I guess we can live with it, but definitely not if it happens tens of times a day.
>>
>> That can potentially be a problem.
>>
>> Consider the case where message A it sent. Mesasge B is a response to
>> A, and message C is a response to B. Now assume B is held for
>> moderation (because the poser is not on the list, or because it trips
>> some other thing), then message C will definitely arrive before
>> message B. Is that going to cause problems with this method?
>>
>> Another case where the same thing will happen is if message delivery
>> of B gets for example graylisted, or is just slow from sender B, but
>> gets quickly delivered to the author of message A (because of a direct
>> CC). In this case, the author of message A may respond to it (making
>> message D),and this will again arrive before message B because author
>> A is not graylisted.
>>
>> So the system definitely needs to deal with out-of-order delivery.
>
> Hmm, it looks like I didn't factor in direct CCs when thinking about potential problems with the simplified algorithm. Thanks for raising that.

That is a very common scenario. And even without that, email taking
different time to get delivered to majordomo is not at all uncomoon.

> I'll be out of town for a few days, but I will see what I can do when I get back.

No rush.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 16:28:15
Message-ID: 21249.1265041695@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Matteo Beccati <php(at)beccati(dot)com> writes:
> My main concern is that we'd need to overcomplicate the thread detection
> algorithm so that it better deals with delayed messages: as it currently
> works, the replies to a missing message get linked to the
> "grand-parent". Injecting the missing message afterwards will put it at
> the same level as its replies. If it happens only once in a while I
> guess we can live with it, but definitely not if it happens tens of
> times a day.

That's quite common unfortunately --- I think you're going to need to
deal with the case. Even getting a direct feed from the mail relays
wouldn't avoid it completely: consider cases like

* A sends a message
* B replies, cc'ing A and the list
* B's reply to list is delayed by greylisting
* A replies to B's reply (cc'ing list)
* A's reply goes through immediately
* B's reply shows up a bit later

That happens pretty frequently IME.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 16:31:21
Message-ID: 603c8f071002010831s17d2e25dwf81809942d098f5e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Mon, Feb 1, 2010 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Matteo Beccati <php(at)beccati(dot)com> writes:
>> My main concern is that we'd need to overcomplicate the thread detection
>> algorithm so that it better deals with delayed messages: as it currently
>> works, the replies to a missing message get linked to the
>> "grand-parent". Injecting the missing message afterwards will put it at
>> the same level as its replies. If it happens only once in a while I
>> guess we can live with it, but definitely not if it happens tens of
>> times a day.
>
> That's quite common unfortunately --- I think you're going to need to
> deal with the case.  Even getting a direct feed from the mail relays
> wouldn't avoid it completely: consider cases like
>
>        * A sends a message
>        * B replies, cc'ing A and the list
>        * B's reply to list is delayed by greylisting
>        * A replies to B's reply (cc'ing list)
>        * A's reply goes through immediately
>        * B's reply shows up a bit later
>
> That happens pretty frequently IME.

Yeah - and sometimes the delay can be DAYS.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 16:41:57
Message-ID: 21551.1265042517@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Feb 1, 2010 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> * A sends a message
>> * B replies, cc'ing A and the list
>> * B's reply to list is delayed by greylisting
>> * A replies to B's reply (cc'ing list)
>> * A's reply goes through immediately
>> * B's reply shows up a bit later
>>
>> That happens pretty frequently IME.

> Yeah - and sometimes the delay can be DAYS.

Greylisting wouldn't explain a delay of more than an hour or so.
OTOH, if B's reply got held for moderation for some reason, then
yeah it could be days :-(. But in that case the rest of the list
didn't see it in real-time either, so having it show up out of
"logical" sequence in the archive doesn't seem like a terrible
reflection of reality. I'm just concerned about the threading not
being sensitive to skews on the order of a few minutes --- those
are extremely common.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 16:43:31
Message-ID: 603c8f071002010843q74e1e201l3049be5dead7f531@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Mon, Feb 1, 2010 at 11:41 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Feb 1, 2010 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>        * A sends a message
>>>        * B replies, cc'ing A and the list
>>>        * B's reply to list is delayed by greylisting
>>>        * A replies to B's reply (cc'ing list)
>>>        * A's reply goes through immediately
>>>        * B's reply shows up a bit later
>>>
>>> That happens pretty frequently IME.
>
>> Yeah - and sometimes the delay can be DAYS.
>
> Greylisting wouldn't explain a delay of more than an hour or so.
> OTOH, if B's reply got held for moderation for some reason, then
> yeah it could be days :-(.  But in that case the rest of the list
> didn't see it in real-time either, so having it show up out of
> "logical" sequence in the archive doesn't seem like a terrible
> reflection of reality.  I'm just concerned about the threading not
> being sensitive to skews on the order of a few minutes --- those
> are extremely common.

I not infrequently receive messages out of sequence by time periods
well in excess of a few minutes.

Don't know why, but I do.

...Robert


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matteo Beccati <php(at)beccati(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-01 18:50:22
Message-ID: 9837222c1002011050p30c58275m285927b1988da25e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

2010/2/1 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Feb 1, 2010 at 11:41 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> On Mon, Feb 1, 2010 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>        * A sends a message
>>>>        * B replies, cc'ing A and the list
>>>>        * B's reply to list is delayed by greylisting
>>>>        * A replies to B's reply (cc'ing list)
>>>>        * A's reply goes through immediately
>>>>        * B's reply shows up a bit later
>>>>
>>>> That happens pretty frequently IME.
>>
>>> Yeah - and sometimes the delay can be DAYS.
>>
>> Greylisting wouldn't explain a delay of more than an hour or so.
>> OTOH, if B's reply got held for moderation for some reason, then
>> yeah it could be days :-(.  But in that case the rest of the list
>> didn't see it in real-time either, so having it show up out of
>> "logical" sequence in the archive doesn't seem like a terrible
>> reflection of reality.  I'm just concerned about the threading not
>> being sensitive to skews on the order of a few minutes --- those
>> are extremely common.
>
> I not infrequently receive messages out of sequence by time periods
> well in excess of a few minutes.
>
> Don't know why, but I do.

Quite often, it's stuck in the moderation queue.

Not quite as often, but still fairly frequently, it's stuck somewhere
in the hub.org relaying/antispam blackbox.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-02-13 12:34:56
Message-ID: 4B769C70.8060106@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 01/02/2010 17:28, Tom Lane wrote:
> Matteo Beccati<php(at)beccati(dot)com> writes:
>> My main concern is that we'd need to overcomplicate the thread detection
>> algorithm so that it better deals with delayed messages: as it currently
>> works, the replies to a missing message get linked to the
>> "grand-parent". Injecting the missing message afterwards will put it at
>> the same level as its replies. If it happens only once in a while I
>> guess we can live with it, but definitely not if it happens tens of
>> times a day.
>
> That's quite common unfortunately --- I think you're going to need to
> deal with the case. Even getting a direct feed from the mail relays
> wouldn't avoid it completely: consider cases like
>
> * A sends a message
> * B replies, cc'ing A and the list
> * B's reply to list is delayed by greylisting
> * A replies to B's reply (cc'ing list)
> * A's reply goes through immediately
> * B's reply shows up a bit later
>
> That happens pretty frequently IME.

I've improved the threading algorithm by keeping an ordered backlog of
unresolved references, i.e. when a message arrives:

1. Search for a parent message using:

1a. In-Reply-To header. If referenced message is not found insert its
Message-Id to the backlog table with position 0

1b. References header. For each missing referenced message insert its
Message-Id to the backlog table with position N

1c. MS Exchange Thread-Index and Thread-Topic headers

2. Message is stored along with its parent ID, if any.

3. Compare the Message-Id header with the backlog table. Update the
parent field of any referencing message and clean up positions >= n in
the references table.

Now I just need some time to do a final clean up and I'd be ready to
publish the code, which hopefully will be clearer than my words ;)

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: pgsql-www <pgsql-www(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Mailing list archives (was Re: [HACKERS] mailing list archiver chewing patches)
Date: 2010-05-01 18:16:13
Message-ID: 4BDC6FED.7010100@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Hi everyone,

(moving the old -hackers thread to pgsql-www as it's more relevant)

I'm sorry but since I last posted an update on -hackers, I haven't had
time to keep working on the proof of concept of the new mailing list
archive.

For those not involved in the previous thread it is a small app built on
top of archiveopterix (AOX), that has full thread support and can
display the archives by using some fancy JS but also being search engine
and non-JS friendly.

The logic is entirely on the database (will apart AOX itself) and the UI
was built using PHP and symfony, thus it will need to be ported to
python/django at some point.

I've been approached by Alvaro and we were finally able to discuss about
the project. I think we've been able to make some progress understating
how we can integrate the new archive with majordomo. He was kind enough
to volunteer for reviewing the current code, and I will shortly provide
all the information required for him to begin.

Now, my biggest concern is... will someone step in for porting the app
to the new website framework?

I just subscribed to -www, so I don't know what's the status of the new
website and I don't know how urgent the overhaul of the archives is, but
I'd really love to have some feedback on this before Alvaro and I put
some more effort in it.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Re: Mailing list archives (was Re: [HACKERS] mailing list archiver chewing patches)
Date: 2010-05-01 18:19:22
Message-ID: r2p9837222c1005011119z79935d35h4ea3f53074d74f7d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Sat, May 1, 2010 at 20:16, Matteo Beccati <php(at)beccati(dot)com> wrote:
> Hi everyone,
>
> (moving the old -hackers thread to pgsql-www as it's more relevant)
>
> I'm sorry but since I last posted an update on -hackers, I haven't had time
> to keep working on the proof of concept of the new mailing list archive.
>
> For those not involved in the previous thread it is a small app built on top
> of archiveopterix (AOX), that has full thread support and can display the
> archives by using some fancy JS but also being search engine and non-JS
> friendly.
>
> The logic is entirely on the database (will apart AOX itself) and the UI was
> built using PHP and symfony, thus it will need to be ported to python/django
> at some point.
>
> I've been approached by Alvaro and we were finally able to discuss about the
> project. I think we've been able to make some progress understating how we
> can integrate the new archive with majordomo. He was kind enough to
> volunteer for reviewing the current code, and I will shortly provide all the
> information required for him to begin.
>
> Now, my biggest concern is... will someone step in for porting the app to
> the new website framework?

Yes. I can definitely do this, if nobody else does first.

Unfortunately it's blocked behind some other work (both website and
infrastructure - we need something to put it on..), but I think once
the rest of the framework is done, and as you say the logic is in AOX,
the porting part is not likely to take much time.

It would be good if it's already in the web *layout* framework before
that happens though, because that's the part that would take a really
long time for me. Meaning using the same HTML and CSS framework. And
that will be 100% the same regardless of which execution framework is
used.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Selena Deckelmann <selenamarie(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>, Matteo Beccati <php(at)beccati(dot)com>
Cc: pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives (was Re: [HACKERS] mailing list archiver chewing patches)
Date: 2010-05-03 16:07:25
Message-ID: m2k2b5e566d1005030907od7d87341g3e4b7ded5f88bc62@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Sat, May 1, 2010 at 11:19 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> It would be good if it's already in the web *layout* framework before
> that happens though, because that's the part that would take a really
> long time for me. Meaning using the same HTML and CSS framework. And
> that will be 100% the same regardless of which execution framework is
> used.

I can assist with this task.

Matteo - can you point me to relevant threads or code to download to
generate the output? Or just some example output?

-selena

--
http://chesnok.com/daily - me
http://endpoint.com - work


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Selena Deckelmann <selenamarie(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Matteo Beccati <php(at)beccati(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-05-03 16:32:11
Message-ID: 87d3xd6j5g.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Selena Deckelmann <selenamarie(at)gmail(dot)com> writes:
> Matteo - can you point me to relevant threads or code to download to
> generate the output? Or just some example output?

http://archives.beccati.org/
http://archives.beccati.org/pgsql-hackers/message/266795

For a random specific mail example.

Regards,
--
dim

Not yet subscribed to the www mailing list.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Selena Deckelmann <selenamarie(at)gmail(dot)com>, Matteo Beccati <php(at)beccati(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-05-03 16:35:32
Message-ID: l2t9837222c1005030935tbc4de475id0595e80e8bd1b73@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Mon, May 3, 2010 at 18:32, Dimitri Fontaine <dfontaine(at)hi-media(dot)com> wrote:
> Selena Deckelmann <selenamarie(at)gmail(dot)com> writes:
>> Matteo - can you point me to relevant threads or code to download to
>> generate the output? Or just some example output?
>
>  http://archives.beccati.org/
>  http://archives.beccati.org/pgsql-hackers/message/266795
>
> For a random specific mail example.

Whoa. Clearly I have to withdraw my previous comment. I missed the
version that *was* integrated with the website looks. There are still
some things to add though (for example, no link back to the thread
index from that details page), but it's further along than I thought.
My apologies.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, David Fetter <david(at)fetter(dot)org>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-05-03 16:55:39
Message-ID: 4BDF000B.4070805@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 03/05/2010 18:35, Magnus Hagander wrote:
> On Mon, May 3, 2010 at 18:32, Dimitri Fontaine<dfontaine(at)hi-media(dot)com> wrote:
>> Selena Deckelmann<selenamarie(at)gmail(dot)com> writes:
>>> Matteo - can you point me to relevant threads or code to download to
>>> generate the output? Or just some example output?
>>
>> http://archives.beccati.org/
>> http://archives.beccati.org/pgsql-hackers/message/266795
>>
>> For a random specific mail example.
>
> Whoa. Clearly I have to withdraw my previous comment. I missed the
> version that *was* integrated with the website looks. There are still
> some things to add though (for example, no link back to the thread
> index from that details page), but it's further along than I thought.
> My apologies.

No worries!

The link is there (at the bottom), but it currently points to the non-JS
threaded view. The whole HTML/JS still needs some love of course.

And BTW, retrieval by message ID works too:

http://archives.beccati.org/message-id/87aasljpog.fsf@hi-media-techno.com

to get a recent random message from Dimitri to -hackers.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: pgsql-www(at)postgresql(dot)org
Subject: Re: Re: Mailing list archives (was Re: [HACKERS] mailing list archiver chewing patches)
Date: 2010-05-23 16:33:34
Message-ID: 4BF958DE.8020808@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 03/05/2010 18:07, Selena Deckelmann wrote:
> On Sat, May 1, 2010 at 11:19 AM, Magnus Hagander<magnus(at)hagander(dot)net> wrote:
>
>> It would be good if it's already in the web *layout* framework before
>> that happens though, because that's the part that would take a really
>> long time for me. Meaning using the same HTML and CSS framework. And
>> that will be 100% the same regardless of which execution framework is
>> used.
>
> I can assist with this task.
>
> Matteo - can you point me to relevant threads or code to download to
> generate the output? Or just some example output?

I think I did reply with some info some time ago. Any news on this?

I don't mean to pressure or something, just trying to understand if I
can help in any way :)

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Matteo Beccati <php(at)beccati(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>
Cc: pgsql-www <pgsql-www(at)postgresql(dot)org>
Subject: Re: Re: Mailing list archives (was Re: [HACKERS] mailing list archiver chewing patches)
Date: 2010-06-07 14:35:48
Message-ID: AANLkTimsAB_anmgKN_LEwMiKhJOQSAP251X5yp1Vvr7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Sun, May 23, 2010 at 18:33, Matteo Beccati <php(at)beccati(dot)com> wrote:
> On 03/05/2010 18:07, Selena Deckelmann wrote:
>>
>> On Sat, May 1, 2010 at 11:19 AM, Magnus Hagander<magnus(at)hagander(dot)net>
>>  wrote:
>>
>>> It would be good if it's already in the web *layout* framework before
>>> that happens though, because that's the part that would take a really
>>> long time for me. Meaning using the same HTML and CSS framework. And
>>> that will be 100% the same regardless of which execution framework is
>>> used.
>>
>> I can assist with this task.
>>
>> Matteo - can you point me to relevant threads or code to download to
>> generate the output? Or just some example output?
>
> I think I did reply with some info some time ago. Any news on this?

I *think* this one was directed at Selena and not me, correct? Selena - ping?

If it was at me, apologies for not getting back sooner, and can you
please outline exactly what you mean? ;) In "my book" it's still
backlogged behind the new website infrastructure stuff, which has been
delayed far beyond the point of embarassment, but certainly not
abandoned...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Selena Deckelmann <selenamarie(at)gmail(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>
Subject: Re: Re: Mailing list archives (was Re: [HACKERS] mailing list archiver chewing patches)
Date: 2010-06-07 16:21:37
Message-ID: 4C0D1C91.9010501@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Hi Magnus,

>>> Matteo - can you point me to relevant threads or code to download to
>>> generate the output? Or just some example output?
>>
>> I think I did reply with some info some time ago. Any news on this?
>
> I *think* this one was directed at Selena and not me, correct? Selena - ping?
>
> If it was at me, apologies for not getting back sooner, and can you
> please outline exactly what you mean? ;) In "my book" it's still
> backlogged behind the new website infrastructure stuff, which has been
> delayed far beyond the point of embarassment, but certainly not
> abandoned...

Yes, this was mainly for Selena, just to touch base and see if there has
been any progress or if she needed further help from me. Sorry about the
misunderstanding.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: David Fetter <david(at)fetter(dot)org>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-11-21 20:06:23
Message-ID: 20101121200623.GG7424@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Mon, May 03, 2010 at 06:55:39PM +0200, Matteo Beccati wrote:
> On 03/05/2010 18:35, Magnus Hagander wrote:
> >On Mon, May 3, 2010 at 18:32, Dimitri Fontaine<dfontaine(at)hi-media(dot)com> wrote:
> >>Selena Deckelmann<selenamarie(at)gmail(dot)com> writes:
> >>>Matteo - can you point me to relevant threads or code to download to
> >>>generate the output? Or just some example output?
> >>
> >> http://archives.beccati.org/

This looks super great!

What do we need in order to get this up and running on the official
site?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <tim(dot)bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-11-22 00:39:38
Message-ID: 1290386280-sup-6091@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Excerpts from David Fetter's message of dom nov 21 17:06:23 -0300 2010:
> On Mon, May 03, 2010 at 06:55:39PM +0200, Matteo Beccati wrote:

> > >> http://archives.beccati.org/
>
> This looks super great!

Yeah

> What do we need in order to get this up and running on the official
> site?

At this point I think it should be reworked for the new site
infrastructure, i.e. ported to Django. We could air it soon after the
new pgweb is up and running.

Also there was a technical problem with an email that was working
correctly (no text displayed). Matteo was going to look into that
AFAIK.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Matteo Beccati <php(at)beccati(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <tim(dot)bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-11-22 04:48:14
Message-ID: 4CE9F60E.1080404@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

Hi,

On 22/11/2010 01:39, Alvaro Herrera wrote:
> Excerpts from David Fetter's message of dom nov 21 17:06:23 -0300 2010:
>> On Mon, May 03, 2010 at 06:55:39PM +0200, Matteo Beccati wrote:
>
>>>>> http://archives.beccati.org/
>>
>> This looks super great!
>
> Yeah

Thanks guys. I appreciate your support!

>> What do we need in order to get this up and running on the official
>> site?
>
> At this point I think it should be reworked for the new site
> infrastructure, i.e. ported to Django. We could air it soon after the
> new pgweb is up and running.
>
> Also there was a technical problem with an email that was working
> correctly (no text displayed). Matteo was going to look into that
> AFAIK.

I've looked into it and the issue is that I'm displaying only part "1",
whereas that email had a multipart/alternative format with parts "1.1"
and "1.2". When we get closer to getting the code to production, we can
think about fixing it. For now I've saved a reference to the issue in my
own bug tracker -- I will email credentials separately.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/


From: Rob Wultsch <wultsch(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-11-22 05:08:59
Message-ID: AANLkTima4c56M9JOrje8cU-troS4tbfcncsp9b6N-xh4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Sun, Nov 21, 2010 at 1:06 PM, David Fetter <david(at)fetter(dot)org> wrote:
> On Mon, May 03, 2010 at 06:55:39PM +0200, Matteo Beccati wrote:
>> On 03/05/2010 18:35, Magnus Hagander wrote:
>> >On Mon, May 3, 2010 at 18:32, Dimitri Fontaine<dfontaine(at)hi-media(dot)com>  wrote:
>> >>Selena Deckelmann<selenamarie(at)gmail(dot)com>  writes:
>> >>>Matteo - can you point me to relevant threads or code to download to
>> >>>generate the output? Or just some example output?
>> >>
>> >>  http://archives.beccati.org/
>
> This looks super great!
>

And unlike the current version, is not broken for IE 8.
--
Rob Wultsch
wultsch(at)gmail(dot)com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, David Fetter <david(at)fetter(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <tim(dot)bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-11-24 13:33:20
Message-ID: AANLkTikc-2hrzxP7yXKj4B2fcOkUF_N8gXSyi8Wx=FRt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On Sun, Nov 21, 2010 at 11:48 PM, Matteo Beccati <php(at)beccati(dot)com> wrote:
> I've looked into it and the issue is that I'm displaying only part "1",
> whereas that email had a multipart/alternative format with parts "1.1"
> and "1.2". When we get closer to getting the code to production, we can
> think about fixing it.

When will we get closer to getting the code to production?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Matteo Beccati <php(at)beccati(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, David Fetter <david(at)fetter(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, pgsql-www <pgsql-www(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <tim(dot)bunce(at)pobox(dot)com>
Subject: Re: Re: Mailing list archives
Date: 2010-11-24 14:51:55
Message-ID: 4CED268B.3080109@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-www

On 24/11/2010 14:33, Robert Haas wrote:
> On Sun, Nov 21, 2010 at 11:48 PM, Matteo Beccati <php(at)beccati(dot)com> wrote:
>> I've looked into it and the issue is that I'm displaying only part "1",
>> whereas that email had a multipart/alternative format with parts "1.1"
>> and "1.2". When we get closer to getting the code to production, we can
>> think about fixing it.
>
> When will we get closer to getting the code to production?

What you see is a proof of concept, based on archiveopterix plus a bunch
of custom tables and triggers with a php/symfony frontend on top.

The web frontend will need to be ported to django. That of course won't
happen before the new django based website is ready. No idea about when
that will happen though. I'll be happy to dedicate some time for bug
fixing and improvements once we get closer to it.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/