[Systers-dev] GSoC 2012 project

priya iyer wordsofagirl at gmail.com
Thu Apr 5 08:46:46 PDT 2012


Um... Danci, could you please indent and space your mails properly before
sending? It looks as if you drafted the mail in your editor and then
copy-pasted it.

Proper spacing will make it much easier to read :)

- Priya

PS: I have sent you my GSoC proposal.

On Thu, Apr 5, 2012 at 9:08 PM, Danci Emanuel <danci_emanuel at yahoo.com> wrote:

>
>
>  Hello Robin!
>
> Firstly, I'm sorry for the confusing message; I saw that the added
> links did not work properly only after posting it. Here is the
> version with the working links:
> After doing some further reading I reached some conclusions and proposals
> that I will post here. It's possible that Priya has already come across
> these solutions; if so, please let me know. For improving the search
> engine I found three possible solutions:
> 1. Integrating the htdig search engine with Mailman -
> http://www.openinfo.co.uk/mm/patches/444884/ - (this is only a possible
> solution, because I do not know exactly what search capabilities this
> patch provides).
> 2. Replacing pipermail with MHonArc - http://www.mhonarc.org/ - and
> building a custom search engine on top of it. As it says here -
> http://www.mhonarc.org/MHonArc/doc/faq/usage.html#searching - other
> open-source search engines have previously been used with MHonArc. One of
> the compatible search engines is Lucene - http://lucene.apache.org/ - and
> the advantage of using it is that last year I did some research on
> open-source search engines because I needed to create a search engine for a
> library application that stores information (using MySQL databases) for
> over 25,000 books (I know it's not a big number, but without it the
> application was pretty slow because I made it work similarly to Google
> Instant). Moreover, open-source search engines like Lucene, Solr or
> Sphinx provide highly configurable options (here -
> http://stackoverflow.com/questions/1284083/choosing-a-stand-alone-full-text-search-server-sphinx-or-solr
> , here -
> http://stackoverflow.com/questions/737275/comparison-of-full-text-search-engine-lucene-sphinx-postgresql-mysql
> , and here -
> http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/ - are
> some good comparisons between some of the most used open-source search
> engines).
> 3. Why not replace the pipermail archiver with a relational database
> (in this - http://marc.info/?q=about - article, in the paragraph 'About
> its present form', a similar idea is presented)? We could create a solid
> database design and thus sort the data by subject, by date or by
> any other field. Furthermore, we could use one of the open-source search
> engines, and this way we could basically search for anything in the
> database. Currently I'm thinking about storing an id number for
> every message that belongs to a certain thread; when searching for a
> keyword in the database it would then be very easy to return the whole
> conversation (just search for all the messages that have the same id as the
> message in which the word was found).
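
To make the thread-id idea in point 3 concrete, here is a minimal sketch in
Python. It uses the standard sqlite3 module purely for illustration; the
table layout, the column names and the sample rows are assumptions of this
sketch and are not taken from Mailman, pipermail or any Systers schema.

```python
import sqlite3

# Hypothetical sample data: (msg_id, thread_id, subject, body).
SAMPLE_ROWS = [
    (1, 1, "GSoC 2012 project", "What about the htdig patch?"),
    (2, 1, "Re: GSoC 2012 project", "Look at hyperkitty first."),
    (3, 2, "Dynamic sublists", "Porting dlists to Mailman 3.0"),
]


def build_archive(rows):
    """Create an in-memory archive where every message carries a thread id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " msg_id INTEGER PRIMARY KEY,"
        " thread_id INTEGER NOT NULL,"
        " subject TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    # Index on thread_id so that fetching a whole conversation is cheap.
    conn.execute("CREATE INDEX idx_messages_thread ON messages(thread_id)")
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)
    return conn


def conversations_matching(conn, keyword):
    """Return (msg_id, thread_id, subject) for every message of every
    thread in which at least one message body contains `keyword`."""
    query = (
        "SELECT msg_id, thread_id, subject FROM messages"
        " WHERE thread_id IN ("
        "   SELECT thread_id FROM messages"
        "   WHERE body LIKE '%' || ? || '%')"
        " ORDER BY thread_id, msg_id"
    )
    return conn.execute(query, (keyword,)).fetchall()


if __name__ == "__main__":
    archive = build_archive(SAMPLE_ROWS)
    for row in conversations_matching(archive, "htdig"):
        print(row)
```

Note that the inner LIKE query still scans every message body, so this only
shows how the thread id brings back the whole conversation; it does not by
itself make the keyword search fast.
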
> For the dynamic lists project I would like to know if the current system
> used by Systers is stable and if my only task would be to fully integrate
> it with Mailman 3.0, or if my task would be to come up with a new proposal
> and implementation of the system. If my task consists only of integrating
> the existing system with Mailman 3.0, is all I have to do to
> understand well how the code works, including the custom changes made by
> you, and find a way to successfully integrate it with Mailman 3.0?
> For now I would like to know your opinion of the presented ideas:
> whether they are good enough to be used, and whether I should start writing
> my application and focus only on extending the search capabilities and
> implementing the dynamic lists with Mailman 3.0, leaving the UI part
> aside. I also thought about the possibility of creating a nice UI for
> Mailman using Django, but for now, as I said, I would like to know if the
> presented ideas are good enough to be accepted and if I should focus only
> on them. Your advice is very valuable because you can evaluate the amount
> of work much better than I can, and I do not want to end up
> over-promising and under-delivering.
> In regard to the timeline, I could definitely do some work during the
> bonding period in order to make up for the final exams period, so that I
> would be fine at the mid point. Furthermore, I do not have any problems
> with re-planning if that becomes necessary.
> I am looking forward to hearing your advice!
>
>
> Secondly, thanks for the clarifications you have provided. I was asking
> all these questions because I started the application process pretty
> late this year and I wanted to make sure I understand the system and
> the requirements as clearly as possible. In regard to the possibility of
> searching for information in a large database, in a comment on one of the
> posts from this link -
> http://stackoverflow.com/questions/737275/comparison-of-full-text-search-engine-lucene-sphinx-postgresql-mysql -
> one of the users talks about performance, saying that if the index is
> correctly built, an advanced query over millions of records can be done in
> just a couple of milliseconds (one of the key factors being the data
> structures that they use - ternary trees, binary trees, tries etc).
> Although I think it would have been fun and interesting to work
> on the search project, given that there is very little time left
> for writing the application, I will only apply for the project
> 'Integrating Dynamic Sub-Lists with Mailman 3.0'. The question that I have
> in regard to this project is:
> 1. When integrating the dlists with Mailman 3.0, will the PostgreSQL
> database keep the same structure, or do we have to make any changes to it?
> I am asking in order to know if I have to add a separate week
> to the timeline for working on the database.
> Here is a sketch for the timeline:
> Week 1, 2, 3, 4  [May 21 - June 17] - Read the documentation and the code
> in order to get to know how Mailman works at a deeper level.
> Week 5, 6 [June 18 - July 1] - Read the documentation and the code for the
> changes that Systers have made for the dynamic sub-lists.
> Week 7 [July 2 - July 8] - Create the design for the changes that have to
> be implemented in Mailman's structure in order for the dynamic lists to
> work properly with Mailman 3.0.
> Week 8, 9, 10, 11 [July 9 - August 5] - Implement the changes + submit the
> project for the mid-term evaluation.
> Week 12 [August 6 - August 12] - Additional testing and solving
> unexpected/related things that could come up.
> Week 13 [August 13 - August 20] - Final project documentation and wrapping
> the code for the final evaluation.
>
> Does it look fine to you? Should I change/add/remove something?
>
>
> Thank you,
> Emanuel Danci
>
> ________________________________
>  From: Robin Jeffries <robin at jeffries.org>
> To: Danci Emanuel <danci_emanuel at yahoo.com>
> Cc: "systers-dev+eligibility at systers.org" <
> systers-dev+eligibility at systers.org>
> Sent: Thursday, April 5, 2012 1:11 AM
> Subject: Re: [Systers-dev] GSoC 2012 project
>
> On Wed, Apr 4, 2012 at 2:19 PM, Danci Emanuel <danci_emanuel at yahoo.com>
> wrote:
>
> > Hello!
> >
> > After doing some further reading I reached some conclusions and
> > proposals that I will post here. It's possible that Priya has already
> > come across these solutions; if so, please let me know. For improving
> > the search engine I found three possible solutions:
> > 1. Integrating the htdig search engine with Mailman -> Link (This is only
> > a possible solution, because I do not know exactly what searching
> > capabilities this patch could provide).
> >
>
> I didn't understand this
>
>
> > 2. Replacing pipermail with MHonArc and building a custom search engine
> > on top of it. As it says here other open-source search engines have
> > previously been used with MHonArc. One of the compatible search
> > engines is Lucene and the advantage of using it is that last year I did
> > some research on open-source search engines because I needed to
> > create a search engine for a library application that stores information
> > (using MySQL databases) for over 25,000 books (I know it's not a big
> > number, but without it the application was pretty slow because I made it
> > work similarly to Google Instant). Moreover, open-source
> > search engines like Lucene, Solr or Sphinx provide highly configurable
> > options (here, here and here are some good comparisons between some of
> > the most used open-source search engines).
> >
>
> You need to look at hyperkitty https://github.com/syst3mw0rm/HyperKitty,
> which is the core of the archive mailman plans to use.  I believe that one
> of the GSOC students in 2010 (I don't think it was Priya, but that was a
> long time ago....) looked at Lucene. Look at the student projects for
> 2010.  You might be able to add one of these to hyperkitty -- you would not
> need to commit to a specific one in your proposal, but talk about what you
> need to investigate to decide on one.
>
> > 3. Why not replace the pipermail archiver with a relational database (in
> > this article, in the paragraph 'About its present form', a similar idea
> > is presented)? We could create a solid database design and thus sort
> > the data by subject, by date or by any other field. Furthermore, we could
> > use one of the open-source search engines, and this way we could
> > basically search for anything in the database. Currently I'm thinking
> > about storing an id number for every message that belongs to a certain
> > thread; when searching for a keyword in the database it would then be
> > very easy to return the whole conversation (just search for all the
> > messages that have the same id as the message in which the word was
> > found).
> >
>
> Again, look at hyperkitty.  I believe that's what it does for the messages.
>
> I think that searching in a relational database by scanning the full
> message text for individual words is going to be very slow.  You will
> probably need an auxiliary data structure to help you out.
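
One common auxiliary structure for this is an inverted index, which maps
each word to the set of messages that contain it. Below is a minimal sketch
in Python. The tokenizer, the function names and the sample messages are
made up for illustration; a real engine such as Lucene does far more
(ranking, stemming, on-disk storage).

```python
import re
from collections import defaultdict

# Hypothetical sample archive: message id -> message body.
SAMPLE_MESSAGES = {
    1: "Replacing pipermail with MHonArc",
    2: "Pipermail archives are hard to search",
    3: "Dynamic sublists in Mailman 3.0",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Build the inverted index: word -> set of ids of messages with it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index


def search(index, query):
    """Return the ids of the messages that contain every word of `query`."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the postings of the first word, then intersect the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    idx = build_index(SAMPLE_MESSAGES)
    print(sorted(search(idx, "pipermail search")))
```

A lookup touches only the postings of the query words, so its cost does not
depend on scanning the text of every message in the archive.
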
>
>
> > For the dynamic lists project I would like to know if the current system
> > used by Systers is stable and if my only task would be to fully integrate
> > it with Mailman 3.0, or if my task would be to come up with a new proposal
> > and implementation of the system. If my task consists only of integrating
> > the existing system with Mailman 3.0, is all I have to do to
> > understand well how the code works, including the custom changes made by
> > you, and find a way to successfully integrate it with Mailman 3.0?
> >
>
> I'm now confused -- you are talking about a completely different project,
> right?   Yes, the integration with mailman 3.0 is to take the existing
> system and make it work with mailman 3.0.  It is a smaller project
> (suitable for a junior student), but there is still a summer's worth of
> work here.
>
>
>
> > For now I would like to know your opinion of the presented ideas:
> > whether they are good enough to be used, and whether I should start
> > writing my application and focus only on extending the search
> > capabilities and implementing the dynamic lists with Mailman 3.0, leaving
> > the UI part aside. I also thought about the possibility of creating a
> > nice UI for Mailman using Django, but for now, as I said, I would like to
> > know if the presented ideas are good enough to be accepted and if I
> > should focus only on them. Your advice is very valuable because you can
> > evaluate the amount of work much better than I can, and I do not want to
> > end up over-promising and under-delivering.
> > In regard to the timeline, I could definitely do some work during the
> > bonding period in order to make up for the final exams period, so that I
> > would be fine at the mid point. Furthermore, I do not have any problems
> > with re-planning if that becomes necessary.
> >
>
> If you are going to focus on the archiver, especially search, you will need
> to present some sort of UI, but I think your goal should be to have a very
> solid backend, and as much of a UI prototype as you can get done in the
> summer.   The hyperkitty project may have a student working for mailman
> working on some UI ideas, so you may be able to fit your UI into those
> ideas, or you might work on just the part of the UI that relates to search
> (there is a lot more to an archiver UI than search -- check out some of the
> mock screenshots in hyperkitty to see where else this is going.)
>
> The archiver project is a big project for a first year undergraduate.  You
> seem to be willing to tackle an ambitious project, so this really depends
> on how likely you are to get "stuck".  If you can make regular progress
> with turnaround like I am giving you, perhaps slower, even if that progress
> is just to expose (and solve) one unexpected problem after another, we can
> make this into a successful project.  If that doesn't sound like fun, you
> may want to scale back to something more like the patches project or the
> porting to mailman3.0 project.
>
>
> > I am looking forward to hearing your advice!
> >
> > Thank you,
> > Emanuel DANCI
> >
> >
> > ________________________________
> >  From: Robin Jeffries <robin at jeffries.org>
> > To: Danci Emanuel <danci_emanuel at yahoo.com>
> > Cc: "systers-dev+eligibility at systers.org" <
> > systers-dev+eligibility at systers.org>
> > Sent: Wednesday, April 4, 2012 7:11 AM
> > Subject: Re: [Systers-dev] GSoC 2012 project
> >
> >
> > Some answers in line.  I think you are about to find your fit.
> >
> >
> >
> > On Tue, Apr 3, 2012 at 2:17 PM, Danci Emanuel <danci_emanuel at yahoo.com>
> > wrote:
> >
> > Hello Robin!
> >
>  >
> > > Thanks for the rapid response, and I apologize for not responding right
> > > away, but I had my mid-term exams and was a little busy with them. Also,
> > > thanks for the guidelines on which project to choose.
> > > Indeed, algorithms are a very interesting part of the computing world
> > > for me, but I am also interested in learning the other valuable
> > > practices involved in the software development process.
> > > I took a look at the Archive access project and I also read the details
> > > about what the other students have done to improve Mailman archive
> > > access/searching. Furthermore, I took a look at the ideas that Mairin
> > > Duffy has for creating a richer web interface for the mailing lists. So
> > > far I think that the ideas she proposed, with the mock-ups presented on
> > > her website, would be one of the best ways of tackling the problem,
> > > because they could solve three problems in one shot:
> > >
> >
> > Good, we and Mailman are both interested in some of these ideas.
> >
> > > First of all, by creating a web interface similar to the one that she
> > > posted on her website we could enhance the access experience and we
> > > could also expose to the users numerous options and statistics that are
> > > currently not available.
> > > Second of all, we could implement the dynamic sub-lists very easily this
> > > way.
> > >
> >
> > I'm not sure what this sentence means.  You probably need to explain it
> > more
> >
> > > Third of all, I think we could use the code that Priya Kuber has already
> > > written and add some features to it in order to extend its
> > > usability (e.g. as far as I have seen from the quick look that I took at
> > > her code and at the description, it does not have the capability of
> > > searching for keywords contained in the body of the messages, and I
> > > think this would be a nice feature to have).
> > >
> >
> > Yes, that's a good place to start.  Priya is still on this list (hi,
> > Priya) and I think we can arm twist her into giving you -- or any other
> > student chosen for this project -- some help getting started.
> >
> >
> > > Do you think that this is doable in 12 weeks? The only time-related
> > > problem that could appear is that during the first 3 weeks of June
> > > I have to take my final exams, and during that period I will have to
> > > work at half capacity, but I am sure that I can catch up along the way
> > > by working extra hours or during the weekends.
> > >
> >
> > If you have time to work during the bonding period (from the time you
> > are notified till mid-May), that could make up for it. And if you really
> > can put in 3 weeks of somewhat high quality work (meaning that you
> > understand the work to be done and are ready to start coding on at least
> > part of it, so that you can be productive) at half time during your exams,
> > we could make this work.   We are willing to be flexible with students who
> > have some conflicts, but we have to notify Google about the work you have
> > done at the mid point and at the end, and you will have to think about
> > whether we will be able to honestly say you have done 6 weeks of
> > full-time-equivalent good work by July 9.
> >
> > I know that several students want to work on the archive project, so you
> > will need to be flexible (in case we have the resources to take more than
> > 1 of you, you will have to replan), but you should write an application
> > assuming you are the only one working on archives.  I would pick a small
> > number of Mairin's ideas as the core functionality you want to provide,
> > propose a time line for that, and include the work it will take to hook
> > those ideas up to the necessary backend (for which you should be looking
> > at hyperkitty -- mentioned in another thread -- and at Priya's work).
> >
> > Remember, if systers is going to support you in the archive project, we
> > want your proposal to include how you will support dynamic sublists, and
> > also that our main concern is the search aspect -- how do you find
> > something that was posted 2 years ago? How do you find the entire
> > conversation that it was posted in?  I strongly suggest you find a large
> > mailman list (one of the mailman developers lists would be ideal) and try
> > to find something you think would have been discussed there using the
> > current archives.  It will help you feel the pain of current users.
> >
> >
> > I know that there is little time left, but I would like to get some
> > feedback and some guidance, in order to clarify the direction in which to
> > go and to be able to make a clear timeline for the application.
> > >Thank you very much!
> > >
> >
> > Good luck,
> >
> >
> > >Emanuel DANCI
> > >
> > >
> > >________________________________
> > >
> > >
> > >Well, even with this info, it's hard to tell.  Integrating dynamic sublists
> > >into Mailman 3.0 is critical to systers, while Other mailman extensions is
> > >closer to mailman and you would probably end up with at least 1 mentor from
> > >the mailman project.
> > >
> > >The dlists project will require you to understand mailman (and our changes)
> > >at a relatively deep level -- it's a large code base, and learning to
> > >understand such a system would be good for you.  It may also introduce you
> > >to python packages that you are not already familiar with.  I think it will
> > >be straightforward and easy to make progress.  For someone with your
> > >experience, it might be useful to take on an additional project at the end
> > >of the summer, as, if you know python well, this might only take you a
> > >month or less.  You will learn some new algorithms, but probably won't
> > >create any.
> > >
> > >The other mailman extensions will require that you decide what extensions
> > >are valuable, how people will want to use them (so you'll get introduced to
> > >use cases, if you aren't already familiar with them).  There will be lots
> > >of opportunities to work out new algorithms, but that's only a small part
> > >of the project.
> > >
> > >You might also look into the Archive access project -- it's about making
> > >the archives usefully searchable.  There is an active mailman project on
> > >this, which should get you started, and might make this accessible to an
> > >undergraduate. There is a use case component to this too, but there should
> > >be a lot of algorithm work too, if that is what you like. Look into it and
> > >see if it appeals to you.
> > >
> > >I've given some advice about the Other Mailman extensions project already.
> > >You should search the archives for that.  For the others, do a little
> > >research and ask some questions.  That will enable us to give you more
> > >concrete advice.
> > >
> > >Robin
> > >
> > >On Mon, Mar 26, 2012 at 3:29 PM, Danci Emanuel <danci_emanuel at yahoo.com>
> > >wrote:
> > >
> > >>
> > >>
> > >> Thank you for the prompt reply! This is great news! First of all, let
> > >> me introduce myself:
> > >> I am a 1st year undergraduate student pursuing a Computer Science
> > >> degree at the "Politehnica" University of Timisoara, Romania. I have a
> > >> keen interest in software development and in solving algorithmic and
> > >> mathematical problems. So far I have gained programming experience by
> > >> participating in, and winning, multiple regional and national algorithm
> > >> contests and project-based software development competitions, and by
> > >> creating several pet projects. The programming languages that I have
> > >> used so far, and my level of experience with each of them, are: C, C#,
> > >> Python, MySQL (intermediate), C++ (beginner/intermediate), NSIS
> > >> scripting language (beginner). Currently I am building a simulation
> > >> tool for solar panels for one of the professors at our university, and
> > >> I am also participating in the Competitive Programming Seminar at our
> > >> college, where we train for programming competitions like the ACM or
> > >> Challenge24 by solving problems from previous years and by studying new
> > >> algorithms (audio and image processing algorithms, linear programming
> > >> algorithms etc).
> > >> This is who I am, in a few words. From what I have already read on the
> > >> mailing list, I consider this a great opportunity to contribute to a
> > >> great project by creating new features that will have a positive impact
> > >> on a large number of users. I have read the proposed project ideas, and
> > >> although I found these two projects very interesting: "Other mailman
> > >> extensions" and "Integrating Dynamic Sub-lists with Mailman 3.0", I
> > >> would like to get some advice on which project to choose, depending on
> > >> which one would be more useful for the Systers community and also which
> > >> one would be more suitable for me. Furthermore, it would be nice to get
> > >> some guidance in order to make a good application.
> > >>
> > >> Thank you,
> > >>
> > >> Emanuel DANCI
> > >>
> > >>
> > >> To unsubscribe from this conversation, send email to
> > >> <systers-dev+eligibility+unsubscribe at systers.org> or visit
> > >> <http://systers.org/mailman/options/systers-dev?override=180&preference=0>
> > >> To contribute to this conversation, use your mailer's reply-all or
> > >> reply-group command or send your message to
> > >> systers-dev+eligibility at systers.org
> > >> To start a new conversation, send email to <systers-dev+new at systers.org>
> > >> To unsubscribe entirely from systers-dev, send email to
> > >> <systers-dev-request at systers.org> with subject unsubscribe.
> > >>
> > >
> > >
>
