
IRC log for #dvn, 2013-02-26

We've moved! Please join #dataverse instead. The new logs are at http://irclog.iq.harvard.edu/dataverse/today


All times shown according to UTC.

Time S Nick Message
07:52 sbmarks joined #dvn
07:52 ruebot joined #dvn
07:52 pdurbin joined #dvn
18:38 pdurbin got a nice answer at Re: Compatibility between Lucene Query and Solr SolrQuery in classes SolrJ - http://article.gmane.org/gmane.comp.jakarta.lucene.solr.user/76951
19:57 pdurbin huh, it looks like you can do faceting in Lucene: [#LUCENE-3079] Faceting module - ASF JIRA - https://issues.apache.org/jira/browse/LUCENE-3079
19:58 pdurbin via http://stackoverflow.com/questions/8550818/whats-the-difference-between-grouping-and-facet-in-lucene-3-5
20:11 pdurbin man I'm thinking hard about DVN search
20:12 pdurbin here's my ticket about it: https://redmine.hmdc.harvard.edu/issues/2656
20:12 pdurbin basically, I've been investigating Solr as a possible replacement for Lucene
20:13 pdurbin Solr is being considered for two reasons
20:14 pdurbin 1. Solr runs as a standalone service and can be run on a separate server from the glassfish server(s)
20:14 pdurbin (or servers, I guess, if you were to do fancy clustering with Solr)
20:15 pdurbin 2. Solr gives you faceting out of the box
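[Editor's sketch] To make point 2 concrete: because Solr runs as a standalone HTTP service, faceting is requested with query parameters rather than through in-process Lucene calls. The snippet below only builds such a request URL. `facet` and `facet.field` are standard Solr request parameters; the host, the core name `collection1`, and the field names `authorName` and `productionDate` are assumptions made for illustration and are not taken from the DVN schema.

```python
from urllib.parse import urlencode


def build_facet_query_url(base_url, query, facet_fields, rows=10):
    """Build a Solr /select URL that asks for facet counts on the given fields."""
    params = [
        ("q", query),        # the search query itself
        ("rows", rows),      # number of documents to return
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # switch faceting on
    ]
    # Solr accepts facet.field more than once, one entry per faceted field.
    params.extend(("facet.field", field) for field in facet_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical host, core, and field names, for illustration only.
url = build_facet_query_url(
    "http://solr.example.org:8983/solr/collection1",
    "title:election",
    ["authorName", "productionDate"],
)
print(url)
```

Fetching that URL from a running Solr instance would return the matching documents together with per-field facet counts; the sketch stops at building the URL so it runs without a server.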
20:15 pdurbin So I've been playing around with Solr, per updates to that ticket.
20:16 pdurbin And it's nice, it does what it claims to do.
20:16 pdurbin But I think we were hoping it would be something of a drop-in replacement for Lucene in our code.
20:17 pdurbin But it's not.
20:19 pdurbin You can't just create SolrQuery objects ( http://lucene.apache.org/solr/4_1_0/solr-solrj/org/apache/solr/client/solrj/SolrQuery.html ) and pass them to methods that expect Lucene Query objects ( http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/api/core/org/apache/lucene/search/Query.html )
20:20 pdurbin that's what I expected and it was confirmed for me today in the post I mentioned earlier: Re: Compatibility between Lucene Query and Solr SolrQuery in classes SolrJ - http://article.gmane.org/gmane.comp.jakarta.lucene.solr.user/76951
20:21 pdurbin So I had to change what parameter the method expects. getHitIds() specifically: https://github.com/IQSS/dvn/commit/688bbe6 ... and in doing so I've completely changed how search works
20:22 pdurbin And personally, I'm still just getting up to speed with how search works in Lucene and DVN anyway.
20:23 pdurbin So before I go much further, I need to have some sort of test environment in place to make sure search isn't completely changing.
20:34 pdurbin Within the DVN, search comes into play in four places (as far as I understand):
20:35 pdurbin 1. Users looking for studies via basic vs. advanced search at the dataverse level and network level: http://guides.thedata.org/book/search
20:35 pdurbin 2. Dataverse owners creating dynamic collections: http://guides.thedata.org/book/manage-collections
20:35 pdurbin 3. Dataverse admins setting up OAI harvesting sets (which are dynamic): http://guides.thedata.org/book/manage-oai-harvesting-sets
20:35 pdurbin 4. Users looking for studies via the Data Sharing API: http://guides.thedata.org/book/data-sharing-api (put last because it's new and not widely used)
20:36 pdurbin So, the test environment needs to be ready to test all of these.
20:37 pdurbin That is to say, sufficient data in the database for testing and the queries saved for 2 and 3.
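[Editor's sketch] One possible shape for such a test environment is a regression check that runs each saved query against both the current search and the candidate replacement and compares the study IDs that come back. Everything below is hypothetical: `compare_hit_ids`, the query names, and the two stand-in backends are invented for illustration. In practice the two callables would wrap the existing Lucene-backed search and the Solr-backed version of `getHitIds()`.

```python
def compare_hit_ids(saved_queries, old_search, new_search):
    """Run every saved query through both backends and report differing hit IDs.

    saved_queries maps a label to a query string; old_search and new_search
    are callables that take a query string and return an iterable of study IDs.
    """
    mismatches = {}
    for name, query in saved_queries.items():
        old_ids = set(old_search(query))
        new_ids = set(new_search(query))
        if old_ids != new_ids:
            mismatches[name] = {
                "missing": sorted(old_ids - new_ids),     # found before, lost now
                "unexpected": sorted(new_ids - old_ids),  # not found before, found now
            }
    return mismatches


# Stand-in backends with canned results; real ones would query Lucene and Solr.
old_results = {"title:election": [1, 2, 3], "authorName:smith": [4, 5]}
new_results = {"title:election": [1, 2, 3], "authorName:smith": [4, 6]}

saved_queries = {
    "dynamic collection (hypothetical)": "title:election",
    "OAI harvesting set (hypothetical)": "authorName:smith",
}

report = compare_hit_ids(
    saved_queries,
    lambda q: old_results[q],
    lambda q: new_results[q],
)
print(report)
```

An empty report would mean both backends returned the same studies for every saved query. Comparing sets ignores result ordering, which may or may not be acceptable depending on whether ranking is expected to stay the same.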
20:39 pdurbin We could change how the items above work, but we'd need to change our docs and explain things to DVN users and admins.
20:41 pdurbin DVN admins, especially, would want to know if the rules for specifying OAI harvesting sets were to change as it's how DVN admins limit what can be harvested.
20:59 pdurbin So, I think... 2 steps from here, and the second step is a decision point (2a vs. 2b)
21:00 pdurbin 1. set up a test environment
21:00 pdurbin 2a. Keep working on Solr, perhaps changing how search works (!)
21:01 pdurbin 2b. Try upgrading Lucene to a version that supports faceting: https://issues.apache.org/jira/browse/LUCENE-3079
22:12 pdurbin Oh, I meant to mention that this was helpful in thinking about Lucene vs. Solr: http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage/2288211#2288211
