Time |
S |
Nick |
Message |
07:52 |
|
|
sbmarks joined #dvn |
07:52 |
|
|
ruebot joined #dvn |
07:52 |
|
|
pdurbin joined #dvn |
18:38 |
|
pdurbin |
got a nice answer at Re: Compatibility between Lucene Query and Solr SolrQuery in classes SolrJ - http://article.gmane.org/gmane.comp.jakarta.lucene.solr.user/76951 |
19:57 |
|
pdurbin |
huh, it looks like you can do faceting in Lucene: [#LUCENE-3079] Faceting module - ASF JIRA - https://issues.apache.org/jira/browse/LUCENE-3079 |
19:58 |
|
pdurbin |
via http://stackoverflow.com/questions/8550818/whats-the-difference-between-grouping-and-facet-in-lucene-3-5 |
20:11 |
|
pdurbin |
man I'm thinking hard about DVN search |
20:12 |
|
pdurbin |
here's my ticket about it: https://redmine.hmdc.harvard.edu/issues/2656 |
20:12 |
|
pdurbin |
basically, I've been investigating Solr as a possible replacement for Lucene |
20:13 |
|
pdurbin |
Solr is being considered for two reasons |
20:14 |
|
pdurbin |
1. Solr runs as a standalone service and can be run on a separate server from the glassfish server(s) |
20:14 |
|
pdurbin |
(or servers, I guess, if you were to do fancy clustering with Solr) |
20:15 |
|
pdurbin |
2. Solr gives you faceting out of the box |
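[editor's sketch] The two reasons above can be illustrated together: because Solr runs as a standalone service, a faceted search is just an HTTP request to its select handler with `facet=true` and one or more `facet.field` parameters. The following is a minimal sketch, not DVN code. The class name `SolrFacetRequest`, the field name `authorName`, and any host you would prepend are assumptions made for illustration; only the request parameters `q`, `facet`, `facet.field` and `wt` are standard Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the query string for a faceted Solr search.
final class SolrFacetRequest {

    // Returns something like "q=...&facet=true&facet.field=...&wt=json",
    // to be appended to a Solr select URL (host and core are up to you).
    static String buildQueryString(String userQuery, List<String> facetFields) {
        List<String> params = new ArrayList<>();
        params.add("q=" + encode(userQuery));
        params.add("facet=true");                      // turn faceting on
        for (String field : facetFields) {
            params.add("facet.field=" + encode(field)); // one entry per facet
        }
        params.add("wt=json");                         // ask for a JSON response
        return String.join("&", params);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the sketch is that the query is plain text sent over the wire, so the search service does not have to live in the same JVM as the application.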
20:15 |
|
pdurbin |
So I've been playing around with Solr, per updates to that ticket. |
20:16 |
|
pdurbin |
And it's nice, it does what it claims to do. |
20:16 |
|
pdurbin |
But I think we were hoping it would be something of a drop-in replacement for Lucene in our code.
20:17 |
|
pdurbin |
But it's not. |
20:19 |
|
pdurbin |
You can't just create SolrQuery objects ( http://lucene.apache.org/solr/4_1_0/solr-solrj/org/apache/solr/client/solrj/SolrQuery.html ) and pass them to methods that expect Lucene Query objects ( http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/api/core/org/apache/lucene/search/Query.html ) |
20:20 |
|
pdurbin |
that's what I expected and it was confirmed for me today in the post I mentioned earlier: Re: Compatibility between Lucene Query and Solr SolrQuery in classes SolrJ - http://article.gmane.org/gmane.comp.jakarta.lucene.solr.user/76951 |
20:21 |
|
pdurbin |
So I had to change what parameter the method expects. getHitIds() specifically: https://github.com/IQSS/dvn/commit/688bbe6 ... and in doing so I've completely changed how search works |
20:22 |
|
pdurbin |
And personally, I'm still just getting up to speed with how search works in Lucene and DVN anyway. |
20:23 |
|
pdurbin |
So before I go much further, I need to have some sort of test environment in place to make sure search isn't completely changing.
20:34 |
|
pdurbin |
Within the DVN, search comes into play in four places (as far as I understand): |
20:35 |
|
pdurbin |
1. Users looking for studies via basic vs. advanced search at the dataverse level and network level: http://guides.thedata.org/book/search |
20:35 |
|
pdurbin |
2. Dataverse owners creating dynamic collections: http://guides.thedata.org/book/manage-collections |
20:35 |
|
pdurbin |
3. Dataverse admins setting up OAI harvesting sets (which are dynamic): http://guides.thedata.org/book/manage-oai-harvesting-sets
20:35 |
|
pdurbin |
4. Users looking for studies via the Data Sharing API: http://guides.thedata.org/book/data-sharing-api (put last because it's new and not widely used) |
20:36 |
|
pdurbin |
So, the test environment needs to be ready to test all of these. |
20:37 |
|
pdurbin |
That is to say, it needs sufficient data in the database for testing, plus the saved queries for 2 and 3.
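[editor's sketch] One way to check that search "isn't completely changing" is to run each saved query through both the old and the new search code and compare the study IDs that come back. The following is a minimal sketch under stated assumptions: `SearchBackend`, `hitIds` and `SearchRegression` are hypothetical names, not DVN classes, and the real getHitIds() method may have a different signature.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical abstraction over "the Lucene search" and "the Solr search".
interface SearchBackend {
    Set<Long> hitIds(String query);
}

final class SearchRegression {

    // For every saved query whose results differ between the two backends,
    // returns the IDs found by exactly one of them (the symmetric difference).
    // Queries with identical results are left out of the map.
    static Map<String, Set<Long>> diff(SearchBackend oldBackend,
                                       SearchBackend newBackend,
                                       List<String> savedQueries) {
        Map<String, Set<Long>> mismatches = new LinkedHashMap<>();
        for (String query : savedQueries) {
            Set<Long> expected = oldBackend.hitIds(query);
            Set<Long> actual = newBackend.hitIds(query);
            if (!expected.equals(actual)) {
                Set<Long> onlyInOne = new TreeSet<>(expected);
                onlyInOne.addAll(actual);          // union of both result sets
                Set<Long> inBoth = new TreeSet<>(expected);
                inBoth.retainAll(actual);          // intersection
                onlyInOne.removeAll(inBoth);       // union minus intersection
                mismatches.put(query, onlyInOne);
            }
        }
        return mismatches;
    }
}
```

An empty map would mean every saved query (for example, the ones behind dynamic collections and OAI harvesting sets) returns the same studies before and after the change. This compares sets of IDs only; it says nothing about ranking order.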
20:39 |
|
pdurbin |
We could change how the items above work, but we'd need to change our docs and explain things to DVN users and admins. |
20:41 |
|
pdurbin |
DVN admins, especially, would want to know if the rules for specifying OAI harvesting sets were to change, since those rules are how DVN admins limit what can be harvested.
20:59 |
|
pdurbin |
So, I think... 2 steps from here, and the second step is a decision point (2a vs. 2b) |
21:00 |
|
pdurbin |
1. set up a test environment |
21:00 |
|
pdurbin |
2a. Keep working on Solr, perhaps changing how search works (!) |
21:01 |
|
pdurbin |
2b. Try upgrading Lucene to a version that supports faceting: https://issues.apache.org/jira/browse/LUCENE-3079 |
22:12 |
|
pdurbin |
Oh, I meant to mention that this was helpful in thinking about Lucene vs. Solr: http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage/2288211#2288211 |