IRC log for #dataverse, 2015-06-26

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
02:11 garnett joined #dataverse
03:39 garnett joined #dataverse
06:46 metamattj joined #dataverse
12:35 pdurbin very interesting table mapping roles to dvobjects: https://github.com/IQSS/dataverse/commit/22c6afaaec3377e78f027cbc33ea24ddd5f9fbc4#commitcomment-11880334
14:01 Fritz joined #dataverse
14:02 Guest63608 Hi Philip Durbin, are you around? This is Fritz again, the guy working on Solr document access control.
14:05 Guest63608 I played around a bit and did some performance tests and noticed that joining doesn't scale very well. I added 10 million dummy docs and noticed that the initial query goes up to 15 sec in some cases, and it seems it's directly related to the number of documents a certain user has access to.
14:06 pdurbin wow, 10 million
14:06 Guest63608 Storing permissions within the document itself keeps the query time always at 250 msec
14:06 pdurbin 15 seconds is crazy slow
14:07 Guest63608 I mean it depends on the use case and whether reindexing the original document takes a long time. I was curious if you found a way around the increasing query time for joins.
14:07 pdurbin so far performance hasn't been an issue. you have 10 million "primary" documents? we have maybe half a million
14:08 Guest63608 There is this nice blog post about joins: https://lucidworks.com/blog/solr-and-joins/
14:09 pdurbin "Well, the take-away is that you really, really should experiment with the join performance in your situation before deciding on it as a solution for all your problems."
14:09 pdurbin interesting
14:14 pdurbin Guest63608: how much granularity do you need in your permissions? is a global public vs. private concept enough? or do you need to only allow certain people or groups to be able to discover documents via search? or (even more complicated) do you need to differentiate why certain people can find docs based on multiple roles they may have?
14:15 pdurbin I simply can't figure out how to do that last one with Solr, as I commented here: https://docs.google.com/document/d/1xv_IKS1hWYUzX3GZOjHqGQ3BUBI7QH_2FekP90cexNA/edit?usp=sharing
14:15 pdurbin I do the middle one. :)
14:18 Guest63608 Will have a look in a bit, a meeting just started.
14:23 pdurbin I'm here till lunchtime. :)
14:23 pdurbin taking the kids camping
14:25 Guest63608 Great! The weather is supposed to get nice again starting this afternoon :)
14:26 pdurbin :)
15:29 axfelix joined #dataverse
15:30 axfelix joined #dataverse
16:41 Guest63608 pdurbin: Our set-up is fairly simple: you either have rights to find a certain data set or not. There is really no fine granularity. Since joining on unique values doesn't seem to scale well, we will probably go for the simplest approach and store access permissions as a multiValued field in Solr. Updating permissions might take a bit longer, but since we search much more often than we change permissions, this trade-off is fine for us. All other permission
16:42 Guest63608 Thanks for your help again and have a nice weekend camping ;)
16:42 Guest63608 left #dataverse
16:58 metamattj joined #dataverse
20:36 axfelix joined #dataverse
22:04 garnett joined #dataverse
23:50 metamattj joined #dataverse
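
Editor's sketch: the two approaches compared in the 14:05 to 16:41 exchange (joining against separate permission documents versus storing permissions on each document as a multiValued field) differ mainly in the filter query sent to Solr. The Python below only builds those filter strings and does not contact a server. The `{!join from=... to=...}` prefix is Solr's join query parser syntax; every field name (`perm_doc_id`, `perm_group`, `allowed_groups`) and every group value is a hypothetical placeholder, not taken from Dataverse's or Guest63608's schema.

```python
from urllib.parse import urlencode


def _or_clause(field, values):
    """Build field:("a" OR "b"), quoting each term and escaping \\ and "."""
    quoted = ['"' + v.replace('\\', '\\\\').replace('"', '\\"') + '"' for v in values]
    return f"{field}:({' OR '.join(quoted)})"


def join_filter(groups, from_field="perm_doc_id", to_field="id",
                group_field="perm_group"):
    """Join approach: match separate permission documents by group, then
    join them to primary documents (hypothetical field names)."""
    return f"{{!join from={from_field} to={to_field}}}{_or_clause(group_field, groups)}"


def inline_filter(groups, field="allowed_groups"):
    """Inline approach: filter directly on a multiValued field stored on each
    primary document (hypothetical field name)."""
    return _or_clause(field, groups)


def select_params(user_query, fq):
    """URL-encode parameters for a Solr select request; the permission
    restriction goes in fq so it stays separate from the user's query."""
    return urlencode({"q": user_query, "fq": fq, "wt": "json"})


if __name__ == "__main__":
    groups = ["staff", "user_42"]  # hypothetical groups for the searching user
    print(join_filter(groups))
    print(inline_filter(groups))
    print(select_params("title:climate", inline_filter(groups)))
```

With the inline form, a permission change means reindexing every affected document, which is the trade-off Guest63608 accepts at 16:41 because searches are far more frequent than permission updates.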
