IRC log for #dataverse, 2019-10-15

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
07:02 jri joined #dataverse
07:55 stefankasberger joined #dataverse
09:01 Slava1 joined #dataverse
09:03 Slava2 joined #dataverse
09:54 Slava1 joined #dataverse
09:59 poikilotherm joined #dataverse
10:59 Slava1 joined #dataverse
11:01 jri joined #dataverse
12:55 donsizemore joined #dataverse
13:08 kamil90 joined #dataverse
13:08 kamil90 hello
13:09 poikilotherm Hello kamil90
13:10 kamil90 We plan to implement an application on top of Dataverse
13:10 poikilotherm That sounds nice :-)
13:10 kamil90 that will work as a datagrid view, gallery etc of specimen collection records
13:11 kamil90 and i would ask
13:12 kamil90 what do you think about the performance of querying the Dataverse API in real time vs. caching results via OAI-PMH in a NoSQL or even a relational DB?
13:13 poikilotherm Disclaimer: I am no core dev member, just a community guy
13:14 poikilotherm That said: when I played with OAI-PMH for those harvesting timers, I found it to be pretty slow
13:14 poikilotherm OAI-PMH itself is not a very fast protocol, as known from other sources here (FZJ, Germany, using Invenio for text publications)
13:14 poikilotherm So most likely, using the API will be faster
13:15 kamil90 Yes, but I consider 2 approaches
13:15 poikilotherm There are already plenty of implementations for python, javascript etc etc that make it easier to use the API in your project
13:15 kamil90 1. Caching results from time to time via OAI-PMH, so the app will use my own data source
13:16 kamil90 2. Use the Dataverse API in every query
13:16 poikilotherm Right.
13:16 kamil90 Do you think the API will handle it?
13:17 kamil90 Maybe I will show you which type of app we want to develop on top of the DV
13:17 poikilotherm If I were to create an app on top of Dataverse, I'd like to avoid caching if not absolutely necessary
13:17 kamil90 https://data.nhm.ac.uk/dataset/collection-specimens/resource/05ff2255-c38a-40c9-b657-4ccb55ab2feb?view_id=203a0ae5-6a14-480a-a407-27eeb9373858
13:17 poikilotherm At least not implement it myself in my application, but rely on Dataverse handling the load
13:18 kamil90 This one is built on top of CKAN, integrated at a low level, but the data source is NoSQL
13:18 poikilotherm If channeling through the API is not fast enough, one could still think about using the Solr index directly or think again about caching
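A middle ground between the two approaches discussed above is a small time-limited cache in front of the API, so repeated identical queries do not hit Dataverse every time. The following is only a minimal sketch: the class name `TTLCache`, the `fetch` callable and the 300-second default are illustrative assumptions, not part of Dataverse or any client library.

```python
import time


class TTLCache:
    """Cache results of fetch(key) for ttl_seconds (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # callable that performs the real API query
        self._ttl = ttl_seconds      # how long a cached answer stays valid
        self._clock = clock          # injectable clock, handy for testing
        self._entries = {}           # key -> (timestamp, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: skip the API call
        value = self._fetch(key)     # missing or stale: query the backend
        self._entries[key] = (now, value)
        return value
```

Whether this is worth the added staleness depends on how often the specimen records change, which only measurements against the real instance can answer.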
13:18 * poikilotherm goes looking
13:19 poikilotherm WOW, those are really HUGE datasets
13:19 kamil90 and the data is synchronised into NoSQL, which this NHM app then queries
13:20 kamil90 Yes it is
13:21 kamil90 but we aren't as big as NHM :)
13:21 kamil90 but also not so small in contrast
13:22 kamil90 so that's why I'm asking about performance if we would store hundreds of thousands of specimen collection records
13:22 poikilotherm Sounds like a hard guess to estimate up front
13:23 kamil90 will querying the API directly be good enough, or should we think about synchronizing content into a relational DB or even a NoSQL solution?
13:23 poikilotherm Maybe pdurbin knows about other instances as big as these
13:24 poikilotherm Unfortunately IQSS or at least Phil seems to be on holidays
13:24 poikilotherm Not much happening on Github
13:24 poikilotherm There are some recent checks about performance
13:24 poikilotherm Let me try to dig those Github issues out
13:27 poikilotherm Ok there are https://github.com/IQSS/dataverse/issues/6035, https://github.com/IQSS/dataverse/issues/5977 and https://github.com/IQSS/dataverse/issues/5824
13:27 poikilotherm They thought about using JMeter to measure
13:28 poikilotherm Maybe you can try to estimate if it can handle the load by using tools like JMeter, Gatling, etc?
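Before setting up JMeter or Gatling, a rough first impression of latency under a few concurrent callers can be had with a few lines of Python. This is a sketch under stated assumptions, not a substitute for a real load-testing tool: `measure` and `timed_call` are hypothetical helper names, and `fn` stands for whatever function performs one API request.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fn):
    """Return the wall-clock seconds one call to fn() takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def measure(fn, total_calls=20, workers=4):
    """Run fn total_calls times across a thread pool and summarise latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(lambda _: timed_call(fn), range(total_calls)))
    return {
        "calls": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": latencies[-1],
    }
```

Passing a function that issues one search request as `fn` gives a first estimate; numbers from a test instance will not necessarily carry over to production hardware.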
13:28 kamil90 I think pdurbin should know much more about that and I hope he will give some advice on possible architecture of new app
13:28 kamil90 but thanks for your support I will try that in the meantime
13:28 poikilotherm I'm pretty sure he would be very happy about some load tests via API
13:29 poikilotherm Sure
13:29 kamil90 Unfortunately I don't know yet what the load will be and how many queries to the API I will need to perform to list 50 records in a datagrid view or in a gallery
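As a rough orientation for the "50 records" question: the Dataverse Search API (`/api/search`) accepts `q`, `type`, `start` and `per_page` parameters, so one page of 50 dataset hits can in principle be fetched with a single request. The sketch below assumes that endpoint is enabled on the target instance; the function names and the base URL passed in are illustrative, and a gallery that needs per-file details or thumbnails may require additional calls per record.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query="*", page=0, per_page=50, item_type="dataset"):
    """Build a Search API URL for one page of results (page is zero-based)."""
    params = {
        "q": query,
        "type": item_type,
        "start": page * per_page,   # offset of the first hit on this page
        "per_page": per_page,
    }
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


def requests_needed(total_records, per_page=50):
    """Number of search requests needed to page through total_records."""
    return -(-total_records // per_page)  # ceiling division


def fetch_page(base_url, **kwargs):
    """Fetch one page and return its list of items (performs a network call)."""
    with urlopen(search_url(base_url, **kwargs)) as response:
        return json.load(response)["data"]["items"]
```

For example, `search_url("https://demo.dataverse.org", page=2)` addresses hits 100 to 149 in one request.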
13:29 poikilotherm Phil and everyone at IQSS is always fond of hearing user stories
13:30 poikilotherm Maybe you can reach out via the google group and/or open an issue?
13:30 poikilotherm So this can be tracked
13:30 poikilotherm You might be interested in contacting stefankasberger
13:30 poikilotherm He created the python client
13:30 kamil90 I've recently opened a few new tickets in github, I don't want to bother them once again :)
13:31 poikilotherm Heh. I think that'll be fine. They pretty much rely on feedback in every channel
13:31 kamil90 Maybe stefankasberger will be aware of performance
13:32 poikilotherm When you are lucky he sees our mentions here :-D
13:32 kamil90 maybe some popup appear on his desktop :]
13:33 poikilotherm You could also try reaching out to him via Twitter or in the pyDataverse GitHub project
13:34 kamil90 I found https://github.com/AUSSDA/pyDataverse
13:34 poikilotherm Aye, that's his project
13:35 kamil90 Thank you for your help poikilotherm
13:35 poikilotherm Meh. Didn't help much I fear. I do hope pdurbin is around next week ;-)
14:13 donsizemore joined #dataverse
16:55 stefankasberger sorry @kamil90. Was working hole day on a security policy training (passwords and stuff like this), so was not reading it before. have to go home now, but contact me whenever you want. will be back tomorrow. cheerz
16:57 stefankasberger *whole
17:51 Sherry joined #dataverse
17:53 Sherry demo dataverse is "up", but I get Privacy warnings on Chrome: "This server could not prove that it is demo.dataverse.org; its security certificate is from dev2.dataverse.org. This may be caused by a misconfiguration or an attacker intercepting your connection."
17:53 Sherry When I proceed "unsafely", and then log in (after creating a new account), I can't add any datasets or dataverses.
18:00 donsizemore joined #dataverse
18:05 donsizemore @Sherry Phil rebuilt it on short notice (and I believe/hope he's off due to his new house!) but he'll be back tomorrow
18:23 Sherry OK, thanks.
20:42 pdurbin joined #dataverse
20:43 pdurbin kamil90: your application sounds awesome!
