Time
S
Nick
Message
07:02
jri joined #dataverse
07:55
stefankasberger joined #dataverse
09:01
Slava1 joined #dataverse
09:03
Slava2 joined #dataverse
09:54
Slava1 joined #dataverse
09:59
poikilotherm joined #dataverse
10:59
Slava1 joined #dataverse
11:01
jri joined #dataverse
12:55
donsizemore joined #dataverse
13:08
kamil90 joined #dataverse
13:08
kamil90
hello
13:09
poikilotherm
Hello kamil90
13:10
kamil90
We plan to implement an application on top of the Dataverse
13:10
poikilotherm
That sounds nice :-)
13:10
kamil90
that will work as a data grid view, gallery, etc. of specimen collection records
13:11
kamil90
and I would like to ask
13:12
kamil90
what do you think about the performance of querying the Dataverse API in real time vs. caching results via OAI-PMH in a NoSQL or even a relational DB?
13:13
poikilotherm
Disclaimer: I am no core dev member, just a community guy
13:14
poikilotherm
That said: when I played with OAI-PMH for those harvesting timers, I found it to be pretty slow
13:14
poikilotherm
OAI-PMH itself is not a very fast protocol, as known from other sources here (FZJ, Germany, using Invenio for text publications)
13:14
poikilotherm
So most likely, using the API will be faster
13:15
kamil90
Yes, but I consider 2 approaches
13:15
poikilotherm
There are already plenty of implementations for Python, JavaScript, etc. that make it easier to use the API in your project
13:15
kamil90
1. Caching results from time to time via OAI-PMH, so the app will use my own data source
13:16
kamil90
2. Use the Dataverse API for every query
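Approach 1 above can be sketched roughly as follows. This is a minimal illustration, not Dataverse-specific code: it assumes a standard OAI-PMH 2.0 endpoint (the base URL is a placeholder), uses the `ListRecords` verb with the `oai_dc` metadata prefix, and follows `resumptionToken` values until the server stops returning one. The `harvest` function, its `fetch` parameter, and the in-memory dict standing in for a NoSQL or relational store are all hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request URL.

    Per OAI-PMH, resumptionToken is an exclusive argument: once a token
    is used, metadataPrefix must not be sent again.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def harvest(base_url, fetch=None):
    """Harvest all records into a dict keyed by OAI identifier.

    `fetch` is a hypothetical hook (url -> XML text or bytes) so the loop
    can be exercised without a network; the dict is a stand-in for a
    real local store. Deleted-record handling is omitted.
    """
    fetch = fetch or (lambda url: urlopen(url).read())
    cache = {}
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token=token)))
        for record in root.iterfind(".//oai:record", OAI_NS):
            identifier = record.findtext(
                "oai:header/oai:identifier", namespaces=OAI_NS
            )
            cache[identifier] = ET.tostring(record, encoding="unicode")
        # An absent or empty token means the list is complete.
        token = (
            root.findtext(".//oai:resumptionToken", namespaces=OAI_NS) or ""
        ).strip()
        if not token:
            return cache
```

Run on a schedule, a loop like this would refresh the local copy, at the cost of the app showing data that is as old as the last harvest.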
13:16
poikilotherm
Right.
13:16
kamil90
Do you think the API will handle it?
13:17
kamil90
Maybe I will show you which type of app we want to develop on top of the DV
13:17
poikilotherm
If I were to create an app on top of Dataverse, I'd like to avoid caching if not absolutely necessary
13:17
kamil90
https://data.nhm.ac.uk/dataset/collection-specimens/resource/05ff2255-c38a-40c9-b657-4ccb55ab2feb?view_id=203a0ae5-6a14-480a-a407-27eeb9373858
13:17
poikilotherm
At least not implement it myself in my application, but rely on Dataverse handling the load
13:18
kamil90
This one is built on top of CKAN, integrated at a low level, but the data source is NoSQL
13:18
poikilotherm
If channeling through the API is not fast enough, one could still think about using the Solr index directly or think again about caching
13:18
* poikilotherm
goes looking
13:19
poikilotherm
WOW, those are really HUGE datasets
13:19
kamil90
and the data is synchronised into NoSQL, which this NHM app then queries
13:20
kamil90
Yes it is
13:21
kamil90
but we aren't as big as NHM :)
13:21
kamil90
but also not so small in contrast
13:22
kamil90
so that's why I'm asking about performance if we would store hundreds of thousands of specimen collection records
13:22
poikilotherm
Sounds hard to estimate up front
13:23
kamil90
will querying the API directly be good enough, or should we think about synchronizing content into a relational DB or even a NoSQL solution?
13:23
poikilotherm
Maybe pdurbin knows about other instances as big as these
13:24
poikilotherm
Unfortunately IQSS or at least Phil seems to be on holidays
13:24
poikilotherm
Not much happening on Github
13:24
poikilotherm
There are some recent checks about performance
13:24
poikilotherm
Let me try to dig those Github issues out
13:27
poikilotherm
Ok there are https://github.com/IQSS/dataverse/issues/6035 , https://github.com/IQSS/dataverse/issues/5977 and https://github.com/IQSS/dataverse/issues/5824
13:27
poikilotherm
They thought about using JMeter to measure
13:28
poikilotherm
Maybe you can try to estimate if it can handle the load by using tools like JMeter, Gatling, etc?
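As a rough stand-in for the kind of measurement JMeter or Gatling would give, the idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `measure` and `timed_call` are hypothetical helper names, the request counts are arbitrary, and a real load test would need realistic queries and a non-production target.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def timed_call(request):
    """Run one request callable and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    request()
    return time.perf_counter() - start


def measure(request, total=100, concurrency=10):
    """Issue `total` calls with `concurrency` worker threads and summarise latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: timed_call(request), range(total)))
    return {
        "count": len(latencies),
        "median_s": statistics.median(latencies),
        # Nearest-rank style estimate of the 95th percentile.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "max_s": latencies[-1],
    }


# Example use against a test installation (placeholder URL, an assumption):
# stats = measure(
#     lambda: urlopen("https://dataverse.example.org/api/search?q=*&per_page=50").read(),
#     total=200,
#     concurrency=20,
# )
# print(stats)
```

Comparing the median and 95th-percentile latency at a few concurrency levels gives a first impression of whether live API queries would be fast enough for an interactive view.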
13:28
kamil90
I think pdurbin should know much more about that and I hope he will give some advice on possible architecture of new app
13:28
kamil90
but thanks for your support I will try that in the meantime
13:28
poikilotherm
I'm pretty sure he would be very happy about some load tests via API
13:29
poikilotherm
Sure
13:29
kamil90
Unfortunately I don't know yet what the load will be or how many queries to the API I will need to perform to list 50 records in a data grid view or gallery
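For the listing itself, one page of 50 records can in principle come from a single call to the Dataverse Search API, which accepts `q`, `type`, `per_page` and `start` parameters. The sketch below assumes each specimen record is stored as a dataset and that the summary fields in the search result (`name`, `global_id`) are enough for a grid row; thumbnails or full metadata for a gallery may need additional per-record calls. The helper names, the `fetch` hook and the base URL are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query="*", page=0, per_page=50):
    """Build a Search API URL for one page of dataset results."""
    params = {
        "q": query,
        "type": "dataset",
        "per_page": per_page,
        "start": page * per_page,  # offset of the first result on this page
    }
    return base_url.rstrip("/") + "/api/search?" + urlencode(params)


def fetch_page(base_url, query="*", page=0, per_page=50, fetch=None):
    """Return (rows, total_count) for one grid page using a single request.

    `fetch` is a hypothetical hook (url -> JSON text or bytes) so the
    parsing can be tried without a running Dataverse installation.
    """
    fetch = fetch or (lambda url: urlopen(url).read())
    payload = json.loads(fetch(search_url(base_url, query, page, per_page)))
    data = payload["data"]
    rows = [
        {"title": item.get("name"), "pid": item.get("global_id")}
        for item in data["items"]
    ]
    return rows, data["total_count"]
```

With this shape, paging through a grid costs one request per page, and `total_count` tells the UI how many pages exist.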
13:29
poikilotherm
Phil and everyone at IQSS is always fond of hearing user stories
13:30
poikilotherm
Maybe you can reach out via the google group and/or open an issue?
13:30
poikilotherm
So this can be tracked
13:30
poikilotherm
You might be interested in contacting stefankasberger
13:30
poikilotherm
He created the python client
13:30
kamil90
I've recently opened a few new tickets in github, I don't want to bother them once again :)
13:31
poikilotherm
Heh. I think that'll be fine. They pretty much rely on feedback in every channel
13:31
kamil90
Maybe stefankasberger will be aware of performance
13:32
poikilotherm
If you are lucky, he'll see our mentions here :-D
13:32
kamil90
maybe some popup will appear on his desktop :]
13:33
poikilotherm
You could also try reaching out to him via Twitter or in the pyDataverse GitHub project
13:34
kamil90
I found https://github.com/AUSSDA/pyDataverse
13:34
poikilotherm
Aye, that's his project
13:35
kamil90
Thank you for your help poikilotherm
13:35
poikilotherm
Meh. Didn't help much I fear. I do hope pdurbin is around next week ;-)
14:13
donsizemore joined #dataverse
16:55
stefankasberger
sorry @kamil90. Was working hole day on a security policy training (passwords and stuff like this), so was not reading it before. have to go home now, but contact me whenever you want. will be back tomorrow. cheerz
16:57
stefankasberger
*whole
17:51
Sherry joined #dataverse
17:53
Sherry
demo dataverse is "up", but I get Privacy warnings on Chrome: "This server could not prove that it is demo.dataverse.org; its security certificate is from dev2.dataverse.org. This may be caused by a misconfiguration or an attacker intercepting your connection."
17:53
Sherry
When I proceed "unsafely", and then log in (after creating a new account), I can't add any datasets or dataverses.
18:00
donsizemore joined #dataverse
18:05
donsizemore
@Sherry Phil rebuilt it on short notice (and I believe/hope he's off due to his new house!) but he'll be back tomorrow
18:23
Sherry
OK, thanks.
20:42
pdurbin joined #dataverse
20:43
pdurbin
kamil90: your application sounds awesome!