
IRC log for #dataverse, 2019-06-14

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
03:37 icarito[m] joined #dataverse
09:59 poikilotherm joined #dataverse
10:23 pdurbin I was wrong about a number of things yesterday.
10:26 poikilotherm Good morning Phil :-)
10:32 poikilotherm You sound disappointed
10:33 pdurbin I'm actually happy about my bike lock. It's not broken. My nine year old figured out what happened. She's really good at puzzles. I let her have an extra cookie.
10:34 poikilotherm *thumbs up*
10:37 pdurbin Also, I was so convinced I had found that smoking gun yesterday about the "affiliation" input field that I had poor sekmiller try it for me before I even got out of my rain gear. But my theory was wrong. So now I'm even more confused.
10:41 poikilotherm It is not totally clear if GF 4.1 contains a patched JAR or not.
10:42 poikilotherm Going by the MANIFEST, it should be affected
10:42 poikilotherm But GF 4.1 was released in August 2015
10:42 poikilotherm And fixed libraries were available as of June/July
10:42 poikilotherm So there might have been a backport
10:43 poikilotherm I am also confused about that DataversePage.init() I had been looking at yesterday.
10:44 poikilotherm Maybe my theory was just wrong, but something in there must contain a bug, otherwise it would not break.
10:47 pdurbin I guess I was thinking that your theory could still be right but my way of testing it wasn't a good test. But maybe your theory is wrong. I have no idea. :)
10:50 pdurbin I do know that when I set INTERPRET_EMPTY_STRING_SUBMITTED_VALUES_AS_NULL to false like I did in https://github.com/IQSS/dataverse/pull/5908 a couple of issues were immediately resolved on Payara 5. But I didn't test much and you didn't seem interested in testing that pull request at all. Do you still feel the same way?
10:51 poikilotherm Do you know/like Dr House? Maybe we should try differential diagnosis here.
10:51 pdurbin By a couple of issues I mean 1. no error on the home page and 2. dataverseAdmin user able to log in. But that's where I stopped testing.
10:52 pdurbin I've never heard of Dr House.
10:52 poikilotherm https://en.wikipedia.org/wiki/House_(TV_series)
10:53 pdurbin Oh, I've heard of House but I've never watched it.
10:57 poikilotherm Let me try getting things testable first here, ok?
10:58 poikilotherm I need to create a setup usable for development, with the running database etc. offloaded
10:58 poikilotherm Otherwise my laptop groans under the load and starts killing things
11:10 pdurbin Sounds good. I just left a couple comments on related issues to bring them up to date.
11:23 poikilotherm Oh BTW pdurbin. I chatted with a colleague today about an interesting topic
11:23 poikilotherm (Interesting for Dataverse)
11:23 poikilotherm They are creating https://open-access-monitor.de/#/
11:24 poikilotherm Currently, they are using a PostgreSQL database backend for all those publications, etc.
11:24 poikilotherm Loads of data, ~50 GB
11:25 poikilotherm They are experiencing deep troubles with performance now, as the data is split across 7 or 8 tables, with a large number of joins needed to generate reports
11:25 poikilotherm And they got a huge machine ;-)
11:25 pdurbin Lots of joins can be a killer.
11:26 poikilotherm However, they are experimenting with MongoDB now, as every publication is then just a document with attached metadata
11:26 poikilotherm That reminded me of a few discussions on the usage of Solr
11:26 poikilotherm And how inflexible it is for custom metadata
11:27 pdurbin Well, one can run Solr in a schemaless mode. Perhaps we should look into this.
11:27 poikilotherm Yeah, but it really suffers from performance issues, if I understood Peter Kiraly correctly
11:27 poikilotherm He is eager to come up with something about dynamic schemas
11:28 pdurbin And even if you don't switch modes, if you end fields with _ss or _i or other suffixes the fields will be created dynamically. We do this already for some fields if you look in schema.xml.
11:29 poikilotherm Ok, good to know
11:29 pdurbin There are incredible performance issues when indexing dvobjects and especially their permissions but I believe that Solr is entirely innocent. All the bottlenecks are in the Dataverse code.
11:31 pdurbin At first I thought you were going to say you want Dataverse to switch from postgres to mongo. :)
11:31 poikilotherm Nope, I was just wondering about Metadata, Indexing and searches
11:41 pdurbin There's an API endpoint intended to help authors of custom metadata blocks adjust their Solr schema. I wrote about it at http://guides.dataverse.org/en/4.14/admin/metadatacustomization.html#updating-the-solr-schema
11:44 pdurbin Should custom metadata blocks be based on a standard? If you know of one, please reply to https://twitter.com/philipdurbin/status/1138796295756406784
11:46 donsizemore joined #dataverse
11:48 donsizemore @pdurbin morning. on solr and performance: Odum's Dataverse runs in a self-contained VM (for VMware HA). the only time i've seen the system load spike was while sophia was teaching a webinar and all the folks playing along at home simultaneously published. the CPU spikes are in solr, at least for us
12:19 pdurbin donsizemore: interesting. That reminds me. I should show everyone some thoughts on monitoring.
12:19 pdurbin Thoughts on monitoring: https://github.com/IQSS/dataverse.harvard.edu/issues/18
12:19 pdurbin What do you all think?
12:21 dzho chaos monkey when
12:21 dzho ;-)
12:24 pdurbin Heh. It would be nice to try some chaos monkey style testing. The other day I listened to a podcast about chaos engineering at https://thenewstack.io/the-new-stack-context-monitorama-2019/
12:25 dzho this is all strictly do-as-I-say and tongue-in-cheek for me, tbh, since I don't even have continuous automated monitoring of my personal infrastructure.
12:26 dzho which makes the reference to munin in the above link a helpful stir to the to-do pile.
12:28 pdurbin dzho: I've used Munin on Ubuntu and CentOS. It's pretty easy to install. This is what I wrote about it: http://guides.dataverse.org/en/4.14/admin/monitoring.html#munin
12:28 * dzho nods
12:29 pdurbin I don't bother monitoring my home server because I notice soon enough if I can't connect to my IRC client. :)
12:32 dzho lol, yeah. IRC is my de facto personal infrastructure monitor
12:34 dzho to the extent that, with this house move, which (I hope!) is finally approaching its end, I've maintained an ISP connection in each place and at least a small ARM SBC (Raspberry Pi-class machines, in other words) running as a crude monitor.
12:34 dzho So, at first, when renovations were ongoing in the new house before we moved in, then later as we asymptotically approached moving our stuff out of the old place.
12:51 pdurbin :)
13:34 pdurbin_m joined #dataverse
13:35 pdurbin_m dzho: have you seen https://www3.nd.edu/~pbui/teaching/cse.40842.sp19/ ?
14:25 pdurbin 'CSE 40842 is a Computer Science and Engineering elective course at the University of Notre Dame that explores the idea of a "hacker" and the practice of participating in the open source "bazaar".'
15:26 pdurbin donsizemore: do you think I should try to summon pameyer to help us troubleshoot docker-aio?
17:11 donsizemore i'm all ears on those tests failing inconsistently
17:12 donsizemore10 joined #dataverse
17:13 pdurbin Actually, can I show you something else quick?
17:17 bjonnh pdurbin: for testing docker containers I always do that in VMs
17:17 bjonnh that I refresh regularly
17:18 bjonnh just to make sure there are no leftovers, that docker is the latest release, etc.
17:21 pdurbin bjonnh: what kind of VMs? KVM, VMWare, something else?
17:21 andrewSC joined #dataverse
17:26 andrewSC joined #dataverse
17:29 bjonnh qemu
17:29 bjonnh qemu/kvm
17:29 bjonnh that's the easiest solution I found
17:29 bjonnh using libvirt
17:29 bjonnh so I can manage my hosts from the same machine, it is light
17:34 donsizemore10 @pdurbin craig is here; we're comparing co-ray-ray and whole tale =)
17:36 jri joined #dataverse
17:38 bjonnh pdurbin: but anything should work really… Just make snapshots of the VM, do a test, restore VM. I've always had surprises with leftover things
17:53 pdurbin bjonnh: I did a lot of qemu/kvm at my last job
17:53 pdurbin donsizemore10: hi Craig!
18:00 pdurbin bjonnh: I'm not sure I want to introduce another layer. I think CentOS and Docker should be enough without KVM in between. I was actually thinking that maybe we should just restart Docker after every run or something. Or restart Jenkins. Or restart the Jenkins server. :)
18:02 donsizemore10 i can make jenkins blow away everything docker-related
18:04 pdurbin might be worth a shot
18:04 Paul_Dante joined #dataverse
18:04 pdurbin but does that mean I should get out of /tmp on your server?
18:05 pdurbin oh hey Paul_Dante
18:05 Paul_Dante Hi Phil
18:05 pdurbin Paul_Dante: ready to talk about https://github.com/IQSS/dataverse/issues/5730 ? :)
18:05 donsizemore10 @pdurbin the webhook listener is disabled so you can go to town
18:05 Paul_Dante yup
18:05 pdurbin donsizemore10: one moment please
18:06 bjonnh pdurbin: docker-compose allows you to keep things clean as well
18:06 bjonnh pdurbin: docker-compose down and everything goes away
18:07 bjonnh it creates networks, volumes, containers as needed, allows updates, and when you bring it down, it removes everything (named volumes only go away if you add -v)
18:07 bjonnh (and it is integrated with docker)
18:08 pdurbin bjonnh: awesome but lemme chat with Paul_Dante about metadata exports quick
18:09 pdurbin Paul_Dante: so we're both running curl http://localhost:8080/api/admin/metadata/reExportAll
18:09 pdurbin And it works like I expect.
18:09 Paul_Dante I was under the impression that localhost:8080/api/admin/metadata/exportAll would export all of the metadata records from the DV; am I misunderstanding what that endpoint does?
18:10 pdurbin That sounds like a nice feature but no, that's not what it does. :)
18:10 Paul_Dante That would explain why it isn't doing what I expected :)
18:11 pdurbin If I go into /usr/local/glassfish4/glassfish/domains/domain1/files, delete all the cached metadata export files like export_ddi.cached and friends, and then run that reExportAll, the export files will be regenerated.
18:11 pdurbin Paul_Dante: but what you want sounds like a nice feature. You already wrote some code for this?
18:13 Paul_Dante Sort of. I wrote a multi-step process to do that. 1) Search for all records 2) Grab the DOIs out of those search results 3) Iterate through all those DOIs to use for a download of each specific record.
18:13 pdurbin Yeah, that sounds like what I'd do. :)
18:14 pdurbin What language are you using?
18:14 Paul_Dante Java
18:14 pdurbin Interesting. Are you using https://github.com/IQSS/dataverse-client-java ?
18:16 Paul_Dante No, I've just been running Java-generated commandline calls.
18:16 pdurbin gotcha
18:17 pdurbin So do you want to create a new issue for the feature you want? Or are you good with your solution?
18:18 Paul_Dante That client would allow me to more cleanly make calls to DV within my java program?
18:18 Paul_Dante I'm content with my current solution.
18:21 pdurbin Paul_Dante: probably more cleanly, yes, but I've never used it. It's used in production by RSpace, who wrote it when they integrated with Dataverse. Should we link to your current solution from http://guides.dataverse.org/en/4.14/api/apps.html#java ? If it's open source, that is.
18:25 Paul_Dante My code is all open source, so that won't be a problem, but my current solution is scattered across my code at the moment, so it wouldn't be very helpful to other people yet. If I find some free time I'll write a more succinct, readable version.
18:26 pdurbin Paul_Dante: cool, if you ever want to edit that page, here's how it looks as of 4.14: https://github.com/IQSS/dataverse/blob/v4.14/doc/sphinx-guides/source/api/apps.rst
18:28 Paul_Dante :thumbs up:
18:28 Paul_Dante Thanks for the chat.
18:29 pdurbin Sure! Thanks for stopping by! Do you want to close your issue? Or help us with the docs? :)
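The three-step process Paul_Dante described (search for all records, pull the DOIs out of the results, then download each record) might look roughly like the sketch below. This is not his Java code but a minimal Python illustration under stated assumptions: a Dataverse installation at http://localhost:8080, the Search API at /api/search (reading data.items[].global_id and data.total_count from its JSON), and the dataset export API at /api/datasets/export with exporter and persistentId parameters. The helper names and the choice of the ddi exporter are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Dataverse installation, as in the curl example in the chat.
BASE_URL = "http://localhost:8080"


def search_url(base_url, start=0, per_page=100):
    """Step 1 (hypothetical helper): a Search API URL for one page of datasets."""
    query = urlencode({"q": "*", "type": "dataset", "start": start, "per_page": per_page})
    return f"{base_url}/api/search?{query}"


def extract_global_ids(search_response):
    """Step 2 (hypothetical helper): pull persistent IDs (DOIs) out of one page of results."""
    return [item["global_id"] for item in search_response["data"]["items"]]


def export_url(base_url, persistent_id, exporter="ddi"):
    """Step 3 (hypothetical helper): the metadata export URL for one dataset."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


def export_all(base_url=BASE_URL, exporter="ddi", per_page=100):
    """Page through search results and download each dataset's metadata export."""
    exports = {}
    start = 0
    while True:
        with urlopen(search_url(base_url, start, per_page)) as resp:
            page = json.load(resp)
        ids = extract_global_ids(page)
        for pid in ids:
            with urlopen(export_url(base_url, pid, exporter)) as resp:
                exports[pid] = resp.read().decode("utf-8")
        start += len(ids)
        # Stop on an empty page or once every search hit has been fetched.
        if not ids or start >= page["data"]["total_count"]:
            break
    return exports
```

Calling export_all() would return a dict mapping each persistent ID to its exported metadata. Without an API token the search only sees published datasets, and a real tool would want error handling for datasets whose export fails.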
18:36 pdurbin donsizemore10 bjonnh I'm talking to Pete a bit on Slack about my docker-aio on CentOS woes. I wonder if I have the same problem on Mac. I guess I should confirm this.
20:23 pdurbin donsizemore10: so, I'm being careful to do a docker rm before every run and I've had three successes in a row.
20:46 pdurbin ok, I'm out of here
20:46 pdurbin Have a good weekend, everyone!
20:47 pdurbin left #dataverse
22:04 pdurbin joined #dataverse
22:05 pdurbin p.s. Dataverse 4.15 is out: https://github.com/IQSS/dataverse/releases/tag/v4.15 :)
22:05 pdurbin left #dataverse
22:42 jri joined #dataverse

