Time
S
Nick
Message
03:37
icarito[m] joined #dataverse
09:59
poikilotherm joined #dataverse
10:23
pdurbin
I was wrong about a number of things yesterday.
10:26
poikilotherm
Good morning Phil :-)
10:32
poikilotherm
You sound disappointed
10:33
pdurbin
I'm actually happy about my bike lock. It's not broken. My nine year old figured out what happened. She's really good at puzzles. I let her have an extra cookie.
10:34
poikilotherm
*thumbs up*
10:37
pdurbin
Also, I was so convinced I had found that smoking gun yesterday about the "affiliation" input field that I had poor sekmiller try it for me before I even got out of my rain gear. But my theory was wrong. So now I'm even more confused.
10:41
poikilotherm
It is not totally clear if GF 4.1 contains a patched JAR or not.
10:42
poikilotherm
Believing in the MANIFEST, it should be affected
10:42
poikilotherm
But GF 4.1 was released in August 2015
10:42
poikilotherm
And fixed libraries were available as of June/July
10:42
poikilotherm
So there might have been a backport
10:43
poikilotherm
I am also confused about that DataversePage.init() I had been looking at yesterday.
10:44
poikilotherm
Maybe my theory was just wrong, but something must be in there containing a bug, otherwise it should not break.
10:47
pdurbin
I guess I was thinking that your theory could still be right but my way of trying to test it wasn't a good test. But maybe your theory is wrong. I have no idea. :)
10:50
pdurbin
I do know that when I set INTERPRET_EMPTY_STRING_SUBMITTED_VALUES_AS_NULL to false like I did in https://github.com/IQSS/dataverse/pull/5908 that a couple issues were immediately resolved on Payara 5. But I didn't test much and you didn't seem interested in testing that pull request at all. Do you still feel the same way?
10:51
poikilotherm
Do you know/like Dr House? Maybe we should try differential diagnosis here.
10:51
pdurbin
By a couple issues I mean 1. no error on the home page and 2. dataverseAdmin user able to log in. But that's where I stopped testing.
10:52
pdurbin
I've never heard of Dr House.
10:52
poikilotherm
https://en.wikipedia.org/wiki/House_(TV_series)
10:53
pdurbin
Oh, I've heard of House but I've never watched it.
10:57
poikilotherm
Let me try getting things testable first here, ok?
10:58
poikilotherm
I need to create a setup useable for development, involving running database etc offloaded
10:58
poikilotherm
My laptop is aching from the load otherwise and killing things
11:10
pdurbin
Sounds good. I just left a couple comments on related issues to bring them up to date.
11:23
poikilotherm
Oh BTW pdurbin. I chatted with a colleague today about an interesting topic
11:23
poikilotherm
(Interesting for Dataverse)
11:23
poikilotherm
They are creating https://open-access-monitor.de/#/
11:24
poikilotherm
Currently, they are using a PostgreSQL database backend for all those publications, etc.
11:24
poikilotherm
Loads of data, ~50 GB
11:25
poikilotherm
They are experiencing deep troubles with performance now, as the data is split across 7 or 8 tables, with a large number of joins to generate reports
11:25
poikilotherm
And they got a huge machine ;-)
11:25
pdurbin
Lots of joins can be a killer.
11:26
poikilotherm
However, they are experimenting with MongoDB now, as every publication is just a document then, with attached metadata
11:26
poikilotherm
That reminded me of a few discussions on the usage of Solr
11:26
poikilotherm
And how inflexible it is for custom metadata
11:27
pdurbin
Well, one can run Solr in a schemaless mode. Perhaps we should look into this.
11:27
poikilotherm
Yeah, but it really suffers from performance issues, if I understood Peter Kiraly correctly
11:27
poikilotherm
He is eager to come up with sth. about dynamic schemas
11:28
pdurbin
And even if you don't switch modes, if you end fields with _ss or _i or other suffixes the fields will be created dynamically. We do this already for some fields if you look in schema.xml.
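[Editor's sketch] The suffix convention pdurbin describes can be illustrated with a small helper that picks a field name by value type. This is only a sketch: it assumes the Solr schema defines dynamicField patterns for `*_i`, `*_s`, and `*_ss` (the chat mentions `_ss` and `_i`; `_s` is an assumption), and the helper names `dynamic_field_name` and `to_solr_doc` are hypothetical, not part of Dataverse or Solr.

```python
def dynamic_field_name(base, value):
    """Return a field name whose suffix should match a dynamicField pattern.

    Assumed patterns: *_i (integer), *_s (single string), *_ss (multi-valued strings).
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("no suffix chosen for booleans in this sketch")
    if isinstance(value, int):
        return base + "_i"
    if isinstance(value, (list, tuple)):
        return base + "_ss"
    if isinstance(value, str):
        return base + "_s"
    raise TypeError(f"no suffix chosen for {type(value).__name__}")


def to_solr_doc(doc_id, custom_metadata):
    """Build a dict shaped like a Solr document, renaming custom fields by suffix."""
    doc = {"id": doc_id}
    for key, value in custom_metadata.items():
        doc[dynamic_field_name(key, value)] = value
    return doc
```

With this approach a custom metadata field never needs its own entry in schema.xml, at the cost of field names that carry their type in the suffix.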
11:29
poikilotherm
Ok, good to know
11:29
pdurbin
There are incredible performance issues when indexing dvobjects and especially their permissions but I believe that Solr is entirely innocent. All the bottlenecks are in the Dataverse code.
11:31
pdurbin
At first I thought you were going to say you want Dataverse to switch from postgres to mongo. :)
11:31
poikilotherm
Nope, I was just wondering about Metadata, Indexing and searches
11:41
pdurbin
There's an API endpoint intended to help authors of custom metadata blocks adjust their Solr schema. I wrote about it at http://guides.dataverse.org/en/4.14/admin/metadatacustomization.html#updating-the-solr-schema
11:44
pdurbin
Should custom metadata blocks be based on a standard? If you know of one, please reply to https://twitter.com/philipdurbin/status/1138796295756406784
11:46
donsizemore joined #dataverse
11:48
donsizemore
@pdurbin morning. on solr and performance: Odum's Dataverse runs in a self-contained VM (for VMware HA). the only time i've seen the system load spike was while sophia was teaching a webinar and all the folks playing along at home simultaneously published. the CPU spikes are in solr, at least for us
12:19
pdurbin
donsizemore: interesting. That reminds me. I should show everyone some thoughts on monitoring.
12:19
pdurbin
Thoughts on monitoring: https://github.com/IQSS/dataverse.harvard.edu/issues/18
12:19
pdurbin
What do you all think?
12:21
dzho
chaos monkey when
12:21
dzho
;-)
12:24
pdurbin
Heh. It would be nice to try some chaos monkey style testing. The other day I listened to a podcast about chaos engineering at https://thenewstack.io/the-new-stack-context-monitorama-2019/
12:25
dzho
this is all strictly do-as-I-say, tongue-in-cheek for me, tbh though, as I don't have continuous automated monitoring of my personal infrastructure even.
12:26
dzho
which makes the reference to munin in the above link a helpful stir to the to-do pile.
12:28
pdurbin
dzho: I've used Munin on Ubuntu and CentOS. It's pretty easy to install. This is what I wrote about it: http://guides.dataverse.org/en/4.14/admin/monitoring.html#munin
12:28
* dzho
nods
12:29
pdurbin
I don't bother monitoring my home server because I notice soon enough if I can't connect to my IRC client. :)
12:32
dzho
lol, yeah. IRC is my de facto personal infrastructure monitor
12:34
dzho
to the extent that, with this house move, towards which (I hope!) we are finally approaching an end, I've maintained an ISP connection in each place and at least a small ARM SBC (Raspberry Pi-class machines, in other words) running as a crude monitor.
12:34
dzho
So, at first, when renovations were ongoing in the new house before we moved in, then later as we asymptotically approached moving our stuff out of the old place.
12:51
pdurbin
:)
13:34
pdurbin_m joined #dataverse
13:35
pdurbin_m
dzho: have you seen https://www3.nd.edu/~pbui/teaching/cse.40842.sp19/ ?
14:25
pdurbin
'CSE 40842 is a Computer Science and Engineering elective course at the University of Notre Dame that explores the idea of a "hacker" and the practice of participating in the open source "bazaar".'
15:26
pdurbin
donsizemore: do you think I should try to summon pameyer to help us troubleshoot docker-aio?
17:11
donsizemore
i'm all ears on those tests failing inconsistently
17:12
donsizemore10 joined #dataverse
17:13
pdurbin
Actually, can I show you something else quick?
17:17
bjonnh
pdurbin: for testing docker containers I always do that in VMs
17:17
bjonnh
that I refresh regularly
17:18
bjonnh
just to make sure there are not any left over, that docker is the last release etc
17:21
pdurbin
bjonnh: what kind of VMs? KVM, VMWare, something else?
17:21
andrewSC joined #dataverse
17:26
andrewSC joined #dataverse
17:29
bjonnh
qemu
17:29
bjonnh
qemu/kvm
17:29
bjonnh
that's the easiest solution I found
17:29
bjonnh
using libvirt
17:29
bjonnh
so I can manage my hosts from the same machine, it is light
17:34
donsizemore10
@pdurbin craig is here; we're comparing co-ray-ray and whole tale =)
17:36
jri joined #dataverse
17:38
bjonnh
pdurbin: but anything should work really… Just make snapshots of the VM , do a test, restore VM. I've always had surprises with leftover things
17:53
pdurbin
bjonnh: I did a lot of qemu/kvm at my last job
17:53
pdurbin
donsizemore10: hi Craig!
18:00
pdurbin
bjonnh: I'm not sure I want to introduce another layer. I think CentOS and Docker should be enough without KVM in between. I was actually thinking that maybe we should just restart Docker after every run or something. Or restart Jenkins. Or restart the Jenkins server. :)
18:02
donsizemore10
i can make jenkins blow away everything docker-related
18:04
pdurbin
might be worth a shot
18:04
Paul_Dante joined #dataverse
18:04
pdurbin
but does that mean I should get out of /tmp on your server?
18:05
pdurbin
oh hey Paul_Dante
18:05
Paul_Dante
Hi Phil
18:05
pdurbin
Paul_Dante: ready to talk about https://github.com/IQSS/dataverse/issues/5730 ? :)
18:05
donsizemore10
@pdurbin the webhook listener is disabled so you can go to town
18:05
Paul_Dante
yup
18:05
pdurbin
donsizemore10: one moment please
18:06
bjonnh
pdurbin: docker-compose allows you to keep things clean as well
18:06
bjonnh
pdurbin: docker-compose down and everything goes away
18:07
bjonnh
it creates networks, volumes, containers as needed, allows updates and when you put down, it kills everything
18:07
bjonnh
(and it is integrated with docker)
18:08
pdurbin
bjonnh: awesome but lemme chat with Paul_Dante about metadata exports quick
18:09
pdurbin
Paul_Dante: so we're both running curl http://localhost:8080/api/admin/metadata/reExportAll
18:09
pdurbin
And it works like I expect.
18:09
Paul_Dante
I was under the impression that localhost:8080/api/admin/metadata/exportAll would export all of the metadata records from the DV; am I misunderstanding what that endpoint does?
18:10
pdurbin
That sounds like a nice feature but no, that's not what it does. :)
18:10
Paul_Dante
That would explain why it isn't doing what I expected :)
18:11
pdurbin
If I go into /usr/local/glassfish4/glassfish/domains/domain1/files and delete all the cached metadata export files like export_ddi.cached and friends and then run that reExportAll the export files will be regenerated.
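[Editor's sketch] The cleanup step described here could look roughly like the following. It assumes the cached exports all match the glob `export_*.cached` (inferred from "export_ddi.cached and friends") and live somewhere under the files directory; the function names are hypothetical, and the default is a dry run so nothing is deleted by accident.

```python
from pathlib import Path


def find_cached_exports(files_root):
    """Return every file named export_*.cached under files_root, sorted."""
    return sorted(Path(files_root).rglob("export_*.cached"))


def delete_cached_exports(files_root, dry_run=True):
    """List the cached export files, deleting them only when dry_run is False."""
    targets = find_cached_exports(files_root)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

After a real deletion, running `curl http://localhost:8080/api/admin/metadata/reExportAll` as mentioned in the chat should regenerate the files.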
18:11
pdurbin
Paul_Dante: but what you want sounds like a nice feature. You already wrote some code for this?
18:13
Paul_Dante
Sort of. I wrote a multi-step process to do that. 1) Search for all records 2) Grab the DOIs out of those search results 3) Iterate through all those DOIs to use for a download of each specific record.
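[Editor's sketch] The three steps Paul_Dante lists might be sketched as below. This is not his code (his is in Java and was not shown). It assumes the Dataverse Search API at `/api/search` returns items with a `global_id` field and that per-dataset exports are available from `/api/datasets/export` with `exporter` and `persistentId` parameters; check both against the API guide for your version. All function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, start=0, per_page=100):
    """Step 1: URL for one page of a search over all datasets."""
    query = urlencode({"q": "*", "type": "dataset", "start": start, "per_page": per_page})
    return f"{base_url}/api/search?{query}"


def extract_dois(search_response):
    """Step 2: pull the persistent identifiers out of a parsed search response."""
    items = search_response.get("data", {}).get("items", [])
    return [item["global_id"] for item in items if item.get("global_id")]


def export_url(base_url, doi, exporter="ddi"):
    """Step 3: URL for the metadata export of one dataset."""
    query = urlencode({"exporter": exporter, "persistentId": doi})
    return f"{base_url}/api/datasets/export?{query}"


def export_all(base_url, exporter="ddi", per_page=100):
    """Yield (doi, raw_export_bytes) for each dataset, paging through the search."""
    start = 0
    while True:
        with urlopen(search_url(base_url, start, per_page)) as response:
            page = json.load(response)
        if not page.get("data", {}).get("items"):
            break  # an empty page means there are no more results
        for doi in extract_dois(page):
            with urlopen(export_url(base_url, doi, exporter)) as response:
                yield doi, response.read()
        start += per_page
```

Only `export_all` touches the network; the URL builders and `extract_dois` can be checked offline.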
18:13
pdurbin
Yeah, that sounds like what I'd do. :)
18:14
pdurbin
What language are you using?
18:14
Paul_Dante
Java
18:14
pdurbin
Interesting. Are you using https://github.com/IQSS/dataverse-client-java ?
18:16
Paul_Dante
No, I've just been running Java-generated commandline calls.
18:16
pdurbin
gotcha
18:17
pdurbin
So do you want to create a new issue for the feature you want? Or are you good with your solution?
18:18
Paul_Dante
That client would allow me to more cleanly make calls to DV within my java program?
18:18
Paul_Dante
I'm content with my current solution.
18:21
pdurbin
Paul_Dante: probably more cleanly, yes, but I've never used it. It's used in production by RSpace, who wrote it when they integrated with Dataverse. Should we link to your current solution from http://guides.dataverse.org/en/4.14/api/apps.html#java ? If it's open source, that is.
18:25
Paul_Dante
My code is all open source, so won't be a problem on that front, but my current solution is a bit distributed through my code at the moment, so wouldn't currently be very helpful to other people. If I find some free time I'll make a more succinct solution that would actually be readable.
18:26
pdurbin
Paul_Dante: cool, if you ever want to edit that page, here's how it looks as of 4.14: https://github.com/IQSS/dataverse/blob/v4.14/doc/sphinx-guides/source/api/apps.rst
18:28
Paul_Dante
:thumbs up:
18:28
Paul_Dante
Thanks for the chat.
18:29
pdurbin
Sure! Thanks for stopping by! Do you want to close your issue? Or help us with the docs? :)
18:36
pdurbin
donsizemore10 bjonnh I'm talking to Pete a bit on Slack about my docker-aio on CentOS woes. I wonder if I have the same problem on Mac. I guess I should confirm this.
20:23
pdurbin
donsizemore10: so, I'm being careful to do a docker rm before every run and I've had three successes in a row.
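[Editor's sketch] A "docker rm before every run" step could be automated along these lines. The exact command pdurbin ran is not shown in the log, so this is an assumption: it lists containers whose name matches a filter (the filter value is a placeholder you supply) and force-removes them. The `run` parameter exists only so the logic can be exercised without Docker installed.

```python
import subprocess


def list_containers(name_filter, run=subprocess.run):
    """Return IDs of all containers (running or stopped) whose name matches."""
    result = run(
        ["docker", "ps", "--all", "--quiet", "--filter", f"name={name_filter}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def cleanup_commands(container_ids):
    """Build the docker command(s) that force-remove the given containers."""
    if not container_ids:
        return []
    return [["docker", "rm", "--force", *container_ids]]


def clean_before_run(name_filter, run=subprocess.run):
    """Remove matching leftover containers and return the IDs that were removed."""
    ids = list_containers(name_filter, run)
    for command in cleanup_commands(ids):
        run(command, check=True)
    return ids
```

Calling `clean_before_run` at the start of each test job keeps leftovers from a previous run out of the next one, which is the effect described in the chat.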
20:46
pdurbin
ok, I'm out of here
20:46
pdurbin
Have a good weekend, everyone!
20:47
pdurbin left #dataverse
22:04
pdurbin joined #dataverse
22:05
pdurbin
p.s. Dataverse 4.15 is out: https://github.com/IQSS/dataverse/releases/tag/v4.15 :)
22:05
pdurbin left #dataverse
22:42
jri joined #dataverse