07:30 juancorr joined #dataverse
07:46 juancorr joined #dataverse
08:42 Benjamin_Peuch joined #dataverse
08:43 Benjamin_Peuch Hello everybody.
08:43 Benjamin_Peuch Does anyone here use Dataverse's Provenance feature with PROV-JSON files?
08:59 jri joined #dataverse
10:02 poikilotherm donsizemore: FZJ has
10:02 poikilotherm donsizemore: please remember to allow Github, too ;-)
10:03 poikilotherm Dunno if they have public known ranges
10:03 poikilotherm Wow they have an API for that!
10:30 stefankasberger joined #dataverse
10:32 stefankasberger Hey guys. I'm in the middle of the Dataverse upgrades. I have set up some nice pytest Selenium tests for Dataverse. We also created some test data, and I wrote some pyDataverse scripts to handle it. So I am pretty happy about what is going on. I guess some of this could be interesting to some of you. I will try to open up as much as possible after the upgrade. That's not so easy, because some internal stuff is inside (copyright issues, security issues).
10:50 poikilotherm Hi Stefan :-)
10:51 poikilotherm Good to hear from you and that you're making progress.
10:51 poikilotherm Wish you all the best :-)
11:05 pkiraly joined #dataverse
12:48 GustavoMartins joined #dataverse
12:54 poikilotherm Hi GustavoMartins
13:09 GustavoMartins Hi poikilotherm
13:09 poikilotherm How may we help you?
13:10 poikilotherm Not many guys from IQSS around due to vacation...
13:20 GustavoMartins I'm new to the Dataverse world. Currently I'm studying the docs, forums, GitHub issues and IRC logs to learn and see if I can contribute in any way.
13:28 poikilotherm :-D You're most welcome
13:28 poikilotherm Usually a few people from around the world hang out here
13:28 poikilotherm At least a part of the community exchanges thoughts and ideas, helps others with their installations and problems, etc.
13:29 poikilotherm Also lots of dev talk
13:29 poikilotherm As our "Chief Community Officer" (I call him that) Philip Durbin is not here, it's a bit quiet these days
13:29 poikilotherm He'll be back next week
13:30 poikilotherm If you have any questions, just paste them here. We're all pretty responsive, but sometimes timezones are difficult.
13:32 poikilotherm Oh, and there are a ton of ways to contribute.
13:35 pkiraly @GustavoMartins: in the issue queue you can find tickets labelled "Help wanted: ...". In those tickets help is extremely welcome. You can help by reproducing things, revising or modifying documentation, or coding.
13:38 pkiraly GustavoMartins: if you are familiar with Solr, you can take a look at this one: It requires a change in the Solr config, and we are waiting for feedback on whether the suggested change works for you or not.
13:39 poikilotherm pkiraly: is this a working PoC?
13:41 pkiraly poikilotherm, yes
13:41 poikilotherm Nice!
13:42 poikilotherm I could create a container image flavor with it if you want
13:42 poikilotherm Oh wait, that needs a Dataverse image too
13:43 pkiraly poikilotherm: and it is a preliminary step towards improving how Dataverse uses Solr for schema management, i.e. smoother management of custom metadata blocks
13:43 poikilotherm Nice!
13:54 poikilotherm pkiraly I read the issue again - are you saying it's sufficient to switch to ManagedIndexSchemaFactory?
13:55 poikilotherm No changes to Dataverse code necessary?
13:57 pkiraly No Dataverse changes needed
13:57 poikilotherm So what happens when the metadata schemas change?
13:57 poikilotherm How does Solr deal with changed fields, etc?
13:58 poikilotherm Can I transparently switch from the old way to the new?
13:58 poikilotherm (Without needing to reindex)
13:59 pkiraly In Solr you can change the schema in 3 ways (after this change): 1) the same way as before (modifying the managed-schema file by hand; right now that same file is called schema.xml) 2) via the API 3) via the Solr admin UI
14:00 poikilotherm Aha! So switching to the other factory is a prerequisite for changing the Dataverse code to make use of the API
14:00 pkiraly Theoretically there is no need to reindex after this change, because we change neither the index nor the schema.
14:01 pkiraly poikilotherm: yes. Right now Solr's Schema API is turned off. Once it is on, we can modify Dataverse to use this API.
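[Editor's note: to illustrate option 2 above, here is a minimal sketch of what a call to Solr's Schema API could look like once the managed schema is enabled. The `add-field` command and the `/schema` endpoint are part of Solr's Schema API; the host, the core name `collection1`, the field name `myCustomField` and the field type `text_en` are assumptions made for illustration only and are not taken from the discussion. The sketch only builds the request; actually sending it requires a running Solr with the Schema API turned on.]

```python
import json
import urllib.request

# Assumed values for illustration; adjust to your own installation.
SOLR_BASE_URL = "http://localhost:8983/solr"
CORE_NAME = "collection1"


def build_add_field_request(name, field_type, multi_valued=False):
    """Build (but do not send) a Schema API request that adds one field."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": True,
            "indexed": True,
            "multiValued": multi_valued,
        }
    }
    return urllib.request.Request(
        url=f"{SOLR_BASE_URL}/{CORE_NAME}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field name and type, standing in for a custom metadata field.
request = build_add_field_request("myCustomField", "text_en", multi_valued=True)

# To actually apply the change against a live Solr, one would call:
#     urllib.request.urlopen(request)
```

[With the classic schema factory this request would be refused, which is why the factory switch has to come first.]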
14:02 poikilotherm Does the change deal with the XML includes?
14:04 pkiraly I don't know. I did not try it
14:05 poikilotherm Do you feel like creating an issue for me at IQSS/dataverse-kubernetes?
14:06 poikilotherm I'd be willing to create a flavor for it. Will need some tweaks to some scripts for updating the new file, as long as the API is not used
14:06 pkiraly OK, I'll create it
14:06 poikilotherm Thx
14:09 Benjamin_Peuch Thanks for the info about Philip's whereabouts, poikilotherm. Calling him a CCO sounds just right to me. :)
14:09 Benjamin_Peuch (As long as it's not CC0. :p)
14:10 poikilotherm That's why I sent him
14:18 Benjamin_Peuch Hahaha, that's one awesome gift.
14:18 Benjamin_Peuch You got the font of the Dataverse logo just right.
14:19 Benjamin_Peuch Do you remember which one it is?
14:22 poikilotherm Lemme go looking
14:28 poikilotherm Ok, the Western-like typeface is "Clarendon Condensed"
14:28 poikilotherm And the other one is "Myriad Pro"
14:28 poikilotherm IIRC both by Adobe
14:28 poikilotherm Found free substitutes
14:45 donsizemore joined #dataverse
14:47 donsizemore @poikilotherm just added you to our "jenkins" zone =) and yes allowed webhook ranges as specified at
14:47 poikilotherm Nice!
14:47 poikilotherm Thanks @donsizemore
14:48 poikilotherm Anything we could do about re-enabling access for everyone?
14:49 donsizemore the problem (at minimum) is that even with read-only access everything is served through the tomcat webapp, and bots managed to keep both CPUs pegged at 100% and exhaust its resources
14:50 donsizemore better to focus on publishing test results to github (which i'm doing in the background)
14:50 donsizemore or otherwise adopt a Blosxom-style publishing model (scripts write flat HTML to serve publicly)
15:01 poikilotherm donsizemore: are you seeing requests from single bots or more like a DDoS pattern?
15:07 Benjamin_Peuch Thanks for the names of the fonts, poikilotherm.
15:08 poikilotherm Asking because of NGINX options limit_req and limit_req_zone helping with rate limiting without fiddling with firewall rules
15:08 poikilotherm Benjamin_Peuch: sure. No problem. Took me the help of to find best matches ;-)
15:12 Benjamin_Peuch Oh, I thought you knew because you had designed the badge?
15:13 Benjamin_Peuch Since Philip thanked you personally.
15:14 donsizemore @poikilotherm they also break page rendering
15:15 poikilotherm Benjamin_Peuch: yeah, I did, but I used the official Dataverse logo for that. And the SVG containing it only has paths, no font information left. So I had to reverse-engineer ;-)
15:15 Benjamin_Peuch Clever. :o
15:16 poikilotherm Once I had the fonts, I could add the other text elements ;-)
15:16 stefankasberger @all: Short question regarding upgrading Solr: we need to update Solr from 4.6.0 to 7.3.1. Would you recommend upgrading in place or doing a fresh install? I have no experience with Solr so far, so I don't know in detail how it works internally or together with Dataverse.
15:19 poikilotherm Stefan let me send you a few links regarding solr upgrades
15:20 poikilotherm
15:21 poikilotherm
15:21 poikilotherm Depending on the number of datasets, you might be better off doing a complete reindex with Solr 7
15:22 poikilotherm It does take a while, but you might end up doing it anyway
15:23 donsizemore joined #dataverse
15:23 poikilotherm Kevin mentioned in that it takes ~18h to reindex Harvard, which is ~6k datasets IIRC
15:24 donsizemore @poikilotherm @stefankasberger doesn't have a huge number of datasets. the simplest thing is to do a clean install of 7.3.1 and reIndexAll
15:25 poikilotherm stefankasberger: what donsizemore says :-D
15:26 poikilotherm donsizemore: regarding breaking page rendering: wouldn't those bots be blocked by nginx rate limiting before they reach tomcat?
15:27 donsizemore @poikilotherm yes, but to impose any effective limit on the bots also prevents browsers from loading all the various icons on jenkins views
15:27 poikilotherm O.O
15:28 donsizemore (which is to say, yesterday I tried a limit of 3 requests/second, then removed the limit)
15:28 donsizemore and to make the limit effective against bots the limit would need to be much more stringent
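[Editor's note: the trade-off described above can be sketched with a simplified leaky-bucket model of the kind that rate limiters such as NGINX's `limit_req` are based on. This is an illustration under assumptions, not NGINX's actual implementation: the `LeakyBucket` class and the request counts (20 icons per page view, 100 simultaneous bot requests) are hypothetical. The rate of 3 requests/second is the value mentioned in the discussion.]

```python
class LeakyBucket:
    """Simplified leaky-bucket limiter: `rate` requests/second, `burst` extra slots."""

    def __init__(self, rate, burst=0):
        self.rate = rate
        self.burst = burst
        self.excess = 0.0   # requests currently "queued" above the steady rate
        self.last = None    # timestamp of the last accepted request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is accepted."""
        if self.last is None:
            # The very first request is always accepted.
            self.last = now
            return True
        leaked = self.rate * (now - self.last)
        excess = max(0.0, self.excess - leaked + 1.0)
        if excess > self.burst:
            return False  # rejected; the bucket state is left unchanged
        self.excess = excess
        self.last = now
        return True


def accepted(limiter, timestamps):
    """Count how many of the requests at the given timestamps are accepted."""
    return sum(1 for t in timestamps if limiter.allow(t))


# One page view that loads 20 icons at (almost) the same instant.
page_load = [0.0] * 20

strict = accepted(LeakyBucket(rate=3, burst=0), page_load)
lenient = accepted(LeakyBucket(rate=3, burst=20), page_load)

# A bot firing 100 requests at once against the lenient limiter.
bot = accepted(LeakyBucket(rate=3, burst=20), [0.0] * 100)

print(strict, lenient, bot)
```

[In this model the strict limiter lets only 1 of the 20 icon requests through, while a burst allowance large enough for a full page view also lets a bot land 21 requests immediately, which matches the dilemma donsizemore describes.]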
15:28 poikilotherm What about adding a cache?
15:29 poikilotherm So the icons etc would be served from cache, not Tomcat
15:31 poikilotherm I heard lots of good things about Varnish
15:36 donsizemore nginx was blocking the icons (requests per second)
16:56 stefankasberger joined #dataverse
17:04 juliangautier joined #dataverse
17:06 juliangautier Hi everyone! I used vagrant up to get a copy of Dataverse on my laptop, but can't figure out what the default username and password is. Would anyone here know or have any guesses? I've tried admin, admin1 and dataverseAdmin in all sorts of combinations :)
17:10 stefankasberger Thanks @poikilotherm and @donsizemore. Will do the reIndexAll with our 140 datasets. :)
17:11 poikilotherm juliangautier: it should be user "dataverseAdmin" and password "admin" or "admin1"
17:11 poikilotherm stefankasberger: great! :-)
17:14 juliangautier poikilotherm: Thanks! I'll try that out
17:35 stefankasberger joined #dataverse
18:46 donsizemore joined #dataverse
18:47 donsizemore @juliangautier if you used vagrant, look in tests/group_vars/vagrant.yml for dataverse.adminpass (probably "admin1" as @poikilotherm says)
19:01 juliangautier donsizemore: Thanks!
19:25 stefankasberger joined #dataverse