IRC log for #dataverse, 2019-12-19

Connect via to discuss Dataverse (, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

08:40 jri joined #dataverse
10:22 sivoais joined #dataverse
10:33 stefankasberger joined #dataverse
11:12 Youssef_Ouahalou joined #dataverse
11:15 Youssef_Ouahalou Hello everyone, when we add a new metadata, how do we make it appear in ddi output?
11:51 bro joined #dataverse
11:56 pdurbin Youssef_Ouahalou: well, it should appear in the native JSON output. Does it?
11:58 Youssef_Ouahalou Yes it appears in json
12:01 pdurbin Good. The native JSON output is flexible in that all the values are shown. All the other formats, including DDI, are hard-coded. Does the field you're adding map to DDI cleanly? Is there a place for it in DDI?
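A minimal sketch of the check pdurbin describes: fetch the same dataset as native JSON and as a DDI export, then look for the custom field in the native JSON. It assumes the native-JSON endpoint (`/api/datasets/:persistentId/`) and the export endpoint (`/api/datasets/export?exporter=ddi`) from the Dataverse API Guide. The server URL, the persistent identifier, the `custom` block and the `archiveRef` field with its value are made-up placeholders.

```python
from urllib.parse import urlencode


def native_json_url(server: str, pid: str) -> str:
    # Native JSON: every field of every metadata block is included.
    return f"{server}/api/datasets/:persistentId/?{urlencode({'persistentId': pid})}"


def ddi_export_url(server: str, pid: str) -> str:
    # DDI export: only fields the exporter is coded to map are included.
    query = urlencode({"exporter": "ddi", "persistentId": pid})
    return f"{server}/api/datasets/export?{query}"


def find_field(native: dict, type_name: str):
    """Return the value of the field with this typeName, or None if absent."""
    blocks = native["data"]["latestVersion"]["metadataBlocks"]
    for block in blocks.values():
        for field in block["fields"]:
            if field["typeName"] == type_name:
                return field["value"]
    return None


# Hypothetical native JSON fragment containing a custom field.
sample = {"data": {"latestVersion": {"metadataBlocks": {"custom": {"fields": [
    {"typeName": "archiveRef", "multiple": False,
     "typeClass": "primitive", "value": "BE-A0510"}]}}}}}
print(find_field(sample, "archiveRef"))
```

If `find_field` finds the value in the native JSON but the DDI export lacks it, the field exists but has no mapping in the hard-coded DDI exporter.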
12:04 Youssef_Ouahalou Can I add it manually?
12:07 pdurbin You would have to fork the code and edit some Java. Could this field be used by other installations of Dataverse? Or is it somehow specific to your institution? I'm wondering if making a pull request makes sense, to bring the feature to everyone.
12:08 MrK joined #dataverse
12:10 Youssef_Ouahalou I think it is specific to our institution, it would be interesting if it were more flexible
12:13 Guest33048 Hi everyone! I have a question about the controlled vocabularies and their translations. I've managed to translate some of my controlled vocabularies by following the doc, but I couldn't make it work for values that contain numbers or special characters. So I wondered if I could use the identifier from the controlled vocabulary section instead.
12:15 pdurbin Guest33048: that's... strange. Numbers and special characters should work, I think. Can you share your TSV file?
12:15 pdurbin Youssef_Ouahalou: are you saying it would be interesting if DDI is more flexible?
12:16 Youssef_Ouahalou To enter new ddi fields
12:20 Guest33048 pdurbin: oh, then I guess I did something wrong (I've made many attempts for different problems, so I think I must have made a mistake). I'll try again. The only things to convert are spaces and uppercase letters, right? So, for example, "Word 1: word2" should become "word_1:_word2"?
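A sketch of the key conversion Guest33048 describes, assuming keys of the form `controlledvocabulary.<fieldName>.<value>` with the value lowercased and spaces turned into underscores (the rule stated in the chat, not verified against the Dataverse source). One extra point, assuming the bundle is read with standard `java.util.Properties` parsing: an unescaped `:` or `=` ends the key, so a value like "Word 1: word2" must be escaped in the file. `myField` is a hypothetical field name.

```python
def cv_property_key(field_name: str, value: str) -> str:
    """Build a controlled-vocabulary key for a metadata block .properties file."""
    # Rule from the discussion: lowercase, spaces -> underscores.
    normalized = value.lower().replace(" ", "_")
    # Java .properties treats unescaped ':' and '=' as the key/value separator,
    # so escape them (and backslash itself) inside the key.
    escaped = "".join("\\" + ch if ch in "\\:=" else ch for ch in normalized)
    return f"controlledvocabulary.{field_name}.{escaped}"


def cv_property_line(field_name: str, value: str, label: str) -> str:
    """A full 'key=label' line for the properties file."""
    return f"{cv_property_key(field_name, value)}={label}"


print(cv_property_line("myField", "Word 1: word2", "Word 1: word2"))
```

Without the backslash, a Java properties loader would read the key as `controlledvocabulary.myField.word_1` and put the rest of the line into the value.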
12:20 pdurbin Right but DDI is a standard (unlike Dataverse's native JSON format). So you have to map your institution-specific field to the standard somehow, right? You can't just invent a DDI field, I think. That said, there was a recent thread on the mailing list about some upcoming flexibility in DDI. Let me go find it, Youssef_Ouahalou
12:22 pdurbin Guest33048: I'm sorry but without seeing your TSV I'm having a little trouble picturing in my head what it looks like. But it's also early and I just had my first sip of coffee. :)
12:25 pdurbin Guest33048: you'd be very welcome to create a GitHub issue and attach your TSV to it. You might need to add .txt to it in order to attach it:
12:26 Guest33048 pdurbin: I've put an extra underscore in my property file. Sorry for the inconvenience! However can you confirm that the identifier in the TSV cannot be used as a property name?
12:27 pdurbin Youssef_Ouahalou: this is the thread I'm thinking about but it has more to do with supporting more scientific disciplines than supporting fields that are specific to an institution: ("custom metadata blocks now easier to spin up and evaluate") ... Guest33048 you may be interested in that thread too.
12:29 pdurbin Guest33048: when I'm at my desk it would be easier to confirm. Can you please email so that I or someone else can get back to you about this?
12:31 Guest33048 pdurbin: I'll do that. Thank you for everything, and for the thread that I bookmarked :)
12:31 Youssef_Ouahalou Ok thank you very much i will read it,thank you for your answers
12:32 pdurbin Guest33048: thanks! If you have any questions about that thread, please let me know! There's a lot to unpack. :)
12:32 pdurbin Youssef_Ouahalou: perfect, thanks. Also, I think your boss emailed me. I'm reading it now. :)
12:34 Youssef_Ouahalou Hahhaha yes I think ☺
12:40 pdurbin Youssef_Ouahalou: I just found this:
12:44 Youssef_Ouahalou Is this about your presentation?
12:50 pdurbin Well, the lightning talks still haven't been announced. So I still don't know if my talk has been accepted. I'm just poking around the FOSDEM site trying to figure out where it might be good to spend some time. There's also a room for Java and a room for PostgreSQL.
12:52 Youssef_Ouahalou yes I also saw that, it could be interesting. Hoping that your speech will be accepted
12:52 poikilotherm joined #dataverse
12:55 poikilotherm Mornin' guys. What's up?
12:56 MrK Hi
12:56 Youssef_Ouahalou Morning, fine, and you?
13:01 Benjamin_Peuch joined #dataverse
13:03 Benjamin_Peuch joined #dataverse
13:04 Benjamin_Peuch boop
13:04 Benjamin_Peuch Oh this works now. Hello everybody, this is Ben from the State Archives of Belgium
13:05 Benjamin_Peuch Hello Philip. I'm Youssef's coworker :)
13:06 poikilotherm Welcome Benjamin_Peuch
13:09 Benjamin_Peuch Thanks, poikilotherm. We are glad that we finally got down to setting up a Dataverse and seeing how far we can adapt it to our needs
13:10 Benjamin_Peuch We are planning to launch it in a couple of months. Youssef has worked hard to this end, and I'm told he got a lot of support from you all
13:20 pdurbin Benjamin_Peuch: hi! I got your email!
13:21 pdurbin Yes, a lot of us in here are helping Youssef_Ouahalou and others get their installations of Dataverse launched. :)
13:22 pdurbin stefankasberger: hey, did you get my "mysterious message from Vienna" email? :)
13:23 Benjamin_Peuch We greatly appreciate the support, pdurbin. :)
13:24 Benjamin_Peuch Youssef told me about the exchange you just had about adding extra fields in Dataverse.
13:24 Benjamin_Peuch We do plan to do this properly, so the output in DDI would absolutely be compliant with DDI-Codebook 2.5.
13:25 Benjamin_Peuch I must say that some of those extra fields we want to add are meant to transfer to another XML language, Encoded Archival Description (EAD), for archival purposes.
13:26 Benjamin_Peuch Still, we have identified which DDI elements we can use as intermediary vessels to this end. They would simply be <notes> elements in <stdyDscr>'s <citation>.
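A minimal sketch of the "intermediary vessel" idea: appending a `<notes>` element to `stdyDscr/citation` in a DDI-Codebook 2.5 document with Python's `xml.etree.ElementTree`. It assumes the `ddi:codebook:2_5` namespace. The `type="EAD"` value is a hypothetical convention for marking notes meant for the EAD transfer. It does not change what Dataverse itself exports.

```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"
ET.register_namespace("", DDI_NS)  # serialize DDI as the default namespace


def q(tag: str) -> str:
    """Qualified tag name in the DDI Codebook 2.5 namespace."""
    return f"{{{DDI_NS}}}{tag}"


def add_citation_note(codebook: ET.Element, text: str, note_type: str) -> ET.Element:
    """Append <notes> to stdyDscr/citation, creating the parents if missing.

    <notes> is the last child in the DDI 2.5 citation sequence, so appending
    keeps schema order. Creating stdyDscr here is only for this toy example.
    """
    stdy = codebook.find(q("stdyDscr"))
    if stdy is None:
        stdy = ET.SubElement(codebook, q("stdyDscr"))
    citation = stdy.find(q("citation"))
    if citation is None:
        citation = ET.SubElement(stdy, q("citation"))
    note = ET.SubElement(citation, q("notes"), {"type": note_type})
    note.text = text
    return note


codebook = ET.Element(q("codeBook"))
add_citation_note(codebook, "Fonds reference for EAD", "EAD")
print(ET.tostring(codebook, encoding="unicode"))
```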
13:26 pdurbin Benjamin_Peuch: I'm flying back at 6am on that Monday. Friday *might* work. I'm arriving from Lisbon/PIDapalooza at 6:15am on that Friday. I wonder if it would make sense to host a "fringe" event about Dataverse at your institution: . I need to spend more time looking at my calendar though. :)
13:26 Benjamin_Peuch That would be wonderful! We would make sure to prepare a lot of coffee. :)
13:27 Benjamin_Peuch We know of Dataverse users in France, Austria (hello stefankasberger!) and the Netherlands. They might be interested, especially if they plan to attend FOSDEM'20?
13:27 pdurbin My hotel is only a 29 minute walk from your institution (though I'd probably take public transportation). :)
13:27 Benjamin_Peuch Handy!
13:28 pdurbin I've been waiting to start a thread on the dataverse-community list about FOSDEM until I know if my lightning talk has been accepted or not. A lot of the rest of the schedule is up already. Longer talks, dev rooms, etc.
13:28 Benjamin_Peuch I should also mention we took good note of the advice in DV's manual, and we do plan to publish our metadata input.
13:29 pdurbin MrK: so this is another potential meetup in Europe. Perhaps. If we can pull it together. :)
13:29 Benjamin_Peuch Yes... Fingers crossed. I really hope you got to make this presentation.
13:30 Benjamin_Peuch *get
13:30 pdurbin Benjamin_Peuch: the thing I'm trying to figure out about your custom metadata fields is this... Can they be used by other installations of Dataverse? If so, we should work toward a pull request so everyone can benefit.
13:30 Benjamin_Peuch That is also a thing indeed, pdurbin.
13:31 pdurbin Otherwise, I fear you'd be forced to run a fork.
13:31 Benjamin_Peuch We had quite a few modifications to Dataverse in the works until we realized that it had not been designed to be heavily customized, especially regarding the core metadata (essentially Citation and Terms, I would say).
13:32 Benjamin_Peuch I must admit we interpreted the notion of open source a bit too freely. Indeed that was also my conclusion after you voiced concerns about departing too much from the original software (and our head of IT developments was of the same mind): it would pretty much amount to forking.
13:32 pdurbin Ok. Lots of people have forked Dataverse. You aren't alone. But ideally, we would merge changes into upstream, if we can.
13:34 Benjamin_Peuch Yes, that was also our plan. We thought we would take another approach and publish information about our needs (as archivists and data specialists) and those of our users (we only have a few at the moment, but they're qualified social scientists) so that it's open for discussion, and we can see from there what can be integrated into DV in the short, medium or long term.
13:34 Benjamin_Peuch I know Youssef already opened a few issues on your GitHub and I don't want to spam you, so I'm going to look into it and synthesize our points.
13:35 stefankasberger @pdurbin: Yes. It was Lars Kaczmirek, CEO of AUSSDA (my boss).
13:35 pdurbin stefankasberger: oh! I never got a follow up email from him. Thanks!
13:36 pdurbin Benjamin_Peuch: honestly, it's often easier to get quick answers here than by wading through old GitHub issues. But you're welcome to go wading. :)
13:37 Benjamin_Peuch Oh okay. Does that include saying "Hey, we want this! Implement that!"? :p
13:38 pdurbin Benjamin_Peuch: something else I'd like to put on your and Youssef_Ouahalou's radar is that DANS (I think) is transforming Dataverse's native JSON to various XML using a crosswalk outside of Dataverse. It's on GitHub somewhere.
13:38 stefankasberger Yes, because I'll get in touch with you once I have time. Would it be possible to have a short call tomorrow to talk about some things?
13:38 poikilotherm pdurbin will most likely say: please open an issue for that @Benjamin_Peuch
13:38 pdurbin stefankasberger: tomorrow is looking quite open for me. After that I'm on holiday until Jan 2.
13:39 Benjamin_Peuch Oh that sounds very interesting. Thanks pdurbin. I assume they might be doing this with the DataverseEU project.
13:39 pdurbin Oh, and we have Zoom at Harvard now, which is nice. I hosted my first Zoom with the new maintainer of dataverse-client-r on Tuesday.
13:39 stefankasberger Me the same. I will get in touch with you, okay?
13:41 pdurbin stefankasberger: sure, let's pick a time.
13:41 pdurbin Benjamin_Peuch: I'm having trouble finding the crosswalk repo :(
13:41 stefankasberger 11/12h CET?
13:42 donsizemore joined #dataverse
13:42 pdurbin If memory serves, Slava was transforming Dataverse's native JSON to an XML format used in their in-house system called EASY.
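The crosswalk repository isn't identified in the log, so this sketch does not reproduce it. It only illustrates the general approach of transforming exported native JSON into another XML format outside Dataverse, under stated assumptions. The `mapping` dict and the target element names (`record`, `title`, `subject`) are hypothetical. Compound fields are skipped for brevity.

```python
import xml.etree.ElementTree as ET


def crosswalk(native: dict, mapping: dict) -> ET.Element:
    """Map native-JSON fields to XML elements outside Dataverse.

    `mapping` is {typeName: target_element_name}. Primitive and
    controlledVocabulary fields are copied; compound fields are skipped.
    """
    root = ET.Element("record")  # hypothetical target root element
    blocks = native["data"]["latestVersion"]["metadataBlocks"]
    for block in blocks.values():
        for field in block["fields"]:
            target = mapping.get(field["typeName"])
            if target is None or field["typeClass"] == "compound":
                continue
            # "multiple" fields carry a list of values; others a single value.
            values = field["value"] if field["multiple"] else [field["value"]]
            for value in values:
                ET.SubElement(root, target).text = str(value)
    return root


native = {"data": {"latestVersion": {"metadataBlocks": {"citation": {"fields": [
    {"typeName": "title", "multiple": False, "typeClass": "primitive",
     "value": "Census 1846"},
    {"typeName": "subject", "multiple": True, "typeClass": "controlledVocabulary",
     "value": ["Social Sciences", "Other"]},
]}}}}}
print(ET.tostring(crosswalk(native, {"title": "title", "subject": "subject"}),
                  encoding="unicode"))
```

Keeping the mapping in data rather than code is the design choice here: an institution-specific field only needs a new mapping entry, not a Dataverse fork.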
13:42 Benjamin_Peuch pdurbin: That's okay. I'm in touch with Marion and Slava. I can ask them directly. Thanks. :)
13:42 pdurbin Benjamin_Peuch: great. If you find the repo, please open an issue so we can add it to the guides.
13:50 Benjamin_Peuch Will do.
13:58 Benjamin_Peuch left #dataverse
14:02 donsizemore @pdurbin may i pester you about the python3 installer when you have a minute?
14:10 poikilotherm donsizemore: could you please just use ansible for that and bundle it?
14:12 MrK pdurbin: what kind of meeting :P?
14:12 poikilotherm donsizemore and pdurbin: or maybe use sth like to create real packages, with ansible behind the scenes?
14:13 poikilotherm That's how gitlab does it: they provide packages with a bundled chef installer doing all the hard work for them
14:31 stefankasberger joined #dataverse
14:50 pdurbin Wow, -5F (-21C) with wind chill. Glad I wore snow pants to bike in. I hear in Australia they have record heat.
14:51 pdurbin donsizemore: please pester away
14:51 pdurbin MrK: nothing concrete yet. :)
14:59 donsizemore @pdurbin so, one of the long-running criticisms of the ansible role is that it's not idempotent, because it's a wrapper for the dataverse installer, which itself is not idempotent
15:00 donsizemore @pdurbin i can remove some portions of the ansible role to allow the installer to do it all (fine) but that's a step backwards from an idempotence perspective
15:01 pdurbin yikes
15:01 pdurbin So how do we move forward instead of backward? :)
15:02 donsizemore regarding the installer, one first step might be the addition of a --no-db flag (in addition to the --db-only flag)
15:03 donsizemore under perl (and now python) the installer insists on creating the database as an admin user and creating the tables up front
15:04 pdurbin Well, aren't the tables created by the deployment of the war file? I'm agreeing with you but trying to get more specific.
15:04 donsizemore this would begin to pave the way for normalizing the upgrade process
15:04 donsizemore the perl installer crabs "a database exists! i'll only install onto a squeaky-clean postgres" or something similar
15:05 donsizemore so a --no-db flag would relieve the first such blocker to repeated runs. (or i can just let ansible call the script directly, but thought i'd ask)
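A sketch of how the proposed `--no-db` flag could sit next to the existing `--db-only` flag, and how a "check, then skip" step makes repeated runs safe instead of refusing when a database exists. `--no-db` is only a proposal in this discussion. The program name, the step names, and the `db_exists` input (which a real installer would get by querying PostgreSQL) are hypothetical. Idempotence of the other steps is not addressed here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="install")  # hypothetical name
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--db-only", action="store_true",
                       help="only set up the database")
    group.add_argument("--no-db", action="store_true",
                       help="skip database setup entirely (proposed)")
    return parser


def plan_steps(args: argparse.Namespace, db_exists: bool) -> list:
    """Return the steps to run; an existing database is skipped, not an error."""
    steps = []
    if not args.no_db and not db_exists:
        steps.append("create_database")
    if not args.db_only:
        steps += ["configure_app_server", "deploy_war"]
    return steps


args = build_parser().parse_args(["--no-db"])
print(plan_steps(args, db_exists=True))
```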
15:06 pdurbin donsizemore: should we get on a Zoom with Leonid?
15:07 donsizemore that'd be fine with me, just thinking about repeated installs, ("using Ansible to manage upgrades" as has been long requested), and direction/design
15:09 pdurbin Sure. And he might want to pick your brain about the rewrite. He started from your work. Also, I recently heard about a POP concept I'd like your take on. Let me go find it.
15:10 MrK pdurbin: Installer is now in python?
15:10 pdurbin MrK: in development. Do you prefer Perl?
15:10 MrK pdurbin: No I'm happy its in python :p
15:11 pdurbin donsizemore: here's where I heard of POP: Making Complex Software Fun And Flexible With Plugin Oriented Programming - Episode 240 -
15:12 pdurbin My question is if you (and others here) think it's overkill to consider a POP architecture for the installer.
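For a sense of scale, this is a plain-Python approximation of a plugin-style installer. Each step registers itself with a decorator, and the runner can skip steps by name. It is not the API of the POP project discussed in the podcast. The step names are hypothetical and the steps only record that they ran.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], None]
REGISTRY: Dict[str, Step] = {}
ORDER: List[str] = []


def installer_step(name: str):
    """Register a function as a named installer step (a 'plugin')."""
    def decorator(func: Step) -> Step:
        if name in REGISTRY:
            raise ValueError(f"duplicate step: {name}")
        REGISTRY[name] = func
        ORDER.append(name)  # run in registration order
        return func
    return decorator


def run(config: dict, skip: tuple = ()) -> List[str]:
    """Run every registered step not listed in `skip`; return their names."""
    ran = []
    for name in ORDER:
        if name in skip:
            continue
        REGISTRY[name](config)
        ran.append(name)
    return ran


@installer_step("database")
def setup_database(config: dict) -> None:
    config.setdefault("log", []).append("database")  # placeholder work


@installer_step("app_server")
def setup_app_server(config: dict) -> None:
    config.setdefault("log", []).append("app_server")  # placeholder work


print(run({}, skip=("database",)))
```

The trade-off donsizemore points to is visible even here: the registry adds indirection that pays off only if third parties contribute steps.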
15:20 donsizemore i mean, this is proper design =) but as you say, the installer is just an installer.
15:21 pdurbin yeah
15:21 pdurbin Let me walk down the hall and see if Leonid is around. And if anybody made coffee.
15:26 poikilotherm joined #dataverse
15:30 pdurbin donsizemore: good news. There's coffee and Leonid is happy to do a video call tomorrow any time after standup. Will you be around? Should we invite poikilotherm and MrK and anyone else who has strong opinions about the installer? :)
15:30 poikilotherm :-D
15:31 poikilotherm The strongest opinions are about those pesky resource creations inside glassfish
15:31 pdurbin MrK poikilotherm: Mozilla just announced they're moving from IRC to Matrix:
15:31 poikilotherm Everything is very likely not to tangle me for K8s ;-)
15:33 poikilotherm +else
15:34 donsizemore @pdurbin i have a lunch close to noon but otherwise i'll be here
15:35 pdurbin donsizemore: ok, should we squeeze it in before your lunch?
15:35 donsizemore @pdurbin fine by me. it's just a design question, really (and i like to pick a path before picking up the rake and lopping shears)
15:39 pdurbin donsizemore: cool. I just started a doc with talking points. Can you please add some bullets?
15:40 donsizemore @pdurbin that's my basic question =) and it's not really my question (I pretty much use Ansible as a one-shot installer) but a bunch of community members have asked for idempotence
15:42 pdurbin Ok. Well, I already put a bug in Leonid's ear about idempotence. And he mentioned there are some flags already that might help. If we think of other talking points, we can add them. If not, we can just use that doc for notes.
15:46 poikilotherm pdurbin: it was too tempting to add some points ;-)
15:48 poikilotherm pdurbin: I don't think I can make it - 1130 Boston = 1730 over here...
15:51 pdurbin poikilotherm: that's why I created the doc. Thanks! Can you put your name next to your questions?
15:51 poikilotherm Done
15:52 pdurbin thanks!
15:52 poikilotherm Shall I do a quick browsing through the installers to nail down some more questions?
15:53 pdurbin If you mean the new branch, sure, please take a look:
15:53 poikilotherm No I meant the old installer and its pain points
15:53 pdurbin You're looking for pain in that old Perl script?
15:53 poikilotherm Nope
15:54 poikilotherm Things like etc
15:54 poikilotherm Those things have to DIE, IMHO
15:54 Benjamin_Peuch joined #dataverse
15:56 pdurbin Well, at least it's possible to install Dataverse from the command line. Better than forcing people to use a GUI. :)
15:56 poikilotherm :-D
15:56 pdurbin But yeah, let's make the installation process simple and awesome.
15:57 Benjamin_Peuch :thumbup:
15:57 pdurbin Are we inspired by any software that's simple to install?
15:57 poikilotherm It would be awesome to have a cli tool called "dataverse-adm"
15:57 poikilotherm Which does bundle simple things for you
15:57 poikilotherm Like those curls to block api endpoint, set unblock key etc
15:59 poikilotherm Or activate FAKE provider
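A sketch of the "dataverse-adm" idea: small subcommands that wrap admin settings calls people usually run with curl (`curl -X PUT -d VALUE http://localhost:8080/api/admin/settings/NAME`). It assumes the settings `:BlockedApiEndpoints`, `:BlockedApiPolicy` (`unblock-key`), `:BlockedApiKey` and `:DoiProvider` (`FAKE`) as described in the Dataverse guides. The tool name and subcommand names are hypothetical. The default is localhost because the admin API is often blocked elsewhere.

```python
import argparse
import urllib.request


def settings_request(server: str, name: str, value: str) -> urllib.request.Request:
    """Equivalent of: curl -X PUT -d VALUE SERVER/api/admin/settings/NAME"""
    return urllib.request.Request(f"{server}/api/admin/settings/{name}",
                                  data=value.encode("utf-8"), method="PUT")


def build_requests(argv=None) -> list:
    parser = argparse.ArgumentParser(prog="dataverse-adm")  # hypothetical tool
    parser.add_argument("--server", default="http://localhost:8080")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("block-endpoints").add_argument("endpoints")  # e.g. admin,builtin-users
    sub.add_parser("set-unblock-key").add_argument("key")
    sub.add_parser("fake-doi")
    args = parser.parse_args(argv)

    if args.command == "block-endpoints":
        pairs = [(":BlockedApiEndpoints", args.endpoints)]
    elif args.command == "set-unblock-key":
        pairs = [(":BlockedApiPolicy", "unblock-key"), (":BlockedApiKey", args.key)]
    else:  # fake-doi
        pairs = [(":DoiProvider", "FAKE")]
    return [settings_request(args.server, name, value) for name, value in pairs]


def main(argv=None) -> None:
    # Sends the requests; needs a running Dataverse reachable at --server.
    for req in build_requests(argv):
        with urllib.request.urlopen(req) as resp:
            print(req.full_url, resp.status)


if __name__ == "__main__":
    main()
```

Usage would look like `dataverse-adm set-unblock-key s3cr3t` instead of two hand-typed curl commands.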
15:59 poikilotherm Hmm wondering if this might be a good fit into pyDataverse
16:01 pdurbin sort of like asadmin?
16:01 poikilotherm Yeah
16:01 pdurbin And it calls into Dataverse APIs?
16:01 poikilotherm Don't make people look up curls; give them small helper tools
16:02 poikilotherm This is all heavily inspired by Gitlab...
16:02 pdurbin great idea
16:03 poikilotherm Lots of installer stuff could be done in such a tool
16:03 poikilotherm The new installer is another HUGE beast, which is not easy to maintain
16:03 pdurbin What does GitLab write their adm tool in?
16:05 pdurbin poikilotherm: did you see my post above about POP? Would that help make it less of a HUGE beast? Or should we use some other language besides Python? How can we make the new installer awesome?
16:06 poikilotherm GitLab seems to have a wrapper around rake tasks, so this is Ruby
16:06 pdurbin bleh, ruby ;)
16:07 pdurbin So you have to gem install it?
16:07 poikilotherm I'm not saying this should be done in ruby
16:07 poikilotherm Nope, in most cases (if not using K8s or similar) you will use packages (rpm/deb) and this is a simple binary shipped in the packages
16:08 poikilotherm It's just a wrapper; you could use rake from a bundle install (as a gem)
16:10 donsizemore @pdurbin did you want? can we push it back until afternoon EST?
16:11 pdurbin donsizemore: sure, afternoon is fine. It's too late for poikilotherm anyway. What time works for you? Not too late on a Friday before a more-than-a-week holiday, please. :)
16:12 pdurbin poikilotherm: ok, just a yum install or an apt-get install away. Good.
16:14 poikilotherm pdurbin: I added a note about my idea to the doc
16:15 * pdurbin looks
16:17 donsizemore @pdurbin any time after say 1300 should be fine?
16:23 poikilotherm Read you guys tomorrow
16:43 pdurbin donsizemore: at standup just now Gustavo reminded us that you're on Slack. I just started a direct message thingy with you, me, Leonid, Kevin, and Gustavo.
19:17 pdurbin Looks like Leonid just left a comment here:
19:44 bricas joined #dataverse
19:56 dataverse-user joined #dataverse
19:56 dataverse-user hi for all
19:56 pdurbin how's it going, dataverse-user? :)
19:59 dataverse-user Fine, thanks. I have a question about Harvard Dataverse. If I want to store the research data of my project in the Harvard Dataverse repository, how much does it cost?
20:00 pdurbin Harvard Dataverse offers free data hosting if the data is under a certain size.
20:03 pdurbin Harvard Dataverse is launching a new support/about/reference website soon that will answer questions like this. It's not quite ready but here's the issue where the work is being tracked:
20:04 pdurbin dataverse-user: does that help?
20:32 dataverse-user @pdurbin: what happens if the data is a different size, or if an institution that is not in the consortium requires Dataverse storage?
20:37 pdurbin I think the quota is 1 TB right now. Your institution does not need to be part of any consortium. Individuals can even upload data.
20:44 pdurbin donsizemore: this is new to me, might be of interest:
21:02 pdurbin dataverse-user: I assume you're talking about . Not sure.
