IRC log for #dataverse, 2018-10-18

Connect via to discuss Dataverse, an open source web application for sharing, citing, analyzing, and preserving research data

07:21 jri joined #dataverse
07:31 jri joined #dataverse
09:43 tcoupin joined #dataverse
09:49 tcoupin Hi! I am trying to customize the citation metadata block, especially by adding French terms to a controlled vocabulary
09:49 tcoupin I have some issues with accents
09:50 tcoupin my tsv is utf-8 encoded, I use "Content-type: text/tab-separated-values;charset=utf-8" in my curl command
09:51 donsizemore joined #dataverse
09:51 tcoupin Do any French guys have the same trouble?
09:55 tcoupin joined #dataverse
10:02 tcoupin update, I still have encoding problems with a fresh installation
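
A quick way to rule out the file itself before blaming the server is to check the raw bytes of the TSV. The following is a minimal sketch, not part of Dataverse: the function name `check_utf8` is hypothetical, and the double-encoding heuristic only catches the common case of UTF-8 text that was read as Latin-1 and saved again.

```python
import codecs
import re

# "Ã" or "Â" followed by a character in U+0080..U+00BF is the usual trace of
# UTF-8 bytes that were decoded as Latin-1 and then written out again.
MOJIBAKE = re.compile(r"[\u00c3\u00c2][\u0080-\u00bf]")


def check_utf8(data: bytes) -> list:
    """Return human-readable encoding problems found in the raw file bytes."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 byte order mark")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        # The file is not valid UTF-8 at all, so stop here.
        problems.append(f"invalid UTF-8 at byte offset {err.start}")
        return problems
    for number, line in enumerate(text.splitlines(), start=1):
        if MOJIBAKE.search(line):
            problems.append(f"line {number}: possible double-encoded text")
    return problems
```

Assuming the file is named citation.tsv, `check_utf8(open("citation.tsv", "rb").read())` returns an empty list when none of these three problems is detected. An empty list does not prove the server stores the text correctly; it only rules out the file as the source.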
10:22 poikilotherm joined #dataverse
10:33 pdurbin tcoupin: does your update mean everything is working now? Or are you still having trouble?
10:33 tcoupin I have trouble with citation.tsv from release zip
10:34 pdurbin Trouble when you try to add a French word to a controlled vocabulary?
10:35 poikilotherm Morning guys :-)
10:35 pdurbin morning
10:40 poikilotherm I'm fiddling with the Payara 5 stuff...
10:41 poikilotherm Currently I'm evaluating a multistage build Dockerfile for this, based on the payara5 full image
10:41 poikilotherm And building within Docker
10:42 poikilotherm Do you guys think this could be a good way? After reading through some stuff about maven and docker, it could be a good idea to avoid docker plugins for maven because of some struggles in the past... Docker seemed to have an API-breaking tendency and the plugins will break then, too
10:43 poikilotherm This is kind of similar to what the vagrant based stuff does right now
10:43 poikilotherm Maybe that way you might feel better ;-)
10:44 pdurbin I've never even heard of multistage Docker. One thought is that Payara 4 would be a smaller step.
10:44 poikilotherm Which container to use is just a matter of what image you choose... ;-)
10:51 poikilotherm And using Payara 4 first might be a good idea anyway... Release Notes state "Currently, you cannot drop-in and use a Payara 4 domain.xml with Payara 5."
10:53 pdurbin Ah. Yeah, the smaller step to 4 is probably better, then.
10:54 poikilotherm ;-)
10:55 pdurbin This issue is about moving from Java EE 7 to Java EE 8, which can be thought of as moving from Payara/Glassfish 4 to 5:
10:58 poikilotherm Thx pdurbin. But as long as we only depend on Java EE 7, it shouldn't matter that much if we use Payara 4 or 5, right?
10:58 poikilotherm (At least in theory)
10:58 poikilotherm (And only from the Java EE side of the view...)
11:01 poikilotherm pdurbin is the stuff under conf/docker in use somewhere at Harvard? Just asking because if not, I would start things there, not in a separate directory
11:01 poikilotherm Clean up old stuff... ;-)
11:03 poikilotherm AFK for a couple of minutes... :-)
11:06 pdurbin poikilotherm: at Harvard we only use Docker for testing and only here and there, really. Slava is actively pushing images to Docker Hub:
11:07 pdurbin And yes, a Java EE 7 app should run fine on either Payara 4 or 5 but who knows about our app. :)
11:17 poikilotherm I saw the dataverse-docker repo, but I try to come up with a more minimal approach...
11:17 poikilotherm But I will definitely try to get inspired by the composer script... ;-)
11:20 pdurbin Yeah, Slava's stuff is oriented toward Kubernetes, which I wouldn't call minimal. It's great but I'm still banging rocks together.
11:21 pdurbin poikilotherm: if you want you could start by changing this line from glassfish 4 to payara 4:
11:21 pdurbin That "docker-aio" (all in one) is what we use for testing sometimes.
11:22 pdurbin See
11:23 Jim_ joined #dataverse
11:24 poikilotherm pdurbin pameyer told me on IRC some days ago that he thinks the docker-aio is pretty much messed up and quite hacky. I'll come up with a cleaned up approach, taking things from docker-aio, dataverse-docker and other resources
11:24 poikilotherm I hope that will make a thin solution reusable for different purposes
11:25 pdurbin Meh. I don't think it's messed up for the purpose of testing if the API test suite passes with Dataverse deployed to Payara rather than Glassfish. It seems like a good way to test this. Or Vagrant.
11:26 pdurbin For Vagrant you would change Glassfish to Payara at
11:27 pdurbin With Payara I'm assuming we won't have to patch Weld and Grizzly anymore, which would be great.
11:29 pdurbin I used docker-aio the other day to test and it was very helpful to have.
11:31 poikilotherm pdurbin: sure :-)
11:32 poikilotherm Just trying to improve with my goal of running a production setup with docker/kubernetes and being able to test things easier than with an even more bloated docker-aio
11:32 poikilotherm and maybe introducing things like Arquillian one day :-D
11:32 poikilotherm But small steps it is, right?
11:37 pdurbin Why would docker-aio become more bloated? Are you thinking you'd want to try to keep Glassfish 4 in there? I would take it out and replace it with Payara 4. And take out the patching of Weld and Grizzly. It should get less bloated.
11:38 pdurbin And more secure.
11:38 poikilotherm It gets bloated by stuff needed for more integration tests like S3 etc
11:39 poikilotherm Or the PID provider things we have in mind
11:40 poikilotherm Maybe other stuff like the R integration could use some testing, too - dunno about that, didn't look into it, just guessing
11:41 pdurbin Oh. docker-aio doesn't currently test S3 or R. And only a single PID provider. I don't think we plan to add these there. I don't know.
11:43 poikilotherm As I wrote in the S3 related issues I tackled some days ago, I would really like to see this tested... We will rely on this in production as you guys do and doing testing for stuff you rely on is IMHO a good idea... ;-)
11:44 pdurbin Yes, absolutely. S3 stuff should be tested in an automated way. No argument.
11:44 pdurbin I just think of docker-aio as testing the essentials.
11:44 poikilotherm Sure :-)
11:45 pdurbin pameyer is working on a Jenkins declarative pipeline at if that's of interest.
11:45 poikilotherm For now, I prefer not to mess around with it and help my other colleague getting the production containers going... ;-)
11:46 poikilotherm Maybe? I remember some discussion about using Travis vs Jenkins vs Gitlab CI and IIRC a slight tendency toward out-of-the-box Travis. Is this still valid?
11:46 pdurbin Ah, so you want to run Dataverse in production using Docker?
11:47 poikilotherm Because I was planning to use Travis later on for the stuff I am doing right now... ;-)
11:47 poikilotherm Actually in Kubernetes, but that needs Docker containers first
11:47 poikilotherm I'm just helping out with the Docker stuff, my colleague is all into Kubernetes
11:47 pdurbin Cool. I assigned Slava to but should I assign you instead?
11:53 poikilotherm Uff... Dunno. Slava is pretty active in dataverse-docker, so maybe it's just fine to leave it as it is. Is it possible to have multiple assignees?
11:53 poikilotherm Or does this conflict with your workflows?
11:54 pdurbin Multiple assignees is fine. Do you want on that bus? :)
12:05 poikilotherm Sure :-)
12:07 poikilotherm In different issues I saw that you guys sometimes struggle from libs not available to maven because a repo is offline. Ever thought about installing Nexus at Harvard for this?
12:07 poikilotherm I had that problem a few times this week and it is driving me insane
12:10 poikilotherm And of course it would be an option to get rid of local_lib... ;-)
12:36 tcoupin joined #dataverse
12:37 tcoupin pdurbin: Hi again
12:38 tcoupin I have the issue with a fresh installation, installed with dvinstall, so with the classic citation.tsv
12:42 pdurbin_m joined #dataverse
12:42 pdurbin_m poikilotherm: no, we don't run nexus
12:43 pdurbin_m tcoupin: what's the error?
12:47 tcoupin no error in server.log, just encoding problems
12:48 tcoupin =>
13:05 donsizemore joined #dataverse
13:10 pdurbin_m tcoupin: can you please try it on ?
13:19 tcoupin no problem on demo :'(
13:30 pdurbin :'(
13:30 pdurbin tcoupin: how is your server different than demo?
13:31 tcoupin perhaps, I work in a docker container
13:31 tcoupin I will try to set locale before installing dataverse
13:33 pdurbin If you're using Docker you could try to reproduce the problem by following
13:48 cdsp-rmo pdurbin: I merged the latest develop to my dev env, but I have a little problem while I deploy to glassfish
13:48 cdsp-rmo Exception while loading the app : java.lang.IllegalStateException: ContainerBase.addChild: start: org.apache.catalina.LifecycleException: java.lang.RuntimeException: com.sun.faces.config.ConfigurationException: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar([Lorg/apache/xerces/xni/parser/XMLInputSource;)V]]
13:48 cdsp-rmo any idea ? war compiling is ok, but it fails at deploy. It is the only useful thing I have in my logs
13:56 pdurbin cdsp-rmo: huh. I don't think I've seen that error but I usually undeploy the war file, stop glassfish, remove "generated", and try again
13:56 cdsp-rmo I may have an idea
13:56 cdsp-rmo I have 2 jar for xerces
13:56 cdsp-rmo 2.6.4 and 2.8
13:57 cdsp-rmo 2.8 appeared today in my target lib dir
13:58 cdsp-rmo okay
13:59 cdsp-rmo I removed the jar for 2.6.4
13:59 cdsp-rmo rebuild and deploy
13:59 cdsp-rmo now it works
13:59 cdsp-rmo strange
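
A NoSuchMethodError at deploy time is a typical symptom of two versions of the same library sitting side by side on the classpath, which is what the two xerces jars above suggest. As a rough sketch of how such duplicates could be spotted automatically, the function below groups jar file names by artifact name and reports any artifact present in more than one version. The function name, the file-name pattern, and the example jar names are assumptions for illustration; the log only gives the versions 2.6.4 and 2.8.

```python
import re
from collections import defaultdict

# Assumes the common "<artifact>-<version>.jar" naming, where the version
# starts with a digit. Jars that do not follow this pattern are ignored.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_artifacts(filenames):
    """Map each artifact that appears in several versions to those versions."""
    versions = defaultdict(set)
    for filename in filenames:
        match = JAR_PATTERN.match(filename)
        if match:
            versions[match.group("name")].add(match.group("version"))
    return {
        name: sorted(found)
        for name, found in versions.items()
        if len(found) > 1
    }
```

With hypothetical names, `find_duplicate_artifacts(["xerces-2.6.4.jar", "xerces-2.8.jar"])` reports both versions under `"xerces"`. Passing `os.listdir(...)` of the build's lib directory would check a real build; the result is only a hint, since some builds legitimately ship similarly named jars.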
13:59 pdurbin phew
13:59 pdurbin yes, strange
14:14 cdsp-rmo ahah
14:14 cdsp-rmo just ran into your "column "uri" does not exist" problem
14:14 cdsp-rmo thanks for opening that ticket :D
14:14 cdsp-rmo I was lost
14:17 pdurbin glad it was helpful
14:19 cdsp-rmo got a question about the tsv files for datasets metadata blocks
14:19 cdsp-rmo they are supposed to only be used for psql tables ?
14:20 cdsp-rmo if so, changing directly values in psql tables could work ?
14:25 pdurbin should work but I usually start with a change of the tsv file and a new installation
14:28 cdsp-rmo ok
14:29 cdsp-rmo gonna do a clean install of my db, I still can't add multiple datas to any of my fields :D
14:29 cdsp-rmo (I mean, the "+" doesn't work)
14:29 cdsp-rmo damn
14:33 pdurbin cdsp-rmo: yes, I just left a comment about this on the issue
14:33 pdurbin please see
14:33 pdurbin cdsp-rmo: you didn't touch "Kind of Data" but the plus doesn't work there either. I have no idea why.
14:35 cdsp-rmo ok
14:36 pdurbin but the plus on "Kind of Data" works fine on our demo server
14:37 pdurbin very strange
14:38 cdsp-rmo ok, now it's getting stranger
14:38 cdsp-rmo I came back on my nesstart branch (so no changes to the tsv files)
14:38 cdsp-rmo did a clean reinstall with new database
14:38 cdsp-rmo I can't add multiple data to dataset on gui
14:39 cdsp-rmo BUT
14:40 cdsp-rmo if I try to validate, it tells me some mandatory fields are missing, for example for authors
14:40 cdsp-rmo they appear when I try to validate with mandatory fields
14:42 pdurbin cdsp-rmo: do you know about ? That's where I deploy the "develop" branch quite often. Please go try the "plus" button there. It doesn't seem to work there either. :/ The password for dataverseAdmin is "admin1" on that server.
14:43 cdsp-rmo ok
14:43 cdsp-rmo doesn't work
14:44 cdsp-rmo and if I try to add an author, for example
14:44 cdsp-rmo and I validate
14:44 cdsp-rmo it does the same thing as my env
14:44 cdsp-rmo to note, I'm in french in your phoenix
14:44 cdsp-rmo (the validation error is french, but not the field names)
14:45 cdsp-rmo it's a "french hybrid" translation
14:45 cdsp-rmo :S
14:51 pdurbin cdsp-rmo: so it's a terrible bug in "develop". How do you feel about opening an issue for this?
14:55 pdurbin Jim_: have you noticed this bug yet?
14:56 cdsp-rmo pdurbin: scared, but someone has to do it !
14:56 cdsp-rmo writing it
14:56 pdurbin thank you!!
15:04 pameyer joined #dataverse
15:06 cdsp-rmo
15:06 pameyer @poikilotherm (for when you check the logs) my initial thought about multi-stage docker build was that there wasn't an advantage over using a separate build container (which I've done, and works fine for building war/
15:06 pameyer second thought was that a multi-stage docker build for the installer might be a good thing to try
15:06 pdurbin cdsp-rmo: thanks for creating that issue!
15:09 pameyer @cdsp-rmo do you know if that bug is UI only (or API too)?
15:10 cdsp-rmo I try an import and come back to you
15:12 cdsp-rmo gui
15:12 cdsp-rmo my ddi import works for a file that worked before
15:12 cdsp-rmo (got multiple keywords, for example)
15:12 cdsp-rmo if I try to edit the file to add more fields, doesn't work
15:12 cdsp-rmo (via the gui)
15:13 pameyer @cdsp-rmo thanks, that helps narrow down where the problem is
15:13 cdsp-rmo when I click on the "+" button, the response has an empty CDATA field
15:13 cdsp-rmo usually, there is html in it
15:14 cdsp-rmo (for the field I guess)
15:14 cdsp-rmo no error in logs
15:27 Jim_ pdurbin - I haven't run into the bug yet, but will try to take a look (not sure I understand the exact recipe yet).
15:27 Jim_ I wonder if some of the notes in might help with .tsvs.
15:29 Jim_ In particular, we've seen various editors mess with the character set, so it's important to get the citation.tsv byte-for-byte correct, and once you've had an error, you'll have both the good and corrupted vocab words in the database.
15:29 Jim_ Not sure if there's also something in that doc that's not in the new guide that might relate to the bug as well...
15:32 cdsp-rmo yup, my csv editor did strange things on save (especially to some strings, adding "" in html for example), so I used a simple text editor at the end
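
To see exactly what an editor changed on save, the edited TSV can be compared with the pristine copy at the byte level. This is a minimal sketch with a hypothetical function name; it compares line by line, so it is only meaningful when no lines were added or removed.

```python
from itertools import zip_longest


def changed_lines(original: bytes, edited: bytes):
    """Return (line number, original line, edited line) for each differing line."""
    pairs = zip_longest(
        original.splitlines(keepends=True),  # keep line endings so that
        edited.splitlines(keepends=True),    # CRLF/LF changes are reported too
        fillvalue=b"",
    )
    return [
        (number, before, after)
        for number, (before, after) in enumerate(pairs, start=1)
        if before != after
    ]
```

Reading both files in binary mode, for example `changed_lines(open("citation.orig.tsv", "rb").read(), open("citation.tsv", "rb").read())` with both file names being placeholders, shows added quotes, changed line endings, or re-encoded accents as raw bytes rather than as whatever the terminal renders.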
15:33 cdsp-rmo but the data seems to be imported in the database. If someone has a "clean" psql database, maybe we could compare the data ? ("corrupted" ones and clean ones)
15:39 pameyer my rule of thumb for editing tsv files is to always start with a fresh database
15:42 pameyer @cdsp-rmo I've got a clean database if you've got queries you'd like me to check (no promises on my latency though)
15:47 cdsp-rmo an extract of datasetfieldtype table, I would say
15:48 cdsp-rmo most of the data goes there, but I have no experience on that
15:49 cdsp-rmo (on the subject, the tsv files and where the datas are used)
15:50 cdsp-rmo I don't have any time left to work on that, and tomorrow will be a long day for me
15:50 cdsp-rmo I think I will be back on the subject monday only, unfortunately
15:51 pdurbin Jim_: if you could let me know if you can reproduce it would be appreciated. (You too, pameyer.) In short, you can't add multiple authors to a dataset. Or multiple contributors, etc. The "plus" button just spins.
15:51 pdurbin Thank you again to cdsp-rmo for discovering that bug!
15:52 pdurbin If anyone has any clues on how it was introduced, please let us know.
15:55 cdsp-rmo Have to go. Good afternoon ;)
15:55 pdurbin bye!
16:27 pdurbin Thanks to pameyer and Jim for calling into the "big data" meeting:
16:28 pdurbin pameyer: nice presentation. Thanks!
16:38 pdurbin interesting to hear DVUploadClient compared to DCM
16:38 pdurbin to me they're pretty different
16:45 pdurbin pameyer: what's SP's plan for Globus. You just got some clarification?
17:18 pameyer well, they're trying to solve the same user problem
17:18 pameyer *them = DVUploadClient and DCM
17:19 pameyer I'm happy to see other folks working on the same problem
17:28 pameyer and pdurbin: yeah, SP's plan for globus makes sense to me now. seems like it could plug into the multiple storage locations / multiple access protocols framework
17:28 pameyer more straightforward than handling the impedance mismatch between dataverse storage identifiers and globus
17:47 pdurbin pameyer: I'll probably pick your brain about the globus stuff some day. Thanks.
18:48 donsizemore joined #dataverse
18:53 donsizemore @pdurbin thanks for bd78384436a14f647b6be411773ea0aee6110fc5 — my mom fell this morning so i hadn't been able to look at it yet
18:54 donsizemore @pdurbin weird that i didn't hit that in vagrant or with ec2-create
19:06 pdurbin donsizemore: no worries. It was a quick fix. And a chance to hack on some Perl. :)
19:06 pdurbin sivoais_: ^^
19:13 donsizemore @pdurbin i experienced one of those git bombs we discussed, so... i blame me. i promise it worked in vagrant and with ec2 before i submitted
19:14 pdurbin heh, no worries!
19:14 pdurbin I gotta keep my commit count up anyway. So many meetings lately.
19:15 pdurbin donsizemore: oh, I wanted to put this on your radar:
19:20 pdurbin Jim_: we have a lead on the "plus" bug so please nevermind about that
19:21 pdurbin cdsp-rmo: ^^
19:28 Jim_ OK - good. I was able to add fields, but I hadn't tried any citation.tsv updates
19:29 pdurbin something with primefaces
19:29 pdurbin deprecated methods
19:29 pdurbin we might have to switch back to a deprecated method
19:31 pdurbin donsizemore: heh, thanks for opening
20:36 pdurbin donsizemore: merged! thanks!
20:37 pdurbin Does that mean we should merge ?
20:43 donsizemore that will make dataverse-ansible work with develop, but not releases <= 4.9.4
20:43 donsizemore actually, it may just pass variables and fail to set them... not terrible. i say merge
21:22 pdurbin cool. please go ahead if you want
21:22 pdurbin and I'm sorry to hear about your mom :(
