IRC log for #dataverse, 2019-01-09

Connect to discuss Dataverse (an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
00:26 jri joined #dataverse
01:40 isullivan joined #dataverse
02:42 jri joined #dataverse
05:26 jri joined #dataverse
07:56 jri joined #dataverse
12:57 donsizemore joined #dataverse
13:11 isullivan joined #dataverse
14:15 pdurbin donsizemore: morning. Counter Processor made their first release.
14:16 donsizemore woo woo! i'm just running hopefully my final test run before committing
14:17 pdurbin wow, you're fast
14:18 donsizemore it looks like the release is the commit you were already using
14:18 donsizemore i would've been faster except Meetings(tm)
14:22 pdurbin yeah, same commit but we should use the release
15:23 donsizemore @pdurbin
15:26 pmauduit oh there also is an ansible role \o/
15:28 pameyer joined #dataverse
15:30 donsizemore @pmauduit pull requests welcome =)
15:42 pmauduit i'll first try it :)
16:19 pmauduit donsizemore: btw, it seems working in vagrant at first try, the role does not converge (i.e. got error trying to relaunch a vagrant provision), but I've got a UI on 8080 of the vagrant machine
16:19 pmauduit I've got to figure out the default admin password now ;)
16:44 pameyer pmauduit: admin password for the application?
16:57 pdurbin pmauduit: does this help?
17:56 jri joined #dataverse
18:00 donsizemore joined #dataverse
18:01 donsizemore @pmauduit yeah, the dataverse installer itself isn't idempotent, so i've made no effort to make the ansible role idempotent
18:05 pdurbin donsizemore: Python is broken on my Mac but I'd love to try out your new counter processor ansible thing.
18:12 pdurbin pameyer: hmm, maybe I should ask you for Python help. :)
18:14 donsizemore @pdurbin have you tried installing a newer version through ports or brew?
18:15 drew-jhu joined #dataverse
18:24 donsizemore @pdurbin also i don't find a newer binary installer but you could try
18:25 pameyer lately I've been using homebrew - one python 2.7*, one python3*, and a mess of virtualenvs
18:26 pameyer I used to do source installs of different pythons, but that was before getting comfortable with virtualenvs
18:33 drew-jhu @pdurbin i've learned some things about the DataCite test environment that differ from EZID. I think it's worth documenting, possibly along with info about the FAKE DOI provider. Am wondering where a good place might be
18:36 pameyer @donsizemore just out of curiosity, are you using rsyslog with dataverse?
18:49 pdurbin drew-jhu: maybe somewhere under ?
18:50 pdurbin donsizemore: I consulted with pameyer and I think I'm going to try to bring at least one of my installations of Python that I got from homebrew back to life. Python is hard. :(
18:50 donsizemore @pameyer kind of... though all i see on the syslog server are shibboleth warnings
18:51 donsizemore @pdurbin python is brittle. PERL, PERL, PERL!
18:51 pdurbin donsizemore: oh, did you hear pameyer is into Shibboleth suddenly? ;)
18:52 donsizemore @pdurbin my consultation fee is a hot chocolate
18:52 pdurbin donsizemore: these days Go feels less brittle. I recently switched a static site from Jekyll to Hugo.
18:56 pameyer thanks @donsizemore .
18:57 pameyer and yeah, shib's been on my radar lately
19:01 pdurbin pameyer: I just hacked around on /usr/local/bin/ and now I'm back in business, Python wise. Python 2 anyway.
19:02 drew-jhu @pdurbin: that looks perfect. I'll work on a PR soon. spoiler: 10.5072 is no longer the universal testing prefix
20:04 donsizemore joined #dataverse
20:32 pdurbin wat. the plot thickens
20:33 pameyer @drew-jhu good way to build suspense ;) now I'm wondering if somebody's using 10.5072 for production, or if different test accounts have different test prefixes
20:37 drew-jhu PR in the works, but each DataCite test account will have its own prefix(s). I guess somebody will wind up with 10.5072 (just for testing, i hope), but we won't be sharing it anymore
20:38 drew-jhu I asked DataCite about it, and apparently some of their long-time users use the same prefixes for their test accounts that they do for prod (though that isn't required)
20:39 pameyer good to know.
20:39 pameyer I'd assumed that test accounts and test prefixes would be completely isolated from production ones; but it sounds like that's not the case
20:47 drew-jhu @pameyer they are isolated to the test infrastructure, but they are no longer guaranteed to have distinct DOIs, if that makes sense
20:47 drew-jhu hopefully this helps
20:50 pdurbin drew-jhu: it helps but can you please change the title and/or description of to assert that 10.5072 is not a test prefix? That seems to be what you're saying.
20:56 drew-jhu @pdurbin: better?
20:56 pdurbin yep, thanks
20:57 pdurbin three of our developers are going to in a couple weeks so maybe they can get some answers on this
20:57 pdurbin or further clarification
20:59 drew-jhu i think one's experience will depend somewhat on whether one deals with DataCite directly, or mediated through the GDCC
21:00 drew-jhu though i didn't elaborate on that point in the docs
21:05 pdurbin might be worth adding
21:05 pdurbin I don't know.
21:05 pdurbin I moved it into code review at least.
21:08 drew-jhu i considered it, but thought the value of that extra info was not worth the cost of the complexity. if *all* dataverse-using institutions were GDCC members, it might be different
21:10 drew-jhu can always iterate on it later, if it proves worthwhile
21:19 drew-jhu left #dataverse
21:21 pdurbin sure
21:58 pameyer pdurbin: looks like travis artifacts aren't available from PR builds
22:11 Mahsa joined #dataverse
22:12 pdurbin pameyer: right. You have to put them somewhere.
22:19 Mahsa Hi, we seem to have a problem when uploading files larger than 100 MB. The upload progress bar completes, but the files do not move from the top window to the bottom window (where the uploaded files usually appear).
22:20 pameyer Hi Mahsa - this reminds me of a recent github issue
22:20 Mahsa :MaxFileUploadSizeInBytes has been set to 2 GB.
22:20 pameyer does look like what you're seeing?
22:20 Mahsa And I have made sure the http apache settings don't time out.
22:22 Mahsa Thanks for the link, our files are much smaller than this though, >100MB <200MB .
22:22 Mahsa So is it yet known what is causing the issue?
22:23 pameyer it's not known to me
22:24 pameyer are these files that would go through ingest?
22:24 Mahsa no
22:24 Mahsa They are PDF files
22:24 Mahsa around 150MB in size
22:25 Mahsa In our case, any large file bigger than 100MB does not get uploaded
22:25 pameyer just so I'm understanding correctly, do smaller files go through consistently?
22:25 pameyer and do large files fail consistently?
22:26 Mahsa yes, they do
22:26 Mahsa that is correct
22:26 pameyer it may be something you've checked already, but does the system have enough free space in the temporary directory?
22:26 Mahsa we're storing our files on AWS S3 in case it helps and running Dataverse 4.9.4.
22:27 pameyer it does help to know both of those
22:27 Mahsa One question so I understand a bit better
22:28 Mahsa The upload happens in the top window and the progress bar completes,
22:28 Mahsa could it be the flushing functions that do not work correctly ?
22:29 pameyer that may depend on which flushing you're referring to.
22:29 pameyer when the dataset page is reloaded, does it show the files?
22:29 jri joined #dataverse
22:30 Mahsa no, they are not uploaded at all. The upload progress bar completes, but the files are never moved to the bottom window.
22:33 Mahsa These are the params I double checked: :MaxFileUploadSizeInBytes, :TabularIngestSizeLimit. And I set the AJP proxy pass timeout to 600.
22:34 Mahsa Is there any other place you can think of so I can check to make sure there are no config issues
22:35 Mahsa and I do not see any errors in glassfish logs.
22:36 pameyer those are the configuration items I'd check
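
The size limit discussed above can be ruled in or out with a quick comparison. This is a minimal Python sketch, assuming the value of :MaxFileUploadSizeInBytes is available as a decimal string of bytes; the helper name `exceeds_upload_limit` is hypothetical and not part of Dataverse.

```python
def exceeds_upload_limit(file_size_bytes, max_upload_setting):
    """Return True if a file is larger than the configured upload limit.

    max_upload_setting is the raw value of :MaxFileUploadSizeInBytes,
    assumed here to be a decimal string of bytes. None or an empty
    string is treated as "no limit configured".
    """
    if not max_upload_setting:
        return False
    return file_size_bytes > int(max_upload_setting)


# Values taken from the conversation: a 2 GB limit and a PDF of about 150 MB.
two_gb_setting = str(2 * 1024 ** 3)
pdf_size = 150 * 1024 ** 2
print(exceeds_upload_limit(pdf_size, two_gb_setting))
```

With the numbers from the conversation the file is well under the limit, which is consistent with looking elsewhere for the cause.
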
22:37 pameyer I'm not very familiar with using S3; but one additional place I would check would be the glassfish temporary directory
22:37 pameyer /usr/local/glassfish4/glassfish/domains/domain1/files/temp , or the equivalent on your system
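
The free-space check suggested above could be scripted roughly as follows. This is a sketch rather than a Dataverse tool: the function name `has_room_for_upload` and the `safety_factor` headroom multiplier are assumptions made for illustration, and the path is the default one quoted in the conversation.

```python
import os
import shutil


def has_room_for_upload(temp_dir, upload_size_bytes, safety_factor=2.0):
    """Return True if temp_dir's filesystem has room for the upload.

    safety_factor is an assumed headroom multiplier, in case more than
    one copy of the file exists on disk while the upload is processed.
    """
    usage = shutil.disk_usage(temp_dir)  # named tuple: total, used, free
    return usage.free >= upload_size_bytes * safety_factor


if __name__ == "__main__":
    # Default location from the conversation; adjust for your system.
    temp_dir = "/usr/local/glassfish4/glassfish/domains/domain1/files/temp"
    if os.path.isdir(temp_dir):
        print(has_room_for_upload(temp_dir, 150 * 1024 ** 2))
    else:
        print("temp directory not found:", temp_dir)
```
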
22:38 Mahsa Ok, I will just check it out.
22:39 Mahsa So is this the temp folder where files are uploaded to before moving to S3 you think ?
22:39 pameyer I'm curious if there's a possible problem writing files to that location; or moving files from there to S3
22:40 Mahsa That is a great point to start troubleshooting.
22:40 Mahsa I will start from here. Thanks a lot
22:40 pameyer You're welcome.
22:41 Mahsa one last Q. , how long are the files kept here? Once they are moved to S3, are they trashed?
22:41 pameyer I believe that once the files are moved to S3, they should be removed from that directory
22:42 Mahsa Ok, good, I can check that. Thanks.
22:42 pdurbin Mahsa: have you tried uploading files via API? Or via the new DVUploader?
22:42 pdurbin (which uses the API)
22:42 pameyer that may not occur until the dataset is saved however
22:42 Mahsa no, I have not. Actually, that is also sth I should test.
22:42 Mahsa I can test it on an existing dataset
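
For the API test mentioned above, a small Python sketch can assemble a curl command for adding a file to an existing dataset. It assumes the native API's add-file endpoint (`/api/datasets/:persistentId/add`) and the `X-Dataverse-key` header; the server URL, DOI, token, and the helper name `build_add_file_command` are placeholders invented for illustration, so check the API guide for your version before relying on it.

```python
import shlex
from urllib.parse import urlencode


def build_add_file_command(server_url, persistent_id, api_token, file_path):
    """Build a curl argument list that POSTs one file to a dataset.

    Nothing is executed here; the list can be inspected first and then
    passed to subprocess.run() once it looks right.
    """
    query = urlencode({"persistentId": persistent_id})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/add?{query}"
    return [
        "curl",
        "-H", f"X-Dataverse-key:{api_token}",
        "-X", "POST",
        "-F", f"file=@{file_path}",
        url,
    ]


# Placeholder values only; none of these come from the conversation.
command = build_add_file_command(
    "https://dataverse.example.edu",
    "doi:10.1234/FK2/EXAMPLE",
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "large-file.pdf",
)
print(shlex.join(command))
```

If a 150 MB file uploads through the API but not through the web page, that narrows the problem to the browser upload path rather than storage.
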
22:42 pdurbin Mahsa: here's a handy link for you:
22:43 Mahsa This is very useful! Thanks a lot.
22:43 pdurbin sure, you also might want to email the google group and/or
22:44 pameyer for what it's worth, after running ITs in docker-aio there are some lingering files in the temp directory
22:44 pdurbin the google group is!forum/dataverse-community
22:44 pameyer that may or may not be related to this, however
22:44 pdurbin pameyer: sounds related to
22:46 pameyer pdurbin: might be related
22:46 Mahsa Thanks, this is also helpful.
22:46 Mahsa Thanks a lot folks. Now I have so much to test to get to the bottom of it. :-)
22:48 pdurbin Sure. Good luck.

