IRC log for #dataverse, 2019-03-01

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
08:08 jri joined #dataverse
08:23 MrK joined #dataverse
10:47 mouse25 joined #dataverse
14:46 pdurbin MrK: I've been too busy to play with flyway. Sorry. Are you around next week?
14:54 donsizemore joined #dataverse
14:54 donsizemore @pdurbin knock knock
14:58 pdurbin who's there?
14:59 pdurbin My nine year old hasn't come up with any jokes lately but the other day she asked, "There's 'underwater' but do you call it 'under yogurt'?"
15:10 MrK pdurbin: Sure, yeah I will try to be in the chat while working.
15:12 pdurbin thanks!
15:18 donsizemore @pdurbin mandy wants me to help her submit some 2GB+ files in the near future to Harvard Dataverse via file upload API. just want to make sure that's Kosher.
15:21 pdurbin donsizemore: you already have a 3.1 GB file uploaded: https://dataverse.unc.edu/dataverse/unc?q=fileSizeInBytes%3A%5B3210987654+TO+*%5D . How did you get it in there? :) Upload via API should work. It's supported by both SWORD and Native. See also https://github.com/IQSS/dataverse/issues/4439
15:22 pameyer joined #dataverse
15:29 donsizemore @pdurbin we're uploading these to Harvard (AJPS). on Wednesday we tested a 4GB file to a test server here in 4.11 and it failed after bogging down Glassfish for ~7 minutes. just didn't want to take your head node out without a heads up / permission
15:30 pdurbin Oh! Duh. Harvard Dataverse, like you said. Sorry. I missed that.
15:31 pameyer api or web?
15:31 pdurbin donsizemore: you should probably email support@dataverse.harvard.edu to give a heads up, especially if you're worried about breaking things (or slowing things down)
15:32 pdurbin Usually what happens is that stuff is slow or broken and *then* we notice that someone has emailed in saying, "I'm having trouble uploading huge files or lots of files". :)
15:32 pdurbin So having a heads up first would be a refreshing change. :)
15:33 pdurbin pameyer: from above, it sounds like donsizemore's plan is API
15:33 pdurbin but my reading comprehension isn't so good this morning
15:33 * pdurbin looks for more coffee
15:33 pameyer so not just me ;)
15:37 pdurbin :)
15:38 pdurbin some fresh fruit out there too. so glad we switched away from the dangerous candy bowl
15:55 donsizemore @pdurbin @pameyer also I quite happily made my Irving House reservations for June =)
15:56 pdurbin super close by, great!
15:57 pdurbin Did you propose a talk?
16:00 donsizemore @pdurbin we want to do i think three (GDCC, TRSA, Andrey's job interview project which could be useful to many people) and probably CORE2 as well. jon's handling the proposals.
16:03 pdurbin cool
16:03 donsizemore @pdurbin p.s. i was wrong. two of those files are 10GB and 13GB each
16:04 pdurbin hmm, did you email support yet?
16:04 pdurbin please hammer don't hurt em
16:07 pameyer this might be speeding up the slow step; but I'm wondering if scp'ing the files to aws and doing a curl upload from there to dataverse would be any better than usual
16:14 pdurbin not a bad idea but that would be outside donsizemore's control
16:14 pdurbin unless you're thinking he could at least get the files in the right region
16:16 pameyer pretty much that - I'd thought @donsizemore had aws access on the same region
16:16 pameyer won't help glassfish though
16:16 donsizemore @pdurbin @pameyer i'm composing my support message now. just wanted to do this the nicest way possible, particularly if you all can let me slip the files straight into AWS and wedge in the metadata somehow
16:17 donsizemore @pameyer exactly. during my testing of .txt files containing zeroes (i thought the ingest mechanism didn't kick in for .txt extensions? akio is on vacation...) a 1GB file didn't upload very well via API, a 4GB file failed via web interface, and during upload Glassfish appeared to be the bottleneck (which took out the web interface)
16:17 pameyer I'd been thinking of one of the dataverse-ansible test vms and a tmux session - but will happily defer to the folks at support who know more than me
16:17 donsizemore i'm just glad these files are going into AJPS and not UNC =)
16:19 pameyer @donsizemore as far as I know, ingest shouldn't kick in for txt - but I don't know much about ingest
16:20 donsizemore something really, really bogged down glassfish, and stracing the process showed it reading/writing zeroes (surrogate copies)
16:21 pameyer could you tell if it was the write to $tmp or moving it from $tmp to the final location?
16:21 donsizemore @pameyer dunno but i can try again and look there this time
16:22 pameyer @donsizemore may not be worth it if the goal is to get those files in (vs fixing the open problem)
16:22 pameyer I haven't looked into the glassfish file handling in quite a while
16:49 pdurbin pameyer: lucky
17:36 andrewSC joined #dataverse
18:17 donsizemore joined #dataverse
20:37 pdurbin I just created this issue: "Increasingly slow feedback loop for developers, increasing large WAR files" https://github.com/IQSS/dataverse/issues/5593
20:43 pameyer pdurbin: has anyone tried moving the deps out and seeing if it does anything?
20:44 pameyer that reminded me to take a quick look; for 4.10.1 it looks like 180M dependencies from a 183M war
21:28 pdurbin wow
21:29 pdurbin I hadn't thought of that. You're saying you can just unzip the war file and rm the deps, right? The jars that aren't our code?
21:32 pameyer yeah
21:32 pameyer unscientifically, that looks like ~50% speedup on deployment
21:34 pameyer at least for develop-e707a22cf in my hands
21:36 pdurbin it still deploys?!? with the deps gone? interesting
21:37 pameyer I dumped the deps into glassfish
21:37 pameyer same place we're putting the postgres jdbc jar
21:38 pameyer wouldn't be surprised if disabling the db deploy stuff could speed things up more; but didn't try it (or try to measure it)
21:43 pdurbin oh, so a "hollow WAR" experiment, I guess. interesting
21:49 pameyer without knowing what a "hollow WAR" actually is, sure
21:49 pameyer it did slim down to ~4M
22:14 pdurbin just something I heard of
22:21 pdurbin Uh oh, I might be using the term wrong. Anyway, thanks for investigating.
22:22 pameyer no problem - multitasking
22:35 pdurbin I just opened this issue: https://github.com/jamesfalkner/java-packaging-demo/issues/1
22:41 pameyer more likely it's a term I'm not familiar with
23:02 icarito[m] joined #dataverse
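
Editor's note on the WAR size check at 20:44: pameyer's figure (about 180M of dependencies in a 183M WAR) can be approximated with a short script. This is a minimal sketch, not the method used in the chat. It assumes that bundled dependencies are the .jar entries under WEB-INF/lib/, and it reports compressed sizes as stored in the archive. The function name war_dependency_share is hypothetical.

```python
import zipfile


def war_dependency_share(war):
    """Return (dependency_bytes, total_bytes) for a WAR archive.

    `war` may be a path or a binary file-like object.
    Assumption: bundled dependencies are the .jar entries under WEB-INF/lib/.
    Sizes are the compressed sizes recorded in the archive.
    """
    dep_bytes = 0
    total_bytes = 0
    with zipfile.ZipFile(war) as archive:
        for info in archive.infolist():
            total_bytes += info.compress_size
            name = info.filename
            if name.startswith("WEB-INF/lib/") and name.endswith(".jar"):
                dep_bytes += info.compress_size
    return dep_bytes, total_bytes
```

Calling war_dependency_share on a built WAR and dividing the first value by the second gives the fraction of the archive taken up by bundled jars. Whether those jars can be moved out of the WAR, as in the experiment at 21:37, depends on the application server setup.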

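Editor's note on the API upload discussed from 15:18 onward: the sketch below assembles a curl command for the Dataverse native API "add a file to a dataset" endpoint. It only builds the argument list and does not run anything. The server URL, DOI, token, and file name in the usage note are placeholders, and build_upload_command is a hypothetical helper, not part of Dataverse. It does nothing to address the Glassfish slowdown reported for multi-gigabyte files.

```python
import json


def build_upload_command(server_url, persistent_id, api_token, file_path,
                         description=""):
    """Build a curl argument list that adds one file to a dataset.

    Uses the native API route /api/datasets/:persistentId/add with the
    API token passed in the X-Dataverse-key header. All argument values
    are supplied by the caller; none are real credentials.
    """
    url = (server_url.rstrip("/")
           + "/api/datasets/:persistentId/add?persistentId="
           + persistent_id)
    # Optional file metadata travels in the jsonData form field.
    json_data = json.dumps({"description": description})
    return [
        "curl",
        "-H", "X-Dataverse-key:" + api_token,
        "-X", "POST",
        "-F", "file=@" + file_path,
        "-F", "jsonData=" + json_data,
        url,
    ]
```

For example, build_upload_command("https://demo.example.org", "doi:10.5072/FK2/EXAMPLE", "TOKEN", "big.zip") returns a list that could be passed to subprocess.run once the placeholders are replaced. As suggested in the chat, emailing support before sending very large files is still the polite first step.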