Time
S
Nick
Message
08:08
jri joined #dataverse
08:23
MrK joined #dataverse
10:47
mouse25 joined #dataverse
14:46
pdurbin
MrK: I've been too busy to play with flyway. Sorry. Are you around next week?
14:54
donsizemore joined #dataverse
14:54
donsizemore
@pdurbin knock knock
14:58
pdurbin
who's there?
14:59
pdurbin
My nine year old hasn't come up with any jokes lately but the other day she asked, "There's 'underwater' but do you call it 'under yogurt'?"
15:10
MrK
pdurbin: Sure, yeah I will try to be in the chat while working.
15:12
pdurbin
thanks!
15:18
donsizemore
@pdurbin mandy wants me to help her submit some 2GB+ files in the near future to Harvard Dataverse via file upload API. just want to make sure that's Kosher.
15:21
pdurbin
donsizemore: you already have a 3.1 GB file uploaded: https://dataverse.unc.edu/dataverse/unc?q=fileSizeInBytes%3A%5B3210987654+TO+*%5D . How did you get it in there? :) Upload via API should work. It's supported by both SWORD and Native. See also https://github.com/IQSS/dataverse/issues/4439
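For readers following along, a minimal sketch of what an upload through the native API could look like. It only assembles a curl command for the "add file to a dataset" endpoint as described in the Dataverse API guide; the server URL, DOI, file name, and token in the usage note are placeholders, and the helper name `build_add_file_command` is made up for illustration. Whether multi-GB files get through depends on the installation, as the rest of this conversation shows.

```python
import json
from urllib.parse import quote


def build_add_file_command(server_url, persistent_id, file_path, api_token, description=""):
    """Build a curl argv list for the native API 'add file' endpoint.

    Hypothetical helper: it does not contact any server, it only
    assembles the command so it can be inspected before running.
    """
    # Keep ':' and '/' readable in the DOI, escape anything else.
    pid = quote(persistent_id, safe=":/")
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/add?persistentId={pid}"
    # Optional file-level metadata travels in the jsonData form field.
    json_data = json.dumps({"description": description})
    return [
        "curl",
        "-H", f"X-Dataverse-key:{api_token}",  # API token header
        "-X", "POST",
        "-F", f"file=@{file_path}",            # curl reads the file from disk
        "-F", f"jsonData={json_data}",
        url,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, for example with `build_add_file_command("https://demo.dataverse.org", "doi:10.5072/FK2/EXAMPLE", "big.bin", token)`, where every argument is a stand-in value.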
15:22
pameyer joined #dataverse
15:29
donsizemore
@pdurbin we're uploading these to Harvard (AJPS). on Wednesday we tested a 4GB file to a test server here in 4.11 and it failed after bogging down Glassfish for ~7 minutes. just didn't want to take your head node out without a heads up / permission
15:30
pdurbin
Oh! Duh. Harvard Dataverse, like you said. Sorry. I missed that.
15:31
pameyer
api or web?
15:31
pdurbin
donsizemore: you should probably email support@dataverse.harvard.edu to give a heads up, especially if you're worried about breaking things (or slowing things down)
15:32
pdurbin
Usually what happens is that stuff is slow or broken and *then* we notice that someone has emailed in saying, "I'm having trouble uploading huge files or lots of files". :)
15:32
pdurbin
So having a heads up first would be a refreshing change. :)
15:33
pdurbin
pameyer: from above, it sounds like donsizemore's plan is API
15:33
pdurbin
but my reading comprehension isn't so good this morning
15:33
* pdurbin
looks for more coffee
15:33
pameyer
so not just me ;)
15:37
pdurbin
:)
15:38
pdurbin
some fresh fruit out there too. so glad we switched away from the dangerous candy bowl
15:55
donsizemore
@pdurbin @pameyer also I quite happily made my Irving House reservations for June =)
15:56
pdurbin
super close by, great!
15:57
pdurbin
Did you propose a talk?
16:00
donsizemore
@pdurbin we want to do i think three (GDCC, TRSA, Andrey's job interview project which could be useful to many people) and probably CORE2 as well. jon's handling the proposals.
16:03
pdurbin
cool
16:03
donsizemore
@pdurbin p.s. i was wrong. two of those files are 10GB and 13GB each
16:04
pdurbin
hmm, did you email support yet?
16:04
pdurbin
please hammer don't hurt em
16:07
pameyer
this might be speeding up the slow step; but I'm wondering if scp'ing the files to aws and doing a curl upload from there to dataverse would be any better than usual
16:14
pdurbin
not a bad idea but that would be outside donsizemore's control
16:14
pdurbin
unless you're thinking he could at least get the files in the right region
16:16
pameyer
pretty much that - I'd thought @donsizemore had aws access on the same region
16:16
pameyer
won't help glassfish though
16:16
donsizemore
@pdurbin @pameyer i'm composing my support message now. just wanted to do this the nicest way possible, particularly if you all can let me slip the files straight into AWS and wedge in the metadata somehow
16:17
donsizemore
@pameyer exactly. during my testing of .txt files containing zeroes (i thought the ingest mechanism didn't kick in for .txt extensions? akio is on vacation...) a 1GB file didn't upload very well via API, a 4GB file failed via web interface, and during upload Glassfish appeared to be the bottleneck (which took out the web interface)
16:17
pameyer
I'd been thinking of one of the dataverse-ansible test vms and a tmux session - but will happily defer to the folks at support who know more than me
16:17
donsizemore
i'm just glad these files are going into AJPS and not UNC =)
16:19
pameyer
@donsizemore as far as I know, ingest shouldn't kick in for txt - but I don't know much about ingest
16:20
donsizemore
something really, really bogged down glassfish, and stracing the process showed it reading/writing zeroes (surrogate copies)
16:21
pameyer
could you tell if it was the write to $tmp or moving it from $tmp to the final location?
16:21
donsizemore
@pameyer dunno but i can try again and look there this time
16:22
pameyer
@donsizemore may not be worth it if the goal is to get those files in (vs fixing the open problem)
16:22
pameyer
I haven't looked into the glassfish file handling in quite a while
16:49
pdurbin
pameyer: lucky
17:36
andrewSC joined #dataverse
18:17
donsizemore joined #dataverse
20:37
pdurbin
I just created this issue: "Increasingly slow feedback loop for developers, increasing large WAR files" https://github.com/IQSS/dataverse/issues/5593
20:43
pameyer
pdurbin: has anyone tried moving the deps out and seeing if it does anything?
20:44
pameyer
that reminded me to take a quick look; for 4.10.1 it looks like 180M dependencies from a 183M war
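A rough way to reproduce that kind of measurement, as a sketch only: it assumes the bundled dependencies are the `.jar` entries under `WEB-INF/lib/` inside the WAR, and it sums compressed sizes so the total is comparable to the WAR's size on disk. The function name `war_dependency_sizes` is made up for illustration and is not how the numbers above were obtained.

```python
import zipfile


def war_dependency_sizes(war_path):
    """Return (dependency_bytes, total_bytes) for a WAR file.

    Sizes are the compressed sizes stored in the archive, so the
    total roughly tracks the size of the WAR on disk.
    """
    deps = 0
    total = 0
    with zipfile.ZipFile(war_path) as war:
        for info in war.infolist():
            total += info.compress_size
            # Assumption: bundled third-party jars live in WEB-INF/lib/.
            if info.filename.startswith("WEB-INF/lib/") and info.filename.endswith(".jar"):
                deps += info.compress_size
    return deps, total
```

Calling it on a built WAR and dividing the two numbers gives the share of the archive taken up by dependencies.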
21:28
pdurbin
wow
21:29
pdurbin
I hadn't thought of that. You're saying you can just unzip the war file and rm the deps, right? The jars that aren't our code?
21:32
pameyer
yeah
21:32
pameyer
unscientifically, that looks like ~50% speedup on deployment
21:34
pameyer
at least for develop-e707a22cf in my hands
21:36
pdurbin
it still deploys?!? with the deps gone? interesting
21:37
pameyer
I dumped the deps into glassfish
21:37
pameyer
same place we're putting the postgres jdbc jar
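The experiment described here can be sketched roughly as follows. This is an illustration under assumptions, not the exact steps that were run: it treats every `.jar` under `WEB-INF/lib/` as a dependency, writes a slimmed copy of the WAR without them, and drops the jars into a directory of your choosing (`lib_dir` stands in for wherever the app server picks up shared libraries). The name `hollow_out_war` is hypothetical, and moving jars from the WAR to a server-level directory can change class loading, so treat the result as something to test rather than a drop-in replacement.

```python
import os
import zipfile


def hollow_out_war(war_path, slim_war_path, lib_dir):
    """Copy a WAR without its WEB-INF/lib jars; extract those jars to lib_dir.

    Returns the list of jar file names that were moved out.
    """
    os.makedirs(lib_dir, exist_ok=True)
    moved = []
    with zipfile.ZipFile(war_path) as src, \
            zipfile.ZipFile(slim_war_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            name = info.filename
            if name.startswith("WEB-INF/lib/") and name.endswith(".jar"):
                # Dependency jar: write it next to the server's other libraries.
                jar_name = os.path.basename(name)
                with open(os.path.join(lib_dir, jar_name), "wb") as out:
                    out.write(src.read(info))
                moved.append(jar_name)
            else:
                # Application code and descriptors stay in the slim WAR.
                dst.writestr(info, src.read(info))
    return moved
```

Each entry is read fully into memory, which is fine for a quick experiment but worth streaming for very large jars.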
21:38
pameyer
wouldn't be surprised if disabling the db deploy stuff could speed things up more; but didn't try it (or try to measure it)
21:43
pdurbin
oh, so a "hollow WAR" experiment, I guess. interesting
21:49
pameyer
without knowing what a "hollow WAR" actually is, sure
21:49
pameyer
it did slim down to ~4M
22:14
pdurbin
just something I heard of
22:21
pdurbin
Uh oh, I might be using the term wrong. Anyway, thanks for investigating.
22:22
pameyer
no problem - multitasking
22:35
pdurbin
I just opened this issue: https://github.com/jamesfalkner/java-packaging-demo/issues/1
22:41
pameyer
more likely it's a term I'm not familiar with
23:02
icarito[m] joined #dataverse