IRC log for #dataverse, 2018-04-30

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
01:38 jri joined #dataverse
04:38 jri joined #dataverse
10:51 pdurbin joined #dataverse
13:42 pameyer joined #dataverse
13:52 donsizemore joined #dataverse
13:59 donsizemore @pdurbin knock knock
13:59 pdurbin Who's there?
14:00 pdurbin icarito[m] knows I was telling jokes in #publiclab on oftc over the weekend :)
14:01 * donsizemore runs to go look up knock-knock jokes
14:03 donsizemore so, if i wanted to get hold of someone with access to your production web node filesystem, am i allowed to do that?
14:03 pameyer now I'm wondering about port-knocking for remote logins, and procedural generation of knock-knock jokes
14:04 donsizemore (is that Kevin now that you're on AWS?)
14:04 pdurbin donsizemore: this is a good question for support@dataverse.harvard.edu
14:04 pdurbin maybe pameyer will give you access to his server ;)
14:05 donsizemore oh, i have a ticket open
14:05 pdurbin oh! what's the number please?
14:05 pameyer I don't have anything running on AWS at the moment
14:06 donsizemore but we've opened, and are holding open, ~1600 export_* file descriptors over the weekend. i've got my eye on not having to restart dataverse every so often because systemd doesn't seem to want to set our max ulimit above 16384
14:06 donsizemore @pdurbin #261403
14:07 donsizemore @pameyer how many export_ files do you see in /proc/<gfish pid>/fd ?
14:07 pdurbin donsizemore: thanks, this ticket came in over the weekend. I can mention it at standup in an hour.
14:10 pameyer donsizemore: I'm seeing ~600-700; but that may be traffic dependent
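The check discussed above (counting export_* entries under /proc/<gfish pid>/fd) can be sketched in Java as follows. This is a minimal sketch and not part of Dataverse: the class name FdCounter and the method countMatching are hypothetical, and it assumes a Linux-style /proc layout where each entry in the fd directory is a symbolic link to the open file, readable by the user running the check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Dataverse or Glassfish.
class FdCounter {

    // Counts symlinks in fdDir whose target path contains the given text.
    // On Linux, /proc/<pid>/fd holds one symlink per open file descriptor.
    static long countMatching(Path fdDir, String needle) throws IOException {
        try (Stream<Path> entries = Files.list(fdDir)) {
            return entries
                .filter(Files::isSymbolicLink)
                .filter(p -> {
                    try {
                        return Files.readSymbolicLink(p).toString().contains(needle);
                    } catch (IOException e) {
                        // The descriptor may have been closed while we were looking.
                        return false;
                    }
                })
                .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java FdCounter <pid>
        Path fdDir = Paths.get("/proc", args[0], "fd");
        System.out.println(countMatching(fdDir, "export_") + " export_ descriptors open");
    }
}
```

Running it periodically against the Glassfish pid and comparing the counts would show whether the number of held descriptors keeps growing between restarts.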
14:10 donsizemore @pdurbin the files appear to be well-formed, as best i can tell. akio is comparing changes to the harvesting and ingest code between 4.7.1 (april 17th) and 4.8.6 (april 27th)
14:10 donsizemore @pameyer i'm not above a cronjob to restart the service every so often, i ain't too proud to beg
14:11 donsizemore @pameyer 600-700 total? what version of dataverse are you running?
14:11 pameyer I'm suspicious about resource leaks - do these look like OAI exports
14:11 pdurbin A cron job to restart Glassfish? We have a script that bounces Glassfish as necessary.
14:12 pameyer donsizemore: our currently deployed fork is based off 4.8.1, but it's got a mix of 4.8.3 and fork stuff in it too
14:12 pameyer if I'm remembering right, you're on 4.8.6 - so what I'm seeing might not be too informative
14:13 donsizemore they're export_oai_schema and export_oai_dc mostly. we've got some held open from Friday when I bounced Glassfish
14:14 donsizemore @pameyer we open around 600 FDs just launching Glassfish, so you sound normal
14:19 donsizemore @pdurbin note that our last harvesting client run (against Harvard) still says Last Run: Sun Apr 29 03:00:00 EDT 2018 INPROGRESS so that may be related. but harvesting always got stuck before without us falling over
14:19 pameyer donsizemore: yeah, both of those sound like harvesting related
14:23 donsizemore @pameyer however, on our production-test server (same version, production DB dump) the harvesting settings are the same and there are no export files lying around in /proc
14:25 pameyer @donsizemore: clone of the production VM (OS version, packages, etc), right?
14:26 donsizemore @pameyer eh, parallel stand-up
14:27 pameyer np
14:27 donsizemore @pameyer but not a VM clone, which would be a great test
14:29 pameyer @donsizemore: I'm wondering about possible minor version differences - but that might be because I'm scratching my head, and don't have any great ideas
14:30 donsizemore @pameyer they're both RHEL7, PG93, they did recently pick up the 1.8.0_171 JDK tho
14:32 pameyer anything interesting in the export_* or oaiSetsUpdate_* logs from glassfish?
14:35 donsizemore i'm just correlating timestamps. a bunch of them popped up at 0942 today
14:39 donsizemore @pdurbin akio found a suspicious bit of code not limited to harvesting. any time a dataset is updated AND the storage is local rather than remote, there's an open without a close
14:43 pameyer @donsizemore I think there may be a bunch of those
14:43 pameyer :(
14:44 pameyer ExportService?
14:44 donsizemore https://github.com/IQSS/dataverse/blob/6dd1fcb5a5e3755a40eedd3d6755e9ce2917f57d/src/main/java/edu/harvard/iq/dataverse/export/ExportService.java
14:46 pdurbin donsizemore: so some sort of leak? leaking resources? file descriptors?
14:51 donsizemore @pdurbin if i followed him correctly, it tries to open one outputstream for swift, then if (!tempFileRequired) it opens a different outputstream and closes that one, but then doesn't hit the else { stanza to close the first one
14:51 donsizemore @pdurbin smarter e-mail from akio forthcoming
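The pattern described above (one stream opened up front, a second opened and closed when !tempFileRequired, and the first closed only in the else branch) can be illustrated with a small Java sketch. This is not the actual ExportService code: the class, the method names, and the TrackedStream counter are hypothetical stand-ins, and the fix shown (try-with-resources) is one common way to close streams on every path, not necessarily the change that was made in Dataverse.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the actual ExportService code.
class ExportLeakSketch {

    // Number of streams opened but not yet closed.
    static final AtomicInteger openStreams = new AtomicInteger();

    // Stand-in for a file or object-store stream that tracks open/close.
    static class TrackedStream extends ByteArrayOutputStream {
        private boolean closed = false;

        TrackedStream() {
            openStreams.incrementAndGet();
        }

        @Override
        public void close() throws IOException {
            if (!closed) {
                closed = true;
                openStreams.decrementAndGet();
            }
            super.close();
        }
    }

    // Leaky shape: the first stream is closed only in the else branch.
    static void exportLeaky(boolean tempFileRequired) throws IOException {
        OutputStream first = new TrackedStream();
        if (!tempFileRequired) {
            OutputStream second = new TrackedStream();
            second.write(1);
            second.close();
            // 'first' is never closed on this path.
        } else {
            first.write(1);
            first.close();
        }
    }

    // Fixed shape: try-with-resources closes each stream on every path.
    static void exportFixed(boolean tempFileRequired) throws IOException {
        try (OutputStream first = new TrackedStream()) {
            if (!tempFileRequired) {
                try (OutputStream second = new TrackedStream()) {
                    second.write(1);
                }
            } else {
                first.write(1);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        exportLeaky(false);
        System.out.println("open after leaky call: " + openStreams.get());
        openStreams.set(0);
        exportFixed(false);
        System.out.println("open after fixed call: " + openStreams.get());
    }
}
```

With real file streams, each call through the leaky path would leave one descriptor open until the object is garbage collected or the process restarts, which is consistent with descriptors accumulating between Glassfish restarts.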
14:57 pdurbin donsizemore: why did he send it to my personal email address?
14:59 donsizemore jon asked him to include IQSS in the loop on his troubleshooting. i'll append it to my ticket?
15:00 pdurbin Yes, please put the information in the ticket. Thanks!
15:59 andrewSC joined #dataverse
16:01 andrewSC joined #dataverse
16:35 icarito[m] I completely misunderstood what dataverse was
16:35 icarito[m] I had my mind on datproject (the p2p protocol)
16:37 icarito[m] it's about archiving data sets
16:37 pameyer content-addressable storage, right?
16:39 pdurbin icarito[m]: yep, Dataverse is about archiving datasets. And making them citable (with a DOI), explorable, etc.
16:39 icarito[m] I guess it's about open data and long term archival and reference
16:39 pdurbin yep, but the data can be restricted if necessary. but yeah CC0 by default
16:40 icarito[m] love to find stuff like "Centro de Investigacion de la Salud Indigena Dataverse"
16:41 pdurbin ah, you must mean https://dataverse.harvard.edu/dataverse/CISI
16:42 icarito[m] yeah! love this they do surveys and workshops in native language
16:43 icarito[m] I was part of the team that helped put Quechua, Aymara and Awajun into glibc
16:43 icarito[m] and localize Sugar
16:43 icarito[m] so much knowledge in those dying tongues
16:45 pdurbin nice
17:43 pdurbin pameyer: you're still interested in the JMeter stuff, right? I'm watching a video from the students that discusses it.
17:45 pameyer pdurbin: still interested, but not in a great spot for video at the moment
17:46 pdurbin I'm taking screenshots.
17:52 pdurbin pameyer: I added them to https://github.com/IQSS/dataverse/issues/4201
17:59 pameyer pdurbin: thanks, those look interesting
18:00 pameyer looks ballpark in the range of what I was seeing with ab (extrapolated to 2 servers)
18:00 pameyer ^ throughput, I mean
18:10 pdurbin Yeah? The only tests I've done were with Locust: http://guides.dataverse.org/en/4.8.6/developers/testing.html#locust . And have no idea what the throughput was.
18:14 pameyer I wasn't being too systematic about it :(
18:27 pdurbin pameyer: there's stuff about file descriptors in the init script at http://guides.dataverse.org/en/4.8.6/installation/prerequisites.html#launching-glassfish-on-system-boot by the way. I was just looking at the ticket Don mentioned earlier.
21:14 dataverse-user joined #dataverse
21:14 dataverse-user I have unauthorised charges to my bank account, how can we clear this up?
21:42 pameyer left #dataverse