Time
S
Nick
Message
07:06
juancorr
Thanks pdurbin. I think that GoogleMaps is nicer for presentations like the dataverse.org case. We use it in e-cienciaDatos. WorldMap is much better to reuse data in a dataset. I agree with @poikilotherm that summaries will be a great improvement in both cases.
07:14
jri joined #dataverse
07:16
jri_ joined #dataverse
08:34
poikilotherm joined #dataverse
08:41
poikilotherm
Thx pdurbin! I just release new images for K8s. https://twitter.com/poi_ki_lo_therm/status/1149234850144165889
08:41
poikilotherm
+d
09:17
jri joined #dataverse
09:28
dataverse-user joined #dataverse
09:28
dataverse-user left #dataverse
09:29
dataverse-user joined #dataverse
09:31
jri_ joined #dataverse
09:37
dataverse-user
Hi, sorry if this is not the correct channel for asking this, but I issued a pull request 15 days ago for some typos I found in the documentation for the Native API (using the 'Quick Fix' procedure in http://guides.dataverse.org/en/latest/developers/documentation.html) and it has not progressed
09:38
dataverse-user
Please note that I am not complaining. I just wanted to check if I have done anything wrong
09:40
dataverse-user
Could anyone please advise on how to proceed and what info should I supply?
09:59
poikilotherm
Hi @dataverse-user
10:00
poikilotherm
Could you copy paste the URL to the PR here?
10:02
dataverse-user
hi @poikilotherm
10:02
dataverse-user
thanks for the reply
10:02
dataverse-user
https://github.com/j-n-c/dataverse/pull/1
10:02
dataverse-user
is this what You were asking?
10:15
pdurbin_m joined #dataverse
10:16
pdurbin_m
dataverse-user: hi! Thanks for that pull request! We didn't see it because it doesn't appear under https://github.com/IQSS/dataverse/pulls
10:18
dataverse-user
ok. Should I have done so that it appeared in https://github.com/IQSS/dataverse/pulls?
10:19
pdurbin_m
Yes, please!
10:20
pdurbin_m
This happened to another contributor recently. Her first pull request was against her fork rather than the upstream repo.
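The fork-versus-upstream mix-up described here can be spotted from the pull request URL alone. A minimal sketch, assuming the upstream repository is IQSS/dataverse as stated in the chat; the helper names are made up for illustration:

```python
from urllib.parse import urlparse

UPSTREAM = "IQSS/dataverse"  # upstream repository named in the chat


def pr_repo(pr_url: str) -> str:
    """Return 'owner/repo' for a GitHub pull request URL."""
    parts = urlparse(pr_url).path.strip("/").split("/")
    # Expected path shape: owner/repo/pull/number
    if len(parts) < 4 or parts[2] != "pull":
        raise ValueError(f"not a pull request URL: {pr_url}")
    return f"{parts[0]}/{parts[1]}"


def targets_upstream(pr_url: str) -> bool:
    """True when the pull request was opened against the upstream repo."""
    return pr_repo(pr_url).lower() == UPSTREAM.lower()


print(targets_upstream("https://github.com/j-n-c/dataverse/pull/1"))    # False
print(targets_upstream("https://github.com/IQSS/dataverse/pull/6011"))  # True
```

A pull request opened inside a fork never shows up in the upstream pull request list, which is why nobody saw the first one.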
10:22
pdurbin_m
So I wonder if we should improve our "quick fix" write up. We haven't touched it in years.
10:30
poikilotherm
Sry, I had a talk with my colleague. What pdurbin says :-d
10:30
poikilotherm
And yeah, pdurbin, it might be a good idea
10:31
poikilotherm
Maybe rethink about the first pr bots?
10:31
poikilotherm
Attach a message about how to create a PR to the issue=
10:31
poikilotherm
?
10:31
dataverse-user
If this happened to other contributors, perhaps it would make sense to revisit the procedure
10:32
dataverse-user
could You please check if my pull request is now visible?
10:34
dataverse-user
I believe it is, but if You could check it would be great
10:34
pdurbin_m
Yes! I see it now! Thanks! Do you want to help us fix up the page that talks about the quick fix too? :)
10:34
dataverse-user30 joined #dataverse
10:35
pdurbin_m
And sorry it's almost time for breakfast here so I should get off my phone. :)
10:35
pdurbin_m
poikilotherm: are you around to continue helping? :)
10:35
poikilotherm
Yeah
10:35
dataverse-user30
sorry, the chat seems to have crashed on my machine
10:35
poikilotherm
Back to keyboard now :-D
10:36
pdurbin_m
poikilotherm: thanks!
10:36
poikilotherm
Ah, there is a bug in the chat web client
10:36
dataverse-user joined #dataverse
10:36
poikilotherm
Never mind that it crashed
10:37
pdurbin_m
dataverse-user30: a memory leak: https://github.com/IQSS/chat.dataverse.org/issues/3 :(
10:38
poikilotherm
I guess this is your PR then, right j-n-c? https://github.com/IQSS/dataverse/pull/6011
10:38
dataverse-user
ok. I have received an update on my pull request. Thanks for Your help
10:39
poikilotherm
Yeah, I see it here https://github.com/IQSS/dataverse/pull/6011
10:39
poikilotherm
Is there an issue related to this PR already?
10:39
poikilotherm
IQSS tries to follow "issue first - pr second" process
10:39
poikilotherm
They use it to ensure QA and other devs always know what's going on
10:39
dataverse-user
I have not created an issue
10:40
dataverse-user
Ok. seems reasonable
10:40
poikilotherm
Would you be so kind to create one and link it in the PR?
10:40
dataverse-user
Sure
10:40
poikilotherm
It would make their work a lot easier
10:40
poikilotherm
Thank you!
10:40
poikilotherm
:-)
10:41
poikilotherm
Feel free to copy-paste your PR description into the issue, as it sounds like a good description ;-)
10:42
poikilotherm
You can delete the PR description part of "New Contributors" till "Related issues" after reading ;-)
10:49
dataverse-user
Done. Thanks again for Your help! ;)
10:52
poikilotherm
No problem :-)
10:52
poikilotherm
Thank you for your contribution
11:02
pdurbin_m
dataverse-user: yes, thanks.
11:02
pdurbin_m
poikilotherm: are you able to add the pull request to the main board?
11:03
poikilotherm
Done
11:04
poikilotherm
Pulled it into "Community Dev"
11:04
poikilotherm
Issue + PR
11:04
poikilotherm
Dude, that Column is huge
11:06
pdurbin_m
poikilotherm: heh. Actually, it should be in code review, right? The pull request I mean.
11:07
poikilotherm
Done as well
11:07
pdurbin_m
Automation should put the pull request into Code Review if you add the pull request to the right project.
11:07
pdurbin_m
Does that make sense?
11:07
poikilotherm
I placed it here https://github.com/orgs/IQSS/projects/2#column-5298410
11:08
pdurbin_m
Yes, perfect, but I'm saying there's a simpler way.
11:09
poikilotherm
I placed it into Community Dev first and it hadn't been moved
11:09
poikilotherm
You say it should have been moved?
11:09
pdurbin_m
No.
11:10
pdurbin_m
The act of adding a pull request to the main project from the pull request itself moves it to code review thanks to automation.
11:10
poikilotherm
I can't modify the PR...
11:11
poikilotherm
I need to go the other way round
11:11
pdurbin_m
interesting
11:11
poikilotherm
Yeah
11:11
poikilotherm
I can't even change it if it's my own
11:11
pdurbin_m
but the pull request is in code review now so that's perfect. thanks!
11:12
poikilotherm
You're welcome.
11:12
poikilotherm
Err... Shouldn't you have some coffee first at 7am? Or get breakfast ready?
11:13
poikilotherm
Instead you are speaking with strangers across the ocean :-D
11:13
pdurbin_m
dataverse-user: oh, our process is changing so for a small doc change like this you probably don't have to create an issue but it's always appreciated :)
11:13
poikilotherm
My wife would kill me ;-)
11:13
poikilotherm
pdurbin: that was my fault! I requested him to follow the rulez ;-)
11:13
pdurbin_m
poikilotherm: we've met. We aren't strangers. :)
11:18
pdurbin_m
The rulez are changing a bit.
11:20
* poikilotherm
*thumbs up*
12:15
pdurbin
dataverse-user: I don't know if you're across the ocean from me or not but I assume so based on how early in the morning it is here. :)
12:16
pdurbin
juancorr: thanks for the feedback on my crazy map idea :)
12:16
pdurbin
poikilotherm: awesome that you've already released https://github.com/IQSS/dataverse-kubernetes/releases/tag/v4.15.1
12:17
poikilotherm
;-)
12:18
pdurbin
Lots of chatter to catch up on. I think I missed a few things. Or have comments to make at least. :)
12:22
donsizemore joined #dataverse
12:24
donsizemore
@pdurbin morning. may I ask one of my many "have-you-seen-this-before" questions?
12:37
pdurbin
donsizemore: I have plumbers here installing a new washer/dryer but please go ahead.
12:38
pdurbin
donsizemore: also, dataverse-user just made a pull request to add some more full curl examples.
12:39
poikilotherm
donsizemore: go ahead, I'm all in :-)
12:41
donsizemore
it's curl i be askin'! (pirate voice)
12:42
donsizemore
since the AWS nodes were misbehaving, i was trying to help cheryl download a dataset with 108 files. via curl, a number of them return a 403 forbidden
12:44
donsizemore
copy-paste the same URL into a browser... oh, wait. i'm passing the API token but it's getting chomped
12:45
donsizemore
but only for certain requests?!? ugh
12:47
pdurbin
What's a pirate's favorite programming language? R!!!!!!
13:00
donsizemore
yeah, so draft dataset, native API . i get dataset metadata via api/datasets/:persistentId/versions/' + args.version + '/files?persistentId=
13:01
donsizemore
then i start downloading files individually via api/access/datafile/' + str(fileid) + '?format=original'
13:02
donsizemore
for about half of the files, this seems fine. others return a 403 forbidden. i can't help but think i'm passing dataverse bad milk and cookies
13:02
donsizemore
in a draft dataset, all files should have an 'original' format, yes? the version is at the record level
13:03
donsizemore
if i follow the URL i'm passing curl in a browser, i get the file. same url construct, passing API token, same dataset, nothing published
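The two calls donsizemore describes can be sketched roughly as follows. This is not his script: the host name, DOI, and function names are placeholders, and the token is sent in the `X-Dataverse-key` header. The code only builds the requests; it does not send them.

```python
from urllib.parse import quote
from urllib.request import Request


def files_url(base: str, pid: str, version: str) -> str:
    """URL that lists the files of one dataset version via the native API."""
    return (
        f"{base}/api/datasets/:persistentId/versions/{version}/files"
        f"?persistentId={quote(pid, safe=':/')}"
    )


def datafile_url(base: str, file_id: int, original: bool = True) -> str:
    """URL that downloads one file, optionally in its original format."""
    url = f"{base}/api/access/datafile/{file_id}"
    return url + "?format=original" if original else url


def authed(url: str, token: str) -> Request:
    """Attach the API token as a header instead of a query parameter."""
    return Request(url, headers={"X-Dataverse-key": token})


# Placeholder host and DOI, for illustration only.
base = "https://dataverse.example.edu"
print(files_url(base, "doi:10.5072/FK2/EXAMPLE", ":draft"))
print(datafile_url(base, 42))
```

Sending the token as a header keeps it out of the URL, which makes it harder for quoting or shell escaping to mangle it.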
13:04
donsizemore
i may bump dataverse-ansible to 4.15.1 which i need to do anyway and test this a little more locally. (we were trying to download large datasets from harvard but encountering problems)
13:12
pdurbin
donsizemore: all files *should* have an original format but I'm aware that a few of these files are missing from Harvard Dataverse. We adjusted the GUI to not offer "original" under the download button in cases like this.
13:12
pdurbin
So I guess the question is if you see "original" for that file in the GUI or not.
13:18
donsizemore
this is great information. i have cheryl's API token but when she gets back from fauxbucks i'll pester her about what's in the GUI
13:19
pdurbin
You can go to the file page based on the id.
13:19
donsizemore
i was wondering if these files got uploaded during periods of instability and may not be in a fully-archived state
13:19
donsizemore
i've got the DOI and her API token, i just didn't think i could call up the webpage without being logged in as her. i'll try. thanks for the info about the GUI adjustment
13:24
pdurbin
Oh, that's a good point. The data is probably unpublished or restricted so you can't even see a download button unless you log in as her.
13:33
donsizemore
just looked at the GUI with cheryl. for problem files, there's a download button rather than a drop-down. but she can download-all and there's an option for original format
13:34
donsizemore
which means dataverse is working around this (i'll go traipse through the code). i would set forth that files which lack surrogate copies are assumed to be in original format, so that 403 forbidden might better simply return the file
13:36
pdurbin
I can dig up the issue about the GUI change if you want.
13:36
pdurbin
Make "download as original" disappear from download options, when there is no saved original. #4796 https://github.com/IQSS/dataverse/issues/4796
13:37
donsizemore
i was just looking at https://github.com/IQSS/dataverse/issues/4373
13:37
donsizemore
the first problem file i hit, for instance, is a PDF
13:37
donsizemore
no ingest, no rename, no original format
13:38
pdurbin
huh
13:39
pdurbin
Maybe ?format=original is only for tabular files that were ingested? I don't know.
13:39
pdurbin
If the API guide is unclear we should fix it up.
13:41
donsizemore
the API guide is perfectly clear, but for a scripted download pass, some file formats have an "original" format, some don't. i'm suggesting that Dataverse serve an original for each
13:43
donsizemore
from our perspective this is fairly related to #6006
13:47
pdurbin
You want to always be able to pass ?format=original and get the original file regardless of whether the file was a PDF or a tabular file that was successfully ingested. Is that right?
13:54
donsizemore
i was going to poke through the dataverse code to see what download-all in original format was using to determine whether to pass ?format=original or equivalent
13:54
donsizemore
let me look at that first. but every file should have an original format...
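Until the server behaves that way, a script could work around it on the client side. A minimal sketch, assuming that the 403 reported above is the only signal that no saved original exists; the function names are hypothetical and the retry logic is not taken from Dataverse itself:

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch(url: str, token: str) -> bytes:
    """Download a URL, sending the API token as a header."""
    req = Request(url, headers={"X-Dataverse-key": token})
    with urlopen(req) as resp:
        return resp.read()


def download_original_or_stored(base, file_id, token, fetcher=fetch):
    """Ask for the original format first; fall back to the stored file on 403."""
    url = f"{base}/api/access/datafile/{file_id}"
    try:
        return fetcher(url + "?format=original", token)
    except HTTPError as err:
        if err.code != 403:
            raise  # anything other than 403 is a real error
        # Assumption: the file was never ingested, so the stored copy
        # is already the original.
        return fetcher(url, token)
```

The `fetcher` argument exists only so the fallback logic can be exercised without a live server.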
13:58
jri joined #dataverse
14:00
pdurbin
Yeah, I agree. I think we should have a dedicated issue about this. Small chunks.
14:06
donsizemore
i was thinking along the same lines, but wanted to ask first. best defense is no offense and all that =)
14:07
pdurbin
New washer/dryer installed and it seems like it even works! \o/
14:45
pdurbin
jri: Can you please take a look at this thread? Examples of Web Analytics Code (Matomo, formerly “Piwik”) https://groups.google.com/d/msg/dataverse-community/lkBfsz11dX4/p1nT6OepCQAJ
14:55
pdurbin
dataverse-user: I just moved your pull request to QA. Thanks again.
15:06
dataverse-user
@pdurbin, just checked the notification. Thanks!
15:13
pdurbin
dataverse-user: sure. I can help you change your nick to j-n-c in here if you want.
15:14
pdurbin
xarthisius: yes, yes, YES! I just noticed https://github.com/jupyter/repo2docker/pull/739 !!! Go, go, GO!!
15:21
pdurbin
donsizemore: I'm tempted to add a Jenkins job for it.
15:24
donsizemore
sure thing - just open an issue?
15:30
pdurbin
donsizemore: sure, done: https://github.com/IQSS/dataverse-jenkins/issues/13
15:32
pdurbin
donsizemore: we can try it on a DOI from UNC Dataverse. :)
16:04
xarthisius
FTR, it doesn't have to be DOI, it will work with raw url pointing to dataverse resource too
16:11
pdurbin
xarthisius: I was wondering if Handles are also supported. Thanks!
16:11
pdurbin
I just left this comment on the "Binderverse" issue: https://github.com/IQSS/dataverse/issues/4714#issuecomment-510550000
16:12
pdurbin
The thing I'm wondering about right now is where the installation instructions should go for adding an external tool to Dataverse for Binder.
16:13
pdurbin
xarthisius: as you know, the Whole Tale button instructions are here, for example: https://wholetale.readthedocs.io/en/stable/users_guide/integration.html#dataverse-external-tools
16:13
pdurbin
But what's the right place to put similar instructions in the Binder world?
16:18
xarthisius
pdurbin: I don't think external tools fit binder model
16:18
xarthisius
you'll need to create a "resolution" service that would convert output from ext tools to a proper binder link
16:19
xarthisius
that's how we do it in WT
16:20
xarthisius
there's an endpoint that responds with a redirect when external tools hit it
16:20
xarthisius
unless you make external tools more flexible
16:21
dataverse-user left #dataverse
16:21
pdurbin
Hmm. You don't think so? Do you have a Zenodo DOI I can play around with?
16:23
pdurbin
It looks like they were testing with 10.5281/zenodo.3229823
16:24
pdurbin
Which becomes this URL, it appears: https://mybinder.org/v2/zenodo/10.5281/zenodo.3229823/
16:26
xarthisius
yup, so external tools would need to return that
16:26
xarthisius
https://mybinder.org/v2/dataverse/ <doi>
16:27
pdurbin
But there are 46 installations of Dataverse.
16:28
xarthisius
I'm not sure how's that relevant
16:29
pdurbin
So should it be something like https://mybinder.org/v2/dataverse/dataverse.scholarsportal.info/ <doi> for the installations of Dataverse at https://dataverse.scholarsportal.info ?
16:29
xarthisius
it's gonna work with all instances, no matter how many there are
16:29
xarthisius
doi will resolve to dataverse.scholarsportal.info, won't it?
16:29
pdurbin
yes
16:29
pdurbin
I guess you're right.
16:31
xarthisius
https://mybinder.org/v2/dataverse/https%3A%2F%2Fdataverse.harvard.edu%2Fdataset.xhtml%3FpersistentId%3Ddoi%3A10.7910%2FDVN%2FR3GZZW
16:31
xarthisius
would also work
16:31
xarthisius
that was my point about raw urls
16:31
xarthisius
it's just a matter of external tools being able to create those urls
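Creating such a link amounts to percent-encoding the whole dataset URL and appending it to the path. A small sketch, assuming the `https://mybinder.org/v2/dataverse/` prefix shown in the chat (the content provider was still an open pull request at the time):

```python
from urllib.parse import quote

BINDER_PREFIX = "https://mybinder.org/v2/dataverse/"  # prefix as written in the chat


def binder_url(dataset_url: str) -> str:
    """Percent-encode the full dataset URL so it fits in one path segment."""
    # safe="" also encodes ':' and '/', which a DOI or URL is full of.
    return BINDER_PREFIX + quote(dataset_url, safe="")


print(binder_url(
    "https://dataverse.harvard.edu/dataset.xhtml"
    "?persistentId=doi:10.7910/DVN/R3GZZW"
))
# https://mybinder.org/v2/dataverse/https%3A%2F%2Fdataverse.harvard.edu%2Fdataset.xhtml%3FpersistentId%3Ddoi%3A10.7910%2FDVN%2FR3GZZW
```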
16:32
pdurbin
Hmm.
16:33
xarthisius
or running a service that would do it
16:33
pdurbin
We could send dataset id.
16:35
pdurbin
as a query parameter
16:38
pdurbin
I just added a "MyBinder" button to https://dev2.dataverse.org/file.xhtml?fileId=30 (under Explore)
16:38
pdurbin
If you click it, it goes to https://gke.mybinder.org/v2/dataverse/?fileId=30&siteUrl=https://dev2.dataverse.org&datasetId=18
16:40
xarthisius
yeah, but that would require Binder to significantly change how they operate wouldn't it?
16:40
pdurbin
I don't know. It sounds like you're saying it would. :)
16:41
pdurbin
Binder doesn't like query parameters? They like paths instead? :)
16:41
xarthisius
that's my understanding
16:42
xarthisius
I might be wrong
16:42
pdurbin
Query parameters are nice for DOIs because DOIs can have an arbitrary number of slashes in them.
16:44
pdurbin
But you seem to be saying that raw URLs will work with the content provider you're adding. That means that we need to allow external tools to append to the "toolUrl" of an external tool.
16:46
xarthisius
The changes I've made are only to r2d, and yeah it will work with anything. How Binder team decides to utilize that in their UI is not really my choice to make
16:47
pdurbin
Sure, that makes total sense. Do your r2d changes require a Dataverse DOI or can it also work with the database id of a dataset?
16:49
xarthisius
see https://github.com/jupyter/repo2docker/blob/3b84c52429297b0e24bbb6bc3b014f83c3451bd3/repo2docker/contentproviders/dataverse.py#L26-L30
16:51
pdurbin
Ok so you handle...
16:51
pdurbin
- {siteURL}/dataset.xhtml?persistentId={persistentId}
16:51
pdurbin
- {siteURL}/dataset.xhtml?id={datasetId} is not handled.
16:52
xarthisius
I can add that
16:52
pdurbin
that would be great, I think
16:52
pdurbin
Here's a live example: http://phoenix.dataverse.org/dataset.xhtml?id=10
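Recognising both URL shapes could look something like this. It is an illustrative sketch only, not the repo2docker code linked above, and the function name is made up:

```python
from urllib.parse import parse_qs, urlparse


def parse_dataset_url(url: str):
    """Return (site_url, kind, value) for a dataset page URL, else None.

    kind is "persistentId" for DOI/Handle style links and "id" for
    links that use the database id of the dataset.
    """
    parsed = urlparse(url)
    if not parsed.path.endswith("/dataset.xhtml"):
        return None
    query = parse_qs(parsed.query)
    site = f"{parsed.scheme}://{parsed.netloc}"
    if "persistentId" in query:
        return site, "persistentId", query["persistentId"][0]
    if "id" in query:
        return site, "id", query["id"][0]
    return None


print(parse_dataset_url("http://phoenix.dataverse.org/dataset.xhtml?id=10"))
# ('http://phoenix.dataverse.org', 'id', '10')
```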
16:52
xarthisius
If there are additional schemes that should be handled let me know
16:52
pdurbin
can do!
16:54
pdurbin
I think I might add this example to your pull request to try to get some feedback from the Binder folks: https://gke.mybinder.org/v2/dataverse/?siteUrl=https://dev2.dataverse.org&datasetId=18&fileId=34
16:55
pdurbin
To see how much they object to the query parameters. We can craft that URL today without any modifications to our external tool code.
16:55
pdurbin
Or should that conversation happen in a different repo? Perhaps the BinderHub repo?
16:58
pdurbin
Maybe this repo that has the front end code in it: https://github.com/jupyterhub/binderhub/blob/908c4439f63b08744e2074e76dabb9883b1632e9/binderhub/templates/index.html#L46
17:18
xarthisius
yeah, I think binderhub is better place for that conversation
17:19
pdurbin
ok, thanks
17:31
xarthisius
BTW, until Dataverse and Binder come up with a robust solution to the problem, I'm happy to host external tools -> binder resolution since we already do that for WT
17:31
pdurbin
Oh! That's fantastic! Thank you!
17:31
xarthisius
it's just a matter of adding a simple switch on our end and you can have two separate json specifications
17:32
xarthisius
one that will point to WT and 2nd that would bounce to binder
17:32
pdurbin
Is that what's happening with the Whole Tale external tool right now?
17:32
xarthisius
yes
17:33
pdurbin
And that resolution service has a url something like https://data.stage.wholetale.org/api/v1/integration/dataverse ?
17:33
xarthisius
yup
17:33
pdurbin
perfect
17:34
xarthisius
and accepts query params which is all this is about ;)
17:34
pdurbin
right, right
17:34
pdurbin
When you have an update to the resolution service that I can test, please let me know!
17:35
xarthisius
I can do it right away but there's no instance of binder that I can point it to
17:36
pdurbin
Sure, but once they merge your pull request and deploy to mybinder there would be.
17:36
xarthisius
heh, yes, once the above happens we can deploy the resolver instantly ;)
17:37
pdurbin
Nice! And I assume they have some sort of staging environment. Maybe they can let us test a bit.
17:37
xarthisius
can external tools add arbitrary query parameters, or is it a white list?
17:37
pdurbin
It's a white list. And some are required.
17:37
pdurbin
I can't leave out fileId, for example.
17:38
xarthisius
I was more interested in adding binder=True, but I'll work it around
17:38
pdurbin
You *can* customize the key of the query parameter, if that makes sense.
17:39
xarthisius
I'm not sure I understand, can you give me an example?
17:39
pdurbin
So you could have datasetIdWT=10 and datasetIdBinder=10
17:40
xarthisius
how do I do that?
17:41
pdurbin
like this: { "displayName": "Custom Keys", "description": "custom keys", "type": "explore", "toolUrl": "https://example.com/v2/dataverse/", "contentType": "application/x-ipynb+json", "toolParameters": { "queryParameters": [ { "foo": "{siteUrl}" }, { "bar": "{datasetId}" }, { "baz": "{fileId}" } ] } }
17:42
xarthisius
oh! the keys in queryParameters are arbitrary, now I get it
17:42
pdurbin
yeah, you'd get something like this as a URL: https://example.com/v2/dataverse/?foo=https://dev2.dataverse.org&bar=18&baz=30
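How the manifest turns into that URL can be mimicked in a few lines. This is a sketch of the substitution idea, not Dataverse's implementation; values are left unencoded to match the URL quoted in the chat:

```python
def build_tool_url(manifest: dict, values: dict) -> str:
    """Fill the reserved words in queryParameters and append them to toolUrl."""
    pairs = []
    for param in manifest["toolParameters"]["queryParameters"]:
        for key, template in param.items():
            # The key is arbitrary; only the {reservedWord} is substituted.
            pairs.append(f"{key}={template.format(**values)}")
    return manifest["toolUrl"] + "?" + "&".join(pairs)


manifest = {
    "toolUrl": "https://example.com/v2/dataverse/",
    "toolParameters": {
        "queryParameters": [
            {"foo": "{siteUrl}"},
            {"bar": "{datasetId}"},
            {"baz": "{fileId}"},
        ]
    },
}
values = {"siteUrl": "https://dev2.dataverse.org", "datasetId": 18, "fileId": 30}
print(build_tool_url(manifest, values))
# https://example.com/v2/dataverse/?foo=https://dev2.dataverse.org&bar=18&baz=30
```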
17:42
pdurbin
I don't know if that helps you or not.
17:43
xarthisius
that's enough, I can have empty/non empty check on binder={whatever}
17:43
xarthisius
it'll work as good as boolean
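The empty/non-empty check could be as small as the sketch below. The parameter name `binder` comes from the chat; the function name and the rest are assumptions about how a resolver might read its query string:

```python
from urllib.parse import parse_qs


def wants_binder(query_string: str) -> bool:
    """Treat any non-empty 'binder' parameter as a request to bounce to Binder."""
    params = parse_qs(query_string, keep_blank_values=True)
    return any(value != "" for value in params.get("binder", []))


print(wants_binder("binder=18&fileId=30"))  # True
print(wants_binder("binder=&fileId=30"))    # False
print(wants_binder("fileId=30"))            # False
```

Because external tools can only send reserved values under custom keys, mapping something like `{datasetId}` to the key `binder` is enough to flip the switch.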
17:43
pdurbin
cool
17:43
pdurbin
I love the hacks. Getting things done. :)
17:44
xarthisius
feature-driven development ;)
17:44
jri joined #dataverse
17:44
pdurbin
:)
18:01
pboon joined #dataverse
18:05
pboon
Help needed with dataverse.nl coming to a grinding halt with a CPU load of 400%
18:10
pdurbin
xarthisius: for now I left the comment at https://github.com/IQSS/dataverse/issues/4714#issuecomment-510594346
18:10
pdurbin
pboon: hi! Thanks for joining. Let's see who's around who runs Dataverse in production.
18:10
pdurbin
I see andrewSC bricas_ and donsizemore
18:11
pdurbin
pboon: you're running Dataverse 4.10.1, slightly forked, right?
18:12
pdurbin
I always wonder what changed recently. Is it simply that you're seeing more traffic right now? Everyone is trying to download data at once? Or did you upgrade something?
18:13
pboon
more details on https://groups.google.com/forum/#!topic/dataverse-community/DLy56gukZ3E
18:14
pdurbin
Ah, thanks!
18:14
pdurbin
I also asked for reinforcements in Slack just now.
18:14
pboon
Nothing was supposed to be changed, nothing happening, but will eyeball the logs again to be sure
18:16
pdurbin
pboon: ok. Nice write up. The image didn't come through. Maybe you can upload it to https://imgur.com or similar.
18:18
pdurbin
We should really start using the "Feature: Performance & Stability" GitHub Issue label more consistently because I'm having trouble finding any specific fixes we made after 4.10.1.
18:21
pdurbin
pboon: ah, you resent the graph of memory usage and it came through this time, thanks: https://groups.google.com/d/msg/dataverse-community/DLy56gukZ3E/p5QTMcS3CQAJ
18:22
pdurbin
This is interesting: "The weird thing is that it used to run without problems from the time we deployed it on production 2019-05-09 up to 2019-07-10, when we got the near 400% CPU load."
18:23
pboon
I waded through all milestones this afternoon
18:23
pdurbin
you poor thing
18:24
pboon
if it doesn't kill you it makes you stronger...
18:25
pdurbin
heh
18:26
pdurbin
Some of the performance related pull requests made by donsizemore you should already have since both https://github.com/IQSS/dataverse/pull/4654 and https://github.com/IQSS/dataverse/pull/4640 shipped with Dataverse 4.9.
18:27
pboon
Yes, this was probably related
18:28
pdurbin
And I don't know if you're on S3 or not but you should already have https://github.com/IQSS/dataverse/pull/5305 from Jim Myers.
18:29
pboon
Yes, the ones about 'outputStream' are probably related to the open file descriptor problems, which I don't see now
18:29
pdurbin
And you should have https://github.com/IQSS/dataverse/pull/5019 from Pete Meyer.
18:30
djbrooke joined #dataverse
18:30
pboon
We are not running on S3, we were just testing the possibility
18:30
pdurbin
Ok. Do you have any sort of script in place to restart Glassfish if it stops responding? We do for Harvard Dataverse.
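A watchdog of the kind mentioned here might look like the following. This is not the script used for Harvard Dataverse, whose details are not given in the chat. It is a sketch that assumes `asadmin` is on the PATH and that something like `/api/info/version` is a reasonable URL to probe.

```python
import subprocess
from urllib.request import urlopen


def is_responding(url: str, timeout: float = 10.0) -> bool:
    """True when the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


def restart_glassfish() -> None:
    """Restart the default Glassfish domain (assumes asadmin is on PATH)."""
    subprocess.run(["asadmin", "restart-domain"], check=True)


def check_and_restart(probe, restart) -> bool:
    """Run the probe; restart only if it fails. Returns True if restarted."""
    if probe():
        return False
    restart()
    return True


# Intended use from cron, with an assumed local URL:
# check_and_restart(
#     lambda: is_responding("http://localhost:8080/api/info/version"),
#     restart_glassfish,
# )
```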
18:32
djbrooke
Hey pboon - can you check the logs to see if there are a lot of instances of GET /api/access/datafiles occurring as well?
18:32
pboon
As a way of making it usable for users we planned to upgrade memory (so it takes a bit longer to become problematic) and then our sysadmin will have a restart script in place
18:39
pboon
Me and my colleague already spent some time looking into the logs, but I will do it again after I downloaded it through my home WiFi
18:39
djbrooke
OK, sounds good
18:40
pdurbin
pboon: djbrooke is asking because that's the API endpoint that "download all files" or "download some files" buttons use to zip up files and this can be pretty inefficient. I crashed the demo server the other day while trying to download all files from a dataset. They were kind of big files, I guess.
18:40
djbrooke
We were seeing a lot of instances of GET /api/access/datafiles ... yeah what pdurbin said
18:41
pdurbin
We ended up limiting (for now anyway) "download files as zip" to 10 MB.
18:41
pdurbin
:ZipDownloadLimit http://guides.dataverse.org/en/4.10.1/installation/config.html#zipdownloadlimit
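Database settings like this one are changed through the admin settings API with a PUT request. The sketch below only builds that request; the localhost base URL and the byte value standing in for "10 MB" are assumptions:

```python
from urllib.request import Request


def zip_limit_request(limit_bytes: int, base: str = "http://localhost:8080") -> Request:
    """Build (but do not send) the PUT that sets :ZipDownloadLimit."""
    return Request(
        f"{base}/api/admin/settings/:ZipDownloadLimit",
        data=str(limit_bytes).encode("ascii"),  # the limit is given in bytes
        method="PUT",
    )


req = zip_limit_request(10 * 1000 * 1000)  # assumed value for "10 MB"
print(req.get_method(), req.full_url, req.data)
```

Passing the request to `urllib.request.urlopen` on the server itself would apply the setting.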
18:42
djbrooke
pboon also if you have some long running ingest jobs, that could be contributing, those persist even after a restart
18:42
pdurbin
The climb in memory usage from your graphs seems strangely regular, as if a script is hitting your site every few minutes or something. And strange patterns from certain IP addresses?
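Both questions (how often the zip endpoint is hit, and by whom) can be answered by tallying the access log. A rough sketch, assuming log lines in the common log format with the client address first; the actual Glassfish access log layout may differ:

```python
import re
from collections import Counter

# Client address at the start of the line, then a GET on the zip endpoint.
ZIP_REQUEST = re.compile(r'^(\S+) .*?"GET (/api/access/datafiles\S*)')


def zip_requests_by_client(lines) -> Counter:
    """Count GET /api/access/datafiles requests per client address."""
    counts = Counter()
    for line in lines:
        match = ZIP_REQUEST.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    '203.0.113.5 - - [10/Jul/2019:12:00:01 +0200] "GET /api/access/datafiles/1,2,3 HTTP/1.1" 200 1234',
    '203.0.113.5 - - [10/Jul/2019:12:05:01 +0200] "GET /api/access/datafiles/1,2,3 HTTP/1.1" 200 1234',
    '198.51.100.7 - - [10/Jul/2019:12:06:10 +0200] "GET /api/access/datafiles/9 HTTP/1.1" 200 99',
    '198.51.100.7 - - [10/Jul/2019:12:06:11 +0200] "GET /api/access/datafile/9 HTTP/1.1" 200 99',
]
print(zip_requests_by_client(sample).most_common())
```

A single address that dominates the tally at regular intervals would fit the scripted-client theory.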
18:50
pboon joined #dataverse
18:51
pboon
My connection dropped, sorry
18:52
pboon
Unless there is some deep insight/breakthrough for my memory consumption problem with Dataverse I will get back tomorrow, Thanks!
18:52
pboon left #dataverse
18:52
djbrooke
pboon good to see you mentioned upping the memory above, donsizemore suggested that on the google group
18:53
donsizemore
back in the 90s i remember java's memory management algorithm described as: NOM NOM NOM
18:55
pdurbin
djbrooke: Paul's connection probably didn't drop. There's a memory leak in the version of Shout I stood up. We should think about upgrading to TheLounge: https://github.com/IQSS/chat.dataverse.org/issues/3
19:01
pdurbin
donsizemore: hopefully it's a little better these days. :)
19:03
pdurbin
donsizemore: also, DataverseNL does have a harvesting set advertised at https://docs.google.com/spreadsheets/d/12cxymvXCqP_kCsLKXQD32go79HBWZ1vU_tdG4kvP5S8/edit?usp=sharing
19:03
pdurbin
since you asked at https://groups.google.com/d/msg/dataverse-community/DLy56gukZ3E/v1QWrjvdBAAJ ... good points about memory
19:05
pdurbin
donsizemore: how long does harvesting take? Also, I think Mike just spotted a bug.
19:06
pdurbin
donsizemore: do you recognize these curl braces? https://github.com/IQSS/dataverse/blob/v4.15.1/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAISetServiceBean.java#L139 :)
19:06
pdurbin
curly*
19:07
donsizemore
i do
19:07
donsizemore
but i thought that had been moved into the solrservicebean
19:08
donsizemore
oh, wait. you're right =)
19:10
donsizemore
this may mean that Odum can turn harvesting back on
19:23
pdurbin
That would be nice. I just replied.
19:23
pdurbin
How long does harvesting take?
19:24
pdurbin
donsizemore: the version you're running has the same bug: https://github.com/IQSS/dataverse/blob/v4.9.4/src/main/java/edu/harvard/iq/dataverse/harvest/server/OAISetServiceBean.java#L139
19:27
donsizemore
correct
19:27
donsizemore
GESIS for instance seemed to harvest against us hourly
19:27
donsizemore
it runs for a while, then stops whether or not it completed. i don't remember what decides the cutoff
19:28
pdurbin
Well, it should stop when it's done fetching the latest changes, I guess.
19:28
donsizemore
it stops before then.
19:29
pdurbin
When your installation of Dataverse crashes? :)
19:29
donsizemore
we turned off harvesting
19:30
pdurbin
and you don't run a fork
19:30
pdurbin
which is probably good
19:30
pdurbin
but maybe you'll test a patch?
19:30
donsizemore
we're running on a patched 4.9.4 now; I suppose we could rebuild but we're planning to upgrade Real Soon Now to... 4.11?
19:31
pdurbin
oh ho, so you are running patches, probably for memory leaks
19:31
donsizemore
they really want the file hierarchy but we don't want to jump to 4.15 just yet
19:31
donsizemore
4.11 looks pretty safe
19:31
pdurbin
those other ones I mentioned earlier, and file descriptor leaks
19:31
donsizemore
yes our 4.9.4 warfile includes akio's two fixes we submitted as PRs into 4.10
19:31
pdurbin
see, that's what I like... give 'em an incentive to upgrade... new features! :)
19:33
pdurbin
Well, if you test a patch and it helps, you and Paul can have a race to the new pull request button.
19:33
donsizemore
you mentioned a PR for the harvesting curly. that's something i feel like i can handle :)
19:33
donsizemore
except Gustavo didn't want to go that route, he wanted to move the solr client into a service bean
19:43
pdurbin
What's more important to me is learning whether fixing up that part of the code, either way, helps you with your problem of Dataverse crashing when people try to harvest from you.
19:43
pdurbin
donsizemore: so if moving the curly is easier for you, please go for it. :)
20:23
pdurbin
donsizemore: still there?
20:54
donsizemore
@pdurbin back from the gym
20:54
donsizemore
just saw your note about password aliases... and i've honestly had enough screwy problems for one day. i'll take a look tomorrow. have a great evening!
21:01
pdurbin
I pushed a commit, if it helps. :)