Time
S
Nick
Message
02:47
jri joined #dataverse
05:47
jri joined #dataverse
06:49
poikilotherm joined #dataverse
07:11
jri joined #dataverse
09:21
tcoupin joined #dataverse
10:36
pdurbin joined #dataverse
11:25
donsizemore joined #dataverse
12:10
poikilotherm
Morning guys... :-)
12:19
pdurbin
poikilotherm: mornin! I'm about to bike to work but talk to you all soon.
12:51
donsizemore joined #dataverse
13:40
pdurbin
andrewSC bjonnh bricas candy` cdsp-rmo donsizemore dzho jri poikilotherm: the community call will start in a couple hours and if you have any ideas for topics, please reply to https://groups.google.com/d/msg/dataverse-community/71kuJ6TdUIg/tEszGls2AgAJ . 2 hours and 20 minutes from now (noon Boston time).
13:50
poikilotherm
I'm sorry - can't make it today... Kids and stuff to do...
13:51
poikilotherm
pdurbin: any news about the merging of my S3 code? Seems to be stuck in QA?
13:52
pdurbin
Well, how many cards are in QA?
13:52
* pdurbin
looks
13:53
pdurbin
8 cards in QA. I'm looking at https://waffle.io/IQSS/dataverse . So I wouldn't say it's stuck. I'd say there are a lot of cars on the highway.
13:54
poikilotherm
You're right... Didn't think about the Waffle Board. Is kcondon a lone tiger in QA? All cards are assigned to him.
13:54
pdurbin
Yes. Lone tiger. Lone wolf. :)
13:55
poikilotherm
Oh dear... Poor guy.
13:55
poikilotherm
I would not trade places with him even if you offered me a ton of gold...
13:56
poikilotherm
Ok, I will just keep quiet and keep fingers crossed kcondon has the time to look into it soon...
13:56
pdurbin
I was going to say... honking your horn may not get you off the highway faster. :)
13:58
pdurbin
poikilotherm: oh! One thing you should do is merge the latest from develop into your branch. The pom still says 4.9.2 so it won't deploy to the test servers.
13:58
poikilotherm
<ironic>Oh, I thought that would be a good idea :-D</ironic>
13:58
poikilotherm
Yes sir!
13:58
poikilotherm
Will do so
13:58
pdurbin
Awesome. Thanks!
13:58
poikilotherm
Didn't touch the code for a week as it was in QA... ;-)
13:59
pdurbin
Sure. Makes sense.
13:59
pdurbin
It would be nice if the people who opened this issue would comment.
13:59
pdurbin
I assume the solution will work for them.
14:02
poikilotherm
Maybe they switched from dataverse to something else?
14:03
poikilotherm
While talking about the "responsiveness" of issues...
14:03
pdurbin
could be, maybe they were only evaluating Dataverse and other solutions
14:03
poikilotherm
I finally made contact with the DANS and GESIS people about the PID stuff
14:04
pdurbin
Oh! Great! Are they going to comment on the issue?
14:04
poikilotherm
Don't think so. Will ask them to do so, but I dunno if they will.
14:05
pdurbin
thanks
14:05
poikilotherm
Anyway, the approach by @fbgesis will be dropped in favor of a proxy approach.
14:05
pdurbin
proxy approach?
14:05
pdurbin
microservice? rest api?
14:06
poikilotherm
If I understood them correctly, they want a lightweight service that talks "DataCite API" language, but will make requests to da|ra instead.
14:06
pdurbin
ok, sounds fine
14:06
poikilotherm
So they configure Dataverse to talk to DataCite but actually will talk via their proxy with da|ra.
14:07
pdurbin
interesting
14:07
poikilotherm
I'm currently thinking about using this approach in a similar wa
14:07
poikilotherm
+y
14:07
poikilotherm
Actually, their approach might be not sufficient for us, as we want more than one provider at the same time and at different points in time.
14:08
pdurbin
Would I be able to configure a PID provider that doesn't actually reach out to anything? Just for testing? That always returns success. That I could use when I'm off the network?
14:08
poikilotherm
But the idea is not too bad. I like the approach of letting a lightweight proxy do the heavy lifting of provider integrations. This would take load off you guys, as you would not be responsible for the software sustainability of these provider integrations.
14:09
poikilotherm
Yeah, that is the way I am currently heading for. Exactly these scenarios came to my mind... ,-)
14:10
poikilotherm
With this, Dataverse would need a provider class that actually offloads the work to the "external" service.
14:10
poikilotherm
But this seems to be lightweight and can be kept stable with some well defined protocol / API, e.g. based on DataCite
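[Editor's sketch] The two ideas above (a provider that never leaves the machine and always succeeds, and a provider class that offloads the work to an external service) might look roughly like this. Every name here (`PidProvider`, `OfflinePidProvider`, `ProxyPidProvider`, the `send` callable, the `offline-test` prefix) is hypothetical and is not Dataverse's actual provider API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict


class PidProvider(ABC):
    """Hypothetical interface; not Dataverse's real provider classes."""

    @abstractmethod
    def register(self, dataset_id: str) -> str:
        """Return a persistent identifier for the given dataset."""


@dataclass
class OfflinePidProvider(PidProvider):
    """Never touches the network and always succeeds (for testing)."""
    prefix: str = "offline-test"  # placeholder, not a real PID prefix
    issued: Dict[str, str] = field(default_factory=dict)

    def register(self, dataset_id: str) -> str:
        # Remember what was handed out so repeated calls are stable.
        return self.issued.setdefault(dataset_id, f"{self.prefix}/{dataset_id}")


@dataclass
class ProxyPidProvider(PidProvider):
    """Offloads the real work to an external service via an injected callable."""
    send: Callable[[str], str]  # assumed transport: dataset id in, PID out

    def register(self, dataset_id: str) -> str:
        return self.send(dataset_id)
```

With this shape, the offline provider could be swapped in when working off the network, while the proxy variant keeps provider-specific logic outside the application.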
14:10
pdurbin
poikilotherm: this is related. A proxy to DataCite used in Australia: https://github.com/IQSS/dataverse/pull/3843/files
14:12
poikilotherm
Thx!
14:13
poikilotherm
I'm not sure yet how to properly depict the aspect that we want PIDs early in the game (when a dataset is created) and not just when someone hits "publish now". The current providers don't seem ready for such a scenario.
14:14
poikilotherm
IMHO and IIRC and AFAIK
14:14
poikilotherm
:-D
14:17
pdurbin
It sounds like you want the concept of a "reserved" PID. EZID supports this. DataCite is working on support for this, from what I hear.
14:17
pdurbin
Here. "Reserved": https://github.com/IQSS/dataverse/issues/5093
14:20
poikilotherm
Nope, actually we want to use ePICs on creation and DOIs on publishing
14:20
poikilotherm
Basically because DataCite DOIs are quite expensive compared to ePIC
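[Editor's sketch] A minimal illustration of the "ePIC on creation, DOI on publishing" idea, assuming a dataset can carry more than one identifier. The classes, the `handle`/`doi` keys and the minter callables are all hypothetical; nothing here reflects how Dataverse stores identifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Minter = Callable[[str], str]  # takes a dataset name, returns a PID string


@dataclass
class Dataset:
    name: str
    pids: Dict[str, str] = field(default_factory=dict)


@dataclass
class LifecyclePids:
    """Hypothetical policy: cheap PID at creation, DOI only at publication."""
    mint_on_create: Minter
    mint_on_publish: Minter

    def created(self, ds: Dataset) -> None:
        ds.pids["handle"] = self.mint_on_create(ds.name)

    def published(self, ds: Dataset) -> None:
        # Assumption: the earlier identifier is kept and the DOI is added to it.
        ds.pids["doi"] = self.mint_on_publish(ds.name)
```

Datasets that are never published would then only ever consume the cheaper identifier.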
14:21
pdurbin
oh, interesting
14:22
pdurbin
Won't your researchers want to know the DOI of the dataset before the dataset is published so they can put the DOI for the dataset in their journal article?
14:22
poikilotherm
Most certainly yes :-D
14:23
poikilotherm
The thing is: we want to use Dataverse for "the long tail"...
14:23
poikilotherm
Automatic uploads right from the experiment etc
14:23
poikilotherm
And most of this data will never be published
14:24
poikilotherm
It would be a huge waste of money to use Datacite DOIs on those.
14:24
poikilotherm
ePICs are 0,00129€/PID
14:24
poikilotherm
(For up to 1e6 PIDs per year)
14:24
pdurbin
Ok, makes sense.
14:25
pdurbin
I wonder if other institutions have a similar use case, a similar story.
14:27
poikilotherm
And for comparison: a Datacite DOI is about 0,1€/PID, not including membership fees etc
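[Editor's sketch] The arithmetic behind the cost argument, using only the two per-PID prices quoted above (0,00129 € for ePIC at up to 1e6 PIDs per year, about 0,1 € for DataCite, membership fees excluded). The dataset counts in the usage note are made-up inputs, not figures from the conversation.

```python
from decimal import Decimal

EPIC_EUR_PER_PID = Decimal("0.00129")  # quoted for up to 1e6 PIDs per year
DATACITE_EUR_PER_PID = Decimal("0.1")  # "about"; excludes membership fees


def cost(pids: int, unit_price: Decimal) -> Decimal:
    """Total cost in euros for a number of PIDs at a flat unit price."""
    return pids * unit_price


def mixed_cost(created: int, published: int) -> Decimal:
    """ePIC for every created dataset, DataCite DOI only for published ones."""
    return cost(created, EPIC_EUR_PER_PID) + cost(published, DATACITE_EUR_PER_PID)
```

For a hypothetical 1,000,000 datasets created in a year, `cost(1_000_000, DATACITE_EUR_PER_PID)` comes to 100000 € while `cost(1_000_000, EPIC_EUR_PER_PID)` comes to 1290 €. If only 10,000 of them were ever published, `mixed_cost(1_000_000, 10_000)` would be 2290 €.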
14:27
pameyer joined #dataverse
14:28
poikilotherm
I don't know if others actually think about using the repositories for other things than "just" the published data.
14:29
poikilotherm
From our point of view it would be a huge benefit if also other data is inserted into the repositories.
14:29
poikilotherm
Actually we went with Dataverse because it has the hierarchy of verses and sets together with the metadata schemas.
14:30
pdurbin
poikilotherm: well, andrewSC opened an issue about making PIDs optional: https://github.com/IQSS/dataverse/issues/3652
14:30
poikilotherm
Aye. But we need these PIDs... :-D
14:30
poikilotherm
We want people to use persistent links
14:30
poikilotherm
And we have some plans to build apps upon Dataverse
14:31
pameyer
it does seem like there's some interest in more flexibility w/ identifiers than the current "single public PID"
14:31
poikilotherm
And these also will need to have proper PIDs on every dataset
14:31
poikilotherm
Yes :-D
14:31
pdurbin
So you're not anti-PID. You're just trying to control costs. Support for Handles rather than DOIs was contributed by the community because the cost of DOIs is high compared to Handles. That's my understanding anyway.
14:31
pdurbin
mornin pameyer
14:32
poikilotherm
Totally pro-PID here :-D
14:32
poikilotherm
As you said: it is a matter of cost control.
14:32
pdurbin
poikilotherm: globus/gridftp discussion is springing up in #dv-design on Slack
14:32
pameyer
pdurbin: morning
14:33
poikilotherm
Oh interesting
14:35
poikilotherm
I can't talk about our plans in public, but we have some stuff in the pipeline about distributed systems.
14:35
poikilotherm
GridFTP seems also interesting :-)
14:38
pdurbin
Ian Foster from Globus gave a keynote at the Dataverse Community Meeting back in June.
14:41
pameyer
pdurbin: it seems like the dv-design discussion is focusing on ux, so I won't do another re-hash of the technical issues that would need to get sorted
14:42
pdurbin
lots to get sorted :)
14:42
pdurbin
pameyer: oh there was a recent comment by Martin about datasets being available for download from various places.
14:43
pameyer
pdurbin: very interesting. any pointers to where?
14:43
pdurbin
"My goal is a contentURL referencing a single file, ideally a bagit archive that also includes metadata. I have run into a use case where I need to support multiple contentURLs - the same content in multiple cloud locations (AWS, Google Cloud), but that is an edge case."
14:43
pdurbin
https://github.com/datacite/freya/issues/2#issuecomment-427433681
14:44
pameyer
thanks
14:45
pameyer
... it sometimes seems like most of the things I think you need for good system design for a data repository are "edge cases"
14:45
pameyer
:(
14:46
pdurbin
well, LOCKSS is or was a thing
14:46
pdurbin
I'd say you're on the right track.
14:47
pameyer
and PDB's been doing distributed access for long enough that I'd have to check the literature to see when they started
14:47
pameyer
also not assuming that the world runs on http
14:49
poikilotherm
Ok, that's it for me for today... Maybe you can talk to Slava (@4thikonov) during the community meeting? He is the one actually behind the dara stuff
14:50
pdurbin
We'll try. Thanks!
15:11
pameyer
pdurbin: edff192275df861739bc56f002adc9bf8cd77c51 looks unhappy to me. would you mind pushing the jenkins button when you've got a chance?
15:12
pdurbin
Sure. Just did. Thanks for the heads up.
15:13
pameyer
no problem. I'm leaning towards a glitch on my end, from the commit messages between 0c89260a482428e07f0b206dde2bf73ea8ff5487 and edff192275df861739bc56f002adc9bf8cd77c51
15:18
donsizemore
@pameyer can i get a second pair of eyes, before i put my foot through my thunderbolt display?
15:29
pdurbin
pameyer: I'm seeing "Regression" "expected:<[Darwin's Finches - dva6e0453b]> but was:<[500 Internal Server Error]>" on DatasetsIT.testPrivateUrl https://build.hmdc.harvard.edu:8443/job/phoenix.dataverse.org-apitest-develop/edu.harvard.iq$dataverse/259/testReport/junit/edu.harvard.iq.dataverse.api/DatasetsIT/testPrivateUrl/
15:29
pdurbin
I just kicked off another build to see if I get the same result.
15:30
pdurbin
donsizemore: ansible stuff?
15:30
donsizemore
@pdurbin yis. and i'm too ashamed to post it publicly.
15:33
pdurbin
heh
15:33
* pdurbin
has no shame
15:33
pameyer
@donsizemore: where do you want a 2nd pair of eyes
15:34
pameyer
@pdurbin: yeah, that's what I'm seeing
15:34
pameyer
showed up 2x for me
15:34
pdurbin
pameyer: are you willing to create an issue?
15:34
donsizemore
actually, gimme a minute to try method 4 of coaxing ansible to edit one stupid file.
15:34
pameyer
ah - pdurbin, I was wrong. nothing on private url, but still search API
15:34
pameyer
sure - issue incoming
15:49
pameyer
pdurbin: also, it turned out that glassfish is more robust to abusing the file access API than I was expecting. lost quite a bit of responsiveness, but didn't crash
15:51
pdurbin
a pleasant surprise for both of us :)
15:55
pdurbin
pameyer: and in other pleasant surprise news... on the second run (job 260) all the integration tests passed: https://build.hmdc.harvard.edu:8443/job/phoenix.dataverse.org-apitest-develop/
15:59
pameyer
pdurbin: huh
15:59
pameyer
thanks for checking it; does suggest a problem on my end
16:05
poiki-at-home joined #dataverse
17:25
donsizemore joined #dataverse
17:31
Jim__ joined #dataverse
17:33
Jim__
Hey pdurbin - PR maintenance question - with your 5030 branch replacing my PR, what's the best way for me to maintain it? (There's a new tika version, I've found a dependency issue with commons-io, and it's behind dev again...).
17:33
Jim__
Should I make a PR against that branch?
17:34
pdurbin
Jim__: yep, a PR against the new branch would be great. Thanks.
17:34
pdurbin
Also, can I ask you about highlighting?
17:38
Jim__
highlighting? OK...
17:39
pdurbin
That's what we call the snippets of text that match the search term. Not sure if you've seen this in Dataverse or not.
17:40
pdurbin
The matching text and a little context on either side. Before and after.
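[Editor's sketch] A toy version of the highlighting idea described above: the matched term in bold with a little context before and after. It is only meant to show the concept and is not how Dataverse or its search backend produces highlights. The function name, the `<b>` markup and the `context` parameter are assumptions.

```python
import re
from typing import List


def highlight(text: str, term: str, context: int = 30) -> List[str]:
    """Return one snippet per case-insensitive match of `term` in `text`."""
    snippets = []
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, m.start() - context)      # context before the match
        end = min(len(text), m.end() + context)  # context after the match
        snippets.append(
            text[start:m.start()] + "<b>" + m.group(0) + "</b>" + text[m.end():end]
        )
    return snippets
```

For full-text search over extracted PDF text, the same idea would apply to the indexed body text rather than to metadata fields.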
17:41
Jim__
Ah - ok. In the search results...
17:41
pdurbin
Yeah, like how "Wright" is in bold in the screenshot at https://github.com/IQSS/dataverse/issues/1589
17:41
pdurbin
The question is if you've thought about this for full text search. Of PDFs or whatever.
17:42
pdurbin
If you haven't, that's fine. I just thought I'd see if it's on your radar.
17:45
Jim__
I haven't. I guess there isn't a highlight from the full-text search now. I'll look when I'm going back through that code. So far I just add the full-text from tika to the index and have done nothing to the search itself or the results
17:46
pdurbin
That's totally fine. I'm super excited to have this feature in any form. I only mention it because when in hits QA highlighting might get asked about.
17:46
pdurbin
it*
18:19
pdurbin
Jim__: merged. Thanks.
18:29
donsizemore
@pameyer okay, i'm ready to cave
18:34
pameyer
@donsizemore - ansible sadness?
18:39
Jim__
Phil - thanks & sorry - one more small pr to update poi...
18:39
pdurbin
no problem. merged that one too
18:43
pameyer
that seems like a worthwhile update
18:46
pdurbin
pameyer: no "boolean" in this list: https://github.com/IQSS/dataverse/blob/v4.9.4/src/main/java/edu/harvard/iq/dataverse/DatasetFieldType.java#L35
18:46
pdurbin
nor http://guides.dataverse.org/en/4.9.3/admin/metadatacustomization.html#fieldtype-definitions
18:46
donsizemore
@pameyer ansible uses python regular expressions (fine) and it prints all output in JSON-encoded form (also fine) but i need to pass it a shell command containing a variable which contains quotes and a dash, and i'm toying with becoming angriful towards ansible
18:46
pameyer
pdurbin: thanks, that's what I'd *thought*
18:47
pameyer
donsizemore: the variable has quotes and a dash, or the command does?
18:48
donsizemore
@pameyer the command (asadmin, to add jvm-options). the XML module is under heavy development, and the lineinfile module reports that it does what i ask it to do (except it doesn't) so i punted and thought i'd pass shell commands. i'm setting them as facts because otherwise ansible will double-escape special characters
18:49
pameyer
asadmin / jvm-options commands :<
18:50
pameyer
@donsizemore do you have an opinion on using ansible to deploy and call utility scripts?
18:51
donsizemore
@pameyer yeah, i think that was coming next
18:51
pameyer
it's somewhat of a hack, but it may be the less frustration-inducing approach
18:51
donsizemore
@pameyer but i'd still face the same regexp funsies in asking it to write out the script
18:51
donsizemore
p.s. did my DMs make it through?
18:51
pdurbin
donsizemore: is this work that I tried to pawn off on you or some itch you're scratching? :)
18:52
pameyer
haven't seen any DMs - but that might be because my irc-fu is weak
18:52
pameyer
utility script would let you reduce it to a solved problem though
18:52
donsizemore
@pdurbin the former =) but it's turned into a mild obsession-of-irritation
18:52
pameyer
https://github.com/IQSS/dataverse/blob/develop/conf/docker-aio/configure_doi.bash
18:53
pdurbin
Yikes. Did I create an issue at least?
18:53
pameyer
or at least semi-solved; it appears that it's interacting with integration test setup in a way that it wasn't when that PR was merged
18:53
donsizemore
yeah, i can make a template of the script and try my luck there. i just need to set three jvm-options.
18:54
pameyer
if I hadn't known, I could make a pretty strong inference that glassfish predated most of the provisioning tools I know of
18:55
donsizemore
@pameyer also, our backups decided to exhibit unexpected behavior overnight, which isn't brightening my mood ring
18:56
pameyer
@donsizemore backups deciding to exhibit their creativity is not a great thing
18:57
donsizemore
@pameyer yeah, a 24-hour incremental consolidation job seems to be blocking further runs. but it's at 88% and i'm confident my going to the gym will fix things
18:58
pameyer
I can't rule out the possibility that I fixed some intermittent network issues by getting off a bus this morning ;)
19:03
pdurbin
Regarding PMs on IRC, freenode recently added +R (block unidentified) to all user accounts because of all the spam. "Ignores private messages from users who are not identified with services." https://freenode.net/kb/answer/usermodes
19:05
donsizemore
@pameyer also, your script uses some of the exact regexps i was trying earlier today!
19:07
pameyer
@donsizemore you could distinguish them from line noise? ;)
21:15
jri joined #dataverse
21:26
pameyer
@donsizemore - just occurred to me that we might've been talking past each other on ansible utility scripts.
21:26
pameyer
I'd been thinking: write a script that reads from environment variables, use one task to copy it to the system and another task to execute it
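[Editor's sketch] One way the "utility script that reads from environment variables" could look, so that the provisioning tool only has to copy and run a file instead of quoting a command inline. The environment variable names and the three `doi.*` option names are assumptions (the conversation only says three jvm-options are needed). Escaping colons with a backslash follows the usual advice for `asadmin create-jvm-options`, but the exact escaping rules should be checked against the asadmin documentation.

```python
from typing import List, Mapping

# Assumed mapping: JVM option name -> environment variable holding its value.
OPTION_ENV = {
    "doi.username": "DOI_USERNAME",
    "doi.password": "DOI_PASSWORD",
    "doi.baseurlstring": "DOI_BASEURL",
}


def escape_colons(value: str) -> str:
    """Escape ':' because asadmin uses it to separate multiple options."""
    return value.replace(":", "\\:")


def build_commands(env: Mapping[str, str]) -> List[List[str]]:
    """Build one argument list per option; raises KeyError if a variable is unset."""
    commands = []
    for option, var in OPTION_ENV.items():
        jvm_option = f"-D{option}={escape_colons(env[var])}"
        # An argument list avoids a second layer of shell quoting.
        commands.append(["asadmin", "create-jvm-options", jvm_option])
    return commands
```

A caller could pass `os.environ` to `build_commands` and hand each resulting list to `subprocess.run`, which keeps quotes and dashes in the values away from any shell.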
21:42
donsizemore
@pdurbin i think i've got the datacite stuff in dataverse-ansible. i'll let you test it ;0
23:24
pdurbin
thanks!