IRC log for #dataverse, 2021-04-14

Connect to discuss Dataverse (an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

10:42 lincoln I want to create a workflow that includes an additional step after a file is uploaded. I have looked at the following link and would like to know more about the use case:
10:43 lincoln
10:43 lincoln is there anything else that i can refer to
10:43 lincoln ?
15:29 poikilotherm Oh pdurbin :-D Late to the party today? ;-)
15:29 poikilotherm I just left for you folks to look at
15:30 pdurbin I was in a design meeting about embargo. Any interest in that feature?
15:31 pdurbin It looks like lincoln had a question.
15:31 pdurbin lincoln: I'm pretty sure workflows have to be tied to commands, and maybe only certain commands, such as publish.
15:32 pdurbin "Trigger types are PrePublishDataset, PostPublishDataset"
15:32 poikilotherm Yeah, that feature sounds great
15:33 pdurbin lincoln: if you want I can try to summon Jim who knows all about workflows.
15:33 poikilotherm I'm off to construction site now, but reading y'all on my mobile
15:33 pdurbin poikilotherm: wow, 94 files changed, but mostly just imports, it seems.
15:34 poikilotherm Yeah
15:34 lincoln pdurbin: I am looking more into how to implement the workflow (as I am new to this), and I am more focused on PrePublishDataset
15:34 lincoln
15:35 lincoln maybe a use case showing how a custom workflow has been implemented would do the trick
15:35 poikilotherm pdurbin: there's a bug hiding in the code. Maybe we can offer it some good smelling things to attract it?
15:36 pdurbin Is the bug in your pull request or in "develop"?
15:41 Jim64 The basic idea of a workflow managed by an external app is that Dataverse will make an HTTP call to you either before or after publication and include info on the dataset and the API key of the user doing the publish. Your app can then do whatever series of API calls it wants before telling Dataverse it is done, at which point, for pre-publish, publish continues.
15:42 Jim64 There are new options in 5.4+ that allow your workflow to send a success/failure message that the user would see.
15:43 Jim64 The two guide entries you cite are the best docs.
15:44 Jim64 (It is also possible to have workflows that leverage internal classes (as is done for archiving), but that means adding code to Dataverse itself.)
15:47 pdurbin Jim64: any thoughts on "including an additional step after upload of file"? Use pre or post publish, I guess.
15:48 Jim64 We've talked about adding additional triggers, i.e. dataset creation, file upload, but that doesn't exist yet.
15:48 lincoln thank you Jim64. So can we say the workflow is activated when the Publish dataset button is clicked?
15:48 Jim64 Geospatial indexing is a driver there
15:49 Jim64 Yes, if you register a workflow and configure it to trigger at pre or post-publish.
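A sketch of what registering such a workflow might involve, based on the workflow section of the Dataverse guides as recalled here rather than on anything stated in this log. The step type `http/sr`, the parameter names, and the `${...}` placeholders are assumptions to check against the guide for your version; the workflow name and the callback URL are hypothetical.

```python
import json


def build_workflow(name, callback_url):
    """Return a workflow definition with one HTTP send/receive step.

    Field names follow the guide's example as recalled; treat them as
    assumptions and verify them before use.
    """
    step = {
        "provider": ":internal",
        "stepType": "http/sr",  # assumed: send a request, then wait for the app to call back
        "parameters": {
            "url": callback_url,
            "method": "POST",
            "contentType": "text/plain",
            # Placeholders are filled in by Dataverse when the step runs (assumed names).
            "body": "dataset ${dataset.id} invocation ${invocationId}",
            "expectedResponse": "OK.*",
        },
    }
    return {"name": name, "steps": [step]}


# Hypothetical external app listening on localhost:5050.
workflow = build_workflow(
    "extract-metadata", "http://localhost:5050/hook/${invocationId}"
)
payload = json.dumps(workflow, indent=2)
print(payload)
```

The printed JSON is what an admin would submit when registering the workflow, before configuring it to run at pre- or post-publish. The external app would later report back using the invocation ID it received. The exact endpoints for both steps are in the guide.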
15:51 lincoln that was exactly what I was looking for. Thank you Jim64 :). But it would be more interesting (in my case) for dataset creation, file upload..
15:51 Jim64 just curious - what use case do you have?
15:51 * pdurbin is curious too :)
15:53 lincoln a use case where the user uploads only the file, and then a script reads the necessary (standard) info and fills in the fields for the datafile
15:54 pdurbin That sounds like what we do for FITS files (from astronomy).
15:55 pdurbin
15:55 pdurbin For FITS files we store the extracted metadata at the dataset level, though.
15:55 Jim64 Cool - FWIW Dataverse has some internal 'ingest' mechanisms that extract metadata (as pdurbin said) where there's potential interest in moving that to an external service - another case where the file upload trigger would help, and would open the door to 3rd party ingest tools.
15:55 pdurbin lincoln: you want to fill in the file description?
15:56 lincoln thank you, and I am also looking at whether the same process could be extended to the dataset metadata
15:56 lincoln pdurbin: yes, I want to fill in the file description as well
15:57 Jim64 Also FWIW - if it would make sense to have a user trigger your process, you could potentially use the external tools/configure tools option - they function similarly to workflows w.r.t. calling your HTTP URL and sending params. The main difference is there's no auto-trigger, a user has to push the button, and there's no way to run a sequence of tools.
15:58 lincoln the other question..(although kind of basic).. Can we make a customized button, and if so, in which file?
15:59 Jim64 external tools show up as a button
15:59 lincoln okay..
16:03 Jim64 and either workflows or external tools can do any API call that the user has permissions to do, so you can add dataset metadata, etc. (Currently we're discussing how to limit what tools can do to a subset of what the user can. Tools are currently 'trusted' since they are set up by the admin.)
16:05 lincoln Thank you Jim64, I think I am clear on the datafile idea.... also, is it possible to create an upload button (somewhere at the top of the dataverse page) where a user uploads a metadata file (presses the button) ---- Dataverse reads the file (using a script) ---- and then fills in the necessary metadata block ---- and creates a new dataset
16:08 pdurbin One thing to be aware of is that at the moment you can't have a "configure" button at the dataset level. The plan has been to add one as part of
16:08 Jim64 There isn't any mechanism I'm aware of to create a datafile from an external file via the UI (e.g. with a button), but Dataverse does have apis and a harvesting mechanism that create datasets from various formats (not something I've used much).
16:09 pdurbin Of course, come to think of it, it sounds like you want a button at the dataverse level. Almost an alternative "create dataset" button.
16:11 lincoln pdurbin: yeah something like that but for another problem
16:17 lincoln something like configuring in the custom-html file would also do the trick ...yeah
16:17 lincoln thank you for the idea
16:18 pdurbin Sure, maybe some custom HTML.
18:02 poikilotherm Can we start calling it webhooks instead of triggers? Chances are higher that sysadmin people find it easier that way in docs. And it makes it more aligned with the usual naming conventions...
18:03 pdurbin not a bad idea :)
18:03 poikilotherm And there is already a ton of UI and config in services to learn from for how this could look in Dataverse
18:04 poikilotherm And maybe we can set up a formal spec for what the webhook payload looks like
18:04 poikilotherm Just like the other tools are doing it ..
18:05 pdurbin Sure. Sounds fine. We're about to start our sprint planning meeting.
18:05 poikilotherm pdurbin: regarding the bug: I think it's in the code of develop. Must have been hiding for quite a while
18:05 poikilotherm Maybe it's just the test setup, dunno
18:57 nightowl313 my weekly question for this amazing group ... we are trying to set up a Dataverse S3 store using Dell Isilon OneFS (native S3) ..
18:58 nightowl313 does anyone know if this has been attempted? and possible?
18:58 nightowl313 we successfully set up a store to wasabi
19:10 pameyer I _think_ that dataverse with non-amazon S3 storage has been done, but I'm not 100% sure
19:10 pameyer even if that's right, I'm not sure if it was isilon or something else
19:12 pameyer poikilotherm or pdurbin will likely have a better idea than I do
19:13 poikilotherm Not with that storage product, but as long as it's compatible with the S3 API, it should be fairly easy to set up
19:14 nightowl313 thanks, @pameyer
19:17 pameyer you're welcome @nightowl313 - and thanks to @poikilotherm for having better info
19:22 nightowl313 I have gotten it to work with wasabi, but wasabi used the AWS credentials profile ... this just has an access id and secret key ... and everything else is a little different (i.e. zone vs region)
19:29 pdurbin Yes, Dataverse has been shown to work with some S3 clones, from what I understand.
19:31 nightowl313 okay, i'll send a question out to the community and see if anyone has done it with this ... thanks!
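For an S3-compatible service that hands out only an access ID and a secret key, one possible approach is to put that pair into a named profile in the AWS credentials file and point the Dataverse store at a custom endpoint. A minimal sketch follows, assuming the `dataverse.files.<id>.*` JVM option names below match the installation guide for your version. The store id, bucket, endpoint URL, and key values are hypothetical, and nothing here has been tested against OneFS.

```python
import configparser
import io


def s3_store_jvm_options(store_id, label, bucket, endpoint_url, profile):
    """Build JVM option strings for an S3 store (option names are assumptions)."""
    prefix = f"-Ddataverse.files.{store_id}."
    settings = {
        "type": "s3",
        "label": label,
        "bucket-name": bucket,
        "custom-endpoint-url": endpoint_url,
        # Many non-AWS services need path-style access; whether OneFS does is an assumption.
        "path-style-access": "true",
        "profile": profile,
    }
    return [f"{prefix}{key}={value}" for key, value in settings.items()]


def credentials_profile(profile, access_key_id, secret_key):
    """Render one named profile in AWS credentials-file (INI) format."""
    parser = configparser.ConfigParser()
    parser[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_key,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical values for illustration only.
options = s3_store_jvm_options(
    "isilon", "Isilon", "dataverse-bucket",
    "https://isilon.example.edu:9021", "isilon",
)
profile_text = credentials_profile("isilon", "EXAMPLE_ID", "EXAMPLE_SECRET")

for option in options:
    print(option)
print(profile_text)
```

The strings are printed for review only. How they are applied to the application server, including any escaping its tooling requires, is left to the installation guide.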
