IRC log for #dataverse, 2019-04-26

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
04:02 jri joined #dataverse
06:03 jri joined #dataverse
06:13 jri joined #dataverse
07:28 jri joined #dataverse
10:23 kamil10 joined #dataverse
10:24 kamil10 Hello Dataverse community! Is Dataverse capable of storing Darwin Core (biodiversity) metadata?
10:44 pdurbin kamil10: hi! The answer is a little complicated, I'm afraid. Let me try to explain.
10:45 pdurbin Dataverse is flexible enough that anyone can create a "custom metadata block" describing any metadata. For example, a user recently created one for structural biology. Here are the docs on how to create one: http://guides.dataverse.org/en/4.13/admin/metadatacustomization.html
10:45 pdurbin Does that help?
10:46 kamil10 https://dataverse.org/files/dataverseorg/files/iassistposter2016ecastro.pdf
10:46 kamil10 What is the chance that you'll implement such functionality?
10:47 kamil10 On the slide it is listed as planned to be supported by default
10:53 kamil10 And a second question: is it possible to set up a self-hosted installation of Harvard WorldMap?
10:57 Stofpad joined #dataverse
11:10 jri joined #dataverse
11:18 pdurbin Sorry, which functionality? I'm in and out right now, trying to get the kids out the door for school. :)
11:21 pdurbin Some of the items under "Planned to support" have been implemented. DataCite 4.0 and Schema.org JSON-LD are done, for example.
11:22 pdurbin Oh, I see "Darwin Core (Biodiversity)" on that slide. Hmm. Eleni made that slide, it looks like. She has been gone for a while so I assume it's years old. How did you find it? From a Google search?
11:25 pdurbin I just found this spreadsheet called "Comparative Zoology / Darwin Core Metadata", last updated June 2016: https://docs.google.com/spreadsheets/d/1P9xvaRLhCKsYmjz9eXXVl0T9d2U34UgynbvxDp-2Bjc/edit?usp=sharing
11:25 pdurbin kamil10: does that help?
11:25 pdurbin welcome, Stofpad
11:26 pdurbin kamil10: with regard to WorldMap, it was recently rewritten and my understanding is that they are ready for beta testers to try to self host it.
11:34 MrK joined #dataverse
11:47 pdurbin I need to run a sleeping bag and some other stuff over to a friend's house for my daughter's overnight trip but I'll be back in a bit. Please keep the questions coming!
11:55 jri_ joined #dataverse
12:05 kamil10 Yes, I found it through a Google search, but I can't find such metadata in the Dataverse demo. Is this metadata implemented as shown in the Google Sheet?
12:05 kamil10 Thank you very much for your collaboration and support!
12:06 kamil10 And take your time, I can wait
12:11 pdurbin Ok, back. But I should eat some breakfast and bike to work soon. :)
12:12 pdurbin kamil10: that spreadsheet seems to be our standard format for custom metadata blocks. Do you want to try it? Do you already have an installation of Dataverse?
12:17 kamil10 Oops, sorry, you're probably in a different timezone :)
12:17 kamil10 No, I used demo.dataverse.org
12:17 pdurbin I'm in Boston.
12:18 kamil10 I'm in Europe, Poland
12:19 pdurbin kamil10: ah, you should meet MrK then.
12:19 pdurbin see also the "who's who" link in the topic of this channel
12:19 kamil10 I can't find Darwin Core metadata in the demo version of Dataverse. How can I find it, or any documentation related to this issue?
12:19 kamil10 Who is MrK?
12:20 MrK Hi ;)
12:20 pdurbin MrK is not Mr. T. He does not pity the fool.
12:20 kamil10 :]
12:21 MrK Probably because I'm also from Poland :P
12:21 kamil10 So we can both write in the same timezone :)
12:22 MrK So which institution are you from :D?
12:22 donsizemore joined #dataverse
12:23 kamil10 Bialystok University of Technology
12:23 kamil10 collaborating with Bialowieza PAN
12:25 pdurbin kamil10: bad news. I downloaded that spreadsheet as tsv and ran `curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file /tmp/Comparative\ Zoology\ _\ Darwin\ Core\ Metadata\ -\ Sheet2.tsv` but I got `{"status":"ERROR","message":"For input string: \"\""}` :( This means that custom metadata block needs more work. But it's a good starting
12:25 pdurbin point, I hope.
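The message `For input string: ""` is what Java reports when an empty string is parsed as a number, so one plausible cause is a blank cell in a column the loader expects to be numeric. Below is a minimal Python sketch for locating such cells in the downloaded TSV before retrying the upload. It rests on assumptions not confirmed in this log: that section header rows start with `#` in the first cell and name the columns, and that `displayOrder` is a numeric column. The function name is hypothetical, and this is not a Dataverse tool.

```python
import csv
import io


def find_empty_numeric_cells(tsv_text, numeric_columns=("displayOrder",)):
    """Return (row_number, section, column) for each blank numeric cell.

    Assumption: rows whose first cell starts with '#' are section headers
    that name the columns for the data rows that follow them.
    """
    problems = []
    header = None
    section = None
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        # Skip rows that are completely blank.
        if not row or all(cell.strip() == "" for cell in row):
            continue
        # A header row starts a new section and defines its columns.
        if row[0].startswith("#"):
            section = row[0]
            header = [cell.strip() for cell in row]
            continue
        if header is None:
            continue
        for column in numeric_columns:
            if column not in header:
                continue
            index = header.index(column)
            # A short row counts as having a blank cell in that column.
            value = row[index].strip() if index < len(row) else ""
            if value == "":
                problems.append((row_number, section, column))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_tsv.py path/to/block.tsv
    with open(sys.argv[1], encoding="utf-8", newline="") as handle:
        for row_number, section, column in find_empty_numeric_cells(handle.read()):
            print(f"row {row_number} in {section}: empty {column}")
```

Any rows it reports are only candidates for the failure; the loader may reject the file for other reasons as well.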
12:26 pdurbin kamil10: would you like to create an issue at https://github.com/IQSS/dataverse/issues asking for Darwin Core support?
12:29 pdurbin Also, there's a typo in our guides. it says --upload-file twice. Does anyone want to make a pull request? Deleting stuff is usually easy. :)
12:30 kamil10 Yes, we could probably even collaborate and support this work, but we are still evaluating which system we will use
12:30 kamil10 ckan or dataverse?
12:30 pdurbin kamil10: fantastic! You should definitely consider creating an issue then.
12:31 pdurbin kamil10: have you seen our comparative review? CKAN is on it.
12:31 pdurbin this: https://dataverse.org/blog/comparative-review-various-data-repositories
12:31 kamil10 Could you tell me based on your experience which system would be better to store collection of specimens?
12:32 kamil10 Yes I saw that before
12:33 kamil10 CKAN is much better in terms of modularity and extensions, but the extensions stop working with every new release of base CKAN
12:33 pdurbin I don't know but if you post to https://groups.google.com/forum/#!forum/dataverse-community someone might have opinions about specimens.
12:33 pdurbin Actually, one sec.
12:34 pdurbin "Each of those datasets references a plant genetic resource described in our GnpIS database." https://dataverse.org/blog/data-inra
12:35 kamil10 Thanks a lot for this stuff!
12:35 pdurbin For "Kind of Data" they use "Physical Object": https://data.inra.fr/dataverse/omics?q=&fq0=kindOfData_ss%3A%22Physical+Object%22&types=dataverses%3Adatasets&sort=dateSort&order=desc
12:36 pdurbin but they don't seem to have created a custom metadata block... not that I blame them, it's hard to start from scratch... at least there's already a stub for darwin core
12:37 pdurbin Yes, Dataverse has a lot of catching up to do in terms of modularity. At least our new "external tool" framework is helping. :)
12:37 kamil10 Yes and we are afraid of it :(
12:37 kamil10 https://dataverse.org/files/dataverseorg/files/openmonolith.pdf
12:38 kamil10 Where can I find this new external tool framework?
12:38 pdurbin Heh. What are you saying about the open monolith? You like monoliths? You don't like monoliths? :)
12:38 pdurbin http://guides.dataverse.org/en/4.13/installation/external-tools.html
12:38 kamil10 I don't like them, but experience shows that CKAN extensions are a real mess!
12:40 pdurbin Oh? A mess in what way?
12:41 pdurbin speaking of modularity, you are (all) welcome to leave a comment on this "Dataverse App Store" idea: https://github.com/IQSS/dataverse/issues/5688
12:43 kamil10 For example, in CKAN 2.8 support for Celery disappeared, which left e.g. ckanext-archiver, and so the whole system, not working
12:43 kamil10 https://docs.ckan.org/en/latest/maintaining/background-tasks.html#background-jobs-migration
12:43 MrK When I'm thinking about modularity, hexagonal architecture always comes to my mind.
12:44 kamil10 and after half a year https://github.com/ckan/ckanext-archiver/blob/66075b2aa97499535b6ecca97d6ba23174d7a3b4/ckanext/archiver/lib.py
12:44 pdurbin Hmm. At least they document how to switch to the new system.
12:44 kamil10 The extension was working again
12:45 pdurbin Oh. I see. You're saying it was broken for a while.
12:45 kamil10 yes but the extensions are maintained by people all over the world, and you maintain this monolithic approach yourselves and everything works
12:45 pdurbin speaking of archiving, this is new too: http://guides.dataverse.org/en/4.13/admin/integrations.html#research-data-preservation
12:46 MrK I'm wondering why they would even remove it. If you're creating a modular system with external sources, you usually just use one generic interface and work with it in the internals of the system, so you can operate on an abstraction and not worry about any external stuff.
12:46 pdurbin Well, not everything works. We have bugs. :)
12:48 pdurbin We try not to remove stuff. An integration broke when we released Dataverse 4.12. But we followed up with a fix in Dataverse 4.13 somewhat quickly.
12:48 pdurbin looks like 20 days between releases... could be worse, could be better :)
12:50 pdurbin Man, it's raining hard out there. We're supposed to get over an inch of rain today. I've got my rain pants on. Time to jump on my bike. Keep chatting. I'll catch up.
12:52 pdurbin MrK: please help us make Dataverse more modular.
12:57 MrK I wish I could but atm our versions are kinda different :P. But I can tell you how we will try to divide it.
13:04 kamil10 Thank you very much, I didn't imagine that the Dataverse community is so open!
13:05 kamil10 Thank you, can I come back to you after our evaluation?
13:22 donsizemore hey @pdurbin see if your github keys let you on centos@ec2-54-161-93-228.compute-1.amazonaws.com ?
14:10 pdurbin kamil10: yes! Please come back and give us feedback either way!
14:10 pdurbin MrK: yes, please teach me your ways.
14:11 pdurbin donsizemore: I'm in. :)
14:11 pdurbin key-jenkins-fe83a4a4. nice
14:12 pdurbin kamil10: oh, speaking of openness, you can check out this article I wrote: https://groups.google.com/d/msg/dataverse-community/brxCn1E9tX0/VbsNz4u8BgAJ . I'd love feedback on it.
14:13 MrK pdurbin: I'm not yet good at architecture in any way since my experience is too low, but we are going to divide it into pom modules api-view(jsf)-service-persistence, and then inside, the packages are going to be divided by function, as we are already doing with new functions. So if we want to add new licenses, everything about licenses is going to be in the license package, and so on. Ideally but here it's impossible because of JEE for example but I w
14:13 donsizemore coolies. ça marche in vagrant, gonna call that good.
14:14 MrK classes in the package, package private so no class outside would see it.
14:14 pdurbin MrK: sorry, you were a little cut off. "for example but I wo"
14:14 donsizemore @pdurbin and if you want i can go ahead and drop in the cert/proxy stuff, all of which works except for letsencrypt, which will just take more (free) time
14:14 MrK Yeah, character limit. I finished in another sentence :p
14:15 MrK for example but I would normally try to make the classes in the package, package private so no class outside would see it.
14:23 pdurbin MrK: I don't know what pom modules are.
14:24 MrK Well, atm you have 1 pom for 1 module, which is 'dataverse', but you can have 4 modules with 4 child poms and 1 master pom.
14:24 pdurbin donsizemore: if you're happy with how the cert/proxy stuff is turning out, please don't let me stop you from merging. I haven't tested it.
14:24 pdurbin MrK: that sounds like what we did for DVN 3, the previous generation of Dataverse.
14:25 MrK Oh cool, for me it's not that revolutionary ofc, the most important thing is still modularity in code.
14:25 pdurbin <module>DVN-web</module> in https://github.com/IQSS/dvn/blob/3.6.2/DVN-root/pom.xml#L10
14:26 pdurbin Yes! More modularity please.
14:26 pdurbin What parts of the code do you want to make more modular?
14:27 jri joined #dataverse
14:33 MrK I mean probably there is room for it everywhere, the biggest monster is DatasetPage i think.
14:35 pdurbin Yes, it is a monster.
14:36 pdurbin But I thought you were talking about plugins, add-ons, extensions.
14:36 pdurbin Now it sounds like you're talking about refactoring.
14:36 pdurbin So I'm a little confused.
14:36 pdurbin I'm in favor of all of it but I'm not sure what we're talking about. :)
14:36 MrK Oh yeah I meant refactoring of existing things :P
14:38 pdurbin Some refactoring would be great. Let's make the monsters smaller. :)
14:40 MrK Have a good weekend I'm going home :D
14:41 donsizemore joined #dataverse
15:05 pdurbin bye!
15:06 pdurbin Branch "develop" from https://github.com/IQSS/dataverse.git has been deployed to http://ec2-54-161-93-228.compute-1.amazonaws.com:8080
15:06 pdurbin at https://jenkins.dataverse.org/job/IQSS-dataverse-develop/20/consoleFull
15:06 pdurbin donsizemore: this is so cool! Thank you!
15:07 donsizemore @pdurbin let me know if you want changes to our super-secret group_vars file. it places your keys and larsks', enables (basic) sample data and the pre-fab external tools
15:07 larsks donsizemore: ...probably doesn't need my keys anymore!
15:07 pdurbin Heh. You can safely remove larsks for now I'd say. :)
15:08 pdurbin jinx
15:08 donsizemore done
15:09 pdurbin how the terminate stuff going?
15:09 pdurbin how's*
15:10 donsizemore it's in the job output ;)
15:10 pdurbin I saw. :)
15:11 donsizemore i mean, it's building and deploying on merge to develop. do we leave it... 8 hours? run ec2-terminate-all every friday at 5pm?
15:12 donsizemore maybe we don't want ec2 spinning up on each merge.
15:12 pdurbin Maybe not.
15:12 pdurbin But how else would we trigger a deployment to ec2? Chatbots!!
15:13 donsizemore we could set up a VM and just let jenkins deploy the warfile there each time
15:13 donsizemore that's what we do with akio's trsa-api branch, and with trsa-web
15:13 pdurbin And that's what phoenix is. A VM on VMWare downstairs.
15:13 pdurbin standup in 2 minutes. brb
16:06 pdurbin Heh. "Potentially, there's a godzillion datasets in this Dataverse." https://github.com/IQSS/dataverse/pull/5799
16:06 pdurbin donsizemore: I saw your question about this pull request.
16:12 jri joined #dataverse
16:39 pdurbin looks like both Leonid and I answered :)
16:42 bjonnh pdurbin: we are working on a protocol for sharing our NMR data on dataverse
16:42 bjonnh pdurbin: do you have anybody that could give it a look?
16:42 bjonnh we are going to publish that for NIH grantees and others
16:43 pdurbin bjonnh: you should ask at https://groups.google.com/forum/#!forum/dataverse-community I think.
16:46 pdurbin searching my email, I just found "Upload of dozens of NMR spectra in folder" at https://help.hmdc.harvard.edu/Ticket/Display.html?id=268573
16:47 pdurbin bjonnh: and your post at https://github.com/IQSS/dataverse/issues/3439
16:47 pdurbin :)
16:48 pdurbin and your data at https://dataverse.harvard.edu/dataverse/cenaptnmr :)
16:48 bjonnh yep
16:48 bjonnh so we are trying to help people doing that
16:49 bjonnh s/doing/to do
16:49 pdurbin Is this NMR? https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SAWFQA
16:49 pdurbin and this? https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FRZEGB
16:50 pdurbin Again, I would just post to the Google group.
16:52 pdurbin Oh! You did already! https://groups.google.com/d/msg/dataverse-community/47ZcVebJnfc/rYBLdPExDAAJ
16:52 pdurbin bjonnh: can you post to https://ask.cyberinfrastructure.org too?
17:00 bjonnh yes they are nmr
17:00 bjonnh I wasn't aware of cyberinfrastructure
17:01 bjonnh not sure it is their domain though
17:01 bjonnh but it is mine ;)
17:02 pdurbin It's on topic for Ask.CI. It's new.
17:03 donsizemore joined #dataverse
17:06 pdurbin Probably more NMR people there than on the Dataverse list.
17:30 xarthisius kamil10: what's the primary goal of the project you're evaluating Dataverse/CKAN for? Are you planning on depositing data/extracting metadata/making it searchable etc, or is there a part that would require interactive data access with custom frontends and/or long batch jobs that'd utilize the deposited data as an input?
17:47 donsizemore joined #dataverse
18:23 donsizemore @pdurbin i'm building 5753-new-validation-api now =)
18:44 pdurbin donsizemore: from jenkins?!?
18:45 pdurbin xarthisius: kamil10 should install whole tale too :)
19:26 pdurbin donsizemore: what I mean is that I can't wait until we start spinning up (and running the API test suite on) arbitrary branches from the new jenkins. Also, can you please spin down any instances you're not using over the weekend?
19:26 pdurbin I'm heading out a bit early to check out a research computing event on campus. Have a good weekend, all!
19:27 pdurbin left #dataverse
19:32 donsizemore @pdurbin jenkins owned 3 =) i've terminated them... you own the one left so not killing it
19:33 donsizemore @pdurbin p.s. the validate api thinks we have 22 invalid datasets. hope you have a good weekend!
21:19 dataverse-user joined #dataverse
21:24 donsizemore joined #dataverse
21:34 kamil10 @xarthisius We are planning on depositing data/extracting metadata/making it searchable (unfortunately you don't yet support DCAT/RDF) and making the data previewable
21:36 kamil10 and this would be done with external tools, as opposed to CKAN's extension integration on the same page
21:38 kamil10 Did you notice that all of the portals based on CKAN are now forks, using version 2.2 or 2.3 and maintaining the code by themselves (data.gov.uk, data.gov, dane.gov.pl, not to mention data.nhm.co.uk)?
