Time
S
Nick
Message
04:02
jri joined #dataverse
06:03
jri joined #dataverse
06:13
jri joined #dataverse
07:28
jri joined #dataverse
10:23
kamil10 joined #dataverse
10:24
kamil10
Hello Dataverse community! Is Dataverse capable of storing Darwin Core (biodiversity) metadata?
10:44
pdurbin
kamil10: hi! The answer is a little complicated, I'm afraid. Let me try to explain.
10:45
pdurbin
Dataverse is flexible enough that anyone can create a "custom metadata block" describing any metadata. For example, a user recently created one for structural biology. Here are the docs on how to create one: http://guides.dataverse.org/en/4.13/admin/metadatacustomization.html
10:45
pdurbin
Does that help?
10:46
kamil10
https://dataverse.org/files/dataverseorg/files/iassistposter2016ecastro.pdf
10:46
kamil10
What is the chance that you'll implement such a functionality?
10:47
kamil10
On the slide it is listed as planned to be supported by default
10:53
kamil10
And the second question: is it possible to make a self-hosted installation of Harvard WorldMap?
10:57
Stofpad joined #dataverse
11:10
jri joined #dataverse
11:18
pdurbin
Sorry, which functionality? I'm in and out right now, trying to get the kids out the door for school. :)
11:21
pdurbin
Some of the items under "Planned to support" have been implemented. DataCite 4.0 and Schema.org JSON-LD are done, for example.
11:22
pdurbin
Oh, I see "Darwin Core (Biodiversity)" on that slide. Hmm. Eleni made that slide it looks like. She has been gone for a while so I assume it's years old. How did you find it? From a Google search?
11:25
pdurbin
I just found this spreadsheet called "Comparative Zoology / Darwin Core Metadata", last updated June 2016: https://docs.google.com/spreadsheets/d/1P9xvaRLhCKsYmjz9eXXVl0T9d2U34UgynbvxDp-2Bjc/edit?usp=sharing
11:25
pdurbin
kamil10: does that help?
11:25
pdurbin
welcome, Stofpad
11:26
pdurbin
kamil10: with regard to WorldMap, it was recently rewritten and my understanding is that they are ready for beta testers to try to self host it.
11:34
MrK joined #dataverse
11:47
pdurbin
I need to run a sleeping bag and some other stuff over to a friend's house for my daughter's overnight trip but I'll be back in a bit. Please keep the questions coming!
11:55
jri_ joined #dataverse
12:05
kamil10
Yes, I found it through a Google search, but I can't find such metadata in the Dataverse demo. Is this metadata implemented as seen in the Google sheet?
12:05
kamil10
Thank you very much for your collaboration and support!
12:06
kamil10
And take your time, I can wait
12:11
pdurbin
Ok, back. But I should eat some breakfast and bike to work soon. :)
12:12
pdurbin
kamil10: that spreadsheet seems to be our standard format for custom metadata blocks. Do you want to try it? Do you already have an installation of Dataverse?
12:17
kamil10
Oops, sorry, you're probably in a different timezone :)
12:17
kamil10
No, I used demo.dataverse.org
12:17
pdurbin
I'm in Boston.
12:18
kamil10
I'm in Europe, Poland
12:19
pdurbin
kamil10: ah, you should meet MrK then.
12:19
pdurbin
see also the "who's who" link in the topic of this channel
12:19
kamil10
I can't find Darwin Core metadata in the demo version of Dataverse. How can I find it, or any documentation related to this issue?
12:19
kamil10
Who is MrK?
12:20
MrK
Hi ;)
12:20
pdurbin
MrK is not Mr. T. He does not pity the fool.
12:20
kamil10
:]
12:21
MrK
Probably because I'm also from Poland :P
12:21
kamil10
So we both can write in the same timezone :)
12:22
MrK
So from which workplace are you :D?
12:22
donsizemore joined #dataverse
12:23
kamil10
Bialystok University of Technology
12:23
kamil10
collaborating with Bialowieza PAN
12:25
pdurbin
kamil10: bad news. I downloaded that spreadsheet as tsv and ran `curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file /tmp/Comparative\ Zoology\ _\ Darwin\ Core\ Metadata\ -\ Sheet2.tsv` but I got `{"status":"ERROR","message":"For input string: \"\""}` :( This means that custom metadata block needs more work. But it's a good starting
12:25
pdurbin
point, I hope.
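[Editor's sketch] The message `For input string: ""` looks like what Java reports when an empty string is parsed as a number, so one plausible cause (an assumption, not something confirmed in the chat) is a blank cell in a numeric column of the TSV. The Python sketch below scans a metadata-block TSV for blank cells in a given column. The `displayOrder` column name, the `#datasetField` header layout, and the `dwc...` field names in the sample are illustrative assumptions; check them against the metadata customization guide before relying on this.

```python
import csv
import io


def find_blank_cells(tsv_text, column="displayOrder"):
    """Return (row_number, field_name) for data rows whose `column` cell is blank.

    Assumes each section starts with a header row whose first cell begins
    with '#' and whose remaining cells name the columns of that section.
    """
    problems = []
    header = None
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        if not any(cell.strip() for cell in row):
            continue  # skip completely empty rows
        if row[0].startswith("#"):
            # A new section begins; remember its column names.
            header = [cell.strip() for cell in row]
            continue
        if header is None or column not in header:
            continue  # this section has no such column
        index = header.index(column)
        value = row[index].strip() if index < len(row) else ""
        if value == "":
            name = row[1].strip() if len(row) > 1 else ""
            problems.append((row_number, name))
    return problems


# Hypothetical sample: the second field has no displayOrder value.
sample = (
    "#datasetField\tname\ttitle\tdisplayOrder\n"
    "\tdwcKingdom\tKingdom\t0\n"
    "\tdwcPhylum\tPhylum\t\n"
)
print(find_blank_cells(sample))
```

Running it on the downloaded spreadsheet (read the file into a string first) would list the rows to fill in before retrying the `curl` upload.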
12:26
pdurbin
kamil10: would you like to create an issue at https://github.com/IQSS/dataverse/issues asking for Darwin Core support?
12:29
pdurbin
Also, there's a typo in our guides. it says --upload-file twice. Does anyone want to make a pull request? Deleting stuff is usually easy. :)
12:30
kamil10
Yes, we could probably even collaborate and support this work, but we are still at the stage of evaluating which system we will use
12:30
kamil10
ckan or dataverse?
12:30
pdurbin
kamil10: fantastic! You should definitely consider creating an issue then.
12:31
pdurbin
kamil10: have you seen our comparative review? CKAN is on it.
12:31
pdurbin
this: https://dataverse.org/blog/comparative-review-various-data-repositories
12:31
kamil10
Could you tell me, based on your experience, which system would be better for storing a collection of specimens?
12:32
kamil10
Yes I saw that before
12:33
kamil10
CKAN is much better in terms of modularity and extensions, but they stop working with every new release of base CKAN
12:33
pdurbin
I don't know but if you post to https://groups.google.com/forum/#!forum/dataverse-community someone might have opinions about specimens.
12:33
pdurbin
Actually, one sec.
12:34
pdurbin
"Each of those datasets references a plant genetic resource described in our GnpIS database." https://dataverse.org/blog/data-inra
12:35
kamil10
Thanks a lot for this stuff!
12:35
pdurbin
For "Kind of Data" they use "Physical Object": https://data.inra.fr/dataverse/omics?q=&fq0=kindOfData_ss%3A%22Physical+Object%22&types=dataverses%3Adatasets&sort=dateSort&order=desc
12:36
pdurbin
but they don't seem to have created a custom metadata block... not that I blame them, it's hard to start from scratch... at least there's already a stub for darwin core
12:37
pdurbin
Yes, Dataverse has a lot of catching up to do in terms of modularity. At least our new "external tool" framework is helping. :)
12:37
kamil10
Yes and we are afraid of it :(
12:37
kamil10
https://dataverse.org/files/dataverseorg/files/openmonolith.pdf
12:38
kamil10
Where can I find this new external tool framework?
12:38
pdurbin
Heh. What are you saying about the open monolith? You like monoliths? You don't like monoliths? :)
12:38
pdurbin
http://guides.dataverse.org/en/4.13/installation/external-tools.html
12:38
kamil10
I don't like them, but life shows that CKAN extensions are a real mess!
12:40
pdurbin
Oh? A mess in what way?
12:41
pdurbin
speaking of modularity, you are (all) welcome to leave a comment on this "Dataverse App Store" idea: https://github.com/IQSS/dataverse/issues/5688
12:43
kamil10
For example, in CKAN 2.8 support for Celery disappeared, breaking e.g. ckanext-archiver and the whole system
12:43
kamil10
https://docs.ckan.org/en/latest/maintaining/background-tasks.html#background-jobs-migration
12:43
MrK
When I'm thinking about modularity, Hexagonal architecture always comes to my mind.
12:44
kamil10
and after half a year https://github.com/ckan/ckanext-archiver/blob/66075b2aa97499535b6ecca97d6ba23174d7a3b4/ckanext/archiver/lib.py
12:44
pdurbin
Hmm. At least they document how to switch to the new system.
12:44
kamil10
The extension was working again
12:45
pdurbin
Oh. I see. You're saying it was broken for a while.
12:45
kamil10
yes, but the extensions are maintained by people all over the world, and you maintain this monolithic approach yourselves and everything works
12:45
pdurbin
speaking of archiving, this is new too: http://guides.dataverse.org/en/4.13/admin/integrations.html#research-data-preservation
12:46
MrK
I'm wondering why they would even remove it. If you're creating a modular system with external sources, you usually just use one generic interface and operate with it in the internals of the system, so you can operate on an abstraction and not worry about any external stuff.
12:46
pdurbin
Well, not everything works. We have bugs. :)
12:48
pdurbin
We try not to remove stuff. An integration broke when we released Dataverse 4.12. But we followed up with a fix in Dataverse 4.13 somewhat quickly.
12:48
pdurbin
looks like 20 days between releases... could be worse, could be better :)
12:50
pdurbin
Man, it's raining hard out there. We're supposed to get over an inch of rain today. I've got my rain pants on. Time to jump on my bike. Keep chatting. I'll catch up.
12:52
pdurbin
MrK: please help us make Dataverse more modular.
12:57
MrK
I wish I could but atm our versions are kinda different :P. But I can tell you how we will try to divide it.
13:04
kamil10
Thank you very much, I didn't imagine that the Dataverse community was so open!
13:05
kamil10
Thank you, can I come back to you after our evaluation?
13:22
donsizemore
hey @pdurbin see if your github keys let you on centos ec2-54-161-93-228.compute-1.amazonaws.com ?
14:10
pdurbin
kamil10: yes! Please come back and give us feedback either way!
14:10
pdurbin
MrK: yes, please teach me your ways.
14:11
pdurbin
donsizemore: I'm in. :)
14:11
pdurbin
key-jenkins-fe83a4a4. nice
14:12
pdurbin
kamil10: oh, speaking of openness, you can check out this article I wrote: https://groups.google.com/d/msg/dataverse-community/brxCn1E9tX0/VbsNz4u8BgAJ . I'd love feedback on it.
14:13
MrK
pdurbin: I'm not yet good at architecture in any way since my experience is too low, but we are going to divide it into pom modules api-view(jsf)-service-persistence, and then inside, the packages are going to be divided by function, as we are already doing with new functions. So if we want to add new licenses, everything about licenses is going to be in the license package and so on. Ideally but here it's impossible because of JEE for example but I w
14:13
donsizemore
coolies. ça marche in vagrant, gonna call that good.
14:14
MrK
classes in the package, package private so no class outside would see it.
14:14
pdurbin
MrK: sorry, you were a little cut off. "for example but I wo"
14:14
donsizemore
@pdurbin and if you want i can go ahead and drop in the cert/proxy stuff, all of which works except for letsencrypt, which will just take more (free) time
14:14
MrK
yeah, character limit, I finished in another sentence :p
14:15
MrK
for example but I would normally try to make the classes in the package, package private so no class outside would see it.
14:23
pdurbin
MrK: I don't know what pom modules are.
14:24
MrK
Well atm you have 1 pom for 1 module which is 'dataverse' but you can have 4 modules with 4 child poms and 1 master pom.
14:24
pdurbin
donsizemore: if you're happy with how the cert/proxy stuff is turning out, please don't let me stop you from merging. I haven't tested it.
14:24
pdurbin
MrK: that sounds like what we did for DVN 3, the previous generation of Dataverse.
14:25
MrK
Oh cool, for me it's not that revolutionary ofc, the most important thing is still modularity in code.
14:25
pdurbin
<module>DVN-web</module> in https://github.com/IQSS/dvn/blob/3.6.2/DVN-root/pom.xml#L10
14:26
pdurbin
Yes! More modularity please.
14:26
pdurbin
What parts of the code do you want to make more modular?
14:27
jri joined #dataverse
14:33
MrK
I mean probably there is room for it everywhere, the biggest monster is DatasetPage i think.
14:35
pdurbin
Yes, it is a monster.
14:36
pdurbin
But I thought you were talking about plugins, add-ons, extensions.
14:36
pdurbin
Now it sounds like you're talking about refactoring.
14:36
pdurbin
So I'm a little confused.
14:36
pdurbin
I'm in favor of all of it but I'm not sure what we're talking about. :)
14:36
MrK
Oh yeah I meant refactoring of existing things :P
14:38
pdurbin
Some refactoring would be great. Let's make the monsters smaller. :)
14:40
MrK
Have a good weekend I'm going home :D
14:41
donsizemore joined #dataverse
15:05
pdurbin
bye!
15:06
pdurbin
Branch "develop" from https://github.com/IQSS/dataverse.git has been deployed to http://ec2-54-161-93-228.compute-1.amazonaws.com:8080
15:06
pdurbin
at https://jenkins.dataverse.org/job/IQSS-dataverse-develop/20/consoleFull
15:06
pdurbin
donsizemore: this is so cool! Thank you!
15:07
donsizemore
@pdurbin let me know if you want changes to our super-secret group_vars file. it places your keys and larsks', enables (basic) sample data and the pre-fab external tools
15:07
larsks
donsizemore: ...probably doesn't need my keys anymore!
15:07
pdurbin
Heh. You can safely remove larsks for now I'd say. :)
15:08
pdurbin
jinx
15:08
donsizemore
done
15:09
pdurbin
how the terminate stuff going?
15:09
pdurbin
how's*
15:10
donsizemore
it's in the job output ;)
15:10
pdurbin
I saw. :)
15:11
donsizemore
i mean, it's building and deploying on merge to develop. do we leave it... 8 hours? run ec2-terminate-all every friday at 5pm?
15:12
donsizemore
maybe we don't want ec2 spinning up on each merge.
15:12
pdurbin
Maybe not.
15:12
pdurbin
But how else would we trigger a deployment to ec2? Chatbots!!
15:13
donsizemore
we could set up a VM and just let jenkins deploy the warfile there each time
15:13
donsizemore
that's what we do with akio's trsa-api branch, and with trsa-web
15:13
pdurbin
And that's what phoenix is. A VM on VMWare downstairs.
15:13
pdurbin
standup in 2 minutes. brb
16:06
pdurbin
Heh. "Potentially, there's a godzillion datasets in this Dataverse." https://github.com/IQSS/dataverse/pull/5799
16:06
pdurbin
donsizemore: I saw your question about this pull request.
16:12
jri joined #dataverse
16:39
pdurbin
looks like both Leonid and I answered :)
16:42
bjonnh
pdurbin: we are working on a protocol for sharing our NMR data on dataverse
16:42
bjonnh
pdurbin: do you have anybody that could give it a look?
16:42
bjonnh
we are going to publish that for NIH grantees and others
16:43
pdurbin
bjonnh: you should ask at https://groups.google.com/forum/#!forum/dataverse-community I think.
16:46
pdurbin
searching my email, I just found "Upload of dozens of NMR spectra in folder" at https://help.hmdc.harvard.edu/Ticket/Display.html?id=268573
16:47
pdurbin
bjonnh: and your post at https://github.com/IQSS/dataverse/issues/3439
16:47
pdurbin
:)
16:48
pdurbin
and your data at https://dataverse.harvard.edu/dataverse/cenaptnmr :)
16:48
bjonnh
yep
16:48
bjonnh
so we are trying to help people doing that
16:49
bjonnh
s/doing/to do
16:49
pdurbin
Is this NMR? https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SAWFQA
16:49
pdurbin
and this? https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FRZEGB
16:50
pdurbin
Again, I would just post to the Google group.
16:52
pdurbin
Oh! You did already! https://groups.google.com/d/msg/dataverse-community/47ZcVebJnfc/rYBLdPExDAAJ
16:52
pdurbin
bjonnh: can you post to https://ask.cyberinfrastructure.org too?
17:00
bjonnh
yes they are nmr
17:00
bjonnh
I wasn't aware of cyberinfrastructure
17:01
bjonnh
not sure it is their domain though
17:01
bjonnh
but it is mine ;)
17:02
pdurbin
It's on topic for Ask.CI. It's new.
17:03
donsizemore joined #dataverse
17:06
pdurbin
Probably more NMR people there than on the Dataverse list.
17:30
xarthisius
kamil10: what's the primary goal of the project you're evaluating Dataverse/CKAN for? Are you planning on depositing data/extracting metadata/making it searchable etc, or is there a part that would require interactive data access with custom frontends and/or long batch jobs that'd utilize the deposited data as an input?
17:47
donsizemore joined #dataverse
18:23
donsizemore
@pdurbin i'm building 5753-new-validation-api now =)
18:44
pdurbin
donsizemore: from jenkins?!?
18:45
pdurbin
xarthisius: kamil10 should install whole tale too :)
19:26
pdurbin
donsizemore: what I mean is that I can't wait until we start spinning up (and running the API test suite on) arbitrary branches from the new jenkins. Also, can you please spin down any instances you're not using over the weekend?
19:26
pdurbin
I'm heading out a bit early to check out a research computing event on campus. Have a good weekend, all!
19:27
pdurbin left #dataverse
19:32
donsizemore
@pdurbin jenkins owned 3 =) i've terminated them... you own the one left so not killing it
19:33
donsizemore
@pdurbin p.s. the validate api thinks we have 22 invalid datasets. hope you have a good weekend!
21:19
dataverse-user joined #dataverse
21:24
donsizemore joined #dataverse
21:34
kamil10
@xarthisius We are planning on depositing data/extracting metadata/making it searchable (unfortunately you don't yet support DCAT/RDF) and making the data previewable
21:36
kamil10
and this would be done with external tools, as opposed to CKAN extension integration on the same page
21:38
kamil10
Did you notice that all of the portals based on CKAN are now forks using version 2.2 or 2.3, maintaining the code by themselves (data.dov.uk, data.gov, dane.gov.pl, not to mention data.nhm.co.uk)