00:23
xvx joined #dataverse
01:36
axfelix joined #dataverse
04:58
axfelix joined #dataverse
06:42
jri joined #dataverse
07:32
bencomp joined #dataverse
13:48
pameyer joined #dataverse
14:47
axfelix joined #dataverse
15:39
axfelix joined #dataverse
17:18
cnk joined #dataverse
20:06
iamtimmo
quick DV 4 comment and question: I note that after setting :AllowSignUp to false via the API, the signup link disappears from the header. But if I already know the create user URL, I can still get the create user screen when not logged in. Is there a way to stop that from happening?
20:07
iamtimmo
Couldn’t find this anywhere in docs, but feel free to scold me if I missed this somehow.
20:07
pameyer
iamtimmo: do you have apache (or another webserver) in front of glassfish?
20:07
iamtimmo
pameyer: yep. currently apache.
20:08
pameyer
then a quick and dirty approach would be to rewrite or redirect the create user url to the 404 page
20:08
pameyer
or somewhere else
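(A minimal sketch of the rewrite approach pameyer describes, assuming Apache 2.4 with mod_rewrite enabled in the VirtualHost that fronts Glassfish, and assuming the create-account page is served at /dataverseuser.xhtml?editMode=CREATE; the exact path may differ between Dataverse versions:

    RewriteEngine On
    # If the request is for the create-account page, answer 404 instead of proxying it.
    RewriteCond %{QUERY_STRING} editMode=CREATE [NC]
    RewriteRule ^/dataverseuser\.xhtml$ - [R=404,L]

A plain mod_alias "Redirect 404 /some/path" would also work where matching on the query string isn't needed.)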
20:08
pdurbin
iamtimmo: there's a reason I put a warning at http://guides.dataverse.org/en/4.3/installation/config.html#allowsignup :)
20:11
pdurbin
iamtimmo: I just added a comment at https://github.com/IQSS/dataverse/issues/2838#issuecomment-204551248
20:11
pdurbin
pameyer: yes, that would be a good way to mitigate the risk in 4.3 and earlier.
20:13
pameyer
pdurbin: fixed is better than worked-around :)
20:13
pdurbin
I think so. :)
20:13
pdurbin
Now that the issue has gone through QA, I guess I could remove the warning in that pull request.
20:15
yoh
pdurbin: I see that you have met Joey! ;)
20:15
iamtimmo
pameyer / pdurbin: Thanks for the pointers.
20:16
pdurbin
yoh: Joey is awesome! https://github.com/IQSS/dataverse/issues/2863#issuecomment-199322114
20:16
pdurbin
iamtimmo: sure. I'm here for 15 more minutes if you have any questions.
20:17
iamtimmo
pdurbin: Nothing more for today on this one, I don’t think.
20:17
pdurbin
ok
20:18
pdurbin
pameyer: remember our priority labels? Looks like they are changing a bit: Prioritizing Dataverse Github Issues - Google Groups - https://groups.google.com/forum/#!topic/dataverse-community/eq5aWkLbZ24
20:20
yoh
pdurbin: I know ;) have you talked about 'annex'ification of dataverse one way or another?
20:22
pameyer
pdurbin: looks like descending order got the hat tip for those
20:22
pdurbin
yoh: well, I got to hear about http://datalad.org a bit. I mentioned it to pameyer at http://irclog.iq.harvard.edu/dataverse/2016-03-21#i_33014 but then I forgot to tell him any more in person. My main takeaway was that pameyer and I are on the right track with hoping to add support for rsync.
20:23
pdurbin
pameyer: my take is that if you take all the highest values (P4, S5, i5, E5) it translates to "hair on fire" :)
20:24
pameyer
those will hopefully be rare
20:25
pdurbin
hopefully, I can't spare the hair these days
20:26
yoh
pdurbin: I was wondering more of "expose datasets as ready to be consumed git/annex repositories which would access data from the dataverse servers" ;)
20:28
pameyer
yoh: are you thinking on the same (or an accessible) filesystem?
20:30
yoh
right... so in the end I see something like datalad install --full //dataverse/harvard/datasetX which would pull that annex repository to the local drive with content being fetched from the dataverse
20:30
yoh
example: https://github.com/datalad/openfmri--sha256-ds000202 which points to content within tarball archives available in a versioned S3 bucket, as pointed to by openfmri.org/dataset/ds000202 ;)
20:31
pdurbin
yoh: ah, so the data would be fetched via S3. not git
20:31
yoh
data will be fetched via whatever annex supports ;)
20:31
yoh
and actually whatever datalad provides support for
20:32
yoh
e.g. content from archives which are in turn also available within annex, and in turn available from somewhere online (s3/http/ftp/rsync/....)
20:33
axfelix joined #dataverse
20:34
pdurbin
yoh: duh. You're the top contributor on https://github.com/datalad/datalad ... I should have mentioned you to Joey :)
20:34
yoh
I wonder -- is meta information about datasets available through some simple API? so I don't need to scrape anything from the pages? ;)
20:34
yoh
he knows me ;)
20:35
* yoh
proudly states that the DataLad project has supported Joey's git-annex development for more than a year already ;-)
20:35
pdurbin
yoh: metadata for published datasets is easy to grab. such as https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI as I explained at https://github.com/IQSS/dataverse/issues/1837#issuecomment-197468164
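(That native API call is a one-liner with curl; jq is only used here for pretty-printing:

    curl -s "https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI" | jq .
)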
20:36
yoh
https://git-annex.branchable.com/thanks/
20:37
pdurbin
yoh: I use and love Joey's ikiwiki code: http://wiki.greptilian.com/ikiwiki
20:38
pdurbin
anyway, must run. have a good weekend, all!
20:38
yoh
pdurbin: you too! cheers
20:38
iamtimmo
happy weekend, pdurbin
20:41
yoh
pdurbin: whenever you get online, if you could also point me to how to get a list of all persistentIds, that would be nice ;)
20:44
pdurbin
yoh: your best bet is probably iterating with the Search API: http://guides.dataverse.org/en/4.3/api/search.html ... you'll need an API token
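(Roughly, that iteration might look like the sketch below; it assumes a valid API token in $API_TOKEN and jq on the PATH, and it assumes dataset persistent IDs come back in the Search API JSON as the global_id field:

    # Page through the Search API, printing each dataset's persistent ID.
    SERVER=https://dataverse.harvard.edu
    start=0
    per_page=100
    while :; do
      page=$(curl -s "$SERVER/api/search?q=*&type=dataset&per_page=$per_page&start=$start&key=$API_TOKEN")
      echo "$page" | jq -r '.data.items[].global_id'
      count=$(echo "$page" | jq '.data.items | length')
      [ "$count" -lt "$per_page" ] && break
      start=$((start + per_page))
    done
)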
20:45
pdurbin
really going now :)
20:45
pameyer
enjoy the good weather while it lasts
20:45
yoh
it isn't raining down there? damn
20:45
yoh
not fair
20:46
pameyer
today is nice, tomorrow is supposed to be nice - Sunday it's supposed to snow here
20:49
yoh
d'oh -- snow again? you lucky .... (although at this time not that lucky) we really didn't get enough snow this year in NH :-/
20:51
pameyer
MA was the same this year - not a lot of snow
20:51
yoh
I thought that at least some snow storms touched MA while fully avoiding NH
20:52
pameyer
I'm still using Wisconsin and upstate NY as my benchmarks
20:53
yoh
indeed - sunday/monday snow showers and subfreezing temps... "very nice"
20:53
pameyer
yoh: I was taking a look at datalad, and got the sense you might be able to answer some questions about it
20:55
yoh
pameyer: I might
20:58
pameyer
are tarballs things that get passed to typical neuro software, or do they have to be unpacked first?
20:59
pameyer
I'm wondering because this was something that we considered - tgz is easy to upload, but hard to compute directly on
21:07
yoh
unpacked
21:08
yoh
that is one of the points of datalad -- we want to take that burden and ambiguity away ;)
21:09
yoh
and the 2nd major point is versioning -- so you would deal with data as with code pretty much
21:09
yoh
and the 3rd is distribution -- so you deal with data packages as with any software packages (on debian/conda/...) -- easy and convenient ;)
21:11
yoh
see e.g. http://pastebin.com/w5sfp6Ca
21:12
yoh
that file is listed as available from multiple tarballs and even from multiple files within those tarballs (i.e. it was identical across subjects)
21:16
pameyer
makes sense
21:16
pameyer
I've been thinking "immutable data" lately; but that's probably because I've been thinking about primary data
21:17
pameyer
aka - this came off the detector; if it changes something's gone wrong somewhere
21:17
pameyer
but versioning for processed data and models could be very helpful :)
21:18
pameyer
does the metadata live on the filesystem in this model, or somewhere else?
21:20
pameyer
or still TBD ?
21:20
yoh
TBD
21:20
yoh
it will be within git of each dataset
20:21
yoh
"flowing" up in the hierarchy and cached for searching
21:21
yoh
in neuroimaging, raw data is at the scanner in a proprietary or DICOM format
21:22
yoh
no one works on that in research -- we need to convert data etc. and that is not ... guaranteed since you do use some software for that and deal with proprietary fields in DICOM etc
21:22
yoh
see e.g. https://openfmri.org/dataset-orientation-issues/
21:22
garnett joined #dataverse
21:22
yoh
so -- you might need to version data really close to its origin, even before really processing it etc ;)
21:23
yoh
other gotchas could be -- incomplete transfers etc, which could happen to anyone at any stage ;)
21:23
pameyer
makes sense
21:23
pameyer
checksums are nice for that
21:23
yoh
indeed
21:24
pameyer
any thoughts about pushing metadata to the DOI system for a "published" git version of a dataset?
21:25
yoh
"eventually" "may be" ;)
21:25
pameyer
gotcha :)
21:26
yoh
so if there is an interest, could work on https://github.com/datalad/datalad/issues/393 ;-)
21:30
pameyer
there are always more interesting things to be done than there is time to do them in ….
21:31
yoh
indeed ;-)
21:31
pameyer
I've been working on (and brainstorming about) ways to move "normal" sized datasets around through non-http
21:31
pameyer
but hadn't looked at git seriously
21:32
pameyer
but my "normal" is a structural biologist's normal-sized primary data
21:33
yoh
well -- could go to normal git I guess ;-)
21:33
yoh
ok -- now I need to run. Have a good weekend, cheers
21:33
pameyer
have a good weekend
21:35
garnett joined #dataverse
22:00
pdurbin
yoh: I just emailed you back with a link to http://guides.dataverse.org/en/4.3/api/dataaccess.html
22:00
pdurbin
to download a file: https://dataverse.harvard.edu/api/access/datafile/2692294
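(For example, with curl; -L follows redirects, and -O -J should save the file under its original name if the server sends a Content-Disposition header:

    curl -L -O -J "https://dataverse.harvard.edu/api/access/datafile/2692294"
)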
22:01
pdurbin
and thanks for opening that issue. we love integrations
22:02
pdurbin
pameyer: "git" version? the metadata would be in git?
22:06
pameyer
pdurbin: that's how it sounded to me - but also sounded like that was a few steps ahead of where things are at
22:09
pdurbin
pameyer: two years ago I posted this: A Thought Experiment: Datasets As Git Repos - https://docs.google.com/document/d/18WDIS8hrFJvMJBcnRuQ8NfD-VxGq32vJ9WwlEgyyWZs/edit?usp=sharing
22:14
pameyer
there seems to be a general theme emerging that data files (or data streams) live in one place, and metadata lives in another
22:14
pameyer
I didn't think wikis were a git feature - GitHub maybe, but not strictly git
22:15
pameyer
could do some kind of serialization of metadata to a file; and a DOI at a commit (tagged or not)
22:17
pdurbin
Yeah, in DVN 3.x we serialized metadata to disk (as XML). We plan to do the same in Dataverse 4 but I'm not sure if there's a specific issue tracking this.
22:17
pdurbin
This would be for preservation purposes.
22:22
pameyer
I'll leave preservation for people who know more about it than me. but for either data publication, or distribution of working data, there's an argument to be made for having the metadata live in the same place as the data, or having it *not* live in the same place as the data
22:24
pdurbin
I'm not sure which side you're arguing for. :)
22:24
pameyer
me either :-S
22:24
pdurbin
Give me a one-armed economist.
22:24
pameyer
:)
22:46
axfelix joined #dataverse
22:56
axfelix joined #dataverse