IRC log for #dataverse, 2016-04-01

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time	Nick	Message
00:23		xvx joined #dataverse
01:36		axfelix joined #dataverse
04:58		axfelix joined #dataverse
06:42		jri joined #dataverse
07:32		bencomp joined #dataverse
13:48		pameyer joined #dataverse
14:47		axfelix joined #dataverse
15:39		axfelix joined #dataverse
17:18		cnk joined #dataverse
20:06	iamtimmo	quick DV 4 comment and question: I note that after setting :AllowSignUp to false via the API, the signup link disappears from the header. But if I already know the create user url, I can still get the create user screen when not logged in. Is there a way to stop that happening?
20:07	iamtimmo	Couldn’t find this anywhere in docs, but feel free to scold me if I missed this somehow.
20:07	pameyer	iamtimmo: do you have apache (or another webserver) in front of glassfish?
20:07	iamtimmo	pameyer: yep. currently apache.
20:08	pameyer	then a quick and dirty approach would be to rewrite or redirect the create user url to the 404 page
20:08	pameyer	or somewhere else
20:08	pdurbin	iamtimmo: there's a reason I put a warning at http://guides.dataverse.org/en/4.3/installation/config.html#allowsignup :)
20:11	pdurbin	iamtimmo: I just added a comment at https://github.com/IQSS/dataverse/issues/2838#issuecomment-204551248
20:11	pdurbin	pameyer: yes, that would be a good way to mitigate the risk in 4.3 and earlier.
20:13	pameyer	pdurbin: fixed is better than worked-around :)
20:13	pdurbin	I think so. :)
20:13	pdurbin	Now that the issue has gone through QA, I guess I could remove the warning in that pull request.
20:15	yoh	pdurbin: I see that you have met Joey! ;)
20:15	iamtimmo	pameyer / pdurbin: Thanks for the pointers.
20:16	pdurbin	yoh: Joey is awesome! https://github.com/IQSS/dataverse/issues/2863#issuecomment-199322114
20:16	pdurbin	iamtimmo: sure. I'm here for 15 more minutes if you have any questions.
20:17	iamtimmo	pdurbin: Nothing more for today on this one, I don’t think.
20:17	pdurbin	ok
20:18	pdurbin	pameyer: remember our priority labels? Looks like they are changing a bit: Prioritizing Dataverse Github Issues - Google Groups - https://groups.google.com/forum/#!topic/dataverse-community/eq5aWkLbZ24
20:20	yoh	pdurbin: I know ;) have you talked about 'annex'ification of dataverse one way or another?
20:22	pameyer	pdurbin: looks like descending order got the hat tip for those
20:22	pdurbin	yoh: well, I got to hear about http://datalad.org a bit. I mentioned it to pameyer at http://irclog.iq.harvard.edu/dataverse/2016-03-21#i_33014 but then I forgot to tell him any more in person. My main takeaway was that pameyer and I are on the right track with hoping to add support for rsync.
20:23	pdurbin	pameyer: my take is that if you take all the highest values (P4, S5, i5, E5) it translates to "hair on fire" :)
20:24	pameyer	those will hopefully be rare
20:25	pdurbin	hopefully, I can't spare the hair these days
20:26	yoh	pdurbin: I was wondering more of "expose datasets as ready to be consumed git/annex repositories which would access data from the dataverse servers" ;)
20:28	pameyer	yoh: are you thinking on the same (or an accessible) filesystem?
20:30	yoh	right... so in the end I see something like datalad install --full //dataverse/harvard/datasetX which would pull that annex repository to the local drive with content being fetched from the dataverse
20:30	yoh	example: https://github.com/datalad/openfmri--sha256-ds000202 which points to content within tarball archives available on S3 bucket (versioned) as pointed to by openfmri.org/dataset/ds000202 ;)
20:31	pdurbin	yoh: ah, so the data would be fetched via S3. not git
20:31	yoh	data will be fetched via whatever annex supports ;)
20:31	yoh	and actually whatever datalad provides support for
20:32	yoh	e.g. content from archives which are in turn also available within annex, and in turn available from somewhere online (s3/http/ftp/rsync/....)
20:33		axfelix joined #dataverse
20:34	pdurbin	yoh: duh. You're the top contributor on https://github.com/datalad/datalad ... I should have mentioned you to Joey :)
20:34	yoh	I wonder -- is meta information about datasets available through some simple API, so I don't need to scrape anything from the pages? ;)
20:34	yoh	he knows me ;)
20:35	* yoh	proudly states that the DataLad project has supported Joey's git-annex development for more than a year ;-)
20:35	pdurbin	yoh: metadata for published datasets is easy to grab. such as https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI as I explained at https://github.com/IQSS/dataverse/issues/1837#issuecomment-197468164
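A minimal Python sketch of the metadata lookup pdurbin describes, using only the standard library. It builds the same native API URL shown above (`/api/datasets/:persistentId?persistentId=...`) and parses the JSON reply. The helper names are hypothetical, and the sketch assumes the dataset is published, so no API token is sent.

```python
import json
import urllib.parse
import urllib.request


def dataset_metadata_url(server: str, persistent_id: str) -> str:
    """Build the native API URL that looks up a dataset by its persistent ID (DOI)."""
    # Keep ':' and '/' unescaped so the URL matches the form used in the chat.
    query = urllib.parse.urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{server.rstrip('/')}/api/datasets/:persistentId?{query}"


def fetch_dataset_metadata(server: str, persistent_id: str) -> dict:
    """Fetch and decode the JSON metadata for a published dataset (no token sent)."""
    with urllib.request.urlopen(dataset_metadata_url(server, persistent_id)) as resp:
        return json.load(resp)
```

Usage would be something like `fetch_dataset_metadata("https://dataverse.harvard.edu", "doi:10.7910/DVN/ARKOTI")`; the reply is typically wrapped in a `status`/`data` envelope, but check the native API guide for the exact fields.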
20:36	yoh	https://git-annex.branchable.com/thanks/
20:37	pdurbin	yoh: I use and love Joey's ikiwiki code: http://wiki.greptilian.com/ikiwiki
20:38	pdurbin	anyway, must run. have a good weekend, all!
20:38	yoh	pdurbin: you too! cheers
20:38	iamtimmo	happy weekend, pdurbin
20:41	yoh	pdurbin: whenever you get online, if you could also point me to how to get a list of all persistentIds, that would be nice ;)
20:44	pdurbin	yoh: your best bet is probably iterating with the Search API: http://guides.dataverse.org/en/4.3/api/search.html ... you'll need an API token
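A sketch of iterating with the Search API to collect every dataset's persistent ID, paging with `start` until `total_count` is reached. Assumptions to verify against the 4.3 Search API guide: the token is passed as a `key` query parameter, `per_page` is accepted, and each dataset item carries its DOI in a `global_id` field. The HTTP fetcher is injectable so the paging logic can be tested offline; all function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def search_page_url(server: str, api_token: str, start: int, per_page: int = 10) -> str:
    """URL for one page of dataset search results (assumed 4.3-era parameters)."""
    query = urllib.parse.urlencode({
        "q": "*",            # match everything
        "type": "dataset",   # only datasets carry the persistent IDs we want
        "start": start,      # offset of the first result on this page
        "per_page": per_page,
        "key": api_token,    # assumption: token passed as ?key= in this version
    })
    return f"{server.rstrip('/')}/api/search?{query}"


def http_get_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_persistent_ids(server: str, api_token: str, per_page: int = 10,
                        get_json: Callable[[str], dict] = http_get_json) -> Iterator[str]:
    """Yield the global_id of every dataset, one search page at a time."""
    start = 0
    while True:
        data = get_json(search_page_url(server, api_token, start, per_page))["data"]
        items = data["items"]
        for item in items:
            gid = item.get("global_id")
            if gid:
                yield gid
        start += len(items)
        # Stop on an empty page or once every hit has been seen.
        if not items or start >= data["total_count"]:
            break
```

Paging on `len(items)` rather than a fixed step keeps the loop correct even if the server caps the page size below the requested `per_page`.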
20:45	pdurbin	really going now :)
20:45	pameyer	enjoy the good weather while it lasts
20:45	yoh	it isn't raining down there? damn
20:45	yoh	not fair
20:46	pameyer	today is nice, tomorrow is supposed to be nice - Sun is supposed to snow here
20:49	yoh	d'oh -- snow again? you're lucky .... (although at this time not that lucky) we really didn't get enough snow this year in NH :-/
20:51	pameyer	MA was the same this year - not a lot of snow
20:51	yoh	I thought that at least some snow storms touched MA while fully avoiding NH
20:52	pameyer	I'm still using Wisconsin and upstate NY as my benchmarks
20:53	yoh	indeed - sunday/monday snow showers and subfreezing temps... "very nice"
20:53	pameyer	yoh: I was taking a look at datalad, and got the sense you might be able to answer some questions about it
20:55	yoh	pameyer: I might
20:58	pameyer	are tarballs things that get passed to typical neuro software, or do they have to be unpacked first?
20:59	pameyer	I'm wondering because this was something that we considered - tgz is easy to upload, but hard to compute directly on
21:07	yoh	unpacked
21:08	yoh	that is one of the points of datalad -- we want to take that burden and ambiguity away ;)
21:09	yoh	and the 2nd major point is versioning -- so you would deal with data as with code pretty much
21:09	yoh	and the 3rd is distribution -- so you deal with data packages as with any software packages (on debian/conda/...) -- easy and convenient ;)
21:11	yoh	see e.g. http://pastebin.com/w5sfp6Ca
21:12	yoh	that file is listed as available from multiple tarballs and even from multiple files within those tarballs (i.e. it was identical across subjects)
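The situation yoh describes, where one file is available from several tarballs because it was identical across subjects, falls out naturally once content is keyed by checksum. This Python sketch indexes every regular file inside a set of tarballs by SHA-256, so identical content collapses to one key with several locations. It illustrates the idea only; it is not how DataLad or git-annex actually implement it, and the function name is hypothetical.

```python
import hashlib
import tarfile
from collections import defaultdict


def index_tarball_contents(tarball_paths):
    """Map SHA-256 hex digest -> list of (tarball path, member name) with that content."""
    locations = defaultdict(list)
    for path in tarball_paths:
        with tarfile.open(path, "r:*") as tar:  # "r:*" auto-detects gzip/bz2/xz
            for member in tar.getmembers():
                if not member.isfile():
                    continue  # skip directories, links, devices
                digest = hashlib.sha256()
                with tar.extractfile(member) as fobj:
                    # Hash in 1 MiB chunks so large members are not read into memory at once.
                    for chunk in iter(lambda: fobj.read(1 << 20), b""):
                        digest.update(chunk)
                locations[digest.hexdigest()].append((path, member.name))
    return dict(locations)
```

Any digest whose list has more than one entry is content that a client could fetch from whichever archive is most convenient.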
21:16	pameyer	makes sense
21:16	pameyer	I've been thinking "immutable data" lately; but that's probably because I'd been thinking primary data
21:17	pameyer	aka - this came off the detector; if it changes something's gone wrong somewhere
21:17	pameyer	but versioning for processed data and models could be very helpful :)
21:18	pameyer	does the metadata live on the filesystem in this model, or somewhere else?
21:20	pameyer	or still TBD?
21:20	yoh	TBD
21:20	yoh	it will be within git of each dataset
21:21	yoh	"flowing" up in the hierarchy cached for searching
21:21	yoh	in neuroimaging raw data is at the scanner in proprietary or DICOM format
21:22	yoh	no one works on that in research -- we need to convert data etc. and that is not ... guaranteed since you do use some software for that and deal with proprietary fields in DICOM etc
21:22	yoh	see e.g. https://openfmri.org/dataset-orientation-issues/
21:22		garnett joined #dataverse
21:22	yoh	so -- you might need to version data really close to its origin, even before really processing it etc ;)
21:23	yoh	other gotchas could be -- incomplete transfers etc, which could happen to anyone at any stage ;)
21:23	pameyer	makes sense
21:23	pameyer	checksums are nice for that
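A small Python sketch of using checksums to catch the incomplete transfers yoh mentions: the sender publishes a manifest of SHA-256 digests, and the receiver reports files that are missing or whose content does not match (for example, a truncated copy). The manifest layout, a dict of relative path to hex digest, and the function names are hypothetical.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_transfers(root, manifest):
    """Compare files under root to {relative path: expected sha256}; return problems found."""
    problems = {}
    for rel_path, expected in sorted(manifest.items()):
        path = os.path.join(root, rel_path)
        if not os.path.isfile(path):
            problems[rel_path] = "missing"
        elif sha256_of_file(path) != expected.lower():
            problems[rel_path] = "checksum mismatch"
    return problems
```

An empty result means every file listed in the manifest arrived intact.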
21:23	yoh	indeed
21:24	pameyer	any thoughts about pushing metadata to the DOI system for a "published" git version of a dataset?
21:25	yoh	"eventually" "may be" ;)
21:25	pameyer	gotcha :)
21:26	yoh	so if there is an interest, could work on https://github.com/datalad/datalad/issues/393 ;-)
21:30	pameyer	there are always more interesting things to be done than there is time to do them in ….
21:31	yoh	indeed ;-)
21:31	pameyer	I've been working on (and brainstorming about) ways to move "normal" sized datasets around through non-http
21:31	pameyer	but hadn't looked at git seriously
21:32	pameyer	but my "normal" is structural biologist normal size primary data
21:33	yoh	well -- could go to normal git I guess ;-)
21:33	yoh	ok -- now I need to run. Have a good weekend, cheers
21:33	pameyer	have a good weekend
21:35		garnett joined #dataverse
22:00	pdurbin	yoh: I just emailed you back with a link to http://guides.dataverse.org/en/4.3/api/dataaccess.html
22:00	pdurbin	to download a file: https://dataverse.harvard.edu/api/access/datafile/2692294
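A minimal Python sketch of downloading a file through the Data Access API URL pdurbin gives, streaming the response to disk rather than holding it in memory. It assumes the file is public; restricted files would also need an API token. The file ID 2692294 comes from the chat; the function names are hypothetical.

```python
import shutil
import urllib.request


def datafile_url(server: str, file_id: int) -> str:
    """Data Access API URL for a single file, identified by its numeric database ID."""
    return f"{server.rstrip('/')}/api/access/datafile/{int(file_id)}"


def download_datafile(server: str, file_id: int, dest_path: str) -> str:
    """Stream a public file to dest_path and return the path."""
    with urllib.request.urlopen(datafile_url(server, file_id)) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks
    return dest_path
```

For example, `download_datafile("https://dataverse.harvard.edu", 2692294, "datafile.bin")` would fetch the file from the example above.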
22:01	pdurbin	and thanks for opening that issue. we love integrations
22:02	pdurbin	pameyer: "git" version? the metadata would be in git?
22:06	pameyer	pdurbin: that's how it sounded to me - but also sounded like that was a few steps ahead of where things are at
22:09	pdurbin	pameyer: two years ago I posted this: A Thought Experiment: Datasets As Git Repos - https://docs.google.com/document/d/18WDIS8hrFJvMJBcnRuQ8NfD-VxGq32vJ9WwlEgyyWZs/edit?usp=sharing
22:14	pameyer	there seems to be a general theme emerging that data files (or data streams) live one place, and metadata lives another place
22:14	pameyer	I didn't think wikis were a git feature - github maybe, but not strictly git
22:15	pameyer	could do some kind of serialization of metadata to a file; and a doi at a commit (tagged or not)
22:17	pdurbin	Yeah, in DVN 3.x we serialized metadata to disk (as XML). We plan to do the same in Dataverse 4 but I'm not sure if there's a specific issue tracking this.
22:17	pdurbin	This would be for preservation purposes.
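One way to sketch pameyer's idea of serializing metadata to a file and attaching a DOI to a commit: write the metadata deterministically, so the same metadata always produces the same bytes, and compute the object ID git would give that file. A DOI minted for a commit could then be tied to exact metadata bytes. This is hypothetical throughout: JSON is used for brevity (DVN 3.x serialized XML), and neither Dataverse nor DataLad is claimed to work this way. The blob ID follows git's standard SHA-1 object format (`blob <size>\0<content>`).

```python
import hashlib
import json


def serialize_metadata(metadata: dict) -> bytes:
    """Deterministic JSON: sorted keys, fixed indent, trailing newline, UTF-8."""
    text = json.dumps(metadata, sort_keys=True, indent=2, ensure_ascii=False)
    return (text + "\n").encode("utf-8")


def git_blob_id(content: bytes) -> str:
    """SHA-1 object ID that `git hash-object` assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

Because key order is fixed, two exports of the same metadata yield the same blob ID, so a change in the ID signals a real change in the metadata.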
22:22	pameyer	I'll leave preservation for people who know more about it than me. But for either data publication or distribution of working data, there's an argument to be made for having the metadata live in the same place as the data, or for having it not live in the same place as the data
22:24	pdurbin	I'm not sure which side you're arguing for. :)
22:24	pameyer	me either :-S
22:24	pdurbin	Give me a one-armed economist.
22:24	pameyer	:)
22:46		axfelix joined #dataverse
22:56		axfelix joined #dataverse
