IRC log for #dataverse, 2016-04-01

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
00:23 xvx joined #dataverse
01:36 axfelix joined #dataverse
04:58 axfelix joined #dataverse
06:42 jri joined #dataverse
07:32 bencomp joined #dataverse
13:48 pameyer joined #dataverse
14:47 axfelix joined #dataverse
15:39 axfelix joined #dataverse
17:18 cnk joined #dataverse
20:06 iamtimmo quick DV 4 comment and question: I note that after setting :AllowSignUp to false via the API, the signup link disappears from the header. But if I already know the create user url, I can still get the create user screen when not logged in. Is there a way to stop that happening?
20:07 iamtimmo Couldn’t find this anywhere in docs, but feel free to scold me if I missed this somehow.
20:07 pameyer iamtimmo: do you have apache (or another webserver) in front of glassfish?
20:07 iamtimmo pameyer: yep. currently apache.
20:08 pameyer then a quick and dirty approach would be to rewrite or redirect the create user url to the 404 page
20:08 pameyer or somewhere else
20:08 pdurbin iamtimmo: there's a reason I put a warning at http://guides.dataverse.org/en/4.3/installation/config.html#allowsignup :)
20:11 pdurbin iamtimmo: I just added a comment at https://github.com/IQSS/dataverse/issues/2838#issuecomment-204551248
20:11 pdurbin pameyer: yes, that would be a good way to mitigate the risk in 4.3 and earlier.
20:13 pameyer pdurbin: fixed is better than worked-around :)
20:13 pdurbin I think so. :)
20:13 pdurbin Now that the issue has gone through QA I guess I could remove the warning. In that pull request.
20:15 yoh pdurbin: I see that you have met Joey! ;)
20:15 iamtimmo pameyer / pdurbin: Thanks for the pointers.
20:16 pdurbin yoh: Joey is awesome! https://github.com/IQSS/dataverse/issues/2863#issuecomment-199322114
20:16 pdurbin iamtimmo: sure. I'm here for 15 more minutes if you have any questions.
20:17 iamtimmo pdurbin: Nothing more for today on this one, I don’t think.
20:17 pdurbin ok
20:18 pdurbin pameyer: remember our priority labels? Looks like they are changing a bit: Prioritizing Dataverse Github Issues - Google Groups - https://groups.google.com/forum/#!topic/dataverse-community/eq5aWkLbZ24
20:20 yoh pdurbin: I know ;)  have you talked about 'annex'ification of dataverse one way or another?
20:22 pameyer pdurbin: looks like descending order got the hat tip for those
20:22 pdurbin yoh: well, I got to hear about http://datalad.org a bit. I mentioned it to pameyer at http://irclog.iq.harvard.edu/dataverse/2016-03-21#i_33014 but then I forgot to tell him any more in person. My main takeaway was that pameyer and I are on the right track with hoping to add support for rsync.
20:23 pdurbin pameyer: my take is that if you take all the highest values (P4, S5, i5, E5) it translates to "hair on fire" :)
20:24 pameyer those will hopefully be rare
20:25 pdurbin hopefully, I can't spare the hair these days
20:26 yoh pdurbin: I was wondering more of "expose datasets as ready to be consumed git/annex repositories which would access data from the dataverse servers" ;)
20:28 pameyer yoh: are you thinking on the same (or an accessible) filesystem?
20:30 yoh rright... so in the end I see something like   datalad install --full //dataverse/harvard/datasetX   which would pull that annex repository to the local drive with content being fetched from the dataverse
20:30 yoh example:  https://github.com/datalad/openfmri--sha256-ds000202   which points to content within tarball archives available on S3 bucket (versioned) as pointed to by openfmri.org/dataset/ds000202 ;)
20:31 pdurbin yoh: ah, so the data would be fetched via S3. not git
20:31 yoh data will be fetched via whatever annex supports ;)
20:31 yoh and actually whatever datalad provides support for
20:32 yoh e.g. content from archives which are in turn also available within annex, and in turn available from somewhere online (s3/http/ftp/rsync/....)
20:33 axfelix joined #dataverse
20:34 pdurbin yoh: duh. You're the top contributor on https://github.com/datalad/datalad ... I should have mentioned you to Joey :)
20:34 yoh I wonder -- is meta information about datasets available through some simple API? so I don't need to scrape anything from the pages? ;)
20:34 yoh he knows me ;)
20:35 * yoh proudly states that DataLad  project supports Joey's git-annex development already for more than a year ;-)
20:35 pdurbin yoh: metadata for published datasets is easy to grab. such as https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI as I explained at https://github.com/IQSS/dataverse/issues/1837#issuecomment-197468164
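A minimal sketch of the metadata lookup pdurbin points to, assuming the URL pattern shown in his message holds for other published datasets. The function names and the `BASE_URL` constant are illustrative, not part of Dataverse.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL copied from the example in the chat; other installations differ.
BASE_URL = "https://dataverse.harvard.edu"


def dataset_metadata_url(persistent_id, base_url=BASE_URL):
    """Build the metadata URL for a published dataset (hypothetical helper)."""
    # ":persistentId" is a literal path segment; the DOI travels in the query string.
    # safe=":/" keeps the DOI readable, matching the URL quoted in the chat.
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{base_url}/api/datasets/:persistentId?{query}"


def fetch_dataset_metadata(persistent_id, base_url=BASE_URL):
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(dataset_metadata_url(persistent_id, base_url)) as response:
        return json.load(response)
```

Calling `fetch_dataset_metadata("doi:10.7910/DVN/ARKOTI")` would request the same URL pdurbin pasted; the shape of the returned JSON is not described in this log, so it is left uninterpreted here.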
20:36 yoh https://git-annex.branchable.com/thanks/
20:37 pdurbin yoh: I use and love Joey's ikiwiki code: http://wiki.greptilian.com/ikiwiki
20:38 pdurbin anyway, must run. have a good weekend, all!
20:38 yoh pdurbin: you too! cheers
20:38 iamtimmo happy weekend, pdurbin
20:41 yoh pdurbin: whenever you get online, if you could also point me to how to get a list of all persistentIds, would be nice ;)
20:44 pdurbin yoh: your best bet is probably iterating with the Search API: http://guides.dataverse.org/en/4.3/api/search.html ... you'll need an API token
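A hedged sketch of "iterating with the Search API" to collect persistent IDs. The query parameters (`q`, `type`, `start`, `per_page`, `key`) and the response fields (`data.items`, `data.total_count`, `global_id`) are assumptions based on the Search API guide linked above and should be checked against it; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_search_page(base_url, api_token, start, per_page):
    """Fetch one page of dataset hits (requires network access and a token)."""
    query = urlencode({
        "q": "*",            # match everything
        "type": "dataset",   # only datasets, not dataverses or files
        "start": start,
        "per_page": per_page,
        "key": api_token,    # assumed way of passing the API token
    })
    with urlopen(f"{base_url}/api/search?{query}") as response:
        return json.load(response)["data"]


def iter_persistent_ids(fetch_page, per_page=10):
    """Yield the global_id of every hit, requesting one page at a time.

    fetch_page(start, per_page) must return a dict with "items" and "total_count".
    """
    start = 0
    while True:
        page = fetch_page(start, per_page)
        items = page["items"]
        for item in items:
            yield item["global_id"]
        start += len(items)
        # Stop on an empty page or once every reported hit has been seen.
        if not items or start >= page["total_count"]:
            break
```

To run it against a real installation, bind the first two arguments, for example `iter_persistent_ids(functools.partial(fetch_search_page, base_url, token))`. Passing the page fetcher as a parameter keeps the paging loop testable without network access.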
20:45 pdurbin really going now :)
20:45 pameyer enjoy the good weather while it lasts
20:45 yoh it isn't raining down there? damn
20:45 yoh not fair
20:46 pameyer today is nice, tomorrow is supposed to be nice - Sun is supposed to snow here
20:49 yoh d'oh -- snow again?  you lucky .... (although at this time not that lucky)   we got really not enough snow this year in NH :-/
20:51 pameyer MA was the same this year - not a lot of snow
20:51 yoh I thought that at least some snow storms touched MA while fully avoiding NH
20:52 pameyer I'm still using Wisconsin and upstate NY as my benchmarks
20:53 yoh indeed - sunday/monday snow showers and subfreezing temps... "very nice"
20:53 pameyer yoh: I was taking a look at datalad, and got the sense you might be able to answer some questions about it
20:55 yoh pameyer: I might
20:58 pameyer are tarballs things that get passed to typical neuro software, or do they have to be unpacked first?
20:59 pameyer I'm wondering because this was something that we considered - tgz is easy to upload, but hard to compute directly on
21:07 yoh unpacked
21:08 yoh that is one of the points of datalad -- we want to take that burden and ambiguity away ;)
21:09 yoh and the 2nd major point is versioning -- so you would deal with data as with code pretty much
21:09 yoh and the 3rd is distribution -- so you deal with data packages as with any software packages (on debian/conda/...) -- easy and convenient ;)
21:11 yoh see e.g.   http://pastebin.com/w5sfp6Ca
21:12 yoh that file is listed as available from multiple tarballs and even from multiple files within those tarballs (i.e. it was identical across subjects)
21:16 pameyer makes sense
21:16 pameyer I've been thinking "immutable data" lately; but that's probably because I'd been thinking primary data
21:17 pameyer aka - this came off the detector; if it changes something's gone wrong somewhere
21:17 pameyer but versioning for processed data and models could be very helpful :)
21:18 pameyer does the metadata live on the filesystem in this model, or somewhere else?
21:20 pameyer or still TBD?
21:20 yoh TBD
21:20 yoh it will be within git of each dataset
21:21 yoh "flowing" up in the hierarchy cached for searching
21:21 yoh in neuroimaging raw data is at the scanner in proprietary or DICOM format
21:22 yoh no one works on that in research -- we need to convert data etc.  and that is not ... guaranteed since you do use some software for that and deal with proprietary fields in dicom etc
21:22 yoh see e.g. https://openfmri.org/dataset-orientation-issues/
21:22 garnett joined #dataverse
21:22 yoh so -- you might need to version data really close to its origin, even before really processing it etc ;)
21:23 yoh other gotchas could be -- incomplete transfers etc, which could happen to anyone at any stage ;)
21:23 pameyer makes sense
21:23 pameyer checksums are nice for that
21:23 yoh indeed
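A small sketch of the checksum idea pameyer mentions for catching incomplete transfers. SHA-256 is an assumption (it is the hash named in the openfmri repository linked earlier), and both function names are illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_complete(path, expected_sha256):
    """True if the file on disk matches the checksum published by the sender."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A truncated or corrupted copy produces a different digest, so the comparison fails regardless of which stage of the transfer went wrong.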
21:24 pameyer any thoughts about pushing metadata to the DOI system for a "published" git version of a dataset?
21:25 yoh "eventually" "may be" ;)
21:25 pameyer gotcha :)
21:26 yoh so if there is an interest, could work on  https://github.com/datalad/datalad/issues/393  ;-)
21:30 pameyer there's always more interesting things to be done than there is time to do them in ….
21:31 yoh indeed ;-)
21:31 pameyer I've been working on (and brainstorming about) ways to move "normal" sized datasets around through non-http
21:31 pameyer but hadn't looked at git seriously
21:32 pameyer but my "normal" is structural biologist normal size primary data
21:33 yoh well -- could go to normal git I guess ;-)
21:33 yoh ok -- now I need to run.  Have a good weekend, cheers
21:33 pameyer have a good weekend
21:35 garnett joined #dataverse
22:00 pdurbin yoh: I just emailed you back with a link to http://guides.dataverse.org/en/4.3/api/dataaccess.html
22:00 pdurbin to download a file: https://dataverse.harvard.edu/api/access/datafile/2692294
22:01 pdurbin and thanks for opening that issue. we love integrations
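A minimal sketch of the file download pdurbin describes, assuming the `/api/access/datafile/<id>` pattern from his example applies to other public files. The helper names and `BASE_URL` are hypothetical; restricted files would presumably need credentials, which this sketch does not handle.

```python
import shutil
from urllib.request import urlopen

# Base URL copied from the example in the chat.
BASE_URL = "https://dataverse.harvard.edu"


def datafile_url(file_id, base_url=BASE_URL):
    """Build the Data Access API URL for a file's numeric database id."""
    return f"{base_url}/api/access/datafile/{int(file_id)}"


def download_datafile(file_id, destination, base_url=BASE_URL):
    """Stream a public file to disk (requires network access)."""
    with urlopen(datafile_url(file_id, base_url)) as response, \
            open(destination, "wb") as out:
        # copyfileobj streams in chunks instead of loading the whole file.
        shutil.copyfileobj(response, out)
    return destination
```

For the file in the chat, `download_datafile(2692294, "datafile.bin")` would save it under a placeholder local name.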
22:02 pdurbin pameyer: "git" version? the metadata would be in git?
22:06 pameyer pdurbin: that's how it sounded to me - but also sounded like that was a few steps ahead of where things are at
22:09 pdurbin pameyer: two years ago I posted this: A Thought Experiment: Datasets As Git Repos - https://docs.google.com/document/d/18WDIS8hrFJvMJBcnRuQ8NfD-VxGq32vJ9WwlEgyyWZs/edit?usp=sharing
22:14 pameyer there seems to be a general theme emerging that data files (or data streams) live one place, and metadata lives another place
22:14 pameyer I didn't think wikis were a git feature - github maybe, but not strictly git
22:15 pameyer could do some kind of serialization of metadata to a file; and a doi at a commit (tagged or not)
22:17 pdurbin Yeah, in DVN 3.x we serialized metadata to disk (as XML). We plan to do the same in Dataverse 4 but I'm not sure if there's a specific issue tracking this.
22:17 pdurbin This would be for preservation purposes.
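One way to picture "serialization of metadata to a file" that lives next to the data in a repository. This sketch only covers the serialization half, not attaching a DOI to a commit. JSON, the `metadata.json` filename, and the function name are all assumptions for illustration; DVN 3.x used XML, as pdurbin notes.

```python
import json
from pathlib import Path


def write_metadata_file(dataset_dir, metadata, filename="metadata.json"):
    """Write dataset metadata next to the data files (hypothetical layout)."""
    path = Path(dataset_dir) / filename
    # Sorted keys and fixed indentation give byte-identical output for equal
    # metadata, so a version control system only records real changes.
    text = json.dumps(metadata, indent=2, sort_keys=True, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")
    return path
```

Because the output is deterministic, committing this file alongside the data would make each commit carry a matching snapshot of the metadata.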
22:22 pameyer I'll leave preservation for people that know more about it than me.  but for either data publication, or distribution of working data, there's an argument to be made for having the metadata living the same place as the data, or having it *not* living the same place as the data
22:24 pdurbin I'm not sure which side you're arguing for. :)
22:24 pameyer me either :-S
22:24 pdurbin Give me a one-armed economist.
22:24 pameyer :)
22:46 axfelix joined #dataverse
22:56 axfelix joined #dataverse
