
IRC log for #dvn, 2014-03-05

We've moved! Please join #dataverse instead. The new logs are at http://irclog.iq.harvard.edu/dataverse/today


All times shown according to UTC.

Time S Nick Message
01:53 LyndsySimon joined #dvn
01:58 axfelix joined #dvn
02:15 LyndsySimon_ joined #dvn
03:13 axfelix joined #dvn
11:51 sivoais_ joined #dvn
13:11 Guest63910 joined #dvn
13:30 realzies joined #dvn
14:20 jwhitney joined #dvn
14:36 LyndsySimon joined #dvn
15:42 Guest91338 joined #dvn
16:00 pdurbin hmm
16:00 balo hm?
16:00 pdurbin none of you fine folks have dataverse in your bones
16:00 pdurbin no offense
16:00 pdurbin :)
16:00 pdurbin I have a thought experiment
16:00 balo you can say that
16:00 pdurbin for people who have dataverse in their bones
16:01 balo i feel some dataverse particles in the corner of my heart
16:06 pdurbin Releasing datasets is related to versioning (a release means incrementing a version number) and Philip Durbin has written a thought experiment for treating datasets as git repos at https://docs.google.com/document/d/18WDIS8hrFJvMJBcnRuQ8NfD-VxGq32vJ9WwlEgyyWZs/edit?usp=sharing
16:07 pdurbin balo: I just added that to https://redmine.hmdc.harvard.edu/issues/3628
16:07 pdurbin if anyone would like to comment on the Google Doc, please let me know your gmail address and I'll give you the proper permissions
16:10 balo i will read it if i get home
16:10 pdurbin balo: I hope you get home :)
16:10 balo seems interesting
16:11 balo i have to prepare tonight for my presentation. i'll talk about a new web stream api tomorrow :D
16:11 pdurbin balo: ah. break a leg!
16:11 balo still have nothing from it
16:11 skay pdurbin: hey, https://github.com/maxogden/dat
16:13 pdurbin "real-time replication and versioning for large tabular data sets. pre-alpha!"
16:13 pdurbin hmm
16:13 skay pdurbin: http://dataprotocols.org/
16:13 skay http://dataprotocols.org/revisioning-data/
16:13 skay http://git-annex.branchable.com/
16:14 pdurbin heh. yeah, I know about git-annex. scares me a bit
16:14 pdurbin skay: these links are great. thanks!
16:14 skay pdurbin: cool! I don't version data yet, but it is something I will need to support
16:14 pdurbin huh. "GeoGit is an open source tool that draws inspiration from Git, but adapts its core concepts to handle distributed versioning of geospatial data" -- http://geogit.org
16:14 skay so I am a squirrel for this right now
16:16 skay pdurbin: someone posted about bup in a thread on the osf list https://stackoverflow.com/questions/8001663/can-git-treat-zip-files-as-directories-and-files-inside-the-zip-as-blobs/20129617#20129617
16:19 pdurbin skay: ah. as an alternative to git-annex. interesting. and a whole show about it: http://episodes.gitminutes.com/2013/10/gitminutes-24-zoran-zaric-on-backups.html
16:20 pdurbin skay: I dunno though. I've heard very good things about https://github.com/eclipse/jgit
16:23 axfelix joined #dvn
16:24 skay you can add me to the doc: shekay for gmail
16:24 skay but I don't know if I'll have any chance to make comments and participate right now. it is crunch mode time
16:25 pdurbin skay: done
16:26 pdurbin LyndsySimon: I bet you think about using git as a back end a lot
16:26 LyndsySimon ?
16:27 pdurbin LyndsySimon: for storing data, files, etc.
16:27 LyndsySimon Ah. I'm catching up with the conversation now. Yeah, we use git on the backend for OSF's project-based file storage. It has advantages and disadvantages.
16:27 pdurbin LyndsySimon: tell us more
16:28 LyndsySimon It's great for things like CSV files - text files where each record is a line. For anything else (especially binary files), it's probably not the best tool for the job.
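A minimal sketch of why one-record-per-line text suits a line-oriented diff, using Python's standard `difflib` rather than git itself. The sample CSV rows, the single-line stand-in for a file without line breaks, and the helper name `changed_lines` are illustrative assumptions, not anything taken from OSF or Dataverse.

```python
import difflib

def changed_lines(old_text, new_text):
    """Return only the removed ("- ") and added ("+ ") lines of a line diff."""
    delta = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in delta if line.startswith(("- ", "+ "))]

# Hypothetical CSV data: one record per line, one score edited.
old_csv = "id,name,score\n1,alice,90\n2,bob,85\n3,carol,70\n"
new_csv = "id,name,score\n1,alice,90\n2,bob,88\n3,carol,70\n"
csv_changes = changed_lines(old_csv, new_csv)

# Stand-in for content with no line breaks: the same one-character edit
# makes the single "line" (the whole file) show up as changed.
old_blob = "A" * 50 + "5" + "B" * 50
new_blob = "A" * 50 + "8" + "B" * 50
blob_changes = changed_lines(old_blob, new_blob)

print(csv_changes)
print(len(blob_changes), len(blob_changes[0]))
```

In the CSV case the diff isolates the one edited record. In the unbroken case the whole content is reported as replaced, which is roughly the readability problem with diffing binary formats line by line.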
16:28 LyndsySimon git-annex basically takes the version storage part out of git and leaves only change tracking. It's useful, but we're not using it at all.
16:29 LyndsySimon I need to read about git descending into archives. That's an interesting concept for sure.
16:29 LyndsySimon As a rule though, we're not trying to be a fileserver. Other services - Github, Bitbucket, S3, Dataverse, Figshare, &c - are better suited to that, and we're happy to have them fill the role.
16:31 pdurbin LyndsySimon: well, dataverse is not a file server
16:31 LyndsySimon Also, there is a huge need for version tracking for binary files. It's a very difficult problem to solve, but it's not impossible. We can't be the only people out there that would like to be able to keep versions of things like SAS's *.sas7bdat files. If you could do intra-line diffs and write extensions to display diffs in specific binary file formats, it would be an awesome and useful tool.
16:32 LyndsySimon pdurbin: I have to admit, I'm honestly not well versed in what DVN does. I've only briefly touched the add-on code on our side to help with specific issues. I've been mostly consumed with non-OSF projects since you guys were here.
16:33 LyndsySimon The past few weeks have been about core OSF features and teaching interns. Fun stuff, but I'll be glad when I'm able to sit down and do some serious architecture and refactoring work again :)
16:34 pdurbin LyndsySimon: sure sure, no worries
16:34 LyndsySimon pdurbin: That Google Doc of yours that's linked above - is that open to input?
16:34 LyndsySimon Specifically, you ask how metadata could be versioned. Git is already doing that on the back, when you think about it
16:35 LyndsySimon Each commit has a hash, for one thing. That's metadata about the commit, just as much as permissions are metadata about the whole repo. Why not just have a .permissions file in the .git directory that spells that out? If you're not using git on the backend, I'm sure there's some sort of analog. Every VCS is going to have to store its data somewhere.
16:37 LyndsySimon Actually, now that I typed that out here - I don't feel the need to insert it into the document anymore, lol.
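A rough sketch of the "each commit has a hash" point, assuming git's default SHA-1 object format (newer git versions can also use SHA-256). An object id is the hash of a short header plus the object body, so for a commit the author, date, and message are all part of what gets hashed. The helper name `git_object_id` and the commit body below (names, email, timestamp) are made up for illustration.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way git does for SHA-1 repositories:
    sha1(b"<kind> <size>\\0" + body)."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# File content is stored as a "blob" object.
hello_id = git_object_id("blob", b"hello world\n")

# Hypothetical commit body: it points at a tree (here the id of the empty
# tree) and carries author, committer, and message metadata.
def commit_body(author: str) -> bytes:
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        f"author {author} <someone@example.org> 1394035200 +0000\n"
        f"committer {author} <someone@example.org> 1394035200 +0000\n"
        "\n"
        "Release version 1 of the dataset\n"
    ).encode("utf-8")

commit_a = git_object_id("commit", commit_body("Example Author"))
commit_b = git_object_id("commit", commit_body("Another Author"))

print(hello_id)
print(commit_a != commit_b)  # same tree, different metadata, different id
```

The two commits point at the same tree but get different ids, because the commit metadata is itself versioned content. This says nothing about how Dataverse or OSF store metadata.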
16:48 sivoais joined #dvn
17:23 LyndsySimon joined #dvn
17:42 axfelix so I like this git versioning design doc quite a bit...
17:43 axfelix I see that someone else has already pointed out the issue with binary files
17:43 axfelix but I'm 100% in favour of leaving the metadata elements out of the user-manipulable filesystem part
17:44 pdurbin LyndsySimon: if you tell me your gmail address I'll give you access to comment on the doc
17:44 pdurbin axfelix: makes sense. glad you like the doc
17:51 LyndsySimon pdurbin: simon.lyndsy@gmail.com - but like I said, now that I verbalized it here, I don't have a pressing desire :)
17:51 LyndsySimon Plus, I've been reading about Bup as I've been waiting on tests to run, and it's changing the way I'm thinking about some of it.
17:55 pdurbin LyndsySimon: I just made it so you can comment
17:56 pdurbin LyndsySimon: I also linked back to these IRC logs so please don't feel like you *have* to comment in the doc to be heard
17:56 LyndsySimon Fair enough :)
17:56 pdurbin LyndsySimon: and I added a link to what I mean by metadata
17:56 pdurbin LyndsySimon: this is what I mean by metadata. Metadata for datasets: https://groups.google.com/d/msg/dataverse-community/fBjW8VBHAPE/DPaCANOwS9YJ
17:57 pdurbin (we have a lot of metadata)
17:57 pdurbin :)
19:38 LyndsySimon joined #dvn
19:55 LyndsySimon joined #dvn
21:12 skay pdurbin: hey I hang out in a channel where the maintainer of git-annex is and I mentioned the data versioning thing and suggested that if it was an interest to stop in here and say hi
21:13 skay he replied and said that he knows some neuroscience people who are also interested in this use case and will introduce us all
21:13 skay pdurbin: may I share your email?
21:13 skay also, he said your nick sounds familiar and was wondering if you are in #debian
21:24 pdurbin skay: thanks! I gotta pick up the kids but I'd be happy to chat with him. Joey Hess, right? I use and love his wiki software
21:24 pdurbin javaeebot: lucky ikiwiki
21:24 javaeebot pdurbin: http://ikiwiki.info/
21:24 pdurbin skay: that one
21:25 pdurbin skay: do you mind dropping him a link to these IRC logs? http://irclog.iq.harvard.edu/dvn/2014-03-05
21:27 skay oh I didn't realize he did ikiwiki
21:27 skay IPOL uses ikiwiki I think
21:30 skay pdurbin: done! (and IPOL is http://www.ipol.im/ image processing on line, with demos of the code in the papers)
21:32 pdurbin "IPOL is a research journal of image processing and image analysis."
21:32 pdurbin neat
21:32 pdurbin gotta run. don't forget to comment on the google doc!
21:33 pdurbin ... everyone! :)
23:03 LyndsySimon joined #dvn
23:05 LyndsySimon joined #dvn
23:54 pdurbin skay: whoa, you're in #openhatch too? I just discovered that channel: http://irclogs.jackgrigg.com/irc.freenode.net/openhatch/2014-03-05#i_3289382
23:54 skay pdurbin: yeah!
23:54 skay I've helped a few times too with some openhatchy things. shauna and paulproteus came to Chicago to run an Open Source Comes to Campus event
23:54 skay and I volunteered
23:55 shauna it was great!
23:55 skay pdurbin: you should do an OSCtC event! you should!
23:55 shauna (fyi there will probably be another event this sprint in Chicago, at NEIU)
23:55 shauna *spring
23:55 skay spring, okay
23:55 skay do you have a time window?
23:55 skay or just spring?
23:56 shauna They suggested Apr 19th, but we're doing an event at George Mason University that day.
23:56 skay pdurbin: sometimes I am in #graphite #rackspace and #docker but I'm really busy this week and the channel colors distract me
23:56 shauna And I'm not sure we're "scaling" enough yet to be able to do two at once.
23:56 shauna But we might be?
23:56 shauna I don't know.  So maybe April 19th.
23:56 pdurbin javaeebot: lucky OSCtC event
23:56 javaeebot pdurbin: https://openhatch.org/wiki/OpenHatch_affiliated_projects
23:57 skay javaeebot: lucky open source comes to campus
23:57 javaeebot skay: http://campus.openhatch.org/
23:57 pdurbin ah. Open Source Comes to Campus - http://campus.openhatch.org
23:57 pdurbin interesting
23:57 * pdurbin works on a campus
23:58 shauna ooooh, tell me more pdurbin :)
23:58 * shauna runs OSCTC
23:58 * skay notes that dvn is open source
23:58 pdurbin oh, I saw Obama on the way home: https://plus.google.com/+PhilipDurbin/posts/iQbJZSJpL7q
23:59 shauna This http://campus.openhatch.org/projects.html is a better link than this https://openhatch.org/wiki/OpenHatch_affiliated_projects
23:59 pdurbin shauna: not quite in Harvard Yard, a bit north
