Time |
S |
Nick |
Message |
01:53 |
|
|
LyndsySimon joined #dvn |
01:58 |
|
|
axfelix joined #dvn |
02:15 |
|
|
LyndsySimon_ joined #dvn |
03:13 |
|
|
axfelix joined #dvn |
11:51 |
|
|
sivoais_ joined #dvn |
13:11 |
|
|
Guest63910 joined #dvn |
13:30 |
|
|
realzies joined #dvn |
14:20 |
|
|
jwhitney joined #dvn |
14:36 |
|
|
LyndsySimon joined #dvn |
15:42 |
|
|
Guest91338 joined #dvn |
16:00 |
|
pdurbin |
hmm |
16:00 |
|
balo |
hm? |
16:00 |
|
pdurbin |
none of you fine folks have dataverse in your bones |
16:00 |
|
pdurbin |
no offense |
16:00 |
|
pdurbin |
:) |
16:00 |
|
pdurbin |
I have a thought experiment |
16:00 |
|
balo |
you can say that |
16:00 |
|
pdurbin |
for people who have dataverse in their bones |
16:01 |
|
balo |
i feel some dataverse particles in the corner of my heart |
16:06 |
|
pdurbin |
Releasing datasets is related to versioning (a release means incrementing a version number) and Philip Durbin has written a thought experiment for treating datasets as git repos at https://docs.google.com/document/d/18WDIS8hrFJvMJBcnRuQ8NfD-VxGq32vJ9WwlEgyyWZs/edit?usp=sharing |
16:07 |
|
pdurbin |
balo: I just added that to https://redmine.hmdc.harvard.edu/issues/3628 |
16:07 |
|
pdurbin |
if anyone would like to comment on the Google Doc, please let me know your gmail address and I'll give you the proper permissions |
16:10 |
|
balo |
i will read it if i get home |
16:10 |
|
pdurbin |
balo: I hope you get home :) |
16:10 |
|
balo |
seems interesting |
16:11 |
|
balo |
i have to prepare tonight for my presentation. i'll talk about a new web stream api tomorrow :D |
16:11 |
|
pdurbin |
balo: ah. break a leg! |
16:11 |
|
balo |
still have nothing from it |
16:11 |
|
skay |
pdurbin: hey, https://github.com/maxogden/dat |
16:13 |
|
pdurbin |
"real-time replication and versioning for large tabular data sets. pre-alpha!" |
16:13 |
|
pdurbin |
hmm |
16:13 |
|
skay |
pdurbin: http://dataprotocols.org/ |
16:13 |
|
skay |
http://dataprotocols.org/revisioning-data/ |
16:13 |
|
skay |
http://git-annex.branchable.com/ |
16:14 |
|
pdurbin |
heh. yeah, I know about git-annex. scares me a bit |
16:14 |
|
pdurbin |
skay: these links are great. thanks! |
16:14 |
|
skay |
pdurbin: cool! I don't version data yet, but it is something I will need to support |
16:14 |
|
pdurbin |
huh. "GeoGit is an open source tool that draws inspiration from Git, but adapts its core concepts to handle distributed versioning of geospatial data" -- http://geogit.org |
16:14 |
|
skay |
so I am a squirrel for this right now |
16:16 |
|
skay |
pdurbin: someone posted about bup in a thread on the osf list https://stackoverflow.com/questions/8001663/can-git-treat-zip-files-as-directories-and-files-inside-the-zip-as-blobs/20129617#20129617 |
16:19 |
|
pdurbin |
skay: ah. as an alternative to git-annex. interesting. and a whole show about it: http://episodes.gitminutes.com/2013/10/gitminutes-24-zoran-zaric-on-backups.html |
16:20 |
|
pdurbin |
skay: I dunno though. I've heard very good things about https://github.com/eclipse/jgit |
16:23 |
|
|
axfelix joined #dvn |
16:24 |
|
skay |
you can add me to the doc: shekay for gmail |
16:24 |
|
skay |
but I don't know if I'll have any chance to make comments and participate right now. it is crunch mode time |
16:25 |
|
pdurbin |
skay: done |
16:26 |
|
pdurbin |
LyndsySimon: I bet you think about using git as a back end a lot |
16:26 |
|
LyndsySimon |
? |
16:27 |
|
pdurbin |
LyndsySimon: for storing data, files, etc. |
16:27 |
|
LyndsySimon |
Ah. I'm catching up with the conversation now. Yeah, we use git on the backend for OSF's project-based file storage. It has advantages and disadvantages. |
16:27 |
|
pdurbin |
LyndsySimon: tell us more |
16:28 |
|
LyndsySimon |
It's great for things like CSV files - text files where each record is a line. For anything else (especially binary files), it's probably not the best tool for the job. |
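[Editor's sketch, not part of the original log] The point about line-oriented text can be illustrated with a small Python example. It uses the standard-library `difflib` module rather than git itself, and the sample CSV data and the helper name `changed_lines` are made up for illustration. When each record sits on its own line, a one-field change shows up as a single removed line and a single added line.

```python
import difflib

# Hypothetical sample data: two versions of a small CSV file.
OLD_CSV = "id,name,score\n1,alice,90\n2,bob,85\n3,carol,70\n"
NEW_CSV = "id,name,score\n1,alice,90\n2,bob,88\n3,carol,70\n"


def changed_lines(old_text, new_text):
    """Return the removed ('-') and added ('+') record lines between two texts."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            lineterm="",
            n=0,  # no context lines, only the records that changed
        )
    )
    # The first two entries are the '---' / '+++' file headers; skip them,
    # then drop the '@@' hunk markers by keeping only '+' and '-' lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    for line in changed_lines(OLD_CSV, NEW_CSV):
        print(line)
```

A binary format has no such line structure, so the same line-based comparison would not yield a readable per-record diff.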
16:28 |
|
LyndsySimon |
git-annex basically takes the version storage part out of git and leaves only change tracking. It's useful, but we're not using it at all. |
16:29 |
|
LyndsySimon |
I need to read about git descending into archives. That's an interesting concept for sure. |
16:29 |
|
LyndsySimon |
As a rule though, we're not trying to be a fileserver. Other services - Github, Bitbucket, S3, Dataverse, Figshare, &c - are better suited to that, and we're happy to have them fill the role. |
16:31 |
|
pdurbin |
LyndsySimon: well, dataverse is not a file server |
16:31 |
|
LyndsySimon |
Also, there is a huge need for version tracking for binary files. It's a very difficult problem to solve, but it's not impossible. We can't be the only people out there that would like to be able to keep versions of things like SAS's *.sas7bdat files. If you could do intra-line diffs and write extensions to display diffs in specific binary file formats, it would be an awesome and useful tool. |
16:32 |
|
LyndsySimon |
pdurbin: I have to admit, I'm honestly not well versed in what DVN does. I've only briefly touched the add-on code on our side to help with specific issues. I've been mostly consumed with non-OSF projects since you guys were here. |
16:33 |
|
LyndsySimon |
The past few weeks have been about core OSF features and teaching interns. Fun stuff, but I'll be glad when I'm able to sit down and do some serious architecture and refactoring work again :) |
16:34 |
|
pdurbin |
LyndsySimon: sure sure, no worries |
16:34 |
|
LyndsySimon |
pdurbin: That Google Doc of yours that's linked above - is that open to input? |
16:34 |
|
LyndsySimon |
Specifically, you ask how metadata could be versioned. Git is already doing that on the back end, when you think about it |
16:35 |
|
LyndsySimon |
Each commit has a hash, for one thing. That's metadata about the commit, just as much as permissions are metadata about the whole repo. Why not just have a .permissions file in the .git directory that spells that out? If you're not using git on the backend, I'm sure there's some sort of analog. Every VCS is going to have to store its data somewhere. |
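[Editor's sketch, not part of the original log] As a rough illustration of hashes acting as metadata, the Python snippet below computes the ID git assigns to a file's contents (a "blob" object): the SHA-1 of a short header followed by the bytes. Commit hashes are derived in a similar way over the commit's own text, which this sketch does not cover. The `.permissions` content is hypothetical, borrowed from the suggestion above, and the function name `git_blob_id` is an assumption for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with this content."""
    # Git hashes a header of the form b"blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Hypothetical contents of a .permissions sidecar file, before and after an edit.
    before = b"pdurbin: admin\nbalo: read\n"
    after = b"pdurbin: admin\nbalo: write\n"
    # Any change to the content yields a different ID, so the ID tracks the version.
    print(git_blob_id(before))
    print(git_blob_id(after))
```

Because the ID depends only on the content, two identical permission files would share an ID, and any edit produces a new one, which is what lets a sidecar file be versioned alongside the data.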
16:37 |
|
LyndsySimon |
Actually, now that I typed that out here - I don't feel the need to insert it into the document anymore, lol. |
16:48 |
|
|
sivoais joined #dvn |
17:23 |
|
|
LyndsySimon joined #dvn |
17:42 |
|
axfelix |
so I like this git versioning design doc quite a bit... |
17:43 |
|
axfelix |
I see that someone else has already pointed out the issue with binary files |
17:43 |
|
axfelix |
but I'm 100% in favour of leaving the metadata elements out of the user-manipulable filesystem part |
17:44 |
|
pdurbin |
LyndsySimon: if you tell me your gmail address I'll give you access to comment on the doc |
17:44 |
|
pdurbin |
axfelix: makes sense. glad you like the doc |
17:51 |
|
LyndsySimon |
pdurbin: simon.lyndsygmail.com - but like I said, now that I verbalized it here, I don't have a pressing desire :) |
17:51 |
|
LyndsySimon |
Plus, I've been reading about Bup as I've been waiting on tests to run, and it's changing the way I'm thinking about some of it. |
17:55 |
|
pdurbin |
LyndsySimon: I just made it so you can comment |
17:56 |
|
pdurbin |
LyndsySimon: I also linked back to these IRC logs so please don't feel like you *have* to comment in the doc to be heard |
17:56 |
|
LyndsySimon |
Fair enough :) |
17:56 |
|
pdurbin |
LyndsySimon: and I added a link to what I mean by metadata |
17:56 |
|
pdurbin |
LyndsySimon: this is what I mean by metadata. Metadata for datasets: https://groups.google.com/d/msg/dataverse-community/fBjW8VBHAPE/DPaCANOwS9YJ |
17:57 |
|
pdurbin |
(we have a lot of metadata) |
17:57 |
|
pdurbin |
:) |
19:38 |
|
|
LyndsySimon joined #dvn |
19:55 |
|
|
LyndsySimon joined #dvn |
21:12 |
|
skay |
pdurbin: hey, I hang out in a channel where the maintainer of git-annex is. I mentioned the data versioning thing and suggested that, if it was of interest, he stop in here and say hi |
21:13 |
|
skay |
he replied and said that he knows some neuroscience people who are also interested in this use case and will introduce us all |
21:13 |
|
skay |
pdurbin: may I share your email? |
21:13 |
|
skay |
also, he said your nick sounds familiar and was wondering if you are in #debian |
21:24 |
|
pdurbin |
skay: thanks! I gotta pick up the kids but I'd be happy to chat with him. Joey Hess, right? I use and love his wiki software |
21:24 |
|
pdurbin |
javaeebot: lucky ikiwiki |
21:24 |
|
javaeebot |
pdurbin: http://ikiwiki.info/ |
21:24 |
|
pdurbin |
skay: that one |
21:25 |
|
pdurbin |
skay: do you mind dropping him a link to these IRC logs? http://irclog.iq.harvard.edu/dvn/2014-03-05 |
21:27 |
|
skay |
oh I didn't realize he did ikiwiki |
21:27 |
|
skay |
IPOL uses ikiwiki I think |
21:30 |
|
skay |
pdurbin: done! (and IPOL is http://www.ipol.im/ image processing on line, with demos of the code in the papers) |
21:32 |
|
pdurbin |
"IPOL is a research journal of image processing and image analysis." |
21:32 |
|
pdurbin |
neat |
21:32 |
|
pdurbin |
gotta run. don't forget to comment on the google doc! |
21:33 |
|
pdurbin |
... everyone! :) |
23:03 |
|
|
LyndsySimon joined #dvn |
23:05 |
|
|
LyndsySimon joined #dvn |
23:54 |
|
pdurbin |
skay: whoa, you're in #openhatch too? I just discovered that channel: http://irclogs.jackgrigg.com/irc.freenode.net/openhatch/2014-03-05#i_3289382 |
23:54 |
|
skay |
pdurbin: yeah! |
23:54 |
|
skay |
I've helped a few times too with some openhatchy things. shauna and paulproteus came to Chicago to run an Open Source Comes to Campus event |
23:54 |
|
skay |
and I volunteered |
23:55 |
|
shauna |
it was great! |
23:55 |
|
skay |
pdurbin: you should do an OSCtC event! you should! |
23:55 |
|
shauna |
(fyi there will probably be another event this sprint in Chicago, at NEIU) |
23:55 |
|
shauna |
*spring |
23:55 |
|
skay |
spring, okay |
23:55 |
|
skay |
do you have a time window? |
23:55 |
|
skay |
or just spring? |
23:56 |
|
shauna |
They suggested Apr 19th, but we're doing an event at George Mason University that day. |
23:56 |
|
skay |
pdurbin: sometimes I am in #graphite #rackspace and #docker but I'm really busy this week and the channel colors distract me |
23:56 |
|
shauna |
And I'm not sure we're "scaling" enough yet to be able to do two at once. |
23:56 |
|
shauna |
But we might be? |
23:56 |
|
shauna |
I don't know. So maybe April 19th. |
23:56 |
|
pdurbin |
javaeebot: lucky OSCtC event |
23:56 |
|
javaeebot |
pdurbin: https://openhatch.org/wiki/OpenHatch_affiliated_projects |
23:57 |
|
skay |
javaeebot: lucky open source comes to campus |
23:57 |
|
javaeebot |
skay: http://campus.openhatch.org/ |
23:57 |
|
pdurbin |
ah. Open Source Comes to Campus - http://campus.openhatch.org |
23:57 |
|
pdurbin |
interesting |
23:57 |
|
* pdurbin |
works on a campus |
23:58 |
|
shauna |
ooooh, tell me more pdurbin :) |
23:58 |
|
* shauna |
runs OSCTC |
23:58 |
|
* skay |
notes that dvn is open source |
23:58 |
|
pdurbin |
oh, I saw Obama on the way home: https://plus.google.com/+PhilipDurbin/posts/iQbJZSJpL7q |
23:59 |
|
shauna |
This http://campus.openhatch.org/projects.html is a better link than this https://openhatch.org/wiki/OpenHatch_affiliated_projects |
23:59 |
|
pdurbin |
shauna: not quite in Harvard Yard, a bit north |