Time
S
Nick
Message
02:24
djbrooke joined #dataverse
05:08
axfelix joined #dataverse
08:15
jri joined #dataverse
10:11
petrichor joined #dataverse
12:11
petrichor joined #dataverse
12:42
petrichor joined #dataverse
12:47
petrichor joined #dataverse
12:51
petrichor joined #dataverse
13:44
petrichor joined #dataverse
13:49
jri_ joined #dataverse
13:50
donsizemore joined #dataverse
13:53
petrichor joined #dataverse
14:05
djbrooke joined #dataverse
14:47
jri joined #dataverse
14:54
pdurbin
donsizemore: congrats on "we got basic publishing working from Discovery Environment into Dataverse" http://irclog.iq.harvard.edu/dataverse/2016-11-18#i_45268
14:56
donsizemore
@pdurbin Discovery Environment/iRODS => Condor => Docker => Dataverse =)
14:58
donsizemore
@pdurbin it's at https://github.com/DFC-Incubator/dataverse-publisher/tree/demo-2016-11-21 - next they want to be able to, say, download all files in a given dataset directly into iRODS, plus a few other common tasks
14:58
pdurbin
"This is an Java application that uploads (publishes) a local file to a pre-specified dataverse destination."
14:59
pdurbin
donsizemore: interesting! Does it use https://github.com/IQSS/dataverse-client-java ?
15:00
pdurbin
donsizemore: I forget if I ever told you that we use Condor at IQSS: http://projects.iq.harvard.edu/rce/book/batch-basics
15:01
donsizemore
@pdurbin I know Mike and Akio have seen that project but they wrote the code; my contribution was cobbling it in via Condor/Docker and doing the test runs. (in essence, you want Akio or michael.c.conway gmail.com)
15:03
donsizemore
@pdurbin you didn't. is anyone related to Dataverse running that stack (and have they seen Discovery Environment?)?
15:06
pdurbin
donsizemore: Condor is run by "Technology Services" on this org chart: http://www.iq.harvard.edu/files/iqss-harvard/files/iqss_org_chart_final_2.1.16.pdf
15:13
pameyer joined #dataverse
15:30
pameyer
donsizemore: which condor?
15:32
pdurbin
pameyer: do you use Condor too?
15:32
pameyer
if you mean condor the job scheduler, sometimes
15:33
pdurbin
yeah, that one
15:33
pameyer
cool - I wasn't sure if there were multiple projects with the same name
15:38
pdurbin
I haven't hacked on the Condor stuff in years. That was a past life. :)
15:53
pameyer
hopefully I won't be giving you flashbacks - but I've been considering it as a possible "abstraction of computation" component
15:54
djbrooke joined #dataverse
15:56
pdurbin
condor certainly does computation
15:56
djbrooke joined #dataverse
15:56
djbrooke joined #dataverse
15:56
pdurbin
Historically, the data at IQSS has been so small that if you want to process it with Condor, you just download it in your R script or whatever.
15:57
pameyer
Right - but in terms of how to describe workflows between related datasets, condor DAGs seem like they might have some potential
16:00
pdurbin
Cool. I'm out of my depth, but cool.
16:01
donsizemore joined #dataverse
16:11
pdurbin
pameyer: speaking of computation, I wonder if you'd be interested in swinging by for a talk by http://codeocean.com
16:12
pameyer
pretty picture - what is it?
16:13
pdurbin
it's hard to find any info... here's a bit: https://tech.cornell.edu/programs/startup-postdocs/runway-companies
16:25
pameyer
pdurbin: to your original question, maybe. when?
16:25
pdurbin
pameyer: I'll forward the email to you. I'm sure you'd be quite welcome.
16:47
djbrooke joined #dataverse
16:53
djbrooke joined #dataverse
17:11
pdurbin
community call has started: https://docs.google.com/document/d/16P0feohPzPflDOtv9GMA-bAe3i3QdHvQRrnc6uPjX3Y/edit?usp=sharing
18:03
djbrooke joined #dataverse
18:30
djbrooke joined #dataverse
18:40
djbrooke joined #dataverse
18:43
djbrooke joined #dataverse
19:00
donsizemore joined #dataverse
19:33
djbrooke joined #dataverse
20:31
djbrooke joined #dataverse
20:41
djbrooke
pameyer: just had someone visit from HBS who wants to use the DCM
20:42
pameyer
cool - in what context?
20:42
djbrooke
she has "millions" of small text files - SEC filings from 1950 to present-ish
20:43
pameyer
DCM by itself, or DCM+Dataverse?
20:43
djbrooke
DCM+Dataverse - she saw the roadmap item and it made her think of it
20:43
pameyer
means I need to bump the upper bound for my load testing on number of files then
20:44
pameyer
the user isn't on windows, are they?
20:44
djbrooke
Is the correct answer "no?"
20:44
pameyer
hopefully ….
20:45
djbrooke
I'll check... forgot about that restriction. But she's a librarian and can definitely get her hands on a Mac
20:45
pameyer
we've had some discussions about going from generated transfer scripts to a more generic upload client; but if you can't run unix scripts then in its current form things aren't likely to work
21:01
pdurbin
millions of SEC filings makes me think we should run them through Consilience to cluster them somehow :)
21:02
pameyer
SEC filings have gotten a lot easier since they started requiring filings in machine-readable format
21:02
axfelix joined #dataverse
21:04
axfelix joined #dataverse
21:19
petrichor joined #dataverse
21:40
axfelix joined #dataverse
22:02
djbrooke joined #dataverse