Time
S
Nick
Message
07:15
Virgile joined #dataverse
10:46
dataverse-user joined #dataverse
12:41
donsizemore joined #dataverse
14:38
Virgile joined #dataverse
15:14
Virgile joined #dataverse
19:01
andrew-reece joined #dataverse
19:02
andrew66 joined #dataverse
19:09
andrew66
hi, i have a medium-large dataset (~2.5TB) that i've built while working for a private company. it is comprised of unstructured (audio/video), semi-structured (transcript), and structured (tabular) data files. we'd like to publish the dataset on dataverse, so other academic researchers can make use of it. but we'd also like to retain some control over who accesses the data - specifically, we'd like to ensure that requestors are verified acad
19:13
pdurbin
andrew66: hi! Sorry you were cut off at "verified acad". academics, I assume. :)
19:14
andrew66
...- specifically, we'd like to ensure that requestors are verified academic researchers/labs. is it possible to set up that kind of access control on dataverse?
19:14
andrew66
^pdurbin, that's the rest of it. and hi!
19:15
donsizemore
Danny the Dataverse PM asks if you'd contact support@dataverse.harvard.edu (also hello to you as well =) )
19:16
pdurbin
Files in Dataverse can certainly be marked as "restricted". That way people who want to download have to go through an approval process. I'm a little concerned about the size (2.5 TB) but hopefully it could be managed.
19:17
pdurbin
Also, to donsizemore's point, are you asking about hosting with Harvard Dataverse or operating your own installation of Dataverse?
19:17
andrew66
Ok, I'll check in with support@dataverse.harvard.edu
19:18
andrew66
>asking about hosting with Harvard Dataverse
19:18
andrew66
^this
19:18
andrew66
we'd like to publish findings related to this dataset in an academic journal, and we're looking for a place to host this data so it's easy for researchers to access, but not completely open-access
19:19
andrew66
dataverse is the best option i've found so far
19:20
andrew66
then again, i'm not totally clear on what operating my own installation of DV would entail, so i suppose i shouldn't rule that out until i understand it better.
19:22
andrew66
email sent! thanks for your quick replies @pdurbin and @donsizemore
19:24
pdurbin
andrew66: the ticket came through. Thanks.
19:25
andrew66
great, i'll look forward to continuing our discussion on that thread.
19:26
pdurbin
andrew66: heads up that "up to 1TB" is what's on https://support.dataverse.harvard.edu (for free).
19:30
andrew66
pdurbin: understood, thanks for the link. it's possible we could just trim down the dataset, but maybe worth talking about a bit more first? it's likely to be an influential dataset for both social science and ML/AI research communities alike - it'd be a great addition to the Dataverse trove.
19:30
pdurbin
mmmm, data
19:50
andrew66
i just talked to my lead data engineer, looks like we may have over-estimated by including lower-quality data we'd intended to drop. re-calculating total size now, but likely to be around 35-50% of the original estimate. that should get us close to, maybe even under 1TB. but i think safe to say we'll be under 1.5TB for the final dataset.
19:53
pdurbin
cool
19:54
pdurbin
Have you played with Dataverse yet? We have a demo site if you'd like to kick the tires: https://demo.dataverse.org
20:32
bjonnh joined #dataverse
21:39
pdurbin left #dataverse