IRC log for #dataverse, 2021-02-01

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
07:15 Virgile joined #dataverse
10:46 dataverse-user joined #dataverse
12:41 donsizemore joined #dataverse
14:38 Virgile joined #dataverse
15:14 Virgile joined #dataverse
19:01 andrew-reece joined #dataverse
19:02 andrew66 joined #dataverse
19:09 andrew66 hi, i have a medium-large dataset (~2.5TB) that i've built while working for a private company.  it is comprised of unstructured (audio/video), semi-structured (transcript), and structured (tabular) data files. we'd like to publish the dataset on dataverse, so other academic researchers can make use of it. but we'd also like to retain some control over who accesses the data - specifically, we'd like to ensure that requestors are verified acad
19:13 pdurbin andrew66: hi! Sorry you were cut off at "verified acad". academics, I assume. :)
19:14 andrew66 ...- specifically, we'd like to ensure that requestors are verified academic researchers/labs. is it possible to set up that kind of access control on dataverse?
19:14 andrew66 ^pdurbin, that's the rest of it.  and hi!
19:15 donsizemore Danny the Dataverse PM asks if you'd contact support@dataverse.harvard.edu (also hello to you as well =) )
19:16 pdurbin Files in Dataverse can certainly be marked as "restricted". That way people who want to download have to go through an approval process. I'm a little concerned about the size (2.5 TB) but hopefully it could be managed.
19:17 pdurbin Also, to donsizemore's point, are you asking about hosting with Harvard Dataverse or operating your own installation of Dataverse?
19:17 andrew66 Ok, I'll check in with support@dataverse.harvard.edu
19:18 andrew66 >asking about hosting with Harvard Dataverse
19:18 andrew66 ^this
19:18 andrew66 we'd like to publish findings related to this dataset in an academic journal, and we're looking for a place to host this data so it's easy for researchers to access, but not completely open-access
19:19 andrew66 dataverse is the best option i've found so far
19:20 andrew66 then again, i'm not totally clear on what operating my own installation of DV would entail, so i suppose i shouldn't rule that out until i understand it better.
19:22 andrew66 email sent!  thanks for your quick replies @pdurbin and @donsizemore
19:24 pdurbin andrew66: the ticket came through. Thanks.
19:25 andrew66 great, i'll look forward to continuing our discussion on that thread.
19:26 pdurbin andrew66: heads up that "up to 1TB" is what's on https://support.dataverse.harvard.edu (for free).
19:30 andrew66 pdurbin: understood, thanks for the link.  it's possible we could just trim down the dataset, but maybe worth talking about a bit more first?  it's likely to be an influential dataset for both social science and ML/AI research communities alike - it'd be a great addition to the Dataverse trove.
19:30 pdurbin mmmm, data
19:50 andrew66 i just talked to my lead data engineer, looks like we may have over-estimated by including lower-quality data we'd intended to drop.  re-calculating total size now, but likely to be around 35-50% of the original estimate.  that should get us close to, maybe even under 1TB.  but i think safe to say we'll be under 1.5TB for the final dataset.
19:53 pdurbin cool
19:54 pdurbin Have you played with Dataverse yet? We have a demo site if you'd like to kick the tires: https://demo.dataverse.org
20:32 bjonnh joined #dataverse
21:39 pdurbin left #dataverse
