Time
S
Nick
Message
00:25
* pdurbin
leaves a comment
01:02
bjonnh
pdurbin: if you have stuff to test, I'll probably start to use a dataverse locally in the lab
01:03
bjonnh
pdurbin: a feature I would enjoy would be the ability to transfer things from one dataverse to another
01:03
bjonnh
so I was thinking of looking into that
01:36
pdurbin
bjonnh: if the things you want to transfer are files, some day you'll be able to use the Data Locality Module (DLM) that pameyer is leading the way on: https://github.com/IQSS/dataverse/issues/3403
01:37
pdurbin
if the things you want to transfer are sets of metadata about datasets, you can do that now using OAI-PMH: http://guides.dataverse.org/en/4.5.1/admin/index.html
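For readers unfamiliar with OAI-PMH, the harvesting mentioned above boils down to plain HTTP GET requests with a `verb` parameter. The following is a minimal sketch of how such a request URL could be built. The base URL and the `/oai` path are assumptions for illustration, not taken from this conversation; `ListRecords`, `metadataPrefix`, `set`, and `oai_dc` come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the OAI endpoint of the installation to harvest from
    (hypothetical here); set_spec optionally restricts the harvest to
    one named set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
print(oai_list_records_url("https://dataverse.example.edu/oai", set_spec="my_set"))
```

Note that this only moves metadata records, which matches what pdurbin describes; it does not transfer the files themselves.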
01:45
bjonnh
I would like to transfer full sets
01:45
bjonnh
eventually full verses
02:26
pdurbin
ok, well, play around with it, I guess, and let us know how it goes
02:28
bjonnh
well I'm still new to dataverse
02:29
bjonnh
we are just using the harvard instance for now
02:48
pdurbin
bjonnh: oh! Anything published yet? I'm pretty sure we do help places migrate from Harvard's installation to their own.
02:49
bjonnh
not yet, it is in the pipeline
02:49
bjonnh
my university did think about having their own dataverse instance
02:49
pdurbin
jhand: a detailed email went out about your talk today. Much more info than I was able to find at http://www.iq.harvard.edu/event/tots-tip-4
02:49
bjonnh
but funding situation is ridiculous in Illinois
02:49
bjonnh
universities have not received money from the state for almost 2 years
02:50
pdurbin
woof
02:50
bjonnh
one university closed already
02:50
bjonnh
mine has a little exposure to state money but still
02:50
bjonnh
that has an impact
02:50
bjonnh
so what I was thinking is having a lab-internal dataverse
02:51
bjonnh
so we are sure we have everything in-house, even unpublished/private/commercial stuff
02:51
bjonnh
and then migrate it to a public instance when it gets published/opened
02:51
bjonnh
that's my phase 1 plan
02:51
bjonnh
my phase 2 is to get the university to open an instance
02:51
bjonnh
:p
02:52
pdurbin
:)
02:52
pdurbin
makes sense. I think that's more or less the phase that telnoratti is in right now. researchers are kicking the tires behind a firewall. someday they hope to publish the data
02:53
pdurbin
it takes time to make sure Dataverse is the solution for you or not
02:53
bjonnh
well in our group the data sharing part is resolved
02:54
bjonnh
what do you mean "is the solution for you or not"
02:55
bjonnh
the group that did bring the discussion to the university first had a specific use case with several TB a month… in that case, I don't think dataverse is the best way
02:55
bjonnh
but if you have specific cases that it is not suited to, I would like to hear about them
03:08
pdurbin
well, we're trying to scale Dataverse up to handle larger datasets. that's what I was trying to say at https://groups.google.com/d/msg/dataverse-community/tDjFfGQ-f0Q/aQNX41sQAgAJ
03:11
bjonnh
in the case of our group, our datasets are rather small (hundreds MB worst scenario)
03:12
pdurbin
gotcha, should be fine
03:30
bjonnh
do we still have to double zip for multi-file/folder data?
12:11
pdurbin
bjonnh: let's back up and talk about reasons why one might double zip. I want to hear more about your multi-file/folder data
14:21
juliangautier joined #dataverse
14:39
bjonnh
pdurbin: we have NMR data
14:39
bjonnh
pdurbin: which is a "big" file with the data itself and a lot of little files with metadata/settings
14:40
bjonnh
pdurbin: everything has to be used together to even open the file in a software
14:40
bjonnh
pdurbin: there is no "standard" format for now that can cover everything in a single file
15:12
pdurbin
bjonnh: can you please link me to an example of what the data looks like?
15:15
bjonnh
yes
15:16
bjonnh
pdurbin: http://moldb.wishartlab.com/system/documents/files/000/028/480/original/L-Serine_NOESY1D.fid20121204-87231-1e8qmgb.zip?1354661546
15:16
bjonnh
that's one of the many formats
15:17
bjonnh
so we have a more open kind of format, JCAMPDX, but the problem is that sometimes the transformation is lossy
15:17
bjonnh
because of type conversion
15:17
bjonnh
and so on
15:17
bjonnh
also each vendor makes the JCAMPDX files a bit differently
15:18
bjonnh
that's something we want to work on too, but not yet
15:51
pdurbin
bjonnh: how important is it to preserve the folder name ("D_L_SERINE_NOESY1D.FID"). What if all the files were just thrown into the "root" directory of the dataset and the folder name ("D_L_SERINE_NOESY1D.FID") wasn't recorded anywhere?
15:52
bjonnh
well problem is sometimes for a single paper we have 20 folders like that
15:52
bjonnh
pdurbin: also with this format they are all in the same folder
15:53
bjonnh
pdurbin: the main used format has a directory structure
15:54
pdurbin
bjonnh: are you saying that sometimes the zip file would contain more than one folder?
15:59
bjonnh
pdurbin: yes
15:59
bjonnh
there are many formats :(
16:00
bjonnh
let me try to find one
16:01
bjonnh
hmm
16:01
bjonnh
well anyway
16:01
bjonnh
sometimes it is:
16:02
bjonnh
/fid (the main file)
16:02
bjonnh
/1/parameters
16:02
bjonnh
and there are some parameters files
16:02
bjonnh
around 5
16:03
pdurbin
bjonnh: ok. So you're right, the workaround currently is to double zip the file before uploading to Dataverse. Does that make sense?
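The double-zip workaround discussed here can be sketched in a few lines. The idea, as the conversation suggests, is that the upload unpacks only the outer archive, so the inner zip (with its folder structure, e.g. `fid` and `1/parameters`) arrives as a single intact file. The function name and file names below are hypothetical, chosen for illustration.

```python
import zipfile
from pathlib import Path


def double_zip(inner_zip_path, outer_zip_path):
    """Wrap an existing zip file, unchanged, inside a new outer zip.

    The inner archive is stored without recompression, since it is
    already compressed.
    """
    inner = Path(inner_zip_path)
    with zipfile.ZipFile(outer_zip_path, "w", zipfile.ZIP_STORED) as outer:
        # The outer archive holds exactly one member: the inner zip itself.
        outer.write(inner, arcname=inner.name)
    return outer_zip_path


# Hypothetical usage: wrap an NMR experiment archive before uploading it.
# double_zip("experiment.zip", "experiment_wrapped.zip")
```

After the outer archive is unpacked on upload, what remains is the original `experiment.zip`, which a user can download and extract with the directory layout preserved.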
16:07
bjonnh
pdurbin: yep
16:07
bjonnh
that's what we figured out
16:07
bjonnh
maybe it should be said when uploading data
16:07
bjonnh
because the users that tried it in the lab spent some time trying many things
16:08
bjonnh
they tried to rename the zip file
16:08
bjonnh
etc
16:10
pdurbin
bjonnh: the users would have an easier time in the Dataverse web interface if they could upload their original zip and check a box saying "don't unzip this", right?
18:07
bjonnh
yes
18:07
bjonnh
totally
18:36
pdurbin
:)
18:36
pdurbin
bjonnh: in DVN 3.x there was such a checkbox. I guess we should bring it back.
18:37
pdurbin
bjonnh: do you feel like creating an issue about this? https://github.com/IQSS/dataverse/issues
18:38
bjonnh
yep
18:38
bjonnh
writing it right now
18:41
bjonnh
pdurbin: https://github.com/IQSS/dataverse/issues/3439
18:41
bjonnh
maybe I could use this issue to start working on the DV code
18:48
pdurbin
bjonnh: thanks! Sure!