IRC log for #dataverse, 2016-10-30

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
00:25 * pdurbin leaves a comment
01:02 bjonnh pdurbin: if you have stuff to test, I'll probably start to use a dataverse locally in the lab
01:03 bjonnh pdurbin: a feature I would enjoy would be the ability to transfer things from one dataverse to another
01:03 bjonnh so I was thinking of looking into that
01:36 pdurbin bjonnh: if the things you want to transfer are files, some day you'll be able to use the Data Locality Module (DLM) that pameyer is leading the way on: https://github.com/IQSS/dataverse/issues/3403
01:37 pdurbin if the things you want to transfer are sets of metadata about datasets, you can do that now using OAI-PMH: http://guides.dataverse.org/en/4.5.1/admin/index.html
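For readers following the OAI-PMH route pdurbin mentions, here is a minimal sketch of the protocol side of a harvest. It builds a standard OAI-PMH ListIdentifiers request and pulls record identifiers and the resumption token out of a response. It assumes the Dataverse server exposes its OAI-PMH endpoint at a URL such as https://dataverse.example.edu/oai (hypothetical host; check the admin guide linked above for your installation's actual endpoint and set names). The sample response, the DOIs, the set name "lab", and the helper function names are all hypothetical placeholders. Only the verb, metadataPrefix, set, and resumptionToken arguments come from the OAI-PMH 2.0 protocol itself.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Namespace used by all OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_identifiers_url(base_url: str, set_spec: Optional[str] = None,
                         metadata_prefix: str = "oai_dc") -> str:
    """Build a first-page ListIdentifiers request (hypothetical helper)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def next_page_url(base_url: str, token: str) -> str:
    """Per OAI-PMH, a resumptionToken is sent alongside the verb only."""
    return base_url + "?" + urlencode({"verb": "ListIdentifiers", "resumptionToken": token})

def parse_identifiers(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return record identifiers plus the resumption token (None when the list is complete)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return ids, (token.text if token is not None and token.text else None)

# Hypothetical response body, shaped like an OAI-PMH ListIdentifiers reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>doi:10.5072/FK2/EXAMPLE1</identifier></header>
    <header><identifier>doi:10.5072/FK2/EXAMPLE2</identifier></header>
    <resumptionToken>hypothetical-token</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""
```

In a real harvest you would fetch list_identifiers_url(...) (for example with urllib.request.urlopen), call parse_identifiers on the body, and keep requesting next_page_url(...) until the token comes back as None. Note that this moves dataset metadata only, not the data files themselves.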
01:45 bjonnh I would like to transfer full sets
01:45 bjonnh eventually full verses
02:26 pdurbin ok, well, play around with it, I guess, and let us know how it goes
02:28 bjonnh well I'm still new to dataverse
02:29 bjonnh we are just using the harvard instance for now
02:48 pdurbin bjonnh: oh! Anything published yet? I'm pretty sure we do help places migrate from Harvard's installation to their own.
02:49 bjonnh not yet, it is in the pipeline
02:49 bjonnh my university did think about having their own dataverse instance
02:49 pdurbin jhand: a detailed email went out about your talk today. Much more info than I was able to find at http://www.iq.harvard.edu/event/tots-tip-4
02:49 bjonnh but funding situation is ridiculous in Illinois
02:49 bjonnh universities have not received money from the state for almost 2 years
02:50 pdurbin woof
02:50 bjonnh one university closed already
02:50 bjonnh mine has a little exposure to state money but still
02:50 bjonnh that has an impact
02:50 bjonnh so what I was thinking is having a lab-internal dataverse
02:51 bjonnh so we are sure we have everything in-house, even unpublished/private/commercial stuff
02:51 bjonnh and then migrate it to a public instance when it gets published/opened
02:51 bjonnh that's my phase 1 plan
02:51 bjonnh my phase 2 is to get the university to open an instance
02:51 bjonnh :p
02:52 pdurbin :)
02:52 pdurbin makes sense. I think that's more or less the phase that telnoratti is in right now. researchers are kicking the tires behind a firewall. someday they hope to publish the data
02:53 pdurbin it takes time to make sure Dataverse is the solution for you or not
02:53 bjonnh well in our group the data sharing part is resolved
02:54 bjonnh what do you mean "is the solution for you or not"
02:55 bjonnh the group that first brought the discussion to the university had a specific use case with several TB a month… in that case, I don't think dataverse is the best way
02:55 bjonnh but if you have specific cases it isn't suited to, I would like to hear about them
03:08 pdurbin well, we're trying to scale Dataverse up to handle larger datasets. that's what I was trying to say at https://groups.google.com/d/msg/dataverse-community/tDjFfGQ-f0Q/aQNX41sQAgAJ
03:11 bjonnh in the case of our group, our datasets are rather small (hundreds of MB in the worst case)
03:12 pdurbin gotcha, should be fine
03:30 bjonnh do we still have to double zip for multi-file/folder data?
12:11 pdurbin bjonnh: let's back up and talk about reasons why one might double zip. I want to hear more about your multi-file/folder data
14:21 juliangautier joined #dataverse
14:39 bjonnh pdurbin: we have NMR data
14:39 bjonnh pdurbin: which is a "big" file with the data itself and a lot of little files with metadata/settings
14:40 bjonnh pdurbin: everything has to be used together to even open the file in a software
14:40 bjonnh pdurbin: there is no "standard" format for now that can cover everything in a single file
15:12 pdurbin bjonnh: can you please link me to an example of what the data looks like?
15:15 bjonnh yes
15:16 bjonnh pdurbin: http://moldb.wishartlab.com/system/documents/files/000/028/480/original/L-Serine_NOESY1D.fid20121204-87231-1e8qmgb.zip?1354661546
15:16 bjonnh that's one of the many formats
15:17 bjonnh so we have a more open kind of format, JCAMPDX, but the problem is that sometimes the transformation is lossy
15:17 bjonnh because of type conversion
15:17 bjonnh and so on
15:17 bjonnh also each vendor makes the JCAMPDX files a bit differently
15:18 bjonnh that's something we want to work on too, but not yet
15:51 pdurbin bjonnh: how important is it to preserve the folder name ("D_L_SERINE_NOESY1D.FID"). What if all the files were just thrown into the "root" directory of the dataset and the folder name ("D_L_SERINE_NOESY1D.FID") wasn't recorded anywhere?
15:52 bjonnh well problem is sometimes for a single paper we have 20 folders like that
15:52 bjonnh pdurbin: also with this format they are all in the same folder
15:53 bjonnh pdurbin: the main used format has a directory structure
15:54 pdurbin bjonnh: are you saying that sometimes the zip file would contain more than one folder?
15:59 bjonnh pdurbin: yes
15:59 bjonnh there are many formats :(
16:00 bjonnh let me try to find one
16:01 bjonnh hmm
16:01 bjonnh well any way
16:01 bjonnh sometimes it is:
16:02 bjonnh /fid (the main file)
16:02 bjonnh /1/parameters
16:02 bjonnh and there are some parameters files
16:02 bjonnh around 5
16:03 pdurbin bjonnh: ok. So you're right, the work around currently is to double zip the file before uploading to Dataverse. Does that make sense?
16:07 bjonnh pdurbin: yep
16:07 bjonnh that's what we figured out
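The double-zip workaround discussed here can be scripted so lab users don't have to rediscover it by trial and error. Below is a minimal sketch using only Python's standard library: it zips an instrument folder with its relative paths intact, then wraps that zip inside a second, outer zip. The assumption, taken from the conversation above, is that Dataverse unpacks the outer zip on upload, leaving the inner zip (and therefore the folder layout) as a single stored file. The function names double_zip and demo are hypothetical, and the folder layout in demo (a main "fid" file plus "1/parameters") is an illustrative stand-in modeled on the structure bjonnh describes, not real NMR data.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import List

def double_zip(folder: Path, out_path: Path) -> Path:
    """Zip `folder` with its relative paths, then wrap that zip inside an outer zip."""
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w", zipfile.ZIP_DEFLATED) as inner:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Archive names are relative to the folder's parent,
                # so the top-level folder name is kept inside the inner zip.
                inner.write(path, path.relative_to(folder.parent).as_posix())
    # The outer zip just stores the already-compressed inner zip.
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_STORED) as outer:
        outer.writestr(folder.name + ".zip", inner_buf.getvalue())
    return out_path

def demo() -> List[str]:
    """Build a hypothetical NMR-style folder, double zip it, and list the inner zip's entries."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "D_L_SERINE_NOESY1D.FID"
        (root / "1").mkdir(parents=True)
        (root / "fid").write_bytes(b"\x00" * 16)        # stand-in for the main data file
        (root / "1" / "parameters").write_text("...")  # stand-in for a parameter file
        out = double_zip(root, Path(tmp) / "upload.zip")
        with zipfile.ZipFile(out) as outer:
            inner_bytes = outer.read(outer.namelist()[0])
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return sorted(inner.namelist())
```

Calling demo() shows that the inner archive still carries the full D_L_SERINE_NOESY1D.FID/... paths, which is what the double zip is meant to protect. A paper with 20 such folders could either get one double-zipped upload per folder or one inner zip containing all of them; the sketch handles a single folder.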
16:07 bjonnh maybe it should be said when uploading data
16:07 bjonnh because the users that tried it in the lab spent some time trying many things
16:08 bjonnh they tried to rename the zip file
16:08 bjonnh etc
16:10 pdurbin bjonnh: the users would have an easier time in the Dataverse web interface if they could upload their original zip and check a box saying "don't unzip this", right?
18:07 bjonnh yes
18:07 bjonnh totally
18:36 pdurbin :)
18:36 pdurbin bjonnh: in DVN 3.x there was such a checkbox. I guess we should bring it back.
18:37 pdurbin bjonnh: do you feel like creating an issue about this? https://github.com/IQSS/dataverse/issues
18:38 bjonnh yep
18:38 bjonnh writing it right now
18:41 bjonnh pdurbin: https://github.com/IQSS/dataverse/issues/3439
18:41 bjonnh maybe I could use this issue to start working on the DV code
18:48 pdurbin bjonnh: thanks! Sure!