
IRC log for #dvn, 2013-07-29

We've moved! Please join #dataverse instead. The new logs are at http://irclog.iq.harvard.edu/dataverse/today


All times shown according to UTC.

Time S Nick Message
14:44 iqlogbot joined #dvn
14:44 Topic for #dvn is now http://thedata.org - The Dataverse Network Project | logs at http://irclog.iq.harvard.edu/dvn/today
14:45 jwhitney pdurbin: yes, although we do want to allow multiple files: https://docs.google.com/file/d/0B8Zfl4GMgyejMlhFOUU5M0p4c3M/edit
14:51 jwhitney pdurbin: (these are just mockups: the file description form has some fields that should describe the study, instead)
14:51 jwhitney pdurbin: (they're a bit out of date)
14:58 pdurbin jwhitney: hmm, ok
14:58 pdurbin you're back!
14:59 pdurbin jwhitney: sorry, that was for iqlogbot :) ... logging is back http://irclog.iq.harvard.edu/dvn/2013-07-29
14:59 jwhitney pdurbin: :)
15:00 pdurbin jwhitney: I think it would be great if you played with the SWORD API as it stands right now. I can point you to the curl commands. It's still very rough but it'll give you an idea of its current state
15:01 pdurbin https://github.com/IQSS/dvn/tree/develop/tools/scripts/data-deposit-api contains all the scripts and I'm happy to walk you through them
15:03 jwhitney pdurbin: yep, ok.
15:03 pdurbin jwhitney: the biggest thing that's on my mind is... what will the binary file you send look like? You mentioned simple zip... After I receive the zip, I should unzip it and look for files inside and ingest them one by one? And also look for a metadata file in there?
15:04 jwhitney pdurbin: that's one approach: study metadata in the atom entry, then include file-level metadata in the zip
15:04 pdurbin right now my implementation takes whatever binary file is sent and attempts to ingest it. So if you send an Rdata file it will be ingested as Rdata. Same for a Stata file, I assume, but I haven't tried this yet.
15:05 pdurbin but it sounds like I should always expect a zip instead?
15:05 jwhitney pdurbin: I think so, yes: even if there is only one file, there may be associated metadata
15:06 jwhitney pdurbin: something like DSpace's simple archive format, maybe https://wiki.duraspace.org/display/DSDOC3x/Importing+and+Exporting+Items+via+Simple+Archive+Format#ImportingandExportingItemsviaSimpleArchiveFormat-ItemImporterandExporter
15:08 pdurbin jwhitney: ok. Alex also seemed interested in BagIt: http://en.wikipedia.org/wiki/BagIt
15:09 pdurbin (I've only barely heard of both of these formats.)
15:09 jwhitney pdurbin: bagit seems more straightforward
15:09 pdurbin straightforward is good :)
15:10 pdurbin jwhitney: do you think we should formally support BagIt? or just use it as a model for now?
15:10 jwhitney pdurbin: I've worked with the DSpace format, have only read about bagit.
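As an aside on why BagIt reads as straightforward: a bag is just a directory with a `bagit.txt` declaration, payload files under `data/`, and a checksum manifest. A minimal sketch in Python (the tag-file names come from the BagIt spec; the study file names and contents are made up):

```python
import hashlib
import os
import tempfile

def make_bag(bag_dir, payload):
    """Create a minimal BagIt bag: a bagit.txt declaration,
    payload files under data/, and an MD5 manifest."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    with open(os.path.join(bag_dir, "bagit.txt"), "w") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    manifest_lines = []
    for name, content in payload.items():
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(content)
        digest = hashlib.md5(content).hexdigest()
        manifest_lines.append("%s  data/%s" % (digest, name))
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w") as f:
        f.write("\n".join(manifest_lines) + "\n")

bag = tempfile.mkdtemp()
make_bag(bag, {"study.dta": b"fake Stata bytes", "README.txt": b"codebook"})
print(sorted(os.listdir(bag)))  # ['bagit.txt', 'data', 'manifest-md5.txt']
```

The manifest doubles as the per-file checksums discussed later in this conversation, which is one reason the format is attractive here.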
15:13 pdurbin jwhitney: the way upload works in DVN now is that you can upload a single file and specify the format (Rdata vs. Stata vs. etc.)
15:13 pdurbin or you can upload a zip file that has a bunch of files in it
15:14 pdurbin which gets unzipped... and all the files get ingested
15:14 pdurbin so it's very simple
15:14 jwhitney pdurbin: Ok.
15:14 pdurbin that would probably be the easiest thing for me to support out of the gate
15:15 posixeleni joined #dvn
15:15 pdurbin which is called "simple zip" in the SWORD spec
15:15 pdurbin posixeleni: hi!
15:16 posixeleni hi folks! just wanted clarification on how OJS would handle the supplementary files that you send over to DVN
15:16 pdurbin jwhitney: are you saying you're familiar with METSDSpaceSIP? That's also in the SWORD spec as an example
15:16 posixeleni so if I understand it correctly: OJS will allow authors to deposit multiple files
15:17 posixeleni Then when it comes time to send it to DVN it is packaged into a simple zip file and sent via API?
15:17 pdurbin posixeleni: right on both counts
15:17 posixeleni cool sorry to interrupt!
15:17 jwhitney posixeleni: not at all!
15:18 pdurbin posixeleni: sorry, iqlogbot was broken but jwhitney or I will paste the whole chat to a Google Doc when we're done
15:18 posixeleni joined #dvn
15:18 posixeleni thanks so much!
15:19 pdurbin I was saying that right now I'm just ingesting whatever binary file is sent ... but it sounds like I need to switch to expecting a zip file, which I will unzip ... and then ingest the files one by one
15:19 pdurbin jwhitney: right?
15:20 jwhitney pdurbin: yes, if OJS needs to send along file-level metadata, which it seems it does
15:20 jwhitney pdurbin: data type, at the very least.
15:21 pdurbin jwhitney: well, even if metadata is not necessary... right now you would have to send files one by one
15:21 pdurbin which we probably don't want
15:24 jwhitney what's typical? I think OJS has to allow the possibility of multiple files, but if most articles will only have a single file...
15:24 pdurbin posixeleni: it's quite common for studies to have multiple files, right?
15:25 posixeleni more common than not since they will have the dataset and then a different file for documentation explaining the dataset (readme)
15:25 jwhitney right, ok
15:26 pdurbin I feel like everything I've read and watched so far suggests that a zip gets sent across during a binary deposit in the SWORD protocol.
15:28 pdurbin It was easier for me to simply accept any file as-is (not zipped) and attach it to a study but again, I think I should change this... I should advertise via SWORD that I accept "simple zip" and then accept a zip file and unzip it
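The "simple zip" packaging described above can be sketched on the sending side like this (file names and contents are illustrative; the SimpleZip packaging URI is the one defined in the SWORD v2 profile):

```python
import io
import zipfile

# Package the article's supplementary files as a SWORD "simple zip":
# a flat zip whose entries are just the files to ingest.
files = {
    "replication-data.dta": b"fake Stata bytes",
    "README.txt": b"documentation for the dataset",
}
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, content in files.items():
        zf.writestr(name, content)

# These bytes would then be deposited with
#   Content-Type: application/zip
#   Packaging: http://purl.org/net/sword/package/SimpleZip
# and the server would unzip and ingest each member in turn.
print(zipfile.ZipFile(io.BytesIO(buf.getvalue())).namelist())
```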
15:29 pdurbin jwhitney: ready for a quick walk through of curl commands?
15:29 jwhitney pdurbin: ok, sure
15:29 pdurbin great. the starting point is https://github.com/IQSS/dvn/tree/develop/tools/scripts/data-deposit-api
15:29 jwhitney pdurbin: yep, I have been walking through your scripts
15:30 pdurbin the create-study-deposit-data script is a wrapper around a bunch of shell scripts that call curl: https://github.com/IQSS/dvn/blob/develop/tools/scripts/data-deposit-api/create-study-deposit-data
15:31 pdurbin to explain each curl command, the wrapper script does the following:
15:31 pdurbin 1. retrieve service document using credentials for the journal dataverse in question
15:32 pdurbin 2. create a study based on an "atom entry" XML file
15:32 pdurbin 3. list studies (the count should increase by one each time)
15:32 pdurbin 4. add a file to the study that was just created
15:33 pdurbin 5. make sure error handling is working (sorry, I threw this in just for myself)
15:33 jwhitney pdurbin: ok, great.
15:33 pdurbin 6. retrieve the SWORD statement for the study
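Step 2 above POSTs an "atom entry" XML file describing the study. A hedged sketch of what such an entry might look like, using Atom plus dcterms fields (the specific fields DVN requires are an assumption here, not taken from the scripts; the title, creator, and description values are made up):

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("", ATOM)
ET.register_namespace("dcterms", DCTERMS)

# A minimal Atom entry carrying study-level metadata. The dcterms
# fields shown are illustrative; the real required fields are
# whatever the DVN deposit API documents.
entry = ET.Element("{%s}entry" % ATOM)
ET.SubElement(entry, "{%s}title" % DCTERMS).text = "Example Study"
ET.SubElement(entry, "{%s}creator" % DCTERMS).text = "Doe, Jane"
ET.SubElement(entry, "{%s}description" % DCTERMS).text = (
    "Replication data deposited from OJS."
)
xml = ET.tostring(entry, encoding="unicode")
print(xml)
```

The curl command in step 2 would POST this document with `Content-Type: application/atom+xml` to the collection URI from the service document.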
15:34 pdurbin I'm definitely not sure I'm implementing all this correctly, but it's a start :)
15:34 jwhitney :)
15:34 pdurbin you'll see "fakeIRI" and such in some places
15:34 pdurbin so it's a bit of a moving target
15:35 jwhitney pdurbin: just to make sure we're on the same page: you said, "... everything I've read and watched so far suggests that a zip gets sent across during a binary deposit in the SWORD protocol"
15:35 pdurbin and if you think I'm doing anything wrong spec-wise, please let me know! I want to make sure I'm implementing SWORD correctly
15:35 jwhitney pdurbin: do you feel that adding content in a zip to an existing resource is not quite in line w/ the spec?
15:35 jwhitney & same here!
15:36 jwhitney errr. want to make sure I'm sending content in a way that makes sense...
15:36 pdurbin jwhitney: it does feel strange... right now, adding the file to the study is a "replace" from a SWORD perspective
15:37 jwhitney pdurbin: that's true.
15:37 pdurbin because a PUT is a replace
15:38 pdurbin I tried to get clarification on this from the SWORD mailing list: [sword-app-tech] POST atom entry, then PUT media resource - http://www.mail-archive.com/sword-app-tech@lists.sourceforge.net/msg00331.html
15:38 pdurbin but so far it's just me writing to myself :(
15:38 pdurbin jwhitney: something for you to chew on is... how do we replace a file that has been added to a study?
15:39 pdurbin jwhitney: or... how do we replace 2 out of 5 files that have been added to a study?
15:44 pdurbin jwhitney: do you send all 5 files over again via PUT and I do a "replace" on the DVN side?
15:44 pdurbin it gets interesting :)
15:44 jwhitney pdurbin: that's what I've been thinking through.. if I do that, can I provide information for you to know what's changed?
15:45 pdurbin jwhitney: you mean so you can send only the 2 changed files instead of all 5?
15:47 jwhitney pdurbin: no, I meant I'd send all content, but with enough metadata for you to know what's changed..
15:48 jwhitney pdurbin: but that's awkwad
15:48 jwhitney 'awkward'
15:48 pdurbin ah. sure. well... an md5sum for each of the 5 files would help in this case, right?
15:48 jwhitney yes
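The md5sum idea above can be sketched as a simple manifest comparison: hash every file in the old and new deposits, and anything whose digest changed needs re-ingest (file names and contents here are made up):

```python
import hashlib

def md5_manifest(files):
    """Map each file name to the MD5 digest of its content."""
    return {name: hashlib.md5(content).hexdigest()
            for name, content in files.items()}

# Manifests for the previously deposited package and the new one.
old = md5_manifest({"a.dta": b"version 1", "b.txt": b"docs"})
new = md5_manifest({"a.dta": b"version 2", "b.txt": b"docs"})

# Files present in the new package whose content differs (or is new).
changed = [name for name in new if new[name] != old.get(name)]
print(changed)  # ['a.dta']
```

With a manifest like this shipped alongside the package, the DVN side could replace only the changed files even though the PUT carries all of them.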
15:48 pdurbin jwhitney: in Boston we say "awkwad" ... and "hazadous" ;)
15:48 jwhitney jwhitney: literal lol.
15:49 jwhitney pdurbin: oops, self-referential comment.
15:50 pdurbin jwhitney: one thing I'm wondering is whether you plan to persist in your database a unique identifier for each study and file that corresponds to the dataverse for a journal
15:51 jwhitney pdurbin: yes, we have to know if a study's been created for an article
15:52 pdurbin right. and studies have persistent, unique identifiers such as hdl:1902.1/12345
15:52 pdurbin jwhitney: but what about files? ... the best I could do is expose the database id of each file... and then we have to think about how studies can have multiple versions...
15:54 pdurbin jwhitney: anyway, for now we can focus on creating new stuff... but obviously I have a lot of questions about how stuff gets updated :)
15:55 posixeleni jwhitney: as do i!
15:56 pdurbin posixeleni: :) ... well I think this is on your agenda for the meeting in only a few hours :)
15:56 jwhitney sorry, minor interruption...
15:57 jwhitney & I'm concerned about storing metadata OJS-side that is updated DV-side.
15:58 jwhitney so after OJS has created studies and files in DV, I think it makes sense to store the IDs OJS-side and refresh them as requested.
15:59 pdurbin jwhitney: so... with the example of updating 2 of 5 files, what would happen?
16:01 pdurbin jwhitney: we can talk about it during the meeting if it's too much to type :)
16:03 jwhitney pdurbin: a fast think-through: author requests a view of files. OJS fetches study/file metadata from DV. Author edits study/file metadata. OJS PUTs new study metadata to the Edit-IRI. OJS PUTs a new package of content to the EM-IRI.
16:04 jwhitney pdurbin: glaring holes?
16:04 jwhitney pdurbin: assuming package includes enough information for the server to identify changed files.
16:08 pdurbin jwhitney: so would I give you a view of the files via a SWORD statement? ... I need to look at the spec some more
16:08 jwhitney pdurbin: like here? http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#protocoloperations_retrievingcontent_feed
16:09 pdurbin jwhitney: ah. so not a statement. yes, that looks right
16:12 pdurbin jwhitney: thanks :)
16:18 jwhitney pdurbin: you're welcome, although I meant the '?' -- a double-check on my understanding of the spec.
16:20 pdurbin jwhitney: sure. but I think you're right
17:39 pdurbin here's the entire conversation (including the earlier part not logged by iqlogbot) in a Google Doc: https://docs.google.com/document/d/1XbaVsDTML0RohCtY5WNJMpFcC6xYhCjyFKyKA-cQBVI/edit?usp=sharing
19:21 jwhitney joined #dvn
19:30 pdurbin jwhitney: good meeting. :) ... not sure if you saw but I did create that Google Doc with the rest of the chat from this morning
19:30 jwhitney pdurbin: no, but thanks

