
IRC log for #dataverse, 2017-11-14

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
00:05 djbrooke joined #dataverse
06:18 andrewSC joined #dataverse
07:57 jri joined #dataverse
08:00 jri joined #dataverse
12:53 rebecabarros joined #dataverse
12:57 rebecabarros good morning =). pameyer: when I try to upload through the GUI or API, I get a 404 and no file is created in $UPLOAD/requests
13:35 pdurbin rebecabarros: good morning. Do you feel like opening an issue about SELinux at https://github.com/sbgrid/data-capture-module/issues ? The README should probably indicate if SELinux needs to be disabled.
13:41 rebecabarros It should. I will open a new issue over there describing that.
13:43 pdurbin Thanks!
13:56 pdurbin rebecabarros: I *think* we should focus on continuing to hit "ur.py" with curl to get it working.
13:57 rebecabarros pdurbin: what do you mean?
13:58 pdurbin I mean I think you should continue trying to get this script to work (it calls ur.py with curl): https://github.com/sbgrid/data-capture-module/blob/master/ansible/roles/dcm/files/root/scripts/dcm-test01.sh
13:59 pdurbin Does that make sense?
14:09 rebecabarros Yes. Like, now I'm getting 'status:ok' from running dcm-test01, which, judging from ur.py, is supposed to mean that everything worked, right? But the JSON file is not created.
14:09 pdurbin "status ok" sounds like good news to me :)
14:11 pdurbin I'm looking at https://github.com/sbgrid/data-capture-module/blob/master/api/ur.py again
14:11 pdurbin "dump to unique file"
14:11 pdurbin rebecabarros: you're saying a file isn't being created?
14:12 rebecabarros pdurbin: yes.
14:12 pdurbin Is the file supposed to be created in /deposit/requests/ ?
14:13 rebecabarros As I mentioned before, I've tried to curl http://$DCM_SERVER/ur.py. It gives me back 'status:ok' and the file is created in /requests.
14:13 pdurbin oh! so the file is being created!
14:14 rebecabarros But only if I do this directly. Running through dcm-test, for instance, doesn't create the file.
14:14 pdurbin huh
14:18 pdurbin curl -H "Content-Type: application/json" -X POST -d "{\"datasetId\":\"42\", \"userId\":\"42\",\"datasetIdentifier\":\"42\"}" http://localhost/ur.py
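For readers who prefer Python to curl, the same test request can be sketched with the standard library. This is a minimal sketch, not part of the DCM: build_upload_request is a hypothetical helper name, and the "42" placeholders come straight from the curl line. It only builds the POST (a JSON body plus a Content-Type: application/json header); sending it requires a running DCM.

```python
import json
import urllib.request


def build_upload_request(base_url: str, dataset_id: str, user_id: str,
                         dataset_identifier: str) -> urllib.request.Request:
    """Build (but do not send) a POST to ur.py mirroring the curl test command.

    Hypothetical helper: the three keys are the ones used in the curl line above.
    """
    payload = {
        "datasetId": dataset_id,
        "userId": user_id,
        "datasetIdentifier": dataset_identifier,
    }
    body = json.dumps(payload).encode("utf-8")  # JSON-encoded text in the POST body
    # urllib normalizes the header name to "Content-type" internally.
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ur.py",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Usage: `req = build_upload_request("http://localhost", "42", "42", "42")`, then `urllib.request.urlopen(req)` on the DCM server should get back the same 'status:ok' that curl reports.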
14:18 pdurbin rebecabarros: what happens if you run that curl command above from your DCM server?
14:19 donsizemore joined #dataverse
14:21 rebecabarros pdurbin: 'status:ok' but no file created
14:21 pdurbin hmm
14:23 pdurbin but if you do `curl http://localhost/ur.py` a file is created?
14:25 rebecabarros pdurbin: that's correct
14:26 pdurbin What is the content of the file that's created?
14:27 rebecabarros It's an empty JSON
14:29 pdurbin rebecabarros: ok. Thanks. How are you feeling about all this? pameyer says he should have time to help later today. Over at http://irclog.iq.harvard.edu/dataverse/2017-11-09#i_59973 you and djbrooke talked about the roadmap for this rsync feature.
14:41 andrewSC joined #dataverse
14:45 pdurbin donsizemore: mornin. Lots of interest in your Ansible playbook!
14:46 donsizemore @pdurbin i see that. i wish i had lots of time to work on it! ;)
14:47 pdurbin seems like a higher priority than rewriting the installer :)
14:47 pdurbin Is there anything I can do to help? I don't really know Ansible.
14:49 rebecabarros the flow, as far as I could understand, is: uploading in DVN should make a call to ur.py, which would be responsible for creating a JSON file in the /requests directory. Then this JSON file will be used by sr.py to allow the upload itself. I will wait for pameyer so he can explain what the JSON file in /requests has to look like.
14:52 rebecabarros pdurbin: you mean what I think about your plans for the rsync feature? I'm excited that the plan is to allow both options to work side by side. That way Dataverse will be able to cover all possible scenarios with small and large files.
14:52 donsizemore @pdurbin i think the root of his problems is a) ansible assumes a clean install, as dataverse's installation isn't idempotent. i can stick some semaphores in there to make the playbook idempotent, but it will likely lead to screwy glassfish states
14:53 pdurbin rebecabarros: right. Except we don't call it "DVN" anymore. Now we call it "Dataverse". :) I mean, I think that's how it works. From the Dataverse perspective, Dataverse calls "ur.py" to make an "upload request" and then immediately calls "sr.py" for a "script request". sr.py returns a Bash script with rsync commands in it. Dataverse presents this script to the user in the Dataverse GUI.
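The two-step handshake pdurbin describes can be sketched as follows. This is a hypothetical illustration, not Dataverse's code (Dataverse is written in Java): the post callable stands in for whatever HTTP client is used, and the payload sent to sr.py (just the datasetIdentifier) is an assumption, not something confirmed in this log.

```python
from typing import Callable, Tuple

# Hypothetical transport: takes (url, json_body) and returns (status_code, text).
Post = Callable[[str, dict], Tuple[int, str]]


def get_rsync_script(post: Post, dcm_url: str, payload: dict) -> str:
    """Sketch of the DCM handshake: upload request (ur.py), then script request (sr.py)."""
    status, text = post(dcm_url + "/ur.py", payload)  # "upload request"
    if status != 200:
        raise RuntimeError(f"ur.py failed with HTTP {status}: {text}")
    # "script request"; ASSUMPTION: sr.py only needs the dataset identifier.
    status, text = post(dcm_url + "/sr.py",
                        {"datasetIdentifier": payload["datasetIdentifier"]})
    if status != 200:
        raise RuntimeError(f"sr.py failed with HTTP {status}")
    return text  # a Bash script with rsync commands, to be shown to the user
```

Injecting the transport keeps the sketch testable without a DCM; a real client would wrap urllib or similar in a function with the same shape.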
14:53 donsizemore and b) i never coded it for Ubuntu/Debian. the Readme.md says CentOS 7 and means it
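donsizemore's "semaphores" can be read as marker files: a non-idempotent step runs only if its marker is absent, and the marker is written only after the step succeeds. In Ansible itself, the command and shell modules offer a `creates:` argument with this effect. The Python below is a minimal, hypothetical sketch of the idea (run_once is not part of the playbook); as donsizemore warns, a guard like this skips a step on re-runs but does nothing to repair a half-finished Glassfish setup.

```python
from pathlib import Path
from typing import Callable


def run_once(marker: Path, step: Callable[[], None]) -> bool:
    """Run a non-idempotent install step only if its marker ("semaphore") file is absent.

    Returns True if the step ran, False if it was skipped.
    """
    if marker.exists():
        return False  # step already completed on a previous run
    step()  # if this raises, no marker is written, so the step is retried next time
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```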
14:53 pdurbin donsizemore: sorry, one sec
14:55 pdurbin rebecabarros: let me try to be a little more clear about the current state of the rsync feature. The reason why it's documented in the Developer Guide rather than the Installation Guide is that this feature is highly experimental: http://guides.dataverse.org/en/4.8.2/developers/big-data-support.html
14:56 pdurbin That is to say, I'm not surprised that the rsync feature doesn't "just work" for you because you are only the second person to try to get it working. The first to get it working is pameyer who is the author of the rsync (Data Capture Module) code.
14:58 pdurbin rebecabarros: I'm extremely impressed by your tenacity, by how hard you are working on trying to get the rsync feature to work. But I'm wondering if you should write up your notes so far into an issue at https://github.com/IQSS/dataverse/issues (main Dataverse repo) and ask for more documentation (Installation Guide rather than Developer Guide).
14:59 pdurbin This would (someday) mean that someone other than the author of the Data Capture Module would install it and independently verify that it's working as expected. It would go through QA, basically.
14:59 pdurbin As part of the process the documentation would be improved.
15:00 djbrooke joined #dataverse
15:00 pdurbin Making it easier for a customer like yourself to follow the documentation and have success setting up all the necessary components to enable "big data support" (rsync).
15:01 pdurbin Does that make sense?
15:01 pdurbin I don't mean to discourage you from continuing to try if that's how you'd like to spend your time.
15:01 pdurbin I think you have a lot to contribute in terms of opening issues to explain the problems you've had.
15:01 pdurbin Once we know what the problems are, we can fix them or document workarounds.
15:02 pdurbin I hope this is making sense. I think I'm done. :)
15:02 pdurbin rebecabarros: what do you think?
15:19 pdurbin djbrooke: mornin. I'm sort of trying to talk rebecabarros out of trying to get a Data Capture Module working until we've put it through QA. We only tested the mock DCM.
15:20 djbrooke I'd defer that question to pameyer who said he would be on later today
15:21 djbrooke and mornin
15:21 donsizemore joined #dataverse
15:22 pdurbin That's fine. Without more documentation, the Data Capture Module is obviously very difficult to support.
15:23 rebecabarros pdurbin: Don't worry. I understand that it is still an experimental feature, and I really appreciate how accessible and helpful you guys are at any time. And I agree with you: I was already thinking about summarizing in a doc how everything went so far and the problems I've faced, with the purpose of helping you know how to improve the documentation and stuff.
15:23 rebecabarros The reason I "insist" on trying to get this done is that we really want to use Dataverse, but we are really going to need to support large files; it's our main scenario. Meanwhile, I'm already thinking about options. For instance, I'm about to test how Dataverse behaves if I split a 100 GB zip file and upload 10 smaller ones of 10 GB each, although this would not be ideal.
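The arithmetic is simple (100 GB / 10 GB = 10 parts), but byte-splitting a zip yields pieces that are not valid zip files on their own; a downloader would have to concatenate them in order (for example `cat data.zip.part* > data.zip`, which works because zero-padded names sort correctly) before unzipping. A minimal sketch, where split_file and the .partNNN naming are hypothetical:

```python
from pathlib import Path
from typing import List

CHUNK = 1024 * 1024  # copy 1 MiB at a time so memory use stays small


def split_file(src: Path, part_size: int) -> List[Path]:
    """Split src into consecutive byte ranges of at most part_size bytes.

    Parts are named <src>.part000, <src>.part001, ... (a hypothetical scheme).
    Concatenating them in order reproduces the original file byte for byte.
    """
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            first = f.read(min(CHUNK, part_size))
            if not first:
                break  # end of input
            part = src.with_name(f"{src.name}.part{index:03d}")
            with part.open("wb") as out:
                out.write(first)
                written = len(first)
                while written < part_size:
                    buf = f.read(min(CHUNK, part_size - written))
                    if not buf:
                        break
                    out.write(buf)
                    written += len(buf)
            parts.append(part)
            index += 1
    return parts
```

For the case above, `split_file(Path("data.zip"), 10 * 1024**3)` would produce ten parts from a 100 GiB file.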
15:24 rebecabarros But I do understand your concerns and I understand that this takes time and that you have a lot of other features to worry about right now.
15:26 pdurbin rebecabarros: you and pameyer have the same needs. His primary use case is large files, which is why he helped us develop this new experimental feature. Someone like you coming along to try to get the feature working is exactly what I wanted. I'm just frustrated that I can't help more. I don't know enough about how the DCM code works.
15:35 pdurbin rebecabarros: I see "big data" at https://www.bahia.fiocruz.br/cidacs/ when I run that page through Google Translate. :)
16:04 pdurbin or even when I don't :)
16:04 pdurbin 'grandes bases de dados (“big data”)'
16:12 rebecabarros pdurbin: Again, don't worry. You've been really helpful to me since the beginning, answered all sorts of questions and were always patient :) haha. I appreciate that. I wish I had more programming skills to help you guys out on the development side of things, but I don't, so...
16:12 rebecabarros pdurbin: yes, that's us!
16:13 pdurbin rebecabarros: you are helping a lot by testing things. It's extremely valuable.
16:13 pdurbin "We welcome contributions of ideas, bug reports, usability research/feedback, documentation, code, and more!" https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md
16:15 pdurbin rebecabarros: did you say you might summarize in a doc? What kind of doc? A Google doc? An attachment on a GitHub issue?
16:15 djbrooke joined #dataverse
16:15 pdurbin A Google doc might be nice if you enable comments.
16:21 rebecabarros What do you think is the best way?
16:23 pdurbin If you don't mind creating a Google Doc, I think that would be best.
16:28 rebecabarros Ok then. I will do that and send the link later.
17:04 djbrooke joined #dataverse
17:04 djbrooke joined #dataverse
17:11 Thalia_UM joined #dataverse
17:11 Thalia_UM Good morning! :)
17:13 pdurbin hi Thalia_UM. Good morning. :)
17:13 Thalia_UM A question, Philip.
17:15 Thalia_UM I want to query some open web services. How can I do that with Dataverse already installed, for example by modifying some XHTML file or something similar? They told me they will look into implementing JSON and AJAX to query the web services.
17:16 Thalia_UM I don't know how to implement it so that, when I create a dataset through the interface, it queries those web services.
17:17 pdurbin What do the web services do?
17:17 Thalia_UM Any ideas?
17:18 pdurbin Can these web services be used by any installation of Dataverse?
17:18 Thalia_UM It is only to look up names of people, institutions, and data types (xml, pdf, docx, etc)
17:18 Thalia_UM http://catalogs.repositorionacionalcti.mx/webresources/idioma/0/2
17:19 Thalia_UM For example, like that
17:19 pdurbin What are some example user stories?
17:19 Thalia_UM That is my question
17:20 Thalia_UM That link is about language
17:20 pdurbin "As a user, I want to create a dataset and pick from a list of authors." ... Something like that?
17:20 Thalia_UM Yes
17:20 Thalia_UM Like that
17:21 pdurbin Are there any other user stories?
17:24 pdurbin "As a user, I want to..."
17:28 Thalia_UM I don't understand: what does "user stories" mean?
17:29 Thalia_UM There are five web services that we are going to query, and we want them to be dynamic with Dataverse.
17:31 pdurbin A user story begins with "As a user, I want to..."
17:33 pdurbin Thalia_UM: can you please create an issue for the first user story we just talked about? At https://github.com/IQSS/dataverse/issues
17:35 Thalia_UM Oooh
17:35 Thalia_UM yes
17:35 Thalia_UM Sure
17:35 pdurbin Thanks!
17:44 Thalia_UM https://github.com/IQSS/dataverse/issues/4282
17:45 Thalia_UM Do you have any idea how I can do that?
17:49 pdurbin Thalia_UM: please see the comment I just left. Thanks for opening an issue!
17:50 pdurbin djbrooke: Thalia_UM could probably use some help breaking her ideas down into user stories
17:52 Thalia_UM djbrooke?
17:53 jri joined #dataverse
17:57 djbrooke Hey Thalia_UM - Mike Cohn has written a few books about user stories and is my go-to source. A short read is here: https://www.mountaingoatsoftware.com/agile/user-stories
17:58 jgautier joined #dataverse
17:58 djbrooke When we develop a feature or new capability, we want to recognize the user's goal in their words (in a consistent format)
17:59 djbrooke This helps us as we develop because we can always point back to user's desired outcome, and it gives some flexibility about how we implement a solution to that outcome
17:59 dataverse-user joined #dataverse
18:00 djbrooke So, for the example in 4282: As a user, I want to create a dataset and pick from a list of authors or language or type of publication, etc.
18:01 djbrooke It's good! The only thing missing is the end piece - the "why" - what value would this provide to you or your user community?
18:05 Thalia_UM We have to query the web services and add them to the "Add Dataset" form, so that when the web services are queried with a "GET", the fields are filled with their content.
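The data side of this idea (GET a catalog, turn it into choices for a form field) can be sketched in a few lines. This is only an illustration under assumptions: the Dataverse form itself is JSF/XHTML backed by Java, and wiring the result into the "Add Dataset" page is not shown. The response format of the catalog service was not given in the chat, so to_choices assumes a JSON array of objects and takes the key names as parameters; both function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

# Language catalog endpoint mentioned in the chat; the other four services are not named.
CATALOG_URL = "http://catalogs.repositorionacionalcti.mx/webresources/idioma/0/2"


def fetch_catalog(url: str = CATALOG_URL, timeout: float = 10.0) -> str:
    """Send a GET to a catalog web service and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def to_choices(raw_json: str, value_key: str, label_key: str) -> List[Tuple[str, str]]:
    """Turn a JSON array of records into (value, label) pairs for a dropdown.

    ASSUMPTION: the service returns a JSON array of objects. The key names are
    parameters because the real response format was not shown in the chat.
    """
    records = json.loads(raw_json)
    return [(str(r[value_key]), str(r[label_key])) for r in records]
```

With a hypothetical response such as `[{"id": 1, "name": "Spanish"}]`, `to_choices(raw, "id", "name")` yields `[("1", "Spanish")]`.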
18:16 djbrooke joined #dataverse
18:17 djbrooke joined #dataverse
18:23 Thalia_UM Another of my questions is whether I can do this without modifying the Dataverse code and without having to uninstall it.
18:38 djbrooke joined #dataverse
19:00 djbrooke joined #dataverse
19:02 djbrooke joined #dataverse
19:33 djbrooke joined #dataverse
19:50 djbrooke joined #dataverse
20:02 djbrooke joined #dataverse
21:01 djbrooke joined #dataverse
21:07 djbrooke joined #dataverse
21:29 Thalia_UM left #dataverse
21:39 djbrooke joined #dataverse
21:52 djbrooke joined #dataverse
21:54 pameyer joined #dataverse
22:08 jri joined #dataverse
22:10 pameyer rebecabarros: the information flow should roughly be: request to ur.py (from curl, test script or Dataverse) -> JSON file in /deposit/requests -> rq worker reads JSON file, creates transfer account and script, moves JSON file to /deposit/processed (and renames JSON file from PID to dataset_id); request to sr.py returns script (or 404 if the script is not generated)
22:14 pameyer "status:ok" from ur.py should only be returned if the request has been processed by the request queue
22:15 pameyer ^ typo'd ; "status:ok" is upstream of the request queue
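pameyer's flow can be simulated with plain directories to see where each file should end up. This is a hypothetical sketch, not the DCM code: the rq worker and Redis are replaced by a direct function call, transfer-account creation is omitted, and the scripts directory, the placeholder script text, and looking scripts up by dataset_id are assumptions. Because "status:ok" is upstream of the request queue, ur.py answers right after the enqueue step, before the worker has run.

```python
import json
from pathlib import Path
from typing import Tuple


def enqueue_request(requests_dir: Path, payload: dict) -> Path:
    """ur.py step: dump the request JSON to a file named after the PID (datasetIdentifier)."""
    requests_dir.mkdir(parents=True, exist_ok=True)
    path = requests_dir / f"{payload['datasetIdentifier']}.json"
    path.write_text(json.dumps(payload))
    return path  # 'status:ok' would be returned at this point


def process_request(request_file: Path, processed_dir: Path, scripts_dir: Path) -> Path:
    """Worker step: read the JSON, write a script, move JSON to processed/<dataset_id>.json."""
    payload = json.loads(request_file.read_text())
    dataset_id = payload["datasetId"]
    scripts_dir.mkdir(parents=True, exist_ok=True)
    (scripts_dir / f"{dataset_id}.sh").write_text(
        "#!/bin/bash\n# hypothetical placeholder for the generated rsync script\n"
    )
    processed_dir.mkdir(parents=True, exist_ok=True)
    target = processed_dir / f"{dataset_id}.json"
    request_file.replace(target)  # move and rename from PID to dataset_id
    return target


def script_request(scripts_dir: Path, dataset_id: str) -> Tuple[int, str]:
    """sr.py step: (200, script) if the worker generated it, else (404, "")."""
    script = scripts_dir / f"{dataset_id}.sh"
    if not script.exists():
        return 404, ""
    return 200, script.read_text()
```

If a request file sits in /deposit/requests and never moves to /deposit/processed, that would point at the worker step rather than at ur.py.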
22:21 pameyer if there's an empty JSON file resulting from calls to ur.py, then this is probably because the parameters aren't being passed correctly
22:25 pameyer should be JSON encoded text in the POST body
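Since an empty request file most likely means the parameters never reached ur.py, a quick client-side check of the POST body can help narrow things down. A minimal sketch: problems_with_body is a hypothetical helper, and REQUIRED_KEYS is an assumption copied from the curl test command in this log; the authoritative list is whatever ur.py actually reads.

```python
import json
from typing import List

# ASSUMPTION: keys taken from the curl test command in this log.
REQUIRED_KEYS = ("datasetId", "userId", "datasetIdentifier")


def problems_with_body(body: bytes) -> List[str]:
    """Return reasons a POST body would likely produce an empty request file."""
    if not body:
        return ["empty body (e.g. a bare `curl http://localhost/ur.py`)"]
    try:
        data = json.loads(body)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        return ["body is not JSON (form-encoded data, or shell quoting gone wrong?)"]
    if not isinstance(data, dict):
        return ["JSON body is not an object"]
    return [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
```

An empty list means the body at least has the shape the test command sends.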
22:25 djbrooke joined #dataverse
22:51 djbrooke joined #dataverse
22:58 pameyer joined #dataverse
23:01 djbrooke joined #dataverse
23:02 dataverse-user joined #dataverse
23:05 djbrooke joined #dataverse
23:11 djbrooke joined #dataverse
23:14 djbrooke joined #dataverse
23:15 djbrooke_ joined #dataverse
23:16 djbrooke joined #dataverse
