IRC log for #dataverse, 2017-11-14

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

Time	Nick	Message
00:05		djbrooke joined #dataverse
06:18		andrewSC joined #dataverse
07:57		jri joined #dataverse
08:00		jri joined #dataverse
12:53		rebecabarros joined #dataverse
12:57	rebecabarros	good morning =). pameyer when I try to upload through the GUI or API, I get code 404 and no file is created in $UPLOAD/requests
13:35	pdurbin	rebecabarros: good morning. Do you feel like opening an issue about SELinux at https://github.com/sbgrid/data-capture-module/issues ? The README should probably indicate if SELinux needs to be disabled.
13:41	rebecabarros	It should. I will open a new issue over there describing that.
13:43	pdurbin	Thanks!
13:56	pdurbin	rebecabarros: I think we should focus on continuing to hit "ur.py" with curl to get it working.
13:57	rebecabarros	pdurbin: what do you mean?
13:58	pdurbin	I mean I think you should continue trying to get this script to work (it calls ur.py with curl): https://github.com/sbgrid/data-capture-module/blob/master/ansible/roles/dcm/files/root/scripts/dcm-test01.sh
13:59	pdurbin	Does that make sense?
14:09	rebecabarros	Yes. Like, now I'm getting 'status:ok' from running dcm-test01, which, judging from ur.py, is supposed to mean that everything worked, right? But the JSON file is not created.
14:09	pdurbin	"status ok" sounds like good news to me :)
14:11	pdurbin	I'm looking at https://github.com/sbgrid/data-capture-module/blob/master/api/ur.py again
14:11	pdurbin	"dump to unique file"
14:11	pdurbin	rebecabarros: you're saying a file isn't being created?
14:12	rebecabarros	pdurbin: yes.
14:12	pdurbin	Is the file supposed to be created in /deposit/requests/ ?
14:13	rebecabarros	As I mentioned before, I've tried to curl http://$DCM_SERVER/ur.py. It gives me back 'status:ok' and a file is created in /requests.
14:13	pdurbin	oh! so the file is being created!
14:14	rebecabarros	But only if I do this directly. Running through dcm-test, for instance, doesn't create the file.
14:14	pdurbin	huh
14:18	pdurbin	curl -H "Content-Type: application/json" -X POST -d "{\"datasetId\":\"42\", \"userId\":\"42\",\"datasetIdentifier\":\"42\"}" http://localhost/ur.py
14:18	pdurbin	rebecabarros: what happens if you run that curl command above from your DCM server?
14:19		donsizemore joined #dataverse
14:21	rebecabarros	pdurbin: 'status:ok' but no file created
14:21	pdurbin	hmm
14:23	pdurbin	but if you do `curl http://localhost/ur.py` a file is created?
14:25	rebecabarros	pdurbin: that's correct
14:26	pdurbin	What is the content of the file that's created?
14:27	rebecabarros	It's an empty JSON file
14:29	pdurbin	rebecabarros: ok. Thanks. How are you feeling about all this? pameyer says he should have time to help later today. Over at http://irclog.iq.harvard.edu/dataverse/2017-11-09#i_59973 you and djbrooke talked about the roadmap for this rsync feature.
14:41		andrewSC joined #dataverse
14:45	pdurbin	donsizemore: mornin. Lots of interest in your Ansible playbook!
14:46	donsizemore	@pdurbin i see that. i wish i had lots of time to work on it! ;)
14:47	pdurbin	seems like a higher priority than rewriting the installer :)
14:47	pdurbin	Is there anything I can do to help? I don't really know Ansible.
14:49	rebecabarros	the flow as far as I could understand is: Using upload in DVN should make some call to ur.py that would be responsible for creating some JSON file in the /requests directory. Then this JSON file will be used by sr.py to allow the upload itself. I will wait for pameyer so he can explain what the JSON file in /requests has to look like.
14:52	rebecabarros	pdurbin: you mean, what I think about your perspectives for rsync feature? I'm excited that the plan is to allow both options to work side by side. That way Dataverse will be able to cover all possible scenarios with small and large files.
14:52	donsizemore	@pdurbin i think the root of his problems are a) ansible assumes a clean install, as dataverse's installation isn't idempotent. i can stick some semaphores in there to make the playbook idempotent, but it will likely lead to screwy glassfish states
14:53	pdurbin	rebecabarros: right. Except we don't call it "DVN" anymore. Now we call it "Dataverse". :) I mean, I think that's how it works. From the Dataverse perspective, Dataverse calls "ur.py" to make an "upload request" and then immediately calls "sr.py" for a "script request". sr.py returns a Bash script with rsync commands in it. Dataverse presents this script to the user in the Dataverse GUI.
14:53	donsizemore	and b) i never coded it for Ubuntu/Debian. the Readme.md says CentOS 7 and means it
14:53	pdurbin	donsizemore: sorry, one sec
14:55	pdurbin	rebecabarros: let me try to be a little more clear about the current state of the rsync feature. The reason why it's documented in the Developer Guide rather than the Installation Guide is that this feature is highly experimental: http://guides.dataverse.org/en/4.8.2/developers/big-data-support.html
14:56	pdurbin	That is to say, I'm not surprised that the rsync feature doesn't "just work" for you because you are only the second person to try to get it working. The first to get it working is pameyer who is the author of the rsync (Data Capture Module) code.
14:58	pdurbin	rebecabarros: I'm extremely impressed by your tenacity, by how hard you are working on trying to get the rsync feature to work. But I'm wondering if you should write up your notes so far into an issue at https://github.com/IQSS/dataverse/issues (main Dataverse repo) and ask for more documentation (Installation Guide rather than Developer Guide).
14:59	pdurbin	This would (someday) mean that someone other than the author of the Data Capture Module would install it and independently verify that it's working as expected. It would go through QA, basically.
14:59	pdurbin	As part of the process the documentation would be improved.
15:00		djbrooke joined #dataverse
15:00	pdurbin	Making it easier for a customer like yourself to follow the documentation and have success setting up all the necessary components to enable "big data support" (rsync).
15:01	pdurbin	Does that make sense?
15:01	pdurbin	I don't mean to discourage you from continuing to try if that's how you'd like to spend your time.
15:01	pdurbin	I think you have a lot to contribute in terms of opening issues to explain the problems you've had.
15:01	pdurbin	Once we know what the problems are, we can fix them or document workarounds.
15:02	pdurbin	I hope this is making sense. I think I'm done. :)
15:02	pdurbin	rebecabarros: what do you think?
15:19	pdurbin	djbrooke: mornin. I'm sort of trying to talk rebecabarros out of trying to get a Data Capture Module working until we've put it through QA. We only tested the mock DCM.
15:20	djbrooke	I'd defer that question to pameyer who said he would be on later today
15:21	djbrooke	and mornin
15:21		donsizemore joined #dataverse
15:22	pdurbin	That's fine. Without more documentation, the Data Capture Module is obviously very difficult to support.
15:23	rebecabarros	pdurbin: Don't worry. I understand that it is still an experimental feature and I really appreciate how you guys are accessible and helpful at any time. And I agree with you, I was already thinking about summarizing in a doc how everything went so far and the problems that I've faced, with the purpose of helping you know how to improve documentation and stuff.
15:23	rebecabarros	The reason why I "insist" on trying to get this done is because we really want to use Dataverse but we are really going to need to support large files, it's our main scenario. Meanwhile I'm already thinking about options, so, for instance, I'm about to test how Dataverse will behave if I split a 100 GB zip file and upload 10 smaller ones of 10 GB each. Although this would not be ideal.
15:24	rebecabarros	But I do understand your concerns and I understand that this takes time and that you have a lot of other features to worry about right now.
15:26	pdurbin	rebecabarros: you and pameyer have the same needs. His primary use case is large files, which is why he helped us develop this new experimental feature. Someone like you coming along to try to get the feature working is exactly what I wanted. I'm just frustrated that I can't help more. I don't know enough about how the DCM code works.
15:35	pdurbin	rebecabarros: I see "big data" at https://www.bahia.fiocruz.br/cidacs/ when I run that page through Google Translate. :)
16:04	pdurbin	or even when I don't :)
16:04	pdurbin	'grandes bases de dados (“big data”)'
16:12	rebecabarros	pdurbin: Again, don't worry. You've been really helpful to me since the beginning, answered all sorts of questions and were always patient :) haha. I appreciate that. I wish I had more programming skills to help you guys out on the development side of things, but I do not, so...
16:12	rebecabarros	pdurbin: yes, that's us!
16:13	pdurbin	rebecabarros: you are helping a lot by testing things. It's extremely valuable.
16:13	pdurbin	"We welcome contributions of ideas, bug reports, usability research/feedback, documentation, code, and more!" https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md
16:15	pdurbin	rebecabarros: did you say you might summarize in a doc? What kind of doc? A Google doc? An attachment on a GitHub issue?
16:15		djbrooke joined #dataverse
16:15	pdurbin	A Google doc might be nice if you enable comments.
16:21	rebecabarros	What do you think is the best way?
16:23	pdurbin	If you don't mind creating a Google Doc, I think that would be best.
16:28	rebecabarros	Ok then. I will do that and send the link later.
17:04		djbrooke joined #dataverse
17:04		djbrooke joined #dataverse
17:11		Thalia_UM joined #dataverse
17:11	Thalia_UM	Good morning! :)
17:13	pdurbin	hi Thalia_UM. Good morning. :)
17:13	Thalia_UM	A question philip
17:15	Thalia_UM	I want to query some open web services from an already-installed Dataverse, for example by modifying some XHTML file or something similar. They told me that they will investigate implementing JSON and AJAX to query the web services.
17:16	Thalia_UM	I don't know how to implement it so that, through the interface, those web services are queried when I create a dataset.
17:17	pdurbin	What do the web services do?
17:17	Thalia_UM	any ideas
17:18	pdurbin	Can these web services be used by any installation of Dataverse?
17:18	Thalia_UM	it is only to look up names of people, institutions, data types (xml, pdf, docx, etc)
17:18	Thalia_UM	http://catalogs.repositorionacionalcti.mx/webresources/idioma/0/2
17:19	Thalia_UM	For example like that
17:19	pdurbin	What are some example user stories?
17:19	Thalia_UM	That is my question
17:20	Thalia_UM	That link is about language
17:20	pdurbin	"As a user, I want to create a dataset and pick from a list of authors." ... Something like that?
17:20	Thalia_UM	Yes
17:20	Thalia_UM	Like that
17:21	pdurbin	Are there any other user stories?
17:24	pdurbin	"As a user, I want to..."
17:28	Thalia_UM	I don't understand. What does "user stories" mean?
17:29	Thalia_UM	There are five web services that we are going to query, but we want that to be dynamic with Dataverse
17:31	pdurbin	A user story begins with "As a user, I want to..."
17:33	pdurbin	Thalia_UM: can you please create an issue for the first user story we just talked about? At https://github.com/IQSS/dataverse/issues
17:35	Thalia_UM	Oooh
17:35	Thalia_UM	yes
17:35	Thalia_UM	Sure
17:35	pdurbin	Thanks!
17:44	Thalia_UM	https://github.com/IQSS/dataverse/issues/4282
17:45	Thalia_UM	Do you have any idea how I can do that?
17:49	pdurbin	Thalia_UM: please see the comment I just left. Thanks for opening an issue!
17:50	pdurbin	djbrooke: Thalia_UM could probably use some help breaking her ideas down into user stories
17:52	Thalia_UM	djbrooke?
17:53		jri joined #dataverse
17:57	djbrooke	Hey Thalia_UM - Mike Cohn has written a few books about user stories and is my go-to source. A short read is here: https://www.mountaingoatsoftware.com/agile/user-stories
17:58		jgautier joined #dataverse
17:58	djbrooke	When we develop a feature or new capability, we want to recognize the user's goal in their words (in a consistent format)
17:59	djbrooke	This helps us as we develop because we can always point back to user's desired outcome, and it gives some flexibility about how we implement a solution to that outcome
17:59		dataverse-user joined #dataverse
18:00	djbrooke	So, for the example in 4282: As a user, I want to create a dataset and pick from a list of authors or language or type of publication, etc.
18:01	djbrooke	It's good! The only thing missing is the end piece - the "why" - what value would this provide to you or your user community?
18:05	Thalia_UM	We have to query web services and then add them to the "Add Dataset" form, so that when the web services are queried with "GET" the fields are filled with the content of the web services.
18:16		djbrooke joined #dataverse
18:17		djbrooke joined #dataverse
18:23	Thalia_UM	Another one of my questions is whether I can do this without modifying the Dataverse code, without having to uninstall it.
18:38		djbrooke joined #dataverse
19:00		djbrooke joined #dataverse
19:02		djbrooke joined #dataverse
19:33		djbrooke joined #dataverse
19:50		djbrooke joined #dataverse
20:02		djbrooke joined #dataverse
21:01		djbrooke joined #dataverse
21:07		djbrooke joined #dataverse
21:29		Thalia_UM left #dataverse
21:39		djbrooke joined #dataverse
21:52		djbrooke joined #dataverse
21:54		pameyer joined #dataverse
22:08		jri joined #dataverse
22:10	pameyer	rebecabarros: the information flow should roughly be: request to ur.py (from curl, test script or Dataverse) -> JSON file in /deposit/requests -> rq worker reads JSON file, creates transfer account and script, moves JSON file to /deposit/processed (and renames JSON file from PID to dataset_id); request to sr.py returns script (or 404 if the script is not generated)
22:14	pameyer	"status:ok" from ur.py should only be returned if the request has been processed by the request queue
22:15	pameyer	^ typo'd ; "status:ok" is upstream of the request queue
22:21	pameyer	if there's an empty JSON file resulting from calls to ur.py, then this is probably because the parameters aren't being passed correctly
22:25	pameyer	should be JSON encoded text in the POST body
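pameyer's point about the POST body can be sketched in Python. This is only a minimal sketch, not the actual client code used by Dataverse or the Data Capture Module: the field names are taken from the curl command quoted earlier in this log, while the helper name `build_upload_request` and the `http://localhost` base URL are assumptions for illustration. It builds the request without sending it, so the JSON body can be inspected first.

```python
import json
import urllib.request


def build_upload_request(base_url, dataset_id, user_id, dataset_identifier):
    """Build (but do not send) a POST to ur.py with a JSON-encoded body.

    Hypothetical helper: the field names mirror the curl example in the log;
    what ur.py actually requires is not confirmed here.
    """
    payload = {
        "datasetId": dataset_id,
        "userId": user_id,
        "datasetIdentifier": dataset_identifier,
    }
    # The parameters go in the POST body as JSON-encoded text.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ur.py",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_upload_request("http://localhost", "42", "42", "42")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it, one would call: urllib.request.urlopen(req)
```

A plain `curl http://localhost/ur.py` sends no body at all, which would be consistent with the empty JSON file reported above.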
22:25		djbrooke joined #dataverse
22:51		djbrooke joined #dataverse
22:58		pameyer joined #dataverse
23:01		djbrooke joined #dataverse
23:02		dataverse-user joined #dataverse
23:05		djbrooke joined #dataverse
23:11		djbrooke joined #dataverse
23:14		djbrooke joined #dataverse
23:15		djbrooke_ joined #dataverse
23:16		djbrooke joined #dataverse
