02:53
andrewSC joined #dataverse
03:11
axfelix joined #dataverse
03:14
dzho joined #dataverse
04:22
sivoais
pdurbin: sure, I wouldn't mind being added to the spreadsheet. But I haven't done anything with Dataverse yet --- not even started on implementing the SWORD protocol for Perl like you mentioned once ;-)
06:00
jri joined #dataverse
06:16
iqlogbot joined #dataverse
06:16
Topic for #dataverse is now Dataverse is open source research data repository software: http://dataverse.org | IRC Logs: http://irclog.iq.harvard.edu/dataverse/today | Who's who: https://docs.google.com/spreadsheets/d/16h3jv24usMGq18495C-JA-yNcQCKiKDa65MTraNDd7k/edit?usp=sharing
08:13
jri joined #dataverse
08:24
jri joined #dataverse
09:45
bjonnh joined #dataverse
11:10
donsizemore joined #dataverse
12:45
rebecabarros joined #dataverse
14:01
rebecabarros
pdurbin1, pameyer: I'm running out of ideas, guys. I saw that the '500 error status' could be related to folder permissions, so I set full access on my dcm folder. Again, nothing.
14:04
pdurbin1
rebecabarros: I think you should start hacking on ur.py itself
14:04
pdurbin
does that make sense?
14:05
pdurbin
or copy ur.py to hello.py and make it reply "hello world" when you hit it with curl
14:05
pdurbin
Maybe this is a terrible idea but it's what I'd do. :)
14:07
rebecabarros
It's an idea. Let me try that and see what happens. Thanks :)
14:09
donsizemore joined #dataverse
14:09
pdurbin
rebecabarros: sure. Part of why I'm suggesting this is that pameyer mentioned yesterday that perhaps CGI scripts can't be executed.
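[Editor's note: as a rough illustration of the test pdurbin is suggesting, a minimal CGI "hello world" script might look like the sketch below. This is an assumption for illustration only, not code from the DCM repository; the file name hello.py and the curl URL are hypothetical.]

```python
#!/usr/bin/env python
# Hypothetical hello.py, standing in for ur.py just to confirm that CGI execution works.
# It prints a CGI header, a blank line, and a plain-text body.
print("Content-Type: text/plain")
print("")
print("hello world")
```

[If hitting it with something like `curl http://localhost/hello.py` (URL assumed) returns "hello world", CGI execution itself is working and the 500 is more likely a ur.py, permissions, or SELinux problem.]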
14:10
pdurbin
donsizemore: mornin'. You're a machine. Should we point you at a project other than slogging through a rewrite of the Perl installer into Python? Is there any other issue you're excited about?
14:11
donsizemore
@pdurbin i'm excited about the translation stuff but the internationalization part will be more of a journey (and possibly part of a grant here)
14:11
pdurbin
sivoais: I totally forgot about that. Added! What do you want your "org" to be?
14:11
pdurbin
mmm, I can smell the grant money already
14:12
pdurbin
the smell of sustainability
14:12
pdurbin
donsizemore: your skills are probably wasted writing docs, right?
14:13
donsizemore
@pdurbin i do like writing documentation. the syntax is much more forgiving than code
14:14
pdurbin
yeah
14:14
donsizemore
@pdurbin what in particular needs documenting?
14:14
pdurbin
I wonder if I should change "Help Wanted: Code" to "Help Wanted: Java" and "Help Wanted: Python" and Javascript and Perl.
14:15
pdurbin
Well, the API Guide is on my mind because of this thread that got kicked off yesterday: https://groups.google.com/d/msg/dataverse-community/4XsA0Px2H8Q/-iHLF-osDAAJ
14:15
pdurbin
I'm sure a lot of people try using the API and give up. Not that I have any data on this.
14:17
donsizemore
we've used the API for a few things but wound up doing most of the prep work in the GUI
14:17
pdurbin
prep work?
14:17
pdurbin
Here's a list of where documentation is lacking: https://github.com/IQSS/dataverse/labels/Help%20Wanted%3A%20Documentation
14:17
donsizemore
pre-existing database, API token, dataset creation
14:18
pdurbin
You take datasets from an old database and copy and paste the metadata into Dataverse using the GUI?
14:22
pdurbin
If so, I don't blame you. Constructing an equivalent JSON document is a pain. I mention this in that thread on the Google Group.
14:48
rebecabarros
pdurbin: you're right. It's something related to the ur.py file itself. When I changed it to a 'Hello World' example, I got no errors from dcm-test01.sh. Now I have to find out what it is.
14:54
donsizemore
@pdurbin it was for the publishing app we wrote for DE, which expects an existing account, dataverse, and dataset
15:03
pdurbin
rebecabarros: sounds like progress. Do you need any more suggestions at this point?
15:04
pdurbin
donsizemore: ok. And it sounds like it was a partially manual effort, especially populating the metadata fields.
15:04
donsizemore
@pdurbin correct, but we're interested in Dataverse for a couple grants we're on, and would love to make more use of the API
15:05
pdurbin
sounds like a good opportunity to help with the API Guide, if you're interested :)
15:06
pdurbin
Maybe we should start with a user story.
15:35
Thalia_UM joined #dataverse
16:01
Thalia_UM
Good morning! :)
16:37
pdurbin
mornin
17:01
rebecabarros
pdurbin: I do. I'm still a little lost about the whole flow of the DCM and how everything connects to everything else. Do you have any more suggestions regarding ur.py? I've checked that the directory pointed to in the file exists and is writable, and it is. I've checked that Redis is installed and up and running, and yes it is.
17:08
sivoais
pdurbin: hmm, maybe put down Project Renard? That's the closest to what Dataverse is doing (nevermind the fact that I started it... :-P)
17:10
pdurbin
DCM uses Redis? Huh.
17:11
pdurbin
sivoais: fixed. Thanks.
17:36
jri joined #dataverse
18:00
Thalia_UM joined #dataverse
18:10
rebecabarros
pdurbin: do you know where this dump is supposed to come from? https://github.com/sbgrid/data-capture-module/blob/master/api/ur.py
18:12
pdurbin
It says "dump to unique file"
18:13
pdurbin
rebecabarros: I'm going to guess something like "/deposit/requests/1234.json" from a quick look at the code.
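[Editor's note: a rough sketch of the "dump to unique file" step being discussed, based only on a quick reading of ur.py. The names DATADIR and dump_request, the use of uuid, and the directory value are assumptions for illustration, not the actual DCM code.]

```python
# Hypothetical sketch of dumping an upload request to a uniquely named JSON file.
import json
import os
import uuid

DATADIR = "/deposit/requests"  # assumed value; the real setup configures this via main.yml

def dump_request(payload):
    """Write the incoming request to something like /deposit/requests/<id>.json."""
    path = os.path.join(DATADIR, "%s.json" % uuid.uuid4().hex)
    with open(path, "w") as f:
        json.dump(payload, f)
    return path
```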
18:18
axfelix joined #dataverse
18:19
rebecabarros
As far as I can understand, this file is created by this script, right? Without any content? Or where does the content come from? From my rsync request? I'm trying to run ur.py by itself (which I think doesn't make much sense, because something prior has to call it). But when I debug the script, it seems like the file can't be opened, most likely because it doesn't exist.
18:21
rebecabarros
Sorry if this doesn't make much sense.
18:21
pdurbin
Huh, there is Redis in there. I forgot it was added to speed things up.
18:23
pdurbin
rebecabarros: I'm calling for reinforcements. :)
18:24
Thalia_UM joined #dataverse
18:25
rebecabarros
pdurbin: haha ok, thanks again!
18:26
pdurbin
rebecabarros: do you know if Venki is still trying to set up a Data Capture Module or not? Remember that thread on the Google Group?
18:30
pdurbin
Here's the thread: https://groups.google.com/d/msg/dataverse-community/mcji2ytn3QI/3qKoRkiYBAAJ
18:30
pdurbin
rebecabarros: Do you want to reply on the thread with your latest status? Maybe Venki can help. Or maybe Pete will reply when he has time.
18:31
donsizemore joined #dataverse
18:33
rebecabarros
pdurbin: I do remember. He sent me a private message asking if my successfully uploaded 17 GB file was a zip file. It seems like he was trying to upload double-zipped files. I said that mine was a CSV file and suggested that he try uploading via the API just to see what would happen. But I haven't heard back from him.
18:34
pdurbin
rebecabarros: hmm, ok. Do you think he's actively trying to set up all this DCM and rsync stuff?
18:36
rebecabarros
pdurbin: I don't think so. But I could try to ask.
18:43
pdurbin
rebecabarros: ok, you and Pete might be the only people trying to use a Data Capture Module. Do you see "Large Data Support and HTTP Upload Support" at https://dataverse.org/goals-roadmap-and-releases as something we're thinking about? I can explain what that means.
18:54
rebecabarros
pdurbin: I would appreciate it if you could give me a general idea of how you're thinking of approaching that. :)
19:01
pdurbin
rebecabarros: the key word is "and". Large Data Support AND HTTP (regular) upload at the same time. Right now you can only use one or the other. Does that make sense?
19:01
jri joined #dataverse
19:05
djbrooke joined #dataverse
19:06
djbrooke
Hey rebecabarros -- pdurbin mentioned you had some roadmap questions... let me catch up on the chat
19:10
djbrooke joined #dataverse
19:11
rebecabarros
pdurbin: that's good news, because I was worried about having to use only the DCM, even for small datasets.
19:13
djbrooke
so, rsync and http upload are currently either/or - you need to pick whether your installation transfers data via rsync or http
19:13
djbrooke
If you try to switch between them or enable both, I don't think it will work as expected
19:14
djbrooke
We chose the either/or path first because it makes it easier from a UI/UX standpoint and it meets the grant requirement of making this available in support of the large data sets of structural biologists
19:16
djbrooke
But in 2018 we'll be planning to make it possible for an installation to have both rsync (for big transfers) and http (for smaller transfers)
19:16
djbrooke
The technical groundwork has been laid; now it's a matter of providing a good user experience for getting data in and out of Dataverse when both of these options are enabled
19:19
rebecabarros
djbrooke: hi, thanks for the clarification. That's a good prospect. Over here we will have to deal mostly with large data sets, but we would also like to be able to keep the http option for the smaller ones. I'm really excited about this.
19:21
pameyer joined #dataverse
19:22
pameyer
rebecabarros: just to recap, you're seeing 500 errors both locally and from the DV server on ur.py calls; nothing in the lighttpd error.log
19:22
pameyer
anything I'm missing from skimming the logs?
19:23
djbrooke
and we're excited to work on it! It's been a long time coming, so it's great to have the resources from this grant for something that will benefit the larger community
19:26
rebecabarros
pameyer: correct. Although I think that error.log doesn't show much info because it's not set to do so, and I couldn't find out how to make it more verbose.
19:26
pdurbin
pameyer: I was encouraging rebecabarros to copy ur.py to hello.py and try to get a "hello world" output via curl just to make sure CGI is working.
19:27
pameyer
looking that up now
19:27
pameyer
but leaning towards guessing the web server doesn't have write permission to the request directory
19:29
pameyer
ah - got a better candidate
19:29
pameyer
could you let me know what the default python version on your dcm system is?
19:30
rebecabarros
pameyer: it's Python 2.7.5
19:33
pameyer
ok - could you add `server.breakagelog = "/var/log/lighttpd/breakage.log"` to your lighttpd.conf and restart
19:33
pameyer
python2.7.5 should be fine
19:36
rebecabarros
pameyer: let me try
19:37
pameyer
the graphviz files in the doc subdirectory were intended to illustrate the information flow, but feedback has been that they don't do a great job of conveying it
19:42
rebecabarros
pameyer: So I restarted lighttpd and tried to run dcm-test01 again. Here is the breakage.log -> https://pastebin.com/JVaEhHDR
19:42
rebecabarros
I've changed the DATADIR value in ur.py; should I move back to the value that is in your code?
19:44
pameyer
what do you have it set to?
19:44
donsizemore joined #dataverse
19:45
pameyer
log makes it look like a problem using a relative path; if you switch to a full path it should at least move the error
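[Editor's note: a small illustration of the relative-path pitfall pameyer appears to be describing. This is a sketch with assumed directory names, not DCM code: under CGI, the working directory is typically wherever the web server launched the script from, so a relative DATADIR resolves against that location rather than against the script's own directory.]

```python
# Hypothetical demonstration of relative vs. absolute DATADIR under CGI.
import os

DATADIR = "deposit/requests"  # relative: resolved against the CGI process's cwd

cwd_resolved = os.path.abspath(DATADIR)
script_resolved = os.path.join(os.path.dirname(os.path.abspath(__file__)), DATADIR)

print(cwd_resolved)     # wherever lighttpd's working directory happens to be
print(script_resolved)  # anchored to the script's own directory instead
```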
19:48
rebecabarros
the full path leads me to a permission error. I don't understand, because I already did chmod -R 777 on the main dcm directory
19:53
pameyer
you have DATADIR inside the same directory as the dcm code?
19:53
rebecabarros
pameyer: yes
19:55
pameyer
does `sudo -u lighttpd touch $DATADIR/testfile` give the same permission error
19:56
pdurbin
pameyer: oh! I had a thought. SELinux.
19:56
pameyer
aka - $DATADIR for the full path to your dcm/deposit/requests/14186.json directory
19:56
pameyer
pdurbin: good thought. `sestatus`?
19:56
pdurbin
getenforce
19:57
pameyer
rebecabarros: you have `/usr/local/dcmu` in some places, and `dcm/deposit/` in others. Is it possible that there's a mix-up in directory names?
20:01
rebecabarros
pameyer: I put $DATADIR = $UPLOAD_DIRECTORY/requests (from main.yml); should that be okay? Meaning, can DATADIR be any directory, or what?
20:04
pameyer
right: `$DATADIR = $UPLOAD_DIRECTORY/requests` is the way things are expected to be set up
20:07
rebecabarros
`sudo -u lighttpd touch $DATADIR/testfile` works without error and the file is created.
20:07
pameyer
ok - so it's not permissions
20:07
pameyer
could you try that with the full path for $DATADIR?
20:11
pdurbin
Can you both run `getenforce` and say what the output is?
20:12
rebecabarros
pameyer: I've tried and same thing.
20:12
pameyer
"same thing" == ("same failure as before" | file created)?
20:14
rebecabarros
same failure as before
20:14
rebecabarros
pdurbin: 'getenforce' gives me 'Enforcing'
20:16
pdurbin
pameyer: does getenforce return 'Enforcing' for you too?
20:16
pameyer
nope - permissive
20:16
pameyer
so might be selinux breaking stuff again
20:17
pdurbin
rebecabarros: you might want to try setting SELinux to permissive
20:18
rebecabarros
pdurbin: I guess I saw something related to that in the TwoRavens tutorial, right? I will try that.
20:19
pdurbin
yep, `setenforce permissive` is in http://guides.dataverse.org/en/4.8.2/installation/r-rapache-tworavens.html
20:19
rebecabarros
Right now I have to go. But first thing tomorrow morning I'll try this with SELinux, and I'll let you guys know. Thank you again for all the help.
20:19
rebecabarros left #dataverse
20:20
* pdurbin
crosses fingers
20:29
pdurbin
pameyer: I appreciate you jumping in. Any thoughts while this is all top of mind?
20:33
pameyer
"dcm" vs "dcmu"; selinux are top of the list
20:35
pdurbin
ok. we probably should have put non-mock through QA
20:36
pdurbin
given enough time, that is :)
20:42
pameyer
yup
21:23
pdurbin left #dataverse
22:26
jri joined #dataverse