IRC log for #dataverse, 2016-09-02

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
00:23 djbrooke joined #dataverse
06:38 djbrooke joined #dataverse
07:38 jri joined #dataverse
11:09 kzisme joined #dataverse
12:37 djbrooke joined #dataverse
12:56 pameyer joined #dataverse
13:20 sekmiller joined #dataverse
13:31 pdurbin sekmiller: good morning. You're off the hook: https://github.com/IQSS/dataverse/issues/3326#issuecomment-244365474 :)
13:32 bsilverstein joined #dataverse
13:37 pdurbin bsilverstein: oh good, you're here. I having trouble replicating the math challenge bug.
13:38 pdurbin I'm*
13:38 bsilverstein pdurbin: is it appearing correctly for you?
13:38 pdurbin bsilverstein: can you please swing by?
13:39 bsilverstein I actually didn't even know about the math challenge until sitting in on QA and Kevin kind of arbitrarily looked to see if it was working and it happened to not be
13:39 bsilverstein did not repeatedly recreate that one
13:39 bsilverstein pdurbin: of course!
13:39 pdurbin it'll be easier to explain if you look over my shoulder
13:40 pdurbin pameyer: if bsilverstein and I can finish this up maybe I can swing by your office this afternoon.
13:50 pameyer pdurbin: bmckinney's off-site today, so you might get a better preview next week
13:57 djbrooke joined #dataverse
14:09 djbrooke joined #dataverse
14:12 pdurbin bsilverstein: always remember http://s3.amazonaws.com/giles/demons_010609/wtfm.jpg
14:23 nicholas_ joined #dataverse
14:36 pdurbin nicholas_: oh, hey, were your ears burning this morning? :)
14:41 romainM joined #dataverse
14:41 romainM hello ?
14:42 pdurbin romainM: good morning
14:43 romainM hey, long time no see !
14:43 romainM but I'm back with a problem with our "pre-release" dataverse
14:44 pdurbin pre-release? you mean you haven't gone live yet?
14:44 romainM something with the xml generated and sent to datacite when I try to publish a dataset
14:44 romainM well, the dataverse is live but not deployed
14:44 romainM there is still the big datas import to do
14:45 romainM and given that there is some labs to deal with, it was kinda slow to treat every datas
14:45 romainM now I can import all datas as draft mode, but cannot publish it
14:45 romainM but I can publish dataset created manually
14:45 romainM :S
14:46 pdurbin I'm confused. Let me read that again.
14:46 romainM when the publish fails, I got some log
14:46 romainM especially this one : Response code: 400, [xml] xml error: The entity "nbsp" was referenced, but not declared.
14:46 romainM oh sorry
14:46 romainM I'm just getting a little bit crazy with all my imports probs ...
14:46 pdurbin romainM: which version are you running? 4.5?
14:47 romainM 4.4
14:47 romainM we haven't updated yet
14:48 pdurbin romainM: I haven't upgraded https://apitest.dataverse.org to 4.5 yet if you'd like to try to reproduce the bug there.
14:48 pdurbin It's still running 4.4.
14:48 romainM ok
14:49 pdurbin please let me know if you see the bug there too
14:50 pdurbin romainM: were you saying you think the bug has to do with using DataCite rather than EZID? That apitest server is configured to use EZID.
14:50 romainM well
14:50 romainM the problem seems to come from an xml file
14:51 romainM and given that, from what I understood, publishing includes sending an xml to datacite (with metadatas)
14:51 romainM I thought it was due to that
14:53 romainM I'm importing data
14:53 romainM will try to publish in 1-2 mins
14:53 pdurbin romainM: oh, ok. Yes, makes sense. You're right that publishing involves sending metadata to DataCite regardless of if you're using EZID or DataCite for DOIs. From what I understand. sekmiller is the expert in this area.
14:53 romainM I was trying to get that xml
14:53 romainM but didn't find any way yet
14:54 romainM the only solution I had was to try to intercept datas sent by dataverse server but ...
14:54 romainM would take some time and skill :D
14:54 romainM I'm gonna try a publish now
14:55 romainM ok, it published
14:55 romainM how does EZID work with publishing ?
14:56 pdurbin Dataverse registers the DOI with EZID. (Out of the box.)
14:56 pdurbin And Dataverse won't let you publish if the DOI hasn't been registered.
14:57 romainM oh, and the error message for the published error is this:
14:57 romainM Error – This dataset may not be published because the DataCite Service is currently inaccessible. Please try again. Does the issue continue to persist? Please contact Dataverse Support for assistance.
14:58 pdurbin romainM: are you seeing that error on the apitest server?
14:58 romainM given that datacite is accessible
14:58 romainM nono
14:58 romainM it worked on the api server
14:58 pdurbin huh
14:58 romainM this message comes from our dataverse
14:59 pdurbin looks like it's coming from dataset.publish.error.datacite
15:00 romainM the full "event" log
15:00 romainM 2764 INFO retreived version: id: 257, state: DRAFT(details) edu.harvard.iq.dataverse.DatasetPage 2 sept. 2016 16:22:49.849 {levelValue=800, timeMillis=1472826169849} 2761 SEVERE This dataset may not be published because the <a href="http://status.datacite.org/" title="DataCite ... (details) edu.harvard.iq.dataverse.DatasetPage 2 sept. 2016 16:22:49.729 {levelValue=1000, timeMillis=1472826169729} 2760 WARNING javax.ejb.TransactionRolledb
15:00 romainM hum
15:00 romainM not really readable ...
15:01 pdurbin romainM: can you please attach your server.log in an email to support@dataverse.org ?
15:01 romainM yes
15:05 jri joined #dataverse
15:06 pdurbin please mention that you're running 4.4
15:08 romainM done
15:09 romainM I mentioned version, datacite use and other things (scripted imported datasets, etc)
15:12 pdurbin https://help.hmdc.harvard.edu/Ticket/Display.html?id=240514
15:12 pdurbin romainM: thanks
15:12 pdurbin it says this: Caused by: java.lang.RuntimeException: Response code: 400, [xml] xml error: The entity "nbsp" was referenced, but not declared... at edu.harvard.iq.dataverse.DataCiteRESTfullClient.postMetadata(DataCiteRESTfullClient.java:183)
15:12 romainM thank you you
15:13 romainM yes, that's what makes me say it's an xml problem
15:16 pdurbin here's where the RuntimeException is thrown: https://github.com/IQSS/dataverse/blob/v4.4/src/main/java/edu/harvard/iq/dataverse/DataCiteRESTfullClient.java#L183
15:16 pdurbin romainM: how did you create this dataset? You imported it somehow?
15:17 romainM I create a "simple" dataset with the python api for dataverse
15:17 romainM then I update the metadatas of the dataset with a json
15:18 pdurbin interesting
15:18 romainM (the update is also made with python script)
15:18 romainM the datas come from different xlsx files
15:19 pdurbin romainM: and when you tested against https://apitest.dataverse.org just now you also used these scripts?
15:19 romainM given by the labs
15:19 romainM yes
15:19 pdurbin huh
15:19 romainM testGrim Dataverse
15:19 pdurbin "works on my machine" ;)
15:19 romainM it's the testing dataverse I created
15:19 romainM ^^
15:19 djbrooke joined #dataverse
15:20 pdurbin it looks like you were able to publish a dataset there: https://apitest.dataverse.org/dataverse/testgrim
15:20 romainM just wondering the difference between ezid and datacite system
15:20 romainM yes
15:20 pdurbin but not on your server. hmm
15:20 pdurbin I wonder what's different.
15:20 romainM datacite seems to "publish" doi only when you post metadatas
15:21 romainM a xml file
15:21 romainM (if I well understood what I tried)
15:21 romainM and it seems the xml generated for this post is ... problematic
15:21 romainM maybe a bad encoding or what
15:21 romainM I ran into multiple problems with my imports
15:21 romainM and I even could do some "strange" things
15:22 pdurbin I'm not quite sure how it works. sekmiller would know. And there's a pull request, a refactoring I think, being worked on at https://github.com/IQSS/dataverse/pull/3146
15:22 romainM I made a github case
15:22 romainM for one case
15:22 pdurbin a github case? a github issue?
15:22 romainM issue sorry
15:22 pdurbin just now?
15:22 romainM nono
15:23 romainM some weeks ago
15:23 pdurbin which number please?
15:24 romainM searching for it
15:24 romainM (forgot password, had to reset, blablabla ^^")
15:24 romainM https://github.com/IQSS/dataverse/issues/3186
15:24 romainM I could "duplicate" some fields
15:24 romainM in the metadatas
15:25 romainM as you can see in the last picture, 3 "kind of data" titles appear
15:25 romainM should not be possible, no ?
15:48 pdurbin romainM: this reminds me of a different bug. related
15:49 pdurbin romainM: but "can do strange things" issue is different than the one we've been talking about, right?
15:50 romainM yes yes
15:51 romainM it's just that, maybe, some "bugs" could pass the metadatas validation for a dataset
15:51 romainM and make some problems with the xml generation for datacite
15:51 pdurbin romainM: but the new issue cannot be reproduced on the apitest server running 4.4
15:51 romainM yes, but it seems related to the datacite use
15:52 romainM you don't use any datacite dataverse ?
15:52 pdurbin right. apitest uses EZID instead of datacite for DOIs
15:52 pdurbin romainM: are you asking if we have any servers configured to use DataCite instead of EZID? I don't know.
15:52 romainM that's why I'm asking
15:53 romainM yes :)
15:53 romainM *what I'm asking
15:53 romainM I don't see any other option to test this
15:53 romainM or
15:53 romainM if there is a way to get the xml generated for datacite
15:54 romainM don't see other ways to find out :S
15:55 pdurbin it's something with &nbsp; right? I'm looking at http://stackoverflow.com/questions/9126999/how-to-handle-html-entity-nbsp-in-xslt-without-changing-the-input-file
15:56 djbrooke joined #dataverse
15:56 romainM something with "   " right ? missing something or ?..
15:56 romainM (looking at the issue)
15:57 romainM ah
15:57 romainM "&" nbsp ?
15:57 romainM that's the posts I saw concerning the prob
15:58 romainM but if that's it, it happens during the metadatas => xmlForDatacite step
15:58 romainM maybe with the "entity" thing it could work ...
15:58 pdurbin romainM: try it :)
15:58 romainM but I don't have the hand for that
15:59 romainM but I'll try something
15:59 romainM I can access the datacite api
16:00 romainM I'll make a script testing with and without an nbsp entity in the xml, with or without this option "entity" thing
16:00 romainM should proc the error
16:00 pdurbin cool
16:00 romainM I won't be able to do it now or this week end, I'll try monday
16:00 pdurbin monday is a holiday for us anyway :) labor day
16:00 romainM will give you the results
16:00 romainM ah
16:00 romainM not for me
16:00 pdurbin so take your time :)
16:01 romainM yep
16:01 romainM :D
16:05 garnett joined #dataverse
16:05 pameyer joined #dataverse
16:16 pdurbin nicholas_: still there?
16:20 romainM well, finally had time to do it
16:20 romainM I reproduced the xml error
16:20 romainM just had to add a &nbsp; in the xml to mess it up
16:22 pdurbin romainM: you can reproduce it on apitest?
16:23 romainM hum
16:23 romainM I can't really do that
16:23 romainM given that I used the datacite api for this
16:23 pdurbin oh, oh
16:23 pdurbin makes sense
16:24 romainM the only thing that block me
16:24 pdurbin but if you could get similar data into Dataverse...
16:24 romainM is that I can't declare &nbsp
16:24 romainM for a "Message: Content is not allowed in prolog"
16:24 romainM like if I couldn't define things in the xml sent :S
16:24 romainM trying to figure out why
16:26 romainM oh
16:27 romainM it passed
16:27 romainM got an xml with the "nbsp
16:27 romainM &
16:27 romainM nbsp
16:27 romainM had to add <!DOCTYPE space[ <!ENTITY nbsp "&#160;"> ]>
16:27 romainM at the beginning of the xml file
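The DOCTYPE workaround romainM describes here can be sketched in Python. This is a minimal illustration, not the XML Dataverse actually sends to DataCite: the `<title>` fragment is a made-up stand-in, and the standard-library parser is assumed to behave like the DataCite service only in that both reject an undeclared `nbsp` entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment; the real DataCite XML is not shown in this log.
BODY = "<title>Kind&nbsp;of&nbsp;data</title>"

# The internal DTD subset quoted in the chat, declaring nbsp as U+00A0.
DOCTYPE = '<!DOCTYPE space [ <!ENTITY nbsp "&#160;"> ]>'


def try_parse(xml_text):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# Without the declaration, &nbsp; is an undefined entity and parsing fails.
without_decl = try_parse(BODY)

# With the declaration placed before the root element, the entity is expanded.
with_decl = try_parse(DOCTYPE + BODY)
```

The declaration has to come before the root element, which fits the "Content is not allowed in prolog" message romainM hit while experimenting.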
16:27 romainM arf
16:28 romainM between the ""
16:28 pdurbin romainM: can you reproduce a bug on apitest?
16:28 romainM there is & nbsp
16:28 romainM I can't do that
16:28 romainM if you don't use datacite
16:28 pdurbin right, right. bummer
16:28 romainM ^^
16:29 pdurbin sekmiller: do we have any servers set up with DataCite for DOI?
16:30 pdurbin romainM: you could provide us with some JSON to reproduce the bug on a server configured for DataCite rather than EZID? JSON to create a dataset?
16:30 romainM yes
16:31 romainM I send you on the same email ?
16:31 pdurbin romainM: maybe a GitHub issue would be better.
16:32 romainM ok
16:32 pdurbin thanks!
16:32 romainM I reexplain the problem ?
16:32 pdurbin romainM: yes, please!
16:32 romainM (asking cause I have to go in a few)
16:32 romainM ok
16:32 pdurbin romainM: no rush. we're off monday :)
16:33 romainM well
16:33 pdurbin pameyer: next week is fine
16:36 pameyer pdurbin: great
16:37 pameyer romainM: quick question - are you seeing the problem generating the xml, or when it gets sent to datacite?
16:37 romainM the generation
16:37 romainM when I switch a nbsp element on the xml
16:37 romainM gets an error with, no error without
16:38 romainM and the error message is the one in the dataverse log
16:38 romainM (the nbsp not defined thing)
16:38 romainM I think the problem is a combination of imported datas
16:38 pameyer ok - I'd been wondering if there was a difference in validation between datacite and ezid, but it looks like this is unrelated
16:38 romainM (dunno how dataverse keeps it, encoding, etc)
16:38 romainM because it's only when datas are imported
16:39 romainM that this problem happens
16:39 romainM with dataset created "by hand", it works
16:39 romainM maybe ezid doesn't need metadatas related to the dataset ?
16:39 romainM and only do a redirection ?
16:39 romainM dunno how ezid works
16:39 romainM for datacite, it needs metadatas first
16:40 romainM it won't make a redirection link entry if no metadatas are given
16:40 romainM and there goes the xml file
16:40 romainM for the metadatas
16:41 pameyer when I've used ezid, I've always passed metadata along with the creation request
16:43 romainM if you have an xml example for ezid
16:43 romainM could be useful to compare, if structures differ
16:47 pameyer I don't have an example from dataverse - but ezid uses (or can accept) datacite xml metadata
16:47 pdurbin romainM: this issue is a good start but we need more details, I think: Publishing fails for script-added dataset with a Dataverse using Datacite · Issue #3328 · IQSS/dataverse - https://github.com/IQSS/dataverse/issues/3328
16:47 romainM I'm adding
16:47 romainM the validation was a mistake
16:47 pdurbin romainM: ok, great. We need to know how to reproduce it, etc.
16:48 romainM adding the files too
16:48 romainM with log
16:48 romainM json
16:48 pdurbin perfect. thanks!
16:49 romainM hum
16:49 romainM if you want to reproduce it
16:49 romainM you need scripts for the json import ?
16:49 romainM or json will be enough ?
16:50 pdurbin romainM: meh, just the JSON is fine. The JSON will have nbsp stuff in it?
16:50 romainM well, "hidden" nbsp
16:50 pdurbin ok
16:50 romainM because my datasets were uploaded with this exact json kind
16:51 romainM this exact json, actually
16:51 romainM it's the output of one of them
16:51 pdurbin cool. should be enough to reproduce the bug
16:51 romainM ok
16:51 romainM I updated
16:51 romainM there is the log
16:51 romainM and the json
16:51 romainM I added the xml line
16:52 romainM oh
16:52 romainM now that I think about it, the "space" was interpreted in the post ><
16:53 pdurbin romainM: I don't see "nbsp" in the JSON.
16:53 romainM because there is none
16:53 romainM what I upload
16:53 romainM don't have this
16:53 romainM but when the xml is made
16:53 romainM I don't know why, dataverse seems to add it
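One way to check whether the uploaded JSON really contains no non-breaking spaces is to walk it and flag any string holding a literal U+00A0 character or the text `&nbsp;`. The sketch below is written for Python 3 (romainM's scripts were Python 2.7), and the `sample` document is invented for illustration; it is not the JSON attached to the GitHub issue.

```python
import json

# Literal non-breaking space and its HTML entity spelling.
SUSPECTS = ("\u00a0", "&nbsp;")


def find_nbsp(value, path="$"):
    """Yield (path, string) for every string value containing a suspect sequence."""
    if isinstance(value, str):
        if any(s in value for s in SUSPECTS):
            yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_nbsp(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from find_nbsp(child, f"{path}[{i}]")


# Hypothetical metadata; field names are placeholders.
sample = json.loads(
    '{"fields": [{"typeName": "title", "value": "Kind\\u00a0of data"},'
    ' {"typeName": "notes", "value": "plain"}]}'
)
hits = list(find_nbsp(sample))
```

A U+00A0 character looks like an ordinary space in most editors, which could explain why a search for "nbsp" in the file finds nothing.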
16:53 pdurbin huh
16:54 romainM that's the point I don't understand
16:54 romainM now sorry, I really have to go
16:54 romainM but I can be on my phone
16:54 pdurbin romainM: have a good weekend! thanks!
16:54 romainM (I won't have access to computer, that's my point)
16:55 RomainMPhone joined #dataverse
16:56 RomainMPhone So like I said, there is no nbsp in my file, maybe encoding problem at some point ?
16:58 RomainMPhone joined #dataverse
16:58 RomainMPhone Sorry, connection switch :S
17:00 djbrooke joined #dataverse
17:01 pdurbin RomainMPhone: heh, no worries. I'm impressed by your dedication! :)
17:07 djbrooke joined #dataverse
17:07 metamattj joined #dataverse
17:17 djbrooke joined #dataverse
17:28 RomainMPhone joined #dataverse
17:29 RomainMPhone It's also because I want it to work :D
17:29 RomainMPhone There is a pseudo deadline for datasets publication around the 13th of september
17:31 pdurbin !
17:31 pdurbin djbrooke: not enough time to roll a fix into 4.5.1
17:32 RomainMPhone95 joined #dataverse
17:33 RomainMPhone95 Well
17:33 RomainMPhone95 I got a solution for that
17:33 RomainMPhone95 "In case"
17:34 RomainMPhone95 But I would prefer a clean way, yeah :D
17:37 pdurbin RomainMPhone95: is your solution a pull request? :)
17:45 pdurbin pameyer: dunno if you remember what was on http://dataverse.org/releases-roadmap but we just removed one of the tabs and now we're promising nothing with regard to large scale data or whatever by the fall
17:52 pameyer 4.6 and 4.7 are fall?
17:58 pdurbin hmm, good question
17:58 pdurbin I imagine we'll ship *something* in the fall!
18:03 RomainMPhone joined #dataverse
18:03 RomainMPhone No, my solution is not a pull request ... it's an ugly python script :D
18:04 RomainMPhone For a pull request, I should do clean code and tests ... omg ! :D
18:05 RomainMPhone And my java skill is kinda rusty now :S
18:05 pameyer RomainMPhone - I think you're on the right track about it being an encoding issue
18:05 RomainMPhone But if this is my last resort (tudu :D), i'll give it a look
18:06 RomainMPhone I'll try to analyse my datas in a first place
18:06 donsizemore joined #dataverse
18:07 RomainMPhone Just in case (even if it's clearly the case,99% chance given all we got now)
18:07 RomainMPhone Do you know what is dataverse encoding ?
18:08 pameyer from a very quick look, everything was utf-8 internally
18:08 RomainMPhone I mean, do you have a "clear" encoding used like utf8 ?
18:08 RomainMPhone Ok
18:08 pameyer I'm not the best person to have an opinion on it though
18:08 RomainMPhone My python output is utf8 ...
18:08 pameyer are you using python2?
18:09 RomainMPhone 2.7
18:09 pameyer this is sounding familiar to something I ran across a while back....
18:09 pameyer this may not be your problem, but I ran into issues with utf8 and python2 urllib
18:10 pameyer from memory (aka - exact details may be off), with python2 utf8 encoded strings sent to remote urls were translated to ascii inside one of the libraries, and that was causing problems
18:11 pameyer maybe python3 or python "requests" library would help you?
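If the encoding theory is worth testing, one option is to serialize the metadata to UTF-8 bytes explicitly, so no library along the way has to guess or convert. This is only a sketch under that assumption, written for Python 3; the `title` value is invented, and nothing in this log confirms that client-side encoding is the actual cause.

```python
import json


def encode_payload(metadata):
    """Serialize metadata to UTF-8 bytes, keeping non-ASCII characters as-is."""
    return json.dumps(metadata, ensure_ascii=False).encode("utf-8")


# Hypothetical metadata containing an accented letter and a non-breaking space.
body = encode_payload({"title": "Données\u00a0brutes"})

# Declaring the charset tells the server how to decode the bytes.
headers = {"Content-Type": "application/json; charset=utf-8"}

# With the requests library, the bytes could then be sent unchanged, e.g.
# requests.post(url, data=body, headers=headers), where url is a placeholder.
```

Passing bytes rather than a text string means the request body on the wire is exactly what was encoded here.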
18:11 pdurbin oh, there's a chance this isn't a bug in Dataverse?
18:11 pameyer but if you're on your phone - probably not something you can check
18:11 pameyer pdurbin: potentially, it could be something else
18:12 * pdurbin puts his feet up
18:12 djbrooke_ joined #dataverse
18:13 RomainMPhone I'll take a look
18:13 RomainMPhone Dinner time :S
18:13 RomainMPhone Ty anyway
18:14 pdurbin thanks
18:16 pdurbin pameyer: I love https://github.com/IQSS/dataverse/pull/3329 but I hate that there are no tests for you to change to make sure they continue to pass. :/
18:16 RomainMPhone joined #dataverse
18:17 RomainMPhone Couldn't restrain myself : I do use requests
18:17 RomainMPhone (Go really away)
18:19 pdurbin donsizemore: ping!
18:35 djbrooke joined #dataverse
18:51 djbrooke joined #dataverse
19:02 djbrooke_ joined #dataverse
19:03 djbrooke_ joined #dataverse
19:04 djbrooke_ joined #dataverse
19:20 jri joined #dataverse
19:29 jri_ joined #dataverse
19:45 jri joined #dataverse
20:12 pdurbin pameyer: what time should I show up on Tuesday for the pre-demo demo?
20:14 pameyer did bmckinney email you?
20:14 pdurbin we've been slacking or whatever
20:14 pameyer ah - gotcha
20:14 pdurbin perhaps it's time to start a channel
20:17 djbrooke joined #dataverse
20:19 jri joined #dataverse
20:22 djbrooke joined #dataverse
20:26 jri_ joined #dataverse
20:27 garnett joined #dataverse
20:35 djbrooke joined #dataverse
23:03 agarnett joined #dataverse
23:17 djbrooke joined #dataverse
