IRC log for #dataverse, 2019-11-21

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
07:50 jri joined #dataverse
08:20 stefankasberger joined #dataverse
09:05 poikilotherm joined #dataverse
09:05 MrK joined #dataverse
09:33 jri joined #dataverse
11:10 jri joined #dataverse
11:13 poikilotherm Good morning America :-)
11:13 poikilotherm Welcome aboard flight DV4181
11:15 jri_ joined #dataverse
11:52 pdurbin poikilotherm: heh. Back on deck?
11:53 poikilotherm TADA :-D
11:53 pdurbin Great!
12:01 pdurbin poikilotherm: it looks like you merged the latest into https://github.com/IQSS/dataverse/pull/6365 . Are you ready for me to send it to QA?
12:01 poikilotherm Sure
12:04 pdurbin poikilotherm: done! Can you please leave a comment on the pull request about how to test the changes? kcondon was asking questions like "Is Shib affected?"
12:04 poikilotherm Sure. As soon as I finished that little thing for a colleague :-)
12:04 poikilotherm Thanks for sending me that question :-)
12:05 poikilotherm More like those around? These are easy to answer
12:12 pdurbin Um. Did you see I gave you a shout out at https://groups.google.com/d/msg/dataverse-community/uKretKox_io/4FyPVAMYBgAJ ? :)
12:21 MrK joined #dataverse
12:29 poikilotherm Yeah :-)
12:29 poikilotherm Alright, that tiny other job is done...
12:29 poikilotherm Now back to Dataverse hacking
12:30 pdurbin :)
12:36 poikilotherm Danny wrote: can you provide some guidance about areas of the code that had heavy changes and areas that you see as particularly complex and that have more risk
12:36 poikilotherm I'm puzzled
12:37 pdurbin Yeah, that's why I pinged you.
12:37 pdurbin Don't worry about it.
12:37 poikilotherm Aye
12:37 pdurbin You can just say roughly what functionality you touched.
12:37 poikilotherm That sounds like you guys really scratched your head over what I've done...
12:37 poikilotherm Ok
12:38 pdurbin QA will be testing for regressions.
12:38 poikilotherm Maybe, just maybe, this would be a good thing to express via labels or in the description of a PR, so a template thing.
12:38 pdurbin But not regressions across every bit of functionality in the app.
12:38 pdurbin Sure, sounds fine.
12:39 poikilotherm The K8s people and others these days often flag things with labels like "feature/x" and "risk/y" etc
12:40 poikilotherm https://github.com/GoogleContainerTools/skaffold/pulls
12:40 poikilotherm https://github.com/kubernetes/kubernetes/pulls
12:43 pdurbin I can't find any examples of "feature/x" or "risk/y" but I'll take your word for it. :)
12:44 poikilotherm It's not precisely what they use, but we could adapt it to such labels
12:44 pdurbin ok
12:45 pdurbin You've done a good job of making code review easy. Small diff. Tests added. Now you can try to make QA easy. "Based on what I changed, I suggest testing the following:"
12:45 poikilotherm K8s is using "feature" in a more differentiated manner like "sig" and "area"
12:46 poikilotherm That sounds really nice as a template string
12:46 poikilotherm Want me to create an issue?
12:47 pdurbin sure!
12:48 poikilotherm Or should I add a comment to https://github.com/IQSS/dataverse/issues/6226
12:49 pdurbin hmm, that would probably be better
12:57 poikilotherm Done. https://github.com/IQSS/dataverse/issues/6226#issuecomment-557073166
12:58 pdurbin looks good, thanks
12:58 poikilotherm Sure.
12:59 poikilotherm Another thing that would be very helpful for the community :-)
12:59 poikilotherm Better communication :-)
13:00 poikilotherm Should I add my comments about testing for #6365 in a comment or in the description?
13:00 poikilotherm Comments sometimes tend to get lost
13:02 pdurbin In this case I think a new comment would be best. Since there has already been some chatter in that pull request.
13:03 poikilotherm Hmm... Let's do both. I'll add it like you had it above as a section caption to the description and add a comment with pings about the changed description.
13:03 pdurbin QA tries to read all of the comments in the issue and all of the comments in the pull request but sometimes there are many, many comments (especially in issues) and what QA wants is an understanding of the latest code, the code that actually needs to be tested. Not code that has changed over and over in code review.
13:05 pdurbin (Very little of this applies in your pull request, which is small and targeted. And basically no chatter in the issue.)
13:05 poikilotherm Yeah. But I get your point
13:06 poikilotherm So might it be an idea to enforce people to flag the description as QA ready?
13:06 poikilotherm So QA doesn't have to read all the stuff in the comments?
13:06 poikilotherm If I were to do QA I would hate to scroll through all of the comments
13:07 poikilotherm Like it happened in the PR for the Microsoft OAuth2 stuff
13:12 poikilotherm I added some thoughts to https://github.com/IQSS/dataverse/issues/6226#issuecomment-557078413
13:15 pdurbin good thoughts, good additions
13:16 pdurbin In practice, Kevin usually stops by to chat with me about pull requests that I've either made or reviewed. So what's written is important but a quick chat also helps.
13:16 poikilotherm Chatting is always useful
13:17 poikilotherm Or even a video call
13:17 pdurbin yep
13:20 donsizemore joined #dataverse
14:03 MrK joined #dataverse
14:08 amay02 joined #dataverse
14:10 amay02 Quick question: for self-deposited objects is it possible for admins to mediate the metadata at a later date?
14:23 pdurbin amay02: well, a popular workflow is to allow authors the ability to create datasets and fill in as much metadata as possible and then click "Submit for Review" at which point a curator takes a look and either clicks "Return to Author" or "Publish". At any time, the curator can make edits to the dataset. You can read a bit more about this at
14:23 pdurbin http://guides.dataverse.org/en/4.18/user/dataset-management.html#submit-for-review
14:34 amay02 Thanks!
14:55 donsizemore @pdurbin good morning @poikilotherm guten Tag — do y'all have a minute to talk about #6124?
14:56 poikilotherm Sure
14:56 poikilotherm Go ahead
14:56 * poikilotherm looks at java.time.Clock and others for faking clock in unit tests, so tests don't fail anymore...
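[Editor's sketch] A minimal illustration of the java.time.Clock idea mentioned above: code that takes a Clock as a parameter can be tested with Clock.fixed, so the result does not depend on when the test runs. The class ExpiryCheck and its method isExpired are hypothetical names made up for this example; they are not taken from the Dataverse code base.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not part of Dataverse.
class ExpiryCheck {

    // "Now" comes from the injected Clock instead of Instant.now(),
    // so a test can decide what time it is.
    static boolean isExpired(Instant createdAt, Duration lifetime, Clock clock) {
        return clock.instant().isAfter(createdAt.plus(lifetime));
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2019-11-21T12:00:00Z");
        Duration lifetime = Duration.ofHours(1);

        // Two frozen clocks: one just before and one just after the expiry time.
        Clock before = Clock.fixed(created.plus(Duration.ofMinutes(59)), ZoneOffset.UTC);
        Clock after = Clock.fixed(created.plus(Duration.ofMinutes(61)), ZoneOffset.UTC);

        System.out.println(isExpired(created, lifetime, before)); // false
        System.out.println(isExpired(created, lifetime, after));  // true
    }
}
```

In production code the caller would typically pass Clock.systemUTC(); only the tests swap in a fixed clock.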
14:57 donsizemore should i go ahead and cobble http://guides.dataverse.org/en/4.17/developers/testing.html#measuring-coverage-of-integration-tests into dataverse-ansible or do we want to pursue the maven route
14:59 poikilotherm For me, running this via Maven makes most sense, as it is independent. I can reuse it in other tools like docker, k8s etc
15:00 poikilotherm But I don't know details about your plans of "cobbling it into ansible".
15:00 poikilotherm Please enlighten me with more details
15:01 donsizemore pete and phil had been doing this https://github.com/IQSS/dataverse/blob/738405892ec90d23c61774e402be9fc3fceb7bcc/doc/sphinx-guides/source/_static/util/instrument_war_jacoco.bash
15:04 poikilotherm Ok so they do offline instrumentation, right?
15:05 poikilotherm pdurbin: was there a particular reason to do so instead of using the agent variant?
15:05 donsizemore correct. and i see this https://automationrhapsody.com/code-coverage-with-jacoco-offline-instrumentation-with-maven/
15:05 poikilotherm Seems like it boils down to https://www.jacoco.org/jacoco/trunk/doc/agent.html vs https://www.jacoco.org/jacoco/trunk/doc/offline.html
15:07 pdurbin_m joined #dataverse
15:07 pdurbin_m poikilotherm: no particular reason apart from getting anything working quickly
15:08 poikilotherm pdurbin_m: did you try with the agent way and failed?
15:08 poikilotherm That might be a timesaver ;-)
15:10 pdurbin_m I did not.
15:10 pdurbin_m I confirmed that Pete's approach worked and added to his docs.
15:11 poikilotherm OK.
15:11 poikilotherm Reasons why you went with jacoco CLI for instrumentation instead of Maven target?
15:12 donsizemore the agent would be launched by a JVM option, and not something we'd want for general use warfiles, correct?
15:13 poikilotherm donsizemore: yes. It's just like I do for JRebel, JMX profiling et al
15:13 poikilotherm You need to place the agentjar at some useable place and configure domain.xml to start the JVM with the agent
15:13 poikilotherm Pretty straight forward. On the fly instrumentation.
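[Editor's sketch] The JVM option for the agent follows the pattern documented by JaCoCo: -javaagent:[yourpath/]jacocoagent.jar=[option1]=[value1],[option2]=[value2]. The small helper below only assembles that string, which would then go into a jvm-options entry in domain.xml. The class name JacocoAgentOption, the jar path /opt/jacoco/jacocoagent.jar, and the chosen option values are assumptions for illustration; output, address, port and includes are agent options listed in the JaCoCo agent documentation, and tcpserver is just one possible output mode for collecting results from a remote instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; it just formats a string.
class JacocoAgentOption {

    // Builds "-javaagent:<jar>=<key>=<value>,<key>=<value>,..."
    static String build(String agentJarPath, Map<String, String> options) {
        String joined = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
        return "-javaagent:" + agentJarPath + (joined.isEmpty() ? "" : "=" + joined);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("output", "tcpserver");   // assumed mode: agent listens for a dump request
        options.put("address", "*");          // accept connections on any local address
        options.put("port", "6300");          // JaCoCo's default port
        options.put("includes", "edu.harvard.iq.dataverse.*");

        // Assumed jar location; adjust to wherever the agent jar is placed.
        System.out.println(build("/opt/jacoco/jacocoagent.jar", options));
    }
}
```

With the default output mode (file) and a destfile option instead, the agent would write the execution data to disk when the JVM shuts down, which then has to be copied off the remote machine.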
15:13 donsizemore excellent
15:14 poikilotherm IMHO we should try that one first
15:14 poikilotherm It could save a lot of headaches
15:14 poikilotherm Like how to collect the results, etc
15:14 donsizemore i had an un-pushed branch to implement the work-around solution but i'll move it aside and fart around with the agent today
15:15 poikilotherm pdurbin_m: do we need to provide a fallback solution / keep the instrumented variant alive?
15:17 donsizemore @poikilotherm the agent doc above says "If you use the JaCoCo Ant tasks or JaCoCo Maven plug-in you don't have to care about the agent and its options directly. This is transparently handled by the them."
15:17 donsizemore @poikilotherm so all i need is the jar and the jvm option?
15:17 poikilotherm Beware
15:17 poikilotherm This will only be true for running tests "locally"
15:18 poikilotherm But you want to run tests on a remote end
15:18 poikilotherm When we would use an embedded app server, we might benefit from that
15:18 poikilotherm But you spin up everything remotely
15:19 poikilotherm This might be a good example to get inspired from: https://github.com/piczmar/maven-jacoco-remote
15:22 poikilotherm You should also think about NOT using surefire, but failsafe maven plugin for IT tests
15:22 poikilotherm https://stackoverflow.com/questions/28986005/what-is-the-difference-between-the-maven-surefire-and-maven-failsafe-plugins
15:25 pdurbin_m poikilotherm: again, I was just confirming that a solution works. I am able to get reports of API test code coverage now, through manual effort. The next step is to add it to Jenkins. :)
15:25 poikilotherm :-)
15:26 poikilotherm Yeah. :-)
15:26 poikilotherm I propose it will be the easiest way to run the agent on the remote EC2 instance, collecting the coverage report locally on the jenkins machine when running the integration tests via maven failsafe
15:28 pdurbin_m donsizemore: are you following all that? I'm still at the gym. :)
15:29 poikilotherm If donsizemore finds it easier to instrument the classes first and load them as a package to EC2, that'd be fine too.
15:29 poikilotherm Moving more stones, though
15:38 donsizemore i'm following and like piczmar's stuff on principle
15:42 pdurbin donsizemore: awesome
15:45 poikilotherm pdurbin: https://github.com/IQSS/dataverse/pull/6365#issuecomment-557143466
15:47 pdurbin poikilotherm: timeouts, huh? Are we still trying to get this into QA in the next half hour?
15:47 poikilotherm I'll go on a hunt now. Kids waiting...
15:47 poikilotherm Read you guys tomorrow
15:47 pdurbin sounds like not :)
15:48 pdurbin donsizemore: thanks for looking into the code coverage stuff. Again, I haven't been following the conversation very closely. Is there anything you need from me to help move it forward? Or anything else? Did you figure out which monitor to buy? :)
15:51 pdurbin xarthisius: I just added https://github.com/jupyterhub/binderhub/pull/969 to the agenda for a JupyterHub/Binder Community Call that's supposed to start in a little over an hour. Right now it's first on the agenda! Are you interested in joining? Details at https://discourse.jupyter.org/t/jupyterhub-binder-community-call-november-2019/2471
15:51 donsizemore @pdurbin so, workflow. who's going to run the tests? who's going to have access to the remote target? and in the spirit of babysteps/granular changes, is testing locally a good first step?
15:51 pdurbin Everyone here is welcome, of course!
15:51 donsizemore because the scope just increased (and for the better) but small changes are the name of the game
15:52 pdurbin donsizemore: I'm going to need to grab a room to call in to that Binder call anyway so would you like me to give you a call first? Just after standup? Maybe around 11:30?
16:10 donsizemore @pdurbin i'm about to grab lunch with a former boss, any chance this afternoon?
16:30 pdurbin donsizemore: sure! That actually gives me time to hunt down a Dataverse DOI to try with Binder. Can you suggest any DOIs from UNC Dataverse that have a Jupyter Notebook or Python or R?
17:15 Jim95 joined #dataverse
17:19 Jim95 @pdurbin - Do you know if MDC/counter has been tested with log entries for non-published datasets? I'm getting errors since it looks like entries from events on non-published datasets are going into the MDC log, but without an identifier, etc. which then causes counter_processor to barf.
17:19 Jim95 If that's a real issue versus some misconfig/misunderstanding of the setup on my part, I can dig into it.
17:54 Jim95 actually, looking a bit more it may be specific api calls rather than specific datasets.
18:05 Jim95 Strangely https://dataverse-dev.tdl.org/api/v1/datasets/:persistentId/versions/:latest/files is one of those calls and I don't even see an MDC logging call in that method...
18:47 pdurbin Jim95: hi! Sorry, I was just on a call about the Binderverse and haven't had lunch yet. There's free food on the floor below me that's going fast and then I'm going to take a little walk (first sunny day in quite a while). I don't think drafts should be logged. Also, can you help diagnose this TDL issue? https://groups.google.com/d/msg/dataverse-community/C-HLdPQwf70/JUNAZc2DBQAJ
18:54 donsizemore @pdurbin scaring up a DOI now
18:54 donsizemore @pdurbin in the mean time, check out Jon: https://youtu.be/ZvQQi2Z3hzI?t=441
19:00 Jim95 @pdurbin - no rush. I knew you were away. Note that I don't think it's draft stuff now - it's some (api?) calls. I see the same thing at QDR and TDL so I think it's a real issue though I'm confused at this point as to where the trigger for the MDC logging is...
19:01 Jim95 For the TDL issue - I can get you any info you need from the DB, but I don't know much about harvesting in general.
19:07 donsizemore @pdurbin R https://dataverse.unc.edu/dataset.xhtml?persistentId=doi:10.15139/S3/YCSYUN
19:36 pdurbin donsizemore: I have a good feeling this belongs on DataverseTV
19:36 pdurbin Jim95: I was sort of hoping you could look at some export files on the file system, actually. Or we can talk MDC first. Up to you. :)
19:39 pdurbin donsizemore: let's give this a try: https://mybinder.org/v2/zenodo/10.15139/S3/YCSYUN/
19:40 donsizemore it look kind of like co-ray-ray!
19:40 pdurbin heh
19:42 pdurbin donsizemore: oh, did you still want to do a quick video call?
19:43 donsizemore whenever you have time
19:43 pdurbin I have half an hour before my one on one with Danny. Please shoot me a zoom link or whatever if you're ready.
19:44 donsizemore https://unc.zoom.us/my/sizemore
19:46 donsizemore My session is taking longer than usual to start!
20:00 Jim95 @pd
20:00 Jim95 @pdurbin - what do you need me to look at on the file system?
20:06 Jim95 guessing - looks like the cached export files are from 2019-02-27 and didn't get updated when we went to 4.17 - does that help?
20:23 pdurbin Jim95: yes! That was my question... why does the UI say one date but harvesting says another?
20:24 donsizemore @pdurbin Mandy says "One thing to note (not sure if this is for Phil or just a thing to note....) but when you click on a file and it opens it up in jupyter...the Visit Repo button has a broken link in it...it looks like it is throwing zenodo into the doi which is breaking the page, so it doesn't resolve to the SPPQ Dataset record."
20:30 Jim95 OK - I haven't followed the details of the issue, but assuming this means we need to re-export, I'll go chat with TDL to get that done...
21:03 pdurbin cool
21:04 pdurbin donsizemore: oh! I didn't know there's a Visit Repo button!
21:09 yoh_ joined #dataverse
21:09 pmauduit_ joined #dataverse
21:14 pdurbin donsizemore: also, here are the HTML reports that you are already creating for us (thanks!) for *unit* test coverage: https://jenkins.dataverse.org/job/IQSS-dataverse-develop/ws/target/site/jacoco/index.html
21:16 pdurbin And if you scroll around any of the files you can see red vs yellow vs green on a line by line basis: https://jenkins.dataverse.org/job/IQSS-dataverse-develop/ws/target/site/jacoco/edu.harvard.iq.dataverse.export.dublincore/DublinCoreExportUtil.java.html
21:46 pdurbin https://dataverse.harvard.edu and https://demo.dataverse.org have been upgraded to Dataverse 4.18.1.
