IRC log for #dataverse, 2018-12-10

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
01:32 jri joined #dataverse
03:48 jri joined #dataverse
06:56 jri joined #dataverse
07:26 poikilotherm joined #dataverse
08:04 jri joined #dataverse
08:17 juancorr joined #dataverse
09:30 dataverse-user joined #dataverse
11:14 pdurbin joined #dataverse
11:32 pdurbin poikilotherm: mornin'. Are you working on an agenda for https://github.com/IQSS/dataverse/issues/5373 ? The reason I ask is that meetings without agendas make me nervous. :)
11:35 poikilotherm Good morning :-)
11:35 poikilotherm I'll answer that later ... Having some lunch right now
11:40 pdurbin breakfast here, enjoy!
12:09 poikilotherm Alright, now am ready ;-)
12:10 poikilotherm About #5373: Nope, not yet. I wanted to come up with an agenda, once I have more feedback from others
12:13 pdurbin poikilotherm: ok, is there a way I can help?
12:14 poikilotherm Oh you did already :-)
12:14 poikilotherm Thx for forwarding ;-)
12:15 pdurbin sure
12:16 pdurbin I don't see much feedback in that issue or in the Google Group.
12:16 poikilotherm Yeah.
12:17 poikilotherm I am not sure if this is a lack of interest or just a matter of not reaching the relevant people
12:18 pdurbin lack of interest, I think
12:18 poikilotherm Let's wait till Wednesday (as I wrote in the issue) and then get those who responded together
12:19 poikilotherm Err you can't tell - statistics is quite clear that you cannot infer a cause from a correlation ;-)
12:19 pdurbin if it helps, here are the GitHub usernames I have in mind: 4tikhonov aculich ataturk bricas craig-willis danmcp DirectXMan12 donsizemore joelmarkanderson landreev omaralsoudanii patrickdillon phillipross poikilotherm scolapasta thaorell vsoch xibri
12:20 pdurbin whoops, xibriz*
12:20 poikilotherm Wow, that's already quite a bunch of people!
12:21 pdurbin but I haven't contacted them
12:22 poikilotherm I'll have a look at all of these people and try to estimate whether they are still into the field of Dataverse
12:23 pdurbin Sounds good. If you have any questions about any of them, please let me know.
12:23 poikilotherm :-)
12:23 poikilotherm Thy man
12:23 poikilotherm -y+x
12:25 pdurbin I've been thinking that it would be nice if members of the community could create their own profile page. I think Drupal does a pretty good job of this. Here's a random example of a profile: https://www.drupal.org/u/jlicht
12:26 poikilotherm Just a thought: what about a Jekyll based site on Github Pages?
12:26 poikilotherm Current example: https://github.com/DE-RSE/www
12:26 poikilotherm I really like the map: https://www.de-rse.org/en/map.html
12:28 pdurbin We had a Sphinx based on at https://github.com/IQSS/dataverse.org/blob/master/docs/community/source/index.rst
12:28 pdurbin one*
12:30 pdurbin you can see the old URL at https://github.com/IQSS/dataverse.org/issues/42
12:30 poikilotherm https://jefflirion.github.io/sphinx-github-pages.html
12:34 poikilotherm But just using Jekyll might be much easier...
12:34 poikilotherm What is GDCC using for hosting?
12:35 pdurbin if you look at the bottom of http://dataversecommunity.global you'll see "powered by openscholar"
12:36 poikilotherm Oh
12:36 poikilotherm OK
12:36 poikilotherm It might be interesting to host the community devs at GDCC pages, but then this most certainly is no good option...
12:37 poikilotherm Could be tricky to get them inside... ;-)
12:37 pdurbin "host the community devs"?
12:38 pdurbin What do you mean?
12:39 poikilotherm Hosting the profile pages at GDCC
12:40 poikilotherm Rereading your post above: ok, you didn't want only devs, but all community members
12:40 pdurbin oh, but the profile pages wouldn't just be devs, any member of the community could create a profile page
12:40 poikilotherm That sounded like a good job for the consortium ;-)
12:41 pdurbin most of the people who call in to the community call are not devs
12:41 poikilotherm Yeah, that was mixed up in my head
12:41 poikilotherm Sry
12:42 pdurbin no worries
12:57 donsizemore joined #dataverse
14:07 donsizemore joined #dataverse
14:23 poikilotherm pdurbin I just added some usage instructions on #5292 :-)
14:25 poikilotherm And I wanted to drop a comment about my issues #5379 and #5378...
14:26 poikilotherm When I rebased my stuff onto latest develop, I was kind of upset that #5377 slipped through QA. Then I saw those commits and wondered about them... I hope this doesn't place too much load on kcondon...
14:43 pdurbin So many numbers. Let me get some coffee first.
14:43 poikilotherm Oh you didn't have one yet????
14:43 poikilotherm Poor Phil
14:49 pdurbin poikilotherm: usage instructions in a commit message? A better place might be a future version of http://guides.dataverse.org/en/4.9.4/developers/containers.html
14:50 pdurbin When I look at https://github.com/IQSS/dataverse/compare/develop...poikilotherm:5292-small-container my first thought is "There is no documentation in this pull request."
14:57 poikilotherm No, I added the instructions on the initial issue comment of #5292
14:57 poikilotherm Yeah, this is just a feature branch as WIP
14:58 pdurbin Sorry, branch.
14:58 poikilotherm So no docs yet, as this is not ready
14:58 pdurbin ok
14:58 poikilotherm This might change any time, when moving forward
14:59 pdurbin That was the first number above. Do you want me to keep going?
14:59 poikilotherm Sure, go ahead
15:00 donsizemore joined #dataverse
15:04 pdurbin poikilotherm: ok for the next two, you have something to say about commit messages?
15:05 poikilotherm Err - what issue are you talking about right now?
15:05 pdurbin the ones you opened about commit messages
15:06 poikilotherm Ok... :-)
15:06 poikilotherm Yeah I just stumbled over the commits of #5341 and they gave me a hard time
15:06 poikilotherm https://github.com/IQSS/dataverse/pull/5341
15:07 pdurbin merge conflicts?
15:07 poikilotherm Such a commit history is a good example where a rebase would have been appropriate... ;-)
15:08 poikilotherm No, just very tedious to try to figure out what happened where
15:08 poikilotherm Lots of noise with the merge commits
15:08 pdurbin But what problem were you having? No merge conflicts, you said.
15:09 poikilotherm I needed to track down where the Bundle imports happened to fix this for #5377
15:10 poikilotherm And most of this stuff could have just been done in one single commit
15:10 poikilotherm The merge of this PR just made the git history in develop bloated
15:11 poikilotherm And who knows what DAT-176 is?
15:11 pdurbin I agree with the bloat but do you think we should have sent the issue back to the contributor saying "please rebase"? That's not a very friendly thing to do.
15:12 poikilotherm Actually: yes. But of course in a polite way. Maybe offer some guidance about rebasing or squashing
15:12 pdurbin I wouldn't volunteer for this unfriendly task.
15:12 poikilotherm Now as this is merged, it has a negative impact on the quality of Dataverse codebase
15:13 poikilotherm Well, that should be done by QA
15:13 poikilotherm Isn't that what QA is about?
15:13 poikilotherm At least I think you guys should discuss this.
15:13 pdurbin You have a lot of opinions of how people should run their software projects. :)
15:13 poikilotherm Of course it is not your commit, but someone looking at the commits will blame IQSS for quality standards
15:14 poikilotherm Please be aware that this is just my opinion and if IQSS doesn't like it, that's just fine for me
15:14 pdurbin I don't speak for all of IQSS but I like being friendly to contributors.
15:14 poikilotherm I am just adding cents about quality standards. Sorry, influenced by my wife being a professional quality manager
15:15 poikilotherm Yeah, I like that too!
15:15 poikilotherm I always try to be polite and friendly, always seeing a positive attitude of people
15:15 poikilotherm That's why I would offer help here
15:16 poikilotherm Most certainly this has just happened due to some dev not being very experienced with git.
15:16 poikilotherm But that can be helped
15:16 pdurbin It's good feedback but I don't think you're changing my mind. Perhaps you could discuss this with contributors who are doing things wrong from your perspective. Am I doing anything wrong? I'd be happy to hear about how to improve.
15:16 poikilotherm Nope, what I have seen from IQSS people so far is perfect
15:17 pdurbin Ok, because most of us at IQSS are not in the habit of rebasing.
15:17 pdurbin The good news is that none of us "force push" either. :)
15:17 poikilotherm I don't think I am in a position to address this. I just sensed there is a lack of docs about how to do stuff when you create a PR
15:18 poikilotherm But I cannot set standard at IQSS
15:18 pdurbin I agree. The lack of docs is because we don't want to overwhelm contributors. Death by a thousand cuts.
15:18 poikilotherm Yeah
15:19 pdurbin The perspective I have on this is that the team at IQSS and the software itself is slowly maturing over the years. It takes time. I'm patient.
15:20 poikilotherm Call me advocatus diaboli: this stuff might backfire down the road. But if you guys are fine with this, just tell me and I will keep my mouth shut :-)
15:20 pdurbin This kind of feedback about code quality is good but please keep in mind that I'm the only IQSS employee who hangs out in this IRC channel. Emailing dataverse-dev would be a good way of reaching more people.
15:21 poikilotherm Yeah... That worked well last time... ;-)
15:21 pdurbin I'm not sure what to tell you. It's hard being on the outside.
15:21 poikilotherm ;-)
15:21 pdurbin I sympathize with you.
15:22 pdurbin Are you coming to the community meeting in June?
15:22 poikilotherm I just wanted to make sure that at least someone at IQSS might give this some attention and maybe, just maybe, when there is a good chance, might remember this and start a discussion at IQSS :-)
15:22 poikilotherm I don't know yet
15:22 poikilotherm Me boss is busy...
15:23 pdurbin I'm well aware that we do not have a clean git history. I have to choose my battles.
15:23 poikilotherm Yeah :-)
15:23 pameyer joined #dataverse
15:23 pdurbin Is a clean git history more important than making more features available via API (rather than GUI-only)?
15:24 poikilotherm If you ask someone from QA: yes. In the long term it enhances the project and ensures stability.
15:24 pameyer why?
15:24 poikilotherm If you ask a sales person or any feature dev: NOOOOOOO
15:25 pameyer something works or it doesn't; path-dependency seems to me to be largely irrelevant
15:25 pdurbin Making more features available via API allows them to be tested using our existing tools and process. That's more important than a clean git history to me.
15:26 poikilotherm I think truth is somewhere in the middle
15:26 poikilotherm Both are important. That is why you have people for features and people for QA
15:26 pdurbin poikilotherm: how well aligned do you feel with our strategic goals? They are listed at https://dataverse.org/goals-roadmap-and-releases
15:28 poikilotherm In an ideal world, feature devs are slowly adapting and make commits and PRs in a way that QA is satisfied quickly and the strategic goal of a mature codebase is easier to achieve
15:28 poikilotherm There is a reason why it is hard work to be a Scrum Product Owner
15:29 poikilotherm pdurbin: see point 8 of the goals
15:29 pdurbin poikilotherm: the last point :)
15:30 poikilotherm As is usually the case in research software development, things start small and tend to grow big. If this happens, you need people that ensure this is not "the last point", but gets integrated into the rest
15:30 poikilotherm This really is a very complex and hard task!
15:31 poikilotherm Just like research data management for scientists, it seems to be a task that gives the impression of "oh no, this is just more work and no win"
15:31 pdurbin poikilotherm: have you read the quote at http://guides.dataverse.org/en/4.9.4/developers/testing.html#the-health-of-a-codebase ?
15:32 poikilotherm Yeah
15:32 pdurbin there is a business case for code quality
15:32 pdurbin I added that quote to the guides.
15:32 pdurbin I'm with you.
15:32 poikilotherm And I am all in. But this quote says nothing about how to get there ;-)
15:33 pdurbin the quote is meant to be inspirational :)
15:33 pdurbin I'm not sure if it's working. :)
15:33 poikilotherm I don't think there is a good recipe to this in RSE
15:34 poikilotherm But that's why I came up with #5378 and #5379
15:34 poikilotherm ;-)
15:35 poikilotherm pameyer: what did you mean with path dep?
15:38 pameyer poikilotherm: for example, when I was hunting resource leaks I was more concerned with "are there any that will impact this performance profile" and not at all concerned with "which developer introduced them"
15:39 poikilotherm Ah this is not about "who did that" - I don't want to point at someone and blame them. A good history is all about "why was this change made?" and to understand what the circumstances were and what was tried
15:41 poikilotherm A cluttered history IMHO makes this task harder than it needs to be
15:41 pdurbin Sure. The Linux kernel maintains a very clean git history, I believe. I understand the benefits. How are you going to get the word out to more people?
15:43 pameyer my impression is that if a dev needs to go hunting through history, the code already has more accidental complexity than it should
15:43 poikilotherm Hmm. Maybe. How to avoid that? More inline comments?
15:45 pameyer it's never completely avoidable; but usually keeping a close correspondence between the code and the problem space helps
15:46 poikilotherm This leads to KISS, right?
15:46 poikilotherm And to "small changes"
15:47 pameyer not always small changes :( because if the problem is sufficiently complex, the initial overall structure may be wrong
15:50 poikilotherm Oh bummer guys, gotta go... 16:50 over here.
15:50 poikilotherm Won't be around tomorrow
15:50 poikilotherm Cu
15:51 pdurbin donsizemore: do have a moment to talk shib?
15:51 pdurbin do you*
15:55 pameyer pdurbin: isn't donsizemore upgrading today?
15:56 pdurbin yeah
15:57 donsizemore @pameyer @pdurbin Tomorrow! Tomorrow! (I'll bite ya, tomorrow...) — Bill the Cat, Bloom County
15:58 donsizemore @pdurbin what's going on with Shibboleth?
15:59 pdurbin donsizemore: it's a long story but here's the latest: https://github.com/IQSS/dataverse/issues/2122#issuecomment-444980507
16:00 donsizemore do y'all need a test system with shib?
16:01 pdurbin well, I'm installing shib on https://dev1.dataverse.org
16:01 donsizemore i can build a warfile from that branch and deploy it on dataverse-test if you want
16:01 pdurbin donsizemore: do you know if a valid ssl cert is required for shib to work or not? Especially the "TestShib" IdP?
16:05 donsizemore @pdurbin "valid" SSL certs are nearly always required. let me see if there's a flag or other setting
16:06 pdurbin that's ok, I have a valid cert
16:07 pdurbin on dev1 anyway
16:08 donsizemore what error are you getting?
16:08 pdurbin no errors yet but we don't have valid certs on the EC2 instances we spin up. so I was blocked using EC2
16:08 donsizemore it's all snake oil.
16:08 pdurbin heh
16:09 donsizemore i'm happy to deploy a warfile on dataverse-test and give you a login.
16:09 donsizemore you can even help me decode this nondescript JSON error i'm getting trying to do a test dataset import
16:10 pdurbin donsizemore: how easily can you give us a UNC login to test with? And the metadata exchange?
16:11 donsizemore the UNC login part would have to go through HR... i thought you were just testing permission lookups
16:11 donsizemore but shell access to the box would be no problem
16:12 pdurbin no worries, I'll try the TestShib IdP
16:13 donsizemore (how come y'all are running 2.5? Odum has been on 3.0 for a while now)
16:13 pdurbin Where are you seeing 2.5?
16:14 donsizemore #5369
16:14 pdurbin hmm, dunno
16:14 donsizemore there were security problems in 2.6, and the version change promised to retain backwards compatibility (for a while)
16:15 pdurbin I'm testing 2.6.1 because that's what you specified in https://github.com/IQSS/dataverse/pull/4873/files
16:16 donsizemore oh, i wondered what happened with that PR. you can kill it.
16:16 pdurbin I was already merged.
16:16 pdurbin it*
16:16 donsizemore it never showed up in the docs so i thought you all threw it away
16:17 donsizemore i say forge ahead with 3.0.2
16:17 pdurbin it showed up. here it is: http://guides.dataverse.org/en/4.9.4/installation/shibboleth.html#install-shibboleth-via-yum
16:17 pdurbin should we change it?
16:18 donsizemore absolutely. odum has run 3.0.2 for months with only deprecation warnings
16:18 donsizemore want me to submit another PR?
16:35 pdurbin donsizemore: that's ok, I already started a branch: https://github.com/IQSS/dataverse/pull/5386
16:39 donsizemore @pdurbin excellent. sorry for dropping the ball on that
16:40 pdurbin no no, it's great to hear that shib 3 has been working for you in production for months
16:40 donsizemore @pdurbin who's your resident JSON dataset :import expert? Mandy wanted me to bend the rules a little bit and I'm not sure we can
16:40 pdurbin gives us confidence that we should get everyone moved over
16:40 donsizemore @pdurbin oh, I would've sounded my Chicken Little alarm months ago had we run into problems
16:41 pdurbin donsizemore: me, I guess. What's your import question? I'm probably going to ask you to open an issue.
16:42 donsizemore I wrote some Python to read IPUMS values and spit out JSON hopefully for use with Dataverse's :import API endpoint
16:43 pdurbin ok, pameyer has done something similar
16:43 donsizemore Mandy doesn't want to specify a license but more importantly we're not specifying any files in the dataset
16:44 pameyer I don't *think* you need to specify files with the import api
16:45 pdurbin donsizemore: according to https://github.com/IQSS/dataverse/issues/3357 the default license is "None". Is that your experience?
16:46 donsizemore ah, the example said CC0; I tried leaving it blank. I'll try with 'None'
16:46 pdurbin ok
16:47 pameyer yeah - if you leave it out, you don't get a license
16:47 pameyer I think I opened an issue because I was expecting the "default license" to get applied to api stuff
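
The import discussion above (no license specified, no files in the dataset) can be sketched in Python. This is a minimal illustration, not donsizemore's script: the record keys, the helper names, and the example values are all hypothetical, and the field layout is an assumption modeled on the dataset JSON examples in the Dataverse API guide, so check it against your own installation before relying on it.

```python
import json


def primitive(type_name, value):
    """Return a single-valued primitive field in Dataverse's dataset JSON layout."""
    return {
        "typeName": type_name,
        "typeClass": "primitive",
        "multiple": False,
        "value": value,
    }


def compound(type_name, children):
    """Return a multi-valued compound field holding one group of child fields."""
    return {
        "typeName": type_name,
        "typeClass": "compound",
        "multiple": True,
        "value": [{child["typeName"]: child for child in children}],
    }


def build_import_json(record, license_name=None):
    """Turn a flat record (hypothetical keys) into dataset JSON with no files."""
    fields = [
        primitive("title", record["title"]),
        compound("author", [primitive("authorName", record["author"])]),
        compound("datasetContact",
                 [primitive("datasetContactEmail", record["contact_email"])]),
        compound("dsDescription",
                 [primitive("dsDescriptionValue", record["description"])]),
        {
            "typeName": "subject",
            "typeClass": "controlledVocabulary",
            "multiple": True,
            "value": [record["subject"]],
        },
    ]
    version = {"metadataBlocks": {"citation": {"fields": fields}}}
    # Assumption: the license sits under a "license" key. Leaving it out is
    # the case pameyer describes as "you don't get a license".
    if license_name is not None:
        version["license"] = license_name
    # No "files" key is emitted, matching a metadata-only import.
    return {"datasetVersion": version}


# Hypothetical record standing in for values read from IPUMS.
example = {
    "title": "Example IPUMS extract",
    "author": "Doe, Jane",
    "contact_email": "jane.doe@example.org",
    "description": "Metadata-only record for illustration.",
    "subject": "Social Sciences",
}
print(json.dumps(build_import_json(example), indent=2))
```

The sketch only builds the JSON body; sending it to the :import endpoint is left out on purpose, since the exact request parameters are not stated in this log.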
17:06 pdurbin donsizemore: do you have time to think about how we would improve our "ec2 spin up" script so that the instance has a valid cert on it at the end? Again, I think I need a valid cert for Shibboleth testing.
17:47 pameyer pdurbin: am I remembering right that dns for your ec2 is under amazon.com?
17:48 pameyer might be able to use letsencrypt there
17:49 pdurbin yeah, an example DNS entry when we spin up an EC2 instance is ec2-18-232-90-63.compute-1.amazonaws.com
19:27 donsizemore @pdurbin my best stab, as i see pete just said, would be letsencrypt
19:28 pdurbin donsizemore: is this something you'd actually want to work on?
19:28 pdurbin no pressure!
19:42 pdurbin pameyer: question about Docker and Dataverse and https://github.com/CDLUC3/counter-processor
19:43 pameyer pdurbin: am I supposed to know what CDLUC3 is?
19:44 pdurbin UC Curation Center... "UC3 provides digital curation, preservation and research data management at the California Digital Library."
19:44 pdurbin pameyer: actually, this is a better introduction: https://github.com/IQSS/dataverse/issues/5385
19:45 pameyer ah - gotcha
19:45 pdurbin I'm wondering if we should follow your docker-dcm pattern.
19:46 pameyer would probably make the counter image lighter weight
19:46 pameyer but if it's not a service, it might be overkill
19:47 pdurbin it's not a service
19:47 pdurbin I guess I could throw it into our Vagrant setup scripts.
19:47 pameyer I think container best practices are to have one container for each process; whether it's service or batch
19:48 pdurbin oh, it's a batch
19:48 pdurbin I mean, it's a nightly cron job. I think nightly.
19:49 pdurbin If it's a separate container, it would need a shared filesystem with the Dataverse container. Is that what "hold:/hold" is?
19:49 pameyer yeah
19:50 pdurbin from the perspective of docker-compose.yml is everything a service? I don't see "batch" anywhere.
19:51 pameyer docker-dcm doesn't have any batch stuff
19:52 pameyer the "run this thing with docker exec" is how it works for what would be a cron job
19:52 pameyer there's probably a more best practices way; but I didn't go looking for it
19:53 pdurbin ok, maybe I should stick with Vagrant for now
19:56 pdurbin the devil I know :)
19:57 xarthisius pdurbin: I don't think plain docker supports cron jobs, so the best practice is to run cron in docker container ;P
19:57 pdurbin hmm, that's good to know, thanks
19:57 pdurbin at the moment I'm just trying to install this counter-processor thing on centos 7 and trying to get it to run
19:58 xarthisius k8s has a concept of a periodic job/task, but i'm not sure
19:58 xarthisius craig would definitely know
19:59 pdurbin well, the simpler the better to start. vagrant then docker, then docker-compose, then maybe k8s
20:00 pameyer well, they've got a requirements file
20:00 pameyer I'd start with a fresh virtualenv, and go from there if necessary
20:01 pdurbin Is that what you do with DCM?
20:03 pameyer well, you can't get redis and lighttpd in a virtualenv
20:03 pameyer but for the mock dcm, yeah
20:06 pdurbin nothing about virtualenv at https://github.com/sbgrid/data-capture-module/blob/master/doc/installation.md and I probably won't bother with it either since I'm in Vagrant
20:07 pameyer doc/mock.md
20:07 pameyer but yeah - simpler is usually better
20:11 pdurbin Python 3 is required for counter-processor and while it looks like Fedora might have python3-virtualenv as an RPM, there doesn't seem to be one for el7.
20:14 pameyer I've never had problems using the same virtualenv package with python2 and python3
20:15 pdurbin oh, that's good to know
20:15 pdurbin Is there a Python 3 rpm from el7 in "base"?
20:17 pdurbin or any recommended Python 3 RPM for el7?
20:20 pameyer I'd start with whatever epel gives me
20:22 pdurbin `yum install python36` works with epel, thanks
20:25 pameyer np
20:35 pdurbin Uh oh. `yum install python36-pip` doesn't work.
20:42 pameyer odd - looks like 34 has it
20:42 pdurbin https://stackoverflow.com/questions/50408941/recommended-way-to-install-pip3-on-centos7/52518512#52518512 said to use `python3.6 -m ensurepip` and it seemed to work.
20:43 pdurbin Looks like it created /usr/local/bin/pip3
20:43 pdurbin Python 3 is not the smoothest ride on el7.
21:31 pdurbin pameyer: thanks for your help. This is what I have so far: https://github.com/IQSS/dataverse/blob/02c5538e014e1e8fd36678f754d53b19da72d8d4/scripts/vagrant/setup-counter-processor.sh
21:36 pameyer pdurbin: it may not matter, but it looks like that might mix python packages from pip with packages from yum
21:37 pdurbin "ensurepip"?
21:38 pameyer mixing packages from two different package managers in the same library directory seems orthogonal to how one of the package managers gets on the system
22:47 pdurbin Ok, I'll try to remember to pick your brain later about this. At least I got counter-processor to spit out some JSON. :)
22:49 pameyer that sounds like success
22:50 pdurbin progress
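
pameyer's suggestion to start from a fresh virtualenv, and the later concern about mixing pip and yum packages in one library directory, can be sketched with Python 3's standard `venv` module. This is only an illustration under stated assumptions: the directory name is hypothetical, `with_pip=False` is chosen so the sketch runs without network access or ensurepip, and it does not reflect what setup-counter-processor.sh actually does.

```python
import os
import tempfile
import venv


def make_isolated_env(env_dir):
    """Create a virtual environment that does not see system site-packages."""
    builder = venv.EnvBuilder(system_site_packages=False, with_pip=False)
    builder.create(env_dir)
    # pyvenv.cfg records the isolation setting for the new environment.
    return os.path.join(env_dir, "pyvenv.cfg")


# Hypothetical location; a real setup script would pick a fixed path.
env_dir = os.path.join(tempfile.mkdtemp(), "counter-processor-env")
cfg_path = make_isolated_env(env_dir)
with open(cfg_path) as cfg:
    print(cfg.read())
```

In real use one would likely pass `with_pip=True` so that packages from a requirements file install into the environment's own library directory rather than alongside the yum-managed ones.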
23:16 pdurbin heh. Adam Bien called me "Patrick" but at least he's talking about Dataverse: https://youtu.be/kaZwT3IdGZk?t=956
