Time
S
Nick
Message
01:32
jri joined #dataverse
03:48
jri joined #dataverse
06:56
jri joined #dataverse
07:26
poikilotherm joined #dataverse
08:04
jri joined #dataverse
08:17
juancorr joined #dataverse
09:30
dataverse-user joined #dataverse
11:14
pdurbin joined #dataverse
11:32
pdurbin
poikilotherm: mornin'. Are you working on an agenda for https://github.com/IQSS/dataverse/issues/5373 ? The reason I ask is that meetings without agendas make me nervous. :)
11:35
poikilotherm
Good morning :-)
11:35
poikilotherm
I'll answer that later ... Having some lunch right now
11:40
pdurbin
breakfast here, enjoy!
12:09
poikilotherm
Alright, now am ready ;-)
12:10
poikilotherm
About #5373: Nope, not yet. I wanted to come up with an agenda, once I have more feedback from others
12:13
pdurbin
poikilotherm: ok, is there a way I can help?
12:14
poikilotherm
Oh you did already :-)
12:14
poikilotherm
Thx for forwarding ;-)
12:15
pdurbin
sure
12:16
pdurbin
I don't see much feedback in that issue or in the Google Group.
12:16
poikilotherm
Yeah.
12:17
poikilotherm
I am not sure if this is a lack of interest or just a matter of not reaching the relevant people
12:18
pdurbin
lack of interest, I think
12:18
poikilotherm
Let's wait till Wednesday (as I wrote in the issue) and then get those who responded together
12:19
poikilotherm
Err you can't tell - statistics is quite clear that you cannot infer a cause from a correlation ;-)
12:19
pdurbin
if it helps, here are the GitHub usernames I have in mind: 4tikhonov aculich ataturk bricas craig-willis danmcp DirectXMan12 donsizemore joelmarkanderson landreev omaralsoudanii patrickdillon phillipross poikilotherm scolapasta thaorell vsoch xibri
12:20
pdurbin
whoops, xibriz*
12:20
poikilotherm
Wow, that's already quite a bunch of people!
12:21
pdurbin
but I haven't contacted them
12:22
poikilotherm
I'll have a look at all of these people and try to estimate whether they are still into the field of Dataverse
12:23
pdurbin
Sounds good. If you have any questions about any of them, please let me know.
12:23
poikilotherm
:-)
12:23
poikilotherm
Thy man
12:23
poikilotherm
-y+x
12:25
pdurbin
I've been thinking that it would be nice if members of the community could create their own profile page. I think Drupal does a pretty good job of this. Here's a random example of a profile: https://www.drupal.org/u/jlicht
12:26
poikilotherm
Just a thought: what about a Jekyll based site on GitHub Pages?
12:26
poikilotherm
Current example: https://github.com/DE-RSE/www
12:26
poikilotherm
I really like the map: https://www.de-rse.org/en/map.html
12:28
pdurbin
We had a Sphinx based on at https://github.com/IQSS/dataverse.org/blob/master/docs/community/source/index.rst
12:28
pdurbin
one*
12:30
pdurbin
you can see the old URL at https://github.com/IQSS/dataverse.org/issues/42
12:30
poikilotherm
https://jefflirion.github.io/sphinx-github-pages.html
12:34
poikilotherm
But just using Jekyll might be much easier...
12:34
poikilotherm
What is GDCC using for hosting?
12:35
pdurbin
if you look at the bottom of http://dataversecommunity.global you'll see "powered by openscholar"
12:36
poikilotherm
Oh
12:36
poikilotherm
OK
12:36
poikilotherm
It might be interesting to host the community devs at GDCC pages, but then this most certainly is no good option...
12:37
poikilotherm
Could be tricky to get them inside... ;-)
12:37
pdurbin
"host the community devs"?
12:38
pdurbin
What do you mean?
12:39
poikilotherm
Hosting the profile pages at GDCC
12:40
poikilotherm
Rereading your post above: ok, you didn't want only devs, but all community members
12:40
pdurbin
oh, but the profile pages wouldn't just be devs, any member of the community could create a profile page
12:40
poikilotherm
That sounded like a good job for the consortium ;-)
12:41
pdurbin
most of the people who call in to the community call are not devs
12:41
poikilotherm
Yeah, that was mixed up in my head
12:41
poikilotherm
Sry
12:42
pdurbin
no worries
12:57
donsizemore joined #dataverse
14:07
donsizemore joined #dataverse
14:23
poikilotherm
pdurbin I just added some usage instructions on #5292 :-)
14:25
poikilotherm
And I wanted to drop a comment about my issues #5379 and #5378...
14:26
poikilotherm
When I rebased my stuff onto latest develop, I was kind of upset that #5377 slipped through QA. Then I saw those commits and wondered about them... I hope this doesn't place too much load on kcondon...
14:43
pdurbin
So many numbers. Let me get some coffee first.
14:43
poikilotherm
Oh you didn't have one yet????
14:43
poikilotherm
Poor Phil
14:49
pdurbin
poikilotherm: usage instructions in a commit message? A better place might be a future version of http://guides.dataverse.org/en/4.9.4/developers/containers.html
14:50
pdurbin
When I look at https://github.com/IQSS/dataverse/compare/develop...poikilotherm:5292-small-container my first thought is "There is no documentation in this pull request."
14:57
poikilotherm
No, I added the instructions on the initial issue comment of #5292
14:57
poikilotherm
Yeah, this is just a feature branch as WIP
14:58
pdurbin
Sorry, branch.
14:58
poikilotherm
So no docs yet, as this is not ready
14:58
pdurbin
ok
14:58
poikilotherm
This might change any time, when moving forward
14:59
pdurbin
That was the first number above. Do you want me to keep going?
14:59
poikilotherm
Sure, go ahead
15:00
donsizemore joined #dataverse
15:04
pdurbin
poikilotherm: ok for the next two, you have something to say about commit messages?
15:05
poikilotherm
Err - what issue are you talking about right now?
15:05
pdurbin
the ones you opened about commit messages
15:06
poikilotherm
Ok... :-)
15:06
poikilotherm
Yeah I just stumbled over the commits of #5341 and they gave me a hard time
15:06
poikilotherm
https://github.com/IQSS/dataverse/pull/5341
15:07
pdurbin
merge conflicts?
15:07
poikilotherm
Such a commit history is a good example where a rebase would have been appropriate... ;-)
15:08
poikilotherm
No, just very tedious to try to figure out what happened where
15:08
poikilotherm
Lots of noise with the merge commits
15:08
pdurbin
But what problem were you having? No merge conflicts, you said.
15:09
poikilotherm
I needed to track down where the Bundle imports happened to fix this for #5377
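A minimal sketch of the kind of search described here, assuming git's pickaxe option (`-S`) combined with `--no-merges` to skip the merge-commit noise. The helper names `pickaxe_command` and `find_commits` are hypothetical, and `"Bundle"` is only an illustrative search term; this is not how anyone in the channel actually did it.

```python
import subprocess
from typing import List, Optional


def pickaxe_command(term: str, path: Optional[str] = None) -> List[str]:
    """Build a `git log` call listing non-merge commits that add or remove `term`."""
    # -S<string> ("pickaxe") matches commits that change the number of
    # occurrences of the string; --no-merges hides merge commits.
    cmd = ["git", "log", "--no-merges", "--oneline", f"-S{term}"]
    if path is not None:
        # Optionally limit the search to one file or directory.
        cmd += ["--", path]
    return cmd


def find_commits(term: str, repo_dir: str = ".", path: Optional[str] = None) -> List[str]:
    """Run the search in `repo_dir` and return one line per matching commit."""
    result = subprocess.run(
        pickaxe_command(term, path),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Only builds and prints the command; call find_commits() inside a clone.
    print(" ".join(pickaxe_command("Bundle")))
```

Inside a clone, `find_commits("Bundle")` would return the matching commits, which narrows the search without reading every merge commit by hand.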
15:10
poikilotherm
And most of this stuff could have just been done in one single commit
15:10
poikilotherm
The merge of this PR just made the git history in develop bloated
15:11
poikilotherm
And who knows what DAT-176 is?
15:11
pdurbin
I agree with the bloat but do you think we should have sent the issue back to the contributor saying "please rebase"? That's not a very friendly thing to do.
15:12
poikilotherm
Actually: yes. But of course in a polite way. Maybe offer some guidance about rebasing or squashing
15:12
pdurbin
I wouldn't volunteer for this unfriendly task.
15:12
poikilotherm
Now as this is merged, it has a negative impact on the quality of Dataverse codebase
15:13
poikilotherm
Well, that should be done by QA
15:13
poikilotherm
Isn't that what QA is about?
15:13
poikilotherm
At least I think you guys should discuss this.
15:13
pdurbin
You have a lot of opinions of how people should run their software projects. :)
15:13
poikilotherm
Of course it is not your commit, but someone looking at the commits will blame IQSS for quality standards
15:14
poikilotherm
Please be aware that this is just my opinion and if IQSS doesn't like it, that's just fine for me
15:14
pdurbin
I don't speak for all of IQSS but I like being friendly to contributors.
15:14
poikilotherm
I am just adding cents about quality standards. Sorry, influenced by my wife being a professional quality manager
15:15
poikilotherm
Yeah, I like that too!
15:15
poikilotherm
I always try to be polite and friendly, always seeing a positive attitude of people
15:15
poikilotherm
That's why I would offer help here
15:16
poikilotherm
Most certainly this has just happened due to some dev not being very experienced with git.
15:16
poikilotherm
But that can be helped
15:16
pdurbin
It's good feedback but I don't think you're changing my mind. Perhaps you could discuss this with contributors who are doing things wrong from your perspective. Am I doing anything wrong? I'd be happy to hear about how to improve.
15:16
poikilotherm
Nope, what I have seen from IQSS people so far is perfect
15:17
pdurbin
Ok, because most of us at IQSS are not in the habit of rebasing.
15:17
pdurbin
The good news is that none of us "force push" either. :)
15:17
poikilotherm
I don't think I am in a position to address this. I just sensed there is a lack of docs about how to do stuff when you create a PR
15:18
poikilotherm
But I cannot set standard at IQSS
15:18
pdurbin
I agree. The lack of docs is because we don't want to overwhelm contributors. Death by a thousand cuts.
15:18
poikilotherm
Yeah
15:19
pdurbin
The perspective I have on this is that the team at IQSS and the software itself is slowly maturing over the years. It takes time. I'm patient.
15:20
poikilotherm
Call me advocatus diaboli: this stuff might backfire down the road. But if you guys are fine with this, just tell me and I will keep my mouth shut :-)
15:20
pdurbin
This kind of feedback about code quality is good but please keep in mind that I'm the only IQSS employee who hangs out in this IRC channel. Emailing dataverse-dev would be a good way of reaching more people.
15:21
poikilotherm
Yeah... That worked well last time... ;-)
15:21
pdurbin
I'm not sure what to tell you. It's hard being on the outside.
15:21
poikilotherm
;-)
15:21
pdurbin
I sympathize with you.
15:22
pdurbin
Are you coming to the community meeting in June?
15:22
poikilotherm
I just wanted to make sure that at least someone at IQSS might give this some attention and maybe, just maybe, when there is a good chance, might remember this and start a discussion at IQSS :-)
15:22
poikilotherm
I don't know yet
15:22
poikilotherm
Me boss is busy...
15:23
pdurbin
I'm well aware that we do not have a clean git history. I have to choose my battles.
15:23
poikilotherm
Yeah :-)
15:23
pameyer joined #dataverse
15:23
pdurbin
Is a clean git history more important than making more features available via API (rather than GUI-only)?
15:24
poikilotherm
If you ask someone from QA: yes. In the long term it enhances the project and ensures stability.
15:24
pameyer
why?
15:24
poikilotherm
If you ask a sales person or any feature dev: NOOOOOOO
15:25
pameyer
something works or it doesn't; path-dependency seems to me to be largely irrelevant
15:25
pdurbin
Making more features available via API allows them to be tested using our existing tools and process. That's more important than a clean git history to me.
15:26
poikilotherm
I think truth is somewhere in the middle
15:26
poikilotherm
Both are important. That is why you have people for features and people for QA
15:26
pdurbin
poikilotherm: how well aligned do you feel with our strategic goals? They are listed at https://dataverse.org/goals-roadmap-and-releases
15:28
poikilotherm
In an ideal world, feature devs are slowly adapting and make commits and PRs in a way that QA is satisfied quickly and the strategic goal of a mature codebase is easier to achieve
15:28
poikilotherm
There is a reason why it is hard work to be a Scrum Product Owner
15:29
poikilotherm
pdurbin: see point 8 of the goals
15:29
pdurbin
poikilotherm: the last point :)
15:30
poikilotherm
Thing is, as most of the time in research software development, things start small and tend to grow big. If this happens, you need people that ensure this is not "the last point", but gets integrated into the rest
15:30
poikilotherm
This really is a very complex and hard task!
15:31
poikilotherm
Just like research data management for scientists, it seems to be a task that gives the impression of "oh no, this is just more work and no win"
15:31
pdurbin
poikilotherm: have you read the quote at http://guides.dataverse.org/en/4.9.4/developers/testing.html#the-health-of-a-codebase ?
15:32
poikilotherm
Yeah
15:32
pdurbin
there is a business case for code quality
15:32
pdurbin
I added that quote to the guides.
15:32
pdurbin
I'm with you.
15:32
poikilotherm
And I am all in. But this quote says nothing about how to get there ;-)
15:33
pdurbin
the quote is meant to be inspirational :)
15:33
pdurbin
I'm not sure if it's working. :)
15:33
poikilotherm
I don't think there is a good recipe to this in RSE
15:34
poikilotherm
But that's why I came up with #5378 and #5379
15:34
poikilotherm
;-)
15:35
poikilotherm
pameyer: what did you mean with path dep?
15:38
pameyer
poikilotherm: for example, when I was hunting resource leaks I was more concerned with "are there any that will impact this performance profile" and not at all concerned with "which developer introduced them"
15:39
poikilotherm
Ah this is not about "who did that" - I don't want to point at someone and blame them. A good history is all about "why was this change made?" and understanding what the circumstances were and what was tried
15:41
poikilotherm
A cluttered history IMHO makes this task harder than it needs to be
15:41
pdurbin
Sure. The Linux kernel maintains a very clean git history, I believe. I understand the benefits. How are you going to get the word out to more people?
15:43
pameyer
my impression is that if a dev needs to go hunting through history, the code already has more accidental complexity than it should
15:43
poikilotherm
Hmm. Maybe. How to avoid that? More inline comments?
15:45
pameyer
it's never completely avoidable; but usually keeping a close correspondence between the code and the problem space
15:46
poikilotherm
This leads to KISS, right?
15:46
poikilotherm
And to "small changes"
15:47
pameyer
not always small changes :( because if the problem is sufficiently complex, the initial overall structure may be wrong
15:50
poikilotherm
Oh bummer guys, gotta go... 16:50 over here.
15:50
poikilotherm
Won't be around tomorrow
15:50
poikilotherm
Cu
15:51
pdurbin
donsizemore: do have a moment to talk shib?
15:51
pdurbin
do you*
15:55
pameyer
pdurbin: isn't donsizemore upgrading today?
15:56
pdurbin
yeah
15:57
donsizemore
@pameyer @pdurbin Tomorrow! Tomorrow! (I'll bite ya, tomorrow...) — Bill the Cat, Bloom County
15:58
donsizemore
@pdurbin what's going on with Shibboleth?
15:59
pdurbin
donsizemore: it's a long story but here's the latest: https://github.com/IQSS/dataverse/issues/2122#issuecomment-444980507
16:00
donsizemore
do y'all need a test system with shib?
16:01
pdurbin
well, I'm installing shib on https://dev1.dataverse.org
16:01
donsizemore
i can build a warfile from that branch and deploy it on dataverse-test if you want
16:01
pdurbin
donsizemore: do you know if a valid ssl cert is required for shib to work or not? Especially the "TestShib" IdP?
16:05
donsizemore
@pdurbin "valid" SSL certs are nearly always required. let me see if there's a flag or other setting
16:06
pdurbin
that's ok, I have a valid cert
16:07
pdurbin
on dev1 anyway
16:08
donsizemore
what error are you getting?
16:08
pdurbin
no errors yet but we don't have valid certs on the EC2 instances we spin up. so I was blocked using EC2
16:08
donsizemore
it's all snake oil.
16:08
pdurbin
heh
16:09
donsizemore
i'm happy to deploy a warfile on dataverse-test and give you a login.
16:09
donsizemore
you can even help me decode this nondescript JSON error i'm getting trying to do a test dataset import
16:10
pdurbin
donsizemore: how easily can you give us a UNC login to test with? And the metadata exchange?
16:11
donsizemore
the UNC login part would have to go through HR... i thought you were just testing permission lookups
16:11
donsizemore
but shell access to the box would be no problem
16:12
pdurbin
no worries, I'll try the TestShib IdP
16:13
donsizemore
(how come y'all are running 2.5? Odum has been on 3.0 for a while now)
16:13
pdurbin
Where are you seeing 2.5?
16:14
donsizemore
#5369
16:14
pdurbin
hmm, dunno
16:14
donsizemore
there were security problems in 2.6, and the version change promised to retain backwards compatibility (for a while)
16:15
pdurbin
I'm testing 2.6.1 because that's what you specified in https://github.com/IQSS/dataverse/pull/4873/files
16:16
donsizemore
oh, i wondered what happened with that PR. you can kill it.
16:16
pdurbin
I was already merged.
16:16
pdurbin
it*
16:16
donsizemore
it never showed up in the docs so i thought you all threw it away
16:17
donsizemore
i say forge ahead with 3.0.2
16:17
pdurbin
it showed up. here it is: http://guides.dataverse.org/en/4.9.4/installation/shibboleth.html#install-shibboleth-via-yum
16:17
pdurbin
should we change it?
16:18
donsizemore
absolutely. odum has run 3.0.2 for months with only deprecation warnings
16:18
donsizemore
want me to submit another PR?
16:35
pdurbin
donsizemore: that's ok, I already started a branch: https://github.com/IQSS/dataverse/pull/5386
16:39
donsizemore
@pdurbin excellent. sorry for dropping the ball on that
16:40
pdurbin
no no, it's great to hear that shib 3 has been working for you in production for months
16:40
donsizemore
@pdurbin who's your resident JSON dataset :import expert? Mandy wanted me to bend the rules a little bit and I'm not sure we can
16:40
pdurbin
gives us confidence that we should get everyone moved over
16:40
donsizemore
@pdurbin oh, I would've sounded my Chicken Little alarm months ago had we run into problems
16:41
pdurbin
donsizemore: me, I guess. What's your import question? I'm probably going to ask you to open an issue.
16:42
donsizemore
I wrote some Python to read IPUMS values and spit out JSON hopefully for use with Dataverse's :import API endpoint
16:43
pdurbin
ok, pameyer has done something similar
16:43
donsizemore
Mandy doesn't want to specify a license but more importantly we're not specifying any files in the dataset
16:44
pameyer
I don't *think* you need to specify files with the import api
16:45
pdurbin
donsizemore: according to https://github.com/IQSS/dataverse/issues/3357 the default license is "None". Is that your experience?
16:46
donsizemore
ah, the example said CC0; I tried leaving it blank. I'll try with 'None'
16:46
pdurbin
ok
16:47
pameyer
yeah - if you leave it out, you don't get a license
16:47
pameyer
I think I opened an issue because I was expecting the "default license" to get applied to api stuff
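A rough sketch of the point made here, that leaving the license out of the import JSON means no license is set, and that no files need to be listed. The key names (`datasetVersion`, `license`, `metadataBlocks`, `citation`, `fields`) are assumptions based on the Dataverse dataset JSON format and should be checked against the API guide. The function name `build_dataset_json` is hypothetical, and a real dataset would need more citation fields than the single title shown.

```python
import json
from typing import Any, Dict, Optional


def build_dataset_json(title: str, license_name: Optional[str] = None) -> Dict[str, Any]:
    """Build a minimal dataset document; the license key is optional."""
    version: Dict[str, Any] = {
        "metadataBlocks": {
            "citation": {
                "displayName": "Citation Metadata",
                "fields": [
                    {
                        "typeName": "title",
                        "typeClass": "primitive",
                        "multiple": False,
                        "value": title,
                    }
                ],
            }
        }
    }
    # Assumption: omitting "license" entirely leaves the dataset without one,
    # as described in the chat; an empty string is not the same thing.
    if license_name is not None:
        version["license"] = license_name
    # Note that no "files" key is added at all.
    return {"datasetVersion": version}


if __name__ == "__main__":
    print(json.dumps(build_dataset_json("Example title"), indent=2))
```

Calling `build_dataset_json("Example title", "CC0")` would add the license key, while the default call leaves it out rather than sending a blank value.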
17:06
pdurbin
donsizemore: do you have time to think about how we would improve our "ec2 spin up" script so that the instance has a valid cert on it at the end? Again, I think I need a valid cert for Shibboleth testing.
17:47
pameyer
pdurbin: am I remembering right that dns for your ec2 is under amazon.com?
17:48
pameyer
might be able to use letsencrypt there
17:49
pdurbin
yeah, an example DNS entry when we spin up an EC2 instance is ec2-18-232-90-63.compute-1.amazonaws.com
19:27
donsizemore
@pdurbin my best stab, as i see pete just said, would be letsencrypt
19:28
pdurbin
donsizemore: is this something you'd actually want to work on?
19:28
pdurbin
no pressure!
19:42
pdurbin
pameyer: question about Docker and Dataverse and https://github.com/CDLUC3/counter-processor
19:43
pameyer
pdurbin: am I supposed to know what cdluc3 is?
19:44
pdurbin
UC Curation Center... "UC3 provides digital curation, preservation and research data management at the California Digital Library."
19:44
pdurbin
pameyer: actually, this is a better introduction: https://github.com/IQSS/dataverse/issues/5385
19:45
pameyer
ah - gotcha
19:45
pdurbin
I'm wondering if we should follow your docker-dcm pattern.
19:46
pameyer
would probably make the counter image lighter weight
19:46
pameyer
but if it's not a service, it might be overkill
19:47
pdurbin
it's not a service
19:47
pdurbin
I guess I could throw it into our Vagrant setup scripts.
19:47
pameyer
I think container best practices are to have one container for each process; whether it's service or batch
19:48
pdurbin
oh, it's a batch
19:48
pdurbin
I mean, it's a nightly cron job. I think nightly.
19:49
pdurbin
If it's a separate container, it would need a shared filesystem with the Dataverse container. Is that what "hold:/hold" is?
19:49
pameyer
yeah
19:50
pdurbin
from the perspective of docker-compose.yml is everything a service? I don't see "batch" anywhere.
19:51
pameyer
docker-dcm doesn't have any batch stuff
19:52
pameyer
the "run this thing with docker exec" is how it works for what would be a cron job
19:52
pameyer
there's probably a more best practices way; but I didn't go looking for it
19:53
pdurbin
ok, maybe I should stick with Vagrant for now
19:56
pdurbin
the devil I know :)
19:57
xarthisius
pdurbin: I don't think plain docker supports cron jobs, so the best practice is to run cron in docker container ;P
19:57
pdurbin
hmm, that's good to know, thanks
19:57
pdurbin
at the moment I'm just trying to install this counter-processor thing on centos 7 and trying to get it to run
19:58
xarthisius
k8s has a concept of a periodic job/task, but i'm not sure
19:58
xarthisius
craig would definitely know
19:59
pdurbin
well, the simpler the better to start. vagrant then docker, then docker-compose, then maybe k8s
20:00
pameyer
well, they've got a requirements file
20:00
pameyer
I'd start with a fresh virtualenv, and go from there if necessary
20:01
pdurbin
Is that what you do with DCM?
20:03
pameyer
well, you can't get redis and lighttpd in a virtualenv
20:03
pameyer
but for the mock dcm, yeah
20:06
pdurbin
nothing about virtualenv at https://github.com/sbgrid/data-capture-module/blob/master/doc/installation.md and I probably won't bother with it either since I'm in Vagrant
20:07
pameyer
doc/mock.md
20:07
pameyer
but yeah - simpler is usually better
20:11
pdurbin
Python 3 is required for counter-processor and while it looks like Fedora might have python3-virtualenv as an RPM, there doesn't seem to be one for el7.
20:14
pameyer
I've never had problems using the same virtualenv package with python2 and python3
20:15
pdurbin
oh, that's good to know
20:15
pdurbin
Is there a Python 3 rpm from el7 in "base"?
20:17
pdurbin
or any recommended Python 3 RPM for el7?
20:20
pameyer
I'd start with whatever epel gives me
20:22
pdurbin
`yum install python36` works with epel, thanks
20:25
pameyer
np
20:35
pdurbin
Uh oh. `yum install python36-pip` doesn't work.
20:42
pameyer
odd - looks like 34 has it
20:42
pdurbin
https://stackoverflow.com/questions/50408941/recommended-way-to-install-pip3-on-centos7/52518512#52518512 said to use `python3.6 -m ensurepip` and it seemed to work.
20:43
pdurbin
Looks like it created /usr/local/bin/pip3
20:43
pdurbin
Python 3 is not the smoothest ride on el7.
21:31
pdurbin
pameyer: thanks for your help. This is what I have so far: https://github.com/IQSS/dataverse/blob/02c5538e014e1e8fd36678f754d53b19da72d8d4/scripts/vagrant/setup-counter-processor.sh
21:36
pameyer
pdurbin: it may not matter, but it looks like that might mix python package from pip with packages from yum
21:37
pdurbin
"ensurepip"?
21:38
pameyer
mixing packages from two different package managers in the same library directory seems orthogonal to how one of the package managers gets on the system
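One way to keep pip-installed packages out of the yum-managed library directory is an isolated environment. The sketch below uses the standard-library `venv` module rather than the `virtualenv` package mentioned earlier in the chat, which is a substitution on my part. The helper names `in_virtualenv` and `create_env` and the example path are hypothetical.

```python
import sys
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter lives inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the system installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def create_env(target: str, with_pip: bool = True) -> Path:
    """Create an isolated environment and return the path to its interpreter."""
    # with_pip=True bootstraps pip inside the environment via ensurepip,
    # so nothing is written to the system-wide site-packages.
    venv.create(target, with_pip=with_pip)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return Path(target) / bin_dir / "python"


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
```

With something like `create_env("/tmp/counter-env")` (a made-up path), the returned interpreter could then run `-m pip install -r requirements.txt`, so the requirements land in the environment instead of next to the RPM-installed packages.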
22:47
pdurbin
Ok, I'll try to remember to pick your brain later about this. At least I got counter-processor to spit out some JSON. :)
22:49
pameyer
that sounds like success
22:50
pdurbin
progress
23:16
pdurbin
heh. Adam Bien called me "Patrick" but at least he's talking about Dataverse: https://youtu.be/kaZwT3IdGZk?t=956