Time
S
Nick
Message
01:47
xarthisius joined #dataverse
01:47
xarthisius joined #dataverse
02:23
sivoais joined #dataverse
08:29
jri joined #dataverse
08:30
poikilotherm joined #dataverse
09:03
stefankasberger joined #dataverse
09:43
MrK joined #dataverse
10:00
icarito[m] joined #dataverse
10:00
juancorr joined #dataverse
10:00
poikilotherm joined #dataverse
10:00
stefankasberger joined #dataverse
10:00
jri joined #dataverse
10:00
xarthisius joined #dataverse
10:00
bricas joined #dataverse
10:00
andrewSC joined #dataverse
10:00
pmauduit joined #dataverse
10:00
larsks joined #dataverse
10:01
poikilotherm joined #dataverse
10:01
juancorr joined #dataverse
10:01
icarito[m] joined #dataverse
10:01
Youssef_Ouahalou joined #dataverse
10:01
MrK joined #dataverse
10:01
pdurbin joined #dataverse
10:04
bjonnh joined #dataverse
10:04
JonathanNeal joined #dataverse
12:11
MrK joined #dataverse
12:26
poikilotherm
Good morning everyone :-)
12:34
pdurbin
mornin'
12:50
donsizemore joined #dataverse
12:50
poikilotherm
pdurbin: I can report that Google OpenID Connect works flawlessly
12:51
pdurbin
phew
12:53
pdurbin
I'm reading through your comments. I'm not particularly interested in a pull request that only removes star imports.
12:54
pdurbin
I am wondering about next steps for this OIDC stuff though.
12:54
pdurbin
What more is needed for https://github.com/IQSS/dataverse/issues/5974 ?
12:55
pdurbin
Is the epic over? :)
12:57
poikilotherm
That depends on what we want to achieve
12:57
poikilotherm
We do have basic support now
12:57
poikilotherm
But no groups
12:58
poikilotherm
No custom claims/attributes
12:58
pdurbin
oh
12:58
poikilotherm
No refactored JSON
12:58
poikilotherm
No verified email address support AFAIK
12:59
poikilotherm
No good tests
12:59
pdurbin
Should we add it to dataverse-ansible so we can play with it when we spin up Dataverse on EC2? I've never seen it working.
12:59
poikilotherm
Sure, why not
13:00
poikilotherm
I really love that new flexibility
13:00
poikilotherm
Add providers with configuration, not code changes
13:00
pdurbin
But is there some sort of test IdP we can use? This was the main thing I was trying to get across as I passed the pull request to QA... that we can't test it unless we have an IdP. And ideally there's a free one in the cloud we can use. Like https://samltest.id
13:00
poikilotherm
For manual testing we can always use Google or similar
13:01
poikilotherm
What I would like to see is automated testing
13:01
poikilotherm
The basic support is ok with Google
13:01
poikilotherm
But I do have mapping of groups and attributes in mind
13:01
pdurbin
Is it possible to use the new OIDC provider with https://samltest.id ? (Or is that crazy talk.)
13:02
poikilotherm
Nope.
13:03
poikilotherm
samltest.id is SAML only
13:04
poikilotherm
For automated testing we could just use a small docker based container
13:04
poikilotherm
Like keycloak
13:05
pdurbin
ok, and what about for demos?
13:06
poikilotherm
People could either use Google or the OIDC playground
13:06
poikilotherm
https://openidconnect.net/
13:08
pdurbin
Interesting. Maybe you should put https://openidconnect.net in the dev guide under a future version of http://guides.dataverse.org/en/4.18.1/developers/remote-users.html
13:09
pdurbin
If https://demo.dataverse.org were to be powered by dataverse-kubernetes some day which login options would appear for people to try out?
13:11
poikilotherm
Wouldn't that be up to you, depending on what you want to offer?
13:11
poikilotherm
It's just a matter of configuration, right?
13:13
pdurbin
I guess. But I don't know what's supported, what's automated. Let's say we spin it up fresh once a week (or once a month). Can we automate the setup of as many auth providers as possible?
13:13
poikilotherm
Sure.
13:14
poikilotherm
Dataverse on K8s is still dataverse
13:14
poikilotherm
I don't have a job yet to load the provider JSON files into Dataverse
13:14
poikilotherm
But that will happen anyway :-D
13:15
pdurbin
nice
13:15
poikilotherm
It would be really cool to have a proper configuration option for all of this
13:16
poikilotherm
Like store your configuration for the provider somewhere, but retrieve the client credentials from somewhere else because secrets...
13:16
pdurbin
Oh, I should mention that https://demo.dataverse.org is already a blessed Research & Scholarship Service Provider by InCommon, which should make automation easier. There is no need to exchange metadata as long as we use the same keys.
13:17
poikilotherm
Great
13:17
poikilotherm
For a future demo service, we could think about providing a demo IDM connected to different providers
13:17
pdurbin
You can see it here: https://incommon.org/custom/federation/info/entity.html?entityID=https%3A%2F%2Fdemo.dataverse.org%2Fsp&technical=true
13:17
poikilotherm
So we have a better showcase of what we can do...
13:17
pdurbin
Yes! Exactly! A better showcase overall.
13:18
pdurbin
I forget if dataverse-kubernetes supports https://github.com/IQSS/dataverse-sample-data or not.
13:18
pdurbin
not yet: https://github.com/IQSS/dataverse-kubernetes/issues/66
13:18
poikilotherm
Nope, it does not yet, because I need an API key...
13:19
poikilotherm
Or endpoint
13:20
pdurbin
Can you use https://github.com/IQSS/dataverse-sample-data/blob/ca7eca8d93da42ca1735551001684b34cc9a6b6b/get_api_token.py ?
13:21
poikilotherm
Err... Don't you have to use an API key for this already?
13:21
pdurbin
nope
13:21
poikilotherm
https://github.com/IQSS/dataverse-sample-data/blob/ca7eca8d93da42ca1735551001684b34cc9a6b6b/get_api_token.py#L5-L6
13:22
poikilotherm
Ok so you just pass in an empty token?
13:22
pdurbin
whoops, those lines can be deleted. you pass in a password (admin1)
13:23
poikilotherm
I could for sure use that.
13:23
poikilotherm
Would you do a refactoring?
13:24
poikilotherm
It would be awesome not to have user and password hardcoded, but to use an env var and/or a parameter instead
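(Aside, not part of the log: a minimal sketch of the suggestion above, assuming the script takes credentials from command-line flags and falls back to environment variables. The names `resolve_credentials`, `DV_USER` and `DV_PASSWORD` are hypothetical and are not taken from `get_api_token.py`; the actual token request is left out.)

```python
import argparse
import os


def resolve_credentials(argv=None, env=None):
    """Return (user, password), preferring CLI flags over environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Resolve login credentials.")
    # DV_USER and DV_PASSWORD are hypothetical variable names for this sketch.
    parser.add_argument("--user", default=env.get("DV_USER"))
    parser.add_argument("--password", default=env.get("DV_PASSWORD"))
    args = parser.parse_args(argv)
    if not args.user or not args.password:
        # Exit with a usage message instead of falling back to hardcoded values.
        parser.error("set --user/--password or DV_USER/DV_PASSWORD")
    return args.user, args.password
```

(With this shape a flag overrides the environment, so a CI job could export the two variables once while a developer overrides either one on the command line.)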
13:24
pdurbin
oh, in that silly script? sure
13:25
poikilotherm
Didn't I create a different version of the config or sth like that with such things?
13:25
pdurbin
Please be advised that you have to enable :AllowApiTokenLookupViaApi like donsizemore and I did at https://github.com/IQSS/dataverse-ansible/pull/82
13:25
poikilotherm
Something in the back of my head...
13:25
poikilotherm
Ah that undocumented and bad setting?
13:25
poikilotherm
Didn't we two talk about that a while ago?
13:26
pdurbin
I think it's documented. Yes, it is bad. :)
13:26
poikilotherm
Here you go with that thing from the back of my head https://github.com/poikilotherm/dataverse-sample-data/blob/dockerize/dvconfig.py.sample
13:26
pdurbin
docs here: http://guides.dataverse.org/en/4.18.1/installation/config.html#allowapitokenlookupviaapi
13:26
poikilotherm
I could continue where I left off...
13:27
pdurbin
Well, what's in focus right now? :)
13:28
poikilotherm
Actually I was going to create docker images for 4.17, 4.18 and 4.18.1
13:28
pdurbin
That sounds like a much higher priority. :)
13:28
poikilotherm
So I could as well make a little stop and get that thing going
13:28
pdurbin
Provides much more value.
13:28
poikilotherm
Looks pretty easy
13:29
pdurbin
gotta love that low hanging fruit
13:29
poikilotherm
Would be a cool addition for a release of the docker images... ;-)
13:29
pdurbin
Yeah. I should go look again at your milestones. I remember some cool stuff coming.
13:29
poikilotherm
As I have no access to dockerhub iqss org: maybe you could create a repo for the docker image?
13:29
poikilotherm
(And give me & dataversebot access to it?)
13:30
pdurbin
I'm pretty sure you have full access.
13:30
poikilotherm
I do? On Docker Hub?
13:30
poikilotherm
Let me check
13:30
pdurbin
If I'm wrong I can fix it.
13:30
pdurbin
No one at IQSS pushes to it. :)
13:31
poikilotherm
:-D
13:31
poikilotherm
No I have no admin access to the Docker org
13:31
poikilotherm
Just to my two repos
13:32
pdurbin
Ah, you're right. You're a member. And someone who isn't at IQSS anymore is an owner.
13:33
poikilotherm
:-D
13:33
pdurbin
dataversebot is a member
13:34
pdurbin
Ok, now you're an owner.
13:34
pdurbin
And now I feel more comfortable removing the former IQSSer so that I'm not the only owner. :)
13:35
poikilotherm
LOL
13:35
poikilotherm
I'm honoured
13:35
poikilotherm
-u
13:35
pdurbin
please make it awesome
13:35
poikilotherm
Yes Sir!
13:36
pdurbin
You've already taken us a long, long way.
13:36
poikilotherm
Oh while I look at the org page: what shall we do with that drunken ~~sailor~~ dead images?
13:37
pdurbin
Are you talking about dataverse-glassfish and dataverse-solr? The old stuff?
13:38
poikilotherm
Aye
13:38
pdurbin
Hmm, do we mention them in the guides?
13:38
poikilotherm
https://i.imgur.com/zyaqxap.png
13:38
pdurbin
Yeah, they are mentioned all over http://guides.dataverse.org/en/4.18.1/developers/containers.html
13:39
poikilotherm
Yeah that's what I have been looking at...
13:39
pdurbin
So let's rewrite that "containers" page first (probably from scratch). And *then* delete that old cruft.
13:39
poikilotherm
Right
13:39
poikilotherm
Ok back to samples.
13:40
poikilotherm
Any preferences about the image name?
13:40
pdurbin
nope
13:40
poikilotherm
iqss/sample-data-loader?
13:40
poikilotherm
iqss/sample-loader?
13:41
poikilotherm
iqss/deploy-sample-data?
13:41
pdurbin
iqss/dataverse-sample-data-loader?
13:41
pdurbin
because there's more at IQSS than just Dataverse :)
13:41
poikilotherm
Ok then let's stick with the repo name
13:41
poikilotherm
iqss/dataverse-sample-data
13:41
poikilotherm
That should be fine
13:42
poikilotherm
Sounds good?
13:44
pdurbin
ship it!
13:50
poikilotherm
:-)
15:02
poikilotherm
Almost there...
15:03
poikilotherm
The uploading and publishing is pretty slow, isn't it?
15:03
pdurbin
yeah, and it's only going to get slower as we add more data
15:03
pdurbin
more diverse data hopefully
15:04
pdurbin
from a variety of scientific fields
15:04
poikilotherm
Guesses why it is that slow?
15:04
poikilotherm
Uploading the real data is obviously limited by processing, but creating the datasets is slow, too
15:05
poikilotherm
File ingest seems to be slow... Many many locks waiting
15:06
pdurbin
Where are you running Dataverse? Within minikube on your laptop?
15:06
poikilotherm
Aye
15:07
pdurbin
Is it IO bound? CPU bound? Memory bound?
15:07
poikilotherm
I'm not sure. I doubt it's IO, this is a fast SSD
15:07
poikilotherm
The cluster has 4 GB of RAM, should be OK too
15:08
poikilotherm
So most likely CPU... I gave the VM 2 CPUs and my laptop has only 2 CPUs
15:08
poikilotherm
I dunno if ingest is using parallel processing at all
15:08
pdurbin
Maybe some day we should work on https://github.com/IQSS/dataverse/issues/4201 :)
15:10
poikilotherm
https://github.com/IQSS/dataverse-sample-data/pull/14#discussion_r354368459
15:11
poikilotherm
Should we ping donsizemore on this and ask for his opinion?
15:12
pdurbin
Well, I already created a read only team and added him to it. Then I clicked "request review" from him.
15:12
poikilotherm
Making it the way I use it now results in me calling "API_TOKEN=`python get_api_token.py` python create_sample_data.py"
15:13
poikilotherm
Great
15:13
pdurbin
yeah, that's cleaner than what I was doing which was to manually update the config file
15:15
poikilotherm
Jesus, that dataset with the "DE1_0_2008_Beneficiary_Summary_File_Sample_1.csv" files is taking AGES
15:16
poikilotherm
Ah, it's https://github.com/IQSS/dataverse-sample-data/tree/master/data/dataverses/cms/datasets/cmssampledata
15:16
poikilotherm
Seems to be huge...
15:18
donsizemore
@poikilotherm whut i do?
15:18
donsizemore
(we're bringing matthew into the fold today!)
15:20
poikilotherm
donsizemore: pdurbin is complaining that I break dataverse ansible sample data loading...
15:20
poikilotherm
See https://github.com/IQSS/dataverse-sample-data/pull/14
15:20
pdurbin
poikilotherm: I said "Do not include files larger than 10 MB" at https://github.com/IQSS/dataverse-sample-data/blob/ca7eca8d93da42ca1735551001684b34cc9a6b6b/CONTRIBUTING.md :)
15:21
poikilotherm
pdurbin: OK. Shouldn't we remove that one from the default config?
15:21
poikilotherm
or at least move them into a separate field?
15:21
poikilotherm
Whatever
15:22
poikilotherm
Meh. Error 500
15:23
pdurbin
poikilotherm: well, each file is under 10 MB, right? So that sample dataset is not in violation of the rules.
15:23
poikilotherm
OK
15:23
poikilotherm
Looks like I should enable FAKE provider by default for K8s
15:24
poikilotherm
Registration is failing... ;-)
15:24
poikilotherm
That led to an Error 500
15:24
pdurbin
poikilotherm: on a related note, you might like the graphs donsizemore sent around yesterday about the amount of time it takes to ingest various sizes of files (more and more observations or columns)
15:25
poikilotherm
I missed those... dataverse-dev mailing list?
15:25
pdurbin
it was non-SLOPI
15:28
poikilotherm
pdurbin: CMS used ZIP files...
15:28
pdurbin
a loophole? :)
15:28
poikilotherm
The unpacked data flowing through ingest is 10 files of ~15 MB
15:29
poikilotherm
https://i.imgur.com/0cGOIVe.png
15:37
pdurbin
meh, I think I'll allow it
15:37
donsizemore
do we want a dataverse-ansible flag to raise the default upload filesize limit?
15:37
pdurbin
donsizemore: couldn't hurt. What's the default?
15:39
MrK joined #dataverse
15:42
donsizemore
oh, i had assumed 10MB
15:42
donsizemore
in any case, we can make a configurable flag pretty easily
15:42
pdurbin
meh, let's wait until it's a problem :)
15:44
poikilotherm
pdurbin: you might be tempted to take a look at the new commit...
15:46
poikilotherm
What would you like me to do about docs?
15:46
pdurbin
Looking. And I don't think get_api_token.py has any docs. :)
15:47
poikilotherm
Yeah. That's why I'm asking
15:47
poikilotherm
Just add stuff to README?
15:47
pdurbin
sure, if you feel like it
15:48
poikilotherm
Feel like what? Letting down contribution quality? No way ;-)
15:48
pdurbin
:)
15:48
pdurbin
donsizemore: did you see the bit about the regex? poikilotherm is breaking our toys. :)
15:48
* poikilotherm
giggles
15:49
poikilotherm
/me feels like https://upload.wikimedia.org/wikipedia/commons/7/77/Rumplestiltskin_-_Anne_Anderson.jpg
15:56
pdurbin
poikilotherm: do you want me to talk about your pull request at standup? It's in 20 minutes. Or should we move it to "community dev" and let you and donsizemore think about it more. And me.
15:56
poikilotherm
Feel free to do as you please
15:57
poikilotherm
Finishing touches to docs
15:57
poikilotherm
This still needs the Jenkinsfile pipeline
15:57
poikilotherm
And job
15:58
MrK joined #dataverse
15:59
poikilotherm
https://github.com/poikilotherm/dataverse-sample-data/tree/13-dockerize#usage-in-automated-processes-without-api-key
15:59
poikilotherm
Feel free to leave a comment on the PR ;-)
15:59
poikilotherm
I'm outta here now...
16:00
poikilotherm
My three steel uprights are arriving in ~20 minutes :-D
16:00
poikilotherm
Read you tomorrow
16:01
pdurbin
donsizemore: lemme know when you're ready for me to bring you up to speed on the regex thing. No rush. :)
17:21
donsizemore
@pdurbin we're about to take matthew to lunch, and i see the links above, but... what broke?
17:31
pdurbin
donsizemore: nothing broke. Something will break after we merge something. Some day. Please tell Matthew I said what's up. And enjoy lunch!