IRC log for #dataverse, 2019-12-05

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
01:47 xarthisius joined #dataverse
01:47 xarthisius joined #dataverse
02:23 sivoais joined #dataverse
08:29 jri joined #dataverse
08:30 poikilotherm joined #dataverse
09:03 stefankasberger joined #dataverse
09:43 MrK joined #dataverse
10:00 icarito[m] joined #dataverse
10:00 juancorr joined #dataverse
10:00 poikilotherm joined #dataverse
10:00 stefankasberger joined #dataverse
10:00 jri joined #dataverse
10:00 xarthisius joined #dataverse
10:00 bricas joined #dataverse
10:00 andrewSC joined #dataverse
10:00 pmauduit joined #dataverse
10:00 larsks joined #dataverse
10:01 poikilotherm joined #dataverse
10:01 juancorr joined #dataverse
10:01 icarito[m] joined #dataverse
10:01 Youssef_Ouahalou joined #dataverse
10:01 MrK joined #dataverse
10:01 pdurbin joined #dataverse
10:04 bjonnh joined #dataverse
10:04 JonathanNeal joined #dataverse
12:11 MrK joined #dataverse
12:26 poikilotherm Good morning, everyone :-)
12:34 pdurbin mornin'
12:50 donsizemore joined #dataverse
12:50 poikilotherm pdurbin: I can report that Google OpenID Connect works flawlessly
12:51 pdurbin phew
12:53 pdurbin I'm reading through your comments. I'm not particularly interested in a pull request that only removes star imports.
12:54 pdurbin I am wondering about next steps for this OIDC stuff though.
12:54 pdurbin What more is needed for https://github.com/IQSS/dataverse/issues/5974 ?
12:55 pdurbin Is the epic over? :)
12:57 poikilotherm That depends on what we want to achieve
12:57 poikilotherm We do have basic support now
12:57 poikilotherm But no groups
12:58 poikilotherm No custom claims/attributes
12:58 pdurbin oh
12:58 poikilotherm No refactored JSON
12:58 poikilotherm No verified email address support AFAIK
12:59 poikilotherm No good tests
12:59 pdurbin Should we add it to dataverse-ansible so we can play with it when we spin up Dataverse on EC2? I've never seen it working.
12:59 poikilotherm Sure, why not
13:00 poikilotherm I really love that new flexibility
13:00 poikilotherm Add providers with configuration, not code changes
13:00 pdurbin But is there some sort of test IdP we can use? This was the main thing I was trying to get across as I passed the pull request to QA... that we can't test it unless we have an IdP. And ideally there's a free one in the cloud we can use. Like https://samltest.id
13:00 poikilotherm For manual testing we can always use Google or similar
13:01 poikilotherm What I would like to see is automated testing
13:01 poikilotherm The basic support is ok with Google
13:01 poikilotherm But I do have mapping of groups and attributes in mind
13:01 pdurbin Is it possible to use the new OIDC provider with https://samltest.id ? (Or is that crazy talk.)
13:02 poikilotherm Nope.
13:03 poikilotherm samltest.id is SAML only
13:04 poikilotherm For automated testing we could just use a small docker based container
13:04 poikilotherm Like keycloak
13:05 pdurbin ok, and what about for demos?
13:06 poikilotherm People could either use Google or the OIDC playground
13:06 poikilotherm https://openidconnect.net/
13:08 pdurbin Interesting. Maybe you should put https://openidconnect.net in the dev guide under a future version of http://guides.dataverse.org/en/4.18.1/developers/remote-users.html
13:09 pdurbin If https://demo.dataverse.org were to be powered by dataverse-kubernetes some day which login options would appear for people to try out?
13:11 poikilotherm Wouldn't that be up to you, depending on what you want to offer?
13:11 poikilotherm It's just a matter of configuration, right?
13:13 pdurbin I guess. But I don't know what's supported, what's automated. Let's say we spin it up fresh once a week (or once a month). Can we automate the setup of as many auth providers as possible?
13:13 poikilotherm Sure.
13:14 poikilotherm Dataverse on K8s is still dataverse
13:14 poikilotherm I don't have a job to load the provider JSON files into Dataverse yet
13:14 poikilotherm But that will happen anyway :-D
13:15 pdurbin nice
13:15 poikilotherm It would be really cool to have a proper configuration option for all of this
13:16 poikilotherm Like store your configuration for the provider somewhere, but retrieve the client credentials from somewhere else because secrets...
13:16 pdurbin Oh, I should mention that https://demo.dataverse.org is already a blessed Research & Scholarship Service Provider by InCommon, which should make automation easier. There is no need to exchange metadata as long as we use the same keys.
13:17 poikilotherm Great
13:17 poikilotherm For a future demo service, we could think about providing a demo IDM connected to different providers
13:17 pdurbin You can see it here: https://incommon.org/custom/federation/info/entity.html?entityID=https%3A%2F%2Fdemo.dataverse.org%2Fsp&technical=true
13:17 poikilotherm So we have a better showcase of what we can do...
13:17 pdurbin Yes! Exactly! A better showcase over all.
13:18 pdurbin I forget if dataverse-kubernetes supports https://github.com/IQSS/dataverse-sample-data or not.
13:18 pdurbin not yet: https://github.com/IQSS/dataverse-kubernetes/issues/66
13:18 poikilotherm Nope, it does not yet, because I need an API key...
13:19 poikilotherm Or endpoint
13:20 pdurbin Can you use https://github.com/IQSS/dataverse-sample-data/blob/ca7eca8d93da42ca1735551001684b34cc9a6b6b/get_api_token.py ?
13:21 poikilotherm Err... Don't you have to use an API key for this already?
13:21 pdurbin nope
13:21 poikilotherm https://github.com/IQSS/dataverse-sample-data/blob/ca7eca8d93da42ca1735551001684b34cc9a6b6b/get_api_token.py#L5-L6
13:22 poikilotherm Ok so you just pass in an empty token?
13:22 pdurbin whoops, those lines can be deleted. you pass in a password (admin1)
13:23 poikilotherm I could for sure use that.
13:23 poikilotherm Would you do a refactoring?
13:24 poikilotherm It would be awesome not to have user and password hardcoded, but either use an env var and/or parameter
13:24 pdurbin oh, in that silly script? sure
13:25 poikilotherm Didn't I create a different version of the config or sth like that with such things?
13:25 pdurbin Please be advised that you have to enable :AllowApiTokenLookupViaApi like donsizemore and I did at https://github.com/IQSS/dataverse-ansible/pull/82
13:25 poikilotherm Something back in my head...
13:25 poikilotherm Ah that undocumented and bad setting?
13:25 poikilotherm Didn't we two talk about that a while ago?
13:26 pdurbin I think it's documented. Yes, it is bad. :)
13:26 poikilotherm Here you go with that thing back in my head https://github.com/poikilotherm/dataverse-sample-data/blob/dockerize/dvconfig.py.sample
13:26 pdurbin docs here: http://guides.dataverse.org/en/4.18.1/installation/config.html#allowapitokenlookupviaapi
13:26 poikilotherm I could continue where I left off...
13:27 pdurbin Well, what's in focus right now? :)
13:28 poikilotherm Actually I was going to create docker images for 4.17, 4.18 and 4.18.1
13:28 pdurbin That sounds like a much higher priority. :)
13:28 poikilotherm So I could as well make a little stop and get that thing going
13:28 pdurbin Provides much more value.
13:28 poikilotherm Looks pretty easy
13:29 pdurbin gotta love that low hanging fruit
13:29 poikilotherm Would be a cool addition for a release of the docker images... ;-)
13:29 pdurbin Yeah. I should go look again at your milestones. I remember some cool stuff coming.
13:29 poikilotherm As I have no access to dockerhub iqss org: maybe you could create a repo for the docker image?
13:29 poikilotherm (And give me & dataversebot access to it?)
13:30 pdurbin I'm pretty sure you have full access.
13:30 poikilotherm I do? On Docker Hub?
13:30 poikilotherm Let me check
13:30 pdurbin If I'm wrong I can fix it.
13:30 pdurbin No one at IQSS pushes to it. :)
13:31 poikilotherm :-D
13:31 poikilotherm No I have no admin access to the Docker org
13:31 poikilotherm Just to my two repos
13:32 pdurbin Ah, you're right. You're a member. And someone who isn't at IQSS anymore is an owner.
13:33 poikilotherm :-D
13:33 pdurbin dataversebot is a member
13:34 pdurbin Ok, now you're an owner.
13:34 pdurbin And now I feel more comfortable removing the former IQSSer so that I'm not the only owner. :)
13:35 poikilotherm LOL
13:35 poikilotherm I'm honoured
13:35 poikilotherm -u
13:35 pdurbin please make it awesome
13:35 poikilotherm Yes Sir!
13:36 pdurbin You've already taken us a long, long way.
13:36 poikilotherm Oh while I look at the org page: what shall we do with that drunken ~~sailor~~ dead images?
13:37 pdurbin Are you talking about dataverse-glassfish and dataverse-solr? The old stuff?
13:38 poikilotherm Aye
13:38 pdurbin Hmm, do we mention them in the guides?
13:38 poikilotherm https://i.imgur.com/zyaqxap.png
13:38 pdurbin Yeah, they are mentioned all over http://guides.dataverse.org/en/4.18.1/developers/containers.html
13:39 poikilotherm Yeah that's what I have been looking at...
13:39 pdurbin So let's rewrite that "containers" page first (probably from scratch). And *then* delete that old cruft.
13:39 poikilotherm Right
13:39 poikilotherm Ok back to samples.
13:40 poikilotherm Any preferences about the image name?
13:40 pdurbin nope
13:40 poikilotherm iqss/sample-data-loader?
13:40 poikilotherm iqss/sample-loader?
13:41 poikilotherm iqss/deploy-sample-data?
13:41 pdurbin iqss/dataverse-sample-data-loader?
13:41 pdurbin because there's more at IQSS than just Dataverse :)
13:41 poikilotherm Ok then let's stick with the repo name
13:41 poikilotherm iqss/dataverse-sample-data
13:41 poikilotherm That should be fine
13:42 poikilotherm Sounds good?
13:44 pdurbin ship it!
13:50 poikilotherm :-)
15:02 poikilotherm Almost there...
15:03 poikilotherm The uploading and publishing is pretty slow, isn't it?
15:03 pdurbin yeah, and it's only going to get slower as we add more data
15:03 pdurbin more diverse data hopefully
15:04 pdurbin from a variety of scientific fields
15:04 poikilotherm Guesses why it is that slow?
15:04 poikilotherm Uploading the real data is obviously limited to processing, but creating the datasets is slow, too
15:05 poikilotherm File ingest seems to be slow... Many many locks waiting
15:06 pdurbin Where are you running Dataverse? Within minikube on your laptop?
15:06 poikilotherm Aye
15:07 pdurbin Is it IO bound? CPU bound? Memory bound?
15:07 poikilotherm I'm not sure. I disbelieve IO, this is a fast SSD
15:07 poikilotherm The cluster has 4 GB of RAM, should be OK too
15:08 poikilotherm So most likely CPU... I gave the VM 2 CPUs and my laptop has only 2 CPUs
15:08 poikilotherm I dunno if ingest is using parallel processing at all
15:08 pdurbin Maybe some day we should work on https://github.com/IQSS/dataverse/issues/4201 :)
15:10 poikilotherm https://github.com/IQSS/dataverse-sample-data/pull/14#discussion_r354368459
15:11 poikilotherm Should we ping donsizemore on this and ask for his opinion?
15:12 pdurbin Well, I already created a read only team and added him to it. Then I clicked "request review" from him.
15:12 poikilotherm Making it the way I use it now results in me calling "API_TOKEN=`python get_api_token.py` python create_sample_data.py"
15:13 poikilotherm Great
15:13 pdurbin yeah, that's cleaner than what I was doing which was to manually update the config file
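
[Editor's illustration] The exchange above passes the token through an API_TOKEN environment variable instead of a hand-edited config file. The log does not show how the scripts read it, so the following is only a minimal sketch of one way a script could pick the token up. The function name resolve_api_token is hypothetical and is not taken from dataverse-sample-data.

```python
import os


def resolve_api_token(environ=os.environ, var_name="API_TOKEN"):
    """Return the API token found in the environment.

    Hypothetical helper: the name and behaviour are assumptions made for
    illustration, not the actual code in dataverse-sample-data.
    """
    # Treat a missing variable and a whitespace-only value the same way.
    token = environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the loader"
        )
    return token
```

With something like this in place, the one-liner quoted above would hand the token over without any secret being written to a file under version control.
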
15:15 poikilotherm Jesus, that dataset with the "DE1_0_2008_Beneficiary_Summary_File_Sample_1.csv" files is taking AGES
15:16 poikilotherm Ah, it's https://github.com/IQSS/dataverse-sample-data/tree/master/data/dataverses/cms/datasets/cmssampledata
15:16 poikilotherm Seems to be huge...
15:18 donsizemore @poikilotherm whut i do?
15:18 donsizemore (we're bringing matthew into the fold today!)
15:20 poikilotherm donsizemore: pdurbin is complaining that I break dataverse ansible sample data loading...
15:20 poikilotherm See https://github.com/IQSS/dataverse-sample-data/pull/14
15:20 pdurbin poikilotherm: I said "Do not include files larger than 10 MB" at https://github.com/IQSS/dataverse-sample-data/blob/ca7eca8d93da42ca1735551001684b34cc9a6b6b/CONTRIBUTING.md :)
15:21 poikilotherm pdurbin: OK. Shouldn't we remove that one from the default config?
15:21 poikilotherm or at least move them into a separate field?
15:21 poikilotherm Whatever
15:22 poikilotherm Meh. Error 500
15:23 pdurbin poikilotherm: well, each file is under 10 MB, right? So that sample dataset is not in violation of the rules.
15:23 poikilotherm OK
15:23 poikilotherm Looks like I should enable FAKE provider by default for K8s
15:24 poikilotherm Registration is failing... ;-)
15:24 poikilotherm That led to an Error 500
15:24 pdurbin poikilotherm: on a related note, you might like the graphs donsizemore sent around yesterday about the amount of time it takes to ingest various sizes of files (more and more observations or columns)
15:25 poikilotherm I missed those... dataverse-dev mailinglist?
15:25 pdurbin it was non-SLOPI
15:28 poikilotherm pdurbin: CMS used ZIP files...
15:28 pdurbin a loophole? :)
15:28 poikilotherm The unpacked data flowing through ingest is 10 files of ~15 MB
15:29 poikilotherm https://i.imgur.com/0cGOIVe.png
15:37 pdurbin meh, I think I'll allow it
15:37 donsizemore do we want a dataverse-ansible flag to raise the default upload filesize limit?
15:37 pdurbin donsizemore: couldn't hurt. What's the default?
15:39 MrK joined #dataverse
15:42 donsizemore oh, i had assumed 10MB
15:42 donsizemore in any case, we can make a configurable flag pretty easily
15:42 pdurbin meh, let's wait until it's a problem :)
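
[Editor's illustration] The "loophole" discussed above is that a ZIP archive can stay under a per-file size rule while its unpacked members exceed it. As a rough sketch only, a contributor could list the uncompressed member sizes with Python's standard zipfile module before adding an archive. The function name oversized_members is hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
import zipfile

# Assumption: "10 MB" is interpreted as 10 MiB here.
LIMIT_BYTES = 10 * 1024 * 1024


def oversized_members(zip_source, limit=LIMIT_BYTES):
    """Return (name, uncompressed_size) for archive members above the limit.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            (info.filename, info.file_size)  # file_size is the unpacked size
            for info in archive.infolist()
            if not info.is_dir() and info.file_size > limit
        ]
```

An empty result would mean every unpacked file is within the limit; anything listed is a file that would be larger than the limit once it is extracted.
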
15:44 poikilotherm pdurbin: you might be tempted to take a look at the new commit...
15:46 poikilotherm What would you like me to do about docs?
15:46 pdurbin Looking. And I don't think get_api_token.py has any docs. :)
15:47 poikilotherm Yeah. That's why I'm asking
15:47 poikilotherm Just add stuff to README?
15:47 pdurbin sure, if you feel like it
15:48 poikilotherm Feel like what? Letting down contribution quality? No way ;-)
15:48 pdurbin :)
15:48 pdurbin donsizemore: did you see the bit about the regex? poikilotherm is breaking our toys. :)
15:48 * poikilotherm giggles
15:49 poikilotherm /me feels like https://upload.wikimedia.org/wikipedia/commons/7/77/Rumplestiltskin_-_Anne_Anderson.jpg
15:56 pdurbin poikilotherm: do you want me to talk about your pull request at standup? It's in 20 minutes. Or should we move it to "community dev" and let you and donsizemore think about it more. And me.
15:56 poikilotherm Feel free to do as you please
15:57 poikilotherm Finishing touches to docs
15:57 poikilotherm This still needs the Jenkinsfile pipeline
15:57 poikilotherm And job
15:58 MrK joined #dataverse
15:59 poikilotherm https://github.com/poikilotherm/dataverse-sample-data/tree/13-dockerize#usage-in-automated-processes-without-api-key
15:59 poikilotherm Feel free to leave a comment on the PR ;-)
15:59 poikilotherm I'm outta here now...
16:00 poikilotherm My three steel uprights are arriving in ~20 minutes :-D
16:00 poikilotherm Read you tomorrow
16:01 pdurbin donsizemore: lemme know when you're ready for me to bring you up to speed on the regex thing. No rush. :)
17:21 donsizemore @pdurbin we're about to take matthew to lunch, and i see the links above, but... what broke?
17:31 pdurbin donsizemore: nothing broke. Something will break after we merge something. Some day. Please tell Matthew I said what's up. And enjoy lunch!
