IRC log for #dataverse, 2019-05-23

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
03:33 jri joined #dataverse
06:33 jri joined #dataverse
06:57 jri joined #dataverse
07:00 juancorr joined #dataverse
07:08 poikilotherm joined #dataverse
07:36 jri_ joined #dataverse
09:34 skasberger joined #dataverse
09:40 stefankasberger joined #dataverse
10:03 stefankasberger I don't know. Since yesterday I've been using xchat, an IRC client for Ubuntu. Works for me.
10:27 pdurbin I think I've heard good things about xchat.
10:28 poikilotherm Using Konversation on KDE here :-)
10:29 pdurbin weechat because irssi doesn't expand the little bar to type in when you type more than one line :)
10:31 pdurbin "Now Mozilla had (and still has) two communication platforms. IRC for public community facing discussions, Slack for private, NDA’ed only discussions." https://ahal.ca/blog/2019/fragmented-communication/
12:16 jri joined #dataverse
12:20 donsizemore joined #dataverse
12:31 donsizemore @pdurbin i saw a bunch of jenkins.dataverse mentions from your testing meeting. any of those i can peck off first, and/or want to draw up a battle plan?
12:32 pdurbin donsizemore: hi! I posted some thoughts on automated testing yesterday at https://groups.google.com/d/msg/dataverse-dev/ISot5k4VjZQ/t-hzPk8tAwAJ and I'd love to hear what you think. :)
12:33 donsizemore @pdurbin yes yes that's what i saw. looks like reaching parity with phoenix is a good start
12:38 pdurbin Yes! Exactly. I know I'm being a little picky about having the nice charts. I'm sorry.
12:41 pdurbin How can I help?
12:41 donsizemore um, i'm going to run the api test suite on gdcc-jenkins (but as me) and see if i still break in the same spot
12:42 pdurbin ok, fingers crossed
12:47 donsizemore O_o unzip:  cannot find or open dvinstall.zip, dvinstall.zip.zip or dvinstall.zip.ZIP.
12:48 poikilotherm Hey donsizemore :-) What errors did you recognize yesterday? Was it again a crash-loop?
12:49 poikilotherm (When you had 8GB of mem in minikube)
12:49 donsizemore @poikilotherm give me just a minute and i'll go back and look? i posted my reply from my phone this morning
12:49 poikilotherm Thx!
12:50 donsizemore @pdurbin warfile compilation failed due to test failures
12:50 donsizemore SiteMapUtilTest.testUpdateSiteMap:91 expected null, but was:<java.io.FileNotFoundException: /tmp/sitemap.xml (No such file or directory)>
12:50 donsizemore SchemaDotOrgExporterTest.testExportDataset:347 » FileNotFound /tmp/dvjsonld.js...
12:51 pdurbin donsizemore: let me give it a try on dev2. Thanks for the heads up.
12:51 donsizemore @poikilotherm oh, i tore it all down. i'll re-run now
12:51 poikilotherm *thumb up*
12:52 donsizemore @poikilotherm now solr is in a crashloop backoff. it did this a time or two yesterday
12:53 poikilotherm WTF?
12:53 pdurbin donsizemore: I'm doing a `mvn package` on my dev2 server but at least Travis is happy with the latest commit on develop: https://travis-ci.org/IQSS/dataverse/builds/535968240
12:53 poikilotherm Did you do a minikube delete?
12:53 donsizemore @poikilotherm i can break anything
12:53 poikilotherm Otherwise there might be some stale stuff hanging around
12:53 donsizemore yes i deleted it a time or two
12:53 poikilotherm I had a problem today about the pod running as root which is absolute nonsense...
12:54 pdurbin donsizemore: BUILD SUCCESS on dev2. And Travis. ^^ I'm on this commit on dev2: f699a85 Merge pull request #5876 from IQSS/5862-add-postquestion-if-not-exists
12:54 poikilotherm Does the solr log tell you you try to run as root?
12:54 poikilotherm kubectl logs solr-...
12:54 donsizemore @poikilotherm with the virtualbox driver i ran it as unprivileged me, as kvm2 i had to run it as root
12:55 poikilotherm Yeah, but K8s should not run as root inside the VM
12:55 donsizemore oh, i remember this: /opt/solr/bin/solr: line 1452: [: unlimited: integer expression expected
12:55 poikilotherm And it should run containers as users specified in the Dockerfile
12:55 poikilotherm Yeah, I remember that too
12:55 donsizemore i may just SSH to the minikube VM and remove iqss docker images
12:56 poikilotherm Those haven't changed ;-)
12:56 poikilotherm When I got this today, I deleted the cluster (minikube delete) and redeployed the whole thing
12:56 pdurbin Are there tests for dataverse-kubernetes that we can add to https://jenkins.dataverse.org ? If so I'd be happy to create an issue for this at https://github.com/IQSS/dataverse-jenkins/issues :)
12:57 poikilotherm @pdurbin I would love to see tests, but we will need to have plenty of resources and a matrix...
12:57 poikilotherm Dataverse eats a lot of RAM
12:57 poikilotherm Plus the virtualization
12:58 donsizemore @poikilotherm we gave our Dataverse 3 VM (RHEL 6) 256GB of RAM
12:58 poikilotherm WOW
12:58 pdurbin what if for now we just run this `./test/lint.sh` thing?
12:58 donsizemore @poikilotherm our current production VM only has 64GB because that's the maximum ESXi 6.0 allows for Fault Tolerant VMs
12:58 poikilotherm Then you are definitely over the 64GB peak point of 32bit pointers in the VM :-D
12:59 donsizemore @poikilotherm a 'docker rmi' in the minikube VM got solr back in gear
12:59 poikilotherm @pdurbin: I already run those on travis.
12:59 donsizemore @poikilotherm in my experience it's harvesting that eats the memory
12:59 poikilotherm I tried with Jenkins...
12:59 poikilotherm It was a burden :-(
12:59 poikilotherm Good to know...
12:59 poikilotherm We are going to do a lot of harvesting here
13:00 poikilotherm We'll see how this is going to work out...
13:00 pdurbin poikilotherm: right but my idea (which I probably haven't expressed yet) is to gather all the testing config stuff into https://github.com/IQSS/dataverse-jenkins ... if that makes sense.
13:00 rigelk hi there - wondering what in the harvesting process takes so much memory donsizemore
13:01 donsizemore @rigelk i haven't dug into it, but harvesting runs are when the glassfish memory footprint balloons
13:02 donsizemore @rigelk it may just be cache, but if you run DV in under 24GB or so of RAM things will fall over
13:02 poikilotherm Hey pdurbin you remember our discussion about scaling...? ;-)
13:03 donsizemore @poikilotherm k8s just created all the DB tables, now i'm waiting for glassfish to say something
13:03 poikilotherm Yeah that will take a while...
13:03 poikilotherm Remember you need to bootstrap
13:04 donsizemore that's next =)
13:04 poikilotherm (I failed at that point today - for whatever reason things broke when unchanged).
13:04 rigelk thanks for the insight donsizemore
13:04 poikilotherm Might be related to Minikube update from 0.35 to 1.1.0 and Kubernetes from 1.13 to 1.14
13:05 donsizemore @rigelk it's not much insight, just observation over time. 4 requires much less RAM than 3 over time
13:05 rigelk I'm still curious as to why that happens, because 24GB is quite a requirement if that's what is needed to bootstrap
13:06 poikilotherm No no no
13:06 poikilotherm Dataverse needs about 1 GB when empty
13:06 poikilotherm That 24G was about later, when you do harvesting etc
13:07 rigelk sure, but isn't harvesting something most instances do anyway ?
13:07 poikilotherm Dunno
13:07 poikilotherm Maybe pdurbin has some insights into this?
13:10 donsizemore @rigelk harvesting must be enabled, and performing as a harvesting server is where you'll see the increase
13:10 pdurbin Harvesting is a favorite feature according to https://groups.google.com/d/msg/dataverse-community/cy6Jc0oZ-wM/1fkwgfaaAgAJ
13:11 pdurbin And here's where Dataverse installations tell each other the sets they've made available for harvesting: https://docs.google.com/spreadsheets/d/12cxymvXCqP_kCsLKXQD32go79HBWZ1vU_tdG4kvP5S8/edit?usp=sharing
13:12 donsizemore @poikilotherm bootstrapping!
13:12 poikilotherm Success?
13:12 poikilotherm Or failure?
13:13 poikilotherm pdurbin about testing: we might have a chance with minikube when using --vm-driver=none
13:13 pdurbin poikilotherm: yes, and I remember all the scaling meetings and pull requests from the Red Hat interns: https://groups.google.com/d/msg/dataverse-community/TSxf4MTYYjg/7VJB_-GJBAAJ :)
13:14 poikilotherm Or we need KVM2 or similar on Jenkins box :-D
13:14 donsizemore error. 4, in fact.
13:14 poikilotherm @donsizemore: great! Lemme see :-D
13:14 poikilotherm (Pastebin?)
13:14 pdurbin poikilotherm: no rush on the testing, I'm just trying to over communicate my future plans, as usual :)
13:14 donsizemore psql: FATAL:  password authentication failed for user "dataverse"
13:14 poikilotherm YEESSSSS
13:14 poikilotherm Same error here
13:15 rigelk thanks pdurbin - after reading the spreadsheet I'm wondering what type of OAI-PMH providers DV supports. Is the list implying DV instances can only harvest from other DV instances?
13:15 poikilotherm https://github.com/IQSS/dataverse-kubernetes/issues/60
13:15 poikilotherm @donsizemore could you leave a comment?
13:21 pdurbin rigelk: not at all. Dataverse can harvest from any OAI-PMH server. And Dataverse can be an OAI-PMH server for any client. Here's a screenshot of how it looks when you set up Dataverse as a client: https://github.com/IQSS/dataverse/issues/4318#issuecomment-487007005
13:23 pdurbin rigelk: of the 5 supported formats that can be used for harvesting, all are based on standards except one, the dataverse_json format which we invented because we needed a format that's as flexible as Dataverse when it comes to metadata. That said, we usually recommend harvesting using DDI.
13:24 pdurbin oai_ddi in that screenshot
13:24 rigelk good to know! Where could I read about the dataverse_json format?
13:25 pdurbin There's no spec. No JSON Schema. :(
13:26 rigelk ah, that's sad. It could have made for a good requirements spec for an upcoming AP object
13:26 pdurbin rigelk: here's how it looks at least: https://github.com/IQSS/dataverse/blob/v4.14/scripts/api/data/dataset-create-new-all-default-fields.json :)
13:27 pdurbin It's ugly. Let's invent a newer, better format. :)
13:27 poikilotherm Hmm at least OAI-PMH is a common industry standard
13:28 poikilotherm And it comes at night and sucks out anything you are willing to give
13:28 rigelk blank slate it is then!
13:28 pdurbin Sure but we're talking about which format to use over OAI-PMH. Dublin Core is required.
13:28 rigelk (but all formats are ugly)
13:28 poikilotherm Is Dataverse JSON running over OAI, too?
13:29 poikilotherm I understood it like it is a separate protocol & format like "download a json file"
13:29 pdurbin poikilotherm: yes, the dataverse_json format is the only way to harvest from custom metadata blocks like Pete's block for structural biology.
13:30 poikilotherm So OAI-PMH is not used for transport?
13:31 poikilotherm This is really relevant for us, as we want to harvest other repos with custom metadata to be able to have not only "short tail" datasets, but "long tail", too
13:31 poikilotherm And this will in almost all cases involve more than Dublin Core or DataCite
13:32 poikilotherm The other side won't be Dataverse in most cases, but custom software
13:32 poikilotherm Like sample databases, etc
13:32 poikilotherm But also other repos like TERENO
13:32 poikilotherm http://www.tereno.net/overview-en?set_language=en
13:33 poikilotherm TERENO for example offers metadata which is important for the community of scientists and it might get relevant to have those in Dataverse, too
13:33 pdurbin poikilotherm: like I showed in that screenshot, you can pick your poison when setting up OAI-PMH clients. Dublin Core, DDI, DataCite, OpenAIRE, Dataverse JSON. :)
13:34 poikilotherm Ok, so Dataverse JSON gets transported over OAI-PMH as a payload
13:34 pdurbin yes, payload
13:34 poikilotherm Great :-)
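Note on the exchange above: the metadata format is only the payload selected when making an OAI-PMH request. The following is a minimal Python sketch of building a standard OAI-PMH ListRecords request for a chosen format. The verb and the metadataPrefix and set arguments come from the OAI-PMH protocol, and oai_ddi and dataverse_json are the format names mentioned in this log. The base URL, the /oai path, and the set name are placeholders assumed for illustration, not values taken from the conversation.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, set_spec=None):
    """Return an OAI-PMH ListRecords URL requesting the given metadata format."""
    # "verb" and "metadataPrefix" are required by OAI-PMH for ListRecords.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # "set" is optional and restricts the harvest to one published set.
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and set name, for illustration only.
    base = "https://demo.example.org/oai"
    for prefix in ("oai_ddi", "dataverse_json"):
        print(build_list_records_url(base, prefix, set_spec="example_set"))
```

Only the metadataPrefix value changes between the two requests; the transport is OAI-PMH in both cases.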
13:34 pdurbin stefankasberger: thanks for merging my pull request into https://github.com/IQSS/dataverse/pull/5878 . :) I'll move it to QA.
13:35 poikilotherm Thx for the clarification
13:36 pdurbin sure!
13:37 pdurbin stefankasberger: oh, and thanks for commenting on https://github.com/SwissDataScienceCenter/renku-python/issues/536
14:17 donsizemore joined #dataverse
14:21 stefankasberger joined #dataverse
14:44 stefankasberger @pdurbin: regarding the hack-session on tuesday before the conference: do you know, if there are some other people interested in joining? and: can i invite some? am thinking to offer this in the pyDataverse announcement today.
14:45 poikilotherm T_T
15:04 pdurbin When Stefan comes back should I tell him I'm going to this on that Tuesday instead? https://projects.iq.harvard.edu/osshealthindex/attendee-list :/
15:26 stefankasberger joined #dataverse
15:32 pdurbin stefankasberger: welcome back! Did you see my last message?
15:34 stefankasberger yes.
15:36 stefankasberger but do i remember correctly: the plan so far is that some folks sit together and hack around at your place, right? are there already some groups or established ideas of activities?
15:36 pdurbin I'm sorry that I won't personally be very available that Tuesday. This is an opportunity for me to meet a ton of interesting people in open source, a topic I'm passionate about. You should absolutely invite other people!
15:36 pdurbin Yes, I believe a room has been reserved. I can go ask which one if you want.
15:42 donsizemore @pdurbin i shall reserve the front corner table at Dumpling House!
15:42 pdurbin do it :)
15:54 pdurbin stefankasberger: you know about the Google Doc right? We deprecated the spreadsheet.
16:06 pdurbin stefankasberger: the Google doc is linked as "Learn More" from the top of https://projects.iq.harvard.edu/dcm2019/agenda
16:08 pdurbin stefankasberger: wow, great announcement!  Announcement: pyDataverse "0.1.0 - Marietta Blau" released https://groups.google.com/d/msg/dataverse-community/qXVwSQrrtqI/iwVZ4rXQAAAJ
16:11 stefankasberger thanks for the link. i updated it a bit. it looks fine to me, because i may merge with slava if there aren't enough of us for a separate group.
16:11 pdurbin o.O
16:12 stefankasberger And please share the announcement with everyone who could be interested. And please test it!!! That's the most important thing right now. :)
16:13 pdurbin sorry, one thing at a time
16:13 pdurbin stefankasberger: there's a lot of conversation going on outside the google doc (thanks for updating it) in this "Pre Community meeting 2019 catchup opp - Archivematica & self-Ingest quality/compliance" thread: https://groups.google.com/d/msg/dataverse-community/k5cqzZXnUGE/lOY8dVwfAwAJ
16:14 pdurbin stefankasberger: for testing pyDataverse, can I add an issue to https://github.com/AUSSDA/pyDataverse to start testing it with https://jenkins.dataverse.org ?
16:30 stefankasberger yes, of course. btw: what would be the advantage of using jenkins over travis ci?
16:31 pdurbin Well, there are a few things I have in mind.
16:31 pdurbin I think I'd like https://jenkins.dataverse.org to be a sort of dashboard of software in the Dataverse ecosystem.
16:32 pdurbin These days there's a lot more than just the main Java app.
16:32 pdurbin And your new Python module certainly qualifies. :)
16:32 pdurbin Does that make sense?
16:38 stefankasberger in regards to the idea of an app market? or to test proper integration of external tools into dataverse?
16:41 pdurbin Well, I'm thinking about how often I say things like, "You should try dataverse-ansible." or "You should try dataverse-kubernetes" or "You should try dataverse-client-python" or "You should try dataverse-client-r" or "You should try dataverse-client-java." But do I know if the tests are passing for all those projects? I'd like one dashboard, I think. :)
16:41 pdurbin Is that a little more clear?
16:44 stefankasberger got you.
16:44 stefankasberger central overview makes sense.
16:45 stefankasberger can you maybe send me the dataverse logo as a vector graphic? i have an idea for the pyDataverse logo, and would like to play around a bit.
16:45 pdurbin stefankasberger: great! I didn't bother mentioning the dashboard idea in this new issue I just created but I did put my "belt and suspenders" thought: https://github.com/AUSSDA/pyDataverse/issues/6 :)
16:46 stefankasberger Would get in touch with you, when the idea is more mature. will not do/publish anything with your branding of course, without your consent.
16:46 pdurbin stefankasberger: sure, here's an SVG: https://github.com/IQSS/dataverse/blob/v4.14/doc/sphinx-guides/source/_static/dataverse_project_logo.svg
16:46 stefankasberger svg would be best, eps or ai should work too.
16:47 stefankasberger awesome. thanks.
16:47 pdurbin sure
16:58 stefankasberger here's the sketch. what do you think? https://www.dropbox.com/s/l74l9h8dnj23zvg/pydataverse-logo.svg.png?dl=0
16:59 pdurbin stefankasberger: I love it but let me share it on Slack. One sec.
17:02 pdurbin stefankasberger: no objections. Looks great! Ship it!
17:08 stefankasberger good. will go home now. it's already late here. "I'll be back."
17:08 pdurbin heh
17:48 donsizemore joined #dataverse
18:43 pdurbin donsizemore: did you see we merged https://github.com/IQSS/dataverse/pull/5861 ?
19:00 donsizemore @pdurbin i did, i did
19:00 pdurbin Can I pm you?
19:22 donsizemore sure thing
19:39 pdurbin Oh, speaking of security, I just replied on this thread: https://ask.cyberinfrastructure.org/t/what-are-best-practices-for-running-vulnerability-scanners-on-a-research-project/919
19:48 donsizemore @pdurbin no Qualys web application scanning after all. security sez too much $$$
19:49 pdurbin bah
19:50 pdurbin security wants to be free
19:50 pdurbin or was that information?
19:51 pdurbin donsizemore: I'm thinking about attempting to follow your advice at https://groups.google.com/d/msg/dataverse-dev/CTRpKg0xP2o/QsPJeoqoCwAJ to upgrade postgres to 9.6 on this stupid phoenix server. It's easy, right?
19:52 pdurbin That's weird. "rpm -qf `which psql`" shows postgresql-8.4.20-8.el6_9.x86_64
19:53 pdurbin but `psql --version` shows "psql (PostgreSQL) 9.3.25"
19:54 pdurbin oh. duh. /usr/bin/psql: symbolic link to `/usr/pgsql-9.3/bin/psql'
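Note on the three messages above: the command found on PATH can be a symbolic link to a binary from a different install, which is why the package query and the version output disagreed. The following is a minimal Python sketch of the same check, using only the standard library. The helper name resolve_executable is hypothetical and not part of any tool mentioned in this log.

```python
import os
import shutil


def resolve_executable(name, path=None):
    """Return (path found on PATH, real path after following symlinks)."""
    # shutil.which mirrors the shell's `which`: first match on the search path.
    found = shutil.which(name, path=path)
    if found is None:
        return None, None
    # os.path.realpath follows every symlink to the file actually executed.
    return found, os.path.realpath(found)


if __name__ == "__main__":
    found, real = resolve_executable("psql")
    if found is None:
        print("psql is not on PATH")
    elif found != real:
        print(f"{found} is a symlink to {real}")
    else:
        print(f"{found} is not a symlink")
```

Comparing the two returned paths shows at a glance whether the name on PATH and the binary that runs belong to the same installation.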
19:55 pdurbin donsizemore: should I do this as root or the postgres user?
21:06 donsizemore joined #dataverse
21:06 donsizemore @pdurbin if you want to keep any data, pg_dump (but this is phoenix)
21:07 donsizemore @pdurbin then just stop, disable and remove 8.4, then install 9.6
21:07 donsizemore @pdurbin (then import your tables if you want to keep any data)
21:26 pdurbin_m joined #dataverse
21:27 pdurbin_m Here's what I did: https://github.com/IQSS/dataverse/issues/5872#issuecomment-495388418
21:27 pdurbin_m Don, thanks for commenting after me already.
