
IRC log for #dataverse, 2020-02-07

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
08:23 jri joined #dataverse
08:59 pkiraly joined #dataverse
09:07 jri_ joined #dataverse
09:09 jri_ joined #dataverse
09:11 jri_ joined #dataverse
11:24 pdurbin pkiraly: any outages today? :)
11:24 poikilotherm joined #dataverse
11:24 poikilotherm Morning pdurbin
11:25 pdurbin poikilotherm: morning. Sample data still in focus? :)
11:25 poikilotherm Does your script you mention at https://github.com/IQSS/dataverse-sample-data/pull/14#issuecomment-583348684 really result in a cleaned up index and database?
11:25 pdurbin You don't trust me? :)
11:25 poikilotherm Little side project... ;-)
11:26 pkiraly pdurbin: None that I was aware of. It worked when I checked, and no user complained
11:26 poikilotherm Have my test cluster running now and found some tweaks necessary for the main project ;-)
11:26 pdurbin tweaks doesn't sound too bad
11:28 pdurbin pkiraly: have you considered creating a board for you installation? Please see https://groups.google.com/d/msg/dataverse-community/WhbkdML6Jbs/VeDdK5UfEwAJ
11:58 GitterIntegratio joined #dataverse
11:58 icarito[m] joined #dataverse
12:33 pdurbin your*
12:37 poikilotherm pdurbin: any idea how to ping @kcondon about https://github.com/IQSS/dataverse/issues/6599 ?
12:38 poikilotherm I would really like to see this resolved, as I would deploy the latest version of Solr then in our prod env...
12:43 juancorr25 joined #dataverse
12:46 pdurbin poikilotherm: that issue bricas opened? You asked "what would you like to see tested for a PR? API test suite passing? Anything else?" I'd say you could model a pull request off old pull requests for Solr upgrades.
12:47 poikilotherm Do you have an example from the top of your head?
12:47 pdurbin I'm looking at https://github.com/IQSS/dataverse/commits/develop/conf/solr
12:47 poikilotherm Master of Puppets... erh. Issues :-D
12:48 pdurbin The problem is we tend to merge one pull request, and then find some bugs, and then merge another.
12:49 pdurbin poikilotherm: this one looks pretty complete: https://github.com/IQSS/dataverse/pull/5443
12:50 poikilotherm I also found https://github.com/IQSS/dataverse/issues/4579
12:51 poikilotherm Is anyone using the docker-aio thing anymore?
12:52 poikilotherm When this is now obsolete maybe we should think about removing it, so we don't have to maintain it...
12:52 poikilotherm (Just saw it in the PRs being updated...)
12:53 poikilotherm And makes less moving parts that need testing...
12:54 poikilotherm pdurbin lately I had a crazy idea. What if we provide a vagrant file with K8s inside?
12:54 poikilotherm Less moving parts............
13:01 pdurbin_m joined #dataverse
13:03 pdurbin_m poikilotherm: docker-aio is used by docker-dcm, which is used. But maybe it could be moved to some other repo?
13:06 pdurbin_m Would Slava want it in dataverse-docker?
13:07 donsizemore joined #dataverse
13:14 poikilotherm Uh no idea
13:14 poikilotherm Maybe we should try to get DCM into Dataverse Kubernetes
13:14 poikilotherm Everything else is already present ;-)
13:17 pdurbin_m That would be great!!
13:17 pdurbin_m At a high level, we should make it easy to upgrade Solr.
13:18 poikilotherm Yeah
13:18 pdurbin_m With a pull request, I mean.
13:18 poikilotherm Right now there are a lot of places that need to be maintained, which are not part of core Dataverse
13:19 pdurbin_m yeah
13:19 pdurbin_m poikilotherm: if you get a pull request started, others can help
13:22 pdurbin_m We call it "swarming". :)
13:29 pdurbin_m poikilotherm: and sure, if you want to change the Vagrantfile to point to dataverse-kubernetes, I'm fine with that.
14:15 pdurbin I guess I was thinking I'd use that Vagrantfile for upgrading to Payara. I'm not sure how that would work if it's pointing at dataverse-kubernetes.
14:25 pdurbin donsizemore: morning. For you: https://twitter.com/philipdurbin/status/1225787075045122049
14:43 pkiraly pdurbin: sorry, this is a crazy day for me with lots of disruptions. I was not aware of the installation boards so far. Yes, we have favorite tickets, I'll define a board next week.
14:43 poikilotherm OK gotta run. Maybe see you later. If not: have a nice weekend @all
14:44 pkiraly poikilotherm: same to you!
14:44 pdurbin Have a good weekend!
14:46 pdurbin pkiraly: no worries, no rush. :)
14:53 donsizemore @pdurbin so i have sampledata doctored up (touch one thing, something downstream breaks) though i'm hitting check_dataset_lock(dataset_dbid)\n  File \"create_sample_data.py\", line 22, in check_dataset_lock\n    check_dataset_lock(dataset_dbid)\n  File \"create_sample_data.py\", line 18, in check_dataset_lock\n    locks = resp.json()['data']\n  File \"/usr/local/lib/python3.6/site-packages/requests/models.py\", line 897, in json\n    re
14:54 donsizemore @pdurbin matthew is giving a short talk about co-ray-ray here on march 23rd if you'd like to come down!
14:54 pdurbin I'd love to! :)
14:54 pdurbin those line numbers are from poikilotherm's pull request?
14:55 donsizemore "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for d
14:55 donsizemore I think it's the 5400RPM HD on my buildbox
14:55 donsizemore correct, i've pointed my dataverse-ansible branch at his fork/branch
14:56 donsizemore next to spin it up in EC2 to see if it's just slow storage
14:56 pdurbin but you're saying an exception is being thrown?
14:57 donsizemore yeah, à la the API test suite in docker
15:01 donsizemore https://github.com/IQSS/dataverse-ansible/compare/143_update_sample_data
15:01 pdurbin I'm looking at https://github.com/poikilotherm/dataverse-sample-data/blob/4ccd3d5fab7b4110a63b827e10f243cc973fe648/create_sample_data.py#L22
15:02 pdurbin donsizemore: is there more in that stacktrace? It's hard for me to tell where it's failing.
15:05 donsizemore gimme a second. ima convert the commas to carriage returns and send you the result in slack? (lest i junk up the IRC logs here)
15:07 pdurbin sounds fine
15:22 pdurbin donsizemore: it seems like we're getting JSONDecodeError from https://github.com/poikilotherm/dataverse-sample-data/blob/4ccd3d5fab7b4110a63b827e10f243cc973fe648/create_sample_data.py#L18 ... which is this... locks = resp.json()['data']
15:22 pdurbin but why I don't know
15:24 pdurbin donsizemore: what if you limit the amount of datasets that are created? Maybe start with just one? Is it clear how to do that? It's in the config.
15:34 donsizemore I got a lot of non-JSON responses out of Dataverse trying to upload those 20GB datafiles for Thu-Mai
15:34 donsizemore I'm fairly certain it's just my slow hard drive here, as the sampledata run took forever
15:35 pdurbin hmm, ok
15:35 pdurbin not sure how to fix that :)
15:35 * pdurbin hands donsizemore a faster hard drive
15:35 donsizemore but i'll tag dvconfig.py down to one dataset json file and see what happens
15:35 pdurbin should we add more sleep?
15:35 pdurbin sounds good, thanks
15:36 donsizemore on the one finger, you'd want to accommodate folks with slower storage. on the other finger, i'll gladly accept an SSD
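
The failure discussed above is `locks = resp.json()['data']` raising a JSON decode error when Dataverse returns a non-JSON body, and the open question is whether to "add more sleep". The following is only a minimal sketch of one way to make such a polling loop tolerant of slow storage; it is not the code in create_sample_data.py. The names `locks_from_body`, `wait_until_unlocked` and `fetch_body` are hypothetical, and the sketch assumes the locks response is a JSON object with a `data` list, as the quoted line suggests.

```python
import json
import time


def locks_from_body(body_text):
    """Return the lock list from a response body, or None if it is unusable."""
    try:
        payload = json.loads(body_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("data")


def wait_until_unlocked(fetch_body, max_attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until an empty lock list is reported; give up after max_attempts.

    fetch_body is a hypothetical callable that returns the raw response text
    of the dataset locks request (for example, resp.text from requests).
    """
    for attempt in range(1, max_attempts + 1):
        locks = locks_from_body(fetch_body())
        if locks == []:
            return attempt  # number of polls that were needed
        # A non-JSON body (None) and a non-empty lock list are both
        # treated as "not ready yet", so we wait and try again.
        sleep(delay)
    raise TimeoutError(
        f"dataset still locked or unreadable after {max_attempts} attempts"
    )
```

Bounding the loop rather than recursing keeps a slow machine from looping forever, and treating a non-JSON body as "try again" avoids the exception seen in the traceback. Whether retrying is the right response to a non-JSON body is a judgment call that the conversation leaves open.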
15:49 donsizemore testing with only open-source-at-harvard.json enabled
15:51 pdurbin and? :)
15:55 poikilotherm Some datasets gave me a hard time, too.
15:55 poikilotherm I didn't track down which were stressing us
15:56 poikilotherm Maybe I can investigate later, but if donsizemore volunteers I don't mind either :-D
15:57 pdurbin I'm not sure why this is high priority but who am I to judge. :)
15:57 poikilotherm To be useful, dataverse-sample-data should provide at least one configuration that deploys within max 10 secs for tests. Ideally even faster.
15:58 poikilotherm Slava will kill us if loading takes minutes :-D
15:59 * pdurbin hides
16:19 donsizemore @pdurbin i just wanted the ansible role to be ready for @poikilotherm's PR
16:19 donsizemore @pdurbin Slava's selenium stuff is next on my list
16:32 pdurbin donsizemore: cool. If your ears were burning at standup we talked about https://github.com/IQSS/dataverse/issues/6510
16:33 pdurbin Is that something you're helping with?
17:51 jri joined #dataverse
18:06 donsizemore @pdurbin yup with only one sample dataset                    : ok=143  changed=105  unreachable=0    failed=0
18:06 pdurbin it works? good
18:09 pdurbin donsizemore: anything to add for #6510? I'm not sure what they were talking about at standup.
18:19 donsizemore @pdurbin i can ask akio for input / feedback. your paste above is the first time i remember seeing #6510
18:20 pdurbin Huh. Ok. Maybe I have the wrong issue. Would you be able to ping Leonid about it in Slack?
18:25 jri joined #dataverse
18:30 donsizemore @pdurbin he was probably talking about the duplicate dvobject entries for datafiles
18:30 donsizemore @pdurbin i'm pushing for one of our archivists to handle the clean-up in the web interface, but they're wisely ignoring me (for now)
18:31 pdurbin oh, duplicate dvobjects, ok
18:35 pdurbin nice note from Jim about GDCC updates, Google summer of code, etc: https://groups.google.com/d/msg/dataverse-community/zIi7Grycav4/relzlpMKAgAJ
18:36 pdurbin Lots of ideas! https://tinyurl.com/GDCC-GSoC-Ideas ... Did we forget any? :)
19:32 poikilotherm Good evening gentlemen
19:32 poikilotherm Here I am and "I solemnly swear that I'm up to no good."
19:32 poikilotherm donsizemore: you still around?
19:32 poikilotherm Should I poke anything with a stick in dataverse-sample-data?
19:37 * pdurbin hands poikilotherm a stick
19:46 * poikilotherm tries to drill inside and looks for termites
19:47 donsizemore @poikilotherm I gave it my usual make-it-work stab, if you want to check out the PR?
19:47 pdurbin donsizemore: I only took a quick look. Sorry. Doing many things.
19:47 poikilotherm https://github.com/IQSS/dataverse-ansible/pull/145/files ?
19:47 donsizemore @poikilotherm yis
19:50 poikilotherm That looks indeed promising
19:51 poikilotherm So we're good with the exception that we should take a closer look at these sample datasets taking ages to load?
19:57 donsizemore @poikilotherm on my part i blame my hand-me-down buildbox that's a 6(?) year-old desktop with a 5400rpm HD
19:59 poikilotherm I had some troubles with some of the datasets
19:59 poikilotherm And I'm working on FAST hardware
19:59 poikilotherm New laptop... ;-)
20:09 pdurbin mmm, new laptop
20:14 donsizemore @pdurbin on doing many things: i just don't want the bottleneck to be me!
20:14 pdurbin :)
20:14 pdurbin everybody loves sample data all of a sudden :)
20:18 poikilotherm pdurbin: how about another crazy idea
20:22 pdurbin hit me
20:24 poikilotherm You remember my idea about one day we have kind of a dataverse cli app?
20:24 poikilotherm What if we start small in dataverse-sample-data
20:25 poikilotherm Reshape this collection of simple scripts into a slightly more advanced thing: a cli tool with commands etc
20:27 poikilotherm Slava already mentioned crazy ideas about loading your own sample data from an arbitrary place
20:27 pdurbin I'm fine with a Dataverse CLI app. We already have https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader for example.
20:27 poikilotherm There might be a demand for a nice and simple cli tool
20:27 pdurbin dataverse-ansible already supports arbitrary URLS (forks) of sample data. (thanks, donsizemore)
20:30 poikilotherm If I were to start such a thing, should we do it in dataverse-sample-data or should we go big and create a full-fledged repo?
20:30 donsizemore @pdurbin on selenium. got a JS error out of selenium-side-runner, but i was running it without sample-data for brevity. trying again with full sample data (which will take longer)
20:33 pdurbin donsizemore: oh, did some actual tests get pushed?
20:33 pdurbin poikilotherm: I'm fine with a new repo. What language are you going to use?
20:33 poikilotherm Python
20:33 poikilotherm More hackable
20:34 poikilotherm Building on beautiful pyDataverse
20:34 pdurbin Should it go in the pyDataverse repo?
20:35 poikilotherm No, I think this should be on its own
20:35 donsizemore @pdurbin not knowing anything about selenium, i blindly pointed it at demo-dataverse.side, after preening the "urls" line
20:35 donsizemore @pdurbin so, eh, the framework is there in dataverse-ansible.
20:36 poikilotherm If it goes big, pyDataverse is a building block
20:36 donsizemore @poikilotherm it will need a good name
20:36 pdurbin donsizemore: oh. that demo-dataverse.side JSON file. Interesting. I've barely used Selenium.
20:37 pdurbin poikilotherm: want me to create a repo? dataverse-cli?
20:37 poikilotherm donsizemore: any suggestions?
20:38 poikilotherm pdurbin: let me start some experiments in github.com/poikilotherm first...
20:39 poikilotherm Transferring later is easy
20:44 pdurbin sure
20:55 pdurbin donsizemore: I'm pretty excited about browser-based automated tests so if I can help at all, please let me know.
20:58 poikilotherm donsizemore pdurbin what about an easy and short name like dvcli
21:01 Slava63 joined #dataverse
21:02 pdurbin Well sure, we can call `dvcli` from the command line but I like dataverse-* for GitHub repos under IQSS.
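
The idea floated above is to reshape a collection of simple scripts into "a cli tool with commands", written in Python and callable as `dvcli`. As a rough illustration of what that shape could look like, here is a minimal sketch using only the standard library's argparse. The subcommands `load` and `wipe` and the options `--base-url`, `--limit` and `--yes` are all hypothetical, invented for this example. None of them come from the conversation or from any existing tool, and the handlers only describe what they would do.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per task (all names hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="dvcli", description="Sketch of a Dataverse command line tool"
    )
    parser.add_argument("--base-url", default="http://localhost:8080",
                        help="installation to talk to (assumed default)")
    commands = parser.add_subparsers(dest="command", required=True)

    load = commands.add_parser("load", help="load sample data from a source")
    load.add_argument("source", help="path or URL of the sample data")
    load.add_argument("--limit", type=int, default=None,
                      help="load at most this many datasets")

    wipe = commands.add_parser("wipe", help="remove previously loaded data")
    wipe.add_argument("--yes", action="store_true",
                      help="skip the confirmation prompt")
    return parser


def main(argv=None):
    """Parse arguments and return a description of the requested action."""
    args = build_parser().parse_args(argv)
    if args.command == "load":
        limit = "all" if args.limit is None else str(args.limit)
        return f"would load {limit} dataset(s) from {args.source} into {args.base_url}"
    confirmed = "confirmed" if args.yes else "unconfirmed"
    return f"would wipe sample data on {args.base_url} ({confirmed})"


if __name__ == "__main__":
    print(main())
```

With this layout, each script in the existing collection could become one subcommand, and a library such as pyDataverse could sit behind the handlers as the building block mentioned above.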
21:02 Slava63 Hi guys, I'm jumping in the discussion about Selenium stuff. We've experimented a lot with it and there are some things you should know.
21:03 pdurbin Slava!
21:03 poikilotherm OMG SLAVA
21:03 poikilotherm :-D
21:04 Slava63 It's very easy to get Selenium IDE plugin for Google Chrome and Firefox and play .side file there. But this file is suitable for Jenkins pipelines as well.
21:05 pdurbin oh good
21:05 Slava63 Look, Google is directing Selenium community to CI/CD integrations, it's just fantastic development. I'm watching for a long time and I think Selenium IDE will be fully ready for CI/CD in the middle of the year.
21:06 pdurbin nice
21:06 Slava63 We've experimented with demo.dataverse.nl server but I've discovered after running all Selenium tests we need to wipe it out and restore to the same state again to be able to reproduce bugs.
21:08 poikilotherm Slava63: I'm starting https://github.com/poikilotherm/dvcli for these tasks
21:08 Slava63 That's why I've started to move datasets from samples data repository that should be installed before every Selenium run and inside of Jenkins pipeline in the future.
21:09 poikilotherm Those things you wished for in the PR are beyond the scope of dataverse-sample-data but are a perfect fit for an idea I had a while ago...
21:09 Slava63 Don, you need to install selenium-side-runner to be able to run .side file in Jenkins pipeline. I'll look for example now.
21:11 Slava63 Look, guys, how it should work with Jenkins https://www.visiontemenos.com/blog/selenium-ide . It's perfect match for what we need if we'll get community behind us to create those tests in .side files.
21:12 pdurbin I've never heard of .side files but I'm fine with them. :)
21:13 Slava63 Phil, it's just json with all actions in the forms of patterns, you can open in any editor, it's not Rocket Science. I'm even thinking to use some Dataverse to synchronize all .side files coming from the different parties, it can create a synergy. :)
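
Since a .side file is described above as plain JSON, and the "urls" line was edited by hand earlier in this log, pointing a recorded project at a different installation can be scripted. The sketch below is hedged: it assumes the project has top-level `url`, `urls` and `tests` keys, and that each test has a `name` and a list of `commands`. That layout is an assumption here and should be checked against a real .side file. The function names are hypothetical.

```python
import json


def retarget_side_project(side_text, base_url):
    """Return the parsed project with its base URL fields replaced.

    Assumes top-level "url" and "urls" keys (an assumption, see above).
    """
    project = json.loads(side_text)
    project["url"] = base_url
    project["urls"] = [base_url]
    return project


def summarize_tests(project):
    """List (test name, number of recorded commands) for each test."""
    return [
        (test.get("name"), len(test.get("commands", [])))
        for test in project.get("tests", [])
    ]


if __name__ == "__main__":
    # Tiny stand-in for a recorded project; the values are made up.
    sample = json.dumps({
        "name": "demo",
        "url": "https://old.example.org",
        "urls": ["https://old.example.org"],
        "tests": [{"name": "login", "commands": [{"command": "open"}]}],
    })
    project = retarget_side_project(sample, "http://localhost:8080")
    print(json.dumps(project, indent=2))
    print(summarize_tests(project))
```

A summary like this could also help when .side files from different parties are collected in one place, since it shows at a glance which actions each file covers.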
21:15 pdurbin Sure, sounds fine. Sounds modern.
21:16 Slava63 Next week we're going to deliver about 100 tests in .side files for sample-data repository, we've agreed inside of DANS. After we need to ask community to follow http://guides.dataverse.org/en/latest/user/index.html and create every test for every action.
21:17 pdurbin wow, 100 tests. nice!
21:17 pdurbin donsizemore: that means more code coverage :)
21:19 Slava63 I'm working on other EU projects to set up the same Software Quality baseline for the services running in the European Open Science Cloud (EOSC). We can start the integration of hundreds of EOSC services with Dataverse that will follow the same policy.
21:22 Slava63 I want to get the highest possible level of maturity in this way. All microservices should be tested first and after we can test GUIs and run integration tests with Selenium running in CI/CD pipeline.
21:23 pdurbin I love it.
21:25 Slava63 We also need to work on a new policy for how people describe issues they create in GitHub. If they've found a bug, we need to ask them to install Selenium IDE and record a script that reproduces the bug. It can speed up bug fixing massively.
21:27 pdurbin But will the users listen to us? :)
21:29 Slava63 That's the power of community, right? If they want to get their problem fixed asap, they should listen. Priorities should be made for issues with live tests.
21:32 Slava63 That's why we also need a standardized test set to make bugs reproducible. The repo with sample datasets should play this role nicely.
21:32 pdurbin Yeah, I agree with all this. :)
21:35 Slava63 We can easily get thousands of Dataverse installations around the world but I'm not pushing for that until it's ready. After some critical mass the community will reach a certain level of maturity and will start to manage itself.
21:36 pdurbin Thousands? Really? :)
21:36 pdurbin Easily? :)
21:36 pdurbin world domination!
21:38 Slava63 Look, it's just phase 2 according to the Customer Development Methodology https://en.wikipedia.org/wiki/Customer_development . Selenium stuff is going to be a bridge to the next phase.
21:40 pdurbin agreed
21:40 pdurbin I've seen Dataverse mature a lot in 7 years.
21:40 Slava63 GDCC is the beginning for the next phase of scaling, btw.
21:40 pdurbin At FOSDEM I was asked why companies aren't installing Dataverse, why there aren't 4-5 companies doing this.
21:41 pdurbin installing and supporting
21:42 pdurbin And I was asked by someone who installs/supports a *different* repository software if he should start offering Dataverse. :)
21:42 Slava63 There is no proof of the technical maturity yet, it's clear. That's why we started to follow CESSDA Maturity Model that was basically copied from NASA https://zenodo.org/record/2591055#.Xj3ZsC2ZPAI
21:43 pdurbin Right. Someday we'll put Dataverse on the moon.
21:43 pdurbin https://github.com/dataversebot already has the right helmet.
21:45 pdurbin Anyway, I should get out of here. Slava63 you should hang around more often. :)
21:45 pdurbin Have a good weekend, all!
21:45 pdurbin left #dataverse
21:45 Slava63 You too, cu!
21:46 Slava63 left #dataverse

