Time
Nick
Message
08:23
jri joined #dataverse
08:59
pkiraly joined #dataverse
09:07
jri_ joined #dataverse
09:09
jri_ joined #dataverse
09:11
jri_ joined #dataverse
11:24
pdurbin
pkiraly: any outages today? :)
11:24
poikilotherm joined #dataverse
11:24
poikilotherm
Morning pdurbin
11:25
pdurbin
poikilotherm: morning. Sample data still in focus? :)
11:25
poikilotherm
Does your script you mention at https://github.com/IQSS/dataverse-sample-data/pull/14#issuecomment-583348684 really result in a cleaned up index and database?
11:25
pdurbin
You don't trust me? :)
11:25
poikilotherm
Little side project... ;-)
11:26
pkiraly
pdurbin: None that I was aware of. It worked when I checked, and no users complained.
11:26
poikilotherm
Have my test cluster running now and found some tweaks necessary for the main project ;-)
11:26
pdurbin
tweaks don't sound too bad
11:28
pdurbin
pkiraly: have you considered creating a board for your installation? Please see https://groups.google.com/d/msg/dataverse-community/WhbkdML6Jbs/VeDdK5UfEwAJ
11:58
GitterIntegratio joined #dataverse
11:58
icarito[m] joined #dataverse
12:37
poikilotherm
pdurbin: any idea how to ping @kcondon about https://github.com/IQSS/dataverse/issues/6599 ?
12:38
poikilotherm
I would really like to see this resolved, as I would then deploy the latest version of Solr in our prod env...
12:43
juancorr25 joined #dataverse
12:46
pdurbin
poikilotherm: that issue bricas opened? You asked "what would you like to see tested for a PR? API test suite passing? Anything else?" I'd say you could model a pull request off old pull requests for Solr upgrades.
12:47
poikilotherm
Do you have an example from the top of your head?
12:47
pdurbin
I'm looking at https://github.com/IQSS/dataverse/commits/develop/conf/solr
12:47
poikilotherm
Master of Puppets... erh. Issues :-D
12:48
pdurbin
The problem is we tend to merge one pull request, and then find some bugs, and then merge another.
12:49
pdurbin
poikilotherm: this one looks pretty complete: https://github.com/IQSS/dataverse/pull/5443
12:50
poikilotherm
I also found https://github.com/IQSS/dataverse/issues/4579
12:51
poikilotherm
Is anyone using the docker-aio thing anymore?
12:52
poikilotherm
If this is now obsolete, maybe we should think about removing it, so we don't have to maintain it...
12:52
poikilotherm
(Just saw it in the PRs being updated...)
12:53
poikilotherm
And it would mean fewer moving parts that need testing...
12:54
poikilotherm
pdurbin lately I had a crazy idea. What if we provide a vagrant file with K8s inside?
12:54
poikilotherm
Fewer moving parts............
13:01
pdurbin_m joined #dataverse
13:03
pdurbin_m
poikilotherm: docker-aio is used by docker-dcm, which is used. But maybe it could be moved to some other repo?
13:06
pdurbin_m
Would Slava want it in dataverse-docker?
13:07
donsizemore joined #dataverse
13:14
poikilotherm
Uh no idea
13:14
poikilotherm
Maybe we should try to get DCM into Dataverse Kubernetes
13:14
poikilotherm
Everything else is already present ;-)
13:17
pdurbin_m
That would be great!!
13:17
pdurbin_m
At a high level, we should make it easy to upgrade Solr.
13:18
poikilotherm
Yeah
13:18
pdurbin_m
With a pull request, I mean.
13:18
poikilotherm
Right now there are a lot of places that need to be maintained, which are not part of core Dataverse
13:19
pdurbin_m
yeah
13:19
pdurbin_m
poikilotherm: if you get a pull request started, others can help
13:22
pdurbin_m
We call it "swarming". :)
13:29
pdurbin_m
poikilotherm: and sure, if you want to change the Vagrantfile to point to dataverse-kubernetes, I'm fine with that.
14:15
pdurbin
I guess I was thinking I'd use that Vagrantfile for upgrading to Payara. I'm not sure how that would work if it's pointing at dataverse-kubernetes.
14:25
pdurbin
donsizemore: morning. For you: https://twitter.com/philipdurbin/status/1225787075045122049
14:43
pkiraly
pdurbin: sorry, this is a crazy day for me with lots of disruptions. I wasn't aware of the installation boards until now. Yes, we have favorite tickets; I'll define a board next week.
14:43
poikilotherm
OK gotta run. Maybe see you later. If not: have a nice weekend @all
14:44
pkiraly
poikilotherm: same to you!
14:44
pdurbin
Have a good weekend!
14:46
pdurbin
pkiraly: no worries, no rush. :)
14:53
donsizemore
@pdurbin so i have sampledata doctored up (touch one thing, something downstream breaks) though i'm hitting check_dataset_lock(dataset_dbid)\n File "create_sample_data.py", line 22, in check_dataset_lock\n check_dataset_lock(dataset_dbid)\n File "create_sample_data.py", line 18, in check_dataset_lock\n locks = resp.json()['data']\n File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 897, in json\n re
14:54
donsizemore
@pdurbin matthew is giving a short talk about co-ray-ray here on march 23rd if you'd like to come down!
14:54
pdurbin
I'd love to! :)
14:54
pdurbin
those line numbers are from poikilotherm's pull request?
14:55
donsizemore
"Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for dataset id 39... sleeping...", "Lock found for d
14:55
donsizemore
I think it's the 5400RPM HD on my buildbox
14:55
donsizemore
correct, i've pointed my dataverse-ansible branch at his fork/branch
14:56
donsizemore
next I'll spin it up in EC2 to see if it's just slow storage
14:56
pdurbin
but you're saying an exception is being thrown?
14:57
donsizemore
yeah, à la the API test suite in docker
15:01
donsizemore
https://github.com/IQSS/dataverse-ansible/compare/143_update_sample_data
15:01
pdurbin
I'm looking at https://github.com/poikilotherm/dataverse-sample-data/blob/4ccd3d5fab7b4110a63b827e10f243cc973fe648/create_sample_data.py#L22
15:02
pdurbin
donsizemore: is there more in that stacktrace? It's hard for me to tell where it's failing.
15:05
donsizemore
gimme a second. ima convert the commas to carriage returns and send you the result in slack? (lest i junk up the IRC logs here)
15:07
pdurbin
sounds fine
15:22
pdurbin
donsizemore: it seems like we're getting JSONDecodeError from https://github.com/poikilotherm/dataverse-sample-data/blob/4ccd3d5fab7b4110a63b827e10f243cc973fe648/create_sample_data.py#L18 ... which is this... locks = resp.json()['data']
15:22
pdurbin
but I don't know why
15:24
pdurbin
donsizemore: what if you limit the amount of datasets that are created? Maybe start with just one? Is it clear how to do that? It's in the config.
15:34
donsizemore
I got a lot of non-JSON responses out of Dataverse trying to upload those 20GB datafiles for Thu-Mai
15:34
donsizemore
I'm fairly certain it's just my slow hard drive here, as the sampledata run took forever
15:35
pdurbin
hmm, ok
15:35
pdurbin
not sure how to fix that :)
15:35
* pdurbin
hands donsizemore a faster hard drive
15:35
donsizemore
but i'll tag dvconfig.py down to one dataset json file and see what happens
15:35
pdurbin
should we add more sleep?
15:35
pdurbin
sounds good, thanks
15:36
donsizemore
on the one finger, you'd want to accommodate folks with slower storage. on the other finger, i'll gladly accept an SSD
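The traceback above ends in a JSONDecodeError from `resp.json()['data']` inside `check_dataset_lock`, apparently because a slow server sometimes answers with something that is not JSON. A minimal sketch of a more forgiving lock check, not the actual create_sample_data.py code: `wait_for_unlock`, its `fetch`/`sleep` parameters and the backoff values are hypothetical, and it assumes (as the original code implies) that the lock response is JSON with a `data` list that is empty when the dataset is unlocked. A non-JSON reply is treated as "still busy", and the wait doubles after each attempt instead of sleeping a fixed time.

```python
import json
import time
from typing import Callable


def wait_for_unlock(
    fetch: Callable[[], str],
    max_tries: int = 10,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a (hypothetical) lock endpoint until no locks remain.

    `fetch` returns the raw response body as text. A body that is not valid
    JSON, or lacks a "data" key, is treated as "server still busy" instead of
    crashing, and the wait between attempts doubles (exponential backoff).
    Returns True once the lock list is empty, False if max_tries runs out.
    """
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        body = fetch()
        try:
            locks = json.loads(body)["data"]
        except (json.JSONDecodeError, KeyError, TypeError):
            locks = None  # non-JSON or unexpected shape: try again later
        if locks == []:
            return True  # assumed meaning: empty list = no locks left
        print(f"Attempt {attempt}: lock or non-JSON reply, sleeping {delay:.1f}s...")
        sleep(delay)
        delay *= 2
    return False
```

Passing `fetch` and `sleep` in as parameters keeps the sketch testable without a running Dataverse; in a real script `fetch` could wrap a `requests.get(...)` call to the dataset's locks endpoint and return its `.text`, which is an assumption here.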
15:49
donsizemore
testing with only open-source-at-harvard.json enabled
15:51
pdurbin
and? :)
15:55
poikilotherm
Some datasets gave me a hard time, too.
15:55
poikilotherm
I didn't track down which were stressing us
15:56
poikilotherm
Maybe I can investigate later, but if donsizemore volunteers I don't mind either :-D
15:57
pdurbin
I'm not sure why this is high priority but who am I to judge. :)
15:57
poikilotherm
To be useful, dataverse-sample-data should provide at least one configuration that deploys in 10 seconds or less, for tests. Ideally even faster.
15:58
poikilotherm
Slava will kill us if loading takes minutes :-D
15:59
* pdurbin
hides
16:19
donsizemore
@pdurbin i just wanted the ansible role to be ready for @poikilotherm's PR
16:19
donsizemore
@pdurbin Slava's selenium stuff is next on my list
16:32
pdurbin
donsizemore: cool. If your ears were burning at standup, it's because we talked about https://github.com/IQSS/dataverse/issues/6510
16:33
pdurbin
Is that something you're helping with?
17:51
jri joined #dataverse
18:06
donsizemore
@pdurbin yup with only one sample dataset : ok=143 changed=105 unreachable=0 failed=0
18:06
pdurbin
it works? good
18:09
pdurbin
donsizemore: anything to add for #6510? I'm not sure what they were talking about at standup.
18:19
donsizemore
@pdurbin i can ask akio for input / feedback. your paste above is the first time i remember seeing #6510
18:20
pdurbin
Huh. Ok. Maybe I have the wrong issue. Would you be able to ping Leonid about it in Slack?
18:25
jri joined #dataverse
18:30
donsizemore
@pdurbin he was probably talking about the duplicate dvobject entries for datafiles
18:30
donsizemore
@pdurbin i'm pushing for one of our archivists to handle the clean-up in the web interface, but they're wisely ignoring me (for now)
18:31
pdurbin
oh, duplicate dvobjects, ok
18:35
pdurbin
nice note from Jim about GDCC updates, Google Summer of Code, etc: https://groups.google.com/d/msg/dataverse-community/zIi7Grycav4/relzlpMKAgAJ
18:36
pdurbin
Lots of ideas! https://tinyurl.com/GDCC-GSoC-Ideas ... Did we forget any? :)
19:32
poikilotherm
Good evening gentlemen
19:32
poikilotherm
Here I am and "I solemnly swear that I'm up to no good."
19:32
poikilotherm
donsizemore: you still around?
19:32
poikilotherm
Should I poke anything with a stick in dataverse-sample-data?
19:37
* pdurbin
hands poikilotherm a stick
19:46
* poikilotherm
tries to drill inside and looks for termites
19:47
donsizemore
@poikilotherm I gave it my usual make-it-work stab, if you want to check out the PR?
19:47
pdurbin
donsizemore: I only took a quick look. Sorry. Doing many things.
19:47
poikilotherm
https://github.com/IQSS/dataverse-ansible/pull/145/files ?
19:47
donsizemore
@poikilotherm yis
19:50
poikilotherm
That looks indeed promising
19:51
poikilotherm
So we're good, with the exception that we should take a closer look at these sample datasets taking ages to load?
19:57
donsizemore
@poikilotherm on my part i blame my hand-me-down buildbox that's a 6(?) year-old desktop with a 5400rpm HD
19:59
poikilotherm
I had some trouble with some of the datasets
19:59
poikilotherm
And I'm working on FAST hardware
19:59
poikilotherm
New laptop... ;-)
20:09
pdurbin
mmm, new laptop
20:14
donsizemore
@pdurbin on doing many things: i just don't want the bottleneck to be me!
20:14
pdurbin
:)
20:14
pdurbin
everybody loves sample data all of a sudden :)
20:18
poikilotherm
pdurbin: how about another crazy idea
20:22
pdurbin
hit me
20:24
poikilotherm
Remember my idea that one day we'd have some kind of Dataverse CLI app?
20:24
poikilotherm
What if we start small in dataverse-sample-data
20:25
poikilotherm
Reshape this collection of simple scripts into a slightly more advanced thing: a cli tool with commands etc
20:27
poikilotherm
Slava already mentioned crazy ideas about loading your own sample data from an arbitrary place
20:27
pdurbin
I'm fine with a Dataverse CLI app. We already have https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader for example.
20:27
poikilotherm
There might be a demand for a nice and simple cli tool
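poikilotherm's idea of reshaping the sample-data scripts into "a cli tool with commands" could start as a plain subcommand skeleton. A minimal sketch using Python's standard `argparse`; the program name, the `load`/`destroy` commands and their flags are hypothetical, chosen only to mirror the loading and cleanup scripts discussed earlier and Slava's idea of loading sample data from an arbitrary place. Nothing is actually sent to a server.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command layout for illustration; none of these commands exist yet.
    parser = argparse.ArgumentParser(
        prog="dataverse-cli-sketch",
        description="Sketch of a Dataverse sample-data CLI",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    load = sub.add_parser("load", help="load sample data into a Dataverse installation")
    load.add_argument("--server", default="http://localhost:8080",
                      help="base URL of the Dataverse installation")
    load.add_argument("--source", default="data",
                      help="directory or URL holding the sample data")

    sub.add_parser("destroy", help="remove previously loaded sample data")
    return parser


def main(argv=None) -> str:
    # Only describes what would happen; a real tool would call the Dataverse API here.
    args = build_parser().parse_args(argv)
    if args.command == "load":
        return f"would load {args.source} into {args.server}"
    return "would destroy sample data"
```

For example, `main(["load", "--source", "my-data"])` returns `"would load my-data into http://localhost:8080"`. Subcommands keep each task (load, clean up, and whatever comes later) separately documented under one entry point.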
20:27
pdurbin
dataverse-ansible already supports arbitrary URLs (forks) of sample data. (thanks, donsizemore)
20:30
poikilotherm
If I were to start such a thing, should we do it in dataverse-sample-data or should we go big and create a full-fledged repo?
20:30
donsizemore
@pdurbin on selenium. got a JS error out of selenium-side-runner, but i was running it without sample-data for brevity. trying again with full sample data (which will take longer)
20:33
pdurbin
donsizemore: oh, did some actual tests get pushed?
20:33
pdurbin
poikilotherm: I'm fine with a new repo. What language are you going to use?
20:33
poikilotherm
Python
20:33
poikilotherm
More hackable
20:34
poikilotherm
Building on beautiful pyDataverse
20:34
pdurbin
Should it go in the pyDataverse repo?
20:35
poikilotherm
No, I think this should be on its own
20:35
donsizemore
@pdurbin not knowing anything about selenium, i blindly pointed it at demo-dataverse.side, after preening the "urls" line
20:35
donsizemore
@pdurbin so, eh, the framework is there in dataverse-ansible.
20:36
poikilotherm
If it goes big, pyDataverse is a building block
20:36
donsizemore
@poikilotherm it will need a good name
20:36
pdurbin
donsizemore: oh. that demo-dataverse.side JSON file. Interesting. I've barely used Selenium.
20:37
pdurbin
poikilotherm: want me to create a repo? dataverse-cli?
20:37
poikilotherm
donsizemore: any suggestions?
20:38
poikilotherm
pdurbin: let me start some experiments in github.com/poikilotherm first...
20:39
poikilotherm
Transferring later is easy
20:44
pdurbin
sure
20:55
pdurbin
donsizemore: I'm pretty excited about browser-based automated tests so if I can help at all, please let me know.
20:58
poikilotherm
donsizemore pdurbin what about an easy and short name like dvcli
21:01
Slava63 joined #dataverse
21:02
pdurbin
Well sure, we can call `dvcli` from the command line but I like dataverse-* for GitHub repos under IQSS.
21:02
Slava63
Hi guys, I'm jumping in the discussion about Selenium stuff. We've experimented a lot with it and there are some things you should know.
21:03
pdurbin
Slava!
21:03
poikilotherm
OMG SLAVA
21:03
poikilotherm
:-D
21:04
Slava63
It's very easy to get the Selenium IDE plugin for Google Chrome and Firefox and play a .side file there. But this file is suitable for Jenkins pipelines as well.
21:05
pdurbin
oh good
21:05
Slava63
Look, Google is directing the Selenium community toward CI/CD integrations; it's a fantastic development. I've been watching for a long time and I think Selenium IDE will be fully ready for CI/CD by the middle of the year.
21:06
pdurbin
nice
21:06
Slava63
We've experimented with the demo.dataverse.nl server, but I discovered that after running all the Selenium tests we need to wipe it and restore it to the same state to be able to reproduce bugs.
21:08
poikilotherm
Slava63: I'm starting https://github.com/poikilotherm/dvcli for these tasks
21:08
Slava63
That's why I've started moving datasets from the sample data repository; they should be installed before every Selenium run, and in the future inside the Jenkins pipeline.
21:09
poikilotherm
Those things you wished for in the PR are beyond the scope of dataverse-sample-data but are a perfect fit for an idea I had a while ago...
21:09
Slava63
Don, you need to install selenium-side-runner to be able to run a .side file in a Jenkins pipeline. I'll look for an example now.
21:11
Slava63
Look, guys, this is how it should work with Jenkins: https://www.visiontemenos.com/blog/selenium-ide . It's a perfect match for what we need, if we can get the community behind us to create those tests in .side files.
21:12
pdurbin
I've never heard of .side files but I'm fine with them. :)
21:13
Slava63
Phil, it's just JSON with all actions in the form of patterns; you can open it in any editor, it's not rocket science. I'm even thinking of using a Dataverse installation to synchronize all .side files coming from the different parties; it can create synergy. :)
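Since Slava describes .side files as plain JSON, a few lines of Python are enough to list what a recorded test does. A minimal sketch, assuming the layout Selenium IDE typically writes (a top-level `tests` array whose entries carry a `name` and a `commands` list of `command`/`target`/`value` objects); `summarize_side` and the sample project below are made up for illustration and are not taken from demo-dataverse.side.

```python
import json


def summarize_side(text: str) -> list:
    """Return one line per recorded command in a Selenium IDE .side project.

    Assumes the usual .side layout: {"tests": [{"name": ..., "commands":
    [{"command": ..., "target": ..., "value": ...}, ...]}, ...]}.
    """
    project = json.loads(text)
    lines = []
    for test in project.get("tests", []):
        for cmd in test.get("commands", []):
            # "test name: command target", trailing space dropped if target is empty
            lines.append(f'{test["name"]}: {cmd["command"]} {cmd.get("target", "")}'.rstrip())
    return lines


# Illustrative project, not a real recording.
SAMPLE = """
{
  "name": "demo",
  "url": "http://localhost:8080",
  "tests": [
    {"name": "open homepage",
     "commands": [
       {"command": "open", "target": "/", "value": ""},
       {"command": "click", "target": "linkText=Log In", "value": ""}
     ]}
  ]
}
"""

for line in summarize_side(SAMPLE):
    print(line)
```

Because the format is just JSON, the same approach could feed a review step or an inventory of which user-guide actions already have tests; running the files still requires Selenium IDE or selenium-side-runner.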
21:15
pdurbin
Sure, sounds fine. Sounds modern.
21:16
Slava63
Next week we're going to deliver about 100 tests in .side files for the sample-data repository; we've agreed on that inside DANS. After that, we need to ask the community to follow http://guides.dataverse.org/en/latest/user/index.html and create a test for every action.
21:17
pdurbin
wow, 100 tests. nice!
21:17
pdurbin
donsizemore: that means more code coverage :)
21:19
Slava63
I'm working on other EU projects to set up the same Software Quality baseline for the services running in the European Open Science Cloud (EOSC). We can start the integration of hundreds of EOSC services with Dataverse that will follow the same policy.
21:22
Slava63
I want to reach the highest possible level of maturity this way. All microservices should be tested first, and after that we can test GUIs and run integration tests with Selenium in a CI/CD pipeline.
21:23
pdurbin
I love it.
21:25
Slava63
We also need to work on a new policy for describing issues that people create in GitHub. If they've found a bug, we need to ask them to install Selenium IDE and record a script that reproduces it. It can speed up bug fixing massively.
21:27
pdurbin
But will the users listen to us? :)
21:29
Slava63
That's the power of community, right? If they want their problem fixed ASAP, they should listen. Issues with live tests should get priority.
21:32
Slava63
That's why we also need a standardized test set to make bugs reproducible. The repo with sample datasets should play this role nicely.
21:32
pdurbin
Yeah, I agree with all this. :)
21:35
Slava63
We can easily get thousands of Dataverse installations around the world, but I'm not pushing for that until it's ready. After some critical mass, the community will reach a certain level of maturity and will start to manage itself.
21:36
pdurbin
Thousands? Really? :)
21:36
pdurbin
Easily? :)
21:36
pdurbin
world domination!
21:38
Slava63
Look, it's just phase 2 according to the Customer Development Methodology https://en.wikipedia.org/wiki/Customer_development . Selenium stuff is going to be a bridge to the next phase.
21:40
pdurbin
agreed
21:40
pdurbin
I've seen Dataverse mature a lot in 7 years.
21:40
Slava63
GDCC is the beginning of the next phase of scaling, btw.
21:40
pdurbin
At FOSDEM I was asked why companies aren't installing Dataverse, why there aren't 4-5 companies doing this.
21:41
pdurbin
installing and supporting
21:42
pdurbin
And I was asked by someone who installs/supports a *different* repository software if he should start offering Dataverse. :)
21:42
Slava63
There is no proof of technical maturity yet, that's clear. That's why we started to follow the CESSDA Maturity Model, which was basically copied from NASA: https://zenodo.org/record/2591055#.Xj3ZsC2ZPAI
21:43
pdurbin
Right. Someday we'll put Dataverse on the moon.
21:43
pdurbin
https://github.com/dataversebot already has the right helmet.
21:45
pdurbin
Anyway, I should get out of here. Slava63 you should hang around more often. :)
21:45
pdurbin
Have a good weekend, all!
21:45
pdurbin left #dataverse
21:45
Slava63
You too, cu!
21:46
Slava63 left #dataverse