
IRC log for #dataverse, 2020-06-16

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
01:21 jri joined #dataverse
03:42 jri joined #dataverse
06:59 poikilotherm1 nightowl313 you should also have a backup of your index... A complete reindex of a larger installation takes ages.
07:00 poikilotherm1 And yes, if your new S3 bucket does not have the same name, you'll need to edit the storage identifiers
07:01 poikilotherm1 But why would you lose your S3 bucket? If that is broken, you are in deeper trouble...
07:02 poikilotherm1 And it should be entirely possible to restore a bucket under the same name it had before if you were to accidentally delete it
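A minimal Python sketch of the storage-identifier edit discussed above. It assumes Dataverse S3 storage identifiers have the form `s3://<bucket>:<key>`, which is an assumption you should check against your own `dvobject.storageidentifier` values. `rewrite_bucket` is a hypothetical helper, and writing the results back to the database is deliberately left out.

```python
# Hypothetical helper: swap the bucket name inside an S3 storage identifier.
# ASSUMPTION: identifiers look like "s3://<bucket>:<key>"; anything else is
# returned unchanged so that unexpected shapes are never silently rewritten.

def rewrite_bucket(storage_id: str, old_bucket: str, new_bucket: str) -> str:
    """Return storage_id with old_bucket replaced by new_bucket, or unchanged."""
    prefix, sep, rest = storage_id.partition("://")
    if not sep:
        return storage_id  # not a URL-style identifier; leave it alone
    bucket, colon, key = rest.partition(":")
    if not colon or bucket != old_bucket:
        return storage_id  # different bucket or unexpected shape
    return f"{prefix}://{new_bucket}:{key}"


if __name__ == "__main__":
    # Made-up example identifiers, not real Dataverse data.
    for sid in ["s3://old-bucket:17a2b3c4d5e-0f1e2d3c4b5a", "file://17a2b3c4d5e-aaaa"]:
        print(sid, "->", rewrite_bucket(sid, "old-bucket", "new-bucket"))
```

Returning unchanged input for anything unexpected is a conservative choice. A dry run that prints old and new values is a sensible first step before any UPDATE statement.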
07:15 jri joined #dataverse
07:19 jri joined #dataverse
09:47 pkiraly joined #dataverse
09:48 pkiraly poikilotherm1, Hi Oliver, do you have some minutes?
09:53 poikilotherm1 Hi pkiraly
09:53 poikilotherm1 Just working on my slides for DCM
09:53 poikilotherm1 Hit me
09:57 pkiraly It is not that urgent. I would like to talk to you a bit about the status of Payara and about the management of custom metadata blocks
09:58 pkiraly but it is perfectly fine if we do that after DCM
10:03 poikilotherm1 Hit me
10:09 pkiraly Payara: I had a Dataverse tutorial this morning, and somebody asked about Glassfish and whether there is an alternative. I know you have been working on Payara for a while. Do you have an estimate of when it will be production-ready for installing Dataverse?
10:11 poikilotherm1 The official statement about that: it will be done when Dataverse 5 is released.
10:12 poikilotherm1 It will still pretty much depend on the Payara ecosystem inheriting from Glassfish. Porting it to other app servers like WildFly, Liberty etc. should be possible, but will require lots of work and extensive testing
10:12 poikilotherm1 But it will be a lot easier than coming from good ol GF 4.1 :-D
10:13 pkiraly Great!
10:13 poikilotherm1 Although pdurbin will kill me for this: I don't see Dataverse transformed to a Quarkus or Spring application. The task is too big for a scientific open source project
10:16 pkiraly I do not have enough experience to judge it
10:17 poikilotherm1 Maybe one day there will be a Dataverse-NG building on what the crazy folks at Warsaw did...
10:17 pkiraly Custom metadata blocks: in the next weeks I will work on an old idea of mine: instead of modifying the Solr schema file, calling the Solr Schema API. It affects your previous work in this area, as the script you wrote will no longer be needed (if my approach works). I have a question, however: do you know if there are others using XInclude in the Solr schema?
10:18 poikilotherm1 ;-) Just wanted to paint a more complete picture about where we are in the Java ecosystem...
10:20 pkiraly The Dataverse source code is quite mixed; it reflects different coding paradigms and styles. I cannot estimate how much work (in person-months) it would take to unify the codebase under technology X.
10:21 pkiraly And it definitely cannot be done in baby steps, only in one giant step...
10:22 poikilotherm1 Well, one of those giant steps was getting off GF 4.1
10:23 poikilotherm1 The code base has some very dusty and even ugly places. But fixing all of 'em is quite hard because that would be too much workload at IQSS IMHO
10:24 poikilotherm1 It's not so much about technology, more on the human, politics and change management side of things
10:25 poikilotherm1 Some people's lives are dedicated to the well-being of this project, so we need to be careful ;-)
10:26 poikilotherm1 Hey it's been 2 years for me of constant talk and people like pdurbin spreading the word and acting as amplifiers to get things moving. I'm so glad we all are together on this way in this amazing project.
10:29 pkiraly I agree
10:35 pkiraly What about metadata blocks? Do you have any objection to my plan, or things I should take care of in the implementation?
10:48 poikilotherm1 Could you give me a quick overview of your latest plans on this?
10:50 pkiraly Sure. Solr has a Schema API, which lets the admin add, modify or remove fields programmatically via REST calls.
10:52 pkiraly To turn it on, we have to change the schema management mode in the Solr config file, and schema.xml should be renamed to managed-schema.
10:53 pkiraly Once this step is done, we can write the Solr field manipulation function within Dataverse, and we do not need any extra step to fetch the list of fields and add them to Solr.
10:53 pkiraly We can trigger the whole process with a single Dataverse API call
10:54 pkiraly That's the essence. I would like to make this process transactional, so the changes are applied only if no error occurred in the meantime, ruling out partial success.
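A minimal Python sketch of the Schema API call pkiraly describes. It assumes Solr already uses a mutable managed schema (`ManagedIndexSchemaFactory` in solrconfig.xml) and that it runs on localhost with a core named `collection1`. The URL, the field names, and the helper functions are illustrative only. All fields go into one `add-field` command with an array value, which seems a plausible basis for the all-or-nothing behaviour. Whether Solr rolls back a batch that partly fails should be confirmed in the Schema API documentation rather than assumed.

```python
import json
import urllib.request

# ASSUMPTION: Solr on localhost, core "collection1"; adjust for your installation.
SOLR_SCHEMA_URL = "http://localhost:8983/solr/collection1/schema"


def build_add_fields_payload(fields):
    """Return one Schema API body that adds every field in a single "add-field" command.

    Each entry is a dict of Solr field properties, e.g.
    {"name": "title", "type": "text_en", "multiValued": False, "stored": True}.
    """
    for f in fields:
        if "name" not in f or "type" not in f:
            raise ValueError(f"field definition needs 'name' and 'type': {f!r}")
    return {"add-field": list(fields)}


def post_schema_change(payload, url=SOLR_SCHEMA_URL, timeout=30):
    """POST the payload to the Schema API and return the decoded JSON response.

    urllib raises urllib.error.HTTPError if Solr answers with a 4xx/5xx status.
    """
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical fields from a custom metadata block.
    payload = build_add_fields_payload([
        {"name": "journalVolume", "type": "text_en", "multiValued": False, "stored": True},
        {"name": "journalIssue", "type": "text_en", "multiValued": False, "stored": True},
    ])
    print(json.dumps(payload, indent=2))
    # post_schema_change(payload)  # only against a Solr with a mutable managed schema
```

Validating the definitions locally before sending anything catches malformed field lists without touching Solr at all.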
11:08 poikilotherm1 Would it make sense to use a migration tool like Flyway for this?
11:08 poikilotherm1 There are other ideas floating around, too. Jim and I were talking about adding metadata to a NoSQL datastore because of the issues with Solr schemas
11:08 poikilotherm1 (Thus thinking in the direction of becoming schemaless)
11:09 poikilotherm1 How would you like to handle metadata schema changes?
11:10 poikilotherm1 I see a lot of open questions on this, what Dataverse could do for us and where to draw the line
11:11 poikilotherm1 Oh that NoSQL stuff was an idea for using MongoDB to store additional metadata, which does not fit in available schemas, maybe because there is no standard for a community etc
11:42 donsizemore joined #dataverse
12:23 pkiraly Making it schemaless in Solr is also a possibility; I use that elsewhere. In the Solr schema you can use * as a wildcard and declare *_ss as a string-type field, so author_ss, title_ss etc. will be indexed.
12:25 pkiraly I don't think MongoDB could provide the search functionality, though
12:26 pkiraly Schema changes: it depends. As I mentioned, the Solr Schema API is able to manage some changes. Other changes require reindexing.
12:27 pkiraly But the same is true for the current situation
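The wildcard approach pkiraly mentions uses Solr dynamic fields. Below is a small Python sketch of the matching rule. A pattern must start or end with `*`, and the schema.xml comments state that longer patterns are matched first, with ties going to the one that appears first. The `ADD_SS` command body is a hypothetical Schema API request; Solr's `_default` configset already declares `*_ss` as a multivalued `strings` field, so you may not need it. The helper names are hypothetical.

```python
# Hypothetical Schema API body declaring *_ss (already present in Solr's _default configset).
ADD_SS = {"add-dynamic-field": {"name": "*_ss", "type": "strings", "indexed": True, "stored": True}}


def dynamic_field_matches(pattern: str, field_name: str) -> bool:
    """True if field_name matches a dynamic-field pattern like "*_ss" or "attr_*"."""
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    raise ValueError(f"dynamic field pattern must start or end with '*': {pattern!r}")


def resolve_dynamic_field(field_name, patterns):
    """Pick the matching pattern: the longest wins, ties go to the earliest in the list."""
    best = None
    for p in patterns:
        if dynamic_field_matches(p, field_name) and (best is None or len(p) > len(best)):
            best = p
    return best


if __name__ == "__main__":
    schema_patterns = ["*_s", "*_ss", "attr_*"]
    for name in ["author_ss", "title_ss", "attr_color", "title"]:
        print(name, "->", resolve_dynamic_field(name, schema_patterns))
```

This only models which pattern a field name would fall under. Solr itself does the real indexing.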
13:27 pdurbin joined #dataverse
14:18 JonathanNeal joined #dataverse
14:19 pdurbin poikilotherm1 pkiraly: you should go look at the DVN 3 code, which was even worse. :)
14:19 pdurbin From my perspective, things are slowly getting better. :)
14:26 pkiraly pdurbin, Hi, I did not say that it is good or bad, just that it is in a mixed style, so it would take some effort to turn it into a unified style.
14:27 pkiraly pdurbin, Today I just heard that a new Dataverse instance is on the horizon in Hungary.
14:35 dataverse-user joined #dataverse
14:35 pdurbin Yes, definitely mixed.
14:36 pdurbin And yes, some folks from that installation have been in this chat room, I believe.
14:40 pdurbin pkiraly poikilotherm1: What do you think about the new tiny Java microservice in https://github.com/IQSS/dataverse/pull/6986 ? Is this a chance to use Quarkus or whatever?
14:45 poikilotherm1 I'm crying :'-(
14:45 pdurbin Why?
14:49 poikilotherm1 I might be a fan of over-engineering, but hacks in Dataverse seem to have a long history of becoming de facto standards.
14:50 pdurbin Oh, sure, plenty of hacks.
14:50 pdurbin But which hack do you mean? :)
14:51 poikilotherm1 The very PR you mentioned
14:52 poikilotherm1 My heart is bleeding :-(
14:52 donsizemore @poikilotherm1 well clean it up, you'll ruin the carpet.
14:54 pdurbin Or buy darker carpet.
15:20 dataverse-user joined #dataverse
15:43 poikilotherm1 Guys if you are interested, there is a first draft of my DCM slides. I appreciate any feedback. It's a 10 minute lightning talk. http://talks.bertuch.name/dcm2020/#/
15:46 * pdurbin clicks
15:46 pdurbin poikilotherm1: by the way, your pull request is in QA now. For email groups.
15:49 poikilotherm1 Oh I didn't notice :-D
15:50 pdurbin poikilotherm1: the diagram is cool but "dataverse from cmdline" is hard to read. Orange on dark grey, almost black.
15:54 pdurbin Yes to this: Provide a Java-based Operator and reuse existing
15:54 pdurbin poikilotherm1: oh... actually. Is that a typo? Existing what?
15:55 poikilotherm1 Existing operators for Solr and Postgres
15:55 pdurbin ok, it's cut off for me
15:55 poikilotherm1 No need to write those ourselves ;-)
15:56 pdurbin oh, you mean for it to be cut off
15:56 pdurbin might help to add a period at least
15:57 pdurbin also, typo on the last slide: "dailyp"
15:57 pdurbin otherwise, looks great!
15:57 pdurbin Thank you for dragging us into the present and future!
15:58 poikilotherm1 Anything missing?
15:58 poikilotherm1 Too much detail?
15:59 poikilotherm1 Too long for 10 minutes?
15:59 pdurbin I guess what I want to know is how we can enable what you're doing. You wanted us to get off Glassfish 4. Done (in develop). Will you be talking about other stuff like this?
16:04 poikilotherm1 Nope. Slava requested a talk about the project and what it can do now. Do you feel like I should emphasize the "Make Dataverse cloud-native!" part a bit more?
16:05 pdurbin Well, it's fine to focus on what Slava wants. But you understand what I want. Maybe that could be a future talk.
16:06 pdurbin Also, I'd be happy to trade practice talks if you'd like. I'm giving a 10 minute talk to introduce external tools.
16:07 poikilotherm1 Maybe :-D
16:07 poikilotherm1 BTW pdurbin. I had an idea just this morning about deployment times... Wanna hear?
16:08 pdurbin sure!
16:10 poikilotherm1 Remember https://github.com/IQSS/dataverse/issues/5871 ?
16:10 poikilotherm1 :-)
16:10 poikilotherm1 My idea is as follows: every time Dataverse is deployed, the complete code is scanned for entities and applied to the database. I wonder if we might get a speedup by not doing that. And it would reduce warnings....
16:11 poikilotherm1 It should be much faster to look up the Flyway migration status in the database instead of scanning and applying with failures
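A rough Python sketch of "look up the Flyway migration status" as a read-only check. It assumes Flyway's default history table name, `flyway_schema_history`. An in-memory SQLite table with a reduced column set stands in for PostgreSQL, and the version strings are made up. It does not show how to turn off the JPA provider's entity scanning, which is configuration-specific. Flyway's own Java API (`Flyway.info()`) exposes similar information.

```python
import sqlite3


def pending_migrations(conn, expected_versions):
    """Return the expected versions that have no successful row in the history table."""
    rows = conn.execute(
        "SELECT version FROM flyway_schema_history WHERE success AND version IS NOT NULL"
    ).fetchall()
    applied = {r[0] for r in rows}
    return [v for v in expected_versions if v not in applied]


if __name__ == "__main__":
    # Stand-in for PostgreSQL: a reduced flyway_schema_history with made-up versions.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flyway_schema_history "
        "(installed_rank INTEGER, version TEXT, description TEXT, success BOOLEAN)"
    )
    conn.executemany(
        "INSERT INTO flyway_schema_history VALUES (?, ?, ?, ?)",
        [(1, "1", "baseline", 1), (2, "4.20.0.1", "example", 1), (3, "5.0.0.1", "failed example", 0)],
    )
    print(pending_migrations(conn, ["1", "4.20.0.1", "5.0.0.1"]))  # ['5.0.0.1']
```

The query filters on `success` as a plain boolean, so it should work unchanged against PostgreSQL's boolean column.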
16:13 pdurbin So... I'm not opposed to doing whatever we can to speed up deployment. But isn't deployment slow because our war file is over 200 MB?
16:13 poikilotherm1 This is just a theory. Dunno if this is a real issue.
16:14 poikilotherm1 And I have no idea how to measure it before making a fix, so we could verify it
16:14 poikilotherm1 I don't believe deployment time comes down to a single issue. Technical debt is high in the codebase...
16:16 pdurbin Sure. Multiple issues. Agreed.
16:17 poikilotherm1 OK guys, gotta go now... Construction site calling.
16:18 poikilotherm1 Read you later. Send me a ping if you need my attention :-D
16:18 poikilotherm1 (Looking at the co-chair @donsizemore here...)
16:18 pdurbin o/
16:31 jri joined #dataverse
17:09 donsizemore @poikilotherm1 looks good to me
19:04 jri joined #dataverse
20:12 poikilotherm1 donsizemore U still around?
21:25 jri joined #dataverse
21:47 jri joined #dataverse
22:06 JonathanNeal joined #dataverse
22:39 pmauduit joined #dataverse
22:39 yoh joined #dataverse
22:39 JonathanNeal joined #dataverse
22:39 pdurbin joined #dataverse
22:39 icarito[m] joined #dataverse
22:39 sivoais joined #dataverse
22:39 larsks joined #dataverse
22:39 bjonnh joined #dataverse
22:42 poikilotherm1 joined #dataverse
22:46 sivoais joined #dataverse
23:47 jri joined #dataverse