IRC log for #dataverse, 2021-02-10

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
05:18 poikilotherm2 joined #dataverse
07:16 Virgile joined #dataverse
09:14 Virgile joined #dataverse
11:23 Virgile joined #dataverse
13:41 poikilotherm2 Rejoice, fellow Dataversians
13:41 poikilotherm2 Payara released their new version https://github.com/payara/Payara/releases/tag/payara-server-5.2021.1
13:41 poikilotherm2 Including my enhanced DirConfigSource :-)
13:42 poikilotherm2 donsizemore pdurbin we should talk about the production domain removal since 5.2020.7
13:43 poikilotherm2 See https://github.com/gdcc/dataverse-kubernetes/issues/218
13:45 poikilotherm2 And pdurbin donsizemore if you have a minute later, I'd like to know your opinion on using the Payara Docker images or going for custom made.
14:21 donsizemore joined #dataverse
15:04 pdurbin joined #dataverse
15:15 pdurbin poikilotherm2: which do you want to talk about first?
15:29 poikilotherm2 All :-D
15:29 pdurbin first in, all out
15:31 poikilotherm2 What sounds most interesting to you?
15:34 pdurbin Well, a quick one should be the "production" domain. We never used it. We stuck with "domain1". So I don't think "production" being removed affects us.
15:34 poikilotherm2 Certainly.
15:35 poikilotherm2 Although the production ready domain has some nice stuff out of the box
15:35 pdurbin ah
15:35 poikilotherm2 So I was wondering if we should create sth reusable in all types of installs
15:36 poikilotherm2 I copied the differences from the Payara docs in https://github.com/gdcc/dataverse-kubernetes/issues/218#issuecomment-776635621
15:37 poikilotherm2 Pool sizes and others sound pretty useful for any non-dev installation
15:48 pdurbin Well, the installer already has a concept of a "dev install". Maybe we could build on that.
15:48 poikilotherm2 I created a diff of the two domain.xml
15:48 poikilotherm2 I'll add it to the issue
15:49 pdurbin sounds good
15:50 poikilotherm2 https://github.com/gdcc/dataverse-kubernetes/issues/218#issuecomment-776807042
15:51 poikilotherm2 So maybe this is connected to my other question
15:51 poikilotherm2 Currently, my idea was to rely on the Payara Image builds
15:51 poikilotherm2 (For container images)
15:52 poikilotherm2 As we are on JDK 11, we might choose to do otherwise
15:52 poikilotherm2 We can tweak a bit more to our use cases if we don't rely on them...
15:53 poikilotherm2 And we could either go with the openjdk11 image (used by solr, so reducing image pulls)
15:53 poikilotherm2 Or we could go with the Redhat cloud images
15:54 pdurbin Sorry, I missed something. The Payara builds use JDK 8?
15:54 poikilotherm2 Both.
15:55 poikilotherm2 Switched to JDK11 recently
15:55 poikilotherm2 Just a matter of switching tags
15:55 poikilotherm2 I also have to admit that Payara does not do daily builds
15:55 poikilotherm2 So their images might contain more security issues
15:56 poikilotherm2 OpenJDK pushes daily
15:56 poikilotherm2 And has ARM based images
15:57 pdurbin Well, it's smart to not rely on images you don't trust (for security reasons or whatever).
16:05 pdurbin We're not really in the Docker world enough for this to be a concern.
16:44 poikilotherm2 So you think we should go for building our own images with daily security updates?
16:47 pdurbin Do I have to do anything? :)
16:47 poikilotherm2 Tell me if I should go with Redhat Containers or with Debian (openjdk images use it)
16:48 pdurbin !
16:48 pdurbin That sounds like a question for donsizemore
16:48 pdurbin poikilotherm2: meanwhile, please check out this German I can't read: https://github.com/IQSS/dataverse/issues/7598#issuecomment-776840786
16:48 pdurbin this: https://www.izus.uni-stuttgart.de/fokus/fdm-projekte/xsample/
16:51 poikilotherm2 What would you need?
16:52 poikilotherm2 Need translation?
16:52 pdurbin Meh, that's ok. I just thought you might be interested.
16:55 pdurbin I don't have any opinion for the images question. I've heard of Debian. I haven't heard of Redhat Containers. :)
16:55 poikilotherm2 The XSamples website is pretty much saying nothing...
16:55 poikilotherm2 Bla bla bla
16:56 pdurbin heh, like my talks!
16:56 poikilotherm2 All "we want to do this" but no "that's what we do"
16:56 pdurbin dreams
16:57 poikilotherm2 Aye
17:04 dataverse-user joined #dataverse
18:34 nightowl313 joined #dataverse
18:39 nightowl313 hi all ... wonder if I could ask a quick question (maybe quick?) ... I am analyzing exactly what happens to files in s3 when they are deleted in dataverse as we are trying to complete our replication/DR workflow; we have a bucket in prod and are replicating that to a bucket in our DR account and changing to glacier storage; I've noticed that if a file is uploaded and deleted in draft mode in DV, the file is completely deleted in DV, bu
18:39 nightowl313 does this sound correct?
18:41 nightowl313 and, when the file is uploaded, it also creates "cached" copies of all of the various download versions available in S3?
18:43 pdurbin nightowl313: sorry, your first line got a little cut off. What's after "but"? http://irclog.iq.harvard.edu/dataverse/2021-02-10#i_134701
18:44 nightowl313 haha i need to stop typing long posts!
18:44 nightowl313 but kept in AWS s3 with a delete marker; if the file is uploaded and the dataset published, and then deleted, the file is still saved in DV and accessible in the version history, and is still an active file in aws s3 (no delete markers); does this sound correct?
18:45 nightowl313 file uploaded - not published - deleted - completely deleted from dataverse version history, still shows in versions in aws with delete marker
18:45 nightowl313 file uploaded - published - deleted - file still appears and is accessed in dv by selecting previous version - version shows in aws as active file - not deleted
18:45 pdurbin Unfortunately, I'm not very familiar with the S3 code. Let me see if I can summon someone who is.
18:46 nightowl313 that's what it appears to be doing .. just wanted to verify ... we are just trying to figure out what really needs to go to glacier
18:46 nightowl313 if cached copies of all of the download formats are created, we probably want to exclude those?
18:47 nightowl313 just wondering what other folks are doing here with backing up and data retention
18:48 pdurbin I think you have the right idea that the originals are the most important to backup.
18:48 Jim46 joined #dataverse
18:49 nightowl313 does it just create those copies for quicker access?
18:49 nightowl313 sorry I always have weird questions!
18:49 Jim46 The Dataverse code just deletes the file in S3. I think s3 can be configured with versioning, in which case deleted files are just marked as deleted.
18:49 nightowl313 yea, we have to turn on versioning in order to replicate the bucket to another account ... so I think that is what is happening
18:50 nightowl313 we are probably making this much more complicated than it has to be!
18:51 Jim46 I haven't fully thought things through, but I suspect s3 versioning isn't needed as Dataverse manages its own versions and never changes a stored file.
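The delete-marker behaviour discussed above can be illustrated with a small sketch. It is a toy in-memory model, not Dataverse code and not an S3 client: `VersionedBucket`, `Version`, and the key `draft-file` are hypothetical names made up for illustration. It only shows the general idea that, with versioning turned on, a plain delete hides the object behind a delete marker while the earlier version stays in the bucket.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: int
    data: Optional[bytes]  # None means this entry is a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.data is None


@dataclass
class VersionedBucket:
    """Toy stand-in for a versioning-enabled bucket (hypothetical, not an S3 client)."""

    objects: Dict[str, List[Version]] = field(default_factory=dict)
    next_id: int = 1

    def _append(self, key: str, data: Optional[bytes]) -> Version:
        version = Version(self.next_id, data)
        self.next_id += 1
        self.objects.setdefault(key, []).append(version)
        return version

    def put(self, key: str, data: bytes) -> Version:
        return self._append(key, data)

    def delete(self, key: str) -> Version:
        # A delete without a version id removes nothing; it stacks a delete marker on top.
        return self._append(key, None)

    def get(self, key: str) -> Optional[bytes]:
        versions = self.objects.get(key, [])
        if not versions or versions[-1].is_delete_marker:
            return None  # looks deleted to a normal read
        return versions[-1].data


bucket = VersionedBucket()
bucket.put("draft-file", b"uploaded in draft")
bucket.delete("draft-file")  # what the application sees as a plain delete
history = bucket.objects["draft-file"]
```

After these calls a normal read of `draft-file` finds nothing, yet `history` still holds the original bytes followed by a delete marker, which matches what nightowl313 reports seeing for draft files in the replicated bucket.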
18:52 Jim46 For backup - it's a general open question as to whether you could/should store things that, if the data were re-entered into Dataverse, would be recreated - ingested tab files, DDI metadata extracted in ingest, metadata exports, thumbnails, etc.
18:53 nightowl313 yea, we are overcomplicating it because of audit requirements (3 copies in 3 different storage locations, etc.) ... okay ... so in general, when a file is uploaded, does dataverse create cached versions of the file formats?
18:53 nightowl313 oh yes, that makes sense .. that we may actually need those cached versions?
18:53 nightowl313 that helps a lot .. something we need to decide
18:54 nightowl313 cached versions of the metadata exports I think is what those are
18:56 pdurbin donsizemore: Mandy's up! https://reusableresearch.com
18:58 pdurbin nightowl313: Dataverse only creates archival versions of tabular files. The idea is to take a proprietary Stata file and create a plain text TSV from it.
18:58 Jim46 the cached metadata exports duplicate what's in the database, so if you are backing that up, you don't need them/they'll be reproduced on demand (there's an api call to recreate them too).
19:00 nightowl313 oh perfect! Those two comments sum it up (I was testing with a tabular file) ... that helps a lot! Thank you so much!
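One way to act on the backup question above is to split a bucket listing into objects that must be archived and objects that could be regenerated. The sketch below is only an illustration under loud assumptions: the markers `.cached` and `.thumb`, the function `split_for_backup`, and the sample keys are hypothetical placeholders, and the real key names should be checked against an actual bucket listing before excluding anything from Glacier.

```python
from typing import Iterable, List, Sequence, Tuple

# ASSUMPTION: placeholder markers for derived objects (cached exports, thumbnails).
# Verify them against a real bucket listing; they are not taken from the chat.
REPRODUCIBLE_MARKERS: Tuple[str, ...] = (".cached", ".thumb")


def split_for_backup(
    keys: Iterable[str],
    markers: Sequence[str] = REPRODUCIBLE_MARKERS,
) -> Tuple[List[str], List[str]]:
    """Return (must_keep, reproducible) lists of object keys."""
    must_keep: List[str] = []
    reproducible: List[str] = []
    for key in keys:
        name = key.rsplit("/", 1)[-1]  # look only at the final path segment
        if any(marker in name for marker in markers):
            reproducible.append(key)
        else:
            must_keep.append(key)  # when in doubt, keep the object
    return must_keep, reproducible


# Hypothetical keys, for illustration only.
sample_keys = [
    "dataset-1/file-a",
    "dataset-1/file-a.thumb48",
    "dataset-1/export_ddi.cached",
]
must_keep, reproducible = split_for_backup(sample_keys)
```

The default is deliberately conservative: anything that does not match a known derived-object marker lands in `must_keep`, so an unrecognised key is archived rather than dropped. Skipping the cached exports is only safe if the database is backed up as well, as Jim46 points out.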
19:02 pdurbin donsizemore: she claims to have coined "co ray ray"!
19:03 donsizemore she also said Matthew _had_ to use Django ;)
19:03 pdurbin damn
19:11 pdurbin "Data Curation Result: Major Issues"
19:52 donsizemore Mandy always holds back
19:54 pdurbin Heh. She's a live wire!
20:02 pdurbin "Software has continued to weasel its way into the very fabric of society" -- Titus Winters
21:45 nightowl313 joined #dataverse
22:01 pdurbin left #dataverse
