Time
Nick
Message
05:18
poikilotherm2 joined #dataverse
07:16
Virgile joined #dataverse
09:14
Virgile joined #dataverse
11:23
Virgile joined #dataverse
13:41
poikilotherm2
Rejoice, fellow Dataversians
13:41
poikilotherm2
Payara released their new version https://github.com/payara/Payara/releases/tag/payara-server-5.2021.1
13:41
poikilotherm2
Including my enhanced DirConfigSource :-)
13:42
poikilotherm2
donsizemore pdurbin we should talk about the production domain removal since 5.2020.7
13:43
poikilotherm2
See https://github.com/gdcc/dataverse-kubernetes/issues/218
13:45
poikilotherm2
And pdurbin donsizemore if you have a minute later, I'd like to know your opinion on using the Payara Docker images or going for custom made.
14:21
donsizemore joined #dataverse
15:04
pdurbin joined #dataverse
15:15
pdurbin
poikilotherm2: which do you want to talk about first?
15:29
poikilotherm2
All :-D
15:29
pdurbin
first in, all out
15:31
poikilotherm2
What sounds most interesting to you?
15:34
pdurbin
Well, a quick one should be the "production" domain. We never used it. We stuck with "domain1". So I don't think "production" being removed affects us.
15:34
poikilotherm2
Certainly.
15:35
poikilotherm2
Although the production ready domain has some nice stuff out of the box
15:35
pdurbin
ah
15:35
poikilotherm2
So I was wondering if we should create something reusable in all types of installs
15:36
poikilotherm2
I copied the differences from the Payara docs in https://github.com/gdcc/dataverse-kubernetes/issues/218#issuecomment-776635621
15:37
poikilotherm2
Pool sizes and others sound pretty useful for any non-dev installation
15:48
pdurbin
Well, the installer already has a concept of a "dev install". Maybe we could build on that.
15:48
poikilotherm2
I created a diff of the two domain.xml
15:48
poikilotherm2
I'll add it to the issue
15:49
pdurbin
sounds good
15:50
poikilotherm2
https://github.com/gdcc/dataverse-kubernetes/issues/218#issuecomment-776807042
15:51
poikilotherm2
So maybe this is connected to my other question
15:51
poikilotherm2
Currently, my idea is to rely on the Payara image builds
15:51
poikilotherm2
(For container images)
15:52
poikilotherm2
As we are on JDK 11, we might choose to do otherwise
15:52
poikilotherm2
We can tweak a bit more to our use cases if we don't rely on them...
15:53
poikilotherm2
And we could either go with the openjdk11 image (used by solr, so reducing image pulls)
15:53
poikilotherm2
Or we could go with the Redhat cloud images
15:54
pdurbin
Sorry, I missed something. The Payara builds use JDK 8?
15:54
poikilotherm2
Both.
15:55
poikilotherm2
Switched to JDK11 recently
15:55
poikilotherm2
Just a matter of switching tags
15:55
poikilotherm2
I also have to admit that Payara does not do daily builds
15:55
poikilotherm2
So their images might contain more security issues
15:56
poikilotherm2
OpenJDK pushes daily
15:56
poikilotherm2
And has ARM based images
15:57
pdurbin
Well, it's smart to not rely on images you don't trust (for security reasons or whatever).
16:05
pdurbin
We're not really in the Docker world enough for this to be a concern.
16:44
poikilotherm2
So you think we should go for building our own images with daily security updates?
16:47
pdurbin
Do I have to do anything? :)
16:47
poikilotherm2
Tell me if I should go with Redhat Containers or with Debian (the openjdk images are based on it)
16:48
pdurbin
!
16:48
pdurbin
That sounds like a question for donsizemore
16:48
pdurbin
poikilotherm2: meanwhile, please check out this German I can't read: https://github.com/IQSS/dataverse/issues/7598#issuecomment-776840786
16:48
pdurbin
this: https://www.izus.uni-stuttgart.de/fokus/fdm-projekte/xsample/
16:51
poikilotherm2
What would you need?
16:52
poikilotherm2
Need translation?
16:52
pdurbin
Meh, that's ok. I just thought you might be interested.
16:55
pdurbin
I don't have any opinion for the images question. I've heard of Debian. I haven't heard of Redhat Containers. :)
16:55
poikilotherm2
The XSample website pretty much says nothing...
16:55
poikilotherm2
Bla bla bla
16:56
pdurbin
heh, like my talks!
16:56
poikilotherm2
All "we want to do this" but no "that's what we do"
16:56
pdurbin
dreams
16:57
poikilotherm2
Aye
17:04
dataverse-user joined #dataverse
18:34
nightowl313 joined #dataverse
18:39
nightowl313
hi all ... wonder if I could ask a quick question (maybe quick?) ... I am analyzing exactly what happens to files in s3 when they are deleted in dataverse as we are trying to complete our replication/DR workflow; we have a bucket in prod and are replicating that to a bucket in our DR account and changing to glacier storage; I've noticed that if a file is uploaded and deleted in draft mode in DV, the file is completely deleted in DV, bu
18:39
nightowl313
does this sound correct?
18:41
nightowl313
and, when the file is uploaded, it also creates "cached" copies of all of the various download versions available in S3?
18:43
pdurbin
nightowl313: sorry, your first line got a little cut off. What's after "but"? http://irclog.iq.harvard.edu/dataverse/2021-02-10#i_134701
18:44
nightowl313
haha i need to stop typing long posts!
18:44
nightowl313
but kept in AWS s3 with a delete marker; if the file is uploaded and the dataset published, and then deleted, the file is still saved in DV and accessible in the version history, and is still an active file in aws s3 (no delete markers) ... does this sound correct?
18:45
nightowl313
file uploaded - not published - deleted - completely deleted from dataverse version history, still shows in versions in aws with delete marker
18:45
nightowl313
file uploaded - published - deleted - file still appears and is accessible in dv by selecting a previous version - version shows in aws as active file - not deleted
18:45
pdurbin
Unfortunately, I'm not very familiar with the S3 code. Let me see if I can summon someone who is.
18:46
nightowl313
that's what it appears to be doing .. just wanted to verify ... we are just trying to figure out what really needs to go to glacier
18:46
nightowl313
if cached copies of all of the download formats are created, we probably want to exclude those?
18:47
nightowl313
just wondering what other folks are doing here with backing up and data retention
18:48
pdurbin
I think you have the right idea that the originals are the most important to back up.
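As a hedged illustration of sending only the originals to Glacier, a boto3 lifecycle rule of the following shape could work; the bucket name and the dataverse-file-class tag are purely hypothetical (Dataverse does not tag its S3 objects this way), so you would need your own tooling to mark originals versus cached derivatives:

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical rule: transition only objects tagged as originals to Glacier after 30 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-dr-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "originals-to-glacier",
                    "Status": "Enabled",
                    "Filter": {"Tag": {"Key": "dataverse-file-class", "Value": "original"}},
                    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )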
18:48
Jim46 joined #dataverse
18:49
nightowl313
does it just create those copies for quicker access?
18:49
nightowl313
sorry I always have weird questions!
18:49
Jim46
The Dataverse code just deletes the file in S3. I think s3 can be configured with versioning, in which case deleted files are just marked as deleted.
18:49
nightowl313
yea, we have to turn on versioning in order to replicate the bucket to another account ... so I think that is what is happening
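A minimal boto3 sketch of checking what those deletes actually did in a versioned bucket; the bucket name and key prefix below are hypothetical placeholders, not values Dataverse itself uses:

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical bucket and storage-key prefix; substitute the values from your own S3 store.
    resp = s3.list_object_versions(Bucket="my-prod-bucket", Prefix="10.5072/FK2/ABCDEF/")

    # With versioning enabled, a "deleted" object keeps its old versions and gains a delete marker;
    # without versioning (or after purging versions) it is gone entirely.
    for v in resp.get("Versions", []):
        print("version:", v["Key"], v["VersionId"], "latest" if v["IsLatest"] else "")
    for m in resp.get("DeleteMarkers", []):
        print("delete marker:", m["Key"], m["VersionId"])

If DeleteMarkers comes back non-empty for a key, the file was delete-marked rather than removed, which matches the draft-file behaviour described above.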
18:50
nightowl313
we are probably making this much more complicated than it has to be!
18:51
Jim46
I haven't fully thought things through, but I suspect s3 versioning isn't needed as Dataverse manages its own versions and never changes a stored file.
18:52
Jim46
For backup - it's a general open question as to whether you could/should store things that, if the data were re-entered into Dataverse, would be recreated - ingested tab files, DDI metadata extracted in ingest, metadata exports, thumbnails, etc.
18:53
nightowl313
yea, we are overcomplicating it because of audit requirements (3 copies in 3 different storage locations, etc.) ... okay ... so in general, when a file is uploaded, does dataverse create cached versions of the file formats?
18:53
nightowl313
oh yes, that makes sense .. that we may actually need those cached versions?
18:53
nightowl313
that helps a lot .. something we need to decide
18:54
nightowl313
cached versions of the metadata exports I think is what those are
18:56
pdurbin
donsizemore: Mandy's up! https://reusableresearch.com
18:58
pdurbin
nightowl313: Dataverse only creates archival versions of tabular files. The idea is to take a proprietary Stata file and create a plain text TSV from it.
18:58
Jim46
the cached metadata exports duplicate what's in the database, so if you are backing that up, you don't need them; they'll be reproduced on demand (there's an api call to recreate them too).
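A small sketch of triggering that recreation; I believe the admin API endpoint is /api/admin/metadata/reExportAll, but treat the exact path (and the assumption that the admin API is reachable and unblocked on localhost) as assumptions to verify against your Dataverse version:

    import requests

    # Hypothetical base URL; adjust host/port for your installation.
    DATAVERSE = "http://localhost:8080"

    # Ask Dataverse to regenerate the cached metadata exports for all published datasets.
    resp = requests.get(f"{DATAVERSE}/api/admin/metadata/reExportAll")
    resp.raise_for_status()
    print(resp.json())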
19:00
nightowl313
oh perfect! Those two comments sum it up (I was testing with a tabular file) ... that helps a lot! Thank you so much!
19:02
pdurbin
donsizemore: she claims to have coined "co ray ray"!
19:03
donsizemore
she also said Matthew _had_ to use Django ;)
19:03
pdurbin
damn
19:11
pdurbin
"Data Curation Result: Major Issues"
19:52
donsizemore
Mandy always holds back
19:54
pdurbin
Heh. She's a live wire!
20:02
pdurbin
"Software has continued to weasel its way into the very fabric of society" -- Titus Winters
21:45
nightowl313 joined #dataverse
22:01
pdurbin left #dataverse