06:11
Lincoln joined #dataverse
06:19
LSherpa joined #dataverse
07:24
Virgile joined #dataverse
09:50
lincoln joined #dataverse
09:51
LSherpa joined #dataverse
09:52
juancorr joined #dataverse
11:12
Lincoln joined #dataverse
11:23
pkiraly joined #dataverse
12:14
donsizemore joined #dataverse
13:21
Lincoln
I have a question about S3: when a dataset is created, it is saved in a different bucket (s3://10.5072, which is not even present in the cluster), although I have another bucket made and configured in the domain.xml file.
13:21
Lincoln
can anyone assist me with this, please?
13:24
Virgile joined #dataverse
13:29
pkiraly joined #dataverse
13:30
Virgile joined #dataverse
14:45
pdurbin joined #dataverse
14:47
pdurbin
Lincoln: hmm, did you possibly create that dataset before configuring your DOI "authority" (the 10.whatever number)?
14:56
Lincoln
@pdurbin: I did not configure a DOI authority; I went directly to creating a dataset using the GUI
14:59
Lincoln
there is already a default DOI, and I am creating the dataset using the GUI
14:59
pdurbin
Ok. I don't know a lot about the S3 stuff. Let me try to ping someone who knows more.
15:01
Jim86 joined #dataverse
15:01
Lincoln
@pdurbin: appreciate it.
15:02
pdurbin
In dev, on my laptop, I almost always use the filesystem instead of S3 or Swift. I've configured for S3 before but it's probably been two years.
15:04
Jim86
It should be stored in the bucket you defined. The s3://10.5072... shown for the dataset does not indicate the bucket. Look at the file entries and you'll see they include the bucket.
15:06
Jim86
The overall process: the storageidentifier for the datafiles shows the store id (s3://), the bucket, and the part of the id unique to the file, while the dataset (which has no representation in s3/any store) has the store id and the path used as an offset for the files (which is the DOI authority/shoulder<id> of the dataset).
15:06
Lincoln
I have checked, and there is no bucket called s3://10.5072 or any objects, but from pgAdmin I can see s3://10.5072/somealphabets
15:07
Jim86
So the two entries have to be combined, and you should see the files in the bucket named as in their storageidentifier, with an object id like 10.5072/FK2ABCDEF/4365943546763874853
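(A minimal sketch of the postgres check being described, assuming a standard Dataverse install where the database is named dvndb and the dvobject table holds the storageidentifier column, both assumptions here:

    psql -d dvndb -c "SELECT id, dtype, storageidentifier FROM dvobject WHERE storageidentifier LIKE 's3://%';"

Per Jim86's description above, Dataset rows carry only the store id plus the DOI path, while DataFile rows include the bucket name.)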
15:07
Jim86
what bucket do you have configured in your jvm options?
15:08
Lincoln
right now the bucket configured is s3docs
15:09
Lincoln
I have even tried with 10.5072 and it did not work
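(For reference, a sketch of the jvm options that define an S3 store in Dataverse 5.x; the store id "s3" is an assumption here, and the bucket name "s3docs" is taken from the conversation:

    ./asadmin create-jvm-options "-Ddataverse.files.s3.type=s3"
    ./asadmin create-jvm-options "-Ddataverse.files.s3.label=s3"
    ./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=s3docs"

A non-AWS endpoint, like the --endpoint-url used later in this conversation, would also need -Ddataverse.files.s3.custom-endpoint-url=<url>.)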
15:09
Jim86
and can you do 'aws s3 ls -recursive s3docs' and see entries
15:10
Lincoln
aws access key id does not exist in record
15:11
Jim86
In what I typed above you actually need --recursive (two - chars).
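(The corrected command, using the bucket name from this conversation:

    aws s3 ls --recursive s3://s3docs

Either the bare bucket name or the s3:// URL form should work with the aws client.)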
15:11
Lincoln
yes i tried with --recursive
15:11
Jim86
So to get the aws client to work, you have to be using a unix client that has the aws credentials
15:12
Lincoln
yes
15:12
Jim86
So the aws command above gives the 'aws access key id does not exist in record' as a response?
15:14
Lincoln
yeah, it says "The aws access key id does not exist in our records". I have checked both files (config and credentials) in the .aws folder and they are the same. Also, when calling aws --endpoint-url=*** s3api list-buckets
15:15
Lincoln
it works as normal
15:15
Jim50 joined #dataverse
15:16
Lincoln
an error occurred when calling the ListObjects operation
15:17
Jim50
So when I type 'aws s3 ls --recursive <qdr's dev bucket name>' I see entries like 10.5072/FK2ACAHCN/16a36db5829-a5dbbff23f65 in it. Those correspond to a dataset with storageidentifier s3://10.5072/FK2ACAHCN and a file with s3://qdr-dev-bucket/16a36db5829-a5dbbff23f65 in the postgres database.
15:19
Jim50
My guess would be that you have something misconfigured with aws, at least for the account/machine where you're using the command-line client. That's hard to debug remotely, but once you can view inside your s3docs bucket, you should find your files
15:19
Lincoln
@jim50: in the config file of the .aws folder I have no region defined; I have left it blank
15:19
Lincoln
(there is no region defined)
15:19
Jim50
Does your Dataverse work to upload and download files from that s3 store?
15:20
Lincoln
no, the Dataverse does not work for upload and download (I get an internal server error)
15:20
Jim50
If so, the .aws/* files for the account running payara on that machine should be good, so you can verify you have those and/or just try that account from that same machine with the aws client.
15:21
Jim50
OK, so that could be because it can't connect to aws (the same error you have with the client).
15:23
Jim50
I don't know all of the things that can go wrong with aws credentials, but it's possible you need the region (I don't know if they assume a default) or that the account you're using doesn't have the right permissions - it would need to be able to listObjects and to create them (for file uploads).
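(A minimal sketch of the two files in question, with illustrative placeholder values only; both files belong to the account running payara, as noted above:

    # ~/.aws/config
    [default]
    region = us-east-1

    # ~/.aws/credentials
    [default]
    aws_access_key_id = <key id>
    aws_secret_access_key = <secret key>

)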
15:25
Lincoln
I am using the s3cmd command... what would be the analogous command to 'aws s3 ls --recursive <bucket name>'?
15:27
Jim50
I don't know. From https://s3tools.org/usage: List objects or buckets s3cmd ls [s3://BUCKET[/PREFIX]], List all object in all buckets s3cmd la
15:29
Lincoln
@jim50: thank you for the help. But the sad part is that the datasets are not stored
15:29
Jim50
Without --recursive, you should still be able to do things like s3cmd ls s3://s3docs/10.5072/FK2ACAHCN/ to just get the files for one dataset (with aws s3, the last / is needed)
15:29
Lincoln
although from postgres you can see s3://10.5072/something
15:29
Jim50
Datasets are never stored in s3, only the files.
15:29
Jim50
Datasets are database-only
15:31
Lincoln
oh thanks... I am new to s3... do I need rsync as well?
15:31
Lincoln
i mean dcm module
15:32
Jim50
Once configured correctly, Dataverse will allow upload/download of files, so you wouldn't need to do anything else. If you want to back up the files, I think the aws client has an rsync equivalent (sync?) that can copy all files from your bucket to a local disk.
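(The rsync equivalent Jim50 is thinking of is the aws client's sync subcommand; the local path here is illustrative:

    aws s3 sync s3://s3docs /backup/s3docs

This copies any objects missing from the destination; swapping the arguments copies in the other direction.)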
15:32
Jim50
No, s3 stores are separate from the rsync/dcm mechanism
15:33
Jim50
That was one of the reasons for using s3 and the direct-upload option - it is a way to handle larger files without having to set anything else up.
15:34
Lincoln
Jim50: thank you... that saved a lot of time
15:34
Jim50
FWIW: One thing to explore might be to set up a local file store and then see how the files are laid out - it's the same organization in s3 once you get that working.
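(A sketch of a local file store defined the same way, following the dataverse.files.<id> pattern from the S3 options above; the store id "file" and the directory are illustrative assumptions:

    ./asadmin create-jvm-options "-Ddataverse.files.file.type=file"
    ./asadmin create-jvm-options "-Ddataverse.files.file.label=file"
    ./asadmin create-jvm-options "-Ddataverse.files.file.directory=/usr/local/dvn/data"

The directory then shows the same authority/shoulder/file-id layout described above.)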
15:34
Jim50
YW
15:34
Jim50
Good luck!
15:36
Lincoln
Jim50: much appreciated. I'll try with a local file store first
15:36
pdurbin
yes, thanks Jim50
15:59
pkiraly joined #dataverse
16:00
pkiraly
Hi, is there anybody who runs integration tests? I found problems with how the environment should be set up.
16:01
* pdurbin
raises his hand
16:02
pdurbin
pkiraly: I'll tell you what I do. First I run the installer to get everything set up. Then I run the dev rebuild script to get set up for integration tests.
16:03
pdurbin
The script I added in https://github.com/IQSS/dataverse/pull/7363
16:03
pdurbin
./setup-all.sh --insecure is important, for example
16:04
pdurbin
not to be used in prod, of course!
16:04
pkiraly
pdurbin, what I am doing is this set of commands:
16:04
pkiraly
cd conf/docker-aio
16:04
pkiraly
./0prep_deps.sh
16:04
pkiraly
./1prep.sh
16:04
pkiraly
docker build -t dv0 -f c8.dockerfile .
16:05
pkiraly
docker run -d -p 8083:8080 -p 8084:80 --name dv dv0
16:05
pkiraly
docker exec dv /opt/dv/setupIT.bash
16:05
pkiraly
docker exec dv /usr/local/glassfish4/bin/asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8084"
16:05
pkiraly
cd ../..
16:05
pkiraly
conf/docker-aio/run-test-suite.sh
16:06
pdurbin
Ah, the old docker-aio. I haven't tried it in a while.
16:07
pdurbin
donsizemore: have you tried it lately? And good morning.
16:07
pkiraly
It is quite strange that ./1prep.sh creates its own Maven environment and resets JAVA_HOME to a nonexistent path. Maybe this path exists on the machine where the script was written, but not on my side.
16:07
pdurbin
pkiraly: you're on "develop" right? Fairly recent commit on develop?
16:07
donsizemore
I cribbed from it for the podman-aio dealy I'm pecking on
16:07
pkiraly
pdurbin, yes, I am develop
16:09
pdurbin
donsizemore: have you gotten a successful api test run from docker-aio lately? I haven't tried in a coon's age.
16:09
pkiraly
I am on develop
16:09
donsizemore
@pkiraly the maven packages in CentOS 7 and 8 both require java 1.8, hence the installation of a custom maven binary for java 11
16:09
donsizemore
I welcome smarter ways to deal with that.
16:10
pkiraly
donsizemore, the 1prep.sh doesn't say anything about the version; look at this line:
16:10
pkiraly
echo "export JAVA_HOME=/usr/lib/jvm/jre-openjdk" > maven/maven.sh
16:11
donsizemore
that part was historically necessary for Jim, but I forget why
16:11
donsizemore
no, wait. could you send me a link to that line?
16:12
pkiraly
I am fine with it as long as there is documentation somewhere about what is expected to be there.
16:13
pkiraly
https://github.com/IQSS/dataverse/blob/develop/conf/docker-aio/1prep.sh#L20
16:14
donsizemore
the mkdir is on line 18, and it's there so we can source it to call a newer maven with java 11
16:15
donsizemore
you're saying /usr/lib/jvm/jre-openjdk is a non-existent path?
16:16
pkiraly
I do not have it on my machines
16:17
pkiraly
I have JDKs such as /usr/lib/jvm/java-11-openjdk-amd64/
16:19
donsizemore
but if you'd like a switch for Debian/Ubuntu that would make the script more robust
16:21
pkiraly
pdurbin, you said: "First I run the installer to get everything set up." am I correct that you do not do it in a Dockerized environment, but you have all components installed normally on the host machine?
16:22
pkiraly
donsizemore, I forgot to mention that I am on an Ubuntu machine
16:22
donsizemore
yes, I asked that part, but my first message at 11:19 didn't go through
16:22
donsizemore
I said that /usr/lib/jvm/jre-openjdk was a safe, generic bet for RHEL /CentOS
16:23
donsizemore
but if you'd like a switch for Debian/Ubuntu that would make the script more robust
16:23
pkiraly
what if the script checks first if that path is available, and if not it stops with a message?
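(pkiraly's suggestion as a sketch, as it might look in 1prep.sh:

    if [ ! -d /usr/lib/jvm/jre-openjdk ]; then
        echo "1prep.sh: /usr/lib/jvm/jre-openjdk not found; edit maven/maven.sh to point JAVA_HOME at your JDK" >&2
        exit 1
    fi

This would fail fast on Debian/Ubuntu, where the path is typically /usr/lib/jvm/java-11-openjdk-amd64 instead.)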
16:24
pdurbin
pkiraly: right, I have Payara, Postgres, and Solr installed directly on my Mac, like the dev guide describes in the "setting up a dev environment" page.
16:24
pdurbin
I'm trying docker-aio on develop, by the way.
16:25
donsizemore
@pkiraly better to determine the path based on OS, and just set it? (or make it configurable)
16:26
pdurbin
At the end I got:
16:26
pdurbin
- docker-aio ready to run integration tests
16:26
pdurbin
- {"status":"OK","data":{"version":"5.3","build":"develop-bae37ca1c"}}
16:27
pdurbin
Hmm! When I run ./conf/docker-aio/run-test-suite.sh I get a message about JAVA_HOME
16:28
pkiraly
pdurbin, sometimes I see that, sometimes I don't. Maybe it is due to my machines (I tried it on two different Ubuntu machines)
16:29
pdurbin
I think the problem (for me on Mac) is export JAVA_HOME=/usr/lib/jvm/jre-openjdk, because that directory doesn't exist.
16:30
pkiraly
pdurbin, maybe you also do not have /usr/lib/jvm/jre-openjdk. My dirty solution for that was editing maven/maven.sh
16:30
pdurbin
Yeah, or I could probably change it to just `mvn` because I use it all the time.
16:30
pdurbin
That is, I have `mvn` installed on my Mac and use it all the time.
16:31
pdurbin
Oh, or rather, I could try just deleting `source maven/maven.sh && ` since mvn is already in my path.
16:32
donsizemore
it's definitely there to work around RHEL /CentOS' packaged maven requiring java-1.8
16:32
donsizemore
but some smarter solution would be welcome
16:32
pdurbin
For now the easiest thing for me is to just delete that "source" command. Really I'm just curious about if all the tests pass in docker-aio or not.
16:33
pdurbin
The fact that no one has complained is probably a good indication that no developers are regularly running the API tests locally. :)
16:34
pkiraly
pdurbin, BTW I am working on this ticket: https://github.com/IQSS/dataverse/issues/7431 , and I found that an important setting, `:OAIServerEnabled`, is not documented. I also found that the OAI server classes are mostly missing unit tests, and only two OAI verbs have integration tests.
16:35
pdurbin
Ah, if you can add tests and docs along with a code fix, that would be great.
16:35
donsizemore
@pdurbin they succeeded for me yesterday and again this morning.
16:40
pdurbin
Tests run: 146, Failures: 0, Errors: 0, Skipped: 8
16:40
pkiraly
pdurbin, yes, I wrote some tests, I just was not able to run them so far... Anyway, I'll figure it out and do some documentation as part of the PR
16:41
pdurbin
Looking good. So docker-aio works for me on my Mac if I remove the "source" command from the "run API tests" script. I also got slightly tripped up by having an oldish war file lying around in the "target" directory but a clean and a package got me fixed up there.
16:42
pdurbin
pkiraly: keep in mind that Jenkins will run these tests after you make a pull request. So you can always check the results there. But I can understand wanting to run them locally first.
16:43
pdurbin
Usually instead of running the full API test suite, I run the tests I'm working on. And then I check the results in Jenkins.
17:08
pkiraly
pdurbin, the Jenkins tasks run for forked repos as well, am I correct?
17:32
pkiraly joined #dataverse
17:36
donsizemore
@pkiraly yes, but in the -PR job rather than -develop
18:10
donsizemore
@pkiraly and sadly, comparing /usr/lib/jvm on CentOS 8 vs. my Ubuntu 20.04 machine, I see no symlink in common. Perhaps we could make docker-aio create one.
18:23
pdurbin
donsizemore: my thought was: 1. check if `mvn` is available. 2. if not, run the "source" script
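(pdurbin's proposal as a sketch for the "source" call in run-test-suite.sh:

    if ! command -v mvn >/dev/null 2>&1; then
        source maven/maven.sh   # fall back to docker-aio's bundled maven
    fi

Only the availability check matters here; the rest of the script stays as-is.)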
18:23
pdurbin
pkiraly: they're linked from your pull requests (everyone's pull requests)
18:24
donsizemore
@pdurbin sounds good unless there are multiple mvn binaries
18:25
pdurbin
Well, if there are multiple, developers probably have something in their PATH to point to the right one.
18:25
pdurbin
I'm fine with whatever. I'm also fine with how it is now. I can delete the call to that script when I want to run the tests.
18:27
pdurbin
pkiraly: for example, https://github.com/IQSS/dataverse/pull/7673 links to https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-7673/1/display/redirect (under "Show all checks")
20:14
pkiraly
donsizemore, pdurbin thanks!
20:15
donsizemore
@pkiraly if nothing else, I took a look?
22:03
dataverse-user joined #dataverse