IRC log for #dataverse, 2021-03-12

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time Nick Message
06:11 Lincoln joined #dataverse
06:19 LSherpa joined #dataverse
07:24 Virgile joined #dataverse
09:50 lincoln joined #dataverse
09:51 LSherpa joined #dataverse
09:52 juancorr joined #dataverse
11:12 Lincoln joined #dataverse
11:23 pkiraly joined #dataverse
12:14 donsizemore joined #dataverse
13:21 Lincoln i have a question with s3 --- when a dataset is created in s3, it is saved in a different bucket (s3://10.5072, which is not even present in the cluster), although I have another bucket created and configured in the domain.xml file.
13:21 Lincoln can anyone assist me in this please?
13:24 Virgile joined #dataverse
13:29 pkiraly joined #dataverse
13:30 Virgile joined #dataverse
14:45 pdurbin joined #dataverse
14:47 pdurbin Lincoln: hmm, did you possibly create that dataset before configuring your DOI "authority" (the 10.whatever number)?
14:56 Lincoln @pdurbin: i did not configure a doi authority and went directly to creating a dataset using the GUI
14:59 Lincoln there already is a default doi and I am creating the dataset using the GUI
14:59 pdurbin Ok. I don't know a lot about the S3 stuff. Let me try to ping someone who knows more.
15:01 Jim86 joined #dataverse
15:01 Lincoln @pdurbin: appreciate it.
15:02 pdurbin In dev, on my laptop, I almost always use the filesystem instead of S3 or Swift. I've configured for S3 before but it's probably been two years.
15:04 Jim86 It should be stored in the bucket you defined. The s3://10.5072... shown for the dataset does not indicate the bucket. Look at the file entries and you'll see they include the bucket.
15:06 Jim86 The overall process: the storageidentifier for the datafiles shows the store id (s3://), the bucket, and the part of the id unique to the file, while the dataset (which has no representation in s3/any store) has the store id and the path used as the prefix for its files (which is the DOI authority/shoulder<id> of the dataset).
15:06 Lincoln i have checked and there is no bucket called s3://10.5072 or any objects but from pgadmin i can see s3://10.5072/somealphabets
15:07 Jim86 So the two entries have to be combined, and the files you see in the bucket should match their storageidentifier, with an object id like 10.5072/FK2ABCDEF/4365943546763874853
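The split Jim describes can be verified directly in the database. A minimal sketch, assuming the standard dvobject table and the default dvndb database name of a Dataverse installation:

    # dataset rows hold the DOI-based path; datafile rows hold the bucket
    # plus the file-unique part of the id
    psql -d dvndb -c "SELECT dtype, storageidentifier FROM dvobject WHERE storageidentifier LIKE 's3://%';"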
15:07 Jim86 what bucket do you have configured in your jvm options?
15:08 Lincoln right now the bucket configured is s3docs
15:09 Lincoln i have even tried with 10.5072 and it did not work
15:09 Jim86 and can you do 'aws s3 ls -recursive s3docs' and see entries
15:10 Lincoln aws access key id does not exist in record
15:11 Jim86 In what I typed above you actually need --recursive (two - chars).
15:11 Lincoln yes i tried with --recursive
15:11 Jim86 So to get aws client to work, you have to be using a unix client that has the aws credentials
15:12 Lincoln yes
15:12 Jim86 So the aws command above gives the 'aws access key id does not exist in record' as a response?
15:14 Lincoln yeah it says that 'The aws access key id does not exist in our records'. I have checked both files (config and credentials) in the .aws folder and they are the same. Also, when calling aws --endpoint-url=*** s3api list-buckets
15:15 Lincoln it works as normal
15:15 Jim50 joined #dataverse
15:16 Lincoln an error occurred when calling the listObjects operation
15:17 Jim50 So when I type 'aws s3 ls --recursive <qdr's dev bucket name>' I see entries like 10.5072/FK2ACAHCN/16a36db5829-a5dbbff23f65 in it. Those correspond to a dataset with storageidentifier s3://10.5072/FK2ACAHCN and a file with s3://qdr-dev-bucket/16a36db5829-a5dbbff23f65 in the postgres database.
15:19 Jim50 My guess would be that you have something misconfigured with aws, at least for the account/machine where you're using the command-line client. That's hard to debug remotely, but once you can view inside your s3docs bucket, you should find your files
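A quick way to see what the command-line client is resolving, as a sketch (the bucket name s3docs comes from the conversation; with a non-AWS endpoint, add the same --endpoint-url flag used above):

    aws configure list                  # show which profile, key, and region the CLI picked up
    aws sts get-caller-identity         # confirm the access key maps to a real account
    aws s3 ls --recursive s3://s3docs   # then retry the bucket listing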
15:19 Lincoln @jim50: in the config file of the .aws folder i have no region defined and i have left it blank
15:19 Lincoln (there is no region defined)
15:19 Jim50 Does your Dataverse work to upload and download files from that s3 store?
15:20 Lincoln no, the dataverse does not work for upload and download (i get an internal server error)
15:20 Jim50 If so, the .aws/* files for the account running payara on that machine should be good, so you can verify you have those and/or just try that account from that same machine with the aws client.
15:21 Jim50 OK, so that could be because it can't connect to aws (the same error you have with the client).
15:23 Jim50 I don't know all of the things that can go wrong with aws credentials, but it's possible you need the region (I don't know if they assume a default) or that the account you're using doesn't have the right permissions - it would need to be able to listObjects and to create them (for file uploads).
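If a missing region is the problem, it can be set without hand-editing the files; a sketch (the region value is only an example):

    # writes region into ~/.aws/config for the default profile
    aws configure set region us-east-1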
15:25 Lincoln i am using the s3cmd command.. what would be the s3cmd analog of 'aws s3 ls --recursive <bucket name>'
15:27 Jim50 I don't know. From https://s3tools.org/usage: List objects or buckets: s3cmd ls [s3://BUCKET[/PREFIX]]; List all objects in all buckets: s3cmd la
15:29 Lincoln @jim50: thank you for the help. But the sad part is that the datasets are not stored
15:29 Jim50 Without using --recursive you should still be able to do things like s3cmd ls s3://s3docs/10.5072/FK2ACAHCN/ to just get the files for one dataset (with aws s3, the last / is needed)
15:29 Lincoln although from postgres you can see s3://10.5072/something
15:29 Jim50 Datasets are never stored in s3, only the files.
15:29 Jim50 Datasets are database-only
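Rough s3cmd equivalents of the aws commands above, worth double-checking against the installed s3cmd version:

    s3cmd ls s3://s3docs                 # top-level listing of the bucket
    s3cmd ls --recursive s3://s3docs     # walk everything under the bucket
    s3cmd la                             # list all objects in all buckets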
15:31 Lincoln oh thanks.. i am new to s3...do i need rsync as well
15:31 Lincoln i mean dcm module
15:32 Jim50 Once configured correctly, Dataverse will allow upload/download of files, so you wouldn't need to do anything else. If you want to back up the files, I think the aws client has an rsync equivalent (sync?) that can copy all files from your bucket to a local disk.
15:32 Jim50 No, s3 stores are separate from the rsync/dcm mechanism
15:33 Jim50 That was one of the reasons for using s3 and the direct-upload option - it is a way to handle larger files without having to set anything else up.
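The rsync equivalent Jim refers to is the aws client's sync subcommand; a minimal sketch (the local path is a placeholder):

    # mirror the bucket to local disk, copying only new or changed objects
    aws s3 sync s3://s3docs /srv/dataverse-backup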
15:34 Lincoln Jim50:thank you..that saved a lot of time
15:34 Jim50 FWIW: One thing to explore might be to set up a local file store and then see how the files are laid out - it's the same organization in s3 once you get that working.
15:34 Jim50 YW
15:34 Jim50 Good luck!
15:36 Lincoln Jim50: much appreciated. i'll try with the local file store first
15:36 pdurbin yes, thanks Jim50
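Jim's suggestion of starting with a local file store comes down to a few JVM options; a sketch following the Dataverse storage docs (the store id file1 and the directory are placeholders):

    ./asadmin create-jvm-options "-Ddataverse.files.file1.type=file"        # a filesystem store
    ./asadmin create-jvm-options "-Ddataverse.files.file1.label=file1"
    ./asadmin create-jvm-options "-Ddataverse.files.file1.directory=/usr/local/dvn/data"
    ./asadmin create-jvm-options "-Ddataverse.files.storage-driver-id=file1"   # make it the default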
15:59 pkiraly joined #dataverse
16:00 pkiraly Hi, is there anybody here who runs the integration tests? I ran into problems with how the environment should be set up.
16:01 * pdurbin raises his hand
16:02 pdurbin pkiraly: I'll tell you what I do. First I run the installer to get everything set up. Then I run the dev rebuild script to get set up for integration tests.
16:03 pdurbin The script I added in https://github.com/IQSS/dataverse/pull/7363
16:03 pdurbin ./setup-all.sh --insecure is important, for example
16:04 pdurbin not to be used in prod, of course!
16:04 pkiraly pdurbin, what I am doing is this set of commands:
16:04 pkiraly cd conf/docker-aio
16:04 pkiraly ./0prep_deps.sh
16:04 pkiraly ./1prep.sh
16:04 pkiraly docker build -t dv0 -f c8.dockerfile .
16:05 pkiraly docker run -d -p 8083:8080 -p 8084:80 --name dv dv0
16:05 pkiraly docker exec dv /opt/dv/setupIT.bash
16:05 pkiraly docker exec dv /usr/local/glassfish4/bin/asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8084"
16:05 pkiraly cd ../..
16:05 pkiraly conf/docker-aio/run-test-suite.sh
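pkiraly's sequence gathered into one place, with comments reflecting a best-effort reading of what each step does:

    cd conf/docker-aio
    ./0prep_deps.sh                                        # fetch build dependencies
    ./1prep.sh                                             # stage the source and set up maven
    docker build -t dv0 -f c8.dockerfile .                 # build the all-in-one CentOS 8 image
    docker run -d -p 8083:8080 -p 8084:80 --name dv dv0    # app server on 8083, web on 8084
    docker exec dv /opt/dv/setupIT.bash                    # install dataverse inside the container
    docker exec dv /usr/local/glassfish4/bin/asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8084"
    cd ../..
    conf/docker-aio/run-test-suite.sh                      # run the API test suite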
16:06 pdurbin Ah, the old docker-aio. I haven't tried it in a while.
16:07 pdurbin donsizemore: have you tried it lately? And good morning.
16:07 pkiraly It is quite strange that ./1prep.sh creates its own Maven environment and resets JAVA_HOME to a nonexistent path. Maybe this path exists on the machine where the script was written, but not on my side.
16:07 pdurbin pkiraly: you're on "develop" right? Fairly recent commit on develop?
16:07 donsizemore I cribbed from it for the podman-aio dealy I'm pecking on
16:07 pkiraly pdurbin, yes, I am on develop
16:09 pdurbin donsizemore: have you gotten a successful api test run from docker-aio lately? I haven't tried in a coon's age.
16:09 pkiraly I am on develop
16:09 donsizemore @pkiraly the maven packages in CentOS 7 and 8 both require java 1.8, hence the installation of a custom maven binary for java 11
16:09 donsizemore I welcome smarter ways to deal with that.
16:10 pkiraly donsizemore, the 1prep.sh doesn't say anything about the version; look at this line:
16:10 pkiraly echo "export JAVA_HOME=/usr/lib/jvm/jre-openjdk" > maven/maven.sh
16:11 donsizemore that part was historically necessary for Jim, but I forget why
16:11 donsizemore no, wait. could you send me a link to that line?
16:12 pkiraly I am fine with it as long as there is documentation somewhere about what is expected to be there.
16:13 pkiraly https://github.com/IQSS/dataverse/blob/develop/conf/docker-aio/1prep.sh#L20
16:14 donsizemore the mkdir is on line 18, and it's there so we can source it to call a newer maven with java 11
16:15 donsizemore you're saying /usr/lib/jvm/jre-openjdk is a non-existent path?
16:16 pkiraly I do not have it in my machines
16:17 pkiraly I have JDKs such as /usr/lib/jvm/java-11-openjdk-amd64/
16:19 donsizemore but if you'd like a switch for Debian/Ubuntu that would make the script more robust
16:21 pkiraly pdurbin, you said: "First I run the installer to get everything set up." am I correct that you do not do it in a Dockerized environment, but you have all components installed normally on the host machine?
16:22 pkiraly donsizemore, I forgot to mention that I am on an Ubuntu machine
16:22 donsizemore yes, I asked that part, but my first message at 11:19 didn't go through
16:22 donsizemore I said that /usr/lib/jvm/jre-openjdk was a safe, generic bet for RHEL/CentOS
16:23 donsizemore but if you'd like a switch for Debian/Ubuntu that would make the script more robust
16:23 pkiraly what if the script checks first if that path is available, and if not it stops with a message?
16:24 pdurbin pkiraly: right, I have Payara, Postgres, and Solr installed directly on my Mac, like the dev guide describes in the "setting up a dev environment" page.
16:24 pdurbin I'm trying docker-aio on develop, by the way.
16:25 donsizemore @pkiraly better to determine the path based on OS, and just set it? (or make it configurable)
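One possible shape for that OS-based switch, using only the two paths that came up in the conversation:

    # pick JAVA_HOME from whichever known location exists (RHEL/CentOS vs. Ubuntu)
    if [ -d /usr/lib/jvm/jre-openjdk ]; then
        export JAVA_HOME=/usr/lib/jvm/jre-openjdk
    elif [ -d /usr/lib/jvm/java-11-openjdk-amd64 ]; then
        export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    fi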
16:26 pdurbin At the end I got:
16:26 pdurbin - docker-aio ready to run integration tests
16:26 pdurbin - {"status":"OK","data":{"version":"​5.3","build":"develop-bae37ca1c"}}
16:27 pdurbin Hmm! When I run ./conf/docker-aio/run-test-suite.sh I get a message about JAVA_HOME
16:28 pkiraly pdurbin, sometimes I see that, sometimes I don't. Maybe it is due to my machines (I tried it on two different Ubuntu machines)
16:29 pdurbin I think the problem (for me on Mac) is export JAVA_HOME=/usr/lib/jvm/jre-openjdk, because that directory doesn't exist.
16:30 pkiraly pdurbin, maybe you also do not have /usr/lib/jvm/jre-openjdk. My dirty solution for that was editing maven/maven.sh
16:30 pdurbin Yeah, or I could probably change it to just `mvn` because I use it all the time.
16:30 pdurbin That is, I have `mvn` installed on my Mac and use it all the time.
16:31 pdurbin Oh, or rather, I could try just deleting `source maven/maven.sh && ` since mvn is already in my path.
16:32 donsizemore it's definitely there to work around RHEL/CentOS' packaged maven requiring java-1.8
16:32 donsizemore but some smarter solution would be welcome
16:32 pdurbin For now the easiest thing for me is to just delete that "source" command. Really I'm just curious about if all the tests pass in docker-aio or not.
16:33 pdurbin The fact that no one has complained is probably a good indication that no developers are regularly running the API tests locally. :)
16:34 pkiraly pdurbin, BTW I am working on this ticket: https://github.com/IQSS/dataverse/issues/7431, and I found that an important setting, :OAIServerEnabled, is not documented. I also found that the OAI server classes are mostly missing unit tests, and only two OAI verbs have integration tests.
16:35 pdurbin Ah, if you can add tests and docs along with a code fix, that would be great.
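For reference, database settings like :OAIServerEnabled are toggled through the admin settings API; a sketch against a local instance:

    # enable the OAI server (a DELETE to the same URL unsets it)
    curl -X PUT -d true http://localhost:8080/api/admin/settings/:OAIServerEnabled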
16:35 donsizemore @pdurbin they succeeded for me yesterday and again this morning.
16:40 pdurbin Tests run: 146, Failures: 0, Errors: 0, Skipped: 8
16:40 pkiraly pdurbin, yes I wrote some tests, I just was not able to run them so far... Anyway I'll figure it out and add some documentation as part of the PR
16:41 pdurbin Looking good. So docker-aio works for me on my Mac if I remove the "source" command from the "run API tests" script. I also got slightly tripped up by having an oldish war file lying around in the "target" directory but a clean and a package got me fixed up there.
16:42 pdurbin pkiraly: keep in mind that Jenkins will run these tests after you make a pull request. So you can always check the results there. But I can understand wanting to run them locally first.
16:43 pdurbin Usually instead of running the full API test suite, I run the tests I'm working on. And then I check the results in Jenkins.
17:08 pkiraly pdurbin, the Jenkins tasks run for forked repos as well, am I correct?
17:32 pkiraly joined #dataverse
17:36 donsizemore @pkiraly yes, but in the -PR job rather than -develop
18:10 donsizemore @pkiraly and sadly, comparing /usr/lib/jvm on CentOS 8 vs. my Ubuntu 20.04 machine, I see no symlink in common. perhaps we could make docker-aio create one.
18:23 pdurbin donsizemore: my thought was: 1. check if `mvn` is available. 2. if not, run the "source" script
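pdurbin's two-step idea as a sketch, for where run-test-suite.sh currently sources maven/maven.sh:

    # prefer a maven already on the PATH; only fall back to the bundled one
    if ! command -v mvn >/dev/null 2>&1; then
        source maven/maven.sh
    fi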
18:23 pdurbin pkiraly: they're linked from your pull requests (everyone's pull requests)
18:24 donsizemore @pdurbin sounds good unless there are multiple mvn binaries
18:25 pdurbin Well, if there are multiple, developers probably have something in their PATH to point to the right one.
18:25 pdurbin I'm fine with whatever. I'm also fine with how it is now. I can delete the call to that script when I want to run the tests.
18:27 pdurbin pkiraly: for example, https://github.com/IQSS/dataverse/pull/7673 links to https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-7673/1/display/redirect (under "Show all checks")
20:14 pkiraly donsizemore, pdurbin thanks!
20:15 donsizemore @pkiraly if nothing else, I took a look?
22:03 dataverse-user joined #dataverse
