06:11 *** Lincoln joined #dataverse
06:19 *** LSherpa joined #dataverse
07:24 *** Virgile joined #dataverse
09:50 *** lincoln joined #dataverse
09:51 *** LSherpa joined #dataverse
09:52 *** juancorr joined #dataverse
11:12 *** Lincoln joined #dataverse
11:23 *** pkiraly joined #dataverse
12:14 *** donsizemore joined #dataverse
13:21 <Lincoln> I have a question about S3: when a dataset is created, it is saved in a different bucket (s3://10.5072, which is not even present in the cluster), although I have another bucket created and configured in the domain.xml file.
13:21 <Lincoln> Can anyone assist me with this, please?
13:24 *** Virgile joined #dataverse
13:29 *** pkiraly joined #dataverse
13:30 *** Virgile joined #dataverse
14:45 *** pdurbin joined #dataverse
14:47 <pdurbin> Lincoln: hmm, did you possibly create that dataset before configuring your DOI "authority" (the 10.whatever number)?
14:56 <Lincoln> @pdurbin: I did not configure a DOI authority and went directly to creating a dataset using the GUI
14:59 <Lincoln> there already is a default DOI, and I am creating the dataset using the GUI
14:59 <pdurbin> Ok. I don't know a lot about the S3 stuff. Let me try to ping someone who knows more.
15:01 *** Jim86 joined #dataverse
15:01 <Lincoln> @pdurbin: appreciate it.
15:02 <pdurbin> In dev, on my laptop, I almost always use the filesystem instead of S3 or Swift. I've configured S3 before, but it's probably been two years.
15:04 <Jim86> It should be stored in the bucket you defined. The s3://10.5072... shown for the dataset does not indicate the bucket. Look at the file entries and you'll see they include the bucket.
15:06 <Jim86> The overall scheme is: the storageidentifier for a datafile shows the store id (s3://), the bucket, and the part of the id unique to the file, while the dataset (which has no representation in s3 or any store) has the store id and the path used as a prefix for its files (which is the DOI authority/shoulder<id> of the dataset).
15:06 <Lincoln> I have checked, and there is no bucket called s3://10.5072 or any such objects, but from pgAdmin I can see s3://10.5072/somealphabets
15:07 <Jim86> So the two entries have to be combined, and you should see the files in the bucket show up in their storageidentifier, with an object id like 10.5072/FK2ABCDEF/4365943546763874853
15:07 <Jim86> what bucket do you have configured in your jvm options?
15:08 <Lincoln> right now the bucket configured is s3docs
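
For readers following along: in Dataverse 5.x the bucket for an S3 store is a JVM option on the app server. A minimal sketch for checking and setting it, assuming a store id of "s3" and a Payara install under /usr/local/payara5 (both are assumptions; adjust to your installation):

    # list the store-related JVM options currently set (asadmin path is an assumption)
    /usr/local/payara5/bin/asadmin list-jvm-options | grep dataverse.files

    # point the store with id "s3" at the s3docs bucket
    /usr/local/payara5/bin/asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=s3docs"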
 
        
15:09 <Lincoln> I have even tried with 10.5072 and it did not work
15:09 <Jim86> and can you do 'aws s3 ls -recursive s3docs' and see entries?
15:10 <Lincoln> aws access key id does not exist in record
15:11 <Jim86> In what I typed above you actually need --recursive (two - chars).
15:11 <Lincoln> yes, I tried with --recursive
15:11 <Jim86> So to get the aws client to work, you have to be using a unix client that has the aws credentials
15:12 <Lincoln> yes
15:12 <Jim86> So the aws command above gives 'aws access key id does not exist in record' as the response?
15:14 <Lincoln> Yeah, it says "The AWS access key id does not exist in our records". I have checked both files (config and credentials) in the .aws folder and they are the same. Also, when calling aws --endpoint-url=*** s3api list-buckets
15:15 <Lincoln> it works as normal
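
A side note on the exchange above: the aws CLI only talks to the endpoint given with --endpoint-url (at the time of this log it could not be set in ~/.aws/config), so an 'aws s3 ls' without that flag goes to Amazon itself, where a key for a private S3-compatible store would not exist. A sketch, with a placeholder endpoint:

    # the custom endpoint must be repeated on every aws invocation
    aws --endpoint-url=https://s3.example.com s3 ls --recursive s3://s3docs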
 
        
15:15 *** Jim50 joined #dataverse
15:16 <Lincoln> I get "an error occurred when calling the ListObjects operation"
15:17 <Jim50> So when I type 'aws s3 ls --recursive <qdr's dev bucket name>' I see entries like 10.5072/FK2ACAHCN/16a36db5829-a5dbbff23f65 in it. Those correspond to a dataset with storageidentifier s3://10.5072/FK2ACAHCN and a file with s3://qdr-dev-bucket/16a36db5829-a5dbbff23f65 in the postgres database.
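
To see the two kinds of entries side by side in your own database, a query along these lines should work (the dvobject table and its storageidentifier column are standard Dataverse schema; the database name dvndb is an assumption):

    # dataset rows carry the DOI-based path; datafile rows carry the bucket-qualified object id
    psql -d dvndb -c "select dtype, storageidentifier from dvobject limit 20;"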
 
        
15:19 <Jim50> My guess would be that you have something misconfigured with aws, at least for the account/machine where you're using the command-line client. That's hard to debug remotely, but once you can view inside your s3docs bucket, you should find your files
15:19 <Lincoln> @jim50: in the config file of the .aws folder I have no region defined; I have left it blank
15:19 <Lincoln> (there is no region defined)
15:19 <Jim50> Does your Dataverse work to upload and download files from that s3 store?
15:20 <Lincoln> no, the dataverse does not work for upload and download (I get an internal server error)
15:20 <Jim50> If so, the .aws/* files for the account running payara on that machine should be good, so you can verify you have those and/or just try that account from that same machine with the aws client.
15:21 <Jim50> OK, so that could be because it can't connect to aws (the same error you have with the client).
15:23 <Jim50> I don't know all of the things that can go wrong with aws credentials, but it's possible you need the region (I don't know if they assume a default) or that the account you're using doesn't have the right permissions - it would need to be able to listObjects and to create them (for file uploads).
15:25 <Lincoln> I am using the s3cmd command... what would be the analogous command to 'aws s3 ls --recursive <bucket name>'?
15:27 <Jim50> I don't know. From https://s3tools.org/usage: "List objects or buckets: s3cmd ls [s3://BUCKET[/PREFIX]]"; "List all objects in all buckets: s3cmd la"
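
Going by that s3tools page, the rough s3cmd equivalent of the earlier aws command would be something like the following (--recursive for s3cmd ls is per the s3cmd docs, not verified here):

    s3cmd ls --recursive s3://s3docs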
 
        
15:29 <Lincoln> @jim50: thank you for the help. But the sad part is that the datasets are not stored
15:29 <Jim50> Even without --recursive, you should still be able to do things like s3cmd ls s3://s3docs/10.5072/FK2ACAHCN/ to get just the files for one dataset (with aws s3, the last / is needed)
15:29 <Lincoln> although from postgres you can see s3://10.5072/something
15:29 <Jim50> Datasets are never stored in s3, only the files.
15:29 <Jim50> Datasets are database-only
15:31 <Lincoln> oh, thanks... I am new to s3... do I need rsync as well?
15:31 <Lincoln> I mean the dcm module
15:32 <Jim50> Once configured correctly, Dataverse will allow upload/download of files, so you wouldn't need to do anything else. If you want to back up the files, I think the aws client has an rsync equivalent (sync?) that can copy all files from your bucket to a local disk.
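
The aws subcommand Jim is thinking of is 'aws s3 sync', which copies only new or changed objects. A sketch with placeholder paths:

    # mirror the bucket to local disk for backup
    aws s3 sync s3://s3docs /backups/s3docs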
 
        
15:32 <Jim50> No, s3 stores are separate from the rsync/dcm mechanism
15:33 <Jim50> That was one of the reasons for using s3 and the direct-upload option - it is a way to handle larger files without having to set anything else up.
15:34 <Lincoln> Jim50: thank you... that saved a lot of time
15:34 <Jim50> FWIW: one thing to explore might be to set up a local file store and then see how the files are laid out - it's the same organization in s3 once you get that working.
15:34 <Jim50> YW
15:34 <Jim50> Good luck!
15:36 <Lincoln> Jim50: much appreciated. I'll try with a local file store first
15:36 <pdurbin> yes, thanks Jim50
15:59 *** pkiraly joined #dataverse
16:00 <pkiraly> Hi, is there anybody here who runs the integration tests? I've found problems with how the environment should be set up.
16:01 * pdurbin raises his hand
16:02 <pdurbin> pkiraly: I'll tell you what I do. First I run the installer to get everything set up. Then I run the dev rebuild script to get set up for integration tests.
16:03 <pdurbin> The script I added in https://github.com/IQSS/dataverse/pull/7363
16:03 <pdurbin> ./setup-all.sh --insecure is important, for example
16:04 <pdurbin> not to be used in prod, of course!
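
Roughly, the flow pdurbin describes looks like this (script locations are assumptions based on the repo layout; verify against your checkout and the PR above):

    # rebuild the war and redeploy it to the local app server (path assumed)
    scripts/dev/dev-rebuild.sh

    # create the users, dataverses, and settings the API tests expect; dev only, never prod
    cd scripts/api && ./setup-all.sh --insecure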
 
        
16:04 <pkiraly> pdurbin, what I am doing is this set of commands:
16:04 <pkiraly> cd conf/docker-aio
16:04 <pkiraly> ./0prep_deps.sh
16:04 <pkiraly> ./1prep.sh
16:04 <pkiraly> docker build -t dv0 -f c8.dockerfile .
16:05 <pkiraly> docker run -d -p 8083:8080 -p 8084:80 --name dv dv0
16:05 <pkiraly> docker exec dv /opt/dv/setupIT.bash
16:05 <pkiraly> docker exec dv /usr/local/glassfish4/bin/asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8084"
16:05 <pkiraly> cd ../..
16:05 <pkiraly> conf/docker-aio/run-test-suite.sh
16:06 <pdurbin> Ah, the old docker-aio. I haven't tried it in a while.
16:07 <pdurbin> donsizemore: have you tried it lately? And good morning.
16:07 <pkiraly> It is quite strange that ./1prep.sh creates its own Maven environment and resets JAVA_HOME to a nonexistent path. Maybe that path exists on the machine where the script was written, but not on mine.
16:07 <pdurbin> pkiraly: you're on "develop", right? A fairly recent commit on develop?
16:07 <donsizemore> I cribbed from it for the podman-aio dealy I'm pecking on
16:07 <pkiraly> pdurbin, yes, I am on develop
16:09 <pdurbin> donsizemore: have you gotten a successful API test run from docker-aio lately? I haven't tried in a coon's age.
16:09 <pkiraly> I am on develop
16:09 <donsizemore> @pkiraly the maven packaged for CentOS 7 and 8 both require java 1.8, hence the installation of a custom maven binary for java 11
16:09 <donsizemore> I welcome smarter ways to deal with that.
16:10 <pkiraly> donsizemore, 1prep.sh doesn't say anything about versions; look at this line:
16:10 <pkiraly> echo "export JAVA_HOME=/usr/lib/jvm/jre-openjdk" > maven/maven.sh
16:11 <donsizemore> that part was historically necessary for Jim, but I forget why
16:11 <donsizemore> no, wait. could you send me a link to that line?
16:12 <pkiraly> I am fine with it as long as there is documentation somewhere about what is expected to be there.
16:13 <pkiraly> https://github.com/IQSS/dataverse/blob/develop/conf/docker-aio/1prep.sh#L20
16:14 <donsizemore> the mkdir is on line 18, and it's there so we can source it to call a newer maven with java 11
16:15 <donsizemore> you're saying /usr/lib/jvm/jre-openjdk is a non-existent path?
16:16 <pkiraly> I do not have it on my machines
16:17 <pkiraly> I have JDKs such as /usr/lib/jvm/java-11-openjdk-amd64/
16:19 <donsizemore> but if you'd like a switch for Debian/Ubuntu that would make the script more robust
16:21 <pkiraly> pdurbin, you said: "First I run the installer to get everything set up." Am I correct that you do not do this in a Dockerized environment, but have all the components installed normally on the host machine?
16:22 <pkiraly> donsizemore, I forgot to mention that I am on an Ubuntu machine
16:22 <donsizemore> yes, I asked about that part, but my first message at 11:19 didn't go through
16:22 <donsizemore> I said that /usr/lib/jvm/jre-openjdk was a safe, generic bet for RHEL/CentOS
16:23 <donsizemore> but if you'd like a switch for Debian/Ubuntu, that would make the script more robust
16:23 <pkiraly> what if the script first checks whether that path is available, and if not, stops with a message?
16:24 <pdurbin> pkiraly: right, I have Payara, Postgres, and Solr installed directly on my Mac, like the dev guide describes in the "setting up a dev environment" page.
16:24 <pdurbin> I'm trying docker-aio on develop, by the way.
16:25 <donsizemore> @pkiraly better to determine the path based on OS and just set it? (or make it configurable)
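
A sketch of that idea for 1prep.sh, trying the RHEL/CentOS path first and then the Debian/Ubuntu one mentioned above (illustrative only, not the script's actual contents):

    # pick whichever JVM directory exists on this machine, else fail loudly
    : > maven/maven.sh
    for jvm in /usr/lib/jvm/jre-openjdk /usr/lib/jvm/java-11-openjdk-amd64; do
      if [ -d "$jvm" ]; then
        echo "export JAVA_HOME=$jvm" > maven/maven.sh
        break
      fi
    done
    [ -s maven/maven.sh ] || { echo "no known JVM path found; set JAVA_HOME manually" >&2; exit 1; }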
 
        
16:26 <pdurbin> At the end I got:
16:26 <pdurbin> - docker-aio ready to run integration tests
16:26 <pdurbin> - {"status":"OK","data":{"version":"5.3","build":"develop-bae37ca1c"}}
16:27 <pdurbin> Hmm! When I run ./conf/docker-aio/run-test-suite.sh I get a message about JAVA_HOME
16:28 <pkiraly> pdurbin, sometimes I see that, sometimes I don't. Maybe it is due to my machines (I try it on two different Ubuntu machines)
16:29 <pdurbin> I think the problem (for me, on Mac) is export JAVA_HOME=/usr/lib/jvm/jre-openjdk, because that directory doesn't exist.
16:30 <pkiraly> pdurbin, maybe you also do not have /usr/lib/jvm/jre-openjdk. My dirty solution for that was editing maven/maven.sh
16:30 <pdurbin> Yeah, or I could probably change it to just `mvn` because I use it all the time.
16:30 <pdurbin> That is, I have `mvn` installed on my Mac and use it all the time.
16:31 <pdurbin> Oh, or rather, I could try just deleting `source maven/maven.sh && ` since mvn is already in my path.
16:32 <donsizemore> it's definitely there to work around RHEL/CentOS' packaged maven requiring java-1.8
16:32 <donsizemore> but some smarter solution would be welcome
16:32 <pdurbin> For now the easiest thing for me is to just delete that "source" command. Really I'm just curious whether all the tests pass in docker-aio or not.
16:33 <pdurbin> The fact that no one has complained is probably a good indication that no developers are regularly running the API tests locally. :)
16:34 <pkiraly> pdurbin, BTW I am working on this ticket: https://github.com/IQSS/dataverse/issues/7431 , and I found that an important setting, `:OAIServerEnabled`, is not documented. I also found that the OAI server classes are mostly missing unit tests, and only two OAI verbs have integration tests.
16:35 <pdurbin> Ah, if you can add tests and docs along with a code fix, that would be great.
16:35 <donsizemore> @pdurbin they succeeded for me yesterday and again this morning.
16:40 <pdurbin> Tests run: 146, Failures: 0, Errors: 0, Skipped: 8
16:40 <pkiraly> pdurbin, yes I wrote some tests, I just haven't been able to run them so far... Anyway, I'll figure it out and do some documentation as part of the PR
16:41 <pdurbin> Looking good. So docker-aio works for me on my Mac if I remove the "source" command from the "run API tests" script. I also got slightly tripped up by having an oldish war file lying around in the "target" directory, but a clean and a package got me fixed up there.
16:42 <pdurbin> pkiraly: keep in mind that Jenkins will run these tests after you make a pull request. So you can always check the results there. But I can understand wanting to run them locally first.
16:43 <pdurbin> Usually, instead of running the full API test suite, I run the tests I'm working on. And then I check the results in Jenkins.
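
For reference, running a single API test class against a local installation looks roughly like this (DatasetsIT is just an example class; dataverse.test.baseurl is, as far as I know, the property the RestAssured tests read to locate the server):

    mvn test -Dtest=DatasetsIT -Ddataverse.test.baseurl=http://localhost:8080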
 
        
17:08 <pkiraly> pdurbin, the Jenkins tasks run for forked repos as well, am I correct?
17:32 *** pkiraly joined #dataverse
17:36 <donsizemore> @pkiraly yes, but in the -PR job rather than -develop
18:10 <donsizemore> @pkiraly and sadly, comparing /usr/lib/jvm on CentOS 8 vs. my Ubuntu 20.04 machine, I see no symlink in common. Perhaps we could make docker-aio create one.
18:23 <pdurbin> donsizemore: my thought was: 1. check if `mvn` is available. 2. if not, run the "source" script
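
That check would only be a couple of lines in run-test-suite.sh; a sketch:

    # only source the bundled maven environment if mvn isn't already on the PATH
    if ! command -v mvn >/dev/null 2>&1; then
      source maven/maven.sh
    fi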
 
        
18:23 <pdurbin> pkiraly: they're linked from your pull requests (everyone's pull requests)
18:24 <donsizemore> @pdurbin sounds good, unless there are multiple mvn binaries
18:25 <pdurbin> Well, if there are multiple, developers probably have something in their PATH to point to the right one.
18:25 <pdurbin> I'm fine with whatever. I'm also fine with how it is now. I can delete the call to that script when I want to run the tests.
18:27 <pdurbin> pkiraly: for example, https://github.com/IQSS/dataverse/pull/7673 links to https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-7673/1/display/redirect (under "Show all checks")
20:14 <pkiraly> donsizemore, pdurbin: thanks!
20:15 <donsizemore> @pkiraly if nothing else, I took a look?
22:03 *** dataverse-user joined #dataverse