IRC log for #dataverse, 2021-03-12

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time Nick Message
06:11 Lincoln joined #dataverse
06:19 LSherpa joined #dataverse
07:24 Virgile joined #dataverse
09:50 lincoln joined #dataverse
09:51 LSherpa joined #dataverse
09:52 juancorr joined #dataverse
11:12 Lincoln joined #dataverse
11:23 pkiraly joined #dataverse
12:14 donsizemore joined #dataverse
13:21 Lincoln i have a question with s3 --- when a dataset is created in s3, it is saved in a different bucket (s3://10.5072, which is not even present in the cluster), although I have another bucket created and configured in the domain.xml file.
13:21 Lincoln can anyone assist me in this please?
13:24 Virgile joined #dataverse
13:29 pkiraly joined #dataverse
13:30 Virgile joined #dataverse
14:45 pdurbin joined #dataverse
14:47 pdurbin Lincoln: hmm, did you possibly create that dataset before configuring your DOI "authority" (the 10.whatever number)?
14:56 Lincoln @pdurbin: i did not configure a doi authority and went directly to creating a dataset using the GUI
14:59 Lincoln there already is a default doi and I am creating the dataset using the GUI
14:59 pdurbin Ok. I don't know a lot about the S3 stuff. Let me try to ping someone who knows more.
15:01 Jim86 joined #dataverse
15:01 Lincoln @pdurbin: appreciate it.
15:02 pdurbin In dev, on my laptop, I almost always use the filesystem instead of S3 or Swift. I've configured for S3 before but it's probably been two years.
15:04 Jim86 It should be stored in the bucket you defined. The s3://10.5072... shown for the dataset does not indicate the bucket. Look at the file entries and you'll see they include the bucket.
15:06 Jim86 The overall process: the storageidentifier for the datafiles shows the store id (s3://), the bucket, and the part of the id unique to the file, while the dataset (which has no representation in s3/any store) has the store id and the path used as the prefix for its files (which is the DOI authority/shoulder<id> of the dataset).
15:06 Lincoln i have checked and there is no bucket called s3://10.5072 or any objects but from pgadmin i can see s3://10.5072/somealphabets
15:07 Jim86 So the two entries have to be combined, and the files you see in the bucket should match their storageidentifier, with an object id like 10.5072/FK2ABCDEF/4365943546763874853
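The split Jim describes can be verified directly in the database. A minimal sketch, assuming the standard dvobject table and the default dvndb database name of a Dataverse installation:

    # dataset rows hold the DOI-based path; datafile rows hold the bucket
    # plus the file-unique part of the id
    psql -d dvndb -c "SELECT dtype, storageidentifier FROM dvobject WHERE storageidentifier LIKE 's3://%';"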
15:07 Jim86 what bucket do you have configured in your jvm options?
15:08 Lincoln right now the bucket configured is s3docs
15:09 Lincoln i have even tried with 10.5072 and it did not work
15:09 Jim86 and can you do 'aws s3 ls -recursive s3docs' and see entries
15:10 Lincoln aws access key id does not exist in record
15:11 Jim86 In what I typed above you actually need --recursive (two - chars).
15:11 Lincoln yes i tried with --recursive
15:11 Jim86 So to get aws client to work, you have to be using a unix client that has the aws credentials
15:12 Lincoln yes
15:12 Jim86 So the aws command above gives the 'aws access key id does not exist in record' as a response?
15:14 Lincoln yeah it says that 'The aws access key id does not exist in our records'. I have checked both files (config and credentials) in the .aws folder and they are the same. Also, when calling aws --endpoint-url=*** s3api list-buckets
15:15 Lincoln it works as normal
15:15 Jim50 joined #dataverse
15:16 Lincoln an error occurred when calling the listObjects operation
15:17 Jim50 So when I type 'aws s3 ls --recursive <qdr's dev bucket name>' I see entries like 10.5072/FK2ACAHCN/16a36db5829-a5dbbff23f65 in it. Those correspond to a dataset with storageidentifier s3://10.5072/FK2ACAHCN and a file with s3://qdr-dev-bucket/16a36db5829-a5dbbff23f65 in the postgres database.
15:19 Jim50 My guess would be that you have something misconfigured with aws, at least for the account/machine where you're using the command-line client. That's hard to debug remotely, but once you can view inside your s3docs bucket, you should find your files
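A quick way to see what the command-line client is resolving, as a sketch (the bucket name s3docs comes from the conversation; with a non-AWS endpoint, add the same --endpoint-url flag used above):

    aws configure list                  # show which profile, key, and region the CLI picked up
    aws sts get-caller-identity         # confirm the access key maps to a real account
    aws s3 ls --recursive s3://s3docs   # then retry the bucket listing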
15:19 Lincoln @jim50: in the config file of the .aws folder i have no region defined and i have left it blank
15:19 Lincoln (there is no region defined)
15:19 Jim50 Does your Dataverse work to upload and download files from that s3 store?
15:20 Lincoln no, the dataverse does not work for upload and download (i get an internal server error)
15:20 Jim50 If so, the .aws/* files for the account running payara on that machine should be good, so you can verify you have those and/or just try that account from that same machine with the aws client.
15:21 Jim50 OK, so that could be because it can't connect to aws (the same error you have with the client).
15:23 Jim50 I don't know all of the things that can go wrong with aws credentials, but it's possible you need the region (I don't know if they assume a default) or that the account you're using doesn't have the right permissions - it would need to be able to listObjects and to create them (for file uploads).
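If a missing region is the problem, it can be set without hand-editing the files; a sketch (the region value is only an example):

    # writes region into ~/.aws/config for the default profile
    aws configure set region us-east-1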
15:25 Lincoln i am using the s3cmd command.. what would be the s3cmd analog of 'aws s3 ls --recursive <bucket name>'
15:27 Jim50 I don't know. From https://s3tools.org/usage: List objects or buckets: s3cmd ls [s3://BUCKET[/PREFIX]]; List all objects in all buckets: s3cmd la
15:29 Lincoln @jim50: thank you for the help. But the sad part is that the datasets are not stored
15:29 Jim50 Without using --recursive you should still be able to do things like s3cmd ls s3://s3docs/10.5072/FK2ACAHCN/ to just get the files for one dataset (with aws s3, the last / is needed)
15:29 Lincoln although from postgres you can see s3://10.5072/something
15:29 Jim50 Datasets are never stored in s3, only the files.
15:29 Jim50 Datasets are database-only
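Rough s3cmd equivalents of the aws commands above, worth double-checking against the installed s3cmd version:

    s3cmd ls s3://s3docs                 # top-level listing of the bucket
    s3cmd ls --recursive s3://s3docs     # walk everything under the bucket
    s3cmd la                             # list all objects in all buckets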
15:31 Lincoln oh thanks.. i am new to s3...do i need rsync as well
15:31 Lincoln i mean dcm module
15:32 Jim50 Once configured correctly, Dataverse will allow upload/download of files, so you wouldn't need to do anything else. If you want to back up the files, I think the aws client has an rsync equivalent (sync?) that can copy all files from your bucket to a local disk.
15:32 Jim50 No, s3 stores are separate from the rsync/dcm mechanism
15:33 Jim50 That was one of the reasons for using s3 and the direct-upload option - it is a way to handle larger files without having to set anything else up.
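The rsync equivalent Jim refers to is the aws client's sync subcommand; a minimal sketch (the local path is a placeholder):

    # mirror the bucket to local disk, copying only new or changed objects
    aws s3 sync s3://s3docs /srv/dataverse-backup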
15:34 Lincoln Jim50:thank you..that saved a lot of time
15:34 Jim50 FWIW: One thing to explore might be to set up a local file store and then see how the files are laid out - it's the same organization in s3 once you get that working.
15:34 Jim50 YW
15:34 Jim50 Good luck!
15:36 Lincoln Jim50: much appreciated. i'll try with the local file store first
15:36 pdurbin yes, thanks Jim50
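Jim's suggestion of starting with a local file store comes down to a few JVM options; a sketch following the Dataverse storage docs (the store id file1 and the directory are placeholders):

    ./asadmin create-jvm-options "-Ddataverse.files.file1.type=file"        # a filesystem store
    ./asadmin create-jvm-options "-Ddataverse.files.file1.label=file1"
    ./asadmin create-jvm-options "-Ddataverse.files.file1.directory=/usr/local/dvn/data"
    ./asadmin create-jvm-options "-Ddataverse.files.storage-driver-id=file1"   # make it the default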
15:59 pkiraly joined #dataverse
16:00 pkiraly Hi, is there anybody here who runs the integration tests? I ran into problems with how the environment should be set up.
16:01 * pdurbin raises his hand
16:02 pdurbin pkiraly: I'll tell you what I do. First I run the installer to get everything set up. Then I run the dev rebuild script to get set up for integration tests.
16:03 pdurbin The script I added in https://github.com/IQSS/dataverse/pull/7363
16:03 pdurbin ./setup-all.sh --insecure is important, for example
16:04 pdurbin not to be used in prod, of course!
16:04 pkiraly pdurbin, what I am doing is this set of commands:
16:04 pkiraly cd conf/docker-aio
16:04 pkiraly ./0prep_deps.sh
16:04 pkiraly ./1prep.sh
16:04 pkiraly docker build -t dv0 -f c8.dockerfile .
16:05 pkiraly docker run -d -p 8083:8080 -p 8084:80 --name dv dv0
16:05 pkiraly docker exec dv /opt/dv/setupIT.bash
16:05 pkiraly docker exec dv /usr/local/glassfish4/bin/asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8084"
16:05 pkiraly cd ../..
16:05 pkiraly conf/docker-aio/run-test-suite.sh
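pkiraly's sequence gathered into one place, with comments reflecting a best-effort reading of what each step does:

    cd conf/docker-aio
    ./0prep_deps.sh                                        # fetch build dependencies
    ./1prep.sh                                             # stage the source and set up maven
    docker build -t dv0 -f c8.dockerfile .                 # build the all-in-one CentOS 8 image
    docker run -d -p 8083:8080 -p 8084:80 --name dv dv0    # app server on 8083, web on 8084
    docker exec dv /opt/dv/setupIT.bash                    # install dataverse inside the container
    docker exec dv /usr/local/glassfish4/bin/asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8084"
    cd ../..
    conf/docker-aio/run-test-suite.sh                      # run the API test suite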
16:06 pdurbin Ah, the old docker-aio. I haven't tried it in a while.
16:07 pdurbin donsizemore: have you tried it lately? And good morning.
16:07 pkiraly It is quite strange that ./1prep.sh creates its own Maven environment and resets JAVA_HOME to a nonexistent path. Maybe this path exists on the machine where the script was written, but not on my side.
16:07 pdurbin pkiraly: you're on "develop" right? Fairly recent commit on develop?
16:07 donsizemore I cribbed from it for the podman-aio dealy I'm pecking on
16:07 pkiraly pdurbin, yes, I am on develop
16:09 pdurbin donsizemore: have you gotten a successful api test run from docker-aio lately? I haven't tried in a coon's age.
16:09 pkiraly I am on develop
16:09 donsizemore @pkiraly the maven packages in CentOS 7 and 8 both require java 1.8, hence the installation of a custom maven binary for java 11
16:09 donsizemore I welcome smarter ways to deal with that.
16:10 pkiraly donsizemore, the 1prep.sh doesn't say anything about the version; look at this line:
16:10 pkiraly echo "export JAVA_HOME=/usr/lib/jvm/jre-openjdk" > maven/maven.sh
16:11 donsizemore that part was historically necessary for Jim, but I forget why
16:11 donsizemore no, wait. could you send me a link to that line?
16:12 pkiraly I am fine with it as long as there is documentation somewhere about what is expected to be there.
16:13 pkiraly https://github.com/IQSS/dataverse/blob/develop/conf/docker-aio/1prep.sh#L20
16:14 donsizemore the mkdir is on line 18, and it's there so we can source it to call a newer maven with java 11
16:15 donsizemore you're saying /usr/lib/jvm/jre-openjdk is a non-existent path?
16:16 pkiraly I do not have it in my machines
16:17 pkiraly I have JDKs such as /usr/lib/jvm/java-11-openjdk-amd64/
16:19 donsizemore but if you'd like a switch for Debian/Ubuntu that would make the script more robust
16:21 pkiraly pdurbin, you said: "First I run the installer to get everything set up." am I correct that you do not do it in a Dockerized environment, but you have all components installed normally on the host machine?
16:22 pkiraly donsizemore, I forgot to mention that I am on an Ubuntu machine
16:22 donsizemore yes, I asked that part, but my first message at 11:19 didn't go through
16:22 donsizemore I said that /usr/lib/jvm/jre-openjdk was a safe, generic bet for RHEL/CentOS
16:23 donsizemore but if you'd like a switch for Debian/Ubuntu that would make the script more robust
16:23 pkiraly what if the script checks first if that path is available, and if not it stops with a message?
16:24 pdurbin pkiraly: right, I have Payara, Postgres, and Solr installed directly on my Mac, like the dev guide describes in the "setting up a dev environment" page.
16:24 pdurbin I'm trying docker-aio on develop, by the way.
16:25 donsizemore @pkiraly better to determine the path based on OS, and just set it? (or make it configurable)
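One possible shape for that OS-based switch, using only the two paths that came up in the conversation:

    # pick JAVA_HOME from whichever known location exists (RHEL/CentOS vs. Ubuntu)
    if [ -d /usr/lib/jvm/jre-openjdk ]; then
        export JAVA_HOME=/usr/lib/jvm/jre-openjdk
    elif [ -d /usr/lib/jvm/java-11-openjdk-amd64 ]; then
        export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    fi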
16:26 pdurbin At the end I got:
16:26 pdurbin - docker-aio ready to run integration tests
16:26 pdurbin - {"status":"OK","data":{"version":"​5.3","build":"develop-bae37ca1c"}}
16:27 pdurbin Hmm! When I run ./conf/docker-aio/run-test-suite.sh I get a message about JAVA_HOME
16:28 pkiraly pdurbin, sometimes I see that, sometimes I don't. Maybe it is due to my machines (I tried it on two different Ubuntu machines)
16:29 pdurbin I think the problem (for me on Mac) is export JAVA_HOME=/usr/lib/jvm/jre-openjdk, because that directory doesn't exist.
16:30 pkiraly pdurbin, maybe you also do not have /usr/lib/jvm/jre-openjdk. My dirty solution for that was editing maven/maven.sh
16:30 pdurbin Yeah, or I could probably change it to just `mvn` because I use it all the time.
16:30 pdurbin That is, I have `mvn` installed on my Mac and use it all the time.
16:31 pdurbin Oh, or rather, I could try just deleting `source maven/maven.sh && ` since mvn is already in my path.
16:32 donsizemore it's definitely there to work around RHEL/CentOS' packaged maven requiring java-1.8
16:32 donsizemore but some smarter solution would be welcome
16:32 pdurbin For now the easiest thing for me is to just delete that "source" command. Really I'm just curious about if all the tests pass in docker-aio or not.
16:33 pdurbin The fact that no one has complained is probably a good indication that no developers are regularly running the API tests locally. :)
16:34 pkiraly pdurbin, BTW I am working on this ticket: https://github.com/IQSS/dataverse/issues/7431, and I found that an important setting, :OAIServerEnabled, is not documented. I also found that the OAI server classes are mostly missing unit tests, and only two OAI verbs have integration tests.
16:35 pdurbin Ah, if you can add tests and docs along with a code fix, that would be great.
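For reference, database settings like :OAIServerEnabled are toggled through the admin settings API; a sketch against a local instance:

    # enable the OAI server (a DELETE to the same URL unsets it)
    curl -X PUT -d true http://localhost:8080/api/admin/settings/:OAIServerEnabled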
16:35 donsizemore @pdurbin they succeeded for me yesterday and again this morning.
16:40 pdurbin Tests run: 146, Failures: 0, Errors: 0, Skipped: 8
16:40 pkiraly pdurbin, yes I wrote some tests, I just was not able to run them so far... Anyway I'll figure it out and add some documentation as part of the PR
16:41 pdurbin Looking good. So docker-aio works for me on my Mac if I remove the "source" command from the "run API tests" script. I also got slightly tripped up by having an oldish war file lying around in the "target" directory but a clean and a package got me fixed up there.
16:42 pdurbin pkiraly: keep in mind that Jenkins will run these tests after you make a pull request. So you can always check the results there. But I can understand wanting to run them locally first.
16:43 pdurbin Usually instead of running the full API test suite, I run the tests I'm working on. And then I check the results in Jenkins.
17:08 pkiraly pdurbin, the Jenkins tasks run for forked repos as well, am I correct?
17:32 pkiraly joined #dataverse
17:36 donsizemore @pkiraly yes, but in the -PR job rather than -develop
18:10 donsizemore @pkiraly and sadly, comparing /usr/lib/jvm on CentOS 8 vs. my Ubuntu 20.04 machine, I see no symlink in common. perhaps we could make docker-aio create one.
18:23 pdurbin donsizemore: my thought was: 1. check if `mvn` is available. 2. if not, run the "source" script
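pdurbin's two-step idea as a sketch, for where run-test-suite.sh currently sources maven/maven.sh:

    # prefer a maven already on the PATH; only fall back to the bundled one
    if ! command -v mvn >/dev/null 2>&1; then
        source maven/maven.sh
    fi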
18:23 pdurbin pkiraly: they're linked from your pull requests (everyone's pull requests)
18:24 donsizemore @pdurbin sounds good unless there are multiple mvn binaries
18:25 pdurbin Well, if there are multiple, developers probably have something in their PATH to point to the right one.
18:25 pdurbin I'm fine with whatever. I'm also fine with how it is now. I can delete the call to that script when I want to run the tests.
18:27 pdurbin pkiraly: for example, https://github.com/IQSS/dataverse/pull/7673 links to https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-7673/1/display/redirect (under "Show all checks")
20:14 pkiraly donsizemore, pdurbin thanks!
20:15 donsizemore @pkiraly if nothing else, I took a look?
22:03 dataverse-user joined #dataverse
