02:56
dataverse-user joined #dataverse
05:55
dataverse-user joined #dataverse
06:13
pdurbin
hi dataverse-user
06:34
dataverse-user
When I uploaded files in the Dataverse web UI, I got an error like "Exceed maximum number of files". I want to check whether I can change a configuration setting to raise that limit.
06:40
pdurbin
dataverse-user: are you uploading a zip file? Maybe you can try changing the :ZipUploadFilesLimit database setting. Please see http://guides.dataverse.org/en/4.16/installation/config.html#zipuploadfileslimit
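For reference, database settings such as :ZipUploadFilesLimit are normally changed through the admin settings API described in that same config guide; a minimal sketch (the value 2000 is only an illustrative example):

    curl -X PUT -d 2000 http://localhost:8080/api/admin/settings/:ZipUploadFilesLimit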
06:45
dataverse-user
uploading jpg files.
06:47
pdurbin
Hmm, jpg files. Maybe instead you should play with the :MultipleUploadFilesLimit setting. It doesn't appear to be documented. :(
06:54
dataverse-user
I got it. Thank you. I will try uploading zip files first.
06:55
pdurbin
Ok. It looks like that setting was added in https://github.com/IQSS/dataverse/pull/3459
06:56
pdurbin
:MultipleUploadFilesLimit is 1000 files by default. If changing it helps (or even if it doesn't) please feel free to open an issue to document that setting.
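The same admin settings API pattern should apply to this setting as well; a hedged sketch (5000 is only an example value, and the DELETE call would revert to the default):

    curl -X PUT -d 5000 http://localhost:8080/api/admin/settings/:MultipleUploadFilesLimit
    curl -X DELETE http://localhost:8080/api/admin/settings/:MultipleUploadFilesLimit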
06:58
pdurbin
One nice thing about using zip files is that you can organize your files into folders: http://guides.dataverse.org/en/4.16/user/dataset-management.html#file-path
07:08
dataverse-user
ok. many thanks~~
08:24
poikilotherm joined #dataverse
08:52
stefankasberger joined #dataverse
09:46
pdurbin
dataverse-user: sure. You're welcome. Also, you are welcome to choose a different "nickname" and list yourself in the "who's who" spreadsheet linked in the topic of this channel.
09:47
pdurbin
stefankasberger: do you know what would be interesting? Seeing how much code coverage we get from pointing pyDataverse at my server and then running the pyDataverse test suite.
09:48
pdurbin
poikilotherm: I got code coverage of the API test suite working, with help from pameyer.
09:50
pdurbin
I find it interesting to see which "commands" are being exercised by the API test suite: http://ec2-3-81-78-209.compute-1.amazonaws.com/target/coverage-it/edu.harvard.iq.dataverse.engine.command.impl/index.html
09:59
poikilotherm
Good morning pdurbin :-)
10:01
poikilotherm
pdurbin I need to get some stuff done over here for metadata...
10:01
poikilotherm
I want to build in support for custom metadata blocks in K8s
10:01
poikilotherm
Because we are going to use those... ;-)
10:01
poikilotherm
Also 4.16 changed the citation block, so this needs to be addressed
10:02
poikilotherm
And this also means handling schema changes for Solr
10:04
poikilotherm
I was wondering if we could change the schema.xml a bit
10:05
poikilotherm
It would be totally awesome to include the dynamic fields in a separate XML file which is then xi:included in the schema.xml
10:05
poikilotherm
That way generating it on the fly would be much easier
10:06
poikilotherm
And obviously also easier than having a template for schema.xml, which would need a separate processing step
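A rough sketch of what that xi:include idea could look like inside conf/solr/7.3.1/schema.xml, assuming Solr's config loader resolves XIncludes here; the schema name, version, and the included file names are placeholders, and each included file would need a single root element of its own (exactly which wrapper names Solr's schema parser accepts would need checking):

    <schema name="dataverse" version="1.6" xmlns:xi="http://www.w3.org/2001/XInclude">
      <!-- static field types and fields stay in this file -->
      <!-- generated, metadata-block-driven parts pulled in from separate files -->
      <xi:include href="schema_dv_mdb_fields.xml"/>
      <xi:include href="schema_dv_mdb_copies.xml"/>
    </schema>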
10:33
pdurbin
poikilotherm: I thought you were interested in (and even willing to help with) this API test suite code coverage stuff. :) But if Solr is in focus now, you might be interested in the comment I made a few hours ago to pkiraly: https://github.com/IQSS/dataverse/issues/5989#issuecomment-528219284 . It's exactly what you're talking about. :)
10:33
poikilotherm
Nope, it isn't ;-)
10:34
poikilotherm
Regarding interest: I need to get the new release done
10:34
poikilotherm
I'm sure Slave will appreciate it ;-)
10:34
pdurbin
It's close. :)
10:34
poikilotherm
Slava
10:35
pdurbin
If you could leave a comment for Peter on that issue with your idea, it would be fantastic. On Skype the other day he said he's back from vacation and plans to work on that issue.
10:35
poikilotherm
And as it would be progress on things I have to work on and is beneficial for other stuff, this should be in focus ;-)
10:35
poikilotherm
Ok, then I'll do that
10:35
pdurbin
Thanks!
10:36
pdurbin
I can even put it in your column on my board if you want to help Peter make a pull request.
10:37
poikilotherm
I don't think so. Peter is looking into Managed Schema and more
10:37
poikilotherm
Using Schema API
10:37
poikilotherm
My idea is just a small workaround for easier shipping and deploying, not touching the inner workings
10:37
pdurbin
Ok, are you thinking you'd make a pull request with that alternate approach?
10:40
poikilotherm
Sure, if you would like me to do so
10:40
poikilotherm
I wonder if I should create another issue
10:40
poikilotherm
And reference things
10:41
poikilotherm
Peter's approach seems to be pretty cool, but I don't know how long it will take to get there
10:41
pdurbin
If we go with your xi:include approach, we should make some decisions about which fields are factored out. Obvious candidates are the custom fields for Harvard, such as customMRA.tsv, customGSD.tsv, etc. in https://github.com/IQSS/dataverse/tree/v4.16/scripts/api/data/metadatablocks . Yes, a new issue would be fantastic.
10:41
poikilotherm
Actually I looked at the schema.xml in the code
10:42
poikilotherm
And I think that these parts should be moved into an included file:
10:42
poikilotherm
https://github.com/IQSS/dataverse/blob/09fe94bdc6f4f5e79c61a203b7df8736692657ea/conf/solr/7.3.1/schema.xml#L223-L450
10:43
poikilotherm
https://github.com/IQSS/dataverse/blob/09fe94bdc6f4f5e79c61a203b7df8736692657ea/conf/solr/7.3.1/schema.xml#L508-L735
10:44
poikilotherm
Those are coming from the "export" at the Dataverse API api/admin/index/solr
10:44
poikilotherm
+/schema
10:44
pdurbin
Yes they are. All in one file?
10:44
poikilotherm
We could split it. I also thought about changing the API endpoint to have two: one for fields, one for copyFields
10:45
poikilotherm
So this can be more easily piped into a file
10:45
poikilotherm
But splitting by the "---" is also fine, just needs more script logic
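The splitting could be done with a few lines of shell; a sketch assuming the export really does separate the two sections with a line consisting of "---" and that the fields come first (output file names are placeholders):

    curl -s http://localhost:8080/api/admin/index/solr/schema -o /tmp/schema-export.txt
    # everything before the "---" separator -> fields, everything after -> copyFields
    awk '/^---/ {part=1; next} {print > (part ? "/tmp/schema-copies.xml" : "/tmp/schema-fields.xml")}' /tmp/schema-export.txt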
10:45
pdurbin
I'd prefer separate files.
10:45
poikilotherm
Me too
10:45
pdurbin
Seems cleaner.
10:46
poikilotherm
I was going to say "leaner" :-D
10:46
pdurbin
Cleaner and leaner.
10:51
poikilotherm
All of this can be bundled in a script, fetching the fields from Dataverse, writing them to a file and executing a Solr core reload
10:52
poikilotherm
I can bundle this in a maintenance job
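A minimal sketch of what such a maintenance job might look like, assuming the export is written to a single include file (glossing over the fields/copies split discussed above) and using the standard Solr CoreAdmin RELOAD call; the generated file name is a placeholder, while the hosts and Solr conf path are taken from this conversation:

    #!/bin/bash
    set -euo pipefail
    DATAVERSE=http://localhost:8080
    SOLR=http://localhost:8983
    SOLR_CONF=/usr/local/solr/server/solr/collection1/conf
    # pull the field definitions Dataverse derives from its metadata blocks
    curl -sf "$DATAVERSE/api/admin/index/solr/schema" > "$SOLR_CONF/schema-dataverse-generated.xml"
    # ask Solr to reload the core so the updated schema takes effect without a restart
    curl -sf "$SOLR/solr/admin/cores?action=RELOAD&core=collection1"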
10:52
poikilotherm
Would you like me to add this script to upstream or shall it reside in dataverse-k8s?
10:53
pdurbin
Upstream please!
10:53
poikilotherm
Right. Then I am going to create an issue.
10:54
pdurbin
Thanks! Your script should probably be called from a future version of https://github.com/IQSS/dataverse/blob/v4.16/scripts/api/setup-all.sh
10:54
pdurbin
Here's where the "out of the box" metadata blocks are loaded: https://github.com/IQSS/dataverse/blob/v4.16/scripts/api/setup-datasetfields.sh
10:54
poikilotherm
Oh BTW. Is there any reason you guys stopped at Solr 7.3.1? Just asking because there is 7.7.1 now and the Docker images for 7.3.1 are not supported anymore
10:55
pdurbin
It's not? Woof. It's hard to keep up with Solr releases.
10:55
pdurbin
Have you heard that we are also on an old version of Glassfish? ;)
10:56
poikilotherm
Nope, not yet.
10:56
poikilotherm
Are you?
10:56
poikilotherm
%)
10:56
poikilotherm
Or better 8)
10:56
pdurbin
:)
11:15
poikilotherm
https://github.com/IQSS/dataverse/issues/6142
11:16
pdurbin
Yes, I see. And https://github.com/IQSS/dataverse-kubernetes/issues/85 . I have a question
11:17
pdurbin
What changes are you planning for a future version of https://github.com/IQSS/dataverse/blob/v4.16/scripts/api/setup-datasetfields.sh ?
11:17
poikilotherm
Nothing for now
11:18
pdurbin
!
11:18
poikilotherm
This happens during bootstrapping
11:18
poikilotherm
So only once
11:18
pdurbin
Are you small chunking me? :)
11:19
poikilotherm
Loading other blocks or loading updated upstream blocks needs to happen separately from that script
11:19
pdurbin
Ok, but are you planning any changes for a future version of https://github.com/IQSS/dataverse/blob/v4.16/scripts/api/setup-optional-harvard.sh#L50 ?
11:20
poikilotherm
Should I?
11:20
pdurbin
Yes please!
11:20
poikilotherm
LOL
11:20
pdurbin
That script is just for reference anyway. You won't break anything.
11:20
poikilotherm
Why is that?
11:21
poikilotherm
Actually, at some point one will need to load the TSV file
11:21
pdurbin
Including a change in that "optional harvard" script would be a good way to communicate the new way of doing things.
11:22
poikilotherm
I don't like the TSV approach very much, but it is what it is; changing that would be a huge undertaking.
11:22
poikilotherm
But aren't the metadata fields for those already in upstream schema.xml?
11:23
poikilotherm
So there is no real necessity to change those, right?
11:23
pdurbin
They are. That's what I was trying to say earlier. The custom Harvard stuff should be factored out of schema.xml. Either into harvard.xml or one file for each of Harvard's custom metadata blocks.
11:23
poikilotherm
Ah!
11:24
pdurbin
Cleaner and leaner.
11:24
poikilotherm
Now we are on the same page
11:24
poikilotherm
I don't think this is a good idea. Here's why
11:24
poikilotherm
When you split up the fields by metadata block, you need to add each of those include files to schema.xml.
11:24
poikilotherm
This involves templating
11:25
poikilotherm
Which is cool, would be clean and lean
11:25
poikilotherm
But it also means more work to be done.
11:25
poikilotherm
Like choosing a templating solution, adding it to the installers, etc.
11:25
poikilotherm
Of course one could try to do something like an include chain
11:26
poikilotherm
i.e. include these smaller per-block chunks in the file that is itself included by schema.xml
11:26
poikilotherm
I don't know if this is supported by the parser
11:26
poikilotherm
And most likely it shouldn't be necessary
11:26
pdurbin
I already do some "templating" when you call into http://localhost:8080/api/admin/index/solr/schema
11:26
poikilotherm
Right
11:26
poikilotherm
And that's perfect!
11:27
poikilotherm
Just take that output and place it in those files
11:27
poikilotherm
You will have everything in place that you actually use, and nothing else
11:27
poikilotherm
Don't use custom block X? Ok, it will not be included in the schema.
11:28
pdurbin
But you could make it so you call into http://localhost:8080/api/admin/index/solr/schema/oliverBlock1 and it "templates" just the stuff you need for your custom metadata block.
11:28
poikilotherm
Huh is that possible now?
11:28
poikilotherm
That API endpoint has no docs in the API docs
11:28
poikilotherm
I didn't look at the code yet
11:28
pdurbin
You'd have to change that API endpoint to take an additional argument, the name of the custom metadata block.
11:29
poikilotherm
Ok
11:29
pdurbin
I'm happy to advise on this. I wrote all that nasty code. :)
11:29
pdurbin
We can even set up some REST Assured tests for it.
11:29
poikilotherm
One could of course generate the complete schema.xml
11:30
poikilotherm
And just dump and reload
11:30
poikilotherm
Would be even easier
11:30
poikilotherm
But I wanted to have a quick solution with minimum impact
11:30
poikilotherm
As the approach from Peter should be preferred
11:30
poikilotherm
Using Schema API totally makes sense
11:31
poikilotherm
Especially when looking in the direction of SolrCloud, multi-instance setups, etc.
11:31
pdurbin
I'm fine with whatever sized chunks provide value. :) I'm just trying to express what I see as a potentially larger chunk than what you may envision. And I can jump on your branch and help move it along. Make pull requests into your branch, I mean.
11:39
poikilotherm
:-D
11:40
poikilotherm
I think I should commit some of my 64-dev-in-k8s stuff and share it. Might be useful for testing this
11:40
pdurbin
Yes, all the tests please.
11:41
pdurbin
Don't forget that automated testing is the focus of our current sprint. Still two weeks left.
11:41
poikilotherm
Yeah :-/ It's a pity that I need to get other things done, too
11:42
pdurbin
There are always tests to write. :)
11:44
poikilotherm
There are not many people around this week at IQSS, right?
11:44
poikilotherm
Very low notification traffic
11:45
pdurbin
Kevin is on vacation so nothing is getting merged.
11:45
poikilotherm
:-) :-| :-/ :-( :'-(
11:46
pdurbin
First day of school. Stepping out for a bit.
11:50
poikilotherm
BTW pdurbin, did you listen to Adam's podcast from Monday?
11:50
donsizemore joined #dataverse
11:51
poikilotherm
http://airhacks.fm/#episode_52
11:59
poikilotherm
Hehe - I just created a branch named 6142-flex-solr-schema. This is kinda funny when you know that "flex" is a common term in German for an angle grinder (Flex is a brand name that still exists today; lots of people used it in the past, so the name became "sticky"). Thinking about slicing the schema.xml into parts with a "flex" :-D
12:00
poikilotherm
Morning donsizemore! Did you see that security warning about upgrading Jenkins?
12:04
donsizemore
huh? i just updated the jenkins RPM yesterday
12:04
poikilotherm
Oh ok
12:04
poikilotherm
I've been seeing them since Monday but had no chance to catch up
12:04
donsizemore
(but to answer your question, no, I haven't seen the security announcement)
12:04
poikilotherm
It's displayed in the UI
12:05
donsizemore
then we're good, because I was in the UI quite a bit yesterday
12:05
donsizemore
I get a number of CVE and other security announcements in various ways; I'm surprised I hadn't seen it
12:57
stefankasberger joined #dataverse
13:07
stefankasberger
@pdurbin: i don't really get what you mean regarding code coverage testing.
13:45
poikilotherm
@pdurbin: I do have working includes for the fields and copyFields
13:45
poikilotherm
But I need some syntactic sugar around it
14:36
pdurbin
poikilotherm: yes, I listened to it. I've listened to all of them. And I DM'ed him on Twitter to see if he's coming to FOSDEM. He's not but you and stefankasberger should go hear him speak at https://jax.de/programm/ . What other developers in the Dataverse community speak Geerman? :)
14:36
pdurbin
German*
14:39
stefankasberger
whom do you mean?
14:39
pdurbin
stefankasberger: Adam Bien. These two talks:
14:40
pdurbin
- https://jax.de/serverside-enterprise-java/tipps-tricks-und-workarounds-mit-jakarta-ee-microprofile-slideless/
14:40
pdurbin
- https://jax.de/web-development-javascript/web-apps-ohne-frameworks-slideless-nomigrations/
14:44
poikilotherm
Ok guys, I'm outta here for today. Gotta pick up the kiddos
14:44
poikilotherm
CU
14:49
pdurbin
stefankasberger: what are you more interested in? Java or Web Components? :)
15:07
pdurbin
donsizemore: mornin'. Anything we should try to bang out before 3?
15:08
donsizemore
i think i'm good (and have an 11:30) -- the next thing on my plate would be pestering you and pmauduit about JMX stuff for collectd/grafana
15:09
pdurbin
Ok, I've been hacking on https://dev2.dataverse.org/grafana here and there and I'd love to get it working. :)
15:10
pdurbin
donsizemore: but at the moment I'm a little more focused on getting API test code coverage reports like http://ec2-3-81-78-209.compute-1.amazonaws.com/target/coverage-it/index.html into dataverse-ansible and dataverse-jenkins. Let's discuss more at 3. :)
15:16
donsizemore
i like test coverage
15:17
stefankasberger
no java please. :) :) :)
15:17
pdurbin
donsizemore: so I had to fuss with my server to get the API tests to run. Can I put https://github.com/IQSS/dataverse-ansible/issues/67 in your column? :)
15:17
donsizemore
sure
15:18
pdurbin
donsizemore: done, thanks
15:18
pdurbin
stefankasberger: how about Web Components? :)
15:19
stefankasberger
don't know about them to be honest. but sounds interesting. but i will do mostly DevOps stuff in the coming months, so am learning right now about Jenkins, Selenium and so on. :)
15:20
pdurbin
stefankasberger: oh! Do you want to help us with https://github.com/IQSS/dataverse-jenkins ? We have a pyDataverse job at https://jenkins.dataverse.org/job/pyDataverse/ (one of the few that's passing) :)
15:25
donsizemore
@pdurbin you can test out Leonid's commit in a branch =)
15:54
stefankasberger
am learning Jenkins myself, so am not really in a position to help others. but maybe in the future. :)
15:58
pdurbin
stefankasberger: ok. The idea is that we can learn together. I hadn't installed Jenkins myself until I wrote INSTALL.md in that repo 3 months ago. :)
15:59
pdurbin
stefankasberger: here's the "add config for pyDataverse" issue: https://github.com/IQSS/dataverse-jenkins/issues/12 :)
15:59
stefankasberger
ohh. i will have a look at it. maybe i can use it to learn.
16:00
pdurbin
stefankasberger: yes! Right now that issue is assigned to donsizemore but I can add you and me as assignees as well if you want.
16:28
stefankasberger
i have noted it and will have a look at it in a few weeks, when i focus on Jenkins.
16:29
stefankasberger
the next release of pyDataverse is planned for the end of September.
17:02
pdurbin
stefankasberger: great! Where's the plan? GitHub milestones?
17:25
stefankasberger
https://github.com/AUSSDA/pyDataverse/milestone/1
17:27
pdurbin
stefankasberger: great! Can we also include https://github.com/AUSSDA/pyDataverse/issues/32 ? :) I'll help! I need it for https://github.com/IQSS/dataverse-sample-data/issues/5 :)
17:40
stefankasberger
will check it out next week.
17:41
pdurbin
cool, thanks :)
18:55
donsizemore joined #dataverse
18:56
donsizemore
@pdurbin i'm standing by for our 3pm but am squatting in an unreserved conference room. if I ghost you two, I'm likely getting kicked out for interloping (or Dorian might take out our power)
18:59
pdurbin
puppy power
19:31
joelmarkanderson joined #dataverse
19:37
joelmarkanderson
@pdurbin: any activity on custom metadata blocks in solr?
19:40
pdurbin
joelmarkanderson: hi! Oliver went home. He's in Germany so he's several hours ahead of us. Did you see the new issue he opened? :)
19:40
joelmarkanderson
err...checking irc log now...
19:42
joelmarkanderson
6142?
19:43
pdurbin
Yes! That one. But we can fix you up. Can you share your custom metadata block? The tsv file, I mean.
19:45
joelmarkanderson
oh yes; how shall i share it?
19:53
joelmarkanderson
should i attach a file to the RT email ticket?
20:00
pdurbin
joelmarkanderson: sure! Did anyone reply to you yet?
20:01
joelmarkanderson
no, i didn't know if your RT system accepted attachments
20:03
pdurbin
It does.
20:03
pdurbin
I'm expecting a field called "type" based on the Solr error.
20:04
joelmarkanderson
it's actually "tag"
20:04
joelmarkanderson
i can't find the RT thread in my inbox; i'll start a new one; you can merge them i assume
20:05
joelmarkanderson
oh there it is
20:07
pdurbin
joelmarkanderson: yes, please feel free to create a fresh one and I can merge it.
20:07
joelmarkanderson
i found it and responded to the same ticket
20:07
pdurbin
Oh, was it tag? Bad memory. Sorry.
20:09
pdurbin
Ok. I have vtti-metadata-block.tsv. Want me to load it up on a test server?
20:11
pdurbin
This went just fine: curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file /tmp/vtti-metadata-block.tsv
20:12
pdurbin
Ok, so just the one field. Tag, like you said.
20:12
pdurbin
optional field
20:14
pdurbin
tagging with the bike one
20:14
pdurbin
Error – The metadata could not be updated. If you believe this is an error, please contact Root Support for assistance.
20:15
pdurbin
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/collection1: ERROR: [doc=dataset_72_draft] unknown field 'tag'
20:16
pdurbin
So to fix this, we need to add the "tag" field to Solr's schema.xml. Twice because I assume you'd like "tag" to be available from basic search.
20:17
pdurbin
Since there's only one field the easiest way to get what we need for schema.xml is this: curl http://localhost:8080/api/admin/index/solr/schema | grep tag
20:17
pdurbin
Two lines:
20:17
pdurbin
<field name="tag" type="text_en" multiValued="true" stored="true" indexed="true"/>
20:17
pdurbin
<copyField source="tag" dest="_text_" maxChars="3000"/>
20:18
pdurbin
su - solr
20:18
pdurbin
(to become the solr user)
20:18
pdurbin
backing up and editing /usr/local/solr/server/solr/collection1/conf/schema.xml
20:21
pdurbin
ok, I put those lines directly under these lines, respectively:
20:21
pdurbin
<!-- Dynamic Dataverse fields from http://localhost:8080/api/admin/index/solr/schema -->
20:21
pdurbin
<!--Dynamic Dataverse fields from http://localhost:8080/api/admin/index/solr/schema -->
20:22
pdurbin
systemctl restart solr.service
20:23
pdurbin
Ok, now the error is gone when I create a dataset. I just published it: http://ec2-3-81-186-127.compute-1.amazonaws.com/dataset.xhtml?persistentId=doi:10.5072/FK2/HD8VQQ
20:23
pdurbin
joelmarkanderson: does that make sense?
20:25
joelmarkanderson
i have indeed added exactly the same lines (field name and copyField) to schema.xml
20:26
joelmarkanderson
i did not write to the file as the solr user; is that of consequence?
20:26
pdurbin
I don't think so. On my system the file is owned by the solr user.
20:27
joelmarkanderson
no, solr still owns the file here
20:28
joelmarkanderson
let me try to `systemctl restart solr.service`
20:28
pdurbin
Ok. Something else you can try is this: http://localhost:8983/solr/collection1/schema/fields
20:29
pdurbin
Whoops, and grep for "tag" I mean. I see this output: "name":"tag",
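Spelled out, that check would be something like the following (core name collection1 as used elsewhere in this conversation):

    curl -s http://localhost:8983/solr/collection1/schema/fields | grep '"tag"'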
20:33
joelmarkanderson
you mean curl?
20:33
pdurbin
sorry, sorry, yes curl
20:33
pdurbin
doing 5 things at once :)
20:34
pdurbin
that "fields" endpoint will dump out all of your fields from Solr in JSON format
20:34
joelmarkanderson
there is a "name":"tag" in the fields endpoint
20:34
pdurbin
Great! But you still see the exception on the Dataverse side? In server.log?
20:36
joelmarkanderson
Success! – The metadata for this dataset has been updated.
20:36
pdurbin
hooray!
20:37
joelmarkanderson
excellent support, thank you!
20:37
pdurbin
joelmarkanderson: sure! While you're here, can I ask you about an AWS thing? :)
20:38
joelmarkanderson
go ahead; i admit we have let those muscles atrophy, though
20:39
pdurbin
Sorry, I meant Terraform. Are you still using it? And can you please comment on https://github.com/IQSS/dataverse-kubernetes/issues/81 ?
20:47
joelmarkanderson
on it (but you may be disappointed in my response)
20:48
pdurbin
heh, no problem
20:48
pdurbin
and when you're done with that, I think I have one more issue to show you, if you have time :)
21:00
joelmarkanderson
ok, what else?
21:01
pdurbin
joelmarkanderson: this issue: https://github.com/IQSS/dataverse-aws/issues/11
21:01
joelmarkanderson
ugh
21:01
pdurbin
:)
21:01
pdurbin
sorry :)
21:02
pdurbin
joelmarkanderson: have you seen http://guides.dataverse.org/en/4.16/developers/deployment.html#deploying-dataverse-to-amazon-web-services-aws ? We're really into AWS now. :)
21:02
pdurbin
But I don't use that dataverse-aws repo at all.
21:03
joelmarkanderson
we're not exactly a dataverse development shop, but it's high time i put some work into our operations side
21:03
pdurbin
:)
21:04
pdurbin
well, if we can help at all, please let us know... our latest tricks are with prometheus and grafana, thanks to pmauduit
21:04
joelmarkanderson
i have a grafana guy down the hall
21:05
joelmarkanderson
at least you've heard of aws, unlike in your 3.* days
21:05
joelmarkanderson
;)
21:05
pdurbin
heh, no kidding :)
21:05
pdurbin
joelmarkanderson: can you please show this to your grafana guy? https://github.com/IQSS/dataverse-ansible/blob/master/files/grafana-dashboard.json
21:10
joelmarkanderson
let me take a look at your new aws tricks and other repos around; we may need to refresh in the next few weeks
21:21
pdurbin
sounds good