Time
S
Nick
Message
06:48
jri joined #dataverse
08:18
jri joined #dataverse
08:49
jri joined #dataverse
09:24
Kamil84 joined #dataverse
11:53
pdurbin
pmauduit: hi! Did you get a chance to see if Prometheus installed ok in Vagrant? :)
12:04
pmauduit
I launched it, but my computer had an issue (unrelated to the process) and I had to reboot
12:04
pmauduit
I did not check the state of the vagrant machine afterwards
12:14
donsizemore joined #dataverse
12:18
pdurbin
pmauduit: ok, please keep me posted. :)
12:30
pdurbin
donsizemore: morning. Do you think we should just merge that prometheus branch? You already verified it works, right? pmauduit and I are talking about adding Grafana next (in a subsequent pull request).
12:50
donsizemore
@pdurbin i mean, it worked for me, i just wanted you to approve it
12:50
donsizemore
@pdurbin and i welcome grafana
12:51
donsizemore
@pdurbin (which i'll do if you want, but i've got a Dataverse UNC upgrade on Monday and a bunch of tests to verify ;) )
12:59
pdurbin
donsizemore: I'll spin it up quick on EC2 and have a look around before merging.
12:59
donsizemore
@pdurbin too late, i just merged it
12:59
pdurbin
even better :)
12:59
donsizemore
@pdurbin it's just a true/false in the group_vars, anyway.
13:00
pdurbin
perfect
13:02
pdurbin
donsizemore: do you know how you have blessed releases? Like 4.11 or whatever? I have bless group_vars that I'm maintaining as ec2config.yaml in https://github.com/IQSS/dataverse-sample-data :) . I'll try adding prometheus there. :)
13:04
jri joined #dataverse
13:08
pdurbin
blessed*
13:08
pdurbin
pmauduit: maybe once I have this EC2 instance spun up, you can help me get Grafana installed and configured with a basic dashboard. :)
13:12
donsizemore
@pdurbin oh, i made a boo-boo yesterday. you have 147 tests beneath src/test/java/edu/harvard/iq/dataverse/
13:12
donsizemore
@pdurbin so, i'll still iterate through them, but...
13:13
pdurbin
Before you do that, would you be able to run just InReviewWorkflowIT to see if it passes?
13:15
donsizemore
yeah, i was just making a 'short list'
13:15
pdurbin
gotcha
13:15
pdurbin
I'm psyched that you got smaller subsets of tests to pass. Great progress.
13:15
donsizemore
i'll also archive each server.log to a separate web space for y'all's (isn't that a great contraction?) review
13:16
donsizemore
i suppose it should be y'all's'
13:16
pdurbin
heh
13:16
pdurbin
Great! Would you be able to indicate the name of the job and the job run number? In the name of a folder or in the filename?
13:17
donsizemore
i was going to create a Google Sheet to share
13:18
pdurbin
Oh, sorry I meant "IQSS-Dataverse-Develop-testSubset/6/server.log" or whatever.
13:21
donsizemore
yes i'm going to pull those over to a directory outside jenkins lest they get overwritten
13:21
pdurbin
perfect
13:21
pdurbin
the new spreadsheet is for your short list? or something else?
13:22
donsizemore
um, i was going to start with my short list.
13:25
pdurbin
That's fine. I'm just confused about what the spreadsheet would be for. I'm sure all will be revealed in time. :)
13:26
pdurbin
TASK [dataverse : run sampledata] **** ... the server with prometheus (I hope) is almost up. :)
13:38
pdurbin
Ok, the EC2 instance is up. I enabled both Munin and Prometheus. Munin is working fine and you can see the Munin graphs at http://ec2-3-81-53-52.compute-1.amazonaws.com/munin/localhost/localhost/index.html
13:42
pdurbin
donsizemore: I'm seeing "punch prometheus firewall holes" in tasks/prometheus.yml. Do I need to poke some holes?
13:46
pdurbin
http://localhost:9090/metrics shows me some prometheus stuff :)
13:46
pdurbin
https://knowyourmeme.com/memes/i-have-no-idea-what-im-doing :)
13:46
pdurbin
pmauduit: help! :)
13:48
donsizemore
^^ this was me once i stood it up
13:49
donsizemore
there are JVM modules for Prometheus, but... I stopped at the initial install
13:53
pdurbin
donsizemore: right. Hopefully pmauduit will explain all. :)
13:58
pmauduit
I've not provisioned my vm yet !
13:58
pmauduit
but now you should be able to add a prometheus datasource in grafana
13:59
pmauduit
or first use the prometheus interface (I don't use it that often, but it should let you select a metrics from a dropdown list from what I remember)
14:00
pdurbin
pmauduit: do you want me to give you a shell on this EC2 server?
14:01
pmauduit
how long have you planned to keep it up ?
14:01
pdurbin
I don't know. A day? A week? :)
14:02
pmauduit
(i retried a vagrant provision, but the playbook does not converge and finish with an error)
14:02
pmauduit
anyway I also got the prometheus interface
14:03
pmauduit
from what I can see in my vagrant, there are no very interesting metrics gathered yet related to memory usage, so maybe next step will be to setup collectd (that should give "top-like" metrics)
14:05
pdurbin
Not even free disk space?
14:05
pmauduit
do you have access to your prometheus web ui ?
14:06
pdurbin
well, if I do curl http://localhost:9090 I get <a href="/graph">Found</a>. Does that count? :)
14:06
pmauduit
near the "execute" button, you can see all metrics known to prometheus, by default it seems focused on prometheus / go ones
14:07
pmauduit
pdurbin: if you can ssh onto the EC2 machine, I'll suggest you to mount a socks tunnel (-D8123) and configure your browser accordingly
14:07
pmauduit
using ssh
14:07
pmauduit
this will allow you to use a regular web browser instead of curl ;)
14:07
pdurbin
pmauduit: do you want to ssh into this EC2 machine? I could add your public keys from GitHub.
14:08
donsizemore
@pmauduit i've never used prometheus before but i'm happy to implement whatever you suggest (preferably by github issue and possibly with light hand-holding)
14:09
donsizemore
@pdurbin you can allow :9090 in the AWS settings for your VM ; i thought i mapped ports in Vagrant
14:10
pmauduit
pdurbin: ok I can have a look
14:12
pdurbin
pmauduit: done. Please try centos@ec2-3-81-53-52.compute-1.amazonaws.com
14:13
pdurbin
donsizemore: do you want in here too? If so, please shoot me your public ssh key (or upload it to GitHub). Instead of messing with the firewall I've been trying to ProxyPass to localhost:9090 but I haven't been able to get it working. :(
14:13
pmauduit
pdurbin: I'm in
14:14
pdurbin
nice, two lines from `who` now ;)
14:14
pmauduit
with a french ip address with no reverse dns ;)
14:14
pdurbin
heh, I trust you
14:14
donsizemore
@pdurbin i'm retracing my steps for Monday's upgrade =) also logging our InReview failure
14:14
pdurbin
pmauduit: can you please take a look at /etc/httpd/conf.d/http.proxy.conf ?
14:15
pdurbin
donsizemore: no worries, if you free up you can just ping me :)
14:15
pdurbin
pmauduit: I'm asking because I was hoping we could just proxy prometheus
14:16
pdurbin
But I'm fine with whatever it takes to move forward. :)
14:16
pmauduit
pdurbin: once configured, I don't know if having a hand on prometheus web ui is necessary
14:16
pmauduit
but I can access it from firefox with a socks proxy
14:16
pdurbin
already?
14:16
pdurbin
with your tunnel thing?
14:16
donsizemore
@pdurbin i'm interested, but... three things at a time =)
14:17
pmauduit
yes, that's pretty simple: ssh -D8123 centos@ec2-3-81-53-52.compute-1.amazonaws.com
14:17
pdurbin
donsizemore: we need to limit your Work In Progress :)
14:17
pmauduit
then configure your firefox to use a socks proxy
14:18
pmauduit
pdurbin: can I try to install collectd and configure the prometheus writer ?
14:18
pdurbin
I've never configured firefox to use a socks proxy but I have the setting up.
14:18
pmauduit
pdurbin: preferences > network settings
14:18
pdurbin
pmauduit: sure! How about if I create an issue under dataverse-ansible called collectd/grafana and you can copy and paste whatever commands you ran into comments. Sound ok?
14:19
pmauduit
then manual proxy configuration / socks host: localhost port : 8123 / socks v5
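The tunnel steps above (ssh -D8123, then a manual SOCKS v5 proxy on localhost port 8123 in Firefox) can be sketched in Python roughly as follows. This is only an illustration, not something posted in the channel: the helper names are hypothetical, the `centos@host` login form is an assumption, and `-N` (do not run a remote command) is an optional addition that was not part of the command used in the chat.

```python
import shlex


def socks_tunnel_command(user, host, socks_port=8123):
    """Build the argv for an SSH dynamic (SOCKS) port forward.

    -D opens a local SOCKS listener on socks_port; -N (an addition here,
    not used in the chat) keeps the session open without a remote shell.
    """
    return ["ssh", "-N", "-D", str(socks_port), f"{user}@{host}"]


def firefox_proxy_settings(socks_port=8123):
    """Values to enter under Firefox's manual proxy configuration."""
    return {"socks_host": "localhost", "socks_port": socks_port, "socks_version": 5}


cmd = socks_tunnel_command("centos", "ec2-3-81-53-52.compute-1.amazonaws.com")
print(shlex.join(cmd))           # the command line to run in a terminal
print(firefox_proxy_settings())  # what to type into the proxy dialog
```

Once the tunnel is up and the browser points at it, pages such as the Prometheus UI on port 9090 are fetched from the EC2 side rather than from the local machine.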
14:19
pmauduit
pdurbin: ok I'll try
14:19
pmauduit
once you've got your socks configuration in firefox, you can try to load ifconfig.io to make sure that you're going outside via the aws instance
14:20
pmauduit
Remote Host
14:20
pmauduit
ec2-3-81-53-52.compute-1.amazonaws.com.
14:22
pdurbin
pmauduit: I just created https://github.com/IQSS/dataverse-ansible/issues/99 . How does it look? :)
14:22
pmauduit
I've no idea where the logs from collectd go (the equivalent of debian's /var/log/syslog)
14:26
pdurbin
Maybe there are some clues in /etc/collectd.conf ?
14:27
pmauduit
there is something about syslog
14:28
donsizemore
if it's an RPM, try $ rpm -ql collectd
14:29
pdurbin
"If no log plugin is loaded, collectd will write to STDERR." https://collectd.org/faq.shtml
14:31
pmauduit
if my config worked, we should have a new server on port 9103
14:33
pmauduit
does not seem to be the case, and I cannot find any module related to prometheus in the yum collectd setup
14:34
pdurbin
pmauduit: I see "#<Plugin write_prometheus>" in /etc/collectd.conf . Does that help?
14:34
pmauduit
yes, it means that it should be supported
14:34
pmauduit
I put a file in the /etc/collectd.d/
14:34
pmauduit
which does basically what is commented out
14:34
pdurbin
ah, ok
14:34
pmauduit
but, this should open a web interface to be scraped by prometheus afterwards
14:34
pmauduit
on port 9103
14:35
pdurbin
Right, you already mentioned the file you created: https://github.com/IQSS/dataverse-ansible/issues/99#issuecomment-524335633
14:35
pmauduit
but I cannot see the port coming up
14:35
pdurbin
curl: (7) Failed connect to localhost:9103; Connection refused
14:36
pmauduit
yes, I stopped the collectd service to be able to launch it by hand (keeping it in the foreground)
14:36
pdurbin
ah, ok
14:36
pmauduit
but no luck either, it does not produce more logs
14:37
pdurbin
:(
14:39
pmauduit
ok, journalctl -u collectd
14:40
pdurbin
good
14:42
pdurbin
donsizemore: judging from the server.log files linked from the new spreadsheet ( https://docs.google.com/spreadsheets/d/1geJXE1Gv4iuoDtDUBItulVb3s145TSuHnWLuDVJms8g/edit?usp=sharing ), the 404 is because Dataverse isn't being deployed due to Flyway errors.
14:42
pdurbin
pmauduit: sounds like you're still making progress. Go go go!
14:43
pmauduit
pdurbin: the collectd.conf does mention the write_prometheus plugin, but it is actually not provided by the yum package
14:44
pdurbin
oh
14:44
pmauduit
it's in a separate yum package it seems
14:44
pmauduit
[2019-08-23 14:44:43] plugin_load: plugin "write_prometheus" successfully loaded.
14:45
donsizemore
@pdurbin Caused by: org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException: Migration V4.14.0.2__2043-split-gbr-table.sql failed
14:46
pdurbin
pmauduit: it looks like you installed the collectd-write_prometheus-5.8.1-1.el7.x86_64 RPM. Cool.
14:46
pmauduit
yup
14:47
pmauduit
and the 9103 interface is working (I'm still with collectd in foreground though)
14:47
pmauduit
no we have to reconfigure prometheus so that it scrapes this interface
14:47
pmauduit
s/no/now/
14:47
pdurbin
Ok, my convention is to make a copy of the file like this: cp -a foo.config foo.config.orig
14:48
pdurbin
so I can diff it later
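The backup-then-diff convention described above (`cp -a foo.config foo.config.orig`, then diff later) can be sketched in Python like this. It is a rough equivalent under stated assumptions, not the exact commands used on the server: the function names are hypothetical, and `shutil.copy2` preserves mode and timestamps but not ownership, so it is only an approximation of `cp -a` for a single file.

```python
import difflib
import shutil
from pathlib import Path


def backup_original(path):
    """Copy path to path + '.orig' unless that backup already exists."""
    src = Path(path)
    backup = src.with_name(src.name + ".orig")
    if not backup.exists():
        # copy2 keeps permissions and timestamps (not ownership).
        shutil.copy2(src, backup)
    return backup


def diff_against_original(path):
    """Return a unified diff between path + '.orig' and the current file."""
    src = Path(path)
    backup = src.with_name(src.name + ".orig")
    old = backup.read_text().splitlines(keepends=True)
    new = src.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile=str(backup), tofile=str(src))
    )
```

Calling `backup_original` before every edit is harmless, because an existing `.orig` file is never overwritten, so the diff always shows the changes relative to the first pristine copy.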
14:48
pdurbin
donsizemore: yes, that flyway error :)
14:48
pmauduit
ok
14:48
donsizemore
@pdurbin bah. my fork wasn't current. starting over
14:49
pdurbin
donsizemore: ok, did you catch that the entire api test suite completed ok on my Mac yesterday? I ran them against Dataverse running in docker-aio.
14:50
pmauduit
pdurbin: hmmm ... having collectd in the foreground is ok in regards to the prometheus interface, but it fails to setup the plugin if launched via systemd
14:50
donsizemore
@pdurbin yes
14:51
pdurbin
pmauduit: yuck
14:51
pmauduit
might be related to selinux or so
14:53
pdurbin
Hmm, getenforce shows Enforcing. You are very welcome to turn off selinux if you want.
15:01
pmauduit
pdurbin: any idea on how to do this ?
15:01
pdurbin
setenforce
15:02
pdurbin
`setenforce Permissive` should do it
15:02
pdurbin
then you can check it with `getenforce`
15:02
pmauduit
ok done
15:02
pmauduit
and collectd correctly launched via systemd now
15:02
pdurbin
nice!!
15:03
pdurbin
I wrote about selinux at https://github.com/IQSS/dataverse/blob/v4.15.1/doc/sphinx-guides/source/developers/selinux.rst and in the future we can try to figure out how to get it working. Let's not worry about it now. :)
15:03
jri joined #dataverse
15:04
pmauduit
pdurbin: do you know where prometheus is configured ?
15:04
pmauduit
:)
15:04
pdurbin
well, there should be some clues in https://github.com/IQSS/dataverse-ansible/pull/96/files
15:05
pdurbin
I see --config.file=/usr/local/prometheus/prometheus.yml in /usr/lib/systemd/system/prometheus.service
15:06
pmauduit
yup I found it also
15:06
donsizemore
it's currently https://github.com/IQSS/dataverse-ansible/blob/master/files/prometheus.yml but it can be what you want it to be
15:07
pmauduit
I'll try to find a config on our setups
15:07
pdurbin
Not to distract anyone but I just noticed that this is already on the file system, so maybe we'll be able to monitor Solr too: /usr/local/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
15:10
pmauduit
pdurbin: found a config sample
15:10
pdurbin
great!
15:10
pmauduit
I just have to kill -HUP now
15:10
pmauduit
and we should be good
15:13
jri_ joined #dataverse
15:17
pdurbin
awesome
15:17
pmauduit
... it does not want to honour the scrape_config for collectd
15:20
pdurbin
"scrape_config" is in a config file?
15:20
pmauduit
yes
15:20
pmauduit
there is one by default (job_name: 'dataverse'), and I just added one named collectd
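For readers unfamiliar with the file being edited, here is a small Python sketch that renders what an extra `collectd` entry under `scrape_configs` in prometheus.yml might look like. It is an assumption-laden illustration, not the configuration actually used on the server: `render_scrape_job` is a hypothetical helper, the target `localhost:9103` is taken from the port mentioned in the chat, and the existing 'dataverse' job is deliberately left out because its contents are not shown in the log.

```python
def render_scrape_job(job_name, targets):
    """Render one scrape_configs entry as YAML text (static targets only)."""
    quoted = ", ".join(f"'{t}'" for t in targets)
    return "\n".join([
        f"  - job_name: '{job_name}'",
        "    static_configs:",
        f"      - targets: [{quoted}]",
    ])


# Assumed target: the collectd write_prometheus listener on port 9103.
collectd_job = render_scrape_job("collectd", ["localhost:9103"])

print("scrape_configs:")
print("  # ... the existing 'dataverse' job stays here unchanged ...")
print(collectd_job)
```

Prometheus scrapes the `/metrics` path of each target by default, so no explicit path is set in this sketch.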
15:22
* pdurbin
runs diff /usr/local/prometheus/prometheus.yml.orig /usr/local/prometheus/prometheus.yml
15:22
pdurbin
I might be a little turned around. This is all new to me.
15:23
pdurbin
What is collectd collecting right now? :)
15:24
pmauduit
pdurbin: you can have a look by curl'ing http://localhost:9103/
15:25
pdurbin
ok so collectd_cpu_total, collectd_memory, etc. thanks
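To list which metrics an exporter such as collectd is exposing, the text that curl returns can be reduced to metric names. The sketch below assumes the standard Prometheus text exposition format; the function name is hypothetical, and the sample lines, including their label names and values, are invented for illustration and are not real output from this server.

```python
def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names in a text exposition."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labels) or at the first space (value).
        names.add(line.split("{", 1)[0].split()[0])
    return sorted(names)


# Invented sample, shaped like the format only.
sample = """\
# TYPE collectd_memory gauge
collectd_memory{memory="free",instance="localhost"} 1.0
collectd_memory{memory="used",instance="localhost"} 2.0
collectd_cpu_total{cpu="0",type="idle",instance="localhost"} 3.0
collectd_load_shortterm 0.5
"""

print(metric_names(sample))
```

The same function could be fed the body fetched from the exporter with `urllib.request.urlopen`, which is left out here so the sketch runs without a live server.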
15:25
pmauduit
yup
15:25
pdurbin
I guess I expected the data to go in /var/lib/collectd
15:26
pdurbin
but it's empty. Is there a database or something?
15:26
pmauduit
by default (under debian at least) it should generate rrd time series database in /var/lib/collectd IIRC
15:26
pmauduit
but we don't need it if we can plug it into prometheus
15:27
pmauduit
(as prometheus will be our tsdb)
15:27
pdurbin
oh, ok
15:27
pdurbin
no need to store the data twice
15:27
pdurbin
we'll just store it once in prometheus
15:27
pmauduit
yup
15:27
pmauduit
localhost:9090/config
15:27
pmauduit
:(
15:28
pmauduit
only one scrape_configs, where I defined 2
15:29
pdurbin
really? I see 2
15:29
pdurbin
- job_name: dataverse
15:29
pdurbin
- job_name: collectd
15:30
pmauduit
in the configuration file ? or via the web ui
15:31
pdurbin
I mean curl http://localhost:9090/config | grep job_name
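The `curl ... | grep job_name` check above can also be done in Python once the page text is in hand. This is a minimal sketch under assumptions: `job_names` is a hypothetical helper, the regular expression simply looks for `job_name:` followed by an optionally quoted name, and the sample string is made up to mirror the two jobs named in the chat rather than copied from the real /config page.

```python
import re

# Matches "job_name: collectd" as well as "job_name: 'collectd'".
JOB_NAME = re.compile(r"""job_name:\s*['"]?([A-Za-z0-9_.-]+)""")


def job_names(config_text):
    """Return every job name found in the text, in order of appearance."""
    return JOB_NAME.findall(config_text)


# Made-up sample mirroring the two jobs discussed in the channel.
sample = "scrape_configs:\n- job_name: dataverse\n- job_name: collectd\n"
print(job_names(sample))
```

Comparing this list for the file on disk and for the text served by the running process is one way to tell whether a config reload was actually picked up.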
15:31
pmauduit
if I tcpdump on the 9103 port I can see prometheus doing requests
15:32
pdurbin
can we try a request against collectd with curl? or whatever? :)
15:34
pmauduit
you can curl http://localhost:9103/ which is basically what collectd provides
15:34
pmauduit
but even if it seems to be scraped by prometheus, if I lookup for example the "collectd_memory" in the prometheus UI, it cannot find any infos
15:35
pdurbin
ok, I guess it's like looking at /proc :)
15:36
pmauduit
maybe I'm missing some parameters
15:36
jri joined #dataverse
15:41
pdurbin
in /etc/collectd.d/prometheus.conf or /usr/local/prometheus/prometheus.yml ?
15:41
pmauduit
in the prometheus.yml
15:41
pmauduit
I tried to stop it but it's still alive, even if there are no processes Oo
15:42
pmauduit
oh I know ...
15:42
pmauduit
I'm still with my vagrant ...
15:42
pdurbin
There's a conf.good.yml file linked from https://prometheus.io/docs/prometheus/latest/configuration/configuration/ if that helps :)
15:43
pmauduit
i was not hitting the right service
15:44
pmauduit
if you've got the ssh socks proxy running
15:44
pdurbin
I don't. :(
15:44
pmauduit
you should be able to load this page: http://ip-172-31-36-60:9090/graph?g0.range_input=1h&g0.expr=collectd_load_shortterm&g0.tab=0
15:44
pmauduit
ok
15:44
pdurbin
Can we expose that for everyone without socks? :)
15:45
pdurbin
If I curl that URL I see "Prometheus Time Series Collection and Processing Server" :)
15:45
pmauduit
then yes, the proxy file you mentioned
15:45
jri_ joined #dataverse
15:46
pdurbin
I already made http.proxy.conf.orig. :) Do you want to try hacking on the original? :)
15:46
pdurbin
er
15:46
pdurbin
I mean hacking on the real file.
15:51
pmauduit
I'm mixing up apache and nginx configuration
15:55
pdurbin
:)
15:56
pmauduit
https://serverfault.com/questions/924238/prometheus-1-5-2-behind-apache-2-4-reverse-proxy
15:57
pmauduit
pdurbin: I fear that some endpoints could be used by dataverse
15:57
pdurbin
Oh? What are you worried about?
15:58
pmauduit
it would be cleaner to tell prometheus that it should be reachable under /prometheus/
15:58
pdurbin
I'm not very worried about clean right now. :)
15:59
pmauduit
doesn't work either anyway
15:59
pdurbin
T-T
16:00
pdurbin
I don't know if this helps, but I was looking at this earlier: https://stackoverflow.com/questions/45914235/configure-apache-with-multiple-proxypass/45916572#45916572
16:01
pmauduit
makes sense, but I think prometheus is sending redirects onto /
16:02
pmauduit
...
16:02
pmauduit
prometheus is currently listening on all interfaces
16:02
pmauduit
which means http://ec2-3-81-53-52.compute-1.amazonaws.com:9090/graph works already
16:03
pmauduit
or it works because I'm reaching via my socks proxy
16:03
pmauduit
yes, that's why :(
16:05
pdurbin
:(
16:05
pmauduit
anyway, can we reach the grafana instance now ?
16:05
pmauduit
(we don't need to be able to browse prometheus directly, grafana can do it for us)
16:05
pdurbin
Good point. So the next step is to install grafana?
16:07
pmauduit
yes
16:08
pdurbin
Isn't it getting late for you? :)
16:08
pmauduit
it's 6PM here already, you're right
16:08
pdurbin
Should we pick this up next week?
16:09
pmauduit
If I find some time, sure :)
16:10
pdurbin
great! thanks!
17:36
donsizemore
@pdurbin DatasetsIT broketh by its lonesome https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-testSubset/16/console
17:38
pdurbin
any flyway errors?
17:39
donsizemore
no, those were a 'me' problem
17:39
donsizemore
i started over
17:39
pdurbin
phew
17:41
pdurbin
I need to drive my kids to the airport to visit my parents. I'll be back online in an hour or two.
18:26
jcain joined #dataverse
18:26
jcain
just wanted to check in and see how much space an institution using Dataverse on Harvard's service was permitted
19:33
jcain joined #dataverse
19:56
pdurbin
whoops, missed them
19:57
pdurbin
bjonnh might know :)
21:51
pdurbin
ok, folks, I'm out, have a good weekend!
21:51
pdurbin left #dataverse