Jekyll2024-02-02T15:04:32+00:00https://derekweitzel.com/feed.xmlDereks WebThoughts from DerekDerek Weitzeldjw8605@gmail.comProfiling the XRootD Monitoring Collector2024-01-31T05:00:00+00:002024-01-31T05:00:00+00:00https://derekweitzel.com/2024/01/31/profiling-xrootd-collector<p>The <a href="https://github.com/opensciencegrid/xrootd-monitoring-collector">XRootD Monitoring Collector</a> (collector) receives file transfer accounting messages from <a href="https://xrootd.slac.stanford.edu/">XRootD</a> servers.
This transfer information is parsed by the collector and sent to the GRACC accounting database for visualization.
Each transfer will generate multiple messages:</p>
<ol>
<li>Connection message with client information</li>
<li>Token information</li>
<li>File open with file name</li>
<li>Transfer updates (potentially multiple)</li>
<li>File close with statistics about bytes read and written</li>
<li>Disconnection</li>
</ol>
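<p>As a rough illustration, the collector’s job is to stitch this sequence back into a single accounting record. The sketch below is hypothetical; the <code>session</code>, <code>type</code>, and byte-count field names are illustrative, not the collector’s actual schema:</p>

```python
# Hypothetical correlation of per-transfer messages by session id.
sessions = {}

def on_message(msg):
    """Accumulate fields from each message until the session disconnects."""
    sid = msg["session"]
    record = sessions.setdefault(sid, {})
    if msg["type"] in ("connect", "token", "open", "update"):
        record.update(msg.get("info", {}))
    elif msg["type"] == "close":
        record["bytes_read"] = msg["bytes_read"]
        record["bytes_written"] = msg["bytes_written"]
    elif msg["type"] == "disconnect":
        # The record is complete and ready to send on to GRACC.
        return sessions.pop(sid)
```

<p>Only once the disconnect arrives is the accounting record complete.</p>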
<p>We can see 1000+ messages a second from XRootD servers across the OSG. But, recently the collector has not been able to keep up. Below is the traffic of messages to the collector from the OSG’s Message Bus:</p>
<figure class="">
  <img src="/images/posts/profiling-xrootd-collector/before-optimization-mq.png" alt="Message bus traffic before optimization" /><figcaption>
Message bus traffic before optimization
</figcaption></figure>
<p>The graph is from the message bus’s perspective, so publish is incoming to the message bus, and deliver is sending to consumers (the collector). We are receiving (Publish) ~1550 messages a second, while the collector is only able to process (Deliver) ~500 messages a second. 1550 messages a second is higher than our average, but we need to be able to process data as fast as it comes. Messages that are not processed will wait on the queue. If the queue gets too large (the maximum is set to 1 million messages), then messages will be deleted, losing valuable transfer accounting data. At a deficit of 1000 messages a second, it would take only ~16 minutes to fill the queue. It is clear that we missed data for a significant amount of time.</p>
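<p>The queue-fill estimate is easy to check with back-of-the-envelope arithmetic, using the rates read off the graph:</p>

```python
QUEUE_MAX = 1_000_000            # configured queue limit
publish_rate = 1550              # messages/s arriving at the bus
deliver_rate = 500               # messages/s the collector processes
deficit = publish_rate - deliver_rate

minutes_to_fill = QUEUE_MAX / deficit / 60
print(round(minutes_to_fill, 1))  # prints 15.9
```

<p>At the observed deficit of ~1050 messages a second, the one-million-message queue fills in roughly 16 minutes.</p>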
<h2 id="profiling">Profiling</h2>
<p>The first step to optimizing the XRootD Monitoring Collector is to profile the current process. Profiling is the process of measuring the performance of the collector to identify bottlenecks and areas for improvement.</p>
<p>For profiling, I created a development environment on the <a href="https://nationalresearchplatform.org/">National Research Platform (NRP)</a> to host the collector. I started a <a href="https://docs.nationalresearchplatform.org/userdocs/jupyter/jupyterhub-service/">Jupyter notebook on the NRP</a>, used VSCode to edit the collector code, and used a Jupyter notebook to process the data. I used the <a href="https://docs.python.org/3/library/profile.html">cProfile</a> package built into Python to perform the profiling.
I modified the collector to output a profile update every 10 seconds so I could see the progress of the collector.</p>
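<p>The setup looked roughly like the following; <code>process_messages</code> is a stand-in for the collector’s real message-handling loop, and the periodic 10-second dump is shown here as a single snapshot:</p>

```python
import cProfile
import io
import pstats

def process_messages(n):
    # Stand-in for the collector's real message-handling loop.
    total = 0
    for i in range(n):
        total += i % 7
    return total

profiler = cProfile.Profile()
profiler.enable()
process_messages(100_000)
profiler.disable()

# Each periodic update dumped a snapshot like this, which snakeviz
# can load directly for visualization:
profiler.dump_stats("collector.prof")

# A quick text summary of the top functions by cumulative time:
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
```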
<p>After profiling, I used <a href="https://jiffyclub.github.io/snakeviz/">snakeviz</a> to visualize the profile. Below is a visualization of the profile before any optimization. The largest consumer of processing time was DNS resolution, highlighted in purple in the image below.</p>
<figure class="">
  <img src="/images/posts/profiling-xrootd-collector/before-optimization-profile.png" alt="Snakeviz profile before optimization; purple is the DNS resolution function" /><figcaption>
Snakeviz profile. Purple is the DNS resolution function
</figcaption></figure>
<p>The collector uses DNS to resolve the hostnames for all IPs it receives in order to provide a human friendly name for clients and servers. Significant DNS resolution is expected as the collector is receiving messages from many different hosts. However, the DNS resolution is taking up a significant amount of time and is a bottleneck for the collector.</p>
<h2 id="improvement">Improvement</h2>
<p>After reviewing the profile, <a href="https://github.com/opensciencegrid/xrootd-monitoring-collector/pull/43">I added a cache to the DNS resolution</a> so that the collector only needs to resolve each host once every 24 hours. When I profiled after making the change, I saw a significant improvement in DNS resolution time. Below is another visualization of the profile after adding the DNS caching; purple is again the DNS resolution.</p>
<figure class="">
  <img src="/images/posts/profiling-xrootd-collector/after-optimization-profile.png" alt="Snakeviz profile after DNS caching; purple is the DNS resolution function" /><figcaption>
Snakeviz profile. Purple is the DNS resolution function
</figcaption></figure>
<p>Notice that the DNS resolution is a much smaller portion of the overall running time when compared to the previous profile.</p>
<p>In the following graph, I show the time spent on DNS resolution over time, both before and after the optimization. I would expect cumulative DNS resolution time to grow in both cases, but as you can see, it grows much more slowly after adding the DNS cache.</p>
<figure class="">
  <img src="/images/posts/profiling-xrootd-collector/dns-resolution.png" alt="Growth of DNS resolution time" /><figcaption>
Growth of DNS resolution time
</figcaption></figure>
<h2 id="production">Production</h2>
<p>When we applied the changes into production, we saw a significant improvement in the collector’s ability to process messages. Below is the graph of the OSG’s Message Bus after the change:</p>
<figure class="">
  <img src="/images/posts/profiling-xrootd-collector/edited-production-mq.png" alt="RabbitMQ message traffic after the optimization" /><figcaption>
RabbitMQ Message Parsing
</figcaption></figure>
<p>The incoming messages decreased, but the collector is now able to process messages as fast as they are received. This is a significant improvement over the previous state. I suspect that the decrease in incoming messages is due to the load on the message bus from delivering more outgoing messages to the improved collector; the message bus can slow down incoming messages under heavier load.</p>
<h2 id="conclusions-and-future-work">Conclusions and Future Work</h2>
<p>Since we implemented the cache for DNS resolution, the collector has been able to keep up with the incoming messages. This is a significant improvement over the previous state. Over time, we expect the DNS cache to capture nearly all of the hosts, and the DNS resolution time to decrease even further.</p>
<p>We continue to look for optimizations to the collector. When looking at the output from the most recent profile, we noticed the collector is spending a significant amount of time in the logging functions. By default, we have debug logging turned on. We will look at turning off debug logging in the future.</p>
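<p>A cheap first step is to raise the logger’s level and guard any expensive debug formatting behind a check. The names below are illustrative, not the collector’s actual logger setup:</p>

```python
import logging

logger = logging.getLogger("collector")
logger.setLevel(logging.INFO)      # DEBUG records are dropped entirely

def handle(record):
    # Guard the call so expensive debug formatting is skipped when disabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("raw record: %r", record)
    return record
```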
<p>Additionally, the collector is spending a lot of time polling for messages. In fact, the message bus is receiving ~1500 messages a second, which is increasing the load on the message bus. After reading through optimizations for RabbitMQ, it appears that fewer but larger messages are better for the message bus. We will look at batching messages in the future.</p>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioThe XRootD Monitoring Collector (collector) receives file transfer accounting messages from XRootD servers. This transfer information is parsed by the collector and sent to the GRACC accounting database for visualization. Each transfer will generate multiple messages:Dashboards for Learning Data Visualizations2022-09-14T05:00:00+00:002022-09-14T05:00:00+00:00https://derekweitzel.com/2022/09/14/dashboards<p>Creating dashboards and data visualizations is a favorite pastime of mine. Also, I jump at any chance to learn a new technology. That is why I have spent the last couple of months building dashboards and data visualizations for various projects while learning several web technologies.</p>
<p>Through these dashboards, I have learned many new technologies:</p>
<ul>
<li><a href="https://reactjs.org/">React</a> and <a href="https://nextjs.org/">NextJS</a></li>
<li>Mapping libraries such as <a href="https://leafletjs.com/">Leaflet</a> and <a href="https://www.mapbox.com/">Mapbox</a></li>
<li>CSS libraries such as <a href="https://tailwindcss.com/">TailwindCSS</a></li>
<li>Data access JS clients for <a href="https://github.com/elastic/elasticsearch-js">Elasticsearch</a> and <a href="https://github.com/siimon/prom-client">Prometheus</a></li>
<li>Website hosting service <a href="https://vercel.com/">Vercel</a></li>
<li>Data visualization library <a href="https://d3js.org/">D3.js</a></li>
</ul>
<h2 id="gp-argo-dashboard"><a href="https://gp-argo.greatplains.net/">GP-ARGO Dashboard</a></h2>
<p><a href="https://gp-argo.greatplains.net/">The Great Plains Augmented Regional Gateway to the Open Science Grid</a> (GP-ARGO) is a regional collaboration of 16 campuses hosting computing that is made available to the OSG. My goal with the GP-ARGO dashboard was to show who is using the resources, as well as give a high-level overview of the region and the sites hosting GP-ARGO resources.</p>
<p>The metrics are gathered from OSG’s <a href="https://gracc.opensciencegrid.org/">GRACC Elasticsearch</a>. The list of projects is also from GRACC, and the bar graph in the bottom right is simply an iframe of a Grafana panel from GRACC.</p>
<p>Technologies used: <a href="https://reactjs.org/">React</a>, <a href="https://nextjs.org/">NextJS</a>, <a href="https://leafletjs.com/">Leaflet</a>, <a href="https://github.com/elastic/elasticsearch-js">Elasticsearch</a></p>
<p><strong>Repo:</strong> <a href="https://github.com/djw8605/gp-argo-map">GP-ARGO Map</a></p>
<p><a href="https://gp-argo.greatplains.net/"><img src="/images/posts/Dashboards/gp-argo-screenshot.png" alt="GP-ARGO" /></a></p>
<h2 id="osdf-website"><a href="https://osdf.osg-htc.org/">OSDF Website</a></h2>
<p>My next website was the <a href="https://osdf.osg-htc.org/">Open Science Data Federation</a> landing page. I was more bold in the design of the OSDF page. I took heavy inspiration from other technology websites such as the <a href="https://www.mapbox.com/">Mapbox</a> website and the <a href="https://k8slens.dev/">Lens</a> website. The theme is darker and it was also my first experience with the TailwindCSS library. Additionally, I learned the CSS <a href="https://en.wikipedia.org/wiki/CSS_Flexible_Box_Layout">flexbox</a> layout techniques.</p>
<p>The spinning globe uses the <a href="https://globe.gl/">Globe.gl</a> library. The library is great for creating visualizations that show distribution throughout the world. On the globe I added “transfers” between the OSDF origins and caches. Each origin sends transfers to every cache in the visualization, though it is all just animation; there is no data behind the transfers, it is only for visual effect. Also, each cache location is labeled on the globe. The globe can be rotated and zoomed with your mouse.</p>
<p>The number of bytes read and files read is gathered using the Elasticsearch client querying GRACC, the OSG’s accounting service. The OSG gathers statistics on every transfer a cache or origin performs. Additionally, we calculate the rate of data transfers and the rate of files being read using GRACC.</p>
<p>One unique feature of the OSDF website is the resiliency of the bytes read and files read metrics. We wanted to make sure that the metrics would be shown even if a data component has failed. The metrics are gathered in 3 different ways for resiliency:</p>
<ol>
<li>If all components are working correctly, the metrics are downloaded from the OSG’s Elasticsearch instance.</li>
<li>If OSG Elasticsearch has failed, the dashboard pulls saved metrics from NRP’s S3 storage. The metrics are saved every time they are successfully gathered from Elasticsearch, so they should be fairly recent.</li>
<li>The metrics are gathered and saved on each website build. These static metrics are immediately available upon website load. If all else fails, these saved static metrics are always available, even if they may be old.</li>
</ol>
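<p>The three-way fallback boils down to trying each source in order. Below is a minimal sketch; the source functions are stand-ins for the real Elasticsearch and S3 fetchers:</p>

```python
def fetch_metrics(sources, static_fallback):
    """Try each metrics source in order; return baked-in values if all fail."""
    for source in sources:
        try:
            return source()
        except Exception:
            continue        # e.g. Elasticsearch or S3 unreachable
    return static_fallback

def from_elasticsearch():
    raise ConnectionError("ES down")    # simulate a failed component

def from_s3():
    return {"bytes_read": 123}

metrics = fetch_metrics([from_elasticsearch, from_s3], {"bytes_read": 0})
```

<p>The static fallback never raises, so the dashboard always has something to show.</p>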
<p>Technologies used: <a href="https://reactjs.org/">React</a>, <a href="https://nextjs.org/">NextJS</a>, <a href="https://globe.gl/">Globe.gl</a></p>
<p><strong>Repo:</strong> <a href="https://github.com/djw8605/osdf-website">OSDF Website</a></p>
<p><a href="https://osdf.osg-htc.org/"><img src="/images/posts/Dashboards/osdf-screenshot.png" alt="OSDF" /></a></p>
<h2 id="nrp-dashboard"><a href="https://dash.nrp-nautilus.io/">NRP Dashboard</a></h2>
<p>The National Research Platform dashboard is largely similar to the <a href="#gp-argo-dashboard">GP-ARGO</a> dashboard. It uses the same basic framework and technologies. But, the data acquisition is different.</p>
<p>The metrics shown are the number of GPUs allocated, the number of pods running, and the number of active research groups. The metrics are gathered from the NRP’s <a href="https://prometheus.io/">Prometheus</a> server on demand. The graph in the background of each metric is generated with <a href="https://d3js.org/">D3.js</a>.</p>
<p>Technologies used: <a href="https://reactjs.org/">React</a>, <a href="https://nextjs.org/">NextJS</a>, <a href="https://d3js.org/">D3.js</a>, <a href="https://github.com/siimon/prom-client">Prometheus</a>, <a href="https://tailwindcss.com/">TailwindCSS</a></p>
<p><strong>Repo:</strong> <a href="https://github.com/djw8605/nrp-map-app">NRP Map App</a></p>
<p><a href="https://dash.nrp-nautilus.io/"><img src="/images/posts/Dashboards/nrp-dashboard-screenshot.png" alt="NRP Dashboard" /></a></p>
<h2 id="pnrp-website"><a href="https://nrp-website.vercel.app/">PNRP Website</a></h2>
<p>The <a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=2112167&HistoricalAwards=false">Prototype National Research Platform</a> is an NSF research platform. The dashboard is also in the prototype stage, as the PNRP hardware is not yet fully delivered and operational.</p>
<p>The dashboard is my first experience with a large map from <a href="https://www.mapbox.com/">Mapbox</a>. I used a <a href="https://visgl.github.io/react-map-gl/">React binding</a> to interface with the <a href="https://www.mapbox.com/">Mapbox</a> service. Also, when you click on a site, it zooms into the building where the PNRP hardware will be hosted.</p>
<p>The transfer metrics come from the NRP’s Prometheus, which shows the bytes moving into and out of each node. The transfer metrics are currently for cache nodes near the sites, but once the PNRP hardware becomes operational the transfer metrics will show each site’s cache.</p>
<p>Technologies Used: <a href="https://reactjs.org/">React</a>, <a href="https://nextjs.org/">NextJS</a>, <a href="https://www.mapbox.com/">Mapbox</a>, <a href="https://tailwindcss.com/">TailwindCSS</a>, <a href="https://github.com/siimon/prom-client">Prometheus</a></p>
<p><strong>Repo:</strong> <a href="https://github.com/djw8605/nrp-website">NRP Website</a></p>
<p><a href="https://nrp-website.vercel.app/"><img src="/images/posts/Dashboards/nrp-website-screenshot.png" alt="PNRP Website" /></a></p>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioCreating dashboards and data visualizations is a favorite pastime of mine. Also, I jump at any chance to learn a new technology. That is why I have spent the last couple of months building dashboards and data visualizations for various projects while learning several web technologies.Improving the Open Science Data Federation’s Cache Selection2022-01-22T05:00:00+00:002022-01-22T05:00:00+00:00https://derekweitzel.com/2022/01/22/improving-geoip<p>Optimizing data transfers requires tuning many parameters. High latency between the client and a server can decrease data transfer throughput. The Open Science Data Federation (OSDF) attempts to optimize the latency between a client and a cache by using GeoIP to locate the nearest cache to the client. But using GeoIP alone has many flaws. In this post, we utilize <a href="https://workers.cloudflare.com/">Cloudflare Workers</a> to provide GeoIP information during cache selection. During the evaluation, we found that location accuracy grew from <strong>86%</strong> with the original GeoIP service to <strong>95%</strong> with Cloudflare Workers.</p>
<figure class="">
<img src="/images/posts/CloudflareWorkers/CacheMap.png" alt="Map of U.S. OSDF" /><figcaption>
Map of OSDF locations
</figcaption></figure>
<p>GeoIP has many flaws. First, the physically nearest cache may not be the nearest in the network topology. Determining the nearest cache in the network would require probing the network topology between the client and every cache, an intensive task to perform at each client startup, and one that may be impossible under some network configurations, such as blocked network protocols.</p>
<p>Second, the GeoIP database is not perfect. It does not have every IP address, and the addresses may not have accurate location information. When GeoIP is unable to determine a location, it will default to “guessing” the location is a lake in Kansas (<a href="https://arstechnica.com/tech-policy/2016/08/kansas-couple-sues-ip-mapping-firm-for-turning-their-life-into-a-digital-hell/">a well known issue</a>).</p>
<p>Following a review of the Open Science Data Federation (OSDF), we found that we could improve efficiency by improving the geolocation of clients. In the review, several sites were found not to be using the nearest cache.</p>
<h2 id="implementation">Implementation</h2>
<p>StashCP queries the <a href="https://cernvm.cern.ch/fs/">CVMFS</a> geo location service which relies on the <a href="https://www.maxmind.com/en/home">MaxMind GeoIP database</a>.</p>
<p><a href="https://workers.cloudflare.com/">Cloudflare Workers</a> are designed to run at Cloudflare’s many colocation facilities near the client. Cloudflare directs a client’s request to a nearby data center using DNS. Each request is annotated with an approximate location of the client, as well as the colocation center that received the request. Cloudflare uses a GeoIP database much like MaxMind’s, but it also falls back to the colocation site that serviced the request.</p>
<p>I wrote a Cloudflare worker, <a href="https://github.com/djw8605/cache-locator"><code class="language-plaintext highlighter-rouge">cache-locator</code></a>, which calculates the nearest cache to the client. It uses the GeoIP location of the client to calculate the ordered list of nearest caches. If the GeoIP fails for a location, the incoming request to the worker will not be annotated with the location but will include the <code class="language-plaintext highlighter-rouge">IATA</code> airport code of the colocation center that received the client request. We then return the ordered list of nearest caches to the airport.</p>
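<p>The worker itself runs as JavaScript on Cloudflare, but the distance ordering it performs can be sketched in Python; the cache names and coordinates below are hypothetical placeholders, not the real OSDF cache list:</p>

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points, in kilometers.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical subset of OSDF cache locations: name -> (lat, lon).
CACHES = {
    "kansas-city": (39.10, -94.58),
    "chicago": (41.88, -87.63),
    "san-diego": (32.72, -117.16),
}

def nearest_caches(lat, lon):
    """Return cache names ordered by distance from the client location."""
    return sorted(CACHES, key=lambda c: haversine_km(lat, lon, *CACHES[c]))
```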
<p>We imported a publicly available <a href="https://www.partow.net/miscellaneous/airportdatabase/">database of airport codes</a> and their locations. The database is stored in the <a href="https://developers.cloudflare.com/workers/learning/how-kv-works">Cloudflare Key-Value store</a>, keyed by the <code class="language-plaintext highlighter-rouge">IATA</code> code of the airport.</p>
<h2 id="evaluation">Evaluation</h2>
<p>To evaluate the location, I submitted test jobs to each site available in the OSG OSPool, 43 different sites at the time of evaluation. The test jobs:</p>
<ol>
<li>
<p>Run the existing <code class="language-plaintext highlighter-rouge">stashcp</code> to retrieve the closest cache.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> stashcp --closest
</code></pre></div> </div>
</li>
<li>
<p>Run a custom <a href="https://github.com/djw8605/closest-cache-cloudflare">closest script</a> that will query the Cloudflare worker for the nearest caches and print out the cache.</p>
</li>
</ol>
<p>After the jobs completed, I compiled the cache decisions into a <a href="https://docs.google.com/spreadsheets/d/1mo1FHYW2vpCyhSeCCd_bwP21rFFzqedv0dZ0z8EY4gg/edit?usp=sharing">spreadsheet</a> and manually evaluated each cache selection decision. The site names in the spreadsheet are the somewhat arbitrary internal names given to sites.</p>
<p>In the spreadsheet, you can see that the correct cache was chosen <strong>86%</strong> of the time with the old GeoIP service, and <strong>95%</strong> of the time with Cloudflare Workers.</p>
<h3 id="notes-during-the-evaluation">Notes during the Evaluation</h3>
<p>Cloudflare was determined to be incorrect at two sites, the first being <code class="language-plaintext highlighter-rouge">UColorado_HEP</code> (University of Colorado in Boulder). In this case, the Colorado clients failed the primary GeoIP lookup and the Cloudflare worker fell back to using the <code class="language-plaintext highlighter-rouge">IATA</code> code from the request. The requests from Colorado were all received by the Cloudflare Dallas colocation site, which is nearest the Houston cache. The original GeoIP service chose the Kansas City cache, which is the correct decision. It is unknown whether the original GeoIP service chose the KC cache because it knew the GeoIP location of the clients, or because it fell back to the Kansas default.</p>
<p>The second site where the Cloudflare worker implementation was incorrect was <code class="language-plaintext highlighter-rouge">SIUE-CC-production</code> (Southern Illinois University Edwardsville). In this case, the original GeoIP service chose Chicago, while the new service chose Kansas City. Edwardsville is almost equidistant from the KC and Chicago caches; the difference in distance is ~0.6 km, with Chicago being closer.</p>
<!-- TODO: Find out why KC cache was choosen SIUE -->
<p>An example of a site that did not work with GeoIP was <code class="language-plaintext highlighter-rouge">ASU-DELL_M420</code> (Arizona State University). The original service returned that the KC cache was the nearest. The Cloudflare service gave the default lat/long (the middle of Kansas) when GeoIP failed, but the data center serving the request had the airport code <code class="language-plaintext highlighter-rouge">LAX</code> (Los Angeles). The nearest cache to <code class="language-plaintext highlighter-rouge">LAX</code> is the UCSD cache, which is the correct cache decision.</p>
<p>During the evaluation, I originally used the Cloudflare worker development DNS address, <a href="https://stash-location.djw8605.workers.dev">stash-location.djw8605.workers.dev</a>. Purdue University and the American Museum of Natural History sites both blocked the development DNS address. The block was from an OpenDNS service which reported the domain had been linked to malware and phishing. Since the DNS hostname was hours old, it’s likely that most <code class="language-plaintext highlighter-rouge">*workers.dev</code> domains were blocked.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Improving the cache selection can improve download efficiency. It is left as future work to measure whether the nearest geographical cache is the best choice. As long as the OSDF uses a GeoIP service for cache selection, it is important that the correct cache is selected. Using the new Cloudflare service results in a <strong>95%</strong> correct cache decision rate vs. <strong>86%</strong> with the original service.</p>
<p>Cloudflare Workers is also very affordable for the scale that the OSDF would require. The first 100,000 requests are free, while the next 10 million requests cost $5/mo. The OSPool runs between 100,000 and 230,000 jobs per day, easily fitting within the $5/mo tier.</p>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioOptimizing data transfers requires tuning many parameters. High latency between the client and a server can decrease data transfer throughput. The Open Science Data Federation (OSDF) attempts to optimize the latency between a client and a cache by using GeoIP to locate the nearest cache to the client. But using GeoIP alone has many flaws. In this post, we utilize Cloudflare Workers to provide GeoIP information during cache selection. During the evaluation, we found that location accuracy grew from 86% with the original GeoIP service to 95% with Cloudflare Workers.XRootD Client Manager2020-10-11T05:00:00+00:002020-10-11T05:00:00+00:00https://derekweitzel.com/2020/10/11/xrootd-client-manager<p>The validation project for XRootD Monitoring is moving to phase 2, scale
testing. Phase 1 focused on correctness of single server monitoring. <a href="https://doi.org/10.5281/zenodo.3981359">The
report</a> is available.</p>
<p>We are still forming the testing plan for the scale test of XRootD, but a
component of the testing will be multiple clients downloading from multiple
servers. In addition, we must record exactly how much data each client reads
from each server in order to validate the monitoring with the client’s real behavior.</p>
<p>This level of testing will require detailed coordination and recording of client
actions. I am not aware of a testing framework that can coordinate and record
accesses of multiple clients and servers, therefore I spent the weekend
developing a simple framework for coordinating these tests.</p>
<p>Some requirements for the application are:</p>
<ul>
<li>Easy to use interface</li>
<li>Easy to add clients and servers</li>
<li>Authenticated access for clients, servers, and interface</li>
<li>Storage of tests and results</li>
</ul>
<p>I chose <a href="https://heroku.com">Heroku</a> for prototyping this application.</p>
<h2 id="interface">Interface</h2>
<p>The web interface is available at <a href="https://xrootd-client-manager.herokuapp.com/">https://xrootd-client-manager.herokuapp.com/</a>.
I chose to host it on Heroku as it is my go-to for pet projects. I will likely
move this over to OSG’s production Kubernetes installation soon. The entire
application is only the web interface and a back-end <a href="https://redis.io/">Redis</a>
data store.</p>
<figure class="">
<img src="/images/posts/XRootDClientManager/Interface.png" alt="Screenshot of web interface" /><figcaption>
Screenshot of simple web interface
</figcaption></figure>
<p>The web interface shows the connected clients and servers. The web interface
also connects to the web server with a persistent connection to update the list
of connected clients.</p>
<h2 id="client-communication">Client Communication</h2>
<p>Client communication is handled through a Socket.IO connection. Socket.IO is a
library that creates a bi-directional, event-based communication channel between
the client and the server. The communication is over WebSockets if possible, but
will fall back to HTTP long polling. A good discussion of long polling vs.
websockets is available from
<a href="https://www.ably.io/blog/websockets-vs-long-polling/">Ably</a>. The Socket.IO
connection is established between the web server and each worker, server, and
web client.</p>
<p>The difficult part is authenticating the Socket.IO connections. We discuss this
in the security section.</p>
<h2 id="security">Security</h2>
<p>Securing the commands and web interface is required since the web interface is
sending commands to the connected worker nodes and servers.</p>
<h3 id="socketio-connections">Socket.IO Connections</h3>
<p>The Socket.IO connection is secured with a shared key. The communication flow
for a non-web client (worker/server):</p>
<ol>
<li>A JWT is created from the secret key. The secret key is communicated through
a separate secure channel. In most cases, it will be through the command
line arguments of the client. The JWT has a limited lifetime and a scope.</li>
<li>The client registers with the web server, with an Authentication bearer token
in the headers. The registration includes details about the client. It
returns a special (secret) <code class="language-plaintext highlighter-rouge">client_id</code> that will be used to authenticate the
Socket.IO connection. The registration is valid for 30
seconds before the <code class="language-plaintext highlighter-rouge">client_id</code> is no longer valid.</li>
<li>The client creates a Socket.IO connection with the <code class="language-plaintext highlighter-rouge">client_id</code> in the request
arguments.</li>
</ol>
<h3 id="web-interface">Web Interface</h3>
<p>The web interface is secured with an OAuth login from GitHub. There is a whitelist
of allowed GitHub users that can access the interface.</p>
<p>The flow for web clients connecting with Socket.IO is much easier since they are already authenticated
with OAuth from GitHub.</p>
<ol>
<li>The user authenticates with GitHub</li>
<li>The Socket.IO connection includes cookies such as the session, which is
signed by a secret key on the server. The session’s GitHub key is compared to the
whitelist of allowed users.</li>
</ol>
<h2 id="storage-of-tests-and-results">Storage of tests and results</h2>
<p>Storage of the tests and results is still being designed. Most likely, the
tests and results will be stored in a database such as Postgres.</p>
<h1 id="conclusions">Conclusions</h1>
<p><a href="https://heroku.com">Heroku</a> provides a great playground for prototyping these
web applications. I hope to eventually find an alternative that will run on
OSG’s production Kubernetes installation.</p>
<p>The web application is still being developed, and there is much to be done before
it can be fully utilized for the scale validation. But many of the difficult
components are completed, including the communication and eventing, the secure web
interface, and the clients.</p>
<p>The GitHub repos are available at:</p>
<ul>
<li><a href="https://github.com/djw8605/xrootd-client-manager">XRootD Client Manager</a></li>
<li><a href="https://github.com/djw8605/xrootd-ws-client">XRootD Client</a></li>
</ul>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioThe validation project for XRootD Monitoring is moving to phase 2, scale testing. Phase 1 focused on correctness of single server monitoring. The report is available.GRACC Transition Visualization2020-03-08T05:00:00+00:002020-03-08T05:00:00+00:00https://derekweitzel.com/2020/03/08/gracc-transition<p>The OSG is in the process of transitioning from an older ElasticSearch (ES) cluster to a new version. Part of this process is reindexing (copying) data from the old to the new. Unfortunately, it’s not easy to capture the status of this transition. For this, I have created the <a href="https://gracc-transition.herokuapp.com/">GRACC Transition page</a>.</p>
<p>The goal is to transition when both the old and new ES have the same data. A simple measure of this is if they share the same number of documents in all of the indexes.</p>
<p>Source for this app is available on github: <a href="https://github.com/djw8605/gracc-transition">GRACC Transition</a></p>
<h2 id="data-collection">Data Collection</h2>
<p>Data collection is performed by a probe on each of the new and old ElasticSearch clusters. Upload is performed with a POST to the GRACC transition website. Authorization is handled with a shared random token between the probe and the website.</p>
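<p>The upload step can be sketched with only the standard library; the endpoint path and payload shape below are hypothetical, not the site’s actual API:</p>

```python
import json
import urllib.request

def build_upload_request(url, token, stats):
    """Build the authenticated POST carrying the probe's index statistics."""
    return urllib.request.Request(
        url,
        data=json.dumps(stats).encode(),
        headers={
            "Authorization": "Bearer " + token,   # the shared random token
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The probe would then send it with urllib.request.urlopen(build_upload_request(...)).
```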
<p>The probe is very simple. It queries ES for all indexes, as well as the number of documents and data size inside the index.</p>
<p>There are also many indexes that the OSG is not transitioning to the new ES. In order to ignore these indexes, a set of regular expressions is used to remove the indexes from consideration. Those regular expressions are:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/^osg.*/, // Start with osg.*
/^ps_.*/, // Start with ps_*
/^shrink\-ps_.*/, // Start with shrink-ps_*
/^glidein.*/, // Start with glidein*
/^\..*/, // Start with .
/^ps\-itb.*/ // Start with ps-itb*
</code></pre></div></div>
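<p>The website applies these rules in JavaScript; an equivalent filter in Python would be:</p>

```python
import re

# The same ignore rules, as Python regular expressions.
IGNORED_PATTERNS = [re.compile(p) for p in (
    r"^osg.*",
    r"^ps_.*",
    r"^shrink-ps_.*",
    r"^glidein.*",
    r"^\..*",
    r"^ps-itb.*",
)]

def keep_index(name):
    """True if the index should be counted in the transition comparison."""
    return not any(p.match(name) for p in IGNORED_PATTERNS)
```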
<h2 id="the-website">The Website</h2>
<p><img src="/images/posts/gracc-transition/gracc-transition-website.png" alt="GRACC Transition Website" /></p>
<p>The GRACC transition app is hosted on <a href="https://www.heroku.com/">Heroku</a>. I chose Heroku because it provides a simple hosting platform with a database for free.</p>
<p>The website pushes a lot of the data processing to the client. The data is stored in the database as JSON and is sent to the client without any transformation. The client pulls the data from the website for both the new and old ES and processes it in JavaScript.</p>
<p>The website breaks the statistics into three visualizations:</p>
<ol>
<li><strong>Progress Bars</strong>: Comparing the total documents and total data size of the old and new. The progress is defined as new / old. The bars provide a very good visualization of the progress of the transition as they need to reach 100% before we are able to fully transition.</li>
<li><strong>Summary Statistics</strong>: The summary statistics show the raw number of either missing or mismatched indexes. If an index is in the old ES but is not in the new ES, it is counted as <strong>missing</strong>. If the index is a different size in the old vs. the new, it is counted as <strong>mismatched</strong>.</li>
<li><strong>Table of Indices</strong>: Finally, a table of indices is shown with the number of documents that are missing, or simply <strong>Missing</strong> if the index is missing in the new ES.</li>
</ol>
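<p>All three views reduce to a small computation over the two index listings. A sketch, with each listing represented as a mapping from index name to document count:</p>

```python
def transition_stats(old, new):
    """Compare the old and new ES listings (index name -> document count)."""
    missing = [i for i in old if i not in new]
    mismatched = [i for i in old if i in new and new[i] != old[i]]
    total_old = sum(old.values())
    total_new = sum(new.get(i, 0) for i in old)
    return {
        "missing": missing,
        "mismatched": mismatched,
        # Progress bar value: new / old (1.0 means ready to transition).
        "progress": total_new / total_old if total_old else 1.0,
    }
```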
<p>In addition to the table, I also provide a button to download the list of indexes that are missing or mismatched. This can be useful for an administrator to verify the list matches what they expect, or to feed into further Elasticsearch processing.</p>
<h2 id="improvements-and-future">Improvements and Future</h2>
<p>In the future, I would like to generate a weekly or even daily email to show the progress of the transition. This would provide a constant reminder of the state of the transition.</p>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioThe OSG is in the process of transitioning from an older ElasticSearch (ES) cluster to a new version. Part of this process is reindexing (copying) data from the old to the new. Unfortunately, it’s not easy to capture the status of this transition. For this, I have created the GRACC Transition page.LetsEncrypt for Multiple Hosts2019-10-11T19:38:14+00:002019-10-11T19:38:14+00:00https://derekweitzel.com/2019/10/11/letsencrypt-for-multiple-hosts<p>Using <a href="https://letsencrypt.org/">LetsEncrypt</a> for certificate creation and management has made secure communications much easier. Instead of contacting the IT department of your university to request a certificate, you can skip the middle man and generate your own certificate which is trusted around the world.</p>
<p>A common use case for certificates is securing data transfers. Data transfers that use the GridFTP, XRootD, or HTTPS transfer protocols can load balance between multiple servers to increase throughput. <a href="https://www.keepalived.org/">keepalived</a> is used to load balance between multiple transfer servers. The certificate provided to clients needs to include the virtual host address of the load balancer, as well as the hostname of each of the worker nodes.</p>
<ol>
<li>Create a shared directory between the data transfer nodes</li>
<li>Install httpd on each of the data transfer nodes</li>
<li>Configure httpd to use the shared directory as the “webroot”</li>
<li>Configure <code class="language-plaintext highlighter-rouge">keepalived</code> to virtualize port 80 to at least one of your data transfer nodes.</li>
<li>Run certbot with the webroot option, as well as the multiple hostnames of the data transfer nodes.</li>
</ol>
<p>Create an NFS share that each of the data transfer nodes can read. The steps for creating an NFS share are outside the scope of this guide. In this guide, the shared directory will be referred to as <code class="language-plaintext highlighter-rouge">/mnt/nfsshare</code>. Next, install httpd on each of the data transfer nodes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@host $ yum install httpd
</code></pre></div></div>
<p>Create a webroot directory within the shared directory on one of the nodes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@host $ mkdir /mnt/nfsshare/webroot
</code></pre></div></div>
<p>Configure httpd to export the same webroot on each of the data transfer nodes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><VirtualHost *:80>
DocumentRoot "/mnt/nfsshare/webroot"
<Directory "/mnt/nfsshare/webroot">
Require all granted
</Directory>
</VirtualHost>
</code></pre></div></div>
<p>Configure <code class="language-plaintext highlighter-rouge">keepalived</code> to virtualize port 80 to at least one of your data transfer nodes.
Add to your configuration:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>virtual_server <VIRTUAL-IP-ADDRESS> 80 {
delay_loop 10
lb_algo wlc
lb_kind DR
protocol tcp
real_server <GRIDFTP-SERVER-#1-IP ADDRESS> 80 {
TCP_CHECK {
connect_timeout 3
connect_port 80
}
}
}
</code></pre></div></div>
<p>Run <code class="language-plaintext highlighter-rouge">certbot</code> with the webroot option on only one of the data transfer nodes. The first domain on the command line should be the virtual hostname:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>root@host $ certbot certonly -w /mnt/nfsshare/webroot -d <VIRTUAL_HOSTNAME> -d <DATANODE_1> -d <DATANODE_N>...
</code></pre></div></div>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioUsing LetsEncrypt for certificate creation and management has made secure communications much easier. Instead of contacting the IT department of your university to request a certificate, you can skip the middle man and generate your own certificate which is trusted around the world.StashCache By The Numbers2018-09-26T05:00:00+00:002018-09-26T05:00:00+00:00https://derekweitzel.com/2018/09/26/stashcache-by-the-numbers<p>The StashCache federation is composed of three components: Origins, Caches, and Clients. There are additional components that increase the usability of StashCache which I will also mention in this post.</p>
<figure class="">
<img src="/images/posts/StashCache-By-Numbers/StashCache-Diagram.png" alt="Diagram of StashCache Infrastructure" /><figcaption>
Diagram of the StashCache Federation
</figcaption></figure>
<figure class="">
<img src="/images/posts/StashCache-By-Numbers/StashCache-Cumulative.png" alt="Cumulative Usage of StashCache" /><figcaption>
Cumulative Usage of StashCache over the last 90 days
</figcaption></figure>
<h2 id="origins">Origins</h2>
<p>A StashCache Origin is the authoritative source of data. The origin receives data location requests from the central redirectors. These requests take the form of “Do you have the file X”, to which the origin will respond “Yes” or “No”. The redirector then returns a list of origins that claim to have the requested file to the client.</p>
<p>An Origin is a simple XRootD server, exporting a directory or set of directories for access.</p>
<table>
<thead>
<tr>
<th>Origin</th>
<th>Base Directory</th>
<th>Data Read</th>
</tr>
</thead>
<tbody>
<tr>
<td>LIGO Open Data</td>
<td>/gwdata</td>
<td>926TB</td>
</tr>
<tr>
<td>OSG Connect</td>
<td>/user</td>
<td>246TB</td>
</tr>
<tr>
<td>FNAL</td>
<td>/pnfs</td>
<td>166TB</td>
</tr>
<tr>
<td>OSG Connect</td>
<td>/project</td>
<td>63TB</td>
</tr>
</tbody>
</table>
<p>A list of Origins and their base directories.</p>
<h2 id="clients">Clients</h2>
<p>The clients interact with the StashCache federation on the user’s behalf. They are responsible for choosing the “best” cache. The available clients are <a href="https://cernvm.cern.ch/portal/filesystem">CVMFS</a> and <a href="https://github.com/opensciencegrid/StashCache">StashCP</a>.</p>
<figure class="half ">
<a href="/images/posts/StashCache-By-Numbers/StashCache-CVMFS.png" title="Client Usage By Tool">
<img src="/images/posts/StashCache-By-Numbers/StashCache-CVMFS.png" alt="Client Usage By Tool" />
</a>
<a href="/images/posts/StashCache-By-Numbers/StashCP-Usage.png" title="StashCP Usage">
<img src="/images/posts/StashCache-By-Numbers/StashCP-Usage.png" alt="StashCP Usage" />
</a>
<figcaption>StashCache Client Usage
</figcaption>
</figure>
<p>In the pictures above, you can see that most users of StashCache use CVMFS to access the federation. All clients use GeoIP to determine the “best” cache: GeoIP location services are provided by the CVMFS infrastructure in the U.S., and the geographically nearest cache is chosen.</p>
<p>The GeoIP service runs on multiple CVMFS Stratum 1s and other servers. The request to the GeoIP service includes all of the cache hostnames. The GeoIP service takes the requesting IP address and attempts to locate the requester. After determining the location of all of the caches, the service returns an ordered list of nearest caches.</p>
<p>The GeoIP service uses the <a href="https://www.maxmind.com/">MaxMind database</a> to determine locations by IP address.</p>
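<p>The lookup itself is a single HTTP request followed by a reorder. The URL layout below follows the CVMFS geo API; treat the exact path and reply format as assumptions of this sketch:</p>

```python
# Request (illustrative):
#   http://<stratum1>/cvmfs/<repo>/api/v1.0/geo/<client>/<host1>,<host2>,...
# The reply is a comma-separated list of 1-based positions, nearest first.
def order_caches(caches, reply):
    """Reorder cache hostnames according to the GeoIP service's reply."""
    return [caches[int(i) - 1] for i in reply.strip().split(",")]
```

For example, a reply of <code class="language-plaintext highlighter-rouge">2,1,3</code> means the second cache in the request is geographically nearest.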
<h3 id="cvmfs">CVMFS</h3>
<p>Most (if not all) origins are indexed in an <code class="language-plaintext highlighter-rouge">*.osgstorage.org</code> repo. For example, the OSG Connect origin is indexed in the <code class="language-plaintext highlighter-rouge">stash.osgstorage.org</code> repo. It uses a special feature of CVMFS where the namespace and data are separated. File metadata such as file permissions, directory structure, and checksums are stored within CVMFS. The file contents are not within CVMFS.</p>
<p>When accessing a file, CVMFS will use the directory structure to form an HTTP request to an external data server. CVMFS uses GeoIP to determine the nearest cache.</p>
<p>The indexer may also configure a repo to be “authenticated”. A whitelist of certificate DNs is stored within the repo metadata and distributed to each client. The CVMFS client will pull the certificate from the user’s environment. If the certificate DN matches a DN in the whitelist, it uses the certificate to authenticate with an authenticated cache.</p>
<h3 id="stashcp">StashCP</h3>
<p>StashCP works in the order:</p>
<ol>
<li>Check if the requested file is available from CVMFS. If it is, copy the file from CVMFS.</li>
<li>Determine the nearest cache by sending cache hostnames to the GeoIP service.</li>
<li>After determining the nearest cache, run the <code class="language-plaintext highlighter-rouge">xrdcp</code> command to copy the data from the nearest cache.</li>
</ol>
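<p>That decision order can be sketched as follows; the repo path is illustrative, and the function returns the command it would run rather than executing it (it is not the real StashCP implementation):</p>

```python
import os

CVMFS_ROOT = "/cvmfs/stash.osgstorage.org"  # illustrative repo mount

def plan_fetch(path, dest, nearest_cache):
    """Sketch of StashCP's decision order."""
    cvmfs_path = os.path.join(CVMFS_ROOT, path.lstrip("/"))
    if os.path.exists(cvmfs_path):
        # Step 1: the file is available through the local CVMFS mount.
        return ["cp", cvmfs_path, dest]
    # Steps 2-3: fall back to xrdcp from the nearest cache.
    return ["xrdcp", "root://%s/%s" % (nearest_cache, path), dest]
```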
<h2 id="caches">Caches</h2>
<figure class="">
<img src="/images/posts/StashCache-By-Numbers/CacheLocations.png" alt="Cache Locations" /><figcaption>
Cache Locations in the U.S.
</figcaption></figure>
<p>The cache is half XRootD cache and half XRootD client. When a cache receives a data request from a client, it searches its own cache directory for the file. If the file is not in the cache, it uses the built-in client to retrieve the file from one of the origins. The cache requests the data location from the central redirector which, in turn, asks the origins for the file location.</p>
<p>The cache listens on port 1094 for the regular XRootD protocol, and on port 8000 for HTTP.</p>
<h3 id="authenticated-caches">Authenticated Caches</h3>
<p>Authenticated caches use GSI certificates to authenticate access to files within the cache. The client will authenticate with the cache using the client’s certificate. If the file is not in the cache, the cache will use its own certificate to authenticate with the origin to download the file.</p>
<p>Authenticated caches use port 8443 for HTTPS.</p>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioThe StashCache federation is composed of three components: Origins, Caches, and Clients. There are additional components that increase the usability of StashCache which I will also mention in this post.HTCondor Pull Mode2018-08-31T17:28:42+00:002018-08-31T17:28:42+00:00https://derekweitzel.com/2018/08/31/htcondor-pull-mode<p>For a recent project to utilize HPC clusters for HTC workflows, I had to add the ability to transfer the input and output sandboxes to and from HTCondor. HTCondor already has the ability to spool input files to a SchedD, and pull the output sandbox. These functions are intended to stage jobs to an HTCondor pool. But, HTCondor did not have the ability to pull jobs from an HTCondor pool.</p>
<p>The anticipated steps for a job pulled from an HTCondor pool:</p>
<ol>
<li>Download the <strong>input</strong> sandbox</li>
<li>Submit the job to the local scheduler</li>
<li>Watch the job status of the job</li>
<li>Once completed, transfer the <strong>output</strong> sandbox to the origin SchedD</li>
</ol>
<p>The sandboxes are:</p>
<ul>
<li><strong>Input</strong>:
<ul>
<li>Input files</li>
<li>Executable</li>
<li>Credentials</li>
</ul>
</li>
<li><strong>Output</strong>:
<ul>
<li>Stdout / Stderr from job</li>
<li>Output files or any files that may have changed while the job ran</li>
</ul>
</li>
</ul>
<h2 id="api-additions">API Additions</h2>
<p>In order to transfer the input and output sandboxes, two new commands were added to the SchedD, as well as a new client function and Python bindings to use them.</p>
<p>The function for transferring input files is:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>transferInputSandbox(constraint, destination)
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">constraint</code> is an HTCondor constraint expression selecting the jobs whose input files should be transferred. <code class="language-plaintext highlighter-rouge">destination</code> is a directory in which to put the sandboxes. The sandboxes will be placed in directories named <code class="language-plaintext highlighter-rouge">destination/<ClusterId>/<ProcId>/</code>.</p>
<p>For transferring output files, the function is:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>transferOutputSandbox( jobs )
</code></pre></div></div>
<p>Where <code class="language-plaintext highlighter-rouge">jobs</code> is a list of tuples. The structure of the tuple is <code class="language-plaintext highlighter-rouge">( classad, sandboxdir )</code>. <code class="language-plaintext highlighter-rouge">classad</code> is the full classad of the original job, and <code class="language-plaintext highlighter-rouge">sandboxdir</code> is the location of the output sandbox to send.</p>
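<p>Putting the two calls together, a pull cycle might look like the sketch below. The <code class="language-plaintext highlighter-rouge">sandbox_dir</code> helper is mine; the SchedD calls (in comments) follow the descriptions above and exist only in the author’s patched HTCondor build, not the released Python bindings.</p>

```python
def sandbox_dir(base, ad):
    """Where transferInputSandbox places a job: <destination>/<ClusterId>/<ProcId>/."""
    return "%s/%d/%d" % (base, ad["ClusterId"], ad["ProcId"])

# Hypothetical end-to-end usage against the patched Python bindings:
#
#   import htcondor
#   schedd = htcondor.Schedd()
#   schedd.transferInputSandbox('JobStatus == 1', "sandboxes")
#   # ... run the jobs locally, wait for completion ...
#   done = [(ad, sandbox_dir("sandboxes", ad))
#           for ad in schedd.query('JobStatus == 4')]
#   schedd.transferOutputSandbox(done)
```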
<h2 id="current-status">Current Status</h2>
<p>I have created a <a href="https://github.com/djw8605/htcondor-pull">repo</a> for an example that uses these functions in order to pull a job from a remote SchedD.</p>
<p>Also, my changes to <a href="https://github.com/djw8605/htcondor/tree/add_sandbox_transfers">HTCondor</a> are in my repo, and I have begun discussions about merging them upstream.</p>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioFor a recent project to utilize HPC clusters for HTC workflows, I had to add the ability to transfer the input and output sandboxes to and from HTCondor. HTCondor already has the ability to spool input files to a SchedD, and pull the output sandbox. These functions are intended to stage jobs to an HTCondor pool. But, HTCondor did not have the ability to pull jobs from an HTCondor pool.Cleaning Up GRACC2017-11-06T19:09:23+00:002017-11-06T19:09:23+00:00https://derekweitzel.com/2017/11/06/cleaning-up-gracc<p>The <a href="https://opensciencegrid.github.io/gracc/">GRid ACcounting Collector</a> (GRACC) is the OSG’s new version of accounting software, replacing Gratia. It has been running in production since March 2017. Last week, on Friday November 3rd, we held a GRACC Focus Day. Our goal was to clean up data that is presented in GRACC. My changes were:</p>
<ul>
<li>Update the GRACC-Collector to version <a href="https://github.com/opensciencegrid/gracc-collector/tree/v1.1.8">1.1.8</a>. The primary change in this release is setting the messages sent to RabbitMQ to be “persistent”. The persistent messages are then saved to disk in order to survive a RabbitMQ reboot.</li>
<li>Use case-insensitive comparisons to determine the <a href="https://oim.grid.iu.edu/oim/home">Open Science Grid Information Management system</a> (OIM) information. This was an issue with GPGrid (Fermilab), which was registered as <strong>GPGRID</strong>.</li>
<li>Set the <code class="language-plaintext highlighter-rouge">OIM_Site</code> equal to the <code class="language-plaintext highlighter-rouge">Host_description</code> attribute if the OIM logic is unable to determine the registered OIM site. This is especially useful for the LIGO collaboration, which uses sites in Europe that are not registered in OIM. Now, instead of many Unknown entries in the LIGO site listing, it shows the site name reported by the probe for where the job ran.</li>
</ul>
<figure class="">
<img src="/images/posts/GRACC-Cleanup/GRACC_Projects_Ligo.png" alt="GRACC Projects Page" /><figcaption>
GRACC Projects Page for LIGO
</figcaption></figure>
<h2 id="regular-expression-corrections"><a id="regex"></a>Regular Expression Corrections</h2>
<p>One of the common problems we have in GRACC is poor data coming from the various probes installed at hundreds of sites. We don’t control the data coming into GRACC, so occasionally we must make corrections to the data for clarity or correctness. One of these corrections addresses misreporting of the “site” that the jobs ran on.</p>
<p>In many instances, the probe is unable to determine the site and simply lists the hostname of the worker node where the job ran. This can cause the cardinality of sites listed in GRACC to increase dramatically as we get new hostnames inserted into the sites listing. If the hostnames are predictable, a regular expression matching algorithm can match a worker node hostname to a proper site name.</p>
<p>The largest change for GRACC was the regular expression corrections. With this new feature, GRACC administrators can set corrections to match on attributes using regular expression patterns. For example, consider the following correction configuration.</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[[Corrections]]</span>
<span class="py">index</span> <span class="p">=</span> <span class="s">'gracc.corrections'</span>
<span class="py">doc_type</span> <span class="p">=</span> <span class="s">'host_description_regex'</span>
<span class="py">match_fields</span> <span class="p">=</span> <span class="nn">['Host_description']</span>
<span class="py">source_field</span> <span class="p">=</span> <span class="s">'Corrected_OIM_Site'</span>
<span class="py">dest_field</span> <span class="p">=</span> <span class="s">'OIM_Site'</span>
<span class="py">regex</span> <span class="p">=</span> <span class="kc">true</span>
</code></pre></div></div>
<p>This configuration means:</p>
<blockquote>
<p>Match the <code class="language-plaintext highlighter-rouge">Host_description</code> field of the incoming job record against the regular expression stored in the <code class="language-plaintext highlighter-rouge">Host_description</code> field of the corrections table. If they match, take the value in the <code class="language-plaintext highlighter-rouge">Corrected_OIM_Site</code> field of the corrections table and place it into the <code class="language-plaintext highlighter-rouge">OIM_Site</code> field of the job record.</p>
</blockquote>
<p>And the correction document would look like:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"_index"</span><span class="p">:</span><span class="w"> </span><span class="s2">"gracc.corrections-0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"host_description_regex"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"asldkfj;alksjdf"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_score"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"_source"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"Host_description"</span><span class="p">:</span><span class="w"> </span><span class="s2">".*</span><span class="se">\.</span><span class="s2">bridges</span><span class="se">\.</span><span class="s2">psc</span><span class="se">\.</span><span class="s2">edu"</span><span class="p">,</span><span class="w">
</span><span class="nl">"Corrected_OIM_Site"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PSC Bridges"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>The regular expression is stored in the <code class="language-plaintext highlighter-rouge">Host_description</code> field of the correction document.</p>
<p>So, if the incoming job record is similar to :</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="err">...</span><span class="w">
</span><span class="nl">"Host_description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"l006.pvt.bridges.psc.edu"</span><span class="w">
</span><span class="err">...</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Then the correction would modify or create values such that the final record would approximate:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="err">...</span><span class="w">
</span><span class="nl">"Host_description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"l006.pvt.bridges.psc.edu"</span><span class="p">,</span><span class="w">
</span><span class="nl">"OIM_Site"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PSC Bridges"</span><span class="p">,</span><span class="w">
</span><span class="nl">"RawOIM_Site"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="w">
</span><span class="err">...</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
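<p>The correction logic itself is small; a sketch (the function name is mine, the field names come from the configuration above):</p>

```python
import re

def apply_correction(record, correction):
    """If the record's Host_description matches the stored regex,
    copy Corrected_OIM_Site into the record's OIM_Site field."""
    if re.match(correction["Host_description"], record.get("Host_description", "")):
        record["OIM_Site"] = correction["Corrected_OIM_Site"]
    return record
```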
<p>Note that the <code class="language-plaintext highlighter-rouge">Host_description</code> field stays the same. We must keep it the same because it is used in record duplicate detection. If we modified the field and resummarized previous records, then it would cause multiple records to represent the same job.</p>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioThe GRid ACcounting Collector (GRACC) is the OSG’s new version of accounting software, replacing Gratia. It has been running in production since March 2017. Last week, on Friday November 3rd, we held a GRACC Focus Day. Our goal was to clean up data that is presented in GRACC. My changes were:Installing SciTokens on a Mac2017-09-07T18:20:04+00:002017-09-07T18:20:04+00:00https://derekweitzel.com/2017/09/07/installing-scitokens-on-a-mac<p>In case I ever have to install <a href="https://scitokens.org/">SciTokens</a> again, here are the steps I took to make it work on my Mac. The most difficult part of this is installing the OpenSSL headers for the jwt Python library. I followed the advice on this <a href="https://solitum.net/openssl-os-x-el-capitan-and-brew/">blog post</a>.</p>
<ol>
<li>Install <a href="https://brew.sh/">Homebrew</a></li>
<li>
<p>Install openssl:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> brew install openssl
</code></pre></div> </div>
</li>
<li>
<p>Download the SciTokens library:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> git clone https://github.com/scitokens/scitokens.git
cd scitokens
</code></pre></div> </div>
</li>
<li>
<p>Create the virtualenv to install the <a href="https://jwt.io/">jwt</a> library</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> virtualenv jwt
. jwt/bin/activate
</code></pre></div> </div>
</li>
<li>
<p>Install jwt pointing to the Homebrew installed openssl headers:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography PyJWT
</code></pre></div> </div>
</li>
</ol>Derek Weitzeldjw8605@gmail.comhttps://djw8605.github.ioIn case I ever have to install SciTokens again, here are the steps I took to make it work on my Mac. The most difficult part of this is installing the OpenSSL headers for the jwt Python library. I followed the advice on this blog post.