Friday, August 05, 2016

Pundit selected for the Innovation Radar Prize 2016!

Just a very quick post (before leaving for the Summer vacation...) to report that Pundit was selected for the Innovation Radar Prize 2016, a European Commission initiative to identify high potential innovations and innovators in EU-funded projects.

Our project is one of the 40 short-listed EU-funded innovators. Everyone is now invited to vote, until August 31st, for the top 16 innovators, who will then compete in the Innovation Radar final at the ICT Proposers' Day in Bratislava on 26 September 2016.

So, if you happen to see this post... please give your vote to Pundit here.

Monday, June 27, 2016

The context for web annotation

As I wrote a few days ago, the final review of the StoM project was held on Tuesday June 21st. Everything went more than fine: on the quality of the results, the final assessment states that "…the actions have been performed very well and the project has achieved excellent progress"!

I'm very happy of course: preparing the review was hard work. I prepared 5 presentations, most of them related to the work done in the project. One of them was the "Presentation of context: Web Annotations (& Pundit) during the StoM Project (May 2014 – Apr. 2016)".

The purpose was to introduce, at the beginning of the review, the context for the two main StoM products, the event management SaaS platform EventPlace and the Pundit Annotation System.

While my friend George Ioannidis explained how the event industry is evolving and needs new tools to better engage attendees and exhibitors, I presented what happened in the context of web annotation during the two years of the project. My talk was based on the presentation attached below.

Enjoy (and don't hesitate to provide feedback)!


Saturday, June 18, 2016

Working on StoM final review

I'm preparing the slides for the StoM project final review, which will take place next Tuesday, June 21st, in Brussels.

Still a lot to adjust today and tomorrow, before taking the plane to Belgium on Monday morning. It was my first European project as a coordinator and, despite some (inevitable) problems every now and then, I'm quite happy with this experience.

I'm especially happy with how Pundit, our web annotation platform, came out, thanks also to StoM.

By the way, we also made some very cool videos to showcase Pundit. Have a look, download Pundit and... start annotating the web!

Saturday, December 05, 2015

Presentation at Italian Drupal Day

I'm just back from Bologna, where I attended with my colleagues the Italian Drupal Day conference. We at Net7 are working on several Drupal based projects (the latest I've managed being the website of Scuola Sant'Anna, one of the most prestigious universities in Italy).

We decided to give a presentation on the project, in which we exploited the semantic API of Dandelion to completely automate the work of an editorial team. Software services fetch articles from more than 40 websites (in Italian, English and French) and analyse their texts using Dandelion's Named Entity Extraction and Automatic Classification services. If an article matches the portal's topics of interest, it is automatically published; if not, it is discarded.
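The publish-or-discard step described above boils down to a simple rule over the classifier's output. Here's a minimal sketch of that decision in Python; the topic labels, the confidence threshold and the shape of the classification result are illustrative assumptions, not the actual production configuration.

```python
# Sketch of the publish/discard decision driven by automatic classification.
# Threshold and topics are hypothetical, chosen only for illustration.

MIN_SCORE = 0.6          # assumed minimum classifier confidence
TOPICS_OF_INTEREST = {"digital humanities", "semantic web", "open data"}

def should_publish(classification, topics=TOPICS_OF_INTEREST, min_score=MIN_SCORE):
    """Publish the article only if the classifier assigned at least one
    topic of interest with sufficient confidence."""
    return any(
        c["label"].lower() in topics and c["score"] >= min_score
        for c in classification
    )

# Example classifier output, modelled as a generic {label, score} list:
result = [{"label": "Semantic Web", "score": 0.82},
          {"label": "Sports", "score": 0.11}]
print(should_publish(result))  # True: one topic of interest scores above threshold
```

The same predicate can gate the publishing step regardless of which classification service produced the scores.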

The site has been in production for several months now, publishing hundreds and hundreds of selected articles, in three languages, without a hiccup and without any manual intervention.

The Drupal Day slides follow (in Italian). Enjoy!

Wednesday, August 05, 2015

My Nerdie Bookshelf - "Linked Data - Structured data on the web" by David Wood, Marsha Zaidman, Luke Ruth and Michael Hausenblas

This book has been a bit of a disappointment to me, the first I've ever had from Manning Publications.

Despite the book being published in 2014, you get the impression that the information it provides is stale. Only in the final chapter ("The evolving Web") is a comprehensive, well-written and up-to-date viewpoint on Linked Data (and the Semantic Web) provided, although in concise form.

The foreword by Tim Berners-Lee and the collaboration with Michael Hausenblas lured me into buying the book sight unseen. In particular I was looking for insights into what, in my view, is a powerful use case for Linked Data that hasn't been addressed enough: Semantic Enterprise Data Integration. I was hoping to get, as is common for Manning books, a lot of advanced technical information. In particular I was, and still am, looking for technical advice, integration patterns and product reviews that can guide me in using semantics to effectively interconnect enterprise data silos.

The book, on the other hand, revolves around a different perspective, that of a data publisher with little (if any) notion of the technology behind Linked Data. It therefore presents all the basic concepts at a quite simple level.
This is of course a legitimate editorial choice, but what annoyed me the most was that the information provided is often outdated. There is no mention of JSON-LD or of the Linked Data Platform principles; CKAN, a widely used platform for creating open data repositories, is cited only in connection with the DataHub site. Moreover, the motivations, advantages, pros and cons of working with Linked Data are presented in a very basic, if not superficial, way.

The mention of Callimachus, the "Linked Data application server" created by the authors, left me unimpressed as well, even if it is fair to say that it has been used in interesting projects.

I must admit that I am biased and might sound arrogant (sorry if that's the case): at the end of the day, I've been working on these topics for 5 years. The fact is that this book could have been appealing to beginners if only it presented more up-to-date information and more detailed use cases. Linked Data reads as if it were written in 2010: it could have made sense to publish it in 2011, not, as was the case, in 2014.

Friday, June 05, 2015

Introducing Social Proxy

I've finally published on SlideShare the presentation of Social Proxy, a project I've been working on since 2010.

If you ask "why this platform and not HootSuite or Radian6?", well, I think it still has some strengths, even though our (huge!) competitors have received tons of VC funding over the years, while Social Proxy has basically been developed through a series of orders (some very small) from our customers. In fact:

1. Social Proxy offers, in a single SaaS offering, plenty of features that you can otherwise only get by acquiring multiple services. You get Social Media management (à la Hootsuite) and Social Media Analysis (see Radian6). It is certainly less advanced than these famous competitors but... it still performs more than nicely!

2. Social Proxy is a framework for Net7 that can easily be extended when a customer needs a new, custom feature. For example, this Drupal website doesn't have an editorial team behind it. It presents content automatically fetched and "cleaned" through Social Proxy: dozens of RSS feeds are scanned, the linked pages are retrieved and their content is extracted, keeping the main text and removing all the decorative parts. Through web services, the Drupal site fetches and publishes the curated content.
Of course our competitors provide APIs too, but the amount of things you can do with them is limited.
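The "cleaning" step, keeping the main text and dropping the decoration, can be sketched with a very simple heuristic using only Python's standard-library HTML parser. The real Social Proxy extractor is certainly more sophisticated; this toy version just skips everything inside typical decoration elements.

```python
from html.parser import HTMLParser

# Toy content cleaner: keep visible text, skip script/style/nav decoration.
# A sketch of the idea only, not the actual Social Proxy implementation.

class MainTextExtractor(HTMLParser):
    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while inside a decoration element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_main_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

page = ("<html><head><style>p{}</style></head>"
        "<body><nav>Menu</nav><p>Main article text.</p></body></html>")
print(extract_main_text(page))  # Main article text.
```

Production-grade extractors usually add heuristics such as text-density scoring per block, but the skeleton is the same: parse, filter, reassemble.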

Anyway, here are the slides: enjoy the read! More information on Social Proxy (in Italian) can be found here.

Monday, April 27, 2015

There’s Semantics in this Web

I was asked by Dr. Serena Pezzini of the CTL department of the Scuola Normale Superiore of Pisa to give a presentation on the Semantic Web on April 16th (beside is a photo of me taken at the event). The slides, in Italian, are available on SlideShare: preparing them was quite an interesting process, so I thought I'd share it here on the blog, this time in English.

This presentation was in fact like opening up the legendary Pandora's Box for me. It ignited a reflection about what we as a company do regarding the Semantic Web. Net7, in fact, has always characterized itself as a "Semantic Web Company".

At about the same time I was contacted both by CTL and by a partner company to talk about this subject. On the one hand, CTL expected suggestions and stimuli for using Semantic Web technologies in their work in the Digital Humanities field. On the other, the partner company was looking for professional training on these topics.

For 5 seconds I went into autopilot mode and started to think about explaining the Semantic Web in the standard fashion (RDF, ontologies, triple stores, SPARQL, RDFS, OWL... well, you get the idea). Then three questions sprang to my mind...

Do these people really need this kind of information? Are they really going to use all of this in their daily jobs?

The second question is a bit discomforting: do we at Net7 really use the whole of the Semantic Web technologies fully and, above all, consciously?

The third is even more serious: what's the current state of the art of the Semantic Web? Is it still an important technology, with practical uses even for small and medium-sized projects, or should it stay confined to the Empyrean of research and huge knowledge management initiatives?

So, it was really important for me to attempt, with this presentation, to find answers to these questions, to present topics that could be interesting and useful to the audience and, at the same time, to put my knowledge of the field in a new perspective.

The presentation therefore came out as a reflection on the possible uses and advantages of "Semantics in the web": first and foremost for me, to reorder my own thoughts, with the hope that it can be useful for others as well. I tried to take a step back, hopefully to progress further in perspective.

In preparing it I read a great deal of material (see the bibliography at the end) and was heavily influenced by the presentations and articles of Jim Hendler (not to mention the fantastic book "Semantic Web for the Working Ontologist" that he co-authored). So, even if you'll probably never read these lines, thank you very much Dr. Hendler for your insightful thoughts!

Coming back to my presentation, it is no accident that I used the concept "Semantics in the Web", and not "Semantic Web", in the title. In light of all the reading I did, semantics in fact seems to me more important than the technology behind it.

I started the presentation with a small historical digression, from the very first vision of the World Wide Web in Tim Berners-Lee's original 1989 proposal, up to the seminal 2001 article in Scientific American where Berners-Lee, James Hendler and Ora Lassila presented the Semantic Web.

I continued by explaining the key concepts of the Semantic Web, which served to show how semantics, despite the Semantic Web's vocal critics, can still claim huge success stories on today's web.

The funny thing is that the Semantic Web's vision didn't exactly materialize as its inventors expected. On the one hand, it is fundamental to understand that things in web history often happen through serendipity. On the other, it is crucial to always keep in mind Jim Hendler's motto: "a little semantics goes a long way". Indeed, just a small portion of the Semantic Web "pyramid" (see slide 42 of my presentation, taken from a Jim Hendler keynote) finds recurring use, while the rest (inference and the most sophisticated OWL constructs included) still has limited diffusion or is relegated to high-end research initiatives.

So the Semantic Web hasn’t failed but materialized a bit differently than expected. One therefore should really think to Semantics first, that is to exploit the knowledge that can be extracted from documents, linked data repositories, machine readable annotations in web pages (SEO metadata included) before worrying about the orthodox application of the complete stack of Semantic Web technologies.

The Semantic Web is, on the other hand, still a promising and, in certain respects, undiscovered territory. While I honestly don't see it as a key technology for powering web portals (there are plenty of more mature technologies, even open source - think of Drupal or Django - that fit this purpose better), the idea of managing information through graphs makes a lot of sense in several areas, including:
  • knowledge management with highly interconnected data (think of social network relationships). Here the capacity of triple stores to handle big graph data will really make the difference, especially if an open source product can be used for the purpose (recently we @ Net7 have bet on Blazegraph and, while we have been quite satisfied so far, it must also be said that our graphs are not exactly "that big"). There is no doubt, in fact, that solid open source products are fundamental to skyrocket the adoption of specific technologies and software architectures (think of LAMP).
  • extraction of structured data from text: a great classic Semantic Web use case indeed
  • linking independent repositories of information, implemented with traditional technologies in multiple legacy systems (another Semantic Web classic).
  • raw data management and dissemination.
The latter is something that we @ Net7 would really like to explore in great detail in the near future. The idea arises from a contact we have with a local medical research centre: we noticed that, very often, their management of data acquired through sensors is untidy. This leads to data loss and corruption. Moreover, once the research is over, the raw data gets discarded. On the other hand, if this data could be:
  • formally described in great detail
  • openly distributed, after a specific anonymization process to remove "sensitive information" from it
it might open the door to its reuse. This way scientists from all over the world could take this data and exploit it in their research, increasing the size of their data sets and consequently improving the statistical significance of their experiments. This isn't something new, indeed, but it will become more and more relevant in the near future, since the European Commission is fostering Open Access to research data in Horizon 2020 projects.
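To give an idea of what "formally described in great detail" could look like, here is a small sketch in Turtle using the W3C DCAT vocabulary and Dublin Core terms. The dataset, its URIs and its properties are entirely hypothetical, invented only to illustrate the approach.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/dataset/> .

# Hypothetical description of an anonymized sensor dataset
ex:ecg-trial-42 a dcat:Dataset ;
    dct:title "Anonymized ECG sensor readings, trial 42" ;
    dct:description "Raw ECG data, sampled at 250 Hz, with patient identifiers removed." ;
    dct:license <http://creativecommons.org/licenses/by/4.0/> ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:mediaType "text/csv" ;
        dcat:downloadURL <http://example.org/data/ecg-trial-42.csv>
    ] .
```

A description of this kind makes the dataset discoverable and self-explanatory for researchers who had no part in producing it.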

I concluded my slides by noticing that semantics is more and more becoming a commodity, offered through specialized cloud services. Named Entity Recognition SaaS offerings, SpazioDati's DataTXT and AlchemyAPI included, are a consolidated reality. Cloud Machine Learning services are becoming mainstream (see this insightful article on ZDNet). Developers can therefore enjoy "a little semantics" in their applications without embracing the Semantic Web in full. As Jim Hendler says... a little semantics goes a long way!