Why You Need a Data Fabric, Not Just IT Architecture

Data fabrics offer an opportunity to track, monitor and utilize data, while IT architectures track, monitor and maintain IT assets. Both are needed for a long-term digitalization strategy.

As companies move into hybrid computing, they’re redefining their IT architectures. IT architecture describes a company’s entire IT asset base, whether on-premises or in-cloud. This architecture is stratified into three basic levels: hardware such as mainframes, servers, etc.; middleware, which encompasses operating systems, transaction processing engines, and other system software utilities; and the user-facing applications and services that this underlying infrastructure supports.

IT architecture has been a recent focus because, as organizations move to the cloud, their IT assets move as well, and those shifts need to be tracked and monitored.

However, with the growth of digitalization and analytics, there is also a need to track, monitor, and maximize the use of data that can come from a myriad of sources. An IT architecture can’t provide data management, but a data fabric can. Unfortunately, most organizations lack well-defined data fabrics, and many are still trying to understand why they need a data fabric at all.

What Is a Data Fabric?

Gartner defines a data fabric as “a design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilizes continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms.”

Let’s break it down.

Every organization wants to use data analytics for business advantage. To use analytics well, you need data agility that lets you easily connect and combine data from any source your company uses – whether that source is an enterprise legacy database or data culled from social media or the Internet of Things (IoT). You can’t achieve data integration and connectivity without data integration tools, and you also must find a way to connect and relate disparate data in meaningful ways if your analytics are going to work.

This is where the data fabric enters. The data fabric contains all the connections and relationships between an organization’s data, no matter what type of data it is or where it comes from. The goal of the fabric is to function as an overall tapestry that interweaves all data so that data in its entirety is searchable. This has the potential not only to optimize data value, but to create a data environment that can answer virtually any analytics query. The data fabric does what an IT architecture can’t: it tells you what the data does and how data elements relate to one another. Without a data fabric, a company’s ability to leverage data and analytics is limited.

Building a Data Fabric

When you build a data fabric, it’s best to start small and in a place where your staff already has familiarity.

That “place” for most companies will be with the tools that they are already using to extract, transform and load (ETL) data from one source to another, along with any other data integration software such as standard and custom APIs. All of these are examples of data integration you have already achieved.
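To make that starting point concrete, here is a minimal sketch of the kind of extract-transform-load step those existing tools automate, assuming a hypothetical CSV export from a CRM and a local SQLite table as the target; the file names and fields are illustrative only.

```python
import csv
import sqlite3

def extract(path):
    """Extract: read raw customer rows from a CSV export (hypothetical source)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalize names and drop rows missing an email address."""
    cleaned = []
    for row in rows:
        if not row.get("email"):
            continue
        cleaned.append({
            "name": row["name"].strip().title(),
            "email": row["email"].strip().lower(),
        })
    return cleaned

def load(rows, db_path="warehouse.db"):
    """Load: write the cleaned rows into a target table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO customers (name, email) VALUES (:name, :email)", rows
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("crm_export.csv")))
```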

Now, you want to add more data to your core. You can do this by continuing to use the ETL and other data integration methods you already have in place as you build out your data fabric. In the process, take care to also add the metadata about your data, which will include the origin point for the data, how it was created, what business and operational processes use it, what its form is (e.g., a single field in a fixed record, or an entire image file), and so on. By maintaining the data’s history, as well as all its transformations, you are in a better position to check data for reliability and to ensure that it is secure.
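As one way to picture that metadata, the sketch below keeps an illustrative lineage record alongside each dataset; the field names (origin, consumers, form, and so on) are assumptions chosen to mirror the items listed above, not any particular catalog product’s schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DatasetMetadata:
    """Illustrative metadata record kept alongside a dataset in the fabric."""
    name: str
    origin: str                # where the data came from (system, feed, device)
    created_by: str            # process or job that produced it
    form: str                  # e.g., "fixed-record field" or "image file"
    consumers: list = field(default_factory=list)   # business processes using it
    lineage: list = field(default_factory=list)     # transformation history

    def record_transformation(self, description: str):
        """Append a timestamped entry so the data's history stays auditable."""
        self.lineage.append(
            {"at": datetime.now(timezone.utc).isoformat(), "what": description}
        )

# Usage: register a feed and note a transformation applied by the ETL job.
orders = DatasetMetadata(
    name="orders_daily",
    origin="ERP export",
    created_by="nightly_etl",
    form="fixed-record file",
    consumers=["billing", "demand forecasting"],
)
orders.record_transformation("normalized currency codes to ISO 4217")
```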

As your data fabric grows, you will probably add data tools that are missing from your workbench. These might be tools that help with tracking data, sharing metadata, applying governance to data, and so on. A recommendation in this area is to look for all-inclusive data management software that contains not only all the tools you’ll need to build a data fabric, but also important automation such as built-in machine learning.

The machine learning observes how data in your data fabric works together, and which combinations of data are used most often in different business and operational contexts. When you query the data, the ML assists in pulling together the data that is most likely to answer your queries.
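The product-specific ML is out of scope here, but a rough way to picture the idea is co-occurrence counting over a query log: datasets that are frequently used together are suggested as likely companions for a new query. The log and dataset names below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical query log: which datasets each past query touched.
query_log = [
    {"orders", "customers"},
    {"orders", "customers", "returns"},
    {"orders", "shipping"},
    {"customers", "support_tickets"},
]

# Count how often pairs of datasets are used together.
pair_counts = Counter()
for datasets in query_log:
    for pair in combinations(sorted(datasets), 2):
        pair_counts[pair] += 1

def suggest_companions(dataset, top_n=3):
    """Suggest datasets most often queried alongside the given one."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if dataset == a:
            scores[b] += count
        elif dataset == b:
            scores[a] += count
    return [name for name, _ in scores.most_common(top_n)]

print(suggest_companions("orders"))  # e.g., ['customers', 'returns', 'shipping']
```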


Making CI/CD Work for DevOps Teams

Many DevOps teams are advancing to CI/CD, some more gracefully than others. Recognizing common pitfalls and following best practices helps.

Agile, DevOps and CI/CD have all been driven by the competitive need to deliver value faster to customers. Each advancement requires some changes to processes, tools, technology and culture, although not all teams approach the shift holistically. Some focus on tools hoping to drive process changes when process changes and goals should drive tool selection. More fundamentally, teams need to adopt an increasingly inclusive mindset that overcomes traditional organizational barriers and tech-related silos so the DevOps team can achieve an automated end-to-end CI/CD pipeline.

Most organizations begin with Agile and advance to DevOps. The next step is usually CI, followed by CD, but the journey doesn’t end there because bottlenecks such as testing and security eventually become obvious.

At benefits experience platform provider HealthJoy, the DevOps team sat between Dev and Ops, maintaining a separation between the two. The DevOps team accepted builds from developers in the form of Docker images via Docker Hub. They also automated downstream Ops tasks in the CI/CD pipeline, such as deploying the software builds in AWS.

Sajal Dam, HealthJoy

“Although it’s a good approach for adopting CI/CD, it misses the fact that the objective of a DevOps team is to break the barriers between Dev and Ops by collaborating with the rest of software engineering across the whole value stream of the CI/CD pipeline, not just automating Ops tasks,” said Sajal Dam, VP of engineering at HealthJoy.

Following are a few of the common challenges and advice for dealing with them.

People

People are naturally change resistant, but change is a constant when it comes to software development and delivery tools and processes.

“I’ve found the best path is to first work with a team that is excited about the change or new technology and who has the time and opportunity to redo their tooling,” said Eric Johnson, EVP of Engineering at DevOps platform provider GitLab. “Next, use their success [such as] lower cost, higher output, better quality, etc. as an example to convert the bulk of the remaining teams when it’s convenient for them to make a switch.”

Eric Johnson, GitLab

The most fundamental people-related issue is having a culture that enables CI/CD success.

“The success of CI/CD [at] HealthJoy depends on cultivating a culture where CI/CD is not just a collection of tools and technologies for DevOps engineers but a set of principles and practices that are fully embraced by everyone in engineering to continually improve delivery throughput and operational stability,” said HealthJoy’s Dam.

At HealthJoy, the integration of CI/CD throughout the SDLC requires the rest of engineering to closely collaborate with DevOps engineers to continually transform the build, testing, deployment and monitoring activities into a repeatable set of CI/CD process steps. For example, they’ve shifted quality controls left and automated the process using DevOps principles, practices and tools.

Component provider Infragistics changed its hiring approach. Specifically, instead of hiring experts in one area, the company now looks for people with skill sets that meld well with the team.

“All of a sudden, you’ve got HR involved and marketing involved because if we don’t include marketing in every aspect of software delivery, how are they going to know what to market?” said Jason Beres, SVP of developer tools at Infragistics. “In a DevOps team, you need a director, managers, product owners, team leads and team building where it may not have been before. We also have a budget to ensure we’re training people correctly and that people are moving ahead in their careers.”

Jason Beres, Infragistics

Effective leadership is important.

“[A]s the head of engineering, I need to play a key role in cultivating and nurturing the DevOps culture across the engineering team,” said HealthJoy’s Dam. “[O]ne of my key responsibilities is to coach and support people from all engineering divisions to continually benefit from DevOps principles and practices for an end-to-end, automated CI/CD pipeline.”

Processes

Processes should be refined as necessary, accelerated through automation and continuously monitored so they can be improved over time.

“When problems or errors arise and need to be sent back to the developer, it becomes difficult to troubleshoot because the code isn’t fresh in their mind. They have to stop working on their current project and go back to the previous code to troubleshoot,” said Gitlab’s Johnson. “In addition to wasting time and money, this is demoralizing for the developer who isn’t seeing the fruit of their labor.”

Johnson also said teams should start their transition by identifying bottlenecks and common failures in their pipelines. The easiest indicators of pipeline inefficiency are the runtimes of individual jobs and stages, along with the total runtime of the pipeline itself. To avoid slowdowns or frequent failures, teams should look for problematic patterns in failed jobs.
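A lightweight way to act on that advice is to pull recent job records out of your CI system and rank stages by runtime and jobs by failure count. The sketch below assumes you have already exported those records; the job names and numbers are made up.

```python
from collections import defaultdict

# Hypothetical export of recent pipeline jobs (name, stage, duration in seconds, status).
jobs = [
    {"name": "unit-tests", "stage": "test", "duration": 420, "status": "success"},
    {"name": "integration-tests", "stage": "test", "duration": 1860, "status": "failed"},
    {"name": "build-image", "stage": "build", "duration": 310, "status": "success"},
    {"name": "deploy-staging", "stage": "deploy", "duration": 240, "status": "success"},
]

# Total runtime per stage highlights where the pipeline spends its time.
stage_totals = defaultdict(int)
for job in jobs:
    stage_totals[job["stage"]] += job["duration"]

# Failure counts per job surface the flaky or fragile steps.
failures = defaultdict(int)
for job in jobs:
    if job["status"] == "failed":
        failures[job["name"]] += 1

print("Slowest stages:", sorted(stage_totals.items(), key=lambda kv: -kv[1]))
print("Most failure-prone jobs:", sorted(failures.items(), key=lambda kv: -kv[1]))
```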

At HealthJoy, the developers and architects have started explicitly identifying and planning for software design best practices that will continually increase the frequency, quality and security of deployments. To achieve that, engineering team members have started collaborating across the engineering divisions horizontally.

“One of the biggest barriers to changing processes outside of people and politics is the lack of tools that support modern processes,” said Stephen Magill, CEO of continuous assurance platform provider MuseDev. “To be most effective, teams need to address people, processes and technology together as part of their transformations.”

Technology

Different teams have different favorite tools, and that variety can be a barrier to a standardized pipeline, which, unlike a patchwork of tools, can provide end-to-end visibility and ensure consistent, automated processes throughout the SDLC.

“Age and diversity of existing tools slow down migration to newer and more standardized technologies. For example, large organizations often have ancient SVN servers scattered about and integration tools are often cobbled together and fragile,” said MuseDev’s Magill. “Many third-party tools pre-date the DevOps movement and so are not easily integrated into a modern Agile development workflow.”

Integration is critical to the health and capabilities of the pipeline and necessary to achieve pipeline automation.

Stephen Magill, MuseDev

“The most important thing to automate, which is often overlooked, is automating and streamlining the process of getting results to developers without interrupting their workflow,” said MuseDev’s Magill. “For example, when static code analysis is automated, it usually runs in a manner that reports results to security teams or logs results in an issue tracker. Triaging these issues becomes a labor-intensive process and results become decoupled from the code change that introduced them.”

Instead, such results should be reported directly to developers as part of code review since developers can easily fix issues at that point in the development process. Moreover, they can do so without involving other parties, although Magill underscored the need for developers, QA, and security to mutually have input into which analysis tools are integrated into the development process.
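As a rough illustration of that flow, the sketch below runs flake8 (assuming it is installed) over the Python files changed on a branch and hands the findings to a post_review_comment function, which is a hypothetical stand-in for whatever code-review API your platform exposes.

```python
import subprocess

def changed_python_files(base="origin/main"):
    """List Python files changed relative to the target branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, "--", "*.py"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def run_static_analysis(files):
    """Run flake8 (assumed installed) and return its findings, one per line."""
    if not files:
        return []
    result = subprocess.run(["flake8", *files], capture_output=True, text=True)
    return result.stdout.splitlines()

def post_review_comment(body):
    """Hypothetical stand-in for your code-review system's comment API."""
    print("Would post review comment:\n", body)

findings = run_static_analysis(changed_python_files())
if findings:
    post_review_comment("Static analysis findings:\n" + "\n".join(findings))
```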

GitLab’s Johnson said the upfront investment in automation should be a default decision and that the developer experience must be good enough for developers to rely on the automation.

“I’d advise adding things like unit tests, necessary integration tests, and sufficient monitoring to your ‘definition of done’ so no feature, service or application is launched without the fundamentals needed to drive efficient CI/CD,” said Johnson. “If you’re running a monorepo and/or microservices, you’re going to need some logic to determine what integration tests you need to run at the right times. You don’t want to spin up and run every integration test you have in unaffected services just because you changed one line of code.”
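One simple way to implement that logic is a mapping from monorepo directories to the integration test suites they affect, so a change to a shared library fans out to every dependent service while a one-line service change runs only its own suite. The directory layout below is hypothetical.

```python
import subprocess

# Hypothetical mapping of monorepo directories to the integration test suites they affect.
SERVICE_TESTS = {
    "services/billing/": ["tests/integration/billing"],
    "services/search/": ["tests/integration/search"],
    "libs/common/": ["tests/integration/billing", "tests/integration/search"],
}

def changed_files(base="origin/main"):
    """List files changed relative to the target branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def tests_to_run(files):
    """Select only the integration test suites touched by the change."""
    selected = set()
    for path in files:
        for prefix, suites in SERVICE_TESTS.items():
            if path.startswith(prefix):
                selected.update(suites)
    return sorted(selected)

if __name__ == "__main__":
    print(tests_to_run(changed_files()))
```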

At Infragistics, the lack of a standard communication mechanism became an issue. About five years ago, the company had a mix of Yammer, Slack and AOL Instant Messenger.

“I don’t want silos. It took a good 12 months or more to get people weaned off those tools and on to one tool, but five years later everyone is using [Microsoft] Teams,” said Infragistics’ Beres. “When everyone is standardized on a tool like that the conversation is very fluid.”

HealthJoy encourages its engineers to stay on top of the latest software principles, technologies and practices for a CI/CD pipeline, which includes experimenting with new CI/CD tools. They’re also empowered to affect grassroots transformation through POCs and share knowledge of the CI/CD pipeline and advancements through collaborative experimentation, internal knowledge bases, and tech talks.

In fact, the architects, developers and QA team members have started collaborating across the engineering divisions to continually plan and improve the build, test, deploy, and monitoring activities as integral parts of product delivery. And the DevOps engineers have started collaborating in the SDLC and using tools and technologies that allow developers to deliver and support products without the barrier the company once had between developers and operations.


Don’t Just Rely On Data Privacy Laws to Protect Information

Data privacy laws are evolving to allow individuals the opportunity to understand the types of data that companies are collecting about them and to provide ways to access or delete the data. The goals of data privacy law are to give some control of the data back to the individual, and to provide a transparent view on the collecting and safeguarding of that data.

Prior to the GDPR and CCPA, it was difficult to understand what was being collected and how it was being used. Was the website selling your information to other companies? Who knows, but chances are they were. We’ve all heard the line: “If it’s free, then you’re the product.” Also, paying for a service is no guarantee that your information is not being sold. Data privacy laws attempt to address these problems by requiring companies to obtain affirmative consent from individuals, explain what is being collected and define the purpose for its use.

This all sounds great and is a step in the right direction, but there are a lot of challenges for both individuals and companies. Various polls put the number of password protected accounts per person anywhere from 25 to 90. It would take a very concerned person to understand and track their personal information across these accounts. Companies need to understand the various data privacy laws that apply and develop internal frameworks to comply and protect the data. Even if both parties are playing fair, this is a difficult challenge.

For US-based companies, here is a non-exhaustive list of data privacy regulations that may apply:

  • US Privacy Act of 1974 – Applies to government agencies but provides a good foundation for companies to follow.
  • HIPAA (Health Insurance Portability and Accountability Act) – Created to protect health information.
  • COPPA (Children’s Online Privacy Protection Rule) – Created to protect information on children under 13.
  • GLBA (The Gramm-Leach-Bliley Act) – Requires financial institutions to document what information is shared and how it is protected.
  • CCPA (California Consumer Privacy Act) – In effect January 2020 to protect the information of California residents.
  • GDPR (General Data Protection Regulation) – An EU law that has global reach.
  • State laws – Each state may have their own privacy laws with slight variations.

On top of that, the data privacy laws can be interpreted in different ways, overlap each other and contradict each other. Like security frameworks and controls, privacy laws should be viewed as the minimum baseline to protect personal data. Individuals and companies should take a commonsense approach to data protection to fill the gaps that exist in data privacy laws. They should understand what data is being collected, what its purpose is and whether it is necessary to have at all. The best way to protect data is to not have it at all. If it does not exist, then it cannot be lost. This will provide focus to the residual data and what needs to be done to safeguard it.

Here are some best practices on what firms as well as individuals can do to safeguard privacy.

  • If you collect it, protect it. Follow reasonable security measures to keep individuals’ personal information safe from inappropriate and unauthorized access. Reduce the amount of data collected to only what is needed to provide the service. Use role-based access control (RBAC) to limit access to the data. Always encrypt the data at rest and in transit (a minimal encryption sketch follows this list). Create a robust backup strategy and test it to ensure the integrity and availability of the data.
  • Be open and honest about how you collect, use and share personal information. Think about how individuals may expect their data to be used, and design settings to protect their information by default. Simply explain what is being collected in an understandable way and why it is needed. Allow individuals to opt in to providing information and to view what is currently stored about them.
  • Build trust by doing what you say you will do. Communicate clearly and concisely to the public what privacy means to your organization and the steps you take to achieve and maintain privacy. This should be done with a public privacy policy that is easy to access and understand. The policy should be kept up to date as privacy laws and internal procedures evolve.
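As a minimal sketch of the “encrypt at rest” point above, the example below uses the Python cryptography package’s Fernet recipe to encrypt a record before it is stored; real deployments would keep the key in a secrets manager or KMS and gate decryption behind the RBAC checks mentioned earlier.

```python
from cryptography.fernet import Fernet

# In practice the key would live in a secrets manager or KMS, not in code.
key = Fernet.generate_key()
fernet = Fernet(key)

record = b'{"name": "Jane Doe", "email": "jane@example.com"}'

# Encrypt before writing to disk or a database (data at rest).
token = fernet.encrypt(record)

# Decrypt only inside an access-controlled code path (e.g., after an RBAC check).
plaintext = fernet.decrypt(token)
assert plaintext == record
```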

The Truth about Unstructured Data

The exponential growth rate of unstructured data is not a new phenomenon. While the hacking of a database to steal sensitive credit card or personally identifiable information is what dominates the headlines, the reality is that a large amount of an organization’s intellectual property and sensitive information is stored in documents. Yet organizations are often unsure of where that information resides or how much of it they have. Worse yet, it is accessed, shared, copied, and stored in an unprotected state.

Managing and controlling unstructured data is by far one of the most challenging issues of data security for enterprises. All personally identifiable information and other sensitive information, corporate or otherwise, should be protected with encryption and persistent security policies so that only authorized users can access it. In this article, I will discuss the key drivers behind the influx of unstructured data in enterprises, the risks associated with not properly managing and securing unstructured data, as well as best practices for document protection.

Unstructured data is not dark data (although it can be, depending on your definition of dark data) or social media; it is the collection and accumulation of documents and files, emails saved in folders, and the file sharing that takes place every day in businesses around the world. It is the ongoing creation of everyday information pulled from structured databases and saved in a variety of formats – Microsoft Office files, PDFs, and intellectual property such as CAD drawings, photos, and graphics – created for internal use, drafted for external use, and/or published via social media and other channels, to name just a few categories.

According to Search Technologies, eighty percent of data is unstructured, yet the issue of securing unstructured data is still low on the security radar. Adding to the chaos of unstructured data are numerous challenges, including stricter regulatory requirements; protection of intellectual property (IP) and trade secrets; disparate security domains beyond traditional corporate WAN/LAN into cloud, mobile, and social computing; and preventing threats by insiders, both accidental and malicious.

Traditional security has focused on preventing a breach of the enterprise perimeter with layers of physical and electronic security, using a range of tools such as firewalls, filters, and anti-virus software to stop access. Once those measures fail or are subverted, intruders gain access to all the (figurative) candy in the candy store and potentially “crown jewels”.

The first attempts to deal with unstructured data came via Enterprise Digital Rights Management (EDRM) systems. Such dedicated systems typically didn’t work well with existing workflows, required training, needed staff time to manage, were often not realistically scoped, and had unforeseen negative impacts on other IT functions. At the end of the business day, EDRM projects were often stranded at the security doorstep.

A better approach is to accept the free-wheeling chaos of unstructured data and adopt technologies that find it in the enterprise, classify and prioritize it, and protect it via encryption with policies on who can see or access the data.

The first step is discovery, using a scanning process to analyze file information across enterprise files, discovering unprotected files and looking for sensitive information. A scanning process can be instructed (on an automated basis) to review certain types of files, such as Microsoft Office (Word, Excel, PowerPoint), images, PDFs, and CAD drawings, as well as the names and contents of files that match regular expressions or keywords. In addition, discovery can include analyzing unprotected data files along with files that have been encrypted by a protective process and “watermarked” with a digital rights management (DRM) token. Discovery of unstructured data is a constant process, analyzing data in motion between computers and networks, data at rest (storage), and data in use when a document is opened, with the potential for data to be shared, printed, copied, or saved in an alternative file type (e.g., Word to PDF).
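Commercial discovery tools parse each file format properly, but the shape of the scan can be sketched in a few lines: walk a share, restrict attention to document types of interest, and flag contents that match simple patterns. The patterns and paths below are illustrative, not production-grade detection rules.

```python
import re
from pathlib import Path

# File types to review and illustrative patterns for sensitive content.
DOCUMENT_SUFFIXES = {".docx", ".xlsx", ".pptx", ".pdf", ".txt", ".csv"}
PATTERNS = {
    "possible credit card number": re.compile(rb"\b(?:\d[ -]?){13,16}\b"),
    "possible SSN": re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),
}

def discover(root):
    """Walk the tree and report unprotected files whose bytes match a pattern."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.suffix.lower() not in DOCUMENT_SUFFIXES or not path.is_file():
            continue
        data = path.read_bytes()
        for label, pattern in PATTERNS.items():
            if pattern.search(data):
                findings.append((str(path), label))
    return findings

for file_path, label in discover("/shared/documents"):
    print(f"{file_path}: {label}")
```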

Securing unstructured data via encryption is a necessary and logical step, but encryption alone is not enough. A more robust approach adds a unique “tag,” or embedded ID, to the final protected file during the encryption process, providing the basis to track changes to and copies of files and to provide user access policies through a centralized corporate file management process. The embedded tag can be used to restrict access to data in the encrypted file to a specific user or designated classes of users, as well as providing the ability to trace the creation and migration of data from one computer within the enterprise to anywhere else within or external to the enterprise, from endpoints to clouds and backup storage locations.
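To illustrate the tag-plus-encryption idea, the sketch below pairs Fernet encryption with a UUID written into a small JSON header, so a central policy service could later map the tag to access rules and trace copies of the file. The header layout and file names are invented for illustration.

```python
import json
import uuid
from cryptography.fernet import Fernet

def protect_file(path, key, owner):
    """Encrypt a document and prepend an illustrative JSON header with a tracking tag."""
    fernet = Fernet(key)
    with open(path, "rb") as f:
        ciphertext = fernet.encrypt(f.read())

    header = {
        "tag": str(uuid.uuid4()),      # unique ID a policy service could track
        "owner": owner,                # who protected the file
        "scheme": "fernet",
    }
    header_bytes = json.dumps(header).encode()

    with open(path + ".protected", "wb") as out:
        out.write(len(header_bytes).to_bytes(4, "big"))  # header length prefix
        out.write(header_bytes)
        out.write(ciphertext)
    return header["tag"]

key = Fernet.generate_key()
tag = protect_file("design.dwg", key, owner="engineering")
print("Register this tag with the policy service:", tag)
```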

The Quantum Computing Revolution

“Only six electronic digital computers would be required to satisfy the computing needs of the entire United States.” A prediction made by Howard Aiken in 1947 which, in hindsight, we can all agree has not turned out to be very prophetic. The need for processing power has continuously been on the rise, and for the most part the need has been catered to through an unparalleled evolution of chip technology as forecasted by Moore’s Law. Moore’s Law states that the number of components that can fit on a computer chip will double roughly every two years, which in turn will improve the processing capabilities of computer chips. The law, which is more of an observation than a physical law, has held true over the decades and has seen digital computers that originally took up entire rooms reduced to being carried around in our pockets. But with components reaching atomic scales, and more and more money being funneled into making chips smaller and faster, it has now come to a point where we cannot count on chip technology to advance as predicted by Moore’s Law. Hence, alternatives are being pursued and developments are being made, which has given rise to the idea of quantum computing.

The traditional computer at its very core performs simple arithmetic operations on numbers stored in its memory. The key is the speed at which this is done, which allows computers to string these operations together to perform more complex things. But as the complexity of the problem increases, so does the number of operations required to reach a solution; and in this day and age, some of the specific problems we need to solve far surpass the computing capabilities of the modern computer. This, however, has also been used to our advantage, as modern cryptography, which is at the core of cyber-security, relies on the fact that brute forcing complex mathematical problems is a practical impossibility.

Quantum computers, in theory, do things differently. Information is represented in physical states that are so small that they obey the laws of quantum mechanics. This information is stored in quantum bits known as qubits rather than the traditional binary bits used in conventional computers. Quantum mechanics allows a qubit to exist in a superposition of 0 and 1, with the exact value of the qubit unknown until it is measured. Without getting too technical, this allows a quantum computer to represent many states at the same time, giving it the potential to be millions of times faster at solving certain problems than classical computers. This staggering computational power, in theory, could be used to render modern cryptography obsolete.
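A toy NumPy simulation helps make the superposition point concrete: a qubit’s state is a pair of complex amplitudes, and the measurement probabilities are the squared magnitudes of those amplitudes. This is only a classical simulation of the math, not quantum hardware.

```python
import numpy as np

# A qubit state is a 2-component complex vector of amplitudes for |0> and |1>.
zero = np.array([1, 0], dtype=complex)

# Equal superposition: the Hadamard gate applied to |0>.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
psi = H @ zero

# Measurement probabilities are the squared magnitudes of the amplitudes.
probs = np.abs(psi) ** 2
print(probs)  # [0.5 0.5] -- equally likely to read 0 or 1

# Simulate repeated measurements of the qubit.
samples = np.random.choice([0, 1], size=1000, p=probs)
print(samples.mean())  # close to 0.5
```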

Modern cryptography relies on complex mathematical problems that would take computers hundreds, thousands or even millions of years to solve. This practical limitation is what keeps our cryptography based security systems secure. But with quantum computers, it is theoretically possible that these solutions could be reached in days or even hours, posing a massive vulnerability threat to our current encryption. If cryptography collapses, so will all our security.

But a quantum world is not all doom and gloom. Active research is already being done on quantum safe algorithms that can replace current algorithms that are under threat from the capabilities of a quantum computer. Theoretically, these quantum safe algorithms could prove to be more secure than anything we currently know of. Another area where quantum computing is likely to shine is in Big Data. With cross industry adoption of new technologies, the world is transforming itself into a digital age. This is sure to pose new problems well beyond the capabilities of modern computers as the complexity and the size of data keeps increasing. The challenge lies in converting real-world problems into a quantum language, but if that is accomplished, in quantum computing we will have a whole new computational system to tackle these problems.

It is important to realize that quantum computing is still in its infancy and almost all of the hype surrounding it is theoretical. But it is clear that the technology promises a revolution in computing unlike anything we have seen before. It is also important to understand that quantum computers are not a replacement for the classical computer; rather, they are specialized at solving a particular set of problems that are beyond the powers of a modern computer. This opens up a vast avenue of possibilities for quantum computing. The traditional computer will still have its place, but with the world moving more and more towards a data-driven future, expect quantum computers to play a vital role in the future of technology.


Big Data’s Big Peril: Security

We live in a world that is more digitally connected than ever before, and this trend will continue well into the foreseeable future. Mobile phones, televisions, washers and dryers, self-driving cars, traffic lights, and the power grid – all will be connected to the Internet of Things. It has been said that by 2020 there will be 50 billion connected things. These devices produce exponentially growing amounts of data such as emails, text files, log files, videos, and photos.

The world will create 163 zettabytes (a zettabyte equals one sextillion bytes) of data annually by 2025. Enterprises of all sizes can gain competitive advantages and valuable insights by incorporating big data and predictive analytics into their business strategies to fuel growth and drive operational efficiencies. But with all this data at hand, it’s vital to understand which data is actionable, and how it needs to be considered. Here are two examples of ways businesses are utilizing big data to improve the bottom line.

First, big data analytics can reduce customer churn. Predictive models are being built using customer demographics, product profile, customer complaint frequency, social media, and disconnect orders to flag customers who are likely to churn. Companies can identify these customers to better understand their issues and improve inefficient business processes. They can also recommend products that meet customer feature and price needs.
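A hedged sketch of such a churn model, assuming scikit-learn and a hypothetical table of customer features with a churn label, might look like the following; real models would of course use far more data and richer features such as complaint history and social signals.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [tenure_months, complaints_last_90d, monthly_spend]
X = np.array([
    [24, 0, 80.0],
    [3, 4, 45.0],
    [36, 1, 120.0],
    [6, 3, 30.0],
    [18, 0, 60.0],
    [2, 5, 25.0],
    [30, 1, 95.0],
    [4, 2, 35.0],
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = customer churned

# Fit a simple churn classifier on historical customers.
model = LogisticRegression().fit(X, y)

# Flag current customers whose predicted churn probability exceeds a threshold.
current_customers = np.array([[5, 3, 28.0], [28, 0, 90.0]])
churn_prob = model.predict_proba(current_customers)[:, 1]
print(churn_prob > 0.5)  # e.g., [ True False ]
```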

Second, big data can help prevent network outages. This is especially critical with government, medical, and emergency services networks, where outages can have severe impacts. Predictive models can ingest network logs to look at past device performance and predict hours in advance when an outage may occur, giving network engineers time to replace faulty equipment.