Monday, March 10, 2025

Networking


Broadband Forum launches trio of new open broadband projects
An improved user experience, including reduced latency and a wider choice of in-home applications, will be delivered to broadband consumers as the Broadband Forum launches three new projects. The three new open broadband projects will provide open source software blueprints for application providers and Broadband Service Providers (BSPs) to follow. These will deliver a foundation for Artificial Intelligence (AI) and Machine Learning (ML) for network automation, additional tools for network latency and performance measurements, and on-demand connectivity for different applications. “These new projects will play a key role in improving network performance measurement and monitoring and the end-user experience,” says Broadband Forum Technical Chair, Lincoln Lavoie. “Open source software is a crucial component in providing the blueprint for BSPs to follow and we invite interested companies to get involved.” The new Open Broadband-CloudCO-Application Software Development Kit (OB-CAS), Open Broadband – Simple Two-Way Active Measurement Protocol (OB-STAMP), and Open Broadband – Subscriber Session Steering (OB-STEER) projects will bring together software developers and standards experts from the forum. The projects will deliver open source reference implementations, which are examples of how Broadband Forum specifications can be implemented. They act as a starting point for application developers to base their designs on. In turn, those applications are available on platforms for BSPs to select and offer to their customers, shortening the path from the development of a specification to the first deployment of the technologies in the network. “The development of open source software and open broadband standards are invaluable to the industry, laying the foundations for faster innovation through global collaboration,” says Broadband Forum CEO, Craig Thomas.
“The Broadband Forum places the end-user experience at the forefront of all of our projects and is playing a crucial role in overcoming network issues.” OB-CAS aims to simplify network monitoring and maintenance for BSPs, while also offering a wider selection of applications from various software vendors. Alongside this, network operations will be simplified and automated through existing Broadband Forum cloud standards that use AI and ML to improve the end-user experience. OB-STAMP will build an easy-to-deploy component that simplifies network performance measurement between Customer Premises Equipment and IP Edge. The project will allow BSPs to proactively monitor their subscribers’ home networks to measure latency and, ultimately, avoid network failures. Four vendors have already signed up to join the efforts to reduce the cost and time associated with deploying infrastructure for measuring network latency. Building on Broadband Forum’s upcoming technical report WT-474, OB-STEER will create a reference implementation of the Subscriber Session Steering architecture to deliver flexible, on-demand connectivity and simplify network management. Interoperability of Subscriber Session Steering is of high importance as it will be implemented in the access network equipment and edge equipment from various vendors.
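To illustrate the kind of measurement OB-STAMP is concerned with, here is a minimal sketch of the round-trip delay calculation commonly used with two-way active measurement protocols such as STAMP (RFC 8762). It is not taken from the OB-STAMP project; the class and function names are hypothetical, and the timestamps are assumed to be in microseconds. Because the reflector's two timestamps are only ever subtracted from each other, the sender and reflector clocks do not need to be synchronised for this particular metric.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ProbeTimestamps:
    """Four timestamps (microseconds) for one two-way test packet.

    Hypothetical container for illustration; not an OB-STAMP data type.
    """
    t1_sender_tx: int     # sender transmits the test packet (sender clock)
    t2_reflector_rx: int  # reflector receives it (reflector clock)
    t3_reflector_tx: int  # reflector sends the reply (reflector clock)
    t4_sender_rx: int     # sender receives the reply (sender clock)


def round_trip_delay_us(probe: ProbeTimestamps) -> int:
    """Round-trip network delay with the reflector's processing time removed."""
    reflector_dwell = probe.t3_reflector_tx - probe.t2_reflector_rx
    return (probe.t4_sender_rx - probe.t1_sender_tx) - reflector_dwell


def summarise_us(probes: list[ProbeTimestamps]) -> dict:
    """Minimum, maximum and mean round-trip delay over a batch of probes."""
    delays = [round_trip_delay_us(p) for p in probes]
    return {"min": min(delays), "max": max(delays), "mean": mean(delays)}


if __name__ == "__main__":
    # Made-up sample values; the reflector clock is deliberately offset.
    sample = [
        ProbeTimestamps(1_000, 500_000_000, 500_000_200, 13_200),
        ProbeTimestamps(2_000_000, 501_999_500, 501_999_800, 2_015_300),
    ]
    print(summarise_us(sample))
```

One-way delay, by contrast, would require synchronised clocks at both ends, which is why the sketch restricts itself to the round-trip figure.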

stc partners with Juniper for 5G and data centre modernisation
Juniper Networks has announced that Saudi Telecom Company (stc) is expanding its 5G services to reach a total of 75 cities and regions. By leveraging Juniper 400G routers in its Converged Supercore network and key data centres, stc can dramatically improve network capacity, performance and scale while reducing energy use, all fully aligned with its ongoing digital transformation agenda. Additionally, incorporating automation into the infrastructure further streamlines operations and optimises efficiency, ensuring a seamless user experience. As stc planned its 5G services expansion, it needed to increase the capacity of the core network and key data centres powering its business and operational support systems for mobile, broadband, residential and B2B services. The objective was to create an elastic fabric with proven performance, able to streamline operations and simplify the network as services grow and evolve. stc prioritised the adoption of energy-efficient technology, underlining its commitment to environmental sustainability, deploying a 400G routing solution that can meet its optimised power and space requirements in support of its goals. stc uses Juniper Networks PTX10008 Packet Transport Routers to transform its Converged Supercore network. The PTX10008 router delivers 115.2Tbps capacity within a compact 13 RU footprint and is 800G-ready to meet future demands. The core upgrade results in a remarkable 1,340% (14.4 times) increase in 100G capacity and an 864-port boost in 400G capacity per rack space, with a 43% reduction in watts/gig power consumption through ground-breaking silicon innovation. stc's modernised data centre, powered by MX10008 routers, delivers a capacity of 76.8Tbps within a 13 RU form factor and is ready for 400G adoption.
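The capacity figures quoted above can be sanity-checked with a little arithmetic. The sketch below shows how a "14.4 times" multiple corresponds to a 1,340% increase, and, as an assumption that is not stated in the announcement, what 864 ports at 400Gbps would amount to if every port ran at line rate. The function names are hypothetical.

```python
def percent_increase(multiple: float) -> float:
    """Convert 'N times the original' into a percentage increase."""
    return (multiple - 1.0) * 100.0


def aggregate_tbps(ports: int, gbps_per_port: int) -> float:
    """Nominal aggregate bandwidth in Tbps, assuming all ports run at line rate."""
    return ports * gbps_per_port / 1000.0


if __name__ == "__main__":
    # 14.4x the original capacity is the original plus 13.4x more,
    # which is an increase of about 1,340%.
    print(round(percent_increase(14.4)))

    # Illustrative only: nominal bandwidth of 864 additional 400G ports.
    print(aggregate_tbps(864, 400))
```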
This modernisation effort enabled by Juniper Networks MX Series Universal Routers delivers a 384-port boost in 400G capacity per site, whilst also achieving a 90% reduction in the physical space requirements per site and an 87% decrease in watts/gig power consumption compared to previous multi-layer designs. By integrating Juniper's highly programmable PTX and MX platforms, stc can accelerate model-driven automation to streamline operations. These platforms are equipped with gRPC services, OpenConfig, NETCONF/Yang and native data models, demonstrating the solutions' simplicity and flexibility.
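As an illustration of what model-driven management over NETCONF and OpenConfig can look like, the following sketch builds a NETCONF `<get-config>` request that asks for the OpenConfig interfaces subtree. It only constructs the XML payload using the Python standard library; opening the NETCONF session to a router is not shown. Nothing here is taken from stc's or Juniper's deployment: the function name and message ID are hypothetical, and whether a given device exposes this model depends on its software and configuration.

```python
import xml.etree.ElementTree as ET

# Namespace of the NETCONF base protocol (RFC 6241).
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
# Namespace of the OpenConfig interfaces YANG model.
OC_IF_NS = "http://openconfig.net/yang/interfaces"

# Readable prefixes for the serialised output (purely cosmetic).
ET.register_namespace("nc", NETCONF_NS)
ET.register_namespace("oc-if", OC_IF_NS)


def build_get_config(message_id: str, datastore: str = "running") -> str:
    """Return a <get-config> RPC filtered to the OpenConfig interfaces subtree."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    get_config = ET.SubElement(rpc, f"{{{NETCONF_NS}}}get-config")

    # Which configuration datastore to read from.
    source = ET.SubElement(get_config, f"{{{NETCONF_NS}}}source")
    ET.SubElement(source, f"{{{NETCONF_NS}}}{datastore}")

    # Subtree filter: return only the interfaces container.
    subtree = ET.SubElement(
        get_config, f"{{{NETCONF_NS}}}filter", {"type": "subtree"}
    )
    ET.SubElement(subtree, f"{{{OC_IF_NS}}}interfaces")

    return ET.tostring(rpc, encoding="unicode")


if __name__ == "__main__":
    print(build_get_config("101"))
```

Because the request is expressed against a vendor-neutral data model rather than a device-specific command syntax, the same payload could in principle be reused across platforms that implement that model.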

LINX Mombasa ready for business in Kenya
The London Internet Exchange (LINX)’s new interconnection hub, LINX Mombasa, is now ready for business. LINX Mombasa is a multi-site, interconnected Internet Exchange Point (IXP) within the iColo MBA1 and MBA2 data centres, strategically located in Kenya’s key digital gateway. The LINX-operated IXP is a resilient, future-proof fabric providing a central meeting point for networks to pass their online traffic and keep it closer to the end user. This method, known as peering, lowers network latency and improves overall performance and control. Mombasa is currently seeing strong growth in the interconnection market and is also one of the most internationally connected locations in Africa, with seven submarine cables connecting Kenya to the entire coastline of Africa, as well as the Middle East, Europe, and Asia. The expansion of LINX’s Kenyan footprint comes just over 12 months after the company launched its first IXP on the continent, LINX Nairobi. LINX’s global expansion strategy is always to deliver value to its existing member networks while being able to make a difference to the local connectivity ecosystem – for the good of the Internet. LINX Mombasa will provide both content delivery networks and local ISPs an alternative place to interconnect with additional services available across a resilient and redundant platform. LINX Head of Global Engagement, Nurani Nimpuno, comments, “We are thrilled to be extending our synergies with iColo, with whom we have had a successful journey at LINX Nairobi. We were seeing a demand for LINX services in Mombasa when we came to Kenya, and I am very pleased we can now start delivering the same value here.” LINX will be the first IXP to have a physical presence in both locations. The technical setup will mimic the engineering at LINX Nairobi using Nokia switches. Ranjith Cherickel, Founder & CEO of iColo, remarks, “We are delighted to host LINX Mombasa at our highly connected data centre facilities; MBA1 and MBA2. 
This collaboration underscores our commitment to providing best-in-class infrastructure and services that drive digital transformation in Africa. The new IXP will create significant opportunities for partnerships, innovation, and growth in the region.” With the additional exchange point at iColo’s Mombasa campus, the peering traffic will continue to experience an upward trajectory and Mombasa City will continue to be the Gateway to East Africa.

EXA Infrastructure partners with IOEMA to boost connectivity
EXA Infrastructure, a critical digital infrastructure platform provider, has been chosen by IOEMA as the landing partner for its new submarine cable in Leiston, UK. From the cable landing station, EXA will also provide critical backhaul connectivity to major data centres, including London Telehouse and Equinix. Launched in May 2024, IOEMA is a 1,600km repeatered high-capacity submarine fibre optic network linking five key Northern European markets: the UK, Netherlands, Germany, Denmark, and Norway. The project, which EXA describes as a “game-changer”, connects strategic locations favoured by hyperscalers and content providers - due to power availability - with the primary landing points linking Europe’s core FLAP (Frankfurt, London, Amsterdam, Paris) data hubs. Leiston, one of EXA Infrastructure’s 20 cable landing stations, serves as a critical gateway between the UK and the Netherlands via the Concerto cable. It offers low-latency direct links to London, alternative routes to Dublin and northern UK regions bypassing London, and access to multiple transatlantic pathways, further strengthening connectivity to the Nordics. Steve Roberts, SVP of Strategic Investments and Product Management at EXA Infrastructure, states, “Being selected as the landing partner for this advanced fibre optic project highlights our expertise in delivering complex subsea landing solutions. Our commitment to providing diversity and resiliency through our extensive owned fibre network in Europe, coupled with vital transatlantic routes, positions us as a market leader in enabling advanced connectivity.” Eckhard Bruckschen, CTO at IOEMA Fibre, adds, “We are thrilled to announce our landing partnership for our second UK landing point in Leiston. Working with EXA Infrastructure enables IOEMA to link to one of Europe’s largest infrastructure footprints and beyond, increasing the connectivity solutions, diversity and reach of our system.”

MXT Holdings improves data centre connectivity in Mexico
MXT Holdings (MXT), a telecommunications infrastructure company in Mexico that develops and operates neutral-host communication infrastructure assets, is deploying Ciena's coherent routing innovations - and in the process, taking proactive measures to handle escalating traffic demands placed on its network driven by 5G, cloud-based applications, and the evolving digital landscape. MXT manages over 3,500km of long-haul and metropolitan fibre optic networks in Central and Southeast Mexico. Its network is deployed across key states, including Quintana Roo, Chiapas, and Tabasco. To improve connectivity for its customers and create a more adaptive network, MXT is utilising Ciena’s coherent routing across its metro and long-haul networks. This upgrade will allow MXT to connect key links between Mexico City and Monterrey, creating a network that is significantly more resilient, reliable, and scalable. MXT will also be able to offer up to 400G connectivity options for data centres, high-performance computing networks, enterprises, and service provider applications. Jorge Millones, COO, MXT, comments, “At MXT, we are committed to delivering connectivity that goes beyond our customers’ expectations. With Ciena’s coherent routing innovations, we are better able to support our customers’ digital experiences and offer more robust and reliable connectivity. Ciena’s technology allows us to optimise network performance by streamlining hardware components. This not only enables faster time to market, giving our customers a distinct advantage in today’s highly competitive environment, but also drives operational efficiency.” This transformative network upgrade will not only improve the overall customer experience, but also allow MXT’s network to boost infrastructure efficiency and create a network that can seamlessly adapt to meet evolving bandwidth demands. 
MXT’s network uses Ciena’s coherent routing solution, comprising the 5164 Router and 8114 Coherent Aggregation Router with WaveLogic 5 Nano (WL5n) 400ZR pluggable transceivers running over Ciena’s Coherent ELS and 6500 open line systems. With Ciena’s coherent routing, MXT can deploy less hardware, saving capex and opex, while flexibly supporting a range of use cases, including data centre interconnect (DCI). The multi-layer network will be managed by Ciena’s Navigator Network Control Suite (NCS), providing ease of deployment and management. Additionally, with Ciena’s PinPoint OTDR, MXT can use advanced analytics and software tools to monitor and identify potential trouble spots and accelerate repair times.

nLighten appoints Chair of the Board
As nLighten continues to expand its digital infrastructure platform across Europe, the company has announced the appointment of Nick Read as Chair of the Board. With over 30 years of experience spanning five industry sectors, and deep expertise in telecommunications and digital infrastructure, Nick will provide strategic guidance to support nLighten’s growth and innovation efforts in digital infrastructure. Nick has spent more than 16 years operating at board level, holding leadership positions across global businesses in telecoms, infrastructure, and technology. He currently holds a portfolio of board positions focused on digital infrastructure across EMEA and the US, including serving as Senior Advisor to Global Infrastructure Partners (now part of BlackRock), Consultant to I Squared Capital, Chair of the Board with EXA Infrastructure, and Non-Executive Director at Booking Holdings, Radius Global Infrastructure, and Oak Consortium. Previously, Nick had a 22-year career at Vodafone Group, where he played a pivotal role in shaping the company’s global strategy. He served as Group CEO, and was a Board Member from 2014 to 2022, leading Vodafone through a period of major business transformation, network expansion, and operational evolution. Before joining Vodafone, Nick held senior global finance positions at Miller Freeman Worldwide as Group CFO and at Federal Express Corporation for 10 years. A Fellow Chartered Management Accountant, Nick was awarded an Honorary Doctor of Business Administration in 2022. He has been a long-time advocate for diversity and inclusion, serving as a United Nations HeForShe champion and United Nations Broadband Commissioner. At nLighten, Nick will leverage his extensive industry expertise to support the company’s strategic expansion plans, commercial growth, and operational excellence. 
His deep understanding of global telecom markets and their evolving needs will be instrumental in guiding nLighten’s leadership team as the company scales its European footprint. “We are excited to welcome Nick as Chair to the nLighten Board,” says Harro Beusker, CEO and Founder of nLighten. “His leadership experience in telecoms and digital infrastructure, combined with his track record in business transformation, will be invaluable as we continue to grow and evolve our platform. We look forward to working with him.” Commenting on his appointment, Nick Read states, “I am excited to join nLighten at such a pivotal time for the digital infrastructure industry. The increasing demand for edge computing and sustainable data centre solutions presents a unique opportunity, and nLighten is well-positioned to lead in this space. I look forward to collaborating with the team to drive nLighten’s continued innovation and growth.” Founded in 2021, nLighten has rapidly grown to 34 data centres, serving over 1,000 customers with over 200 employees. As nLighten continues to scale across key European markets, the company says that the addition of experienced leaders like Nick Read reinforces its commitment to providing high-performance, sustainable infrastructure solutions for the digital economy.

LINX surpasses 700Gbps traffic peak in Manchester
The London Internet Exchange (LINX), a not-for-profit organisation working for the good of the Internet, has hit a new record maximum traffic peak of 725Gbps at its Manchester network fabric, highlighting the importance of regionalising network traffic. LINX Manchester has seen consistent growth in traffic, rising by an average of 100-200Gbps throughout 2024. Manchester prides itself on robust digital infrastructure with some of the fastest internet speeds in the UK, supported by extensive fibre-optic networks. Key initiatives such as the £23.8 million full-fibre investment have been pivotal to enhancing Manchester’s digital connectivity, enabling businesses to leverage data and technology to improve efficiency and services. To further enhance Manchester’s strong internet connectivity, LINX’s new location on its Manchester network went live in September 2024 at the Lunar Digital Data Centre, providing peering and further interconnection services to deliver improved performance, increased redundancy and lower network latency by keeping traffic local to the Manchester area. Colin Peckham, LINX Interconnection Specialist, comments, “Manchester is a thriving hub of business and technology, at the forefront of innovation and economic growth, so it’s vital that the area has fast, resilient network infrastructure. Working with our data centre partners in the area, we’re able to quickly deploy advanced peering and cross-connect services to strengthen connectivity in the region and best support the people and businesses driving forward growth. Keeping traffic local keeps latency low and bolsters network security to ensure that internet access remains strong and operational for longer.” Manchester acts as a landmark tech hub for the UK off the back of significant investment in infrastructure, technology and education. 
The area is home to MediaCityUK, where major organisations such as the BBC, ITV and Ericsson are based, and is close to the Oxford Road Corridor innovation district. The city is also the recipient of major infrastructure funding under the Northern Powerhouse Initiative. Datum is another of the data centre partners on the LINX Manchester network, and its MCR2 data centre in South Manchester is due to go live by the end of Q1 2025. Seb Graham, Group Sales Director for Datum, comments, “We are thrilled to see Manchester continually growing its traffic year on year with LINX. Partnering with LINX has been a massive benefit and allows Datum to provide a diverse, carrier neutral offering to our growing client base from a very connectivity-rich data centre. The team at LINX have been brilliant to work with from day one and continue to develop a tight knit, supportive community. We look forward to working more closely with LINX delivering further solutions from our newly built MCR2 site. Manchester is very much open for business!” The city has ambitious plans to further its position as a leading tech hub, with the Manchester Digital Campus set to open in 2026, and the development of a new innovation district called ID Manchester, which aims to create 10,000 jobs and attract global tech firms.

Manchester data centre appoints connectivity partner
Network services provider, Principle Networks, has been appointed by Datum Datacentres to deliver a high-speed IP transit network for Datum’s new data centre in Manchester. The new IP transit network will enable Datum to deliver high-performance internet connectivity to its clients, and is designed to scale in a manner which ensures end users can increase their consumption of internet-based services, without concerns over connectivity limitations. By using Cisco’s best-in-class service provider internet edge routers to host full internet routing tables, the new network will deliver direct access to the internet backbone, ensuring that Datum has greater control over routing policies, and that it can optimise traffic profiles and maximise network availability and reliability. Datum’s new Manchester data centre facility, MCR2, is currently under construction and is due to go live in spring 2025. It will offer 24,000 square feet of technical space within the 50,000 square feet building. Matt Edgley, COO at Datum Datacentres, comments, “After Principle Networks successfully delivered a similar project at our Farnborough data centre facility, we decided to appoint them as a preferred partner for our new Manchester facility. This complex and critical project required a team that we could trust. The highly resilient IPT provision that Principle Networks is deploying will allow us to offer our clients high performance enterprise grade connectivity with low latency and consistent performance to support digital transformation journeys. 
“We work with best-of-breed suppliers to provide resilient links across the UK and beyond and are pleased to be continuing our relationship with Principle Networks as a premium connectivity partner.” As specialists in designing and implementing complex data centre networks and scalable, agile cloud-based networks for mid-to-large enterprises, Principle Networks works across all sectors, including legal, retail, logistics, social housing, automotive, financial services, IT and local government. Russell Crowley, co-founder at Principle Networks, adds, “We are proud to have been chosen to partner with Datum to deliver the IPT network for MCR2. The development of this new data centre is great news for Manchester and will offer businesses the opportunity to host their infrastructure in one of the most well connected, resilient and cutting-edge facilities in the region. We’re excited to be a part of it and are looking forward to the new data centre coming online in the very near future.”

Ooredoo and DE-CIX bring Internet Exchange to Qatar
Ooredoo, a Qatar-based telecommunications operator, in partnership with DE-CIX, a global operator of carrier-neutral Internet Exchanges (IX), has officially announced Doha IX powered by DE-CIX, Qatar’s first standalone commercial Internet Exchange (IX). Leveraging DE-CIX’s global expertise, developed across nearly 60 locations worldwide, this initiative strengthens Qatar’s position as a regional digital hub by enhancing connectivity, reducing costs, and delivering exceptional customer experiences. Doha IX will offer a secure, carrier-neutral platform that facilitates low-latency traffic exchange, improves network performance, and supports remote peering services. Businesses in Qatar and across the region will benefit from cost-effective, direct access to global and regional content providers, streamlining connectivity through a single port while significantly reducing traditional IP Transit costs. Doha IX is built on DE-CIX’s cutting-edge interconnection infrastructure and Ooredoo’s state-of-the-art data centres. Supported by both partners’ established relationships with global content providers and networks, these critical assets ensure seamless and efficient traffic exchange, reducing latency, optimising network performance, and creating a robust Internet Exchange ecosystem in Qatar. “We are proud to introduce Doha IX, which represents a significant step in upgrading Qatar’s digital infrastructure,” says Thani Ali Al Malki, Chief Business Officer at Ooredoo Qatar. “Doha IX delivers faster, more reliable connectivity while reducing operational costs for businesses and driving innovation across various industries, aligning with the goals of Qatar’s National Vision 2030 and advancing our digital transformation initiatives.” Ivo Ivanov, CEO of DE-CIX, adds, “With Doha IX powered by DE-CIX, we are bringing DE-CIX’s global expertise to Qatar, enabling businesses and networks to benefit from superior interconnection services. 
Doha IX is the ideal place for international networks interested in reaching this important Middle Eastern market. The new IX, established through the partnership between DE-CIX and Ooredoo, will unleash the potential of the country’s digital economy by providing better performance and user experience of content and applications, and affordable and high-quality Internet access for enterprises and individuals. This partnership marks an important milestone in strengthening regional connectivity and creating an advanced digital ecosystem that supports economic growth and innovation in the GCC for the amazing digital decades ahead of us.” DE-CIX is an established name in the Middle East, with a proven track record of developing healthy IXs and vibrant interconnection ecosystems. Doha IX, which will be built and operated under the DE-CIX as a Service (DaaS) model, is the sixth IX operated by DE-CIX in the region. Through this collaboration, Ooredoo and DE-CIX are setting the foundation for advanced interconnection in the region. Together, they support Qatar’s digital transformation goals and are seeking to position the country as a leader in the global digital economy, aligning with Qatar’s National Vision 2030.

Feature - The Top Internet Outages of 2024
Ahead of its appearance at the upcoming DTX Manchester exhibition - taking place from 2-3 April 2025 - Cisco ThousandEyes, a network intelligence company, explores some of 2024’s most notable Internet outages and application issues, along with key takeaways to help ITOps teams improve digital resilience in 2025. In 2025, digital resilience is a top priority for IT Operations teams around the globe. When outages happen, it’s how you identify and recover from them that makes the big difference for users and businesses. Beyond that, consistent proactive optimisation is essential to both elevate digital experiences for users and guard against potential problems before they impact customers. The biggest outages of 2024 provide plenty of learnings for ITOps teams charged with improving digital resilience in their business, with recurring themes emerging - most notably the number of outages that resulted from configuration changes or were automation-related. Here, Cisco ThousandEyes goes through some of the most notable outages and disruptions of 2024, identifying key takeaways to help businesses assure great digital experiences for their users in 2025.
Microsoft Teams Service Disruption | 26 January 2024
Microsoft Teams was disrupted for more than seven hours in January, when a problem inside Microsoft’s own network affected the collaboration service. Frozen apps, login errors, and users left hanging in meeting waiting rooms were some of the symptoms reported during the disruption, which began early in the workday for many Americans. ThousandEyes’ own observations during the incident indicated that the failure was consistent with issues in Microsoft’s own network. Failover didn’t appear to relieve the issue for many users, although further “network and backend service optimisation efforts” did eventually restore service.
Meta Outage | 5 March 2024
On 5 March, Meta experienced an outage that prevented users from accessing services including Facebook, Instagram, Messenger, and Threads. While the platform appeared to be reachable, many users were unable to proceed beyond the login or authentication process. Shortly after the outage began, Meta confirmed that it was experiencing problems with its login services. The issue was likely caused by a failure in one of the dependencies that the login system relies on. ThousandEyes observations also point to a backend cause, as Meta’s systems appeared reachable and network paths connecting to the services didn’t display any significant network conditions that could have led to the outage. This outage serves as a reminder that issues with just one part of the application delivery chain can render the whole service functionally unusable. It’s crucial to have full visibility into your whole digital delivery chain to help identify any drops in performance or functionality.
Atlassian Confluence Disruption | 26 March 2024
In late March, workspace application Atlassian Confluence experienced issues, resulting in customers having problems accessing the service and receiving HTTP 502 bad gateway errors. While this was a relatively short outage, lasting just over an hour, ThousandEyes’ analysis revealed it affected users all over the globe. By tracing the network paths to the application’s frontend web servers, hosted in AWS, it was clear that this was a backend issue rather than network connectivity itself. This is one of those outages where relying on error messages would only give you half the story. Identifying the root cause requires you to consider factors such as any third-party dependencies. Being able to rule out issues with a cloud hosting provider, for instance, gets you one step closer to identifying the real problem.
Google.com Outage | 1 May 2024
In early May, Google.com experienced a global disruption lasting around an hour, during which users encountered HTTP 502 error messages instead of the expected search results. The HTTP 502 status code often indicates a proxy server failing to connect with the origin server. It can also be a sign of overwhelming levels of traffic, but there was no reason to suspect that Google was suddenly struggling under demand, with no extraordinary events to trigger such an influx of search traffic. ThousandEyes analysis revealed a 'lights on/lights off' scenario, where service suddenly dropped, suggesting a problem with backend name resolution or something connected to policy/security verification, rather than an issue with the search engine itself.
CrowdStrike Sensor Update Incident | 19 July 2024
Organisations in Australia and New Zealand began experiencing issues on Friday 19 July, at mid-afternoon. A range of industries and major brands simultaneously reported outages as their Windows machines reportedly got stuck in a boot loop that ultimately resulted in the BSOD (Blue Screen of Death). The impact quickly spread to other geographies, causing problems with airline booking systems, grocery stores, and hospital services. And these were just the tip of the iceberg. Initial responsibility for the widespread outage was thought to lie with Microsoft, but a different common denominator emerged: CrowdStrike, a managed detection and response (MDR) service used to protect Windows endpoints from attack. CrowdStrike published guidance on actions and workarounds for IT administrators, and an early technical post-incident report that attributed the incident to an issue with a single configuration file that “triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems.” Recovery wasn’t a simple task, requiring IT staff to physically attend machines to get them functional. 
At one point, Microsoft reported that up to 15 reboots per machine may be needed.
Cloudflare Disruption | 16 September 2024
Cloudflare is one of the world’s biggest CDN providers, so when it catches a cold, other well-known services start sneezing. Cloudflare’s 16 September outage lasted for around two hours, and affected applications such as Zoom and HubSpot. The ThousandEyes platform showed the impact on these third-party applications clearly, with agents in the US, Canada, and India all failing to connect to the various applications during the outage. This is a good example of how you can avert the “Is it just me?” problem. By tracking the entire service delivery process of your applications, you can follow the network paths taken by your apps - and the suppliers they are connected to.
Microsoft Outage | 25 November 2024
Microsoft’s late November outage, which affected services such as Outlook Online, occurred in two parts and wasn’t always easy to spot. Problems emerged around 2 AM (UTC), with symptoms such as timeouts, resolution failures, and the occasional HTTP 503 error message. The problems were intermittent and not always obvious to end users, with the service sometimes presenting as slow or laggy. The issue appeared to be resolved within an hour or so, but four hours later problems emerged again, and this time with greater severity. ThousandEyes observed an increase in packet loss at the edge of the Microsoft network and increased congestion connecting to services. Microsoft later explained the problem was caused by a configuration change that caused an “influx of retry requests routed through servers.” The outage was resolved by performing “manual restarts on a subset of machines that [were] in an unhealthy state.”
OpenAI Outage | 11 December 2024
We almost made it through an entire year of outages without mentioning AI. OpenAI’s December outage affected ChatGPT and the new generative video service, Sora. 
Users witnessed partial page loads, with requests for further information prompting HTTP 403 error messages. ThousandEyes observations pointed to backend application issues and that was later confirmed by OpenAI, which revealed that a new telemetry service deployment had “unintentionally overwhelmed the Kubernetes control plane,” causing cascading failures.
Key takeaways from 2024
You’ll notice that most of the major outages of 2024 stemmed from a backend configuration change that had unintended consequences or the failure of an automated system. ITOps teams have limited control over faulty configuration changes made by service providers. However, they can enhance their overall visibility into service delivery paths, which allows them to quickly identify the source of any errors when they occur. This approach provides valuable insights into faults or degraded components, enabling teams to take appropriate actions, such as rolling back changes, redirecting to alternative resources, or implementing contingency plans. By thoroughly understanding their service delivery chains, teams can also regularly optimise services to improve digital experiences and enhance digital resilience. As we have observed in several significant outages of 2024, error messages typically provide only a hint about what has happened; they cannot in isolation identify the cause. If 2024’s major outages deliver one lesson, it’s that your digital resilience depends on knowing what’s gone wrong - or what could potentially go wrong - even before the service providers themselves acknowledge an issue.
Cisco ThousandEyes will be exhibiting at the upcoming DTX Manchester event, taking place on 2-3 April 2025.
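The point made in the takeaways, that an HTTP error code is only a hint and needs to be combined with what is known about the network path, can be sketched in a few lines. The following is a rough illustration rather than anything ThousandEyes has published: the function names and the suspect categories are hypothetical, and the `network_path_healthy` flag stands in for whatever path monitoring a team already has in place.

```python
from http import HTTPStatus


def describe(status_code: int) -> str:
    """Return the code with its standard reason phrase, if it has one."""
    try:
        return f"{status_code} {HTTPStatus(status_code).phrase}"
    except ValueError:
        return f"{status_code} (unrecognised status)"


def first_suspect(status_code: int, network_path_healthy: bool) -> str:
    """Suggest where to look first; a starting hint, not a root cause."""
    if not network_path_healthy:
        # Loss or unreachable hops on the path outrank the HTTP hint.
        return "network path"
    if status_code in (502, 503, 504):
        # A gateway or proxy answered, but something behind it did not.
        return "backend or upstream dependency"
    if status_code in (401, 403):
        return "authentication or policy layer"
    if 500 <= status_code <= 599:
        return "application server"
    return "inconclusive"


if __name__ == "__main__":
    for code in (502, 503, 403):
        print(describe(code), "->", first_suspect(code, network_path_healthy=True))
```

With a healthy network path, a 502 points towards the backend, which matches how the Atlassian Confluence and Google.com incidents are described above; with an unhealthy path, the same code would not be enough to blame the application.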


